Looking for the best AI code-review tool? An AI code-review tool is software that uses machine learning to help you get more done — it can save you hours every week by automating repetitive work. Most options offer a generous free tier, with paid plans unlocking higher limits, faster processing, and team features. Whether you are a beginner or a pro, the right AI code-review tool slots into your workflow and pays for itself fast. This guide breaks down the top picks, their pros and cons, and who each one is best for.
Microsoft Forms
Microsoft Forms (formerly Office 365 Forms) is an online survey creator, part of Microsoft 365. == Usage == Forms allows users to create surveys and quizzes with automatic marking. The data can be exported to Microsoft Excel, Power BI dashboards and viewed live using the Present feature. == Phishing and fraud == Due to a wave of phishing attacks utilizing Microsoft 365 in early 2021, Microsoft uses algorithms to automatically detect and block phishing attempts with Microsoft Forms. Also, Microsoft advises Forms users not to submit personal information, such as passwords, in a form or survey. It also place a similar advisory underneath the “Submit” button in every form created with Forms, warning users not to give out their password.
Enumeration algorithm
In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms. == Formal definitions == An enumeration problem P {\displaystyle P} is defined as a relation R {\displaystyle R} over strings of an arbitrary alphabet Σ {\displaystyle \Sigma } : R ⊆ Σ ∗ × Σ ∗ {\displaystyle R\subseteq \Sigma ^{}\times \Sigma ^{}} An algorithm solves P {\displaystyle P} if for every input x {\displaystyle x} the algorithm produces the (possibly infinite) sequence y {\displaystyle y} such that y {\displaystyle y} has no duplicate and z ∈ y {\displaystyle z\in y} if and only if ( x , z ) ∈ R {\displaystyle (x,z)\in R} . The algorithm should halt if the sequence y {\displaystyle y} is finite. == Common complexity classes == Enumeration problems have been studied in the context of computational complexity theory, and several complexity classes have been introduced for such problems. A very general such class is EnumP, the class of problems for which the correctness of a possible output can be checked in polynomial time in the input and output. Formally, for such a problem, there must exist an algorithm A which takes as input the problem input x, the candidate output y, and solves the decision problem of whether y is a correct output for the input x, in polynomial time in x and y. For instance, this class contains all problems that amount to enumerating the witnesses of a problem in the class NP. Other classes that have been defined include the following. In the case of problems that are also in EnumP, these problems are ordered from least to most specific: Output polynomial, the class of problems whose complete output can be computed in polynomial time. Incremental polynomial time, the class of problems where, for all i, the i-th output can be produced in polynomial time in the input size and in the number i. Polynomial delay, the class of problems where the delay between two consecutive outputs is polynomial in the input (and independent from the output). Strongly polynomial delay, the class of problems where the delay before each output is polynomial in the size of this specific output (and independent from the input or from the other outputs). The preprocessing is generally assumed to be polynomial. Constant delay, the class of problems where the delay before each output is constant, i.e., independent from the input and output. The preprocessing phase is generally assumed to be polynomial in the input. == Common techniques == Backtracking: The simplest way to enumerate all solutions is by systematically exploring the space of possible results (partitioning it at each successive step). However, performing this may not give good guarantees on the delay, i.e., a backtracking algorithm may spend a long time exploring parts of the space of possible results that do not give rise to a full solution. Flashlight search: This technique improves on backtracking by exploring the space of all possible solutions but solving at each step the problem of whether the current partial solution can be extended to a partial solution. If the answer is no, then the algorithm can immediately backtrack and avoid wasting time, which makes it easier to show guarantees on the delay between any two complete solutions. In particular, this technique applies well to self-reducible problems. Closure under set operations: If we wish to enumerate the disjoint union of two sets, then we can solve the problem by enumerating the first set and then the second set. If the union is non disjoint but the sets can be enumerated in sorted order, then the enumeration can be performed in parallel on both sets while eliminating duplicates on the fly. If the union is not disjoint and both sets are not sorted then duplicates can be eliminated at the expense of a higher memory usage, e.g., using a hash table. Likewise, the cartesian product of two sets can be enumerated efficiently by enumerating one set and joining each result with all results obtained when enumerating the second step. == Examples of enumeration problems == The vertex enumeration problem, where we are given a polytope described as a system of linear inequalities and we must enumerate the vertices of the polytope. Enumerating the minimal transversals of a hypergraph. This problem is related to monotone dualization and is connected to many applications in database theory and graph theory. Enumerating the answers to a database query, for instance a conjunctive query or a query expressed in monadic second-order. There have been characterizations in database theory of which conjunctive queries could be enumerated with linear preprocessing and constant delay. The problem of enumerating maximal cliques in an input graph, e.g., with the Bron–Kerbosch algorithm Listing all elements of structures such as matroids and greedoids Several problems on graphs, e.g., enumerating independent sets, paths, cuts, etc. Enumerating the satisfying assignments of representations of Boolean functions, e.g., a Boolean formula written in conjunctive normal form or disjunctive normal form, a binary decision diagram such as an OBDD, or a Boolean circuit in restricted classes studied in knowledge compilation, e.g., NNF. == Connection to computability theory == The notion of enumeration algorithms is also used in the field of computability theory to define some high complexity classes such as RE, the class of all recursively enumerable problems. This is the class of sets for which there exist an enumeration algorithm that will produce all elements of the set: the algorithm may run forever if the set is infinite, but each solution must be produced by the algorithm after a finite time.
Research data archiving
Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences. The various academic journals have differing policies regarding how much of their data and methods researchers are required to store in a public archive, and what is actually archived varies widely between different disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archiving of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become increasingly strained as research in some areas depends on large datasets which cannot easily be replicated independently. Data archiving is more important in some fields than others. In a few fields, all of the data necessary to replicate the work is already available in the journal article. In drug development, a great deal of data is generated and must be archived so researchers can verify that the reports the drug companies publish accurately reflect the data. Often used interchangeably, Data preservation and data archiving are both about protecting data for the long term, but they serve different purposes. Data preservation focuses on preventing data from being lost, damaged, or destroyed by creating backups, storing data in secure locations, and ensuring it remains accessible when needed. Data archiving, on the other hand, involves moving data that is no longer actively used to a separate storage location for long-term keeping. Archived data is often combined and compressed, and while it can still be accessed, it is not intended for regular use or frequent updates. The requirement of data archiving is a recent development in the history of science. It was made possible by advances in information technology allowing large amounts of data to be stored and accessed from central locations. For example, the American Geophysical Union (AGU) adopted their first policy on data archiving in 1993, about three years after the beginning of the WWW. This policy mandates that datasets cited in AGU papers must be archived by a recognised data center; it permits the creation of "data papers"; and it establishes AGU's role in maintaining data archives. But it makes no requirements on paper authors to archive their data. Prior to organized data archiving, researchers wanting to evaluate or replicate a paper would have to request data and methods information from the author. The academic community expects authors to share supplemental data. This process was recognized as wasteful of time and energy and obtained mixed results. Information could become lost or corrupted over the years. In some cases, authors simply refuse to provide the information. The need for data archiving and due diligence is greatly increased when the research deals with health issues or public policy formation. == Selected policies by journals == === Biotropica === Biotropica requires, as a condition for publication, that the data supporting the results in the paper and metadata describing them must be archived in an appropriate public archive such as Dryad, Figshare, GenBank, TreeBASE, or NCBI. Authors may elect to make the data publicly available as soon as the article is published or, if the technology of the archive allows, embargo access to the data up to three years after article publication. A statement describing Data Availability will be included in the manuscript as described in the instructions to authors. Exceptions to the required archiving of data may be granted at the discretion of the Editor-in-Chief for studies that include sensitive information (e.g., the location of endangered species). Our Editorial explaining the motivation for this policy can be found here. A more comprehensive list of data repositories is available here. Promoting a culture of collaboration with researchers who collect and archive data: The data collected by tropical biologists are often long-term, complex, and expensive to collect. The Board of Editors of Biotropica strongly encourages authors who re-use data archives archived data sets to include as fully engaged collaborators the scientists who originally collected them. We feel this will greatly enhance the quality and impact of the resulting research by drawing on the data collector’s profound insights into the natural history of the study system, reducing the risk of errors in novel analyses, and stimulating the cross-disciplinary and cross-cultural collaboration and training for which the ATBC and Biotropica are widely recognized. NB: Biotropica is one of only two journals that pays the fees for authors depositing data at Dryad. === The American Naturalist === The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data. All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. If the data is deposited somewhere else, please provide a link. If the data is culled from published literature, please deposit the collated data in Dryad for the convenience of your readers. Any impediments to data sharing should be brought to the attention of the editors at the time of submission so that appropriate arrangements can be worked out. === Journal of Heredity === The primary data underlying the conclusions of an article are critical to the verifiability and transparency of the scientific enterprise, and should be preserved in usable form for decades in the future. For this reason, Journal of Heredity requires that newly reported nucleotide or amino acid sequences, and structural coordinates, be submitted to appropriate public databases (e.g., GenBank; the EMBL Nucleotide Sequence Database; DNA Database of Japan; the Protein Data Bank; and Swiss-Prot). Accession numbers must be included in the final version of the manuscript. For other forms of data (e.g., microsatellite genotypes, linkage maps, images), the Journal endorses the principles of the Joint Data Archiving Policy (JDAP) in encouraging all authors to archive primary datasets in an appropriate public archive, such as Dryad, TreeBASE, or the Knowledge Network for Biocomplexity. Authors are encouraged to make data publicly available at time of publication or, if the technology of the archive allows, opt to embargo access to the data for a period up to a year after publication. The American Genetic Association also recognizes the vast investment of individual researchers in generating and curating large datasets. Consequently, we recommend that this investment be respected in secondary analyses or meta-analyses in a gracious collaborative spirit. === Molecular Ecology === Molecular Ecology expects that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, Gene Expression Omnibus, TreeBASE, Dryad, the Knowledge Network for Biocomplexity, your own institutional or funder repository, or as Supporting Information on the Molecular Ecology web site. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species. === Nature === Such material must be hosted on an accredited independent site (URL and accession numbers to be provided by the author), or sent to the Nature journal at submission, either uploaded via the journal's online submission service, or if the files are too large or in an unsuitable format for this purpose, on CD/DVD (five copies). Such material cannot solely be hosted on an author's personal or institutional web site. Nature requires the reviewer to determine if all of the supplementary data and methods have been archived. The policy advises reviewers to consider several questions, including: "Should the authors be asked to provide supplementary methods or data to accompany the paper online? (Such data might include source code for modelling studies, detailed experimental protocols or mathematical derivations.) === Science === Science supports the efforts of databases that aggregate published data for the use of the scientific community. Therefore, before publication, large data sets (including microarray data, protein or DNA sequences, and atomic c
Metadata management
Metadata management involves managing metadata about other data, whereby this "other data" is generally referred to as content data. The term is used most often in relation to digital media, but older forms of metadata are catalogs, dictionaries, and taxonomies. For example, the Dewey Decimal Classification is a metadata management system developed in 1876 for libraries. == Metadata schema == Metadata management goes by the end-to-end process and governance framework for creating, controlling, enhancing, attributing, defining and managing a metadata schema, model or other structured aggregation system, either independently or within a repository and the associated supporting processes (often to enable the management of content). For web-based systems, URLs, images, video etc. may be referenced from a triples table of object, attribute and value. == Scope == With specific knowledge domains, the boundaries of the metadata for each must be managed, since a general ontology is not useful to experts in one field whose language is knowledge-domain specific. == Metadata Manager == In the process of developing a knowledge management solution, creating a metadata schema, and a system in which metadata is managed, a dedicated resource may be appointed to maintain adherence to metadata standards as defined by data owners as well as general best practice. This person is responsible for curation of the business and technical layers of the metadata schema, and commonly involved with strategy and implementation. A metadata manager is not required to master all aspects, or be involved with everything concerning the solution, but an understanding of as much of the process as possible to ensure a relevant schema is developed. == Metadata management over time == Managing the metadata in a knowledge management solution is an important step in a metadata strategy. It is part of the strategy to make sure that the metadata are complete, current and correct at any given time. Managing a metadata project is also about making sure that users of the system are aware of the possibilities allowed by a well-designed metadata system and how to maximize the benefits of metadata. Regularly monitoring the metadata to ensure that the schema remains relevant is advised. === Wikipedia metadata === Wikipedia is a project that actively manages metadata for its articles and files. For example, volunteer editors carefully curate new biographical articles based on the notability (claim to fame), name, birth, and/or death dates. Similarly, volunteer editors carefully curate new architectural articles based on name, municipality, or geo coordinates. When new articles with a valid alternate spelling are added to Wikipedia that match up to existing articles based on metadata, these are then manually checked and if needed, tagged for merging. When new articles are added that are considered out of scope or otherwise unfit for Wikipedia, these are nominated for deletion. To help keep track of metadata on Wikipedia, the new Wikimedia project Wikidata was established in 2012. Click on the pictures to view more metadata about these images:
Kernel-phase
Kernel-phases are observable quantities used in high resolution astronomical imaging used for superresolution image creation. It can be seen as a generalization of closure phases for redundant arrays. For this reason, when the wavefront quality requirement are met, it is an alternative to aperture masking interferometry that can be executed without a mask while retaining phase error rejection properties. The observables are computed through linear algebra from the Fourier transform of direct images. They can then be used for statistical testing, model fitting, or image reconstruction. == Prerequisites == In order to extract kernel-phases from an image, some requirements must be met: Images are nyquist-sampled (at least 2 pixels per resolution element ( λ D {\displaystyle {\frac {\lambda }{D}}} )) Images are taken in near monochromatic light Exposure time is shorter than the timescale of aberrations Strehl ratio is high (good adaptive optics) Linearity of the pixel response (i.e. no saturation) Deviations from these requirements are known to be acceptable, but lead to observational bias that should be corrected by the observation of calibrators. == Definition == The method relies on a discrete model of the instrument's pupil plane and the corresponding list of baselines to provide corresponding vectors φ {\displaystyle \varphi } of pupil plane errors and Φ {\displaystyle \Phi } of image plane Fourier Phases. When the wavefront error in the pupil plane is small enough (i.e. when the Strehl ratio of the imaging system is sufficiently high), the complex amplitude associated to the instrumental phase in one point of the pupil φ k {\displaystyle \varphi _{k}} , can be approximated by e i φ k ≈ 1 + i φ k {\displaystyle e^{i\varphi _{k}}\approx 1+{\mathit {i}}\varphi _{k}} . This permits the expression of the pupil-plane phase aberrations φ {\displaystyle \varphi } to the image plane Fourier phase as a linear transformation described by the matrix A {\displaystyle A} : Φ = Φ 0 + A ⋅ φ {\displaystyle \Phi =\Phi _{0}+A\cdot \varphi } Where Φ 0 {\displaystyle \Phi _{0}} is the theoretical Fourier phase vector of the object. In this formalism, singular value decomposition can be used to find a matrix K {\displaystyle K} satisfying K ⋅ A = 0 {\displaystyle K\cdot A=0} . The rows of K {\displaystyle K} constitute a basis of the kernel of A T {\displaystyle A^{T}} . K ⋅ Φ = K ⋅ Φ 0 + K ⋅ A ⋅ φ {\displaystyle K\cdot \Phi =K\cdot \Phi _{0}+{\cancel {K\cdot A\cdot \varphi }}} The vector K . Φ {\displaystyle K.\Phi } is called the kernel-phase vector of observables. This equation can be used for model-fitting as it represents the interpretation of a sub-space of the Fourier phase that is immune to the instrumental phase errors to the first order. == Applications == The technique was first used in the re-analysis of archival images from the Hubble Space Telescope where it enabled the discovery of a number of brown dwarf in close binary systems. The technique is used as an alternative to aperture masking interferometry, especially for fainter stars because it does not require the use of masks that typically block 90% of the light, and therefore allows higher throughput. It is also considered to be an alternative to coronagraphy for direct detection of exoplanets at very small separations (below 2 λ D {\displaystyle 2{\frac {\lambda }{D}}} ) where coronagraphs are limited by the wavefront errors of adaptive optics. The same framework can be used for wavefront sensing. In the case of an asymmetric aperture, a pseudo-inverse of A {\displaystyle A} can be used to reconstruct the wavefront errors directly from the image. A Python library called xara is available on GitHub and maintained by Frantz Martinache to facilitate the extraction and interpretation of kernel-phases. The KERNEL project, has received funding from the European Research Council to explore the potential of these observables for a number of use-cases, including direct detection of exoplanets, image reconstruction, and image plane wavefront sensing for adaptive optics.
Investigative Data Warehouse
Investigative Data Warehouse (IDW) is a searchable database operated by the FBI. It was created in 2004. Much of the nature and scope of the database is classified. The database is a centralization of multiple federal and state databases, including criminal records from various law enforcement agencies, the U.S. Department of the Treasury's Financial Crimes Enforcement Network (FinCEN), and public records databases. According to Michael Morehart's testimony before the House Committee on Financial Services in 2006, the "IDW is a centralized, web-enabled, closed system repository for intelligence and investigative data. This system, maintained by the FBI, allows appropriately trained and authorized personnel throughout the country to query for information of relevance to investigative and intelligence matters." == Overview == In 2004, according to a government solicitation for bids to manage the project, it was approximately 10TB in size. In 2005, according to one FBI official, the IDW contained approximately 100 million documents. In 2006 it contained more than 560 million documents and was accessible by more than 12,000 individuals. According to the FBI's website, as of August 22, 2007, the database contained 700 million records from 53 databases and was accessible by 13,000 individuals around the world. As of 2007, the FBI was the subject of a lawsuit brought by the EFF (Electronic Frontier Foundation) because of a lack of public notice describing the database and the criteria for including personal information, as required by the Privacy Act of 1974. The lawsuits were a result of two Freedom of Information Act requests filed by the EFF in 2006. It was built in part by Chiliad corporation, the FBI Office of the Chief Technology Officer, and others. Companies listed on the FOIA files include Northrop Grumman . == Purpose == Investigative Data Warehouse–Secret (IDW-S) "provides data and data processing/analysis services to FBI agents and analysts as they perform counter-terrorism, counter-intelligence, and law enforcement missions". The core subsystem supports the Counter-Terrorism Division (CTD), the Special Event Unit, and via DOCLAB-S, the Joint Intelligence Committee Investigation (JICI) and IntelPlus. According to a 2005 email, "IDW will also be used for criminal and other authorized non-CT investigations as it evolves." (CT being counter terrorism) == Subsystems == Within the system, there were subsystems named IDW-S Core, SPT, and DOCLAB-S The special projects team (SPT): allows for the rapid import of new specialized data sources. These data sources are not made available to the general IDW users but instead are provided to a small group of users who have a demonstrated "need-to-know". The SPT System is similar in function to the IDW-S system, with the main difference is a different set of data sources. The SPT System allows its users to access not only the standard IDW Data Store but the specialized SPT Data Store. == Privacy == According to internal emails, the FBI performed several Privacy Impact Assessments (PIAs) of the IDW system. They worked with lawyers from their National Security Law Branch (NSLB) to attempt to make sure their system was complying with various laws regarding sharing of information and secrecy (for example, rule 6e of the Federal Rules of Criminal Procedure, regarding the secrecy of Grand Jury material ). The Information Sharing Policy Group (ISPG) formed a Discretionary Access Control Team (DACT), to work on "approval of data sets" and "access control requirements" for IDW and DataMart, and responding to other Intelligence Community agencies requesting access. The EFF FOIA IDW website states "Despite the vast amount of personal information contained in the IDW, the FBI has never published a Privacy Act notice describing the system or explaining the ways in which the records might be used." There was also a 2005 email from someone on the Office of General Council (OGC) about "preliminary staff musings that maybe we should limit FBI PIA requirements to non-NS systems" (NS being National Security). There was also an email from 2006 saying that 'national security systems are exempt from E-Gov', apparently referring to the E-Government Act of 2002, which has a section that deals with privacy. == Data sources == The IDW used many data sources. The FOIA documents from EFF are heavily redacted, but some of the sources are as follows: FBI Automated Case Support system (ACS), subset of the Electronic Case File (ECF) system Joint Intelligence Committee Investigation documents (JICI), with OCR text "Open Source News" (public websites, such as the Washington Post and others) Secure Automated Messaging Network (SAMNet) Violent Gang and Terrorist Organizing File (VGTOF) DARPA TIDES program ('open source news' that has been organized and collected) IntelPlus Filerooms, with OCR text FBI National Crime Information Center (NCIC) FBI Records Management Division (RMD), Document Laboratory (DocLab), FBIHQ MiTAP (collects data from public sources, websites, etc.) SPT-Specific data sources (partial list, FOIA files have large parts redacted): Unified Name Index (UNI) extracts Financial Center (FinCen), including Bank Secrecy Act data "Various Sources", including the Transportation Security Administration FBI Counterterrorism Division (CTD) Telephone numbers / addresses from ACS Case data from ACS Terrorist Watch List (TWL) "Other NJTTF data" DoS ... Lost/Stolen Passport data No Fly List, from TSA Selectee list, from TSA ACS/ECF with some case types excluded CIA non-TS/non-SCI Technical Discussions (TDs) and Intelligence Information Reports (IIRs) from 1978 to the May 2004 There was also talk of linking the FTTTF "Data Mart" with IDW. The data in IDW is classified at the 'Secret' level or lower. Higher classifications are not allowed, and can be removed