Real-time computer graphics or real-time rendering is the sub-field of computer graphics focused on producing and analyzing images in real time. The term can refer to anything from rendering an application's graphical user interface (GUI) to real-time image analysis, but is most often used in reference to interactive 3D computer graphics, typically using a graphics processing unit (GPU). One example of this concept is a video game that rapidly renders changing 3D environments to produce an illusion of motion. Computers have been capable of generating 2D images such as simple lines, images and polygons in real time since their invention. However, quickly rendering detailed 3D objects is a daunting task for traditional Von Neumann architecture-based systems. An early workaround to this problem was the use of sprites, 2D images that could imitate 3D graphics. Different techniques for rendering now exist, such as ray-tracing and rasterization. Using these techniques and advanced hardware, computers can now render images quickly enough to create the illusion of motion while simultaneously accepting user input. This means that the user can respond to rendered images in real time, producing an interactive experience. == Principles of real-time 3D computer graphics == The goal of computer graphics is to generate computer-generated images, or frames, using certain desired metrics. One such metric is the number of frames generated in a given second. Real-time computer graphics systems differ from traditional (i.e., non-real-time) rendering systems in that non-real-time graphics typically rely on ray tracing. In this process, millions or billions of rays are traced from the camera to the world for detailed rendering—this expensive operation can take hours or days to render a single frame. Real-time graphics systems must render each image in less than 1/30th of a second. 
Ray tracing is far too slow for these systems; instead, they employ the technique of z-buffer triangle rasterization. In this technique, every object is decomposed into individual primitives, usually triangles. Each triangle gets positioned, rotated and scaled on the screen, and rasterizer hardware (or a software emulator) generates pixels inside each triangle. These triangles are then decomposed into atomic units called fragments that are suitable for displaying on a display screen. The fragments are drawn on the screen using a color that is computed in several steps. For example, a texture can be used to "paint" a triangle based on a stored image, and then shadow mapping can alter that triangle's colors based on line-of-sight to light sources. === Video game graphics === Real-time graphics optimizes image quality subject to time and hardware constraints. GPUs and other advances increased the image quality that real-time graphics can produce. GPUs are capable of handling millions of triangles per frame, and modern DirectX/OpenGL class hardware is capable of generating complex effects, such as shadow volumes, motion blurring, and triangle generation, in real-time. The advancement of real-time graphics is evidenced in the progressive improvements between actual gameplay graphics and the pre-rendered cutscenes traditionally found in video games. Cutscenes are typically rendered in real-time—and may be interactive. Although the gap in quality between real-time graphics and traditional off-line graphics is narrowing, offline rendering remains much more accurate. === Advantages === Real-time graphics are typically employed when interactivity (e.g., player feedback) is crucial. When real-time graphics are used in films, the director has complete control of what has to be drawn on each frame, which can sometimes involve lengthy decision-making. Teams of people are typically involved in the making of these decisions. 
In real-time computer graphics, the user typically operates an input device to influence what is about to be drawn on the display. For example, when the user wants to move a character on the screen, the system updates the character's position before drawing the next frame. Usually, the display's response time is far slower than the input device's; this is justified by the immense difference between the (fast) response time of human motion and the (slow) perception speed of the human visual system. This difference has other effects too: because input devices must be very fast to keep up with human motion response, advancements in input devices (e.g., the Wii Remote) typically take much longer to achieve than comparable advancements in display devices.

Another important factor controlling real-time computer graphics is the combination of physics and animation. These techniques largely dictate what is to be drawn on the screen, especially where to draw objects in the scene. They help realistically imitate real-world behavior (the temporal dimension, not the spatial dimensions), adding to the computer graphics' degree of realism.

Real-time previewing with graphics software, especially when adjusting lighting effects, can increase work speed. Some parameter adjustments in fractal-generating software may be made while viewing changes to the image in real time.

== Rendering pipeline ==

The graphics rendering pipeline ("rendering pipeline" or simply "pipeline") is the foundation of real-time graphics. Its main function is to render a two-dimensional image in relation to a virtual camera, three-dimensional objects (objects that have width, length, and depth), light sources, lighting models, textures and more.

=== Architecture ===

The architecture of the real-time rendering pipeline can be divided into three conceptual stages: application, geometry and rasterization.
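As a rough illustration of how the three conceptual stages hand data to one another, the following Python sketch renders two overlapping triangles into a small character grid. It is a minimal sketch under stated assumptions, not a description of how GPU hardware works: the function names (`application_stage`, `geometry_stage`, `rasterizer_stage`, `render`), the hard-coded scene, the camera placed at the origin looking down the negative z-axis, and the bare perspective divide are all assumptions made for illustration.

```python
def application_stage():
    """Produce primitives: triangles as view-space (x, y, z) vertices plus a label."""
    # Hypothetical scene: a near triangle "A" in front of a larger, farther triangle "B".
    near = ([(-1.0, -1.0, -2.0), (1.0, -1.0, -2.0), (0.0, 1.0, -2.0)], "A")
    far = ([(-3.0, -3.0, -4.0), (3.0, -3.0, -4.0), (0.0, 3.0, -4.0)], "B")
    return [near, far]  # "B" is submitted last, so only the depth test can hide it


def geometry_stage(triangles, width, height):
    """Perspective-project each vertex and map it to pixel coordinates."""
    projected = []
    for vertices, label in triangles:
        screen = []
        for x, y, z in vertices:
            depth = -z  # distance in front of the camera
            ndc_x, ndc_y = x / depth, y / depth  # perspective divide
            px = (ndc_x + 1.0) * 0.5 * (width - 1)
            py = (1.0 - ndc_y) * 0.5 * (height - 1)  # row 0 is the top of the image
            screen.append((px, py, depth))
        projected.append((screen, label))
    return projected


def edge(a, b, p):
    """Signed area of the parallelogram spanned by a->b and a->p."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterizer_stage(projected, width, height):
    """Generate a fragment per covered pixel and keep the nearest one (z-buffer)."""
    frame = [["." for _ in range(width)] for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    for (a, b, c), label in projected:
        area = edge(a, b, c)
        if area == 0:
            continue  # skip degenerate triangles
        for row in range(height):
            for col in range(width):
                p = (col, row)
                w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
                all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
                all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
                if not (all_non_negative or all_non_positive):
                    continue  # pixel lies outside the triangle
                # Simplification: depth is interpolated linearly in screen space.
                depth = (w0 * a[2] + w1 * b[2] + w2 * c[2]) / area
                if depth < zbuf[row][col]:
                    zbuf[row][col] = depth
                    frame[row][col] = label
    return frame


def render(width=16, height=8):
    """Run the three stages in order and return the character frame."""
    primitives = application_stage()
    projected = geometry_stage(primitives, width, height)
    return rasterizer_stage(projected, width, height)


if __name__ == "__main__":
    for row in render():
        print("".join(row))
```

Because of the depth test in the rasterizer stage, the nearer triangle "A" hides the farther triangle "B" wherever they overlap, even though "B" is rasterized later.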
=== Application stage === The application stage is responsible for generating "scenes", or 3D settings that are drawn to a 2D display. This stage is implemented in software that developers optimize for performance. This stage may perform processing such as collision detection, speed-up techniques, animation and force feedback, in addition to handling user input. Collision detection is an example of an operation that would be performed in the application stage. Collision detection uses algorithms to detect and respond to collisions between (virtual) objects. For example, the application may calculate new positions for the colliding objects and provide feedback via a force feedback device such as a vibrating game controller. The application stage also prepares graphics data for the next stage. This includes texture animation, animation of 3D models, animation via transforms, and geometry morphing. Finally, it produces primitives (points, lines, and triangles) based on scene information and feeds those primitives into the geometry stage of the pipeline. === Geometry stage === The geometry stage manipulates polygons and vertices to compute what to draw, how to draw it and where to draw it. Usually, these operations are performed by specialized hardware or GPUs. Variations across graphics hardware mean that the "geometry stage" may actually be implemented as several consecutive stages. ==== Model and view transformation ==== Before the final model is shown on the output device, the model is transformed onto multiple spaces or coordinate systems. Transformations move and manipulate objects by altering their vertices. Transformation is the general term for the four specific ways that manipulate the shape or position of a point, line or shape. ==== Lighting ==== In order to give the model a more realistic appearance, one or more light sources are usually established during transformation. 
However, this stage cannot be reached without first transforming the 3D scene into view space. In view space, the observer (camera) is typically placed at the origin. If using a right-handed coordinate system (which is considered standard), the observer looks in the direction of the negative z-axis with the y-axis pointing upwards and the x-axis pointing to the right.

==== Projection ====

Projection is a transformation used to represent a 3D model in a 2D space. The two main types of projection are orthographic projection (also called parallel projection) and perspective projection. The main characteristic of an orthographic projection is that parallel lines remain parallel after the transformation. Perspective projection relies on the principle that as the distance between the observer and the model increases, the model appears smaller. Essentially, perspective projection mimics human sight.

==== Clipping ====

Clipping is the process of removing primitives that are outside of the view box in order to facilitate the rasterizer stage. Primitives that lie only partly inside the view box are cut at its boundary, and the parts that remain are re-formed into new triangles that are passed to the next stage.

==== Screen mapping ====

The purpose of screen mapping is to find the screen coordinates of the primitives that remain after the clipping stage.

==== Rasterizer stage ====

The rasterizer
Just This Once
Just This Once is a 1993 romance novel written in the style of Jacqueline Susann by a Macintosh IIcx computer named "Hal" in collaboration with its programmer, Scott French. French reportedly spent $40,000 and 8 years developing an artificial intelligence program to analyze Susann's works and attempt to create a novel that Susann might have written. A legal dispute between the estate of Jacqueline Susann and the publisher resulted in a settlement to split the profits, and the book was referenced in several legal journal articles about copyright laws. The book had two small print runs totaling 35,000 copies, receiving mixed reviews. == Creation == The novel's creation spanned the fields of artificial intelligence, expert systems, and natural language processing. Scott French first scanned and analyzed portions of two books by Jacqueline Susann, Valley of the Dolls and Once Is Not Enough, to determine constituents of Susann's writing style, which French stated was the most difficult task. This analysis extracted several hundred components including frequency and type of sexual acts and sentence structure. "Once you're there, the writer's style emerges, part of her actual personality comes out, and the computer can be programmed to make a story." French also created several thousand rules to govern tone, plotting, scenes, and characters. The text generated by Hal, the computer, was intended to mimic what Susann might have written, although the output required significant editing. French credits Hal's work with "almost 100% of the plot, 100% of the theme and style." French estimates that he wrote 10% of the prose, the computer Hal wrote about 25% of the prose, and the remaining two-thirds was more of a collaboration between the two. 
A typical scenario to write a scene would involve Hal asking questions that French would answer (for example, Hal might ask about the "cattiness factor" involved in a meeting between two key female characters, and French would reply with a range of 1 to 10), and the computer would then generate a few sentences to which French would make minor edits. The process would repeat for the next few sentences until the scene was written. == Legal issues == Jacqueline Susann's publisher was skeptical of the legality of Just This Once, although French doubted that an author's thought processes could be copyrighted. Susann's estate reportedly threatened to sue Scott French but the parties settled out of court; the settlement involved splitting profits between the parties but the terms of the settlement were not disclosed. The publication of Just This Once raised questions in the legal profession concerning how copyright law applies to computer-generated works derived from an analysis of other copyrighted works, and whether the generation of such works infringes on copyright. The publications on this topic suggested that the copyright laws of the time were ill-equipped to deal with computer-generated creative works. == Reception == The book's publisher Steven Shragis of Carol Group said of the novel, "I'm not going to say this is a great literary work, but it's every bit as good as anything out in this field, and better than an awful lot." The novel received some positive early reviews. In USA Today, novelist Thomas Gifford compared Just This Once to another novel in the same genre, American Star by Jackie Collins. Gifford concluded: "If you do like this stuff, you'd be much, much better off with the one written by the computer." The Dead Jackie Susann Quarterly declared that Susann "would be proud. Lots of money, sleaze, disease, death, oral sex, tragedy and the good girl gone bad." Other reviews were mixed. 
Publishers Weekly wrote, "If the books of Jacqueline Susann and Harold Robbins seem formulaic, this debut novel of sin and success in Las Vegas outdoes them all. And that, in a way, is the point.... All novelty rests in the conceit of computer authorship, not in the story itself." Library Journal stated "French invested eight years and $50,000 in a scheme to use artificial intelligence to fulfill his authentic, if dubious, desire to generate a trashy novel a la Jacqueline Susann. Shallow, beautiful-people characters are flatly conceived and randomly accessed in a formulaic plot ... a sexy, boring morality tale. Of possible interest to computer buffs for its use of Expert Systems and the virtual promise of more worthy possibilities; others should read Susann." Kirkus Reviews wrote: "The deal here is that author French is not the author, he's just the midwife, having allegedly programmed his computer to write about our times just the way Susann would... almost perfectly capturing glamorous Jackie's turgid but E-Z reading prose style and ultrareliable mix of sex, glitz, dope 'n' despair.... One wonders, though, if French's tale spinning PC will do as well on the talkshows as Jackie did. The computer weenies have been trying to tell us for years, garbage in-garbage out."
Visual descriptor
In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents of images or videos, or the algorithms or applications that produce such descriptions. They describe elementary characteristics such as shape, color, texture and motion, among others.

== Introduction ==

As a result of new communication technologies and the widespread use of the Internet, the amount of audio-visual information available in digital format is increasing considerably. It has therefore become necessary to design systems that describe the content of several types of multimedia information so that it can be searched and classified.

Audio-visual descriptors are responsible for describing this content. They capture information about the objects and events found in a video, image or audio recording, and they allow quick and efficient searches of audio-visual content.

This system can be compared to search engines for textual content. Although it is relatively easy to find text with a computer, it is much more difficult to find specific audio and video segments. For instance, imagine somebody searching for a scene of a happy person. Happiness is a feeling, and it is not evident how to describe it in terms of shape, color and texture in images.

Describing audio-visual content is not a trivial task, and it is essential for the effective use of this type of archive.

The standard that deals with audio-visual descriptors is MPEG-7 (Moving Picture Experts Group - 7).

== Types ==

Descriptors are the first step in finding the connection between the pixels contained in a digital image and what humans recall some minutes after having observed an image or a group of images.

Visual descriptors are divided into two main groups:

General information descriptors: contain low-level descriptors that describe color, shape, regions, textures and motion.
Specific domain information descriptors: give information about objects and events in the scene. A concrete example is face recognition.

=== General information descriptors ===

General information descriptors consist of a set of descriptors that cover basic, elementary features such as color, texture, shape, motion and location. These descriptions are generated automatically by means of signal processing.

==== Color ====

Color is the most basic quality of visual content. Five tools are defined to describe color. The first three represent the color distribution, and the remaining ones describe the color relations between sequences or groups of images:

Dominant color descriptor (DCD)
Scalable color descriptor (SCD)
Color structure descriptor (CSD)
Color layout descriptor (CLD)
Group of frame (GoF) or group-of-pictures (GoP)

==== Texture ====

Texture is also an important quality for describing an image. The texture descriptors characterize image textures or regions. They measure the homogeneity of regions and the histograms of the region borders. The set of descriptors is formed by:

Homogeneous texture descriptor (HTD)
Texture browsing descriptor (TBD)
Edge histogram descriptor (EHD)

==== Shape ====

Shape carries important semantic information because of the human ability to recognize objects by their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system performs. Such a segmentation system is not yet available, but there exists a series of algorithms that are considered good approximations. These descriptors describe regions, contours and shapes for 2D images and for 3D volumes. The shape descriptors are the following:

Region-based shape descriptor (RSD)
Contour-based shape descriptor (CSD)
3-D shape descriptor (3-D SD)

==== Motion ====

Motion is described by four different descriptors, which describe motion in a video sequence.
Motion is related to the motion of objects in the sequence and to the motion of the camera. The latter is provided by the capture device, whereas the rest is obtained by means of image processing. The descriptor set is the following:

Motion activity descriptor (MAD)
Camera motion descriptor (CMD)
Motion trajectory descriptor (MTD)
Warping and parametric motion descriptor (WMD and PMD)

==== Location ====

The location of elements in the image is used to describe them in the spatial domain. Elements can also be located in the temporal domain:

Region locator descriptor (RLD)
Spatio-temporal locator descriptor (STLD)

=== Specific domain information descriptors ===

These descriptors, which give information about objects and events in the scene, are not easy to extract, especially when the extraction must be done automatically. Nevertheless, they can be produced manually.

As mentioned before, face recognition is a concrete example of an application that tries to obtain this information automatically.

== Descriptors applications ==

Among all applications, the most important ones are:

Multimedia document search engines and classifiers.
Digital libraries: visual descriptors allow a very detailed and specific search of any video or image by means of different search parameters, for instance a search for films in which a known actor appears, or a search for videos containing Mount Everest.
Personalized electronic news services.
The possibility of an automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area.
Control and filtering of specific audiovisual content, such as violent or pornographic material, as well as authorization for some multimedia content.
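To make the idea of a low-level general information descriptor more concrete, the following Python sketch computes a coarse color histogram and compares two images with it. This is only an illustrative stand-in for a color descriptor: it is not the MPEG-7 dominant color or scalable color descriptor, and the function names (`coarse_color_histogram`, `dominant_bin`, `histogram_intersection`), the choice of four levels per channel, and the representation of an image as a list of RGB tuples are assumptions made for this example.

```python
from collections import Counter


def coarse_color_histogram(pixels, levels=4):
    """Quantize each RGB channel into `levels` bins and return bin frequencies.

    `pixels` is a non-empty iterable of (r, g, b) tuples with values in 0..255.
    `levels` is assumed to divide 256 evenly.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {bin_index: n / total for bin_index, n in counts.items()}


def dominant_bin(histogram):
    """Return the quantized color bin that covers the largest share of pixels."""
    return max(histogram, key=histogram.get)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the share of color mass the two histograms have in common."""
    return sum(min(value, h2.get(bin_index, 0.0)) for bin_index, value in h1.items())


if __name__ == "__main__":
    # Hypothetical tiny "images": three red pixels and one blue, versus all blue.
    reddish = [(250, 10, 10)] * 3 + [(10, 10, 250)]
    bluish = [(10, 10, 250)] * 4
    h_red = coarse_color_histogram(reddish)
    h_blue = coarse_color_histogram(bluish)
    print(dominant_bin(h_red), histogram_intersection(h_red, h_blue))
```

A search application could store such a histogram for each image and rank candidates by their intersection with the histogram of a query image, which is the kind of search-by-content use the applications above describe.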
Glossary of machine vision
The following are common definitions related to the machine vision field.

General related fields
Machine vision
Computer vision
Image processing
Signal processing

== 0-9 ==

1394. FireWire is Apple Inc.'s brand name for the IEEE 1394 interface. It is also known as i.Link (Sony's name) or IEEE 1394 (although the 1394 standard also defines a backplane interface). It is a personal computer (and digital audio/digital video) serial bus interface standard, offering high-speed communications and isochronous real-time data services.
1D. One-dimensional.
2D computer graphics. The computer-based generation of digital images, mostly from two-dimensional models (such as 2D geometric models, text, and digital images) and by techniques specific to them.
3D computer graphics. 3D computer graphics are different from 2D computer graphics in that a three-dimensional representation of geometric data is stored in the computer for the purposes of performing calculations and rendering 2D images. Such images may be for later display or for real-time viewing. Despite these differences, 3D computer graphics rely on many of the same algorithms as 2D computer vector graphics in the wire frame model and 2D computer raster graphics in the final rendered display. In computer graphics software, the distinction between 2D and 3D is occasionally blurred; 2D applications may use 3D techniques to achieve effects such as lighting, and primarily 3D applications may use 2D rendering techniques.
3D scanner. A device that analyzes a real-world object or environment to collect data on its shape and possibly color. The collected data can then be used to construct digital, three-dimensional models useful for a wide variety of applications.

== A ==

Aberration. Defocus, one form of optical aberration, is a translation along the optical axis away from the plane or surface of best focus. In general, defocus reduces the sharpness and contrast of the image.
What should be sharp, high-contrast edges in a scene become gradual transitions.
Algebraic distance or algebraic error. The algebraic distance from a point $x_i$ to a curve or surface defined by $f(x,\beta)=0$ is the value of $f(x_i,\beta)$, i.e. the residual in the least squares problem with data point $(x_i, 0)$ and model function $f$. This term is mainly used in computer vision.[1][2]
Aperture. In the context of photography or machine vision, aperture refers to the diameter of the aperture stop of a photographic lens. The aperture stop can be adjusted to control the amount of light reaching the film or image sensor.
Aspect ratio (image). The aspect ratio of an image is its displayed width divided by its height (usually expressed as "x:y").
Angular resolution. Describes the resolving power of any image-forming device such as an optical or radio telescope, a microscope, a camera, or an eye.
Automated optical inspection.

== B ==

Barcode. A barcode (also bar code) is a machine-readable representation of information in a visual format on a surface.
Blob discovery. Inspecting an image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks. These blobs frequently represent optical targets for machining, robotic capture, or manufacturing failure.
Bitmap. A raster graphics image, digital image, or bitmap, is a data file or structure representing a generally rectangular grid of pixels, or points of color, on a computer monitor, paper, or other display device.

== C ==

Camera. A camera is a device used to take pictures, either singly or in sequence. A camera that takes pictures singly is sometimes called a photo camera to distinguish it from a video camera.
Camera Link. Camera Link is a serial communication protocol designed for computer vision applications based on the National Semiconductor interface Channel-link.
It was designed for the purpose of standardizing scientific and industrial video products including cameras, cables and frame grabbers. The standard is maintained and administered by the Automated Imaging Association, or AIA, the global machine vision industry's trade group.
Charge-coupled device. A charge-coupled device (CCD) is a sensor for recording images, consisting of an integrated circuit containing an array of linked, or coupled, capacitors. CCD sensors and cameras tend to be more sensitive, less noisy, and more expensive than CMOS sensors and cameras.
CIE 1931 Color Space. In the study of the perception of color, one of the first mathematically defined color spaces was the CIE XYZ color space (also known as CIE 1931 color space), created by the International Commission on Illumination (CIE) in 1931.
CMOS. CMOS (pronounced "see-moss") stands for complementary metal-oxide semiconductor, a major class of integrated circuits. CMOS imaging sensors for machine vision are cheaper than CCD sensors but noisier.
CoaXPress. CoaXPress (CXP) is an asymmetric high-speed serial communication standard over coaxial cable. CoaXPress combines high-speed image data, low-speed camera control and power over a single coaxial cable. The standard is maintained by JIIA, the Japan Industrial Imaging Association.
Color. The perception of the frequency (or wavelength) of light; it can be compared to how pitch (or a musical note) is the perception of the frequency or wavelength of sound.
Color blindness. Also known as color vision deficiency; in humans, the inability to perceive differences between some or all colors that other people can distinguish.
Color temperature. "White light" is commonly described by its color temperature. A traditional incandescent light source's color temperature is determined by comparing its hue with a theoretical, heated black-body radiator.
The lamp's color temperature is the temperature in kelvins at which the heated black-body radiator matches the hue of the lamp.
Color vision. CV is the capacity of an organism or machine to distinguish objects based on the wavelengths (or frequencies) of the light they reflect or emit.
Computer vision. The study and application of methods which allow computers to "understand" image content.
Contrast. In visual perception, contrast is the difference in visual properties that makes an object (or its representation in an image) distinguishable from other objects and the background.
C-Mount. Standardized adapter for optical lenses on CCD cameras. C-Mount lenses have a back focal distance of 17.5 mm, versus 12.5 mm for CS-Mount lenses. A C-Mount lens can be used on a CS-Mount camera through the use of a 5 mm extension adapter. C-Mount is a 1" diameter, 32 threads per inch mounting thread (1"-32UN-2A).
CS-Mount. Same as C-Mount but the focal point is 5 mm shorter. A CS-Mount lens will not work on a C-Mount camera. CS-Mount is a 1" diameter, 32 threads per inch mounting thread.

== D ==

Data matrix. A two-dimensional barcode.
Depth of field. In optics, particularly photography and machine vision, the depth of field (DOF) is the distance in front of and behind the subject which appears to be in focus.
Depth perception. DP is the visual ability to perceive the world in three dimensions. It is a trait common to many higher animals. Depth perception allows the beholder to accurately gauge the distance to an object.
Diaphragm. In optics, a diaphragm is a thin opaque structure with an opening (aperture) at its centre. The role of the diaphragm is to stop the passage of light, except for the light passing through the aperture.

== E ==

Edge detection. ED marks the points in a digital image at which the luminous intensity changes sharply. It also marks the points of luminous intensity changes of an object or spatial-taxon silhouette.
Electromagnetic interference.
Radio Frequency Interference (RFI) is electromagnetic radiation which is emitted by electrical circuits carrying rapidly changing signals, as a by-product of their normal operation, and which causes unwanted signals (interference or noise) to be induced in other circuits.

== F ==

FireWire. FireWire (also known as i.Link or IEEE 1394) is a personal computer (and digital audio/video) serial bus interface standard, offering high-speed communications. It is often used as an interface for industrial cameras.
Fixed-pattern noise.
Flat-field correction.
Frame grabber. An electronic device that captures individual, digital still frames from an analog video signal or a digital video stream.
Fringe Projection Technique. A 3D data acquisition technique that uses a projector to display a fringe pattern on the surface of the measured piece, and one or more cameras to record the image(s).
Field of view. The field of view (FOV) is the part which can be seen by the machine vision system at one moment. The field of view depends on the lens of the system and on the working distance between object and camera.
Focus. An image, or image point or region, is said to be in focus if light from object points is converged about as well as possible in the image; conversely, it is out of focus if light is not w
List of large language models
A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

== List ==

For the training cost column, 1 petaFLOP-day equals 1 petaFLOP/sec × 1 day = 10^15 FLOP/sec × 86,400 sec, or 8.64×10^19 FLOP (floating point operations). Only the cost of the largest model is shown. The number of parameters is measured in billions, and the training cost is measured in petaFLOP-days.

=== 2018 ===

=== 2019 ===

=== 2020 ===

=== 2021 ===

=== 2022 ===

=== 2023 ===

=== 2024 ===

=== 2025 ===

=== 2026 ===
Pixel aspect ratio
A pixel aspect ratio (PAR) is a mathematical ratio that describes how the width of a pixel in a digital image compares to the height of that pixel.

Most digital imaging systems display an image as a grid of tiny, square pixels. However, some imaging systems, especially those that must be compatible with standard-definition television motion pictures, display an image as a grid of rectangular pixels, in which the pixel width and height are different. Pixel aspect ratio describes this difference.

Use of pixel aspect ratio mostly involves pictures pertaining to standard-definition television and some other exceptional cases. Most other imaging systems, including those that comply with SMPTE standards and practices, use square pixels.

PAR is also known as sample aspect ratio and abbreviated SAR, though it can be confused with storage aspect ratio.

== Introduction ==

The ratio of the width to the height of an image is known as the aspect ratio, or more precisely the display aspect ratio (DAR): the aspect ratio of the image as displayed. For TV, DAR was traditionally 4:3 (a.k.a. fullscreen), with 16:9 (a.k.a. widescreen) now the standard for HDTV. In digital images, there is a distinction with the storage aspect ratio (SAR), which is the ratio of pixel dimensions. If an image is displayed with square pixels, then these ratios agree; if not, then non-square, "rectangular" pixels are used, and these ratios disagree. The aspect ratio of the pixels themselves is known as the pixel aspect ratio (PAR), which for square pixels is 1:1, and these are related by the identity:

$$\mathrm{SAR} \times \mathrm{PAR} = \mathrm{DAR}$$

Rearranging (solving for PAR) yields:

$$\mathrm{PAR} = \frac{\mathrm{DAR}}{\mathrm{SAR}}$$

For example:

A 640 × 480 VGA image has a SAR of 640/480 = 4:3, and if displayed on a 4:3 display (DAR = 4:3) has square pixels, hence a PAR of 1:1.
By contrast, a 720 × 576 D-1 PAL image has a SAR of 720/576 = 5:4, but if displayed on a 4:3 display (DAR = 4:3) the PAR is 4/3 : 5/4 = 16:15 ≈ 1.067.
This means that the pixels of the PAL picture must be "stretched" by this amount to fit in the 4:3 display. In analog images such as film there is no notion of pixel, nor notion of SAR or PAR, but in the digitization of analog images the resulting digital image has pixels, hence SAR (and accordingly PAR, if displayed at the same aspect ratio as the original). Non-square pixels arise often in early digital TV standards, related to digitalization of analog TV signals – whose vertical and "effective" horizontal resolutions differ and are thus best described by non-square pixels – and also in some digital video cameras and computer display modes, such as Color Graphics Adapter (CGA). Today they arise also in transcoding between resolutions with different SARs. Actual displays do not generally have non-square pixels, though digital sensors might; they are rather a mathematical abstraction used in resampling images to convert between resolutions. There are several complicating factors in understanding PAR, particularly as it pertains to digitization of analog video: First, analog video does not have pixels, but rather a raster scan, and thus has a well-defined vertical resolution (the lines of the raster), but not a well-defined horizontal resolution, since each line is an analog signal. However, by a standardized sampling rate, the effective horizontal resolution can be determined by the sampling theorem, as is done below. Second, due to overscan, some of the lines at the top and bottom of the raster are not visible, as are some of the possible image on the left and right – see Overscan: Analog to digital resolution issues. Also, the resolution may be rounded (DV NTSC uses 480 lines, rather than the 486 that are possible). Third, analog video signals are interlaced – each image (frame) is sent as two "fields", each with half the lines. Thus either the pixels are twice as tall as they would be without interlacing, or the image is deinterlaced. 
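The worked examples in this introduction, where the pixel aspect ratio is obtained by dividing the display aspect ratio by the storage aspect ratio, can be reproduced with exact rational arithmetic. The following is a minimal Python sketch; the function names (`storage_aspect_ratio`, `pixel_aspect_ratio`, `square_pixel_width`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def storage_aspect_ratio(width_px, height_px):
    """SAR: the ratio of the stored pixel dimensions."""
    return Fraction(width_px, height_px)


def pixel_aspect_ratio(width_px, height_px, dar):
    """PAR = DAR / SAR, kept as an exact fraction."""
    return Fraction(dar) / storage_aspect_ratio(width_px, height_px)


def square_pixel_width(width_px, height_px, dar):
    """Width the image would need if resampled to square pixels at the same height."""
    return width_px * pixel_aspect_ratio(width_px, height_px, dar)


if __name__ == "__main__":
    dar = Fraction(4, 3)
    print(pixel_aspect_ratio(640, 480, dar))  # VGA: 1
    print(pixel_aspect_ratio(720, 576, dar))  # D-1 PAL: 16/15
    print(square_pixel_width(720, 576, dar))  # 768
```

Using `Fraction` rather than floating point keeps ratios such as 16:15 exact, and the last value shows the amount of "stretching" involved: a 720 × 576 picture shown at 4:3 corresponds to 768 × 576 square pixels.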
== Background == Video is presented as a sequential series of images called video frames. Historically, video frames were created and recorded in analog form. Because digital display technology, digital broadcast technology, and digital video compression evolved separately, video frame formats diverged, and the resulting differences must be addressed using pixel aspect ratio. Digital video frames are generally defined as a grid of pixels used to present each sequential image. The horizontal component is defined by pixels (or samples), and is known as a video line. The vertical component is defined by the number of lines, as in 480 lines. Standard-definition television standards and practices were developed as broadcast technologies and intended for terrestrial broadcasting, and were therefore not designed for digital video presentation. Such standards define an image as an array of well-defined horizontal "Lines", well-defined vertical "Line Duration" and a well-defined picture center. However, no standard-definition television standard properly defines image edges or explicitly demands a certain number of picture elements per line. Furthermore, analog video systems such as NTSC 480i and PAL 576i, instead of employing progressively displayed frames, employ fields, or interlaced half-frames, displayed in an interwoven manner to reduce flicker and double the image rate for smoother motion. === Analog-to-digital conversion === Once computers became powerful enough to serve as video editing tools, video digital-to-analog converters and analog-to-digital converters were made to overcome this incompatibility. To convert analog video lines into a series of square pixels, the industry adopted a default sampling rate at which luma values were extracted into pixels. The luma sampling rate for 480i pictures was 12+3⁄11 MHz and for 576i pictures was 14+3⁄4 MHz. The term pixel aspect ratio was first coined when ITU-R BT.601 (commonly known as Rec.
601) specified that standard-definition television pictures are made of lines of exactly 720 non-square pixels. ITU-R BT.601 did not define the exact pixel aspect ratio but did provide enough information to calculate the exact pixel aspect ratio based on industry practices: The standard luma sampling rate of precisely 13+1⁄2 MHz. Based on this information: The pixel aspect ratio for 480i would be 10:11 as: $12\tfrac{3}{11} \div 13\tfrac{1}{2} = \tfrac{10}{11}$ The pixel aspect ratio for 576i would be 59:54 as: $14\tfrac{3}{4} \div 13\tfrac{1}{2} = \tfrac{59}{54}$ SMPTE RP 187 further attempted to standardize the pixel aspect ratio values for 480i and 576i. It designated 177:160 for 480i or 1035:1132 for 576i. However, due to significant difference with practices in effect by industry and the computational load that they imposed upon the involved hardware, SMPTE RP 187 was simply ignored. SMPTE RP 187 information annex A.4 further suggested the use of 10:11 for 480i. As of this writing, ITU-R BT.601-6, which is the latest edition of ITU-R BT.601, still implies that the pixel aspect ratios mentioned above are correct. === Digital video processing === As stated above, ITU-R BT.601 specified that standard-definition television pictures are made of lines of 720 non-square pixels, sampled with a precisely specified sampling rate. A simple mathematical calculation reveals that a 704 pixel width would be enough to contain a 480i or 576i standard 4:3 picture: A 4:3 480-line picture, digitized with the Rec. 601-recommended sampling rate, would be 704 non-square pixels wide. $\frac{x}{480} \times \frac{10}{11} = \frac{4}{3} \Rightarrow x = \frac{480 \times 11 \times 4}{10 \times 3} = 704$ A 4:3 576-line picture, digitized with the Rec. 601-recommended sampling rate, would be 702+54⁄59 non-square pixels wide.
$\frac{x}{576} \times \frac{59}{54} = \frac{4}{3} \Rightarrow x = \frac{576 \times 54 \times 4}{59 \times 3} = 702\tfrac{54}{59}$ Unfortunately, not all standard TV pictures are exactly 4:3: As mentioned earlier, in analog video, the center of a picture is well-defined but the edges of the picture are not standardized. As a result, some analog devices (mostly PAL devices but also some NTSC devices) generated motion pictures that were slightly wider horizontally. This also applies proportionately to anamorphic widescreen (16:9) pictures. Therefore, to maintain a safe margin of error, ITU-R BT.601 required sampling 16 more non-square pixels per line (8 more at each edge) to ensure that all video data near the margins is saved. This requirement, however, had implications for PAL motion pictures. PAL pixel aspect ratios for standard (4:3) and anamorphic widescreen (16:9), respectively 59:54 and 118:81, were awkward for digital image processing, especially for mixing PAL and NTSC video clips. Therefore, video editing products chose the almost equivalent value
Lose It!
Lose It! is an American health and wellness mobile app developed by FitNow, Inc. The app generates calorie budgets for users by tracking weight, exercise, food and calorie intake, and personal goals, primarily to assist them in achieving weight loss. == History == Lose It! was developed in Boston and debuted in 2008. The app and its associated company were founded by J.J. Allaire, Charles Teague and Paul Dicristina. Prior to founding Lose It!, Teague and Allaire had founded the online research tool Onfolio, which was acquired by Microsoft in 2006. The Lose It! app was originally released as an iOS app before being released as a website in 2010 and an Android app in 2011. In 2015, Lose It! announced plans to release the app internationally. Lose It! was also available as an app for Apple Watch at its launch in 2015. The app’s “Snap It” feature, which allows users to approximate calorie counts by taking pictures of their daily meals and snacks, was released in beta in 2016. Snap It was named an Innovation Awards Honoree at the 2017 Consumer Electronics Show in Las Vegas. In 2020, Patrick Wetherille, one of the company’s earliest employees, was appointed chief executive officer. == App == Lose It! is a weight loss app. The app allows users to set goals such as increasing strength, overall health/maintenance, and weight loss. It provides users with recommended calorie budgets based on data such as their current weight and their desired weight. Lose It! also tracks data such as exercise/activity level and food consumption, and allows users to track calories consumed by scanning the barcodes of food products and retrieving their calorie information. The app can also estimate the number of calories in a food product. Lose It! has integration features connecting it to other apps such as Fitbit and Runkeeper. It also has social features such as joining groups and sharing progress with friends.
The Premium version of the app allows users to track foods according to specific diets like keto, heart healthy or Mediterranean.