Abstract: Data structures, methods and tiling engines for storing tiling data in memory wherein the tiles are grouped into tile groups and the primitives are grouped into primitive blocks. The methods include, for each tile group: determining, for each tile in the tile group, which primitives of each primitive block intersect that tile; storing in memory a variable length control data block for each primitive block that comprises at least one primitive that intersects at least one tile of the tile group; and storing in memory a control stream comprising a fixed sized primitive block entry for each primitive block that comprises at least one primitive that intersects at least one tile of the tile group, each primitive block entry identifying a location in memory of the control data block for the corresponding primitive block. Each primitive block entry may comprise valid tile information identifying which tiles of the tile group are valid for the corresponding primitive block.
Type: Grant
Filed: February 10, 2021
Date of Patent: December 20, 2022
Assignee: Imagination Technologies Limited
Inventors: Xile Yang, Robert Brigg, Michael John Livesley
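The Imagination Technologies entry above describes a tiling control stream built from fixed-size primitive block entries, each carrying a valid-tile mask for the tile group and a pointer to a variable-length control data block. A minimal Python sketch of that kind of layout follows; the field names, mask widths, and the `intersects`/`alloc` helpers are illustrative assumptions, not the patented format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ControlDataBlock:
    # Variable-length per-tile data: for each valid tile, a bitmask of which
    # primitives in the primitive block intersect that tile (illustrative).
    per_tile_primitive_masks: List[int] = field(default_factory=list)

@dataclass
class PrimitiveBlockEntry:
    # Fixed-size control-stream entry (field widths are assumptions).
    valid_tile_mask: int       # bit i set => tile i of the tile group is valid
    control_data_address: int  # memory location of the control data block

def build_tile_group_stream(primitive_blocks, tiles, intersects, alloc):
    """Build the control stream for one tile group.

    primitive_blocks: list of primitive lists; tiles: tiles in the group;
    intersects(prim, tile) and alloc(block) are hypothetical helpers.
    """
    stream = []
    for block in primitive_blocks:
        valid_mask = 0
        per_tile_masks = []
        for t_idx, tile in enumerate(tiles):
            prim_mask = 0
            for p_idx, prim in enumerate(block):
                if intersects(prim, tile):
                    prim_mask |= 1 << p_idx
            if prim_mask:
                valid_mask |= 1 << t_idx
                per_tile_masks.append(prim_mask)
        if valid_mask:  # only blocks touching at least one tile get an entry
            address = alloc(ControlDataBlock(per_tile_masks))
            stream.append(PrimitiveBlockEntry(valid_mask, address))
    return stream
```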
Abstract: An apparatus and method for multi-adapter and/or multi-pass encoding on dual graphics processors. For example, one embodiment of a processor comprises: a central processor integrated on a first die, the central processor comprising a plurality of cores to execute instructions and process data; a first graphics processor integrated on the first die, the first graphics processor comprising media processing circuitry to perform one or more preliminary lookahead operations on video content to generate lookahead statistics; an interconnect to couple the first graphics processor to a lookahead buffer, the first graphics processor to transmit the lookahead statistics over the interconnect to the lookahead buffer; wherein the lookahead statistics are to be used by a second graphics processor to encode the video content to generate encoded video.
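The two-stage flow above (lookahead statistics produced by one graphics processor and consumed by another for the encode pass) can be sketched roughly as below. The "complexity" statistic and the bit-allocation policy are invented stand-ins for illustration.

```python
import numpy as np

def lookahead_pass(frames):
    """Preliminary lookahead: produce per-frame statistics. Here 'complexity'
    is just the mean absolute difference to the previous frame, a stand-in
    for real lookahead statistics written to the lookahead buffer."""
    stats, prev = [], None
    for frame in frames:
        complexity = 0.0 if prev is None else float(np.mean(np.abs(frame - prev)))
        stats.append({"complexity": complexity})
        prev = frame
    return stats

def encode_pass(frames, stats, total_bits):
    """Encode pass on the second processor: allocate bits per frame in
    proportion to lookahead complexity (an illustrative rate-control rule)."""
    weights = np.array([s["complexity"] + 1e-6 for s in stats])
    budgets = total_bits * weights / weights.sum()
    return [{"frame": i, "bit_budget": int(b)} for i, b in enumerate(budgets)]

frames = [np.random.rand(8, 8) for _ in range(4)]
plan = encode_pass(frames, lookahead_pass(frames), total_bits=400_000)
```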
Abstract: Disclosed are various embodiments for performing a join operation using a graphics processing unit (GPU). The GPU can receive input data including sequences or tuples. The GPU can initialize a histogram in a memory location shared by threads. The GPU can build the histogram of hash values for the sequences. The GPU can reorder the sequences based on the histogram. The GPU can probe partitions and store the results in a buffer pool. The GPU can output the results of the join.
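A single-threaded Python stand-in for the join flow above: build a histogram of hash values, reorder tuples into partitions with a prefix sum, then probe each partition. The partition count and hash function are arbitrary choices, and real GPU kernels would run these phases across many threads.

```python
from collections import defaultdict

def partitioned_hash_join(left, right, num_partitions=8):
    """left, right: lists of (key, payload) tuples; returns joined triples."""
    def partition(rel):
        hist = [0] * num_partitions
        for key, _ in rel:                       # build the histogram
            hist[hash(key) % num_partitions] += 1
        offsets, total = [], 0
        for count in hist:                       # exclusive prefix sum
            offsets.append(total)
            total += count
        out, cursor = [None] * total, offsets[:]
        for key, payload in rel:                 # reorder into partitions
            p = hash(key) % num_partitions
            out[cursor[p]] = (key, payload)
            cursor[p] += 1
        return out, offsets + [total]

    l_sorted, l_off = partition(left)
    r_sorted, r_off = partition(right)
    results = []
    for p in range(num_partitions):              # probe partition by partition
        table = defaultdict(list)
        for key, payload in l_sorted[l_off[p]:l_off[p + 1]]:
            table[key].append(payload)
        for key, payload in r_sorted[r_off[p]:r_off[p + 1]]:
            for l_payload in table[key]:
                results.append((key, l_payload, payload))
    return results

rows = partitioned_hash_join([(1, "a"), (2, "b")], [(1, "x"), (3, "y")])  # [(1, 'a', 'x')]
```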
Abstract: A system may include a graphics processing unit including a command counter. The system may also include a general-purpose processor to: in response to a detection of a timing signal, determine a count value of the command counter included in the graphics processing unit; determine a first threshold range of a plurality of threshold ranges that matches the determined count value of the command counter; select, based on the determined first threshold range, a first configuration profile of a plurality of configuration profiles for the graphics processing unit; and cause the graphics processing unit to use the selected first configuration profile. Other embodiments are described and claimed.
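A small sketch of the counter-to-profile selection described above. The threshold ranges, profile contents, and handler names are invented for illustration; the abstract does not give concrete values.

```python
# Threshold ranges and configuration profiles are illustrative assumptions.
PROFILES = [
    ((0, 999),       {"name": "low_power",   "gpu_clock_mhz": 400}),
    ((1000, 9999),   {"name": "balanced",    "gpu_clock_mhz": 800}),
    ((10000, 10**9), {"name": "performance", "gpu_clock_mhz": 1200}),
]

def select_profile(command_count):
    """Match the GPU command counter value against the threshold ranges and
    return the corresponding configuration profile."""
    for (low, high), profile in PROFILES:
        if low <= command_count <= high:
            return profile
    return PROFILES[0][1]  # fallback if no range matches

def on_timing_signal(read_command_counter, apply_profile):
    """Run by the general-purpose processor on each timing signal: read the
    counter, pick a profile, and have the GPU use it."""
    apply_profile(select_profile(read_command_counter()))

profile = select_profile(2500)  # -> the "balanced" profile in this sketch
```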
Abstract: A method of casting a source device display screen to a target device includes, by an application on the source device, storing information about the target device in a shared memory and issuing a request to an operating system to initiate capturing and casting for the source device display screen. The operating system responds to the request by launching a casting extension and supplying a content stream containing content of the source device display screen. Upon being launched, the casting extension (1) obtains the information about the target device from the shared memory and uses the information to establish a display connection with the target device, and (2) forwards the content stream to the target device on the display connection.
Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
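One plausible reading of the fixed-plus-delta weight arrangement above, sketched in Python. Whether a delta overrides or adds to its corresponding fixed weight is not stated in the abstract; this sketch treats it as an override, and the array shapes are arbitrary.

```python
import numpy as np

def apply_delta_weights(fixed_weights, delta_weights):
    """fixed_weights: dense array held in the first memory.
    delta_weights: small sparse mapping {flat_index: value} held in the
    second memory; each entry corresponds to one fixed weight and, in this
    sketch, replaces it."""
    effective = fixed_weights.copy()
    flat = effective.reshape(-1)
    for index, value in delta_weights.items():
        flat[index] = value
    return effective

fixed = np.random.randn(4, 4).astype(np.float32)   # NLM fixed weights
deltas = {0: 0.5, 7: -1.25}                        # much smaller delta set
weights = apply_delta_weights(fixed, deltas)
intermediate = np.tanh(np.random.randn(1, 4).astype(np.float32) @ weights)
```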
Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations. The functional units can also include circuitry to analyze statistics for output values of the tensor computations, determine a target format to convert the output values, the target format determined based on the statistics for the output values and a precision associated with a second layer of the neural network, and convert the output values to the target format.
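A rough sketch of the statistics-driven conversion step: inspect the tensor outputs, then pick a target format compatible with both the observed range and the precision the next layer expects. The decision rule and the fp16 threshold are illustrative assumptions.

```python
import numpy as np

FP16_MAX = 65504.0  # largest finite float16 value

def choose_target_format(outputs, next_layer_precision):
    """Pick a conversion format from simple statistics of the output values
    and the precision associated with the next layer."""
    max_abs = float(np.max(np.abs(outputs)))
    if next_layer_precision == "fp16" and max_abs < FP16_MAX:
        return np.float16
    return np.float32

def convert_outputs(outputs, next_layer_precision="fp16"):
    return outputs.astype(choose_target_format(outputs, next_layer_precision))

activations = np.random.randn(128, 64).astype(np.float32) * 10.0
converted = convert_outputs(activations)   # float16 when the range allows it
```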
Abstract: One embodiment provides, for a machine-learning accelerator device, a multiprocessor to execute parallel threads of an instruction stream, the multiprocessor including a compute unit, the compute unit including a set of functional units, each functional unit to execute at least one of the parallel threads of the instruction stream. The compute unit includes compute logic configured to execute a single instruction to scale an input tensor associated with a layer of a neural network according to a scale factor, the input tensor stored in a floating-point data type, the compute logic to scale the input tensor to enable a data distribution of data of the input tensor to be represented by a 16-bit floating point data type.
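The single-instruction scaling described above amounts to multiplying the tensor by a scale factor so its value distribution fits float16. A numpy sketch, with an assumed headroom policy:

```python
import numpy as np

FP16_MAX = 65504.0

def scale_to_fp16(tensor):
    """Scale a float32 input tensor so its values sit comfortably inside
    float16's range, returning the fp16 tensor and the scale factor needed
    to undo the scaling later. The headroom factor is an arbitrary choice."""
    max_abs = float(np.max(np.abs(tensor))) or 1.0
    scale = (FP16_MAX / 8.0) / max_abs
    return (tensor * scale).astype(np.float16), scale

x = np.random.randn(256, 256).astype(np.float32) * 1e-6  # fp16 represents this only coarsely
x16, scale = scale_to_fp16(x)
x_restored = x16.astype(np.float32) / scale
```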
Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.
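A toy model of the dynamic buffering policy above: derive an entry limit from a kernel-complexity score, then have the primary parser stop feeding any distributed parser whose buffer has reached that limit. The complexity metric and the specific mapping are assumptions.

```python
def buffer_entry_limit(kernel_complexity, max_entries=64, min_entries=4):
    """Map a normalized complexity score in [0, 1] (how it is computed is an
    assumption) to a cap on buffered entries: complex kernels get a smaller
    cap so no single distributed parser hoards work."""
    span = max_entries - min_entries
    return max(min_entries, max_entries - int(span * min(kernel_complexity, 1.0)))

def dispatch(workgroups, num_parsers, kernel_complexity):
    """Primary-parser loop: round-robin workgroups to the distributed
    parsers, skipping any buffer that has hit the dynamic limit."""
    limit = buffer_entry_limit(kernel_complexity)
    buffers = [[] for _ in range(num_parsers)]
    pending = list(workgroups)
    while pending:
        progressed = False
        for buf in buffers:
            if pending and len(buf) < limit:
                buf.append(pending.pop(0))
                progressed = True
        if not progressed:
            break  # all buffers at the limit; hardware would wait for drain
    return buffers

buffers = dispatch(range(100), num_parsers=4, kernel_complexity=0.9)  # limit of 10 per parser
```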
Abstract: An electronic device includes: a host configured to output image data and a timing control signal in response to an image intended to be displayed; a display driver IC coupled to the host through a first interface and a second interface and configured to output a data signal based on the image data; and a display configured to display an image based on the data signal, wherein the host is configured to output the image data and the timing control signal through any one of the first interface and the second interface depending on a display mode.
Abstract: In general, the disclosure describes techniques for creating runtime-throttleable neural networks (TNNs) that can adaptively balance performance and resource use in response to a control signal. For example, runtime-TNNs may be trained to be throttled via a gating scheme in which a set of disjoint components of the neural network can be individually “turned off” at runtime without significantly affecting the accuracy of NN inferences. A separate gating neural network may be trained to determine which trained components of the NN to turn off to obtain operable performance for a given level of resource use of computational, power, or other resources by the neural network. This level can then be specified by the control signal at runtime to adapt the NN to operate at the specified level and in this way balance performance and resource use for different operating conditions.
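A toy throttleable network in the spirit of the abstract above: hidden units are grouped into disjoint components, and a utilization level decides how many stay active. The fixed keep-the-first-k policy stands in for the trained gating network, and the architecture is arbitrary.

```python
import numpy as np

class ThrottleableMLP:
    """Tiny two-layer network whose hidden units form disjoint components
    that can be gated off at runtime via a control signal."""
    def __init__(self, n_in=16, n_out=4, n_components=8, width=8, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((n_components, n_in, width)) * 0.1
        self.w2 = rng.standard_normal((n_components, width, n_out)) * 0.1
        self.n_components = n_components

    def forward(self, x, utilization=1.0):
        keep = max(1, int(round(utilization * self.n_components)))
        out = np.zeros((x.shape[0], self.w2.shape[2]))
        for c in range(keep):                   # components >= keep are turned off
            out += np.maximum(x @ self.w1[c], 0.0) @ self.w2[c]
        return out / keep                       # keep output scale comparable

net = ThrottleableMLP()
full = net.forward(np.random.randn(2, 16), utilization=1.0)
half = net.forward(np.random.randn(2, 16), utilization=0.5)  # runtime control signal
```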
Abstract: A graphics processing unit (GPU) includes one or more processor cores adapted to execute a software-implemented shader program, and one or more hardware-implemented ray tracing units (RTU) adapted to traverse an acceleration structure to calculate intersections of rays with bounding volumes and graphics primitives asynchronously with shader operation. The RTU implements traversal logic to traverse the acceleration structure including transformation of rays as needed to account for variations in coordinate space between levels, stack management, and other tasks to relieve burden on the shader, communicating intersections to the shader which then calculates whether the intersection hit a transparent or opaque portion of the object intersected. Thus, one or more processing cores within the GPU perform accelerated ray tracing by offloading aspects of processing to the RTU, which traverses the acceleration structure within which the 3D environment is represented.
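A software stand-in for the traversal loop the RTU performs in hardware: a stack-managed walk of a bounding volume hierarchy that reports candidate primitive intersections back for the shader to resolve. The node layout is assumed, and the per-level ray transformation mentioned in the abstract is omitted for brevity.

```python
def ray_aabb_hit(origin, inv_dir, box_min, box_max):
    """Slab test: does the ray intersect the axis-aligned bounding box?"""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        t1 = (box_min[axis] - origin[axis]) * inv_dir[axis]
        t2 = (box_max[axis] - origin[axis]) * inv_dir[axis]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far

def traverse(nodes, root, origin, direction):
    """Stack-managed traversal of an acceleration structure.
    nodes[i] = {"min": ..., "max": ..., "children": [...], "prims": [...]};
    the primitives returned are candidates the shader then classifies
    (e.g. the opaque/transparent decision in the abstract)."""
    inv_dir = [1.0 / d if abs(d) > 1e-12 else 1e12 for d in direction]
    stack, hits = [root], []
    while stack:
        node = nodes[stack.pop()]
        if not ray_aabb_hit(origin, inv_dir, node["min"], node["max"]):
            continue
        hits.extend(node.get("prims", []))      # leaf contents: report candidates
        stack.extend(node.get("children", []))  # inner node: push children
    return hits
```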
Abstract: The present disclosure relates to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may modify at least one texture memory object to support a data structure for one or more tensor objects. The apparatus may also determine one or more supported memory layouts for the one or more tensor objects based on the modified at least one texture memory object. Additionally, the apparatus may access data associated with the one or more tensor objects based on the one or more supported memory layouts, the data for each of the one or more tensor objects corresponding to at least one data instruction. The apparatus may also execute the at least one data instruction based on the accessed data associated with the one or more tensor objects.
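One way to picture a texture memory object backing a tensor: pack an NCHW tensor into a 2D texture-like image and define an accessor that maps tensor coordinates to texels. The specific layout is an assumption; the abstract only says supported layouts are determined from the modified texture memory object.

```python
import numpy as np

def tensor_to_texture(tensor_nchw):
    """Pack an NCHW tensor into a 2D 'texture' by laying the (N, C) slices
    side by side along the texture's x axis."""
    n, c, h, w = tensor_nchw.shape
    return tensor_nchw.reshape(n * c, h, w).transpose(1, 0, 2).reshape(h, n * c * w)

def texel_fetch(texture, n, c, y, x, shape):
    """Read tensor element (n, c, y, x) back out of the packed texture."""
    N, C, H, W = shape
    return texture[y, (n * C + c) * W + x]

t = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
tex = tensor_to_texture(t)
assert texel_fetch(tex, 1, 2, 3, 4, t.shape) == t[1, 2, 3, 4]
```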
Type: Grant
Filed: February 11, 2021
Date of Patent: October 25, 2022
Assignee: QUALCOMM Incorporated
Inventors: Elina Kamenetskaya, Liang Li, Andrew Evan Gruber, Jeffrey Leger, Balaji Calidas, Ruihao Zhang
Abstract: An apparatus, method, and computer readable medium that access a frame buffer of a graphics processing unit (GPU), analyze, in the frame buffer, a frame representing displayed data, based on the analyzed frame, identify a reference patch that includes an instruction to retrieve content, generate an overlay including an augmentation layer which includes the content, superimpose the overlay onto the displayed data such that the content is viewable while a portion of the base layer is obscured, detect a user input, determine a location of the user input in the augmentation layer, associate the location in the augmentation layer with a target location in the base layer, and associate, within memory, the target location with an operation such that the user input in the augmentation layer activates an input in the base layer.
Type: Grant
Filed: February 18, 2022
Date of Patent: October 18, 2022
Assignee: Mobeus Industries, Inc.
Inventors: Dharmendra Etwaru, Michael R. Sutcliff, Aram Andriasyan
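The last step of the Mobeus entry (mapping a user input in the augmentation layer to a target location in the base layer and triggering the associated operation) can be sketched as below. The rectangle-based geometry, the operations table, and all coordinates are hypothetical.

```python
def map_to_base(point, overlay_rect, base_rect):
    """Map an (x, y) input in the augmentation layer to the base layer,
    assuming overlay_rect corresponds linearly to base_rect; rectangles
    are (x, y, width, height)."""
    ox, oy, ow, oh = overlay_rect
    bx, by, bw, bh = base_rect
    u = (point[0] - ox) / ow
    v = (point[1] - oy) / oh
    return (bx + u * bw, by + v * bh)

# Hypothetical association of a base-layer region with an operation.
operations = {(100, 200, 80, 30): lambda: print("base-layer button activated")}

def handle_input(point, overlay_rect, base_rect):
    """Input in the augmentation layer activates the matching base-layer input."""
    x, y = map_to_base(point, overlay_rect, base_rect)
    for (rx, ry, rw, rh), operation in operations.items():
        if rx <= x <= rx + rw and ry <= y <= ry + rh:
            operation()

handle_input((70, 107), overlay_rect=(0, 0, 400, 300), base_rect=(0, 0, 800, 600))
```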
Abstract: A method of and apparatus for processing graphics in a tile-based graphics processing system, wherein when preparing primitive lists it is determined, based on a measure of the size of a primitive, whether or not to perform processing of one or more attributes of one or more vertices of the primitive.
Type: Grant
Filed: March 30, 2021
Date of Patent: October 11, 2022
Assignee: Arm Limited
Inventors: Olof Henrik Uhrenholt, Michael Martin Klock, Philip Carlos Garcia
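A sketch of the size-based decision in the Arm entry above: measure a primitive by the number of tiles its bounding box covers, then decide whether to run vertex attribute processing. The tile size, the threshold, and the policy direction (processing small primitives immediately) are assumptions; the abstract only says the decision is based on a measure of primitive size.

```python
def tile_footprint(primitive, tile_w=16, tile_h=16):
    """Number of tiles covered by the primitive's screen-space bounding box,
    used as a simple measure of primitive size."""
    xs = [v[0] for v in primitive]
    ys = [v[1] for v in primitive]
    tiles_x = int(max(xs) // tile_w) - int(min(xs) // tile_w) + 1
    tiles_y = int(max(ys) // tile_h) - int(min(ys) // tile_h) + 1
    return tiles_x * tiles_y

def should_process_attributes(primitive, threshold_tiles=1):
    """Decide, while preparing primitive lists, whether to perform vertex
    attribute processing for this primitive now."""
    return tile_footprint(primitive) <= threshold_tiles

triangle = [(3.0, 4.0), (10.0, 6.0), (7.0, 12.0)]   # fits inside one 16x16 tile
process_now = should_process_attributes(triangle)    # True in this sketch
```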
Abstract: A graphics processor is described that includes a single instruction, multiple thread (SIMT) architecture including hardware multithreading. The multiprocessor can execute parallel threads of instructions associated with a command stream, where the multiprocessor includes a set of functional units to execute at least one of the parallel threads of the instructions. The set of functional units can include a mixed precision tensor processor to perform tensor computations to generate loss data. The loss data is stored as a floating-point data type and scaled by a scaling factor to enable a data distribution of a gradient tensor generated based on the loss data to be represented by a 16-bit floating point data type.
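The loss-scaling step described above follows the familiar mixed-precision pattern: multiply the loss by a scale factor so the gradient tensor lands in float16's representable range, then divide the gradients back down before the update. A numpy sketch, with the scale value and the gradient function as placeholders:

```python
import numpy as np

def scaled_backward(loss_fp32, grad_fn, scale=2.0 ** 12):
    """Scale the loss, compute gradients (kept in float16), then unscale
    them for the weight update."""
    scaled_loss = loss_fp32 * scale
    grads_fp16 = grad_fn(scaled_loss).astype(np.float16)   # gradient tensor in fp16
    return grads_fp16.astype(np.float32) / scale

# Placeholder gradient function: tiny gradients that fp16 represents only
# coarsely unless the loss is scaled first.
fake_grad_fn = lambda scaled_loss: np.full(4, 1e-6, dtype=np.float32) * (scaled_loss / 0.5)
grads = scaled_backward(0.5, fake_grad_fn)   # ~1e-6 each after unscaling
```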
Abstract: Examples of a light field metrology system for use with a display are disclosed. The light field metrology may capture images of a projected light field, and determine focus depths (or lateral focus positions) for various regions of the light field using the captured images. The determined focus depths (or lateral positions) may then be compared with intended focus depths (or lateral positions), to quantify the imperfections of the display. Based on the measured imperfections, an appropriate error correction may be performed on the light field to correct for the measured imperfections. The display can be an optical display element in a head mounted display, for example, an optical display element capable of generating multiple depth planes or a light field display.
Type: Grant
Filed: January 15, 2020
Date of Patent: September 27, 2022
Assignee: Magic Leap, Inc.
Inventors: Lionel E. Edwin, Ivan L. Yeoh, Samuel A. Miller
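A minimal version of the comparison step in the Magic Leap entry above: take measured and intended focus depths per light-field region, quantify the error, and pre-compensate. How depths are actually measured from captured images, and the real correction applied to the display, are beyond this sketch.

```python
import numpy as np

def depth_error_map(measured_depths, intended_depths):
    """Per-region difference between measured and intended focus depths,
    one simple quantification of the display's imperfections."""
    return measured_depths - intended_depths

def corrected_depths(intended_depths, error_map):
    """Naive error correction: pre-compensate each region by the measured error."""
    return intended_depths - error_map

intended = np.full((4, 4), 2.0)                            # intended 2 m focal plane
measured = intended + np.random.normal(0.0, 0.05, (4, 4))  # simulated measurements
errors = depth_error_map(measured, intended)
pre_compensated = corrected_depths(intended, errors)
```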
Abstract: The subject disclosure relates to techniques for inserting of bounding boxes around image objects. A process of the disclosed technology can include steps for receiving, from a first data set, an image comprising a first image object, processing the image to identify a pixel region associated with the first image object, and placing a first bounding box around the first image object based on the identified pixel region. In some aspects, the process further includes steps for receiving a user input comprising an indication of whether the first bounding box is accurately placed around the first image object. Systems and machine-readable media are also provided.
Type: Grant
Filed: January 21, 2020
Date of Patent: September 27, 2022
Assignee: GM Cruise Holdings LLC
Inventors: Matthias Wisniowski, Radu Dondera, Bin Ma, Siddharth Raina
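The bounding-box placement step from the GM Cruise entry is straightforward to sketch: given a binary mask for the identified pixel region, take the extrema of its coordinates. The segmentation that produces the mask and the review workflow are outside this sketch; names and values are illustrative.

```python
import numpy as np

def place_bounding_box(mask):
    """Place an axis-aligned bounding box (x_min, y_min, x_max, y_max)
    around the pixel region identified for an image object."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def record_review(box, user_says_accurate):
    """Record the user's judgment of whether the box is accurately placed."""
    return {"box": box, "accurate": bool(user_says_accurate)}

mask = np.zeros((100, 100), dtype=bool)
mask[20:40, 30:70] = True                    # hypothetical object pixels
box = place_bounding_box(mask)               # -> (30, 20, 69, 39)
feedback = record_review(box, user_says_accurate=True)
```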