Abstract: Rendering system combines point sampling and volume sampling operations to produce rendering outputs. For example, to determine color information for a surface location in a 3-D scene, one or more point sampling operations are conducted in a volume around the surface location, and one or more sampling operations of volumetric light transport data are performed farther from the surface location. A transition zone between point sampling and volume sampling can be provided, in which both point and volume sampling operations are conducted. Data obtained from point and volume sampling operations can be blended in determining color information for the surface location. For example, point samples are obtained by tracing a ray for each point sample, to identify an intersection between another surface and the ray, to be shaded, and volume samples are obtained from a nested 3-D grids of volume elements expressing light transport data at different levels of granularity.
Abstract: A method and apparatus are provided for tessellating patches of surfaces in a tile based three dimensional computer graphics rendering system. For each tile in an image a per tile list of primitive indices is derived for tessellated primitives which make up a patch. Hidden surface removal is then performed on the patch and any domain points which remain after hidden surface removal are derived. The primitives are then shaded for display.
Abstract: A method and encoding unit for encoding a block of pixels into a compressed data structure determines a set of Haar coefficients for a 2×2 quad of pixels of the block of pixels, which set includes a plurality of differential coefficients and an average coefficient. A first portion of the compressed data structure is determined using the differential coefficients and includes a first set of bits which indicates an order of the magnitudes of the differential coefficients, and a second set of bits which indicates a sign and an exponent for each of one or more of the differential coefficients which are non-zero. A second portion of the compressed data structure is determined using the average coefficient determined for the 2×2 quad of pixels. The compressed data structure is stored.
Abstract: Aspects relate to tracing rays in 3-D scenes that comprise objects that are defined by or with implicit geometry. In an example, a trapping element defines a portion of 3-D space in which implicit geometry exist. When a ray is found to intersect a trapping element, a trapping element procedure is executed. The trapping element procedure may comprise marching a ray through a 3-D volume and evaluating a function that defines the implicit geometry for each current 3-D position of the ray. An intersection detected with the implicit geometry may be found concurrently with intersections for the same ray with explicitly-defined geometry, and data describing these intersections may be stored with the ray and resolved.
Type:
Grant
Filed:
December 14, 2022
Date of Patent:
January 2, 2024
Assignee:
Imagination Technologies Limited
Inventors:
Cuneyt Ozdas, Luke Tilman Peterson, Steven Blackmon, Steven John Clohset
Abstract: Methods of memory allocation in which registers referenced by different groups of instances of the same task are mapped to individual logical memories. Other example methods describe the mapping of registers referenced by a task to different banks within a single logical memory and in various examples this mapping may take into consideration which bank is likely to be the dominant bank for the particular task and the allocation for one or more other tasks.
Abstract: Data structures, methods and tiling engines for storing tiling data in memory wherein the tiles are grouped into tile groups and the primitives are grouped into primitive blocks. The methods include, for each tile group: determining, for each tile in the tile group, which primitives of each primitive block intersect that tile; storing in memory a variable length control data block for each primitive block that comprises at least one primitive that intersects at least one tile of the tile group; and storing in memory a control stream comprising a fixed sized primitive block entry for each primitive block that comprises at least one primitive that intersects at least one tile of the tile group, each primitive block entry identifying a location in memory of the control data block for the corresponding primitive block. Each primitive block entry may comprise valid tile information identifying which tiles of the tile group are valid for the corresponding primitive block.
Type:
Grant
Filed:
December 13, 2022
Date of Patent:
January 2, 2024
Assignee:
Imagination Technologies Limited
Inventors:
Xile Yang, Robert Brigg, Michael John Livesley
Abstract: Methods of arbitrating between requestors and a shared resource are described. The method comprises generating a vector with one bit per requestor, each initially set to one. Based on a plurality of select signals (one per decision node in a first layer of a binary decision tree, where each select signal is configured to be used by the corresponding decision node to select one of two child nodes), bits in the vector corresponding to non-selected requestors are set to zero. The method is repeated for each subsequent layer in the binary decision tree, based on the select signals for the decision nodes in those layers. The resulting vector is a priority vector in which only a single bit has a value of one. Access to the shared resource is granted, for a current processing cycle, to the requestor corresponding to the bit having a value of one.
Abstract: A multicore hardware implementation of a deep neural network includes a plurality of layers arranged in plurality of layer groups. The input data to the network comprises a multidimensional tensor including one or more traversed dimensions, being dimensions that are traversed by strides in at least one layer of a first layer group. The hardware implementation is configured to split the input data for the first layer group into at least a first tile and a second tile, along at least one of the traversed dimensions, each tile comprising a plurality of data elements in each of the one or more traversed dimensions. A first core is configured to evaluate multiple layer groups, depth-first, based on the first tile. A second core is configured to evaluate multiple layer groups, depth-first, based on the second tile.
Abstract: Methods and systems for determining whether an infinitely precise result of a reciprocal square root operation performed on an input floating point number is greater than a particular number in a first floating point precision. The method includes calculating the square of the particular number in a second lower floating point precision; calculating an error in the calculated square due to the second floating point precision; calculating a first delta value in the first floating point precision by calculating the square multiplied by the input floating point number less one; calculating a second delta value by calculating the error multiplied by the input floating point number plus the first delta value; and outputting an indication of whether the infinitely precise result of the reciprocal square root operation is greater than the particular number based on the second delta term.
Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation; which can include a LookUp Table (LUT). LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited precision multiplier goes to a full precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor.
Abstract: The operation of a GPU is controlled based on one or more deadlines by which one or more GPU tasks must be completed and estimates of the time required to complete the execution of a first GPU task (which is currently being executed) and the time required to execute one or more other GPU tasks (which are not currently being executed). Based on a comparison between the deadline(s) and the estimates, the operating parameters of the GPU may be changed.
Abstract: A method of compressing data is described in which the compressed data is generated by either or both of a primary compression unit or a reserve compression unit in order that a target compression threshold is satisfied. If a compressed data block generated by the primary compression unit satisfies the compression threshold, that block is output. However, if the compressed data block generated by the primary compression unit is too large, such that the compression threshold is not satisfied, a compressed data block generated by the reserve compression unit using a lossy compression technique, is output.
Abstract: A graphics processing system for performing tile-based rendering of a scene that includes safety-related primitives has a plurality of graphics processing units (GPUs), each configured to i) receive tile data identifying one or more protected tiles comprising at least part of a safety-related primitive, ii) process two respective sets of protected tiles, and iii) based on said processing, generate two respective checksums for each respective set of protected tiles. The two respective sets of protected tiles are mutually exclusive, and each respective set and each protected tile being processed by two different GPUs. The system includes a comparison unit configured to compare one or more pairs of checksums, each pair comprising a respective checksum generated based on a same respective set of protected tiles and generated by different GPUs.
Abstract: Adder circuits and associated methods for processing a set of at least three floating-point numbers to be added together include identifying, from among the at least three numbers, at least two numbers that have the same sign—that is, at least two numbers that are both positive or both negative. The identified at least two numbers are added together using one or more same-sign floating-point adders. A same-sign floating-point adder comprises circuitry configured to add together floating-point numbers having the same sign and does not include circuitry configured to add together numbers having different signs.
Type:
Grant
Filed:
January 31, 2022
Date of Patent:
December 19, 2023
Assignee:
Imagination Technologies Limited
Inventors:
Sam Elliott, Jonas Olof Gunnar Kallen, Casper Van Benthem
Abstract: The operation of a GPU is controlled based on one or more deadlines by which one or more GPU tasks must be completed and estimates of the time required to complete the execution of a first GPU task (which is currently being executed) and the time required to execute one or more other GPU tasks (which are not currently being executed). Based on a comparison between the deadline(s) and the estimates, context switching may or may not be triggered.
Abstract: Livelock recovery circuits configured to detect livelock in a processor, and cause the processor to transition to a known safe state when livelock is detected. The livelock recovery circuits include detection logic configured to detect that the processor is in livelock when the processor has illegally repeated an instruction; and transition logic configured to cause the processor to transition to a safe state when livelock has been detected by the detection logic.
Abstract: A motion estimation technique finds first and second candidate bi-directional motion vectors for a first region of an interpolated frame of video content by performing double ended vector motion estimation on the first region. One of these candidate bi-directional motion vectors is selected, and used to identify a remote region of the interpolated frame. This remote region is located at an off-set location from the first region, and is found based on an endpoint of the selected candidate bi-directional motion vector. A remote motion vector for the remote region of the interpolated frame is obtained, and one or more properties of this remote motion vector are used to bias a selection between the first and second candidate vectors.
Abstract: A graphics processing system is configured to render primitives using a rendering space that is sub-divided into sections, wherein the graphics processing system includes assessment logic configured to make an assessment regarding the presence of primitive edges in a section, and determination logic configured to determine an anti-aliasing setting for the section based on the assessment.
Abstract: A system and method for performing intersection testing of rays in a ray tracing system. The ray tracing system uses a hierarchical acceleration structure comprising a plurality of nodes, each identifying one or more elements able to be intersected by a ray. The system makes use of a serial-mode ray intersection process, in which, when a ray intersects a bounding volume, a limited number of new ray requests are generated.