Patents by Inventor Sameer M. Gauria

Sameer M. Gauria has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10846364
    Abstract: An apparatus includes a memory and a circuit coupled to the memory. The memory may be configured as a local buffer to store all or a portion of a first array of values and all or a portion of a second array of values. The circuit may be configured to (i) calculate an intermediate array of values by multiplying a converted version of the first array by a converted version of the second array and (ii) calculate an output array comprising a plurality of output values based on values of the intermediate array and a predefined dimensional reduction.
    Type: Grant
    Filed: July 30, 2019
    Date of Patent: November 24, 2020
    Assignee: Ambarella International LP
    Inventors: Sameer M. Gauria, Peter Verplaetse
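    Example sketch: a minimal Python illustration of the multiply-then-reduce flow in the abstract above. Treating the "conversion" as an integer cast and the "predefined dimensional reduction" as a sum over the inner dimension are editor assumptions, not details taken from the patent.

        # Hypothetical sketch: element-wise products form the intermediate
        # array; summing the inner dimension yields the output array.
        def multiply_and_reduce(array_a, array_b):
            converted_a = [[int(x) for x in row] for row in array_a]   # "converted" version (assumed cast)
            converted_b = [[int(x) for x in row] for row in array_b]
            intermediate = [[x * y for x, y in zip(ra, rb)]
                            for ra, rb in zip(converted_a, converted_b)]
            return [sum(row) for row in intermediate]                  # dimensional reduction (assumed sum)

        print(multiply_and_reduce([[1.0, 2.0], [3.0, 4.0]],
                                  [[5.0, 6.0], [7.0, 8.0]]))           # -> [17, 53]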
  • Patent number: 10620876
    Abstract: An apparatus includes a memory, a first buffer, a second buffer, and a processing circuit. The memory may be configured to store data. The first buffer may be configured to store a plurality of kernel values fetched from the memory and present a first signal communicating the kernel values as stored. The second buffer may be configured to store a plurality of input tiles fetched from the memory and present a second signal communicating the input tiles as stored. The processing circuit may be configured to (i) receive the first signal and the second signal, (ii) calculate a plurality of intermediate values in parallel by multiplying the input tiles with a corresponding one of the kernel values, and (iii) calculate an output tile comprising a plurality of output values based on the intermediate values. The kernel values are generally fetched from the memory to the first buffer slower than the input tiles are fetched from the memory to the second buffer.
    Type: Grant
    Filed: April 22, 2019
    Date of Patent: April 14, 2020
    Assignee: Ambarella International LP
    Inventors: Sameer M. Gauria, Peter Verplaetse
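    Example sketch: a Python model of the two-buffer dataflow, assuming each input tile is scaled by its corresponding kernel value and the intermediate products are accumulated into the output tile; the accumulation step and the tile sizes are illustrative. The slower kernel fetch corresponds to each kernel value staying resident while many tiles stream past it.

        def compute_output_tile(kernel_values, input_tiles):
            # First buffer: kernel values (fetched rarely, held resident).
            # Second buffer: input tiles (streamed quickly from memory).
            output_tile = [0] * len(input_tiles[0])
            for k, tile in zip(kernel_values, input_tiles):            # corresponding pairs
                intermediate = [k * v for v in tile]                   # multiplies done in parallel in HW
                output_tile = [o + i for o, i in zip(output_tile, intermediate)]
            return output_tile

        print(compute_output_tile([2, 3], [[1, 1, 1], [1, 2, 3]]))     # -> [5, 8, 11]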
  • Patent number: 10466926
    Abstract: An apparatus comprising a plurality of image sensors configured to capture an image and a processor. The processor may comprise a buffer. The processor may be configured to (i) receive data from the image in a sequential order, (ii) perform cost calculations on the data, (iii) store the data in the buffer in a direction, (iv) when data corresponding to an end of a line of pixels of the image is stored, perform a second cost calculation on the stored data corresponding to the line and (v) reverse the direction of storing the data in the buffer. An order for the second cost calculations on the line of the data may be last in, first out. The data may be stored while the second cost calculations are performed. Data may not be removed from the buffer until the second cost calculation has been performed on the data.
    Type: Grant
    Filed: June 6, 2017
    Date of Patent: November 5, 2019
    Assignee: Ambarella, Inc.
    Inventors: Sri Sailaja Vemu, Sameer M. Gauria
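    Example sketch: a Python line buffer that is filled in one direction and, once the end of a line of pixels is reached, is drained last in, first out for a second cost pass before the storage direction reverses. The two cost functions are placeholders; the actual calculations are not described in the abstract.

        class LineCostBuffer:
            def __init__(self, line_width):
                self.line_width = line_width
                self.buffer = []
                self.forward = True                    # storage direction, toggled per line

            def first_cost(self, pixel):
                return pixel * pixel                   # placeholder streaming cost

            def second_cost(self, stored_cost):
                return stored_cost + 1                 # placeholder reverse-pass cost

            def push(self, pixel):
                cost = self.first_cost(pixel)
                if self.forward:
                    self.buffer.append(cost)
                else:
                    self.buffer.insert(0, cost)
                if len(self.buffer) < self.line_width:
                    return None                        # keep storing until end of line
                order = reversed(self.buffer) if self.forward else self.buffer
                results = [self.second_cost(c) for c in order]   # LIFO second pass
                self.buffer = []
                self.forward = not self.forward        # reverse the storage direction
                return results

        buf = LineCostBuffer(3)
        for p in (1, 2, 3):
            line_results = buf.push(p)
        print(line_results)                            # -> [10, 5, 2]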
  • Patent number: 10409887
    Abstract: An apparatus includes a memory and a circuit. The memory may be configured to store data. The circuit generally includes a local buffer. The circuit may be configured to (i) fetch all or a portion of a first array of values from the memory to the local buffer, (ii) fetch all or a portion of a second array of values from the memory to the local buffer, (iii) calculate an intermediate array of values by multiplying a converted version of the first array by a converted version of the second array, and (iv) calculate an output array comprising a plurality of output values based on values of the intermediate array and a predefined dimensional reduction.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: September 10, 2019
    Assignee: Ambarella, Inc.
    Inventors: Sameer M. Gauria, Peter Verplaetse
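    Example sketch: the same multiply-and-reduce idea as in patent 10846364 above, with the "fetch all or a portion ... to the local buffer" step modeled by streaming one row of each array at a time; the row-at-a-time granularity and the sum reduction are editor assumptions.

        def reduce_multiply_streaming(array_a, array_b):
            outputs = []
            for row_a, row_b in zip(array_a, array_b):     # fetch one portion of each array
                local_buffer = (list(row_a), list(row_b))  # portions held in the local buffer
                intermediate = [x * y for x, y in zip(*local_buffer)]
                outputs.append(sum(intermediate))          # predefined reduction (assumed sum)
            return outputs

        print(reduce_multiply_streaming([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # -> [17, 53]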
  • Patent number: 10310768
    Abstract: An apparatus includes a memory and a circuit. The memory may be configured to store data. The circuit generally has a buffer and may be configured to (i) fetch a kernel from the memory, where the kernel may have a plurality of kernel values, (ii) fetch a block from the memory to the buffer, where the block may have a plurality of input tiles and each of the input tiles may have a plurality of input values in multiple dimensions, (iii) calculate a plurality of intermediate values in parallel by multiplying the input tiles read from the buffer with a corresponding one of the kernel values and (iv) calculate an output tile that may have a plurality of output values based on the intermediate values.
    Type: Grant
    Filed: January 11, 2017
    Date of Patent: June 4, 2019
    Assignee: Ambarella, Inc.
    Inventors: Sameer M. Gauria, Peter Verplaetse
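    Example sketch: a Python version of the block/kernel pairing with two-dimensional tiles; scaling each input tile by its corresponding kernel value and accumulating the products into one output tile is an editor's interpretation of "based on the intermediate values".

        def output_tile_from_block(kernel_values, block):
            # block: one 2-D input tile per kernel value (shapes are illustrative)
            rows, cols = len(block[0]), len(block[0][0])
            output = [[0] * cols for _ in range(rows)]
            for k, tile in zip(kernel_values, block):      # corresponding kernel value per tile
                for r in range(rows):
                    for c in range(cols):
                        output[r][c] += k * tile[r][c]     # multiplies done in parallel in HW
            return output

        print(output_tile_from_block([1, 2], [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))
        # -> [[11, 14], [17, 20]]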
  • Patent number: 8237725
    Abstract: A vertex cache within a graphics processor is configured to operate as a conventional round-robin streaming cache when per-vertex state changes are not used and is configured to operate as a random access storage buffer when per-vertex state changes are used. Batches of vertices that define primitives and state changes are output to parallel processing units for processing according to a vertex shader program. In addition to allowing per-vertex state changes, the vertex cache is configured to store vertices for primitive topologies that use anchor points, such as triangle strips, line loops, and polygons.
    Type: Grant
    Filed: November 5, 2007
    Date of Patent: August 7, 2012
    Assignee: NVIDIA Corporation
    Inventors: James C. Bowman, Dane T. Mrazek, Sameer M. Gauria
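    Example sketch: a toy cache with the two behaviors named in the abstract, one round-robin streaming store and one random-access store; the slot-selection policy and the way anchor points are pinned are editor assumptions.

        class VertexCache:
            def __init__(self, slots):
                self.entries = [None] * slots
                self.next_slot = 0                          # round-robin pointer

            def store_streaming(self, vertex):
                # Round-robin mode: used when there are no per-vertex state changes.
                slot = self.next_slot
                self.entries[slot] = vertex
                self.next_slot = (slot + 1) % len(self.entries)
                return slot

            def store_random(self, slot, vertex):
                # Random-access mode: the caller addresses a slot directly, e.g. to
                # keep an anchor point of a strip, loop, or polygon resident.
                self.entries[slot] = vertex
                return slot

        cache = VertexCache(4)
        cache.store_streaming("v0")
        cache.store_streaming("v1")
        cache.store_random(0, "anchor")
        print(cache.entries)                                # -> ['anchor', 'v1', None, None]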
  • Patent number: 7796137
    Abstract: Disclosed are an apparatus, a system, a method, a graphics processing unit (“GPU”), a computer device, and a computer medium to implement a pool of independent enhanced tags to, among other things, decouple a dependency between tags and cachelines. In one embodiment, an enhanced tag-based cache structure includes a tag repository configured to maintain a pool of enhanced tags. Each enhanced tag can have a match portion configured to form an association between the enhanced tag and an incoming address. Also, an enhanced tag can have a data locator portion configured to locate a cacheline in the cache in response to the formation of the association. The data locator portion enables the enhanced tag to locate multiple cachelines. Advantageously, the enhanced tag-based cache structure can be formed to adjust the degree of reusability of the enhanced tags independent from the degree of latency tolerance for the cacheline repository.
    Type: Grant
    Filed: October 24, 2006
    Date of Patent: September 14, 2010
    Assignee: NVIDIA Corporation
    Inventors: Dane T. Mrazek, Sameer M. Gauria, James C. Bowman
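    Example sketch: a minimal structure in which each enhanced tag holds a match portion (the address it matches) and a data locator portion that can point at several cachelines, so the tag count is decoupled from the cacheline count; the field names and the fill/lookup interface are illustrative.

        class EnhancedTag:
            def __init__(self, match_address):
                self.match = match_address         # match portion
                self.cachelines = []               # data locator portion: multiple line indices

        class EnhancedTagCache:
            def __init__(self):
                self.tag_pool = {}                 # tag repository (pool of enhanced tags)
                self.lines = []                    # cacheline repository

            def fill(self, address, data_blocks):
                tag = self.tag_pool.setdefault(address, EnhancedTag(address))
                for block in data_blocks:
                    tag.cachelines.append(len(self.lines))   # locator records each new line
                    self.lines.append(block)

            def lookup(self, address):
                tag = self.tag_pool.get(address)             # association via the match portion
                if tag is None:
                    return None
                return [self.lines[i] for i in tag.cachelines]

        cache = EnhancedTagCache()
        cache.fill(0x40, ["lineA", "lineB"])                 # one tag locating two cachelines
        print(cache.lookup(0x40))                            # -> ['lineA', 'lineB']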
  • Patent number: 7797258
    Abstract: A graphics system includes a transposer. A read scheduler utilizes a minimum cost analysis to schedule a read transfer order for the transposer to minimize the total number of passes required to process a set of input vectors.
    Type: Grant
    Filed: November 2, 2006
    Date of Patent: September 14, 2010
    Assignee: NVIDIA Corporation
    Inventors: James C. Bowman, Dane T. Mrazek, Sameer M. Gauria
  • Patent number: 7755631
    Abstract: Disclosed are an apparatus, a method, a programmable graphics processing unit (“GPU”), a computer device, and a computer medium to facilitate, among other things, the generation of parallel data streams to effect parallel processing in at least a portion of a graphics pipeline of a GPU. In one embodiment, an input of the apparatus receives graphics elements in a data stream of graphics elements. The graphics pipeline can use the graphics elements to form computer-generated images. The apparatus also can include a transposer configured to produce parallel attribute streams. Each of the parallel attribute streams includes a type of attribute common to the graphics elements. In one embodiment, the transposer can be configured to convert at least a portion of the graphics pipeline from a single data stream to multiple data streams (e.g., executable by multiple threads of execution) while reducing the memory size requirements to implement such a conversion.
    Type: Grant
    Filed: October 24, 2006
    Date of Patent: July 13, 2010
    Assignee: NVIDIA Corporation
    Inventors: Dane T. Mrazek, Sameer M. Gauria, James C. Bowman
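    Example sketch: the transposer modeled as an array-of-structures to structure-of-arrays conversion, turning one interleaved stream of graphics elements into one parallel stream per attribute type; the attribute names are illustrative.

        def transpose_attribute_stream(elements):
            # elements: one dict of attributes per graphics element (single data stream)
            streams = {}
            for element in elements:
                for attribute, value in element.items():
                    streams.setdefault(attribute, []).append(value)
            return streams                                   # parallel per-attribute streams

        vertices = [{"position": (0, 0), "color": "red"},
                    {"position": (1, 0), "color": "green"}]
        print(transpose_attribute_stream(vertices))
        # -> {'position': [(0, 0), (1, 0)], 'color': ['red', 'green']}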
  • Patent number: 7701459
    Abstract: A graphics system has parallel processing units that do not share vertex information. The graphics system constructs independent batches of work for the parallel processing units in which each batch of work has a list of vertices for a set of primitives.
    Type: Grant
    Filed: November 3, 2006
    Date of Patent: April 20, 2010
    Assignee: NVIDIA Corporation
    Inventors: Dane T. Mrazek, James C. Bowman, Sameer M. Gauria
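    Example sketch: each batch carries its own copy of the vertices its primitives use, with indices remapped into that local list, so processing units need not share vertex data; the triangle-list input format and the remapping scheme are editor assumptions.

        def build_batches(vertices, triangles, triangles_per_batch):
            batches = []
            for start in range(0, len(triangles), triangles_per_batch):
                local_vertices, remap, local_tris = [], {}, []
                for tri in triangles[start:start + triangles_per_batch]:
                    local_indices = []
                    for global_index in tri:
                        if global_index not in remap:        # copy the vertex into this batch
                            remap[global_index] = len(local_vertices)
                            local_vertices.append(vertices[global_index])
                        local_indices.append(remap[global_index])
                    local_tris.append(tuple(local_indices))
                batches.append({"vertices": local_vertices, "primitives": local_tris})
            return batches

        for batch in build_batches(["v0", "v1", "v2", "v3"], [(0, 1, 2), (0, 2, 3)], 1):
            print(batch)
        # -> {'vertices': ['v0', 'v1', 'v2'], 'primitives': [(0, 1, 2)]}
        # -> {'vertices': ['v0', 'v2', 'v3'], 'primitives': [(0, 1, 2)]}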
  • Patent number: 7647467
    Abstract: On-the-fly tuning of parameters used in an interface between a memory (e.g., high-speed memory such as DRAM) and a processor requesting access to the memory. In an operational mode, a memory controller couples the processor to the memory. The memory controller can also inhibit the operational mode to initiate a training mode. In the training mode, the memory controller tunes one or more parameters (voltage references, timing skews, etc.) used in an upcoming operational mode. The access to the memory may be from an isochronous process running on a graphics processor. The memory controller determines whether the isochronous process may be inhibited before entering the training mode. If memory buffers for the isochronous process are such that the training mode will not impact the isochronous process, then the memory controller can enter the training mode to tune the interface parameters without negatively impacting the process.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: January 12, 2010
    Assignee: NVIDIA Corporation
    Inventors: Brian D. Hutsell, Sameer M. Gauria, Philip R. Manela, John A. Robinson
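    Example sketch: the training-mode gate modeled as a headroom check on the isochronous client's buffer, followed by a placeholder parameter sweep; the buffer depth, drain rate, training duration, and parameter names are illustrative values, not taken from the patent.

        def can_enter_training(buffered_bytes, drain_rate_bytes_per_us, training_time_us):
            # True if the isochronous process will not underrun while the
            # interface is unavailable during training.
            return buffered_bytes >= drain_rate_bytes_per_us * training_time_us

        def maybe_train(params, buffered_bytes, drain_rate=16, training_time=50):
            if not can_enter_training(buffered_bytes, drain_rate, training_time):
                return params                                    # stay in the operational mode
            tuned = dict(params)
            tuned["vref_mv"] = params["vref_mv"] + 5             # placeholder reference-voltage step
            tuned["read_skew_ps"] = params["read_skew_ps"] - 10  # placeholder timing-skew step
            return tuned

        params = {"vref_mv": 750, "read_skew_ps": 120}
        print(maybe_train(params, buffered_bytes=2048))          # enough headroom: parameters retuned
        print(maybe_train(params, buffered_bytes=256))           # not enough: parameters unchanged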
  • Patent number: 7571281
    Abstract: In one embodiment, an apparatus includes an input port to receive a request to determine whether data units are stored in the cache, as well as an output port to generate look-ups for the pool of tags. The apparatus also includes a look-up filter, coupled to the input and output ports, that operates to filter out superfluous look-ups for the data units, thereby forming filtered look-ups. Advantageously, the look-up filter can filter out superfluous look-ups to at least reduce the quantity of look-up operations associated with the request, thereby reducing stalling associated with multiple look-up operations. In a specific embodiment, the look-up filter can include a data unit grouping detector and a look-up suppressor.
    Type: Grant
    Filed: June 2, 2006
    Date of Patent: August 4, 2009
    Assignee: NVIDIA Corporation
    Inventor: Sameer M. Gauria
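    Example sketch: the look-up filter modeled as a grouping detector plus suppressor that issues at most one tag look-up per cacheline-sized address group within a request; the 64-byte group size is an assumption.

        GROUP_SIZE = 64   # bytes per group, i.e. per cacheline (assumed)

        def filter_lookups(data_unit_addresses):
            # Returns the filtered look-ups for one request.
            seen_groups = set()
            lookups = []
            for address in data_unit_addresses:
                group = address // GROUP_SIZE            # data unit grouping detector
                if group in seen_groups:
                    continue                             # look-up suppressor: superfluous
                seen_groups.add(group)
                lookups.append(group * GROUP_SIZE)       # one look-up per group
            return lookups

        print(filter_lookups([0x100, 0x108, 0x110, 0x148]))    # -> [256, 320]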