Patents by Inventor Michael C Shebanow

Michael C Shebanow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10365930
    Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
    Type: Grant
    Filed: May 1, 2017
    Date of Patent: July 30, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
  • Patent number: 9922457
    Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.
    Type: Grant
    Filed: December 2, 2013
    Date of Patent: March 20, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
  • Patent number: 9830161
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: November 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Michael C. Shebanow
  • Publication number: 20170235581
    Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
    Type: Application
    Filed: May 1, 2017
    Publication date: August 17, 2017
    Inventors: John R. NICKOLLS, Brett W. Coon, Michael C. Shebanow
  • Patent number: 9639479
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
    Type: Grant
    Filed: September 22, 2010
    Date of Patent: May 2, 2017
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
  • Patent number: 9536341
    Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
    Type: Grant
    Filed: October 19, 2009
    Date of Patent: January 3, 2017
    Assignee: NVIDIA Corporation
    Inventors: Johnny S. Rhoades, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
  • Patent number: 9223578
    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
    Type: Grant
    Filed: September 21, 2010
    Date of Patent: December 29, 2015
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
  • Publication number: 20150205606
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Application
    Filed: January 21, 2014
    Publication date: July 23, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: John Erik LINDHOLM, Michael C. SHEBANOW
  • Publication number: 20150187043
    Abstract: A method for storage allocation for a graphical processing unit includes maintaining a unified storage structure for the graphical processing unit. Multiple physical storage structures are virtualized in the unified storage structure by dynamically forming multiple logical storage structures from the unified storage structure for the multiple physical storage structures.
    Type: Application
    Filed: December 27, 2013
    Publication date: July 2, 2015
    Applicant: SAMSUNG ELECTRONICS COMPANY, LTD.
    Inventors: Michael C. Shebanow, MAGNUS EKMAN
  • Patent number: 8947444
    Abstract: A data structure that includes pointers to vertex attributes and primitive descriptions is generated and then processed within a general processing cluster. The general processing cluster includes a vertex attribute fetch unit that fetches from memory vertex attributes corresponding to the vertices defined by the primitive descriptions.
    Type: Grant
    Filed: December 9, 2008
    Date of Patent: February 3, 2015
    Assignee: NVIDIA Corporation
    Inventors: Ziyad S. Hakura, Emmett M. Kilgariff, Michael C. Shebanow, James C. Bowman, Philip Browning Johnson, Johnny S. Rhoades, Rohit Gupta
  • Patent number: 8860742
    Abstract: A technique for caching coverage information for edges that are shared between adjacent graphics primitives may reduce the number of times a shared edge is rasterized. Consequently, power consumed during rasterization may be reduced. During rasterization of a first graphics primitive coverage information is generated that (1) indicates cells within a sampling grid that are entirely outside an edge of the first graphics primitive and (2) indicates cells within the sampling grid that are intersected by the edge and are only partially covered by the first graphics primitive. The coverage information for the edge is stored in a cache. When a second graphics primitive is rasterized that shares the edge with the first graphics primitive, the coverage information is read from the cache instead of being recomputed.
    Type: Grant
    Filed: May 2, 2012
    Date of Patent: October 14, 2014
    Assignee: NVIDIA Corporation
    Inventors: Michael C. Shebanow, Anjul Patney
  • Patent number: 8817031
    Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.
    Type: Grant
    Filed: September 29, 2010
    Date of Patent: August 26, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
  • Publication number: 20140160126
    Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.
    Type: Application
    Filed: December 2, 2013
    Publication date: June 12, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Justin S. LEGAKIS, Emmett M. KILGARIFF, Michael C. SHEBANOW
  • Patent number: 8704836
    Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
    Type: Grant
    Filed: October 19, 2009
    Date of Patent: April 22, 2014
    Assignee: NVIDIA Corporation
    Inventors: Johnny S. Rhoades, Steven E. Molnar, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
  • Patent number: 8700877
    Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with a first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
    Type: Grant
    Filed: September 24, 2010
    Date of Patent: April 15, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
  • Patent number: 8599202
    Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: December 3, 2013
    Assignee: Nvidia Corporation
    Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
  • Patent number: 8533435
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: September 10, 2013
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Patent number: 8522000
    Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
    Type: Grant
    Filed: September 29, 2009
    Date of Patent: August 27, 2013
    Assignee: Nvidia Corporation
    Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
  • Patent number: 8458440
    Abstract: One embodiment of the present invention sets forth a technique for computing virtual addresses for accessing thread data. Components of the complete virtual address for a thread group are used to determine whether or not a cache line corresponding to the complete virtual address is not allocated in the cache. Actual computation of the complete virtual address is deferred until after determining that a cache line corresponding to the complete virtual address is not allocated in the cache.
    Type: Grant
    Filed: August 17, 2010
    Date of Patent: June 4, 2013
    Assignee: NVIDIA Corporation
    Inventor: Michael C. Shebanow
  • Publication number: 20120280992
    Abstract: The grid walk sampling technique is an efficient sampling algorithm aimed at optimizing the cost of triangle rasterization for modern graphics workloads. Grid walk sampling is an iterative rasterization algorithm that intelligently tests the intersection of triangle edges with multi-cell grids, determining coverage for a grid cell while identifying other cells in the grid that are either fully covered or fully uncovered by the triangle. Grid walk sampling rasterizes triangles using fewer computations and simpler computations compared with conventional highly parallel rasterizers. Therefore, a rasterizer employing grid walk sampling may compute sample coverage of triangles more efficiently in terms of power and circuitry die area compared with conventional highly parallel rasterizers.
    Type: Application
    Filed: May 1, 2012
    Publication date: November 8, 2012
    Inventors: Michael C. Shebanow, Anjul Patney