Patents by Inventor Michael C Shebanow
Michael C Shebanow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10365930Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.Type: GrantFiled: May 1, 2017Date of Patent: July 30, 2019Assignee: NVIDIA CORPORATIONInventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
-
Patent number: 9922457Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.Type: GrantFiled: December 2, 2013Date of Patent: March 20, 2018Assignee: NVIDIA CORPORATIONInventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
-
Patent number: 9830161Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.Type: GrantFiled: January 21, 2014Date of Patent: November 28, 2017Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Michael C. Shebanow
-
Publication number: 20170235581Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.Type: ApplicationFiled: May 1, 2017Publication date: August 17, 2017Inventors: John R. NICKOLLS, Brett W. Coon, Michael C. Shebanow
-
Patent number: 9639479Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.Type: GrantFiled: September 22, 2010Date of Patent: May 2, 2017Assignee: NVIDIA CorporationInventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
-
Patent number: 9536341Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.Type: GrantFiled: October 19, 2009Date of Patent: January 3, 2017Assignee: NVIDIA CorporationInventors: Johnny S. Rhoades, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
-
Patent number: 9223578Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.Type: GrantFiled: September 21, 2010Date of Patent: December 29, 2015Assignee: NVIDIA CorporationInventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
-
Publication number: 20150205606Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.Type: ApplicationFiled: January 21, 2014Publication date: July 23, 2015Applicant: NVIDIA CORPORATIONInventors: John Erik LINDHOLM, Michael C. SHEBANOW
-
Publication number: 20150187043Abstract: A method for storage allocation for a graphical processing unit includes maintaining a unified storage structure for the graphical processing unit. Multiple physical storage structures are virtualized in the unified storage structure by dynamically forming multiple logical storage structures from the unified storage structure for the multiple physical storage structures.Type: ApplicationFiled: December 27, 2013Publication date: July 2, 2015Applicant: SAMSUNG ELECTRONICS COMPANY, LTD.Inventors: Michael C. Shebanow, MAGNUS EKMAN
-
Patent number: 8947444Abstract: A data structure that includes pointers to vertex attributes and primitive descriptions is generated and then processed within a general processing cluster. The general processing cluster includes a vertex attribute fetch unit that fetches from memory vertex attributes corresponding to the vertices defined by the primitive descriptions.Type: GrantFiled: December 9, 2008Date of Patent: February 3, 2015Assignee: NVIDIA CorporationInventors: Ziyad S. Hakura, Emmett M. Kilgariff, Michael C. Shebanow, James C. Bowman, Philip Browning Johnson, Johnny S. Rhoades, Rohit Gupta
-
Patent number: 8860742Abstract: A technique for caching coverage information for edges that are shared between adjacent graphics primitives may reduce the number of times a shared edge is rasterized. Consequently, power consumed during rasterization may be reduced. During rasterization of a first graphics primitive coverage information is generated that (1) indicates cells within a sampling grid that are entirely outside an edge of the first graphics primitive and (2) indicates cells within the sampling grid that are intersected by the edge and are only partially covered by the first graphics primitive. The coverage information for the edge is stored in a cache. When a second graphics primitive is rasterized that shares the edge with the first graphics primitive, the coverage information is read from the cache instead of being recomputed.Type: GrantFiled: May 2, 2012Date of Patent: October 14, 2014Assignee: NVIDIA CorporationInventors: Michael C. Shebanow, Anjul Patney
-
Patent number: 8817031Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.Type: GrantFiled: September 29, 2010Date of Patent: August 26, 2014Assignee: NVIDIA CorporationInventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
-
Publication number: 20140160126Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.Type: ApplicationFiled: December 2, 2013Publication date: June 12, 2014Applicant: NVIDIA CORPORATIONInventors: Justin S. LEGAKIS, Emmett M. KILGARIFF, Michael C. SHEBANOW
-
Patent number: 8704836Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.Type: GrantFiled: October 19, 2009Date of Patent: April 22, 2014Assignee: NVIDIA CorporationInventors: Johnny S. Rhoades, Steven E. Molnar, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
-
Patent number: 8700877Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with a first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.Type: GrantFiled: September 24, 2010Date of Patent: April 15, 2014Assignee: Nvidia CorporationInventors: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
-
Patent number: 8599202Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.Type: GrantFiled: September 29, 2008Date of Patent: December 3, 2013Assignee: Nvidia CorporationInventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
-
Patent number: 8533435Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.Type: GrantFiled: September 3, 2010Date of Patent: September 10, 2013Assignee: NVIDIA CorporationInventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
-
Patent number: 8522000Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.Type: GrantFiled: September 29, 2009Date of Patent: August 27, 2013Assignee: Nvidia CorporationInventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
-
Patent number: 8458440Abstract: One embodiment of the present invention sets forth a technique for computing virtual addresses for accessing thread data. Components of the complete virtual address for a thread group are used to determine whether or not a cache line corresponding to the complete virtual address is not allocated in the cache. Actual computation of the complete virtual address is deferred until after determining that a cache line corresponding to the complete virtual address is not allocated in the cache.Type: GrantFiled: August 17, 2010Date of Patent: June 4, 2013Assignee: NVIDIA CorporationInventor: Michael C. Shebanow
-
Publication number: 20120280992Abstract: The grid walk sampling technique is an efficient sampling algorithm aimed at optimizing the cost of triangle rasterization for modern graphics workloads. Grid walk sampling is an iterative rasterization algorithm that intelligently tests the intersection of triangle edges with multi-cell grids, determining coverage for a grid cell while identifying other cells in the grid that are either fully covered or fully uncovered by the triangle. Grid walk sampling rasterizes triangles using fewer computations and simpler computations compared with conventional highly parallel rasterizers. Therefore, a rasterizer employing grid walk sampling may compute sample coverage of triangles more efficiently in terms of power and circuitry die area compared with conventional highly parallel rasterizers.Type: ApplicationFiled: May 1, 2012Publication date: November 8, 2012Inventors: Michael C. Shebanow, Anjul Patney