Patents by Inventor Michael C Shebanow

Michael C Shebanow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Instructions for managing a parallel cache hierarchy

Patent number: 10365930

Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Type: Grant

Filed: May 1, 2017

Date of Patent: July 30, 2019

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
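The Python sketch below is a toy model, offered for illustration only, of a two-level cache hierarchy in which each load or store carries a cache operations modifier that selects the levels it may allocate in. The modifier names (ALL, GLOBAL, NONE), the write-through store behavior, and the invalidation of levels that a store bypasses are assumptions of the sketch. They are not details taken from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical cache operations modifiers and the levels each one allocates in.
POLICIES = {
    "ALL": ("L1", "L2"),   # cache at every level
    "GLOBAL": ("L2",),     # bypass L1, cache only in the shared L2
    "NONE": (),            # do not allocate at any level
}


@dataclass
class Instruction:
    op: str                # "LD" or "ST"
    address: int
    modifier: str = "ALL"  # cache operations modifier carried by the instruction
    value: int = 0         # data for stores


@dataclass
class CacheHierarchy:
    memory: dict = field(default_factory=dict)
    levels: dict = field(default_factory=lambda: {"L1": {}, "L2": {}})

    def execute(self, inst):
        if inst.op not in ("LD", "ST"):
            raise ValueError("only loads and stores carry a cache modifier")
        targets = POLICIES[inst.modifier]
        if inst.op == "ST":
            self.memory[inst.address] = inst.value   # write through to memory
            data = inst.value
            # Drop stale copies at levels the policy does not cache in.
            for name, lines in self.levels.items():
                if name not in targets:
                    lines.pop(inst.address, None)
        else:
            data = self._lookup(inst.address)
        for name in targets:                          # allocate per the policy
            self.levels[name][inst.address] = data
        return data

    def _lookup(self, address):
        for name in ("L1", "L2"):
            if address in self.levels[name]:
                return self.levels[name][address]
        return self.memory.get(address, 0)
```

Because the policy travels with each instruction, data that will not be reused can be kept out of L1 (with NONE or GLOBAL) without changing how other loads are cached.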

Computing tessellation coordinates using dedicated hardware

Patent number: 9922457

Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.

Type: Grant

Filed: December 2, 2013

Date of Patent: March 20, 2018

Assignee: NVIDIA Corporation

Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow
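The symmetric (u,v) computation can be illustrated with a small Python sketch for one parameter direction. The v coordinate would be computed the same way from its own level of detail. The sketch makes several assumptions: 16 fractional bits of fixed-point precision, and two spacing modes named after the common integer and fractional-even tessellation spacings. The fractional-even layout used here, with full 1/LOD steps from each edge and two shorter segments meeting at the center, is also an assumption and is not taken from the patent.

```python
import math

FRAC_BITS = 16            # assumed fixed-point precision of the unit
ONE = 1 << FRAC_BITS      # 1.0 in fixed point


def segments(lod, mode):
    """Number of segments along an edge for a given level of detail."""
    if mode == "integer":
        return max(1, math.ceil(lod))
    if mode == "fractional_even":
        return max(2, 2 * math.ceil(lod / 2))
    raise ValueError(f"unknown spacing mode {mode!r}")


def _near_half(i, lod, n, mode):
    """Fixed-point u for a vertex in the first half of the edge (2*i <= n)."""
    if mode == "integer":
        return round(i * ONE / n)
    if 2 * i == n:                     # the two short segments meet at the center
        return ONE // 2
    return round(i * ONE / lod)        # full-size 1/lod steps from the edge


def tess_coord(i, lod, mode="integer"):
    """u coordinate of integer vertex position i along the edge."""
    n = segments(lod, mode)
    if not 0 <= i <= n:
        raise ValueError("vertex index out of range")
    # Symmetric evaluation: always compute from the nearer edge and mirror,
    # so u(i) + u(n - i) == ONE exactly despite the limited precision.
    if 2 * i <= n:
        return _near_half(i, lod, n, mode)
    return ONE - _near_half(n - i, lod, n, mode)
```

Mirroring guarantees that two patches traversing a shared edge in opposite directions compute matching positions. This property is the usual reason a tessellator wants the computation to be symmetric.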

Tree-based thread management

Patent number: 9830161

Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.

Type: Grant

Filed: January 21, 2014

Date of Patent: November 28, 2017

Assignee: NVIDIA Corporation

Inventors: John Erik Lindholm, Michael C. Shebanow
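Here is a minimal Python sketch of the node tree described in the abstract. It assumes a two-way branch and integer thread IDs. The names Node, diverge, runnable and finish are hypothetical, and the sketch leaves out how the SM actually schedules the runnable nodes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)             # nodes compare by identity
class Node:
    active: frozenset            # IDs of the threads this node executes
    pc: int                      # program counter for those threads
    parent: "Node" = None
    children: list = field(default_factory=list)


def diverge(node, predicate, taken_pc, not_taken_pc, reconverge_pc):
    """Split a node's threads into child nodes at a divergent branch."""
    taken = frozenset(t for t in node.active if predicate(t))
    not_taken = node.active - taken
    for threads, pc in ((taken, taken_pc), (not_taken, not_taken_pc)):
        if threads:              # every thread lands in at most one child
            node.children.append(Node(threads, pc, parent=node))
    node.pc = reconverge_pc      # the parent is suspended and resumes here
    return node.children


def runnable(node):
    """Leaf nodes: the paths the SM may execute concurrently."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in runnable(child)]


def finish(child):
    """A child reached the reconvergence point; return the parent if it resumes."""
    parent = child.parent
    parent.children.remove(child)
    return parent if not parent.children else None
```

The leaves of the tree are the nodes that may execute concurrently. A parent becomes runnable again only after its last child has reached the reconvergence point.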

Instructions for managing a parallel cache hierarchy

Publication number: 20170235581

Abstract: A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Type: Application

Filed: May 1, 2017

Publication date: August 17, 2017

Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow

Instructions for managing a parallel cache hierarchy

Patent number: 9639479

Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Type: Grant

Filed: September 22, 2010

Date of Patent: May 2, 2017

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow

Distributing primitives to multiple rasterizers

Patent number: 9536341

Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.

Type: Grant

Filed: October 19, 2009

Date of Patent: January 3, 2017

Assignee: NVIDIA Corporation

Inventors: Johnny S. Rhoades, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly
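The following toy Python sketch shows one way primitives from several geometry units might be routed to several rasterizers while preserving submission order. Several parts of it are assumptions, because the abstract does not specify the distribution scheme:

- the screen-tile ownership rule, with tiles interleaved across rasterizers;
- the 16-pixel tile size;
- merging the geometry streams by sequence number.

```python
import heapq
from collections import defaultdict

TILE = 16  # hypothetical screen tile size in pixels


def owner(tx, ty, num_rasterizers):
    """Assumed ownership rule: tiles are interleaved diagonally across rasterizers."""
    return (tx + ty) % num_rasterizers


def distribute(geometry_outputs, num_rasterizers):
    """geometry_outputs holds one list per geometry unit of (seq, bbox) pairs,
    each sorted by seq; bbox is (x0, y0, x1, y1) in inclusive pixels.
    Returns, per rasterizer, the sequence numbers it receives, in order."""
    queues = defaultdict(list)
    # Merge the independent geometry streams back into submission order.
    for seq, (x0, y0, x1, y1) in heapq.merge(*geometry_outputs):
        targets = {
            owner(tx, ty, num_rasterizers)
            for tx in range(x0 // TILE, x1 // TILE + 1)
            for ty in range(y0 // TILE, y1 // TILE + 1)
        }
        for r in sorted(targets):
            queues[r].append(seq)
    return dict(queues)
```

A primitive whose bounding box overlaps tiles owned by more than one rasterizer is sent to each of them. Every rasterizer still receives its primitives in the original order.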

Coalescing memory barrier operations across multiple parallel threads

Patent number: 9223578

Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.

Type: Grant

Filed: September 21, 2010

Date of Patent: December 29, 2015

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
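A small Python sketch can show barrier coalescing under a few stated assumptions. The three scopes follow the abstract: threads sharing an L1 cache, threads sharing global memory, and the whole system. The cycle latencies, by contrast, are invented placeholders. The sketch also assumes that coalescing works by sending one barrier at the widest requested scope, which then satisfies all narrower requests.

```python
from enum import IntEnum


class Scope(IntEnum):
    CTA = 1      # cooperating threads that share an L1 cache
    GLOBAL = 2   # threads that share global memory
    SYSTEM = 3   # all threads sharing all system memories


# Placeholder latencies in cycles; the abstract only says latency depends on type.
LATENCY = {Scope.CTA: 5, Scope.GLOBAL: 200, Scope.SYSTEM: 1000}


def coalesce(requests):
    """Combine barrier requests (thread_id, Scope) from one processing unit.

    Assumption: a barrier at a wider scope also orders memory for every
    narrower scope, so a single request at the widest scope seen suffices.
    Returns (scope, waiting thread ids, latency), or None if nothing is pending.
    """
    requests = list(requests)
    if not requests:
        return None
    widest = max(scope for _, scope in requests)
    waiters = sorted({tid for tid, _ in requests})
    return widest, waiters, LATENCY[widest]
```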

Tree-based thread management

Publication number: 20150205606

Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.

Type: Application

Filed: January 21, 2014

Publication date: July 23, 2015

Applicant: NVIDIA Corporation

Inventors: John Erik Lindholm, Michael C. Shebanow

Virtualizing storage structures with unified heap architecture

Publication number: 20150187043

Abstract: A method for storage allocation for a graphics processing unit includes maintaining a unified storage structure for the graphics processing unit. Multiple physical storage structures are virtualized in the unified storage structure by dynamically forming multiple logical storage structures from the unified storage structure for the multiple physical storage structures.

Type: Application

Filed: December 27, 2013

Publication date: July 2, 2015

Applicant: SAMSUNG ELECTRONICS COMPANY, LTD.

Inventors: Michael C. Shebanow, Magnus Ekman
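To make the idea of forming logical storage structures from one unified store concrete, here is a toy Python allocator. Its first-fit placement, the structure names, and the base-plus-offset translation are all assumptions made for the sketch. The abstract does not describe an allocation policy.

```python
class UnifiedHeap:
    """Toy model: one physical store backs several logical storage structures."""

    def __init__(self, size):
        self.store = [0] * size
        self.free = [(0, size)]      # free extents as (base, length), sorted by base
        self.logical = {}            # structure name -> (base, length)

    def create(self, name, length):
        """Form a logical structure from the first free extent that fits."""
        for i, (base, free_len) in enumerate(self.free):
            if free_len >= length:
                if free_len == length:
                    del self.free[i]
                else:
                    self.free[i] = (base + length, free_len - length)
                self.logical[name] = (base, length)
                return
        raise MemoryError(f"no room for {name}")

    def destroy(self, name):
        """Return a structure's space to the pool, merging adjacent extents."""
        self.free.append(self.logical.pop(name))
        self.free.sort()
        merged = []
        for base, length in self.free:
            if merged and merged[-1][0] + merged[-1][1] == base:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((base, length))
        self.free = merged

    def _addr(self, name, offset):
        base, length = self.logical[name]
        if not 0 <= offset < length:
            raise IndexError(f"{name}[{offset}] out of bounds")
        return base + offset

    def read(self, name, offset):
        return self.store[self._addr(name, offset)]

    def write(self, name, offset, value):
        self.store[self._addr(name, offset)] = value
```

Freed space returns to the shared pool. As a result, storage released by one logical structure (registers, say) can later back a different one (such as an attribute buffer).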

Distributed vertex attribute fetch

Patent number: 8947444

Abstract: A data structure that includes pointers to vertex attributes and primitive descriptions is generated and then processed within a general processing cluster. The general processing cluster includes a vertex attribute fetch unit that fetches from memory vertex attributes corresponding to the vertices defined by the primitive descriptions.

Type: Grant

Filed: December 9, 2008

Date of Patent: February 3, 2015

Assignee: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Emmett M. Kilgariff, Michael C. Shebanow, James C. Bowman, Philip Browning Johnson, Johnny S. Rhoades, Rohit Gupta
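Here is a toy Python model of the fetch step. A work packet holds a pointer to the attribute data and a list of primitives, and the fetch unit reads each referenced vertex's attributes from memory. Two parts of the sketch are assumptions for illustration: the Batch layout (a base index plus a stride) and the de-duplication of vertices shared by several primitives.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    """Hypothetical work packet sent to a processing cluster."""
    attribute_base: int      # pointer (here an index) to the first vertex's attributes
    stride: int              # attribute words per vertex
    primitives: list         # triangles as tuples of vertex indices


def fetch_attributes(memory, batch):
    """Fetch each referenced vertex once; primitives are rewritten to local slots."""
    slots, fetched = {}, []
    for prim in batch.primitives:
        for v in prim:
            if v not in slots:
                start = batch.attribute_base + v * batch.stride
                slots[v] = len(fetched)
                fetched.append(memory[start:start + batch.stride])
    local_prims = [tuple(slots[v] for v in prim) for prim in batch.primitives]
    return fetched, local_prims
```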

Coverage caching

Patent number: 8860742

Abstract: A technique for caching coverage information for edges that are shared between adjacent graphics primitives may reduce the number of times a shared edge is rasterized. Consequently, power consumed during rasterization may be reduced. During rasterization of a first graphics primitive, coverage information is generated that (1) indicates cells within a sampling grid that are entirely outside an edge of the first graphics primitive and (2) indicates cells within the sampling grid that are intersected by the edge and are only partially covered by the first graphics primitive. The coverage information for the edge is stored in a cache. When a second graphics primitive that shares the edge with the first graphics primitive is rasterized, the coverage information is read from the cache instead of being recomputed.

Type: Grant

Filed: May 2, 2012

Date of Patent: October 14, 2014

Assignee: NVIDIA Corporation

Inventors: Michael C. Shebanow, Anjul Patney
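The Python sketch below illustrates caching per-edge coverage and reusing it for the adjacent primitive. It works on a small grid of square cells and classifies a cell as outside or partially covered by evaluating the edge function at the cell's corners. The cache is keyed by the edge's endpoints regardless of direction. These choices are illustrative; they are not the patented design.

```python
def edge_fn(a, b, p):
    """Edge function: > 0 left of the directed edge a->b, < 0 right of it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def classify_cells(a, b, grid, cell):
    """Return (outside, partial) cell sets for the directed edge a->b."""
    outside, partial = set(), set()
    for cx in range(grid):
        for cy in range(grid):
            corners = [(cx * cell + dx, cy * cell + dy) for dx in (0, cell) for dy in (0, cell)]
            vals = [edge_fn(a, b, p) for p in corners]
            if all(v < 0 for v in vals):
                outside.add((cx, cy))
            elif not all(v > 0 for v in vals):
                partial.add((cx, cy))
    return frozenset(outside), frozenset(partial)


class CoverageCache:
    def __init__(self, grid=4, cell=4):
        self.grid, self.cell = grid, cell
        self.entries = {}
        self.misses = 0

    def edge_coverage(self, a, b):
        key = (a, b) if a <= b else (b, a)     # direction-independent key
        if key not in self.entries:
            self.misses += 1
            self.entries[key] = classify_cells(*key, self.grid, self.cell)
        outside, partial = self.entries[key]
        if (a, b) == key:
            return outside, partial
        # Reversed edge: cells fully inside the stored direction are fully
        # outside this one; partially covered cells stay partial.
        cells = {(x, y) for x in range(self.grid) for y in range(self.grid)}
        return frozenset(cells - outside - partial), partial
```

The neighboring primitive traverses the shared edge in the opposite direction. For it, cells that were fully inside become fully outside and partially covered cells stay partial, so the cached entry can be reused without re-evaluating the edge.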

Distributed stream output in a parallel processing unit

Patent number: 8817031

Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol is implemented between the stream synchronization unit and the plurality of stream output units that ensures that each of the stream output units writes vertex attribute data for the particular batch of vertices distributed to that particular stream output unit in the same order in the stream output buffers as the order in which the batch of vertices was received from a device driver by the parallel processing unit.

Type: Grant

Filed: September 29, 2010

Date of Patent: August 26, 2014

Assignee: NVIDIA Corporation

Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
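A simplified Python model of the ordering guarantee follows. Stream output units may finish batches in any order, and a synchronization unit releases buffer offsets strictly in the order the batches were received. The StreamSync class and its single submit message are hypothetical stand-ins for the messaging protocol the abstract mentions.

```python
class StreamSync:
    """Toy ordering unit: grants buffer offsets strictly in batch order."""

    def __init__(self):
        self.next_batch = 0      # oldest batch not yet granted
        self.write_offset = 0    # next free position in the stream output buffer
        self.pending = {}        # batch id -> vertex records waiting for a grant

    def submit(self, batch_id, records):
        """A stream output unit reports a processed batch, possibly early.
        Returns the (batch_id, offset) grants this message unblocks."""
        self.pending[batch_id] = records
        grants = []
        while self.next_batch in self.pending:
            ready = self.pending.pop(self.next_batch)
            grants.append((self.next_batch, self.write_offset))
            self.write_offset += len(ready)
            self.next_batch += 1
        return grants


def run(batches, completion_order):
    """Simulate units finishing batches out of order and writing on grant."""
    sync, buffer = StreamSync(), []
    for b in completion_order:
        for batch_id, offset in sync.submit(b, batches[b]):
            if offset != len(buffer):
                raise RuntimeError("grant out of order")
            buffer.extend(batches[batch_id])
    return buffer
```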

Computing tessellation coordinates using dedicated hardware

Publication number: 20140160126

Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.

Type: Application

Filed: December 2, 2013

Publication date: June 12, 2014

Applicant: NVIDIA Corporation

Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow

Distributing primitives to multiple rasterizers

Patent number: 8704836

Abstract: One embodiment of the present invention sets forth a technique for parallel distribution of primitives to multiple rasterizers. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives from the multiple geometry units concurrently to multiple rasterizers at rates of multiple primitives per clock. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.

Type: Grant

Filed: October 19, 2009

Date of Patent: April 22, 2014

Assignee: NVIDIA Corporation

Inventors: Johnny S. Rhoades, Steven E. Molnar, Emmett M. Kilgariff, Michael C. Shebanow, Ziyad S. Hakura, Dale L. Kirkland, James Daniel Kelly

Address mapping for a parallel thread processor

Patent number: 8700877

Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on a location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with the first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.

Type: Grant

Filed: September 24, 2010

Date of Patent: April 15, 2014

Assignee: NVIDIA Corporation

Inventors: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
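The three mapping steps in the abstract can be sketched in Python as follows. Every constant is an assumption: the word size, threads per group, window base and size, and local memory base. The choice to interleave words across the threads of a group in step 2 is also an assumption.

```python
WORD = 4                          # bytes per interleaved word (assumed)
THREADS_PER_GROUP = 32            # threads per thread group (assumed)
LOCAL_WINDOW_BASE = 0x00FF_0000   # local window in the thread address space (assumed)
LOCAL_WINDOW_SIZE = 0x1_0000      # bytes of local storage per thread (assumed)
LOCAL_MEMORY_BASE = 0x4000_0000   # virtual base of all groups' local memory (assumed)


def thread_to_virtual(thread_addr, tid, group_id):
    # 1. Effective address: offset of the access inside the local window.
    effective = thread_addr - LOCAL_WINDOW_BASE
    if not 0 <= effective < LOCAL_WINDOW_SIZE:
        raise ValueError("address is outside the local window")
    # 2. Thread group address: interleave words across the group's threads,
    #    so the same offset in consecutive threads lands in adjacent words.
    word, byte = divmod(effective, WORD)
    group_addr = (word * THREADS_PER_GROUP + tid) * WORD + byte
    # 3. Virtual address: each thread group owns a contiguous slab.
    group_slab = LOCAL_WINDOW_SIZE * THREADS_PER_GROUP
    return LOCAL_MEMORY_BASE + group_id * group_slab + group_addr
```

With this interleaving, the same local variable in consecutive threads maps to adjacent words. A group's accesses would therefore tend to fall into few memory transactions.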

Computing tessellation coordinates using dedicated hardware

Patent number: 8599202

Abstract: A system and method for performing tessellation of three-dimensional surface patches performs some tessellation operations using programmable processing units and other tessellation operations using fixed function units with limited precision. (u,v) parameter coordinates for each vertex are computed using fixed function units to offload programmable processing engines. The (u,v) computation is a symmetric operation and is based on integer coordinates of the vertex, tessellation level of detail values, and a spacing mode.

Type: Grant

Filed: September 29, 2008

Date of Patent: December 3, 2013

Assignee: NVIDIA Corporation

Inventors: Justin S. Legakis, Emmett M. Kilgariff, Michael C. Shebanow

Reordering operands assigned to each one of read request ports concurrently accessing multibank register file to avoid bank conflict

Patent number: 8533435

Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that no two of the selected operands are stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed once the operands it specifies have been read from the multi-bank register file and collected over one or more clock cycles.

Type: Grant

Filed: September 3, 2010

Date of Patent: September 10, 2013

Assignee: NVIDIA Corporation

Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
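Here is a toy Python model of port assignment and conflict-free read scheduling. It assumes four banks, with register r stored in bank r mod 4. Operand k of every instruction goes to port k. Each cycle then picks at most one operand per port, with every pick in a distinct bank. The sketch interprets the "reordering" named in the title as allowing a later operand in a port's queue to be chosen ahead of a blocked one.

```python
from collections import deque

NUM_BANKS = 4   # assumed; register r lives in bank r % NUM_BANKS


def assign_ports(instructions, num_ports=3):
    """Operand k of each instruction goes to port k, so one instruction's
    operands never share a port."""
    ports = [deque() for _ in range(num_ports)]
    for inst_id, operands in enumerate(instructions):
        for port, reg in enumerate(operands):
            ports[port].append((inst_id, reg))
    return ports


def schedule_reads(ports):
    """Build one read request per cycle: at most one operand per port, all in
    distinct banks. A blocked operand may be bypassed within its port."""
    cycles = []
    while any(ports):
        used_banks, request = set(), []
        for port in ports:
            pick = next((i for i, (_, reg) in enumerate(port)
                         if reg % NUM_BANKS not in used_banks), None)
            if pick is not None:
                inst_id, reg = port[pick]
                del port[pick]
                used_banks.add(reg % NUM_BANKS)
                request.append((inst_id, reg))
        cycles.append(request)
    return cycles
```

Instruction 0 below reads registers 0, 4 and 8, which all sit in bank 0. Its operands are spread over three cycles, and instruction 1's conflict-free operands fill the otherwise idle slots.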

Trap handler architecture for a parallel processing unit

Patent number: 8522000

Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification effort for concurrently executing threads by imposing a property that all thread groups associated with a streaming multiprocessor are either all executing within their respective code segments or are all executing within the trap handler code segment.

Type: Grant

Filed: September 29, 2009

Date of Patent: August 27, 2013

Assignee: NVIDIA Corporation

Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
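Here is a minimal Python sketch of the all-or-nothing property. When any thread group traps, every group's program counter is saved and redirected to the trap handler, and all of them return together. The sketch rests on three assumptions: the handler address, the absence of nested traps, and treating addresses at or above the handler entry as handler code.

```python
class SM:
    """Toy streaming multiprocessor with the all-or-nothing trap property."""

    TRAP_HANDLER_PC = 0xF000     # hypothetical handler entry; user code lies below it

    def __init__(self, num_groups):
        self.pcs = [0] * num_groups   # one program counter per thread group
        self.saved = None             # user PCs saved while the handler runs
        self.trap_cause = None

    @property
    def in_trap(self):
        return self.saved is not None

    def raise_trap(self, group, cause):
        """Any group trapping moves every group into the handler."""
        if self.in_trap:
            return                    # nested traps are not modeled here
        self.trap_cause = (group, cause)
        self.saved = list(self.pcs)
        self.pcs = [self.TRAP_HANDLER_PC] * len(self.pcs)

    def return_from_trap(self):
        """All groups leave the handler together and resume their own code."""
        if not self.in_trap:
            raise RuntimeError("not executing the trap handler")
        self.pcs, self.saved, self.trap_cause = self.saved, None, None

    def invariant_holds(self):
        in_handler = [pc >= self.TRAP_HANDLER_PC for pc in self.pcs]
        return all(in_handler) or not any(in_handler)
```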

Deferred complete virtual address computation for local memory space requests

Patent number: 8458440

Abstract: One embodiment of the present invention sets forth a technique for computing virtual addresses for accessing thread data. Components of the complete virtual address for a thread group are used to determine whether a cache line corresponding to the complete virtual address is allocated in the cache. Actual computation of the complete virtual address is deferred until after determining that no cache line corresponding to the complete virtual address is allocated in the cache.

Type: Grant

Filed: August 17, 2010

Date of Patent: June 4, 2013

Assignee: NVIDIA Corporation

Inventor: Michael C. Shebanow
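The deferral can be sketched in Python with a cache tagged by the components of the address (thread group ID and line offset) rather than by the complete virtual address. The full address is then computed only on a miss. The line size, the slab layout, and the memory model are hypothetical. The sketch also assumes offsets stay below the slab size, so that each component tag corresponds to exactly one virtual address.

```python
LINE = 128   # bytes per cache line (assumed)


class LocalMemoryCache:
    """Cache tagged by address components, not by the complete virtual address."""

    def __init__(self, slab_bytes, base):
        self.slab_bytes = slab_bytes   # hypothetical local memory bytes per thread group
        self.base = base               # hypothetical virtual base of local memory
        self.lines = {}                # (group_id, line_offset) -> line bytes
        self.va_computations = 0

    def complete_va(self, group_id, line_offset):
        """The comparatively costly step that the technique defers."""
        self.va_computations += 1
        return self.base + group_id * self.slab_bytes + line_offset

    def load(self, group_id, offset, memory):
        """memory maps line-aligned virtual addresses to LINE-byte blocks."""
        line_offset = offset - offset % LINE
        tag = (group_id, line_offset)          # components only
        if tag not in self.lines:              # miss: only now compute the full VA
            va = self.complete_va(group_id, line_offset)
            self.lines[tag] = memory.get(va, bytes(LINE))
        return self.lines[tag][offset % LINE]
```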

Grid walk sampling

Publication number: 20120280992

Abstract: The grid walk sampling technique is an efficient sampling algorithm aimed at optimizing the cost of triangle rasterization for modern graphics workloads. Grid walk sampling is an iterative rasterization algorithm that intelligently tests the intersection of triangle edges with multi-cell grids, determining coverage for a grid cell while identifying other cells in the grid that are either fully covered or fully uncovered by the triangle. Grid walk sampling rasterizes triangles using fewer and simpler computations than conventional highly parallel rasterizers. Therefore, a rasterizer employing grid walk sampling may compute sample coverage of triangles more efficiently, in terms of power and circuitry die area, than conventional highly parallel rasterizers.

Type: Application

Filed: May 1, 2012

Publication date: November 8, 2012

Inventors: Michael C. Shebanow, Anjul Patney
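The Python sketch below simplifies the cell-classification idea behind grid walk sampling and works in three steps:

- a cell whose corners all lie outside one edge is skipped;
- a cell whose corners all lie inside all three edges is accepted without per-sample tests;
- only the remaining cells are tested sample by sample.

It does not reproduce the iterative walk itself. The grid size, cell size, and pixel-center sample positions are assumptions.

```python
def edge(a, b, p):
    """Edge function: >= 0 on the interior side of a counter-clockwise triangle."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(tri, grid, cell):
    """tri: counter-clockwise vertices (y-up frame). Samples sit at pixel centers
    of a (grid*cell)-pixel square. Returns (covered samples, per-sample tests)."""
    edges = [(tri[i], tri[(i + 1) % 3]) for i in range(3)]
    covered, tests = set(), 0
    for cx in range(grid):
        for cy in range(grid):
            x0, y0 = cx * cell, cy * cell
            corners = [(x0 + dx, y0 + dy) for dx in (0, cell) for dy in (0, cell)]
            samples = [(x0 + i + 0.5, y0 + j + 0.5) for i in range(cell) for j in range(cell)]
            # Fully uncovered: every corner is outside at least one edge.
            if any(all(edge(a, b, c) < 0 for c in corners) for a, b in edges):
                continue
            # Fully covered: every corner is inside all three edges.
            if all(all(edge(a, b, c) >= 0 for c in corners) for a, b in edges):
                covered.update(samples)
                continue
            for s in samples:                  # partially covered: test samples
                tests += 1
                if all(edge(a, b, s) >= 0 for a, b in edges):
                    covered.add(s)
    return covered, tests
```

An edge function is linear, so its values at a cell's four corners bound its value at every sample inside the cell. This is why the corner tests can accept or reject a whole cell at once.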
