Patents by Inventor Michael C. Shebanow
Michael C. Shebanow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20120281004
Abstract: A technique for caching coverage information for edges that are shared between adjacent graphics primitives may reduce the number of times a shared edge is rasterized. Consequently, power consumed during rasterization may be reduced. During rasterization of a first graphics primitive, coverage information is generated that (1) indicates cells within a sampling grid that are entirely outside an edge of the first graphics primitive and (2) indicates cells within the sampling grid that are intersected by the edge and are only partially covered by the first graphics primitive. The coverage information for the edge is stored in a cache. When a second graphics primitive that shares the edge with the first graphics primitive is rasterized, the coverage information is read from the cache instead of being recomputed.
Type: Application
Filed: May 2, 2012
Publication date: November 8, 2012
Inventors: Michael C. Shebanow, Anjul Patney
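The caching scheme described in this abstract can be sketched as follows. This is a simplified illustration, not the patented implementation: the edge key, the signed-area classification test, and the grid representation are all assumptions made for the example.

```python
# Sketch of caching per-edge coverage so an edge shared by two adjacent
# primitives is classified only once. All names here are illustrative.

def edge_coverage(edge, grid):
    """Classify each unit cell of the sampling grid against one edge."""
    (x0, y0), (x1, y1) = edge
    outside, partial = set(), set()
    for cx, cy in grid:
        # Signed-area test of the cell's four corners against the edge.
        corners = [(cx + dx, cy + dy) for dx in (0, 1) for dy in (0, 1)]
        sides = [(x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)
                 for px, py in corners]
        if all(s < 0 for s in sides):
            outside.add((cx, cy))        # entirely outside the edge
        elif any(s < 0 for s in sides):
            partial.add((cx, cy))        # intersected by the edge
    return outside, partial

cache = {}

def rasterize_edge(edge, grid):
    # Canonical key: the edge shared by two adjacent primitives, listed
    # in either vertex order, hits the same cache entry.
    key = tuple(sorted(edge))
    if key not in cache:                 # first primitive: compute and store
        cache[key] = edge_coverage(edge, grid)
    return cache[key]                    # second primitive: reuse
```

When the second primitive presents the same edge with reversed vertex order, the lookup hits and the per-cell classification is skipped.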
-
Patent number: 8223158
Abstract: A method and system for connecting multiple shaders are disclosed. Specifically, one embodiment of the present invention sets forth a method that includes the steps of configuring a set of shaders in a user-defined sequence within a modular pipeline (MPipe), allocating resources to execute the programming instructions of each of the shaders in the user-defined sequence to operate on a data unit, and directing the output of the MPipe to an external sink.
Type: Grant
Filed: December 19, 2006
Date of Patent: July 17, 2012
Assignee: NVIDIA Corporation
Inventors: John Erik Lindholm, Michael C. Shebanow, Jerome F. Duluk, Jr.
-
Patent number: 8127181
Abstract: Processing units are configured to capture the unit state in unit-level error status registers when a runtime error event is detected, in order to facilitate debugging of runtime errors. The reporting of warnings may be disabled or enabled to selectively monitor each processing unit. Warnings for each processing unit are propagated to an exception register in a front end monitoring unit. The warnings are then aggregated and propagated to an interrupt register in the front end monitoring unit in order to selectively generate an interrupt and facilitate debugging. A debugging application may be used to query the interrupt, exception, and unit-level error status registers to determine the cause of the error. A default error handling behavior that overrides error conditions may be used in conjunction with the hardware warning protocol to allow the processing units to continue operating and facilitate the debugging of runtime errors.
Type: Grant
Filed: November 2, 2007
Date of Patent: February 28, 2012
Assignee: NVIDIA Corporation
Inventors: Michael C. Shebanow, John S. Montrym, Richard A. Silkebakken, Robert C. Keller
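The warning-propagation path in this abstract can be modeled as a pair of registers. This is a toy sketch: the bit-per-unit layout and the enable mask are assumptions for illustration, not details from the patent.

```python
# Sketch of per-unit warnings propagated into an exception register and
# aggregated into an interrupt. Bit assignments are invented.

def aggregate_warnings(warnings, enable_mask):
    """warnings: list of per-unit warning flags.
    enable_mask: bitmask selecting which units are monitored.
    Returns (exception_register, interrupt_pending)."""
    exception_reg = 0
    for unit, warned in enumerate(warnings):
        # A warning reaches the exception register only if its unit's
        # reporting is enabled.
        if warned and (enable_mask >> unit) & 1:
            exception_reg |= 1 << unit
    interrupt = exception_reg != 0       # aggregate into one interrupt bit
    return exception_reg, interrupt
```

A debugger would then read the interrupt bit, the exception register, and the per-unit status registers in turn to localize the fault.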
-
Patent number: 8019978
Abstract: A unit status reporting protocol may be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, or halted. Each processing unit's state is reported to a front end monitoring unit, enabling the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch, or take action to remove a deadlock condition and allow processing to resume.
Type: Grant
Filed: August 13, 2007
Date of Patent: September 13, 2011
Assignee: NVIDIA Corporation
Inventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken
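The five states named in this abstract suggest a simple decision model for the front end monitoring unit. The state names come from the abstract; the decision rules below are plausible assumptions, not the patented logic.

```python
# Illustrative model of the five-state unit status reporting protocol.
from enum import Enum

class UnitState(Enum):
    EMPTY = "empty"
    ACTIVE = "active"
    STALLED = "stalled"
    QUIESCENT = "quiescent"
    HALTED = "halted"

def can_context_switch(states):
    """Assume a context switch is safe once no unit is actively
    processing or stalled mid-operation."""
    return all(s in (UnitState.EMPTY, UnitState.QUIESCENT, UnitState.HALTED)
               for s in states)

def deadlock_suspected(states):
    """Assume deadlock is suspected when every non-empty unit is stalled."""
    busy = [s for s in states if s is not UnitState.EMPTY]
    return bool(busy) and all(s is UnitState.STALLED for s in busy)
```

On a suspected deadlock the front end would issue a halt command, forcing units into the halted state so a switch or recovery action can proceed.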
-
Publication number: 20110141122
Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol implemented between the stream synchronization unit and the stream output units ensures that each stream output unit writes the vertex attribute data for its batches of vertices to the stream output buffers in the same order in which those batches were received from the device driver by the parallel processing unit.
Type: Application
Filed: September 29, 2010
Publication date: June 16, 2011
Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
-
Publication number: 20110078427
Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
Type: Application
Filed: September 29, 2009
Publication date: March 31, 2011
Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
-
Publication number: 20110078358
Abstract: One embodiment of the present invention sets forth a technique for computing virtual addresses for accessing thread data. Components of the complete virtual address for a thread group are used to determine whether a cache line corresponding to the complete virtual address is allocated in the cache. Actual computation of the complete virtual address is deferred until after determining that no cache line corresponding to the complete virtual address is allocated in the cache.
Type: Application
Filed: August 17, 2010
Publication date: March 31, 2011
Inventor: Michael C. Shebanow
-
Publication number: 20110078692
Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
Type: Application
Filed: September 21, 2010
Publication date: March 31, 2011
Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
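The three barrier levels and the coalescing idea in this abstract can be modeled compactly. The scope names and relative ordering below are assumptions chosen to mirror the abstract's three levels; the patent text does not name them.

```python
# Toy model of the three memory-barrier scopes and request coalescing.
from enum import IntEnum

class BarrierScope(IntEnum):
    # Ordered by how widely the transactions must be committed.
    CTA = 1     # threads sharing an L1 (level one) cache
    GPU = 2     # all threads sharing a global memory
    SYSTEM = 3  # all threads sharing all system memories

def coalesce(pending):
    """Coalesce outstanding barrier requests from one processing unit:
    a single barrier at the widest requested scope satisfies them all."""
    return max(pending) if pending else None
```

Coalescing this way means the unit issues one barrier instead of many, which is the "reduce the impact to the rest of the system" point; wider scopes cost more latency, matching the abstract's final sentence.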
-
Publication number: 20110078689
Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on the location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group based on the effective address and a thread identifier associated with the first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
Type: Application
Filed: September 24, 2010
Publication date: March 31, 2011
Inventors: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
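The three mapping steps in this abstract can be sketched as a small function. The window base, word size, interleave layout, and per-group stride below are invented constants for illustration; only the three-step structure comes from the abstract.

```python
# Hypothetical sketch of the thread-address-to-virtual-address mapping.
LOCAL_WINDOW_BASE = 0x0100_0000   # assumed base of the local window
THREADS_PER_GROUP = 32            # assumed thread group size
WORD = 4                          # assumed word size in bytes

def map_thread_address(thread_addr, thread_id, group_id):
    # Step 1: effective address = word offset within the local window.
    effective = (thread_addr - LOCAL_WINDOW_BASE) // WORD
    # Step 2: thread group address = interleave per-thread words so
    # consecutive threads occupy consecutive lanes.
    group_addr = effective * THREADS_PER_GROUP * WORD + thread_id * WORD
    # Step 3: virtual address = place each group's space at its own
    # offset determined by the thread group identifier.
    return group_id * 0x10_0000 + group_addr
```

With this layout, the same thread address in different threads of one group maps to adjacent words, which is one common reason to remap per-thread local addresses.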
-
Patent number: 7916146
Abstract: In a processing pipeline having a plurality of units, an interface unit is provided between a first, upstream pipeline unit that needs to be drained prior to a context switch and a second, downstream pipeline unit that might halt prior to a context switch. The interface unit redirects data that are drained from the first pipeline unit and are to be received by the second pipeline unit to a buffer memory provided in the front end of the processing pipeline. The contents of the buffer memory are subsequently dumped into memory reserved for the context that is being stored. When the processing pipeline is restored with this context, the data that were dumped into memory are retrieved back into the buffer memory and provided to the interface unit. The interface unit receives these data and directs them to the second pipeline unit.
Type: Grant
Filed: December 2, 2005
Date of Patent: March 29, 2011
Assignee: NVIDIA Corporation
Inventors: Robert C. Keller, Michael C. Shebanow, Makarand M. Dharmapurikar
-
Publication number: 20110072243
Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received, the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
Type: Application
Filed: September 3, 2010
Publication date: March 24, 2011
Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
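The bank-conflict-free selection step described in this abstract can be sketched directly. The bank function (register number modulo bank count) and the per-port queues are assumptions for the example.

```python
# Sketch of selecting one operand per port such that no two selected
# operands live in the same register-file bank.

NUM_BANKS = 4  # assumed bank count

def bank_of(reg):
    # Assume registers are striped across banks by register number.
    return reg % NUM_BANKS

def schedule_reads(ports):
    """ports: one queue (list) of pending operand register numbers per
    port. Builds one read request: at most one operand per port, with
    all selected operands in distinct banks. Selected operands are
    removed from their queues; conflicting ones wait for a later cycle."""
    request, used_banks = [], set()
    for queue in ports:
        for reg in queue:
            if bank_of(reg) not in used_banks:
                used_banks.add(bank_of(reg))
                request.append(reg)
                queue.remove(reg)
                break                    # one operand per port per cycle
    return request
```

Because the request never contains two operands from one bank, the whole request can be read in a single clock cycle, as the abstract states.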
-
Publication number: 20110072213
Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
Type: Application
Filed: September 22, 2010
Publication date: March 24, 2011
Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
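A per-instruction cache operations modifier amounts to a dispatch table from modifier to the set of cache levels that may keep the data. The modifier names and policies below are generic assumptions; the patent text does not enumerate them.

```python
# Sketch of dispatching a load/store's caching behavior on its
# cache-operations modifier. Names and policies are illustrative only.

def cache_levels_for(modifier):
    """Return which levels of a two-level hierarchy retain the data."""
    policy = {
        "all": ("L1", "L2"),   # cache at every level of the hierarchy
        "global": ("L2",),     # bypass the per-core L1, cache at L2
        "bypass": (),          # do not retain the data at any level
    }
    return policy[modifier]
```

The execution path then consults this table when the instruction commits, so two loads to the same address can follow different caching policies.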
-
Publication number: 20100095117
Abstract: One embodiment takes the form of a method for authenticating an identity of a first party to a second party, without any prior contact between the parties. Further, the first party may authenticate its identity to the second party while eliminating the ability of the second party to steal the first party's identity. A trusted authority may facilitate authenticating the identity of two or more communicating parties. In one embodiment, the authority may ensure the validity of the identification of a number of parties talking over a communications network. The parties communicating over the secure network trust what the authority states concerning the identities of the other parties in the network. Another embodiment may prevent the authority from monitoring which two parties are communicating to each other through the network.
Type: Application
Filed: October 15, 2008
Publication date: April 15, 2010
Inventor: Michael C. Shebanow
-
Patent number: 7627723
Abstract: Methods, apparatuses, and systems are presented for updating data in memory while executing multiple threads of instructions, involving receiving a single instruction from one of a plurality of concurrently executing threads of instructions; in response to the single instruction received, reading data from a specific memory location, performing an operation involving the data read from the memory location to generate a result, and storing the result to the specific memory location, without requiring separate load and store instructions; and, in response to the single instruction received, precluding another one of the plurality of threads of instructions from altering data at the specific memory location while reading the data from the specific memory location, performing the operation involving the data, and storing the result to the specific memory location.
Type: Grant
Filed: September 21, 2006
Date of Patent: December 1, 2009
Assignee: NVIDIA Corporation
Inventors: Ian A. Buck, John R. Nickolls, Michael C. Shebanow, Lars S. Nyland
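The read-modify-write semantics described here can be modeled in software. This sketch uses a per-location lock only to illustrate the "no intervening thread" guarantee; the patented hardware achieves atomicity without separate load and store instructions, not with locks.

```python
# Software model of a single-instruction atomic read-modify-write.
import threading

class AtomicMemory:
    def __init__(self, size):
        self._data = [0] * size
        self._locks = [threading.Lock() for _ in range(size)]

    def atomic_op(self, addr, op):
        """Read the location, apply op, and write the result back as one
        indivisible step; other threads cannot alter the location in
        between."""
        with self._locks[addr]:          # preclude other threads
            result = op(self._data[addr])
            self._data[addr] = result
        return result
```

For example, four threads each performing 1000 atomic increments on one location always end at exactly 4000, which would not hold if the load, add, and store could interleave.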
-
Patent number: 7512773
Abstract: A halt sequencing protocol permits a context switch to occur in a processing pipeline even before all units of the processing pipeline are idle. The context switch method based on the halt sequencing protocol includes the steps of issuing a halt request signal to the units of a processing pipeline, monitoring the status of each of the units, and freezing the states of all of the units when they are either idle or halted. Then, the states of the units, which pertain to the thread that has been halted, are dumped into memory, and the units are restored with states corresponding to a different thread that is to be executed after the context switch.
Type: Grant
Filed: October 18, 2005
Date of Patent: March 31, 2009
Assignee: NVIDIA Corporation
Inventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken, Benjamin J. Garlick
-
Patent number: 7293162
Abstract: A scheduling scheme and mechanism for a processor system is disclosed. The scheduling scheme provides a reservation station system that includes a control reservation station and a data reservation station. The reservation station system receives operational entries and, for each operational entry, identifies scheduling state information, operand state information, and operand information. The reservation station system stores the scheduling state information and operand information as a control reservation station entry in the control reservation station, and stores the operand state information and the operand information as a data reservation station entry in the data reservation station. When control reservation station entries are identified as ready, they are scheduled and issued for execution by a functional unit.
Type: Grant
Filed: December 18, 2002
Date of Patent: November 6, 2007
Assignee: Fujitsu Limited
Inventors: Michael C. Shebanow, Michael G. Butler
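The split described in this abstract, one entry fanned out into paired control-side and data-side records, can be sketched as follows. The field names and the shared tag linking the paired entries are assumptions for the example.

```python
# Sketch of splitting an operational entry into a control reservation
# station entry and a data reservation station entry.

def insert_entry(entry, control_rs, data_rs):
    """entry: dict with 'sched_state', 'operand_state', and 'operands'.
    Appends the paired records and returns the tag linking them."""
    tag = len(control_rs)  # assumed: a shared tag links the pair
    control_rs.append({"tag": tag,
                       "sched_state": entry["sched_state"],   # scheduling state
                       "operands": entry["operands"]})        # operand info
    data_rs.append({"tag": tag,
                    "operand_state": entry["operand_state"],  # operand state
                    "operands": entry["operands"]})           # operand info
    return tag
```

Keeping scheduling state separate from operand state lets the scheduler scan only the small control-side records when deciding which entries are ready to issue.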
-
Publication number: 20040123077
Abstract: A scheduling scheme and mechanism for a processor system is disclosed. The scheduling scheme provides a reservation station system that includes a control reservation station and a data reservation station. The reservation station system receives operational entries and, for each operational entry, identifies scheduling state information, operand state information, and operand information. The reservation station system stores the scheduling state information and operand information as a control reservation station entry in the control reservation station, and stores the operand state information and the operand information as a data reservation station entry in the data reservation station. When control reservation station entries are identified as ready, they are scheduled and issued for execution by a functional unit.
Type: Application
Filed: December 18, 2002
Publication date: June 24, 2004
Inventors: Michael C. Shebanow, Michael G. Butler
-
Publication number: 20040123298
Abstract: A system and method for performing dynamic resource allocation. A deallocation block sends batons, representing assigned resources, to an allocation block. The allocation block receives the assigned resources and, if needed, allocates them to an execution machine that performs tasks such as executing instructions. The deallocation block continually sends batons independent of the allocation block's current need for resources. The allocation block returns unused batons, or sends an indication of used batons, to the deallocation block. The deallocation block is physically decoupled from, and distributed relative to, the allocation block.
Type: Application
Filed: June 11, 2003
Publication date: June 24, 2004
Inventor: Michael C. Shebanow
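The baton protocol in this abstract can be modeled as two queues on the allocation side. The queue structure and the integer batons are assumptions for the example; only the flow (batons stream in continually, unused ones flow back) comes from the abstract.

```python
# Toy model of the allocation side of the baton-based resource protocol.
from collections import deque

class AllocationBlock:
    def __init__(self):
        self.free = deque()      # batons received from the deallocation block
        self.returned = deque()  # batons flowing back to the deallocation block

    def receive_baton(self, baton):
        # Batons arrive continually, whether or not anything needs them.
        self.free.append(baton)

    def allocate(self):
        """Hand a resource to the execution machine, or None if starved."""
        return self.free.popleft() if self.free else None

    def release(self, baton):
        # Unused or finished batons are sent back to the deallocation block.
        self.returned.append(baton)
```

Because the sender never waits on the receiver's demand, the two blocks need no shared scheduling state, which is what lets them be physically decoupled.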
-
Patent number: 5966530
Abstract: A high-performance processor is disclosed with structure and methods for: (1) aggressively scheduling long-latency instructions, including load/store instructions, while maintaining precise state; (2) maintaining and restoring state at any instruction boundary; (3) tracking instruction status; (4) checkpointing instructions; (5) creating, maintaining, and using a time-out checkpoint; (6) tracking floating-point exceptions; (7) creating, maintaining, and using a watchpoint for plural, simultaneous, unresolved-branch evaluation; and (8) increasing processor throughput while maintaining precise state. In one embodiment of the invention, a method of restoring machine state in a processor at any instruction boundary is disclosed. For any instruction that may modify control registers, the processor is either synchronized prior to execution or an instruction checkpoint is stored to preserve state; and for any instruction that creates a program counter discontinuity, an instruction checkpoint is stored.
Type: Grant
Filed: June 11, 1997
Date of Patent: October 12, 1999
Assignee: Fujitsu, Ltd.
Inventors: Gene W. Shen, John Szeto, Niteen A. Patkar, Michael C. Shebanow
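The checkpoint-and-restore rule in the abstract's final sentences can be sketched as a snapshot stack. The register-file model and trigger conditions below are simplifications; real checkpoint hardware snapshots far more state than a dictionary of registers.

```python
# Minimal sketch of instruction checkpointing for precise state restore.

class CheckpointingProcessor:
    def __init__(self):
        self.regs = {}           # stand-in for architectural state
        self.checkpoints = []    # stack of (pc, state-snapshot) pairs

    def checkpoint(self, pc):
        """Taken before an instruction that may modify control registers
        or that creates a program counter discontinuity."""
        self.checkpoints.append((pc, dict(self.regs)))

    def restore(self):
        """Roll machine state back to the most recent checkpoint and
        return the program counter to resume from."""
        pc, snapshot = self.checkpoints.pop()
        self.regs = snapshot
        return pc
```

If a checkpointed instruction later faults or a branch resolves the wrong way, popping the snapshot restores precise state at that instruction boundary.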
-
Patent number: 5896526
Abstract: A system and method providing a programmable hardware device within a CPU. The programmable hardware device permits a plurality of instructions to be trapped before they are executed. The instructions that are to be trapped are programmable, to provide flexibility during CPU debugging and to ensure that a variety of application programs can be properly executed by the CPU. The system also provides a means for permitting a trapped instruction to be emulated and/or executed serially.
Type: Grant
Filed: February 18, 1998
Date of Patent: April 20, 1999
Assignee: Fujitsu, Ltd.
Inventors: Sunil Savkar, Gene W. Shen, Farnad Sajjadian, Michael C. Shebanow