Patents by Inventor Michael C Shebanow

Michael C Shebanow has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120281004
    Abstract: A technique for caching coverage information for edges that are shared between adjacent graphics primitives may reduce the number of times a shared edge is rasterized. Consequently, power consumed during rasterization may be reduced. During rasterization of a first graphics primitive, coverage information is generated that (1) indicates cells within a sampling grid that are entirely outside an edge of the first graphics primitive and (2) indicates cells within the sampling grid that are intersected by the edge and are only partially covered by the first graphics primitive. The coverage information for the edge is stored in a cache. When a second graphics primitive that shares the edge with the first is rasterized, the coverage information is read from the cache instead of being recomputed.
    Type: Application
    Filed: May 2, 2012
    Publication date: November 8, 2012
    Inventors: Michael C. Shebanow, Anjul Patney
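
To make the caching step concrete, here is a minimal host-side sketch; EdgeKey, EdgeCoverage, and RasterizeEdge are invented stand-ins, not the patented hardware. Coverage masks for the sampling grid are keyed by a canonical vertex pair, so the shared edge of the adjacent primitive hits the cache instead of being rasterized again.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical edge key: a canonical (sorted) pair of vertex indices, so the
// edge (a, b) of one primitive and the edge (b, a) of its neighbor match.
using EdgeKey = std::pair<uint32_t, uint32_t>;

struct EdgeCoverage {
    uint64_t outsideCells;  // bit i set: cell i lies entirely outside the edge
    uint64_t partialCells;  // bit i set: cell i is intersected by the edge
};

static EdgeKey MakeKey(uint32_t v0, uint32_t v1) {
    return v0 < v1 ? EdgeKey{v0, v1} : EdgeKey{v1, v0};
}

// Stand-in for the expensive per-edge rasterization step.
static EdgeCoverage RasterizeEdge(uint32_t /*v0*/, uint32_t /*v1*/) {
    return EdgeCoverage{0, 0};
}

// A primitive sharing the edge reads the cached masks instead of recomputing.
EdgeCoverage GetEdgeCoverage(std::map<EdgeKey, EdgeCoverage>& cache,
                             uint32_t v0, uint32_t v1) {
    auto it = cache.find(MakeKey(v0, v1));
    if (it != cache.end()) return it->second;        // shared edge: cache hit
    EdgeCoverage cov = RasterizeEdge(v0, v1);
    cache.emplace(MakeKey(v0, v1), cov);
    return cov;
}
```

Since adjacent primitives in a mesh typically share one edge apiece, each interior edge is rasterized once rather than twice under this scheme.
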
  • Patent number: 8223158
    Abstract: A method and system for connecting multiple shaders are disclosed. Specifically, one embodiment of the present invention sets forth a method that includes the steps of configuring a set of shaders in a user-defined sequence within a modular pipeline (MPipe), allocating resources to execute the programming instructions of each shader in the user-defined sequence on a data unit, and directing the output of the MPipe to an external sink.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: July 17, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Michael C. Shebanow, Jerome F. Duluk, Jr.
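
A rough sketch of the MPipe idea, using callables as stand-ins for shader programs; DataUnit, Shader, and Sink are invented for illustration.

```cpp
#include <functional>
#include <vector>

struct DataUnit { float payload[4]; };
using Shader = std::function<DataUnit(const DataUnit&)>;
using Sink   = std::function<void(const DataUnit&)>;

struct MPipe {
    std::vector<Shader> stages;  // the user-defined sequence of shaders

    void Execute(DataUnit d, const Sink& sink) const {
        for (const Shader& s : stages) d = s(d);  // each stage feeds the next
        sink(d);                                  // direct output to the sink
    }
};
```
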
  • Patent number: 8127181
    Abstract: Processing units are configured to capture the unit state in unit-level error status registers when a runtime error event is detected, in order to facilitate debugging of runtime errors. The reporting of warnings may be enabled or disabled to selectively monitor each processing unit. Warnings from each processing unit are propagated to an exception register in a front end monitoring unit, where they are aggregated and propagated to an interrupt register in order to selectively generate an interrupt. A debugging application may query the interrupt, exception, and unit-level error status registers to determine the cause of the error. A default error handling behavior that overrides error conditions may be used in conjunction with the hardware warning protocol to allow the processing units to continue operating while runtime errors are debugged.
    Type: Grant
    Filed: November 2, 2007
    Date of Patent: February 28, 2012
    Assignee: NVIDIA Corporation
    Inventors: Michael C. Shebanow, John S. Montrym, Richard A. Silkebakken, Robert C. Keller
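
A simplified sketch of the warning-aggregation path; the one-bit-per-unit register layout and the interrupt-enable mask are assumptions.

```cpp
#include <cstdint>
#include <vector>

// Each unit latches its state when it raises a warning; warning reporting
// can be disabled per unit by never setting the flag.
struct UnitErrorStatus {
    bool     warning = false;
    uint32_t capturedState = 0;  // unit state captured at the error event
};

struct FrontEnd {
    uint32_t exceptionReg = 0;       // bit i set: unit i reported a warning
    uint32_t interruptEnable = ~0u;  // which units may raise the interrupt

    // Collect per-unit warnings into the exception register; return true
    // when the aggregate should raise an interrupt for the debugger.
    bool Poll(const std::vector<UnitErrorStatus>& units) {
        for (size_t i = 0; i < units.size(); ++i)
            if (units[i].warning) exceptionReg |= 1u << i;
        return (exceptionReg & interruptEnable) != 0;
    }
};
```
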
  • Patent number: 8019978
    Abstract: A unit status reporting protocol can be used for context switching, debugging, and removing deadlock conditions in a processing unit. A processing unit is in one of five states: empty, active, stalled, quiescent, or halted. Each unit's state is reported to a front end monitoring unit, enabling the front end monitoring unit to determine when a context switch may be performed or when a deadlock condition exists. The front end monitoring unit can issue a halt command to perform a context switch, or take action to remove a deadlock condition and allow processing to resume.
    Type: Grant
    Filed: August 13, 2007
    Date of Patent: September 13, 2011
    Assignee: NVIDIA Corporation
    Inventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken
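
The five reported states lend themselves to a small sketch. The rule that a switch is safe once every unit is empty or halted is an assumption consistent with the abstract, not a statement of the patented protocol.

```cpp
#include <vector>

// The five reported states, as named in the abstract.
enum class UnitState { Empty, Active, Stalled, Quiescent, Halted };

// Assumed rule: a context switch is safe once no unit is still doing work;
// a unit stuck in Stalled after a halt command suggests a deadlock that the
// front end monitoring unit must break.
bool CanContextSwitch(const std::vector<UnitState>& units) {
    for (UnitState s : units)
        if (s != UnitState::Empty && s != UnitState::Halted) return false;
    return true;
}
```
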
  • Publication number: 20110141122
    Abstract: A technique for performing stream output operations in a parallel processing system is disclosed. A stream synchronization unit is provided that enables the parallel processing unit to track batches of vertices being processed in a graphics processing pipeline. A plurality of stream output units is also provided, where each stream output unit writes vertex attribute data to one or more stream output buffers for a portion of the batches of vertices. A messaging protocol implemented between the stream synchronization unit and the stream output units ensures that each stream output unit writes vertex attribute data for the batches distributed to it into the stream output buffers in the same order in which those batches were received from a device driver by the parallel processing unit.
    Type: Application
    Filed: September 29, 2010
    Publication date: June 16, 2011
    Inventors: Ziyad S. Hakura, Rohit Gupta, Michael C. Shebanow, Emmett M. Kilgariff
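
One plausible way to read the ordering guarantee is as a sequence-number commit protocol; the sketch below is an interpretation of the abstract, not the patented messaging protocol.

```cpp
#include <atomic>
#include <cstdint>

// Batches are numbered in driver-submission order; each stream output unit
// delays its buffer writes until every earlier batch has committed.
std::atomic<uint64_t> nextBatchToCommit{0};

void StreamOutWrite(uint64_t batchSeq /*, vertex attribute data ... */) {
    while (nextBatchToCommit.load(std::memory_order_acquire) != batchSeq) {
        // wait: an earlier batch has not yet been written to the buffers
    }
    // ... write this batch's vertex attributes to the stream output buffers ...
    nextBatchToCommit.store(batchSeq + 1, std::memory_order_release);
}
```
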
  • Publication number: 20110078427
    Abstract: A trap handler architecture is incorporated into a parallel processing subsystem such as a GPU. The trap handler architecture minimizes design complexity and verification efforts for concurrently executing threads by imposing a property that all thread groups associated with a streaming multi-processor are either all executing within their respective code segments or are all executing within the trap handler code segment.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 31, 2011
    Inventors: Michael C. Shebanow, Jack Choquette, Brett W. Coon, Steven J. Heinrich, Aravind Kalaiah, John R. Nickolls, Daniel Salinas, Ming Y. Siu, Tommy Thorn, Nicholas Wang
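
The all-or-nothing property can be sketched in a few lines; the trap-entry address and the thread-group bookkeeping are invented.

```cpp
#include <vector>

// On a trap, every thread group resident on the streaming multiprocessor is
// redirected into the trap handler code segment, so the machine never has to
// reason about a mixed population of trapped and untrapped groups.
constexpr unsigned kTrapHandlerEntry = 0xF000;  // hypothetical address

struct ThreadGroup { unsigned pc = 0; bool inTrapHandler = false; };

void EnterTrap(std::vector<ThreadGroup>& groupsOnSM) {
    for (ThreadGroup& g : groupsOnSM) {  // all groups, not just the faulter
        g.inTrapHandler = true;
        g.pc = kTrapHandlerEntry;
    }
}
```
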
  • Publication number: 20110078358
    Abstract: One embodiment of the present invention sets forth a technique for computing virtual addresses for accessing thread data. Components of the complete virtual address for a thread group are used to determine whether a cache line corresponding to the complete virtual address is allocated in the cache. Actual computation of the complete virtual address is deferred until after determining that a cache line corresponding to the complete virtual address is not allocated in the cache.
    Type: Application
    Filed: August 17, 2010
    Publication date: March 31, 2011
    Inventor: Michael C. Shebanow
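
A host-side simulation of the deferral, assuming 128-byte lines; in hardware the probe would be formed from address components more cheaply than the full add shown here.

```cpp
#include <cstdint>
#include <unordered_set>

constexpr uint64_t kLineBits = 7;  // hypothetical 128-byte cache lines

// Stand-in for the deferred, relatively expensive address computation.
static uint64_t ComputeFullVirtualAddress(uint64_t base, uint64_t offset) {
    return base + offset;
}

void Access(std::unordered_set<uint64_t>& allocatedLines,
            uint64_t base, uint64_t threadOffset) {
    // The probe stands in for hardware that checks line allocation from the
    // address components without first completing the full computation.
    uint64_t lineTag = (base + threadOffset) >> kLineBits;
    if (allocatedLines.count(lineTag)) return;  // hit: full VA never computed
    uint64_t va = ComputeFullVirtualAddress(base, threadOffset);  // miss only
    allocatedLines.insert(va >> kLineBits);     // allocate the line
}
```
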
  • Publication number: 20110078692
    Abstract: One embodiment of the present invention sets forth a technique for coalescing memory barrier operations across multiple parallel threads. Memory barrier requests from a given parallel thread processing unit are coalesced to reduce the impact to the rest of the system. Additionally, memory barrier requests may specify a level of a set of threads with respect to which the memory transactions are committed. For example, a first type of memory barrier instruction may commit the memory transactions to a level of a set of cooperating threads that share an L1 (level one) cache. A second type of memory barrier instruction may commit the memory transactions to a level of a set of threads sharing a global memory. Finally, a third type of memory barrier instruction may commit the memory transactions to a system level of all threads sharing all system memories. The latency required to execute the memory barrier instruction varies based on the type of memory barrier instruction.
    Type: Application
    Filed: September 21, 2010
    Publication date: March 31, 2011
    Inventors: John R. Nickolls, Steven James Heinrich, Brett W. Coon, Michael C. Shebanow
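
The three commit levels correspond closely to the memory fence intrinsics that CUDA C++ exposes. The kernel below is illustrative, but the three intrinsics are real, shown in order of increasing scope and latency; a real kernel would use only the one matching the level at which its writes must be committed.

```cpp
__global__ void Publish(int* data, volatile int* flag) {
    data[threadIdx.x] = threadIdx.x;  // memory transactions to commit

    __threadfence_block();   // level 1: threads of the CTA sharing the L1
    __threadfence();         // level 2: all threads sharing global memory
    __threadfence_system();  // level 3: system-wide, including the host

    if (threadIdx.x == 0) *flag = 1;  // publish only after the commit
}
```
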
  • Publication number: 20110078689
    Abstract: A method for thread address mapping in a parallel thread processor. The method includes receiving a thread address associated with a first thread in a thread group; computing an effective address based on the location of the thread address within a local window of a thread address space; computing a thread group address in an address space associated with the thread group, based on the effective address and a thread identifier associated with the first thread; and computing a virtual address associated with the first thread based on the thread group address and a thread group identifier, where the virtual address is used to access a location in a memory associated with the thread address to load or store data.
    Type: Application
    Filed: September 24, 2010
    Publication date: March 31, 2011
    Inventors: Michael C. Shebanow, Yan Yan Tang, John R. Nickolls
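
The three mapping steps can be followed with a worked sketch. Every parameter below is invented, since the abstract does not specify the window base, per-thread sizing, or layout.

```cpp
#include <cstdint>

constexpr uint64_t kLocalWindowBase = 0x10000000;  // hypothetical window base
constexpr uint64_t kPerThreadSize   = 16 * 1024;   // hypothetical local size
constexpr uint32_t kThreadsPerGroup = 32;
constexpr uint64_t kGroupStride     = kPerThreadSize * kThreadsPerGroup;
constexpr uint64_t kHeapBase        = 0x80000000;  // hypothetical backing store

uint64_t ThreadToVirtual(uint64_t threadAddr, uint32_t tid, uint32_t groupId) {
    uint64_t effective = threadAddr - kLocalWindowBase;     // step 1: window
    uint64_t groupAddr = tid * kPerThreadSize + effective;  // step 2: + thread id
    return kHeapBase + groupId * kGroupStride + groupAddr;  // step 3: + group id
}
```
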
  • Patent number: 7916146
    Abstract: In a processing pipeline having a plurality of units, an interface unit is provided between a first, upstream pipeline unit that needs to be drained prior to a context switch and a second, downstream pipeline unit that might halt prior to a context switch. The interface unit redirects data that are drained from the first pipeline unit and to be received by the second pipeline unit to a buffer memory provided in the front end of the processing pipeline. The contents of the buffer memory are subsequently dumped into memory reserved for the context that is being stored. When the processing pipeline is restored with this context, the data that were dumped into memory are retrieved back into the buffer memory and provided to the interface unit, which directs them to the second pipeline unit.
    Type: Grant
    Filed: December 2, 2005
    Date of Patent: March 29, 2011
    Assignee: NVIDIA Corporation
    Inventors: Robert C. Keller, Michael C. Shebanow, Makarand M. Dharmapurikar
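
A sketch of the interface unit's three movements (drain to buffer, dump to memory, replay on restore); the word-sized data and the types are made up.

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

struct InterfaceUnit {
    std::deque<uint32_t> buffer;  // front-end buffer memory

    void Drain(uint32_t word) { buffer.push_back(word); }  // upstream -> buffer

    std::vector<uint32_t> SaveContext() {                  // buffer -> memory
        std::vector<uint32_t> saved(buffer.begin(), buffer.end());
        buffer.clear();
        return saved;
    }

    // memory -> buffer -> downstream unit, preserving the original order
    void RestoreContext(const std::vector<uint32_t>& saved,
                        const std::function<void(uint32_t)>& downstream) {
        buffer.assign(saved.begin(), saved.end());
        while (!buffer.empty()) { downstream(buffer.front()); buffer.pop_front(); }
    }
};
```
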
  • Publication number: 20110072243
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 24, 2011
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
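
A sketch of forming one conflict-free read request: take the head operand of each port unless its bank is already claimed this cycle. The bank count and the register-to-bank mapping are assumptions.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

constexpr uint32_t kNumBanks = 4;  // assumed bank count
struct Operand { uint32_t reg; };
inline uint32_t BankOf(const Operand& o) { return o.reg % kNumBanks; }

// One pending-operand queue per collector port. The selected operands all
// come from distinct banks, so they can be read in a single clock cycle.
std::vector<Operand> SelectReadRequest(std::vector<std::queue<Operand>>& ports) {
    std::vector<Operand> request;
    uint32_t banksUsed = 0;
    for (auto& port : ports) {
        if (port.empty()) continue;
        uint32_t bank = BankOf(port.front());
        if (banksUsed & (1u << bank)) continue;  // conflict: retry next cycle
        banksUsed |= 1u << bank;
        request.push_back(port.front());
        port.pop();
    }
    return request;
}
```

An operand left behind by a bank conflict simply stays at the head of its port and is eligible again on the next cycle, which is how the collector gathers an instruction's operands over one or more cycles.
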
  • Publication number: 20110072213
    Abstract: A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.
    Type: Application
    Filed: September 22, 2010
    Publication date: March 24, 2011
    Inventors: John R. Nickolls, Brett W. Coon, Michael C. Shebanow
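
The modifier names below echo PTX's ld/st cache operators (.ca, .cg, .cs), which is where this mechanism surfaces publicly; the dispatch itself is an illustrative simulation, not NVIDIA's implementation.

```cpp
#include <cstdint>

enum class CacheOp { CacheAll, CacheGlobal, Streaming };

struct L1Cache { void Fill(uint64_t, bool evictFirst) { /* ... */ } };
struct L2Cache { void Fill(uint64_t, bool evictFirst) { /* ... */ } };

// Execute a load and cache its data per the instruction's modifier.
void Load(L1Cache& l1, L2Cache& l2, uint64_t addr, CacheOp op) {
    switch (op) {
        case CacheOp::CacheAll:     // cache at all levels
            l2.Fill(addr, false); l1.Fill(addr, false); break;
        case CacheOp::CacheGlobal:  // cache at L2 only, bypass L1
            l2.Fill(addr, false); break;
        case CacheOp::Streaming:    // evict-first: data unlikely to be reused
            l2.Fill(addr, true);  l1.Fill(addr, true);  break;
    }
}
```
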
  • Publication number: 20100095117
    Abstract: One embodiment takes the form of a method for authenticating an identity of a first party to a second party, without any prior contact between the parties. Further, the first party may authenticate its identity to the second party while eliminating the ability of the second party to steal the first party's identity. A trusted authority may facilitate authenticating the identity of two or more communicating parties. In one embodiment, the authority may ensure the validity of the identification of a number of parties talking over a communications network. The parties communicating over the secure network trust what the authority states concerning the identities of the other parties in the network. Another embodiment may prevent the authority from monitoring which two parties are communicating to each other through the network.
    Type: Application
    Filed: October 15, 2008
    Publication date: April 15, 2010
    Inventor: Michael C. Shebanow
  • Patent number: 7627723
    Abstract: Methods, apparatuses, and systems are presented for updating data in memory while executing multiple threads of instructions. In response to a single instruction received from one of a plurality of concurrently executing threads of instructions, data is read from a specific memory location, an operation involving that data is performed to generate a result, and the result is stored to the specific memory location, without requiring separate load and store instructions. In response to the same instruction, another one of the plurality of threads of instructions is precluded from altering data at the specific memory location while the data is read, the operation is performed, and the result is stored.
    Type: Grant
    Filed: September 21, 2006
    Date of Patent: December 1, 2009
    Assignee: NVIDIA Corporation
    Inventors: Ian A. Buck, John R. Nickolls, Michael C. Shebanow, Lars S. Nyland
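
This is the behavior CUDA C++ exposes as its atomic intrinsics. atomicAdd below is a real intrinsic: it reads the location, adds, and stores the sum back as one indivisible operation, with no separate load and store and no intervening writer.

```cpp
// Each thread increments one histogram bin atomically; no other thread can
// alter that location between the read and the store.
__global__ void BuildHistogram(const int* bins, int n, int* histogram) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&histogram[bins[i]], 1);
}
```
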
  • Patent number: 7512773
    Abstract: A halt sequencing protocol permits a context switch to occur in a processing pipeline even before all units of the processing pipeline are idle. The context switch method based on the halt sequencing protocol includes the steps of issuing a halt request signal to the units of a processing pipeline, monitoring the status of each of the units, and freezing the states of all of the units when they are either idle or halted. Then, the states of the units, which pertain to the thread that has been halted, are dumped into memory, and the units are restored with states corresponding to a different thread that is to be executed after the context switch.
    Type: Grant
    Filed: October 18, 2005
    Date of Patent: March 31, 2009
    Assignee: NVIDIA Corporation
    Inventors: Michael C. Shebanow, Robert C. Keller, Richard A. Silkebakken, Benjamin J. Garlick
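
The halt sequence reads naturally as a three-step loop; the status encoding below is assumed.

```cpp
#include <vector>

// Assumed status encoding; the real protocol distinguishes more states.
enum class Status { Idle, Busy, Halted };

struct Unit {
    Status status = Status::Busy;
    void RequestHalt() { /* signal the unit to halt at a safe point */ }
    Status Poll() const { return status; }
};

void HaltForContextSwitch(std::vector<Unit>& pipeline) {
    for (Unit& u : pipeline) u.RequestHalt();        // 1) issue halt request
    bool settled = false;
    while (!settled) {                               // 2) monitor unit status
        settled = true;
        for (const Unit& u : pipeline)
            if (u.Poll() == Status::Busy) settled = false;
    }
    // 3) every unit is idle or halted: freeze state, dump it to the memory of
    //    the outgoing thread, then restore the incoming thread's state.
}
```
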
  • Patent number: 7293162
    Abstract: A scheduling scheme and mechanism for a processor system is disclosed. The scheduling scheme provides a reservation station system that includes a control reservation station and a data reservation station. The reservation station system receives an operational entry and, for each operational entry, identifies scheduling state information, operand state information, and operand information. The reservation station system stores the scheduling state information and operand information as a control reservation station entry in the control reservation station and stores the operand state information and the operand information as a data reservation station entry in the data reservation station. When control reservation station entries are identified as ready, they are scheduled and issued for execution by a functional unit.
    Type: Grant
    Filed: December 18, 2002
    Date of Patent: November 6, 2007
    Assignee: Fujitsu Limited
    Inventors: Michael C. Shebanow, Michael G. Butler
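
A sketch of the split into paired control and data entries; the field widths and the readiness rule are assumptions.

```cpp
#include <cstdint>
#include <vector>

// Scheduling state lives in the control reservation station, operand values
// in the data reservation station, with entries paired by index.
struct ControlEntry { uint8_t opcode; bool operandsReady; bool issued; };
struct DataEntry    { uint64_t src[2]; };

struct ReservationStations {
    std::vector<ControlEntry> control;
    std::vector<DataEntry>    data;  // same index as the control entry

    // Issue the first ready, unissued entry to a functional unit.
    int IssueOne() {
        for (size_t i = 0; i < control.size(); ++i) {
            if (control[i].operandsReady && !control[i].issued) {
                control[i].issued = true;
                // dispatch control[i].opcode with data[i].src to the unit
                return static_cast<int>(i);
            }
        }
        return -1;  // nothing ready this cycle
    }
};
```

The point of the split is that the small, frequently scanned scheduling state need not drag the wide operand values through the wakeup-and-select logic.
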
  • Publication number: 20040123077
    Abstract: A scheduling scheme and mechanism for a processor system is disclosed. The scheduling scheme provides a reservation station system that includes a control reservation station and a data reservation station. The reservation station system receives an operational entry and, for each operational entry, identifies scheduling state information, operand state information, and operand information. The reservation station system stores the scheduling state information and operand information as a control reservation station entry in the control reservation station and stores the operand state information and the operand information as a data reservation station entry in the data reservation station. When control reservation station entries are identified as ready, they are scheduled and issued for execution by a functional unit.
    Type: Application
    Filed: December 18, 2002
    Publication date: June 24, 2004
    Inventors: Michael C. Shebanow, Michael G. Butler
  • Publication number: 20040123298
    Abstract: A system and method for performing dynamic resource allocation. A deallocation block sends batons, representing assigned resources, to an allocation block. The allocation block receives the assigned resources and, if needed, allocates them to an execution machine that performs tasks such as executing instructions. The deallocation block continually sends batons independent of the allocation block's current need for resources. The allocation block returns unused batons, or sends an indication of used batons, back to the deallocation block. The deallocation block is physically decoupled from the allocation block and may be distributed.
    Type: Application
    Filed: June 11, 2003
    Publication date: June 24, 2004
    Inventor: Michael C. Shebanow
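
A sketch of the baton credit flow between the two decoupled blocks; all types are invented.

```cpp
#include <queue>

// The physical link between the blocks: batons in flight in one direction.
struct BatonChannel { std::queue<int> inFlight; };

struct AllocationBlock {
    std::queue<int> held;  // batons (resource grants) on hand

    // Batons arrive continually, regardless of current demand.
    void Receive(BatonChannel& ch) {
        while (!ch.inFlight.empty()) {
            held.push(ch.inFlight.front());
            ch.inFlight.pop();
        }
    }
    // Hand a resource to the execution machine, if any baton is on hand.
    bool Allocate(int& resource) {
        if (held.empty()) return false;
        resource = held.front();
        held.pop();
        return true;
    }
    // Unused batons flow back to the deallocation block.
    void ReturnUnused(BatonChannel& back) {
        while (!held.empty()) {
            back.inFlight.push(held.front());
            held.pop();
        }
    }
};
```
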
  • Patent number: 5966530
    Abstract: A high-performance processor is disclosed with structure and methods for: (1) aggressively scheduling long latency instructions, including load/store instructions, while maintaining precise state; (2) maintaining and restoring state at any instruction boundary; (3) tracking instruction status; (4) checkpointing instructions; (5) creating, maintaining, and using a time-out checkpoint; (6) tracking floating-point exceptions; (7) creating, maintaining, and using a watchpoint for plural, simultaneous, unresolved-branch evaluation; and (8) increasing processor throughput while maintaining precise state. In one embodiment of the invention, a method of restoring machine state in a processor at any instruction boundary is disclosed. For any instruction that may modify control registers, the processor is either synchronized prior to execution or an instruction checkpoint is stored to preserve state; and for any instruction that creates a program counter discontinuity, an instruction checkpoint is stored.
    Type: Grant
    Filed: June 11, 1997
    Date of Patent: October 12, 1999
    Assignee: Fujitsu, Ltd.
    Inventors: Gene W. Shen, John Szeto, Niteen A. Patkar, Michael C. Shebanow
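
The checkpoint discipline from the abstract, sketched with an invented MachineState; the checkpoint store is modeled as a simple stack.

```cpp
#include <cstdint>
#include <stack>

struct MachineState { uint64_t pc; uint64_t controlRegs[4]; /* ... */ };

struct Checkpoints {
    std::stack<MachineState> saved;

    // Taken before any instruction that can modify control registers or
    // break program-counter continuity.
    void TakeCheckpoint(const MachineState& s) { saved.push(s); }

    // Restore precise state at the checkpointed instruction boundary
    // (e.g., on a mispredicted branch or an exception).
    bool Restore(MachineState& s) {
        if (saved.empty()) return false;  // fall back to synchronizing first
        s = saved.top();
        saved.pop();
        return true;
    }
};
```
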
  • Patent number: 5896526
    Abstract: A system and method providing a programmable hardware device within a CPU. The programmable hardware device permits a plurality of instructions to be trapped before they are executed. The instructions to be trapped are programmable, providing flexibility during CPU debugging and ensuring that a variety of application programs can be properly executed by the CPU. The system also provides a means for permitting a trapped instruction to be emulated and/or executed serially.
    Type: Grant
    Filed: February 18, 1998
    Date of Patent: April 20, 1999
    Assignee: Fujitsu, Ltd.
    Inventors: Sunil Savkar, Gene W. Shen, Farnad Sajjadian, Michael C. Shebanow
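
A sketch of the programmable trap check ahead of dispatch; the opcode set and the handlers are illustrative.

```cpp
#include <cstdint>
#include <unordered_set>

struct Cpu {
    std::unordered_set<uint32_t> trapOpcodes;  // programmable at debug time

    // Intercept registered opcodes before execution.
    void Execute(uint32_t opcode) {
        if (trapOpcodes.count(opcode)) {
            Emulate(opcode);  // trapped: emulate or execute serially
            return;
        }
        ExecuteNative(opcode);
    }
    void Emulate(uint32_t) { /* software handler */ }
    void ExecuteNative(uint32_t) { /* normal pipeline */ }
};
```
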