Patents by Inventor Stuart F. Oberman

Stuart F. Oberman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8578387
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: November 5, 2013
    Assignee: Nvidia Corporation
    Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
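The workload-balancing idea in patent 8578387 above can be sketched in a few lines. The engine names, instruction classes, and the queue-depth balancing rule below are illustrative assumptions, not the patented implementation:

```python
# Hypothetical sketch of dynamic workload balancing between two
# heterogeneous processing engines (names and policy are illustrative).
from collections import deque

# Instruction classes: 'first' runs only on engine 0, 'third' only on
# engine 1, 'second' may run on either engine.
ONLY_ENGINE_0 = "first"
ONLY_ENGINE_1 = "third"
EITHER_ENGINE = "second"

def assign(instructions):
    """Assign each instruction to an engine queue, steering the
    flexible ('second') type to the less loaded engine."""
    queues = [deque(), deque()]
    for instr in instructions:
        kind = instr["type"]
        if kind == ONLY_ENGINE_0:
            queues[0].append(instr)
        elif kind == ONLY_ENGINE_1:
            queues[1].append(instr)
        else:  # EITHER_ENGINE: balance by current queue depth
            target = 0 if len(queues[0]) <= len(queues[1]) else 1
            queues[target].append(instr)
    return queues

program = [{"op": f"i{i}", "type": t}
           for i, t in enumerate([ONLY_ENGINE_0, EITHER_ENGINE, ONLY_ENGINE_1,
                                  EITHER_ENGINE, EITHER_ENGINE])]
q0, q1 = assign(program)
print([i["op"] for i in q0], [i["op"] for i in q1])
```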
  • Patent number: 8533435
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: September 10, 2013
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
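A minimal sketch of the operand-collection scheduling described in patent 8533435 above, assuming a simple register-to-bank mapping and FIFO ports; the mapping and port structure are assumptions for illustration only:

```python
# Illustrative sketch: schedule operand reads from a banked register
# file so that no two reads in one cycle touch the same bank.
NUM_BANKS = 4

def bank_of(reg):
    # Assumed mapping: register index modulo bank count.
    return reg % NUM_BANKS

def schedule_reads(ports):
    """ports: list of per-port FIFO lists of register indices.
    Returns a list of per-cycle read requests, each conflict-free."""
    cycles = []
    while any(ports):
        used_banks = set()
        request = []
        for port in ports:
            if port and bank_of(port[0]) not in used_banks:
                reg = port.pop(0)
                used_banks.add(bank_of(reg))
                request.append(reg)
        cycles.append(request)
    return cycles

# Operands of a few instructions, each operand assigned to a different port.
ports = [[0, 4], [4, 5], [8]]
print(schedule_reads(ports))
```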
  • Patent number: 8405665
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 26, 2013
    Assignee: Nvidia Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20120218267
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20120221808
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Applicant: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
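The serialization step described in publication 20120221808 above can be sketched as repeated passes over the pending requests; the request format and the address-selection rule below are assumptions, not the patented logic:

```python
# Hypothetical sketch of the serialization step: per pass, pick one
# target address, satisfy every request to that address in parallel,
# and defer the rest for a later pass.

def serialize(requests):
    """requests: list of (thread_id, address). Returns a list of
    passes; each pass services exactly one distinct address."""
    passes = []
    pending = list(requests)
    while pending:
        selected_addr = pending[0][1]          # pick one target address
        this_pass = [r for r in pending if r[1] == selected_addr]
        pending   = [r for r in pending if r[1] != selected_addr]
        passes.append((selected_addr, [t for t, _ in this_pass]))
    return passes

reqs = [(0, 0x10), (1, 0x20), (2, 0x10), (3, 0x30)]
for addr, threads in serialize(reqs):
    print(f"address {addr:#x} served for threads {threads}")
```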
  • Patent number: 8225076
    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: July 17, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
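A rough sketch of the scoreboard idea in patent 8225076 above, assuming a per-thread set of registers with pending writes and a bit mask computed per buffered instruction; the class and method names are hypothetical:

```python
# Illustrative scoreboard sketch: track registers with pending writes
# per thread and derive a dependency mask for a buffered instruction;
# issue is allowed once the mask is zero.

class Scoreboard:
    def __init__(self):
        self.pending = {}                 # thread_id -> set of register ids

    def mark_write(self, thread, reg):
        self.pending.setdefault(thread, set()).add(reg)

    def write_done(self, thread, reg):
        self.pending.get(thread, set()).discard(reg)

    def dependency_mask(self, thread, regs):
        """Bit i is set iff regs[i] still has a pending write."""
        busy = self.pending.get(thread, set())
        mask = 0
        for i, r in enumerate(regs):
            if r in busy:
                mask |= 1 << i
        return mask

sb = Scoreboard()
sb.mark_write(thread=0, reg=5)
print(bin(sb.dependency_mask(thread=0, regs=[5, 7])))  # 0b1 -> wait on r5
sb.write_done(thread=0, reg=5)
print(bin(sb.dependency_mask(thread=0, regs=[5, 7])))  # 0b0 -> may issue
```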
  • Patent number: 8190669
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh = x − xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: May 29, 2012
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
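Both operations in patent 8190669 above reduce to a handful of multiplies and adds, which is why one datapath can serve both. The sketch below illustrates the two formulas; the segment count and the Taylor-series coefficient table are toy assumptions, not the patented table contents:

```python
# Sketch of the shared-datapath idea: planar interpolation and a
# segmented quadratic approximation, both built from multiplies and adds.
import math

def planar_interpolate(A, B, C, x, y):
    # Attribute value at pixel (x, y) for the plane A*x + B*y + C.
    return A * x + B * y + C

def approx_exp(x, segments=16, lo=0.0, hi=1.0):
    # Quadratic approximation F2(xb)*xh^2 + F1(xb)*xh + F0(xb),
    # with xh = x - xb and xb the base of the segment containing x.
    step = (hi - lo) / segments
    xb = lo + math.floor((x - lo) / step) * step
    xh = x - xb
    # Toy per-segment coefficients from the Taylor expansion of exp at xb.
    F0, F1, F2 = math.exp(xb), math.exp(xb), math.exp(xb) / 2.0
    return F2 * xh * xh + F1 * xh + F0

print(planar_interpolate(0.5, -0.25, 2.0, x=3, y=4))
print(approx_exp(0.73), math.exp(0.73))   # approximation vs. reference
```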
  • Patent number: 8176265
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: May 8, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8174531
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: December 29, 2009
    Date of Patent: May 8, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Patent number: 8108625
    Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: January 31, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
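A sketch of the address-conflict handling described in patent 8108625 above, assuming a word-interleaved bank mapping and a broadcast path for same-address reads; both assumptions are illustrative:

```python
# Hypothetical sketch of address-conflict handling for a banked shared
# memory: requests to distinct banks issue in one cycle, same-address
# reads are broadcast, and remaining conflicts are serialized.
NUM_BANKS = 16
WORD_BYTES = 4

def bank_of(addr):
    return (addr // WORD_BYTES) % NUM_BANKS

def issue_cycles(addresses):
    """addresses: one address per processing engine. Returns per-cycle
    lists of (engine, address) that can be serviced together."""
    pending = list(enumerate(addresses))
    cycles = []
    while pending:
        chosen = {}                       # bank -> address served this cycle
        this_cycle, deferred = [], []
        for engine, addr in pending:
            b = bank_of(addr)
            if b not in chosen or chosen[b] == addr:   # free bank or broadcast
                chosen[b] = addr
                this_cycle.append((engine, addr))
            else:
                deferred.append((engine, addr))
        cycles.append(this_cycle)
        pending = deferred
    return cycles

print(issue_cycles([0, 4, 64, 0]))   # engines 0 and 3 share a broadcast
```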
  • Publication number: 20110252204
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Application
    Filed: June 21, 2011
    Publication date: October 13, 2011
    Applicant: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8037119
    Abstract: A multipurpose arithmetic functional unit selectably performs planar attribute interpolation, unary function approximation, and double-precision arithmetic. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh = x − xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for unary function approximation and planar interpolation; the same multipliers and adders are also leveraged to implement double-precision multiplication and addition.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: October 11, 2011
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
  • Publication number: 20110081100
    Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified, each defining a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved, since high-frequency color components may be selectively supersampled for particular geometric primitives.
    Type: Application
    Filed: October 5, 2010
    Publication date: April 7, 2011
    Inventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
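The per-sample evaluation in publication 20110081100 above amounts to re-evaluating the plane equation at each offset; the sample offsets and coverage mask below are made-up values for illustration:

```python
# Sketch of evaluating a plane equation at programmable sub-pixel
# offsets; only samples covered by the primitive are evaluated.

def sample_attributes(A, B, C, x, y, offsets, covered):
    """Evaluate A*(x+dx) + B*(y+dy) + C at each covered sample."""
    return [A * (x + dx) + B * (y + dy) + C
            for (dx, dy), hit in zip(offsets, covered) if hit]

# Four rotated-grid sample positions within pixel (10, 20); two covered.
offsets = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]
covered = [True, True, False, False]
print(sample_attributes(A=0.5, B=-0.25, C=3.0, x=10, y=20,
                        offsets=offsets, covered=covered))
```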
  • Publication number: 20110072244
    Abstract: One embodiment of the present invention sets forth a technique for ensuring that cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 24, 2011
    Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
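A toy sketch of the credit-based selection in publication 20110072244 above, assuming credit is simply the gap between a warp's issued-instruction count and the group average; the actual credit and weight computations are not specified here:

```python
# Illustrative credit-based pick: warps that have issued fewer
# instructions than the group average earn more credit and are
# preferred, keeping the group's progress roughly uniform.

def pick_warp(issued_counts):
    """issued_counts: instructions issued so far per warp.
    Returns the index of the warp to issue from next."""
    avg = sum(issued_counts) / len(issued_counts)
    credits = [avg - c for c in issued_counts]   # behind-average warps get credit
    weights = credits                            # weight = credit in this toy model
    return max(range(len(weights)), key=lambda w: weights[w])

issued = [12, 9, 15, 10]
for _ in range(4):
    w = pick_warp(issued)
    issued[w] += 1
    print("issue from warp", w, "->", issued)
```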
  • Publication number: 20110072243
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 24, 2011
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Patent number: 7834881
    Abstract: An apparatus and method for simulating a multi-ported memory using lower port count memories as banks. Collector units gather source operands from the banks as needed to process program instructions. The collector units also gather constants that are used as operands. When all of the source operands needed to process a program instruction have been gathered, a collector unit outputs the source operands to an execution unit while avoiding writeback conflicts to registers specified by the program instruction that may be accessed by other execution units.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Samuel Liu, John Erik Lindholm, Ming Y. Siu, Brett W. Coon, Stuart F. Oberman
  • Patent number: 7809852
    Abstract: A system and method for converting low-jitter, interleaved frame traffic, such as that generated in an IP network, to high jitter traffic to improve the utilization of bandwidth on arbitrated loops such as Fibre Channel Arbitrated Loops. Embodiments of a high jitter scheduling algorithm may be used in devices such as network switches that interface an arbitrated loop with an IP network that carries low-jitter traffic. The high jitter algorithm may use a separate queue for each device on the arbitrated loop, or alternatively may use one queue for two or more devices. Incoming frames are distributed among the queues based upon each frame's destination device. The scheduling algorithm may then service the queues and forward queued frames to the devices from the queues. In one embodiment, the queues are serviced in a round-robin fashion. In one embodiment, each queue may be serviced for a programmed limit.
    Type: Grant
    Filed: May 22, 2002
    Date of Patent: October 5, 2010
    Assignee: Brocade Communications Systems, Inc.
    Inventors: Rodney N. Mullendore, Stuart F. Oberman, Anil Mehta, Keith Schakel, Kamran Malik
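The per-device queueing and round-robin service with a programmed limit, as described in patent 7809852 above, can be sketched as follows; the queue structure, device names, and limit value are illustrative assumptions:

```python
# Sketch of the high-jitter scheduler: frames are queued per target
# device, then the queues are drained round-robin, up to a programmed
# per-queue limit, before moving on to the next device.
from collections import deque

def schedule(frames, devices, per_queue_limit=2):
    """frames: list of (frame_id, destination_device).
    Yields frames in the order they are forwarded onto the loop."""
    queues = {d: deque() for d in devices}
    for frame in frames:
        queues[frame[1]].append(frame)
    while any(queues.values()):
        for dev in devices:                      # round-robin over devices
            for _ in range(per_queue_limit):     # programmed service limit
                if queues[dev]:
                    yield queues[dev].popleft()

frames = [(1, "A"), (2, "B"), (3, "A"), (4, "A"), (5, "C"), (6, "B")]
print(list(schedule(frames, devices=["A", "B", "C"])))
```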
  • Patent number: 7680988
    Abstract: A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers data in only one direction (e.g., from the shared memory to the processing engines); the same interconnect supports both read and write operations. The interconnect advantageously supports multiple parallel read or write operations.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: March 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Ming Y. Siu, Stuart F. Oberman, Samuel Liu
  • Patent number: 7659893
    Abstract: At least two different processing sections in a graphics processor compute Z coordinates for a sample location from a compressed Z representation. The processor is designed to ensure that Z coordinates computed in any unit in the processor are identical. In one embodiment, the respective arithmetic circuits included in each processing section that computes Z coordinates are "bit-identical," meaning that, for any input planar Z representation and coordinates, the output Z coordinates produced by the circuits are identical to each other.
    Type: Grant
    Filed: October 2, 2006
    Date of Patent: February 9, 2010
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Steven E. Molnar, Alexander L. Minkin, Peter B. Holmqvist
  • Patent number: 7640285
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh = x − xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: December 29, 2009
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu