Patents by Inventor Stuart F. Oberman

Stuart F. Oberman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8578387
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: November 5, 2013
    Assignee: Nvidia Corporation
    Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
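The workload-balancing idea in patent 8578387 above can be sketched in a few lines. The engine names, instruction classes, and the queue-depth balancing rule below are illustrative assumptions, not the patented implementation:

```python
# Hypothetical sketch of dynamic workload balancing between two
# heterogeneous processing engines (names and policy are illustrative).
from collections import deque

# Instruction classes: 'first' runs only on engine 0, 'third' only on
# engine 1, 'second' may run on either engine.
ONLY_ENGINE_0 = "first"
ONLY_ENGINE_1 = "third"
EITHER_ENGINE = "second"

def assign(instructions):
    """Assign each instruction to an engine queue, steering the
    flexible ('second') type to the less loaded engine."""
    queues = [deque(), deque()]
    for instr in instructions:
        kind = instr["type"]
        if kind == ONLY_ENGINE_0:
            queues[0].append(instr)
        elif kind == ONLY_ENGINE_1:
            queues[1].append(instr)
        else:  # EITHER_ENGINE: balance by current queue depth
            target = 0 if len(queues[0]) <= len(queues[1]) else 1
            queues[target].append(instr)
    return queues

program = [{"op": f"i{i}", "type": t}
           for i, t in enumerate([ONLY_ENGINE_0, EITHER_ENGINE, ONLY_ENGINE_1,
                                  EITHER_ENGINE, EITHER_ENGINE])]
q0, q1 = assign(program)
print([i["op"] for i in q0], [i["op"] for i in q1])
```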
  • Patent number: 8533435
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Grant
    Filed: September 3, 2010
    Date of Patent: September 10, 2013
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
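A minimal sketch of the operand-collection scheduling described in patent 8533435 above, assuming a simple register-to-bank mapping and FIFO ports; the mapping and port structure are assumptions for illustration only:

```python
# Illustrative sketch: schedule operand reads from a banked register
# file so that no two reads in one cycle touch the same bank.
NUM_BANKS = 4

def bank_of(reg):
    # Assumed mapping: register index modulo bank count.
    return reg % NUM_BANKS

def schedule_reads(ports):
    """ports: list of per-port FIFO lists of register indices.
    Returns a list of per-cycle read requests, each conflict-free."""
    cycles = []
    while any(ports):
        used_banks = set()
        request = []
        for port in ports:
            if port and bank_of(port[0]) not in used_banks:
                reg = port.pop(0)
                used_banks.add(bank_of(reg))
                request.append(reg)
        cycles.append(request)
    return cycles

# Operands of a few instructions, each operand assigned to a different port.
ports = [[0, 4], [4, 5], [8]]
print(schedule_reads(ports))
```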
  • Patent number: 8405665
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: March 26, 2013
    Assignee: Nvidia Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20120218267
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Publication number: 20120221808
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Applicant: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
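The serialization step described in publication 20120221808 above can be sketched as repeated passes over the pending requests; the request format and the address-selection rule below are assumptions, not the patented logic:

```python
# Hypothetical sketch of the serialization step: per pass, pick one
# target address, satisfy every request to that address in parallel,
# and defer the rest for a later pass.

def serialize(requests):
    """requests: list of (thread_id, address). Returns a list of
    passes; each pass services exactly one distinct address."""
    passes = []
    pending = list(requests)
    while pending:
        selected_addr = pending[0][1]          # pick one target address
        this_pass = [r for r in pending if r[1] == selected_addr]
        pending   = [r for r in pending if r[1] != selected_addr]
        passes.append((selected_addr, [t for t, _ in this_pass]))
    return passes

reqs = [(0, 0x10), (1, 0x20), (2, 0x10), (3, 0x30)]
for addr, threads in serialize(reqs):
    print(f"address {addr:#x} served for threads {threads}")
```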
  • Patent number: 8225076
    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: July 17, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
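A rough sketch of the scoreboard idea in patent 8225076 above, assuming a per-thread set of registers with pending writes and a bit mask computed per buffered instruction; the class and method names are hypothetical:

```python
# Illustrative scoreboard sketch: track registers with pending writes
# per thread and derive a dependency mask for a buffered instruction;
# issue is allowed once the mask is zero.

class Scoreboard:
    def __init__(self):
        self.pending = {}                 # thread_id -> set of register ids

    def mark_write(self, thread, reg):
        self.pending.setdefault(thread, set()).add(reg)

    def write_done(self, thread, reg):
        self.pending.get(thread, set()).discard(reg)

    def dependency_mask(self, thread, regs):
        """Bit i is set iff regs[i] still has a pending write."""
        busy = self.pending.get(thread, set())
        mask = 0
        for i, r in enumerate(regs):
            if r in busy:
                mask |= 1 << i
        return mask

sb = Scoreboard()
sb.mark_write(thread=0, reg=5)
print(bin(sb.dependency_mask(thread=0, regs=[5, 7])))  # 0b1 -> wait on r5
sb.write_done(thread=0, reg=5)
print(bin(sb.dependency_mask(thread=0, regs=[5, 7])))  # 0b0 -> may issue
```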
  • Patent number: 8190669
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh = x − xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: May 29, 2012
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
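Both operations in patent 8190669 above reduce to a handful of multiplies and adds, which is why one datapath can serve both. The sketch below illustrates the two formulas; the segment count and the Taylor-series coefficient table are toy assumptions, not the patented table contents:

```python
# Sketch of the shared-datapath idea: planar interpolation and a
# segmented quadratic approximation, both built from multiplies and adds.
import math

def planar_interpolate(A, B, C, x, y):
    # Attribute value at pixel (x, y) for the plane A*x + B*y + C.
    return A * x + B * y + C

def approx_exp(x, segments=16, lo=0.0, hi=1.0):
    # Quadratic approximation F2(xb)*xh^2 + F1(xb)*xh + F0(xb),
    # with xh = x - xb and xb the base of the segment containing x.
    step = (hi - lo) / segments
    xb = lo + math.floor((x - lo) / step) * step
    xh = x - xb
    # Toy per-segment coefficients from the Taylor expansion of exp at xb.
    F0, F1, F2 = math.exp(xb), math.exp(xb), math.exp(xb) / 2.0
    return F2 * xh * xh + F1 * xh + F0

print(planar_interpolate(0.5, -0.25, 2.0, x=3, y=4))
print(approx_exp(0.73), math.exp(0.73))   # approximation vs. reference
```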
  • Patent number: 8176265
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Grant
    Filed: June 21, 2011
    Date of Patent: May 8, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8174531
    Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.
    Type: Grant
    Filed: December 29, 2009
    Date of Patent: May 8, 2012
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
  • Patent number: 8108625
    Abstract: Concurrent threads in a multithreaded processor share access to a memory, with any location in the shared memory being accessible by any thread. In one embodiment, the shared memory has multiple independently-addressable memory banks, and one location per bank can be accessed in parallel. Parallel processing engines executing the threads generate a group of parallel memory access requests. Address conflict logic determines whether the requests can be satisfied in parallel (e.g., based on bank access constraints) and serializes the requests to the extent needed to avoid conflicts. In some embodiments, data read from one address in the shared memory can be broadcast to multiple processing engines.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: January 31, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
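A sketch of the address-conflict handling described in patent 8108625 above, assuming a word-interleaved bank mapping and a broadcast path for same-address reads; both assumptions are illustrative:

```python
# Hypothetical sketch of address-conflict handling for a banked shared
# memory: requests to distinct banks issue in one cycle, same-address
# reads are broadcast, and remaining conflicts are serialized.
NUM_BANKS = 16
WORD_BYTES = 4

def bank_of(addr):
    return (addr // WORD_BYTES) % NUM_BANKS

def issue_cycles(addresses):
    """addresses: one address per processing engine. Returns per-cycle
    lists of (engine, address) that can be serviced together."""
    pending = list(enumerate(addresses))
    cycles = []
    while pending:
        chosen = {}                       # bank -> address served this cycle
        this_cycle, deferred = [], []
        for engine, addr in pending:
            b = bank_of(addr)
            if b not in chosen or chosen[b] == addr:   # free bank or broadcast
                chosen[b] = addr
                this_cycle.append((engine, addr))
            else:
                deferred.append((engine, addr))
        cycles.append(this_cycle)
        pending = deferred
    return cycles

print(issue_cycles([0, 4, 64, 0]))   # engines 0 and 3 share a broadcast
```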
  • Publication number: 20110252204
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Application
    Filed: June 21, 2011
    Publication date: October 13, 2011
    Applicant: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8037119
    Abstract: A multipurpose arithmetic functional unit selectably performs planar attribute interpolation, unary function approximation, and double-precision arithmetic. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh = x − xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for unary function approximation and planar interpolation; the same multipliers and adders are also leveraged to implement double-precision multiplication and addition.
    Type: Grant
    Filed: February 21, 2006
    Date of Patent: October 11, 2011
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu
  • Publication number: 20110081100
    Abstract: One embodiment of the present invention sets forth a technique for controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified, each defining a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved, since high-frequency color components may be selectively supersampled for particular geometric primitives.
    Type: Application
    Filed: October 5, 2010
    Publication date: April 7, 2011
    Inventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
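The per-sample evaluation in publication 20110081100 above amounts to re-evaluating the plane equation at each offset; the sample offsets and coverage mask below are made-up values for illustration:

```python
# Sketch of evaluating a plane equation at programmable sub-pixel
# offsets; only samples covered by the primitive are evaluated.

def sample_attributes(A, B, C, x, y, offsets, covered):
    """Evaluate A*(x+dx) + B*(y+dy) + C at each covered sample."""
    return [A * (x + dx) + B * (y + dy) + C
            for (dx, dy), hit in zip(offsets, covered) if hit]

# Four rotated-grid sample positions within pixel (10, 20); two covered.
offsets = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]
covered = [True, True, False, False]
print(sample_attributes(A=0.5, B=-0.25, C=3.0, x=10, y=20,
                        offsets=offsets, covered=covered))
```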
  • Publication number: 20110072244
    Abstract: One embodiment of the present invention sets forth a technique for ensuring that cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 24, 2011
    Inventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
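A toy sketch of the credit-based selection in publication 20110072244 above, assuming credit is simply the gap between a warp's issued-instruction count and the group average; the actual credit and weight computations are not specified here:

```python
# Illustrative credit-based pick: warps that have issued fewer
# instructions than the group average earn more credit and are
# preferred, keeping the group's progress roughly uniform.

def pick_warp(issued_counts):
    """issued_counts: instructions issued so far per warp.
    Returns the index of the warp to issue from next."""
    avg = sum(issued_counts) / len(issued_counts)
    credits = [avg - c for c in issued_counts]   # behind-average warps get credit
    weights = credits                            # weight = credit in this toy model
    return max(range(len(weights)), key=lambda w: weights[w])

issued = [12, 9, 15, 10]
for _ in range(4):
    w = pick_warp(issued)
    issued[w] += 1
    print("issue from warp", w, "->", issued)
```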
  • Publication number: 20110072243
    Abstract: One embodiment of the present invention sets forth a technique for collecting operands specified by an instruction. As a sequence of instructions is received the operands specified by the instructions are assigned to ports, so that each one of the operands specified by a single instruction is assigned to a different port. Reading of the operands from a multi-bank register file is scheduled by selecting an operand from each one of the different ports to produce an operand read request and ensuring that two or more of the selected operands are not stored in the same bank of the multi-bank register file. The operands specified by the operand read request are read from the multi-bank register file in a single clock cycle. Each instruction is then executed as the operands specified by the instruction are read from the multi-bank register file and collected over one or more clock cycles.
    Type: Application
    Filed: September 3, 2010
    Publication date: March 24, 2011
    Inventors: Xiaogang Qiu, Ming Y. Siu, Yan Yan Tang, John Erik Lindholm, Michael C. Shebanow, Stuart F. Oberman
  • Patent number: 7834881
    Abstract: An apparatus and method for simulating a multi-ported memory using lower port count memories as banks. Collector units gather source operands from the banks as needed to process program instructions. The collector units also gather constants that are used as operands. When all of the source operands needed to process a program instruction have been gathered, a collector unit outputs the source operands to an execution unit while avoiding writeback conflicts to registers specified by the program instruction that may be accessed by other execution units.
    Type: Grant
    Filed: November 1, 2006
    Date of Patent: November 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: Samuel Liu, John Erik Lindholm, Ming Y. Siu, Brett W. Coon, Stuart F. Oberman
  • Patent number: 7809852
    Abstract: A system and method for converting low-jitter, interleaved frame traffic, such as that generated in an IP network, to high jitter traffic to improve the utilization of bandwidth on arbitrated loops such as Fibre Channel Arbitrated Loops. Embodiments of a high jitter scheduling algorithm may be used in devices such as network switches that interface an arbitrated loop with an IP network that carries low-jitter traffic. The high jitter algorithm may use a separate queue for each device on the arbitrated loop, or alternatively may use one queue for two or more devices. Incoming frames are distributed among the queues based upon each frame's destination device. The scheduling algorithm may then service the queues and forward queued frames to the devices from the queues. In one embodiment, the queues are serviced in a round-robin fashion. In one embodiment, each queue may be serviced for a programmed limit.
    Type: Grant
    Filed: May 22, 2002
    Date of Patent: October 5, 2010
    Assignee: Brocade Communications Systems, Inc.
    Inventors: Rodney N. Mullendore, Stuart F. Oberman, Anil Mehta, Keith Schakel, Kamran Malik
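The per-device queueing and round-robin service with a programmed limit, as described in patent 7809852 above, can be sketched as follows; the queue structure, device names, and limit value are illustrative assumptions:

```python
# Sketch of the high-jitter scheduler: frames are queued per target
# device, then the queues are drained round-robin, up to a programmed
# per-queue limit, before moving on to the next device.
from collections import deque

def schedule(frames, devices, per_queue_limit=2):
    """frames: list of (frame_id, destination_device).
    Yields frames in the order they are forwarded onto the loop."""
    queues = {d: deque() for d in devices}
    for frame in frames:
        queues[frame[1]].append(frame)
    while any(queues.values()):
        for dev in devices:                      # round-robin over devices
            for _ in range(per_queue_limit):     # programmed service limit
                if queues[dev]:
                    yield queues[dev].popleft()

frames = [(1, "A"), (2, "B"), (3, "A"), (4, "A"), (5, "C"), (6, "B")]
print(list(schedule(frames, devices=["A", "B", "C"])))
```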
  • Patent number: 7680988
    Abstract: A shared memory is usable by concurrent threads in a multithreaded processor, with any addressable storage location in the shared memory being readable and writeable by any of the threads. Processing engines that execute the threads are coupled to the shared memory via an interconnect that transfers data in only one direction (e.g., from the shared memory to the processing engines); the same interconnect supports both read and write operations. The interconnect advantageously supports multiple parallel read or write operations.
    Type: Grant
    Filed: October 30, 2006
    Date of Patent: March 16, 2010
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Brett W. Coon, Ming Y. Siu, Stuart F. Oberman, Samuel Liu
  • Patent number: 7659893
    Abstract: At least two different processing sections in a graphics processor compute Z coordinates for a sample location from a compressed Z representation. The processor is designed to ensure that Z coordinates computed in any unit in the processor are identical. In one embodiment, the respective arithmetic circuits included in each processing section that computes Z coordinates are "bit-identical," meaning that, for any input planar Z representation and coordinates, the output Z coordinates produced by the circuits are identical to each other.
    Type: Grant
    Filed: October 2, 2006
    Date of Patent: February 9, 2010
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Steven E. Molnar, Alexander L. Minkin, Peter B. Holmqvist
  • Patent number: 7640285
    Abstract: Multipurpose arithmetic functional units can perform planar attribute interpolation and unary function approximation operations. In one embodiment, planar interpolation operations for coordinates (x, y) are executed by computing A*x+B*y+C, and unary function approximation operations for operand x are executed by computing F2(xb)*xh^2+F1(xb)*xh+F0(xb), where xh = x − xb. Shared multiplier and adder circuits are advantageously used to implement the product and sum operations for both classes of operations.
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: December 29, 2009
    Assignee: NVIDIA Corporation
    Inventors: Stuart F. Oberman, Ming Y. Siu