Patents by Inventor Stuart F. Oberman
Stuart F. Oberman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230100785Abstract: In various embodiments, the maximum or minimum of multiple input values is determined. For each of a set of possible values, a corresponding detection result is set to indicate whether at least one of the input values matches the possible value. The detection results are used to ascertain the maximum or minimum of the multiple input values.Type: ApplicationFiled: September 28, 2021Publication date: March 30, 2023Inventors: Ilyas ELKIN, Brent Ralph BOSWELL, Stuart F. OBERMAN, Ming Y. SIU
-
Patent number: 11150721Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.Type: GrantFiled: November 7, 2012Date of Patent: October 19, 2021Assignee: NVIDIA CorporationInventors: David Conrad Tannenbaum, Ming Y. Siu, Stuart F Oberman, Colin Sprinkle, Srinivasan Iyer, Ian Chi Yan Kwong
-
Patent number: 10503513Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well as a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.Type: GrantFiled: October 23, 2013Date of Patent: December 10, 2019Assignee: NVIDIA CORPORATIONInventors: David Conrad Tannenbaum, Srinivasan (Vasu) Iyer, Stuart F. Oberman, Ming Y. Siu, Michael Alan Fetterman, John Matthew Burgess, Shirish Gadre
-
Patent number: 10217184Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.Type: GrantFiled: May 23, 2017Date of Patent: February 26, 2019Assignee: NVIDIA CORPORATIONInventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Patent number: 9829956Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.Type: GrantFiled: November 21, 2012Date of Patent: November 28, 2017Assignee: NVIDIA CorporationInventors: David Conrad Tannenbaum, Colin Sprinkle, Stuart F. Oberman, Ming Y. Siu, Srinivasan Iyer, Ian-Chi Yan Kwong
-
Publication number: 20170256022Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.Type: ApplicationFiled: May 23, 2017Publication date: September 7, 2017Inventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
-
Patent number: 9659339Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.Type: GrantFiled: March 25, 2013Date of Patent: May 23, 2017Assignee: NVIDIA CORPORATIONInventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Publication number: 20160300319Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.Type: ApplicationFiled: March 25, 2013Publication date: October 13, 2016Applicant: NVIDIA CorporationInventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
-
Patent number: 9465575Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portions of a composite register depending in the magnitude of the addend relative to the product. The interpreting logic also forces other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.Type: GrantFiled: August 5, 2013Date of Patent: October 11, 2016Assignee: NVIDIA CorporationInventors: Srinivasan Iyer, David Conrad Tannenbaum, Stuart F. Oberman, Ming (Michael) Y. Siu
-
Patent number: 9189242Abstract: One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction by instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.Type: GrantFiled: September 17, 2010Date of Patent: November 17, 2015Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Brett W. Coon, Jered Wierzbicki, Robert J. Stoll, Stuart F. Oberman
-
Publication number: 20150193203Abstract: A four cycle fused floating point multiply-add unit includes a radix 8 Booth encoder multiplier that is partitioned over two stages with the compression element allocated to the second stage. The unit further includes an improved shifter design. Processing logic analyzes the input operands, detects values of zero and one, and inhibits portions of the processing logic accordingly. When one of the multiplicand inputs has a value of zero or one, the required multiplication becomes trivial, and the unit inhibits the associated coding logic and data transfer to reduce power consumption. The unit then performs an add-only operation. When the addend input has a value of zero, the addition becomes trivial, and the unit inhibits the improved shifter and data transfer to further reduce power consumption. The unit then performs a multiply-only operation.Type: ApplicationFiled: January 7, 2014Publication date: July 9, 2015Applicant: NVIDIA CORPORATIONInventors: Srinivasan (Vasu) IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN
-
Patent number: 9058672Abstract: One embodiment of the present invention sets forth a technique controlling the pixel location at which the plane equation is evaluated. Multiple pixel offsets (dx, dy) may be specified that each define to a sub-pixel sample position. Attributes are then calculated for each sub-pixel sample position that is covered by a geometric primitive. One advantage of the technique is that anti-aliasing quality may be improved since high frequency color components may be selectively supersampled for particular geometric primitives.Type: GrantFiled: October 5, 2010Date of Patent: June 16, 2015Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Henry Packard Moreton, Ming Y. Siu, Stuart F. Oberman
-
Publication number: 20150113254Abstract: A subsystem is configured to support a distributed instruction set architecture with primary and secondary execution pipelines. The primary execution pipeline supports the execution of a subset of instructions in the distributed instruction set architecture that are issued frequently. The secondary execution pipeline supports the execution of another subset of instructions in the distributed instruction set architecture that are issued less frequently. Both execution pipelines also support the execution of FFMA instructions as well a common subset of instructions in the distributed instruction set architecture. When dispatching a requested instruction, an instruction scheduling unit is configured to select between the two execution pipelines based on various criteria. Those criteria may include power efficiency with which the instruction can be executed and availability of execution units to support execution of the instruction.Type: ApplicationFiled: October 23, 2013Publication date: April 23, 2015Applicant: NVIDIA CORPORATIONInventors: David Conrad TANNENBAUM, Srinivasan (Vasu) IYER, Stuart F. OBERMAN, Ming Y. SIU, Michael Alan FETTERMAN, John Matthew BURGESS, Shirish GADRE
-
Publication number: 20150039662Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portions of a composite register depending in the magnitude of the addend relative to the product. The interpreting logic also forces other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.Type: ApplicationFiled: August 5, 2013Publication date: February 5, 2015Applicant: NVIDIA CORPORATIONInventors: Srinivasan IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN, Ming (Michael) Y. SIU
-
Patent number: 8860741Abstract: In contrast to a conventional computing system in which the graphics processor (graphics processing unit or GPU) is treated as a slave to one or several CPUs, systems and methods are provided that allow the GPU to be treated as a central processing unit (CPU) from the perspective of the operating system. The GPU can access a memory space shared by other CPUs in the computing system. Caches utilized by the GPU may be coherent with caches utilized by other CPUs in the computing system. The GPU may share execution of general-purpose computations with other CPUs in the computing system.Type: GrantFiled: December 8, 2006Date of Patent: October 14, 2014Assignee: NVIDIA CorporationInventors: Norbert Juffa, Stuart F. Oberman
-
Patent number: 8860737Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.Type: GrantFiled: July 19, 2006Date of Patent: October 14, 2014Assignee: NVIDIA CorporationInventors: John Erik Lindholm, Brett W. Coon, Stuart F. Oberman, Ming Y. Siu, Matthew P. Gerlach
-
Publication number: 20140285500Abstract: A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.Type: ApplicationFiled: March 25, 2013Publication date: September 25, 2014Applicant: NVIDIA CorporationInventors: John Erik LINDHOLM, Brett W. COON, Stuart F. OBERMAN, Ming Y. SIU, Matthew P. GERLACH
-
Publication number: 20140143564Abstract: An approach is provided for enabling power reduction in floating-point operations. In one example, a system receives floating-point numbers of a fused multiply-add instruction. The system determines the fused multiply-add instruction does not require compliance with a standard of precision for floating-point numbers. The system generates gating signals for an integrated circuit that is configured to perform operations of the fused multiply-add instruction. The system then sends the gating signals to the integrated circuit to turn off a plurality of logic gates included in the integrated circuit.Type: ApplicationFiled: November 21, 2012Publication date: May 22, 2014Applicant: NVIDIA CorporationInventors: David Conrad TANNENBAUM, Colin SPRINKLE, Stuart F. OBERMAN, Ming Y. SIU, Srinivasan IYER, Ian-Chi Yan KWONG
-
Publication number: 20140129807Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.Type: ApplicationFiled: November 7, 2012Publication date: May 8, 2014Applicant: NVIDIA CORPORATIONInventors: David Conrad TANNENBAUM, Ming Y. SIU, Stuart F. OBERMAN, Colin SPRINKLE, Srinivasan IYER, Ian Chi Yan KWONG
-
Patent number: 8645638Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.Type: GrantFiled: May 7, 2012Date of Patent: February 4, 2014Assignee: NVIDIA CorporationInventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills