Patents by Inventor David C. Tannenbaum

David C. Tannenbaum has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220036631
    Abstract: A GPU includes shader cores and a shader warp packer unit. The shader warp packer unit may receive a first primitive associated with a first partially covered quad, and a second primitive associated with a second partially covered quad. The shader warp packer unit may determine that the first partially covered quad and the second partially covered quad have non-overlapping coverage. The shader warp packer unit may pack the first partially covered quad and the second partially covered quad into a packed quad. The shader warp packer unit may send the packed quad to the shader cores. The first partially covered quad and the second partially covered quad may be spatially disjoint from each other. The shader cores may receive and process the packed quad with no loss of information relative to the shader cores individually processing the first partially covered quad and the second partially covered quad.
    Type: Application
    Filed: February 4, 2021
    Publication date: February 3, 2022
    Inventors: Keshavan VARADARAJAN, David C. TANNENBAUM, FNU GURUPAD
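
The entry above describes packing two partially covered, spatially disjoint quads into one packed quad before shading. The sketch below is a hypothetical software model of that idea, assuming 2x2 quads with 4-bit coverage masks; the names and data structures are illustrative and not taken from the patent.

```python
# Hypothetical sketch: two partially covered 2x2 quads whose 4-bit coverage
# masks do not overlap are merged into a single "packed" quad before being
# issued to the shader cores. Per-pixel primitive ids are kept so no
# information is lost relative to shading the quads separately.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Quad:
    primitive_id: int
    coverage: int  # 4-bit mask, one bit per pixel of the 2x2 quad


@dataclass
class PackedQuad:
    pixel_primitive: list  # length 4, primitive id or None per pixel


def try_pack(a: Quad, b: Quad) -> Optional[PackedQuad]:
    """Pack two partially covered quads if their coverage is disjoint."""
    if a.coverage & b.coverage:
        return None  # overlapping coverage: cannot pack
    pixels = []
    for bit in range(4):
        if a.coverage >> bit & 1:
            pixels.append(a.primitive_id)
        elif b.coverage >> bit & 1:
            pixels.append(b.primitive_id)
        else:
            pixels.append(None)
    return PackedQuad(pixel_primitive=pixels)


if __name__ == "__main__":
    q0 = Quad(primitive_id=7, coverage=0b0011)   # bottom two pixels covered
    q1 = Quad(primitive_id=9, coverage=0b1100)   # top two pixels covered
    print(try_pack(q0, q1))                      # packs: coverage is disjoint
    print(try_pack(q0, Quad(8, 0b0001)))         # None: coverage overlaps
```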
  • Publication number: 20210358191
    Abstract: A GPU is disclosed, which may include a variable rate shading (VRS) interface to provide spatial information and/or primitive-specific information. The GPU may include one or more shader cores including a control logic section to determine a shading precision value based on the spatial information and/or the primitive-specific information. The control logic section may modulate a shading precision according to the shading precision value. A method for controlling shading precision by a GPU may include providing, by a VRS interface, the spatial information and/or primitive-specific information. The method may include determining, by a control logic section, a shading precision value based on the spatial information and/or the primitive-specific information. The method may include modulating a shading precision according to the shading precision value.
    Type: Application
    Filed: November 20, 2020
    Publication date: November 18, 2021
    Inventors: Christopher P. FRASCATI, Raun M. KRISCH, Derek J. LENTZ, David C. TANNENBAUM
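
As a rough illustration of the control-logic decision described in the entry above, the sketch below models choosing a shading precision value from spatial information (a per-tile rate) and primitive-specific information (a per-draw hint). The inputs, names, and the particular combining rule are assumptions made for the sketch, not the patent's definition.

```python
# Illustrative only: combine a spatial shading-rate request with a
# primitive-specific hint into one shading precision value.

def shading_precision(spatial_rate_log2: int, primitive_hint_log2: int) -> int:
    """Return a coarseness value in log2 pixels-per-shading-sample.

    0 -> full per-pixel shading, 1 -> one sample per two pixels, etc.
    A conservative combiner takes the finer (smaller) of the two requests,
    clamped to an assumed supported range.
    """
    combined = min(spatial_rate_log2, primitive_hint_log2)
    return max(0, min(combined, 2))  # clamp to 1x1 .. 4x4 rates


if __name__ == "__main__":
    # The tile allows coarse shading (4x4) but the primitive asks for 2x2:
    print(shading_precision(spatial_rate_log2=2, primitive_hint_log2=1))  # -> 1
```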
  • Patent number: 11010954
    Abstract: A computer-implemented redundant-coverage discard method and apparatus for reducing pixel shader work in a tile-based graphics rendering pipeline is disclosed. A coverage block information (CBI) FIFO buffer is disposed within an early coverage discard (ECD) logic section. The FIFO buffer receives and buffers coverage blocks in FIFO order, maintaining a moving window of the coverage blocks. A tile coverage-primitive map (TCPM) table stores per-pixel primitive coverage information, and each coverage block that matches a block position within the TCPM is updated. Incoming primitive information associated with the coverage blocks is compared with the per-pixel primitive coverage information stored in the TCPM table at the corresponding positions for the live coverages only. Any preceding overlapping coverage within the moving window of the coverage blocks is rejected. An alternate embodiment uses a doubly linked list rather than a FIFO buffer.
    Type: Grant
    Filed: June 11, 2019
    Date of Patent: May 18, 2021
    Inventors: Nilanjan Goswami, Derek Lentz, Adithya Hrudhayan Krishnamurthy, David C. Tannenbaum
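
The following is a highly simplified software model of the early-coverage-discard flow in the abstract above: a FIFO window of recent coverage blocks plus a tile map of which primitive currently covers each pixel, where newly arriving coverage cancels overlapped older coverage still waiting in the window. All names, sizes, and policies here are assumptions for illustration, not the patented design.

```python
# Toy model of early coverage discard with a CBI FIFO and a TCPM-like map.

from collections import deque

BLOCK_PIXELS = 16          # e.g. a 4x4 coverage block (assumed size)
WINDOW_DEPTH = 8           # moving window held by the CBI FIFO (assumed depth)


class EarlyCoverageDiscard:
    def __init__(self):
        self.fifo = deque()          # entries: [block_pos, primitive_id, live_mask]
        self.tcpm = {}               # block_pos -> per-pixel latest primitive id

    def submit(self, block_pos, primitive_id, mask):
        """Add a coverage block; cancel overlapped older coverage in the window."""
        # Reject any preceding overlapping coverage still in the moving window.
        for entry in self.fifo:
            if entry[0] == block_pos:
                entry[2] &= ~mask
        # Drop entries whose live coverage is now empty.
        while self.fifo and self.fifo[0][2] == 0:
            self.fifo.popleft()
        # Update the tile coverage-primitive map for the live pixels.
        pix = self.tcpm.setdefault(block_pos, [None] * BLOCK_PIXELS)
        for bit in range(BLOCK_PIXELS):
            if mask >> bit & 1:
                pix[bit] = primitive_id
        # Enqueue, retiring the oldest block if the window is full.
        self.fifo.append([block_pos, primitive_id, mask])
        retired = self.fifo.popleft() if len(self.fifo) > WINDOW_DEPTH else None
        return retired   # a retired block would be sent on for pixel shading


if __name__ == "__main__":
    ecd = EarlyCoverageDiscard()
    ecd.submit(block_pos=(0, 0), primitive_id=1, mask=0xFFFF)  # fully covers block
    ecd.submit(block_pos=(0, 0), primitive_id=2, mask=0x00FF)  # overlaps lower half
    print(list(ecd.fifo))  # primitive 1 now has live coverage 0xFF00 only
```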
  • Patent number: 10691455
    Abstract: A method and apparatus are provided. The method includes executing a plurality of threads in a temporal dimension, executing a plurality of threads in a spatial dimension, determining a branch target address for each of the plurality of threads in the temporal dimension and the plurality of threads in the spatial dimension, and comparing each of the branch target addresses to determine a minimum branch target address, wherein the minimum branch target address is a minimum value among branch target addresses of each of the plurality of threads.
    Type: Grant
    Filed: August 23, 2017
    Date of Patent: June 23, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Tejash M. Shah, Srinivasan S. Iyer, David C. Tannenbaum
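
The comparison step described in the abstract above reduces to taking the minimum branch target address over all threads. A minimal sketch, assuming the temporal and spatial threads are simply two lists of target addresses:

```python
# Minimal sketch of the minimum-branch-target comparison across the temporal
# and spatial thread dimensions. The warp layout and names are assumptions.

def min_branch_target(branch_targets_temporal, branch_targets_spatial):
    """Return the minimum branch target address among all threads."""
    all_targets = list(branch_targets_temporal) + list(branch_targets_spatial)
    return min(all_targets)


if __name__ == "__main__":
    temporal = [0x1040, 0x1080, 0x1040, 0x10C0]   # threads executed over time
    spatial = [0x1000, 0x1080]                    # threads executed across lanes
    print(hex(min_branch_target(temporal, spatial)))  # -> 0x1000
```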
  • Publication number: 20200184715
    Abstract: A computer-implemented redundant-coverage discard method and apparatus for reducing pixel shader work in a tile-based graphics rendering pipeline is disclosed. A coverage block information (CBI) FIFO buffer is disposed within an early coverage discard (ECD) logic section. The FIFO buffer receives and buffers coverage blocks in FIFO order. At least one coverage block that matches the block position within the TCPM is updated. The TCPM stores per-pixel primitive coverage information. The FIFO buffer buffers a moving window of the coverage blocks. Incoming primitive information associated with the coverage blocks is compared with the per-pixel primitive coverage information stored in the tile coverage-primitive map (TCPM) table at the corresponding positions for the live coverages only. Any preceding overlapping coverage within the moving window of the coverage blocks is rejected. An alternate embodiment uses a doubly linked-list rather than a FIFO buffer.
    Type: Application
    Filed: June 11, 2019
    Publication date: June 11, 2020
    Inventors: Nilanjan GOSWAMI, Derek LENTZ, Adithya Hrudhayan KRISHNAMURTHY, David C. TANNENBAUM
  • Patent number: 10635439
    Abstract: A system and method for binding instructions to a graphics processing unit (GPU) includes a GPU configured to receive bindlessly compiled instructions and interpret a bindlessly compiled instruction at runtime to identify a needed conversion. The GPU generates conversion information based on the bindlessly compiled instruction and the needed conversion, and converts the bindlessly compiled instruction according to the conversion information to generate a bound format instruction. The GPU may then execute the bound format instruction.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: April 28, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mitchell K. Alsup, David C. Tannenbaum, Derek Lentz, Srinivasan S. Iyer, Christopher J. Goodman
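
As a software analogue of the flow in the abstract above, the sketch below inspects a "bindless" instruction carrying a symbolic resource handle at runtime, generates conversion information (here, a lookup in a hypothetical descriptor table), and produces a "bound" form with a concrete slot. All structures and names are invented for clarity and are not the patent's definitions.

```python
# Illustrative runtime conversion of a bindless instruction to a bound form.

from dataclasses import dataclass


@dataclass
class BindlessInstr:
    opcode: str
    resource_handle: int       # symbolic handle, not yet tied to hardware state


@dataclass
class BoundInstr:
    opcode: str
    descriptor_slot: int       # concrete slot the hardware can use directly


def identify_conversion(instr: BindlessInstr, descriptor_table: dict) -> dict:
    """Inspect the instruction and produce the conversion information."""
    return {"slot": descriptor_table[instr.resource_handle]}


def convert(instr: BindlessInstr, conversion: dict) -> BoundInstr:
    return BoundInstr(opcode=instr.opcode, descriptor_slot=conversion["slot"])


if __name__ == "__main__":
    table = {0xBEEF: 3}                              # handle -> bound slot
    bindless = BindlessInstr(opcode="sample", resource_handle=0xBEEF)
    bound = convert(bindless, identify_conversion(bindless, table))
    print(bound)   # BoundInstr(opcode='sample', descriptor_slot=3)
```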
  • Publication number: 20190384600
    Abstract: A system and method for binding instructions to a graphics processing unit (GPU) includes a GPU configured to receive bindlessly compiled instructions and interpret a bindlessly compiled instruction at runtime to identify a needed conversion. The GPU generates conversion information based on the bindlessly compiled instruction and the needed conversion, and converts the bindlessly compiled instruction according to the conversion information to generate a bound format instruction. The GPU may then execute the bound format instruction.
    Type: Application
    Filed: September 10, 2018
    Publication date: December 19, 2019
    Inventors: Mitchell K. Alsup, David C. Tannenbaum, Derek Lentz, Srinivasan S. Iyer, Christopher J. Goodman
  • Patent number: 10496578
    Abstract: According to one general aspect, an apparatus may include a network of node circuits and a central arbiter circuit. The network of node circuits is within an integrated circuit, wherein the network includes a plurality of segments. The central arbiter circuit may be configured to schedule a routing of a message between a pair of node circuits in the network, wherein the routing includes a guaranteed latency between the pair of node circuits.
    Type: Grant
    Filed: September 22, 2017
    Date of Patent: December 3, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: David C. Tannenbaum, Mitchell K. Alsup, Srinivasan S. Iyer
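
A toy model of the scheduling idea in the abstract above: a central arbiter owns a reservation table of (segment, cycle) slots and admits a route only if every hop can be reserved back to back, so the message's latency is fixed and known at schedule time. The topology, table layout, and retry policy are assumptions made for the sketch.

```python
# Toy central arbiter that schedules a route with a guaranteed latency.

class CentralArbiter:
    def __init__(self):
        self.reserved = set()          # {(segment_id, cycle)}

    def schedule(self, path, start_cycle):
        """Reserve one cycle per segment along `path`; return guaranteed latency."""
        slots = [(seg, start_cycle + i) for i, seg in enumerate(path)]
        if any(slot in self.reserved for slot in slots):
            return None                # conflict: caller may retry at a later cycle
        self.reserved.update(slots)
        return len(path)               # latency in cycles, fixed by construction


if __name__ == "__main__":
    arbiter = CentralArbiter()
    print(arbiter.schedule(path=["A-B", "B-C"], start_cycle=10))  # -> 2 cycles
    print(arbiter.schedule(path=["B-C"], start_cycle=11))         # -> None (slot taken)
```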
  • Patent number: 10386410
    Abstract: According to one general aspect, an apparatus may include a plurality of performance and debug monitoring circuits (PDMCs). Each performance and debug monitoring circuit (PDMC) may include an input stage, a combinatorial stage, and a counter. The input stage may be configured to receive a plurality of input signals, wherein the input signals include: signals from other performance and debug monitoring circuits, signals from combinatorial logic circuits, and configuration values. The combinatorial stage may be configured to perform one or more logical operations on a selected sub-set of the input signals. The counter may be configured to increment based, at least in part, upon a result of the combinatorial stage.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: August 20, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Lawrence H. Rubin, David C. Tannenbaum
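
A behavioural sketch of one performance and debug monitoring circuit as described above: an input stage that selects a configured subset of input signals, a combinatorial stage that applies a configured logical operation, and a counter that increments on the result. The configuration format and the set of supported operations are assumptions.

```python
# Behavioural model of a PDMC: input select -> combinatorial op -> counter.

class PDMC:
    def __init__(self, select_mask, operation="and"):
        self.select_mask = select_mask     # which input signals to observe
        self.operation = operation         # "and" or "or" over the selected signals
        self.counter = 0

    def clock(self, input_signals):
        """Evaluate one cycle's worth of input signals."""
        selected = [s for s, keep in zip(input_signals, self.select_mask) if keep]
        result = all(selected) if self.operation == "and" else any(selected)
        self.counter += int(bool(result))
        return result


if __name__ == "__main__":
    # Count cycles where signal 0 AND signal 2 are both asserted.
    pdmc = PDMC(select_mask=[1, 0, 1], operation="and")
    for signals in ([1, 0, 1], [1, 1, 0], [1, 0, 1]):
        pdmc.clock(signals)
    print(pdmc.counter)   # -> 2
```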
  • Patent number: 10360034
    Abstract: A graphics processing unit may include a register file memory, a processing element (PE) and a load-store unit (LSU). The register file memory includes a plurality of registers. The PE is coupled to the register file memory and processes at least one thread of a vector of threads of a graphical application. Each thread in the vector of threads is processed in a non-stalling manner. The PE stores data in a first predetermined set of the plurality of registers in the register file memory that has been generated by processing the at least one thread and that is to be routed to a first stallable logic unit that is external to the PE. The LSU is coupled to the register file memory, and the LSU accesses the data in the first predetermined set of the plurality of registers and routes it to the first stallable logic unit.
    Type: Grant
    Filed: June 26, 2017
    Date of Patent: July 23, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: David C. Tannenbaum, Srinivasan S. Iyer, Mitchell K. Alsup
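
An illustrative model of the data path in the abstract above: a non-stalling PE writes results destined for an external, stallable unit into a predetermined set of registers, and the LSU later drains those registers and routes the data onward, so the PE never waits. The register numbering and the queue used as the stallable sink are assumptions for the sketch.

```python
# Sketch of decoupling a non-stalling PE from a stallable unit via staging registers.

from collections import deque

STAGING_REGS = range(56, 64)        # hypothetical predetermined register set


class RegisterFile:
    def __init__(self, size=64):
        self.regs = [0] * size


class PE:
    """Non-stalling producer: writes to the staging registers and moves on."""
    def __init__(self, rf):
        self.rf = rf

    def execute(self, thread_id, value):
        reg = STAGING_REGS[thread_id % len(STAGING_REGS)]
        self.rf.regs[reg] = value   # no handshake with the stallable unit


class LSU:
    """Drains the staging registers and routes data to the stallable unit."""
    def __init__(self, rf, stallable_unit):
        self.rf = rf
        self.out = stallable_unit

    def drain(self):
        for reg in STAGING_REGS:
            self.out.append(self.rf.regs[reg])


if __name__ == "__main__":
    rf, sink = RegisterFile(), deque()
    pe, lsu = PE(rf), LSU(rf, sink)
    for tid in range(8):
        pe.execute(tid, value=tid * 10)
    lsu.drain()
    print(list(sink))   # -> [0, 10, 20, ..., 70]
```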
  • Patent number: 10310012
    Abstract: According to one general aspect, an apparatus may include an interconnect bus, an interconnect-to-debug bus interface, and a debug bus. The interconnect bus may be configured to connect and manage combinatorial logical blocks during normal operation of a processor and operate synchronous to a core clock. The interconnect-to-debug bus interface may be configured to translate communications between the interconnect bus and the debug bus. The debug bus may include a plurality of debug wrapper circuits arranged in a daisy chain for unidirectional communication, and configured to operate synchronous to the core clock. Each of the plurality of debug wrapper circuits may be configured to: identify if the respective debug wrapper circuit is activated by the debug bus, receive a non-invasive input from a respective combinatorial logic block, and place the non-invasive input from the respective combinatorial logic block on the debug bus.
    Type: Grant
    Filed: March 29, 2017
    Date of Patent: June 4, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Lawrence H. Rubin, David C. Tannenbaum
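
The sketch below is a software analogue of the daisy-chained debug bus described above: each wrapper passes the bus word along unless it is the activated wrapper, in which case it places a non-invasive snapshot of its block's signals on the bus. The activation scheme (matching a selected id) is an assumption made for illustration.

```python
# Unidirectional daisy chain of debug wrapper circuits.

class DebugWrapper:
    def __init__(self, wrapper_id, observed_block):
        self.wrapper_id = wrapper_id
        self.observed_block = observed_block   # callable returning a snapshot value

    def step(self, selected_id, bus_in):
        """One hop of the unidirectional daisy chain."""
        if selected_id == self.wrapper_id:     # is this wrapper activated?
            return self.observed_block()       # non-invasive tap onto the bus
        return bus_in                          # otherwise just forward the word


if __name__ == "__main__":
    chain = [DebugWrapper(i, observed_block=lambda i=i: 0x100 + i) for i in range(4)]
    bus = 0
    for wrapper in chain:                      # the word ripples through the chain
        bus = wrapper.step(selected_id=2, bus_in=bus)
    print(hex(bus))                            # -> 0x102, from wrapper 2
```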
  • Patent number: 10282889
    Abstract: One or more embodiments of the present disclosure provide an apparatus used in source data compression, comprising a memory and at least one processor. The memory is configured to store vertex attribute data and a set of instructions. The processor is coupled to the memory. The processor is configured to receive a source data stream that includes one or more values corresponding to the vertex attribute data. The processor is also configured to provide a dictionary for the one or more values in the source data stream, wherein the dictionary includes a plurality of index values corresponding to the one or more values in the source data stream. The processor is also configured to replace at least some of the one or more values in the source data stream with corresponding index values of the plurality of index values.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: May 7, 2019
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: David C. Tannenbaum, Manshila Adlakha, Vikash Kumar, Abhinav Golas
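
A minimal sketch of the dictionary substitution described above: repeated values in a stream of vertex attribute data are replaced by small index values into a dictionary that accompanies the stream. The encoding layout is an assumption; real hardware would bound the dictionary size and bit widths.

```python
# Dictionary-based substitution of repeated vertex attribute values.

def dictionary_compress(source_stream):
    """Return (dictionary, index_stream) for the given attribute values."""
    dictionary = []
    value_to_index = {}
    index_stream = []
    for value in source_stream:
        if value not in value_to_index:
            value_to_index[value] = len(dictionary)
            dictionary.append(value)
        index_stream.append(value_to_index[value])
    return dictionary, index_stream


def dictionary_decompress(dictionary, index_stream):
    return [dictionary[i] for i in index_stream]


if __name__ == "__main__":
    attrs = [1.0, 0.5, 1.0, 1.0, 0.25, 0.5]          # e.g. repeated normals/UVs
    dictionary, indices = dictionary_compress(attrs)
    print(dictionary, indices)                       # [1.0, 0.5, 0.25] [0, 1, 0, 0, 2, 1]
    assert dictionary_decompress(dictionary, indices) == attrs
```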
  • Publication number: 20180341489
    Abstract: A method and apparatus are provided. The method includes executing a plurality of threads in a temporal dimension, executing a plurality of threads in a spatial dimension, determining a branch target address for each of the plurality of threads in the temporal dimension and the plurality of threads in the spatial dimension, and comparing each of the branch target addresses to determine a minimum branch target address, wherein the minimum branch target address is a minimum value among branch target addresses of each of the plurality of threads.
    Type: Application
    Filed: August 23, 2017
    Publication date: November 29, 2018
    Inventors: Tejash M. Shah, Srinivasan S. Iyer, David C. Tannenbaum
  • Publication number: 20180300131
    Abstract: A graphics processing unit may include a register file memory, a processing element (PE) and a load-store unit (LSU). The register file memory includes a plurality of registers. The PE is coupled to the register file memory and processes at least one thread of a vector of threads of a graphical application. Each thread in the vector of threads is processed in a non-stalling manner. The PE stores data in a first predetermined set of the plurality of registers in the register file memory that has been generated by processing the at least one thread and that is to be routed to a first stallable logic unit that is external to the PE. The LSU is coupled to the register file memory, and the LSU accesses the data in the first predetermined set of the plurality of registers and routes it to the first stallable logic unit.
    Type: Application
    Filed: June 26, 2017
    Publication date: October 18, 2018
    Inventors: David C. TANNENBAUM, Srinivasan S. IYER, Mitchell K. ALSUP
  • Publication number: 20180196771
    Abstract: According to one general aspect, an apparatus may include a network of node circuits and a central arbiter circuit. The network of node circuits is within an integrated circuit, wherein the network includes a plurality of segments. The central arbiter circuit may be configured to schedule a routing of a message between a pair of node circuits in the network, wherein the routing includes a guaranteed latency between the pair of node circuits.
    Type: Application
    Filed: September 22, 2017
    Publication date: July 12, 2018
    Inventors: David C. TANNENBAUM, Mitchell K. ALSUP, Srinivasan S. IYER
  • Publication number: 20180172765
    Abstract: According to one general aspect, an apparatus may include an interconnect bus, an interconnect-to-debug bus interface, and a debug bus. The interconnect bus may be configured to connect and manage combinatorial logical blocks during normal operation of a processor and operate synchronous to a core clock. The interconnect-to-debug bus interface may be configured to translate communications between the interconnect bus and the debug bus. The debug bus may include a plurality of debug wrapper circuits arranged in a daisy chain for unidirectional communication, and configured to operate synchronous to the core clock. Each of the plurality of debug wrapper circuits may be configured to: identify if the respective debug wrapper circuit is activated by the debug bus, receive a non-invasive input from a respective combinatorial logic block, and place the non-invasive input from the respective combinatorial logic block on the debug bus.
    Type: Application
    Filed: March 29, 2017
    Publication date: June 21, 2018
    Inventors: Lawrence H. RUBIN, David C. TANNENBAUM
  • Publication number: 20180164372
    Abstract: According to one general aspect, an apparatus may include a plurality of performance and debug monitoring circuits (PDMCs). Each performance and debug monitoring circuit (PDMC) may include an input stage, a combinatorial stage, and a counter. The input stage may be configured to receive a plurality of input signals, wherein the input signals include: signals from other performance and debug monitoring circuits, signals from combinatorial logic circuits, and configuration values. The combinatorial stage may be configured to perform one or more logical operations on a selected sub-set of the input signals. The counter may be configured to increment based, at least in part, upon a result of the combinatorial stage.
    Type: Application
    Filed: March 20, 2017
    Publication date: June 14, 2018
    Inventors: Lawrence H. RUBIN, David C. TANNENBAUM
  • Publication number: 20180150991
    Abstract: One or more embodiments of the present disclosure provide an apparatus used in source data compression, comprising a memory and at least one processor. The memory is configured to store vertex attribute data and a set of instructions. The processor is coupled to the memory. The processor is configured to receive a source data stream that includes one or more values corresponding to the vertex attribute data. The processor is also configured to provide a dictionary for the one or more values in the source data stream, wherein the dictionary includes a plurality of index values corresponding to the one or more values in the source data stream. The processor is also configured to replace at least some of the one or more values in the source data stream with corresponding index values of the plurality of index values.
    Type: Application
    Filed: February 14, 2017
    Publication date: May 31, 2018
    Inventors: David C. Tannenbaum, Manshila Adlakha, Vikash Kumar, Abhinav Golas
  • Patent number: 9465578
    Abstract: A system and method are provided for performing 32-bit or dual 16-bit floating-point arithmetic operations using logic circuitry. An operating mode for a multiplication operation is received, where the operating mode is one of a 32-bit floating-point mode and a dual 16-bit floating-point mode. Based on the operating mode, nine recoding terms for a mantissa of at least one floating-point input operand are determined. A dual-mode multiplier array circuit that is configurable to generate partial products for either one 32-bit floating-point result or for two 16-bit floating-point results computes the partial products based on the nine recoding terms. The partial products are processed to generate an output based on the operating mode.
    Type: Grant
    Filed: December 13, 2013
    Date of Patent: October 11, 2016
    Assignee: NVIDIA Corporation
    Inventors: David C. Tannenbaum, Srinivasan Iyer
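
As a purely functional illustration of the dual-mode idea above, the sketch below interprets the same 32-bit-wide operand words either as one float32 multiplication or as two independent float16 multiplications, mirroring a multiplier array that can be split in half. The Booth/recoding detail (the "nine recoding terms") is not modelled; only the two operating modes are shown, and the interfaces are assumptions.

```python
# Functional model of a dual-mode (fp32 / dual fp16) multiply.

import struct


def fp_multiply(op_a: int, op_b: int, mode: str):
    """op_a/op_b are 32-bit raw operand words; mode is 'fp32' or 'dual_fp16'."""
    if mode == "fp32":
        a = struct.unpack("<f", op_a.to_bytes(4, "little"))[0]
        b = struct.unpack("<f", op_b.to_bytes(4, "little"))[0]
        return (a * b,)
    # dual fp16: the low and high 16-bit halves are independent lanes
    results = []
    for shift in (0, 16):
        a = struct.unpack("<e", ((op_a >> shift) & 0xFFFF).to_bytes(2, "little"))[0]
        b = struct.unpack("<e", ((op_b >> shift) & 0xFFFF).to_bytes(2, "little"))[0]
        results.append(a * b)
    return tuple(results)


if __name__ == "__main__":
    # 2.0 * 3.0 in fp32 mode
    a32 = struct.unpack("<I", struct.pack("<f", 2.0))[0]
    b32 = struct.unpack("<I", struct.pack("<f", 3.0))[0]
    print(fp_multiply(a32, b32, "fp32"))            # (6.0,)

    # lanes (1.5, 4.0) * (2.0, 0.5) in dual fp16 mode
    def pack2h(lo, hi):
        return int.from_bytes(struct.pack("<e", lo) + struct.pack("<e", hi), "little")
    print(fp_multiply(pack2h(1.5, 4.0), pack2h(2.0, 0.5), "dual_fp16"))  # (3.0, 2.0)
```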
  • Patent number: 9268528
    Abstract: A system and method are provided for dynamically reducing power consumption of floating-point logic. A disable control signal that is based on a characteristic of a floating-point format input operand is received and a portion of a logic circuit is disabled based on the disable control signal. The logic circuit processes the floating-point format input operand to generate an output.
    Type: Grant
    Filed: May 23, 2013
    Date of Patent: February 23, 2016
    Assignee: NVIDIA Corporation
    Inventors: David C. Tannenbaum, Srinivasan Iyer
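
A behavioural sketch of the power-saving idea in the last entry: a characteristic of a floating-point input operand (here, being exactly zero) produces a disable signal, and the gated portion of the logic (the multiply datapath) is skipped, returning the trivially known result instead. Which characteristics are detected and which logic is gated are assumptions; inf/NaN corner cases are ignored.

```python
# Sketch of disabling part of a floating-point datapath based on an operand
# characteristic (a zero operand), as a stand-in for clock/operand gating.

def multiply_with_gating(a: float, b: float):
    """Return (result, datapath_enabled) for a gated floating-point multiply."""
    disable = (a == 0.0) or (b == 0.0)     # operand characteristic -> disable signal
    if disable:
        return 0.0, False                  # bypass: the full datapath never toggles
    return a * b, True                     # normal path


if __name__ == "__main__":
    print(multiply_with_gating(3.5, 0.0))   # (0.0, False): datapath gated off
    print(multiply_with_gating(3.5, 2.0))   # (7.0, True)
```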