Patents by Inventor David C. Tannenbaum
David C. Tannenbaum has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11798218
Abstract: A method of packing coverage in a graphics processing unit (GPU) may include receiving an indication for a portion of an image, determining, based on the indication, a packing technique for the portion of the image, and packing coverage for the portion of the image based on the packing technique. The indication may include one or more of: an importance, a quality, a level of interest, a level of detail, or a variable-rate shading (VRS) level. The indication may be received from an application. The packing technique may include array merging. The array merging may include quad merging. The packing technique may include pixel piling. The packing technique may be a first packing technique, and the method may further include determining, based on the indication, a second packing technique for the portion of the image, and packing coverage for the portion of the image based on the second packing technique.
Type: Grant
Filed: October 15, 2021
Date of Patent: October 24, 2023
Inventors: Keshavan Varadarajan, Veynu Narasiman, David C. Tannenbaum
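To make the idea concrete, here is a minimal, hypothetical Python sketch of selecting a packing technique from a VRS-style indication and then merging quad coverage. The technique names, threshold, and 4-bit coverage encoding are illustrative assumptions, not the patented implementation.

```python
# Illustrative sketch: choose a packing technique per image region from a
# VRS-style indication, then greedily merge quads with disjoint coverage.
from typing import List

def select_packing_technique(vrs_level: int) -> str:
    """Map a coarse indication to a packing technique (assumed threshold)."""
    return "quad_merging" if vrs_level <= 1 else "pixel_piling"

def quad_merge(coverage_masks: List[int]) -> List[int]:
    """Greedily merge 4-bit quad coverage masks that do not overlap."""
    merged: List[int] = []
    for mask in coverage_masks:
        for i, existing in enumerate(merged):
            if existing & mask == 0:      # disjoint coverage: can share a quad
                merged[i] = existing | mask
                break
        else:
            merged.append(mask)
    return merged

if __name__ == "__main__":
    # Three partially covered quads (one bit per pixel of a 2x2 quad).
    quads = [0b0001, 0b0110, 0b1110]
    print(select_packing_technique(vrs_level=1))   # quad_merging
    print([bin(m) for m in quad_merge(quads)])     # two packed quads from three
```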
-
Patent number: 11763521
Abstract: A system and a method are disclosed for varying a pixel-rate functionality of a GPU as an optional feature without an explicit implementation from within an application. User interface (UI) content may be detected in a draw call of an application and a variable-rate shader lookup map may be generated based on the detected UI content. A pixel rate of 3D content may be increased using the variable-rate shader lookup map. Additionally or alternatively, other conditions may be detected for increasing the pixel rate, such as using information in an application profile, detecting high or low luminance values, detecting motion and/or detecting temporal anti-aliasing.
Type: Grant
Filed: October 6, 2021
Date of Patent: September 19, 2023
Inventors: Gabriel T. Dagani, Gregory Bergschneider, David C. Tannenbaum
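A minimal sketch of the lookup-map idea follows: tiles covered by detected UI content keep full shading rate while other tiles get a coarser rate. The tile granularity and the (x, y) rate encoding are assumptions for illustration only.

```python
# Illustrative sketch: build a per-tile variable-rate shading (VRS) lookup map
# from detected UI coverage. UI tiles keep full rate; 3D-only tiles are coarser.

def build_vrs_lookup_map(ui_tiles, width_tiles, height_tiles,
                         ui_rate=(1, 1), content_rate=(2, 2)):
    """Return a 2D list of (x_rate, y_rate) entries, one per screen tile."""
    return [[ui_rate if (x, y) in ui_tiles else content_rate
             for x in range(width_tiles)]
            for y in range(height_tiles)]

if __name__ == "__main__":
    ui = {(0, 0), (1, 0)}   # e.g., a HUD detected in the top-left tiles of a draw call
    for row in build_vrs_lookup_map(ui, width_tiles=4, height_tiles=2):
        print(row)
```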
-
Patent number: 11748933
Abstract: A GPU includes shader cores and a shader warp packer unit. The shader warp packer unit may receive a first primitive associated with a first partially covered quad, and a second primitive associated with a second partially covered quad. The shader warp packer unit may determine that the first partially covered quad and the second partially covered quad have non-overlapping coverage. The shader warp packer unit may pack the first partially covered quad and the second partially covered quad into a packed quad. The shader warp packer unit may send the packed quad to the shader cores. The first partially covered quad and the second partially covered quad may be spatially disjoint from each other. The shader cores may receive and process the packed quad with no loss of information relative to the shader cores individually processing the first partially covered quad and the second partially covered quad.
Type: Grant
Filed: February 4, 2021
Date of Patent: September 5, 2023
Inventors: Keshavan Varadarajan, David C. Tannenbaum, F N U Gurupad
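The packing test described here can be sketched in a few lines: two quads are combined only when their 2x2 coverage masks are disjoint, and per-pixel ownership is recorded so no information is lost. The data layout is an illustrative assumption, not the hardware format.

```python
# Illustrative sketch: pack two partially covered quads when their coverage
# masks do not overlap, keeping track of which primitive owns each pixel.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Quad:
    primitive_id: int
    coverage: int          # 4-bit mask, one bit per pixel of the 2x2 quad

def try_pack(a: Quad, b: Quad) -> Optional[dict]:
    """Return a packed quad if coverage is spatially disjoint, else None."""
    if a.coverage & b.coverage:
        return None                        # overlapping coverage: cannot pack
    owners = [a.primitive_id if a.coverage >> i & 1 else
              b.primitive_id if b.coverage >> i & 1 else None
              for i in range(4)]
    return {"coverage": a.coverage | b.coverage, "owners": owners}

if __name__ == "__main__":
    packed = try_pack(Quad(7, 0b0011), Quad(9, 0b1100))
    print(packed)   # full coverage; pixels 0-1 owned by primitive 7, pixels 2-3 by 9
```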
-
Patent number: 11715252
Abstract: A GPU includes shader cores and a shader warp packer unit. The shader warp packer unit may receive a first primitive associated with a first partially covered quad, and a second primitive associated with a second partially covered quad. The shader warp packer unit may determine that the first partially covered quad and the second partially covered quad have non-overlapping coverage. The shader warp packer unit may pack the first partially covered quad and the second partially covered quad into a packed quad. The shader warp packer unit may send the packed quad to the shader cores. The first partially covered quad and the second partially covered quad may be spatially disjoint from each other. The shader cores may receive and process the packed quad with no loss of information relative to the shader cores individually processing the first partially covered quad and the second partially covered quad.
Type: Grant
Filed: February 4, 2021
Date of Patent: August 1, 2023
Inventors: Keshavan Varadarajan, David C. Tannenbaum, F N U Gurupad
-
Patent number: 11620222
Abstract: A method for performing an atomic memory operation may include receiving an atomic input, receiving an address for an atomic memory location, and performing an atomic operation on the atomic memory location based on the atomic input, wherein performing the atomic operation may include performing a first operation on a first portion of the atomic input, and performing a second operation, which may be different from the first operation, on a second portion of the atomic input. The method may further include storing a result of the first operation in a first portion of the atomic memory location, and storing a result of the second operation in a second portion of the atomic memory location. The method may further include returning an original content of the first portion of the atomic memory location concatenated with an original content of the second portion of the atomic memory location.
Type: Grant
Filed: October 30, 2020
Date of Patent: April 4, 2023
Inventors: David C. Tannenbaum, Raun M. Krisch, Christopher P. Frascati
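A minimal software model of the split-atomic idea is shown below: one 64-bit word is treated as two 32-bit halves, a different operation is applied to each half in a single atomic step, and the original packed contents are returned. The choice of MIN on the low half and ADD on the high half, and the use of a lock to stand in for hardware atomicity, are illustrative assumptions.

```python
# Illustrative sketch: an atomic that applies MIN to the low half and ADD to
# the high half of one memory word, returning the original concatenated value.
import threading

MASK32 = (1 << 32) - 1

class SplitAtomicWord:
    def __init__(self, value: int = 0):
        self._value = value
        self._lock = threading.Lock()   # stands in for hardware atomicity

    def atomic_min_add(self, atomic_input: int) -> int:
        lo_in, hi_in = atomic_input & MASK32, atomic_input >> 32
        with self._lock:
            original = self._value
            lo, hi = original & MASK32, original >> 32
            lo = min(lo, lo_in)              # first operation on the first portion
            hi = (hi + hi_in) & MASK32       # second operation on the second portion
            self._value = (hi << 32) | lo
            return original                  # original low || high contents

if __name__ == "__main__":
    word = SplitAtomicWord((5 << 32) | 100)
    old = word.atomic_min_add((1 << 32) | 42)   # MIN(100, 42) and ADD(5, 1)
    print(hex(old), hex(word._value))           # 0x500000064 0x60000002a
```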
-
Patent number: 11610281
Abstract: A method of processing a workload in a graphics processing unit (GPU) may include detecting a work item of the workload in the GPU, determining a cache policy for the work item, and operating at least a portion of a cache memory hierarchy in the GPU for at least a portion of the work item based on the cache policy. The work item may be detected based on information received from an application and/or monitoring one or more performance counters by a driver and/or hardware detection logic. The method may further include monitoring one or more performance counters, wherein the cache policy for the work item may be determined and/or changed based on the one or more performance counters. The cache policy for the work item may be selected based on a runtime learning model.
Type: Grant
Filed: January 11, 2021
Date of Patent: March 21, 2023
Inventors: Sushant Kondguli, Arun Radhakrishnan, Zachary D. Neyland, David C. Tannenbaum
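As a rough sketch of counter-driven policy selection, the snippet below picks a cache policy for a work item from simple hit-rate heuristics. The counter names, thresholds, and policy labels are assumptions for illustration; the patent also allows a runtime learning model, which is not modeled here.

```python
# Illustrative sketch: choose a cache policy for a detected work item from
# performance counters (hypothetical counter names and thresholds).

def choose_cache_policy(counters: dict) -> str:
    hits, misses = counters["l2_hits"], counters["l2_misses"]
    hit_rate = hits / max(1, hits + misses)
    if counters.get("streaming", False) or hit_rate < 0.2:
        return "bypass"          # streaming-like data would pollute the cache
    if hit_rate > 0.8:
        return "write_back"      # high reuse: keep lines resident
    return "write_through"

if __name__ == "__main__":
    print(choose_cache_policy({"l2_hits": 900, "l2_misses": 100}))   # write_back
    print(choose_cache_policy({"l2_hits": 50, "l2_misses": 950}))    # bypass
```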
-
Publication number: 20230052075
Abstract: A system and a method are disclosed for varying a pixel-rate functionality of a GPU as an optional feature without an explicit implementation from within an application. User interface (UI) content may be detected in a draw call of an application and a variable-rate shader lookup map may be generated based on the detected UI content. A pixel rate of 3D content may be increased using the variable-rate shader lookup map. Additionally or alternatively, other conditions may be detected for increasing the pixel rate, such as using information in an application profile, detecting high or low luminance values, detecting motion and/or detecting temporal anti-aliasing.
Type: Application
Filed: October 6, 2021
Publication date: February 16, 2023
Inventors: Gabriel T. DAGANI, Gregory BERGSCHNEIDER, David C. TANNENBAUM
-
Publication number: 20220301233
Abstract: A hybrid ray tracing system includes: a processor; and memory including instructions that, when executed by the processor, cause the processor to: identify a subset of pixels of an image to be ray-traced based on variable rate shading (VRS) screenspace image data; set, based on the VRS screenspace image data, one or more material properties of at least one object corresponding to the subset of pixels; and perform ray-tracing for the subset of pixels to generate a ray-traced image. The ray-tracing includes performing a limited ray casting process based on the set one or more material properties.
Type: Application
Filed: January 14, 2022
Publication date: September 22, 2022
Inventors: Keshavan Varadarajan, David C. Tannenbaum
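The selection step described above can be sketched as follows: a VRS screenspace map picks which pixels receive ray tracing, and coarser-rate regions get cheaper material settings that limit ray casting. The rate encoding and material property name are illustrative assumptions.

```python
# Illustrative sketch: use a VRS screenspace map to select ray-traced pixels
# and to limit material properties for the remaining, coarser-rate pixels.

def select_ray_traced_pixels(vrs_map):
    """Ray-trace only pixels whose shading rate is full (1x1)."""
    return [(x, y)
            for y, row in enumerate(vrs_map)
            for x, rate in enumerate(row)
            if rate == (1, 1)]

def limited_material(base_material: dict, rate) -> dict:
    """Coarser-rate pixels get cheaper materials (e.g., no reflection bounces)."""
    material = dict(base_material)
    if rate != (1, 1):
        material["max_reflection_bounces"] = 0
    return material

if __name__ == "__main__":
    vrs_map = [[(1, 1), (2, 2)],
               [(2, 2), (1, 1)]]
    print(select_ray_traced_pixels(vrs_map))                    # [(0, 0), (1, 1)]
    print(limited_material({"max_reflection_bounces": 3}, (2, 2)))
```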
-
Patent number: 11416960
Abstract: A binning subsystem of a GPU includes a storage subsystem, a shader core to output first data via a first path, a selector to receive the first data via the first path, and to receive second data from the storage subsystem via a second path. The storage subsystem includes a binner unit and a control logic unit. The control logic unit causes the selector to transfer the first data or the second data to the binner unit. The binner unit may transfer binner output data to the shader core via a third path. The binner unit may transfer the binner output data to one or more subsequent stages of a graphics pipeline via a fourth path. The binner unit may transfer the binner output data to the storage subsystem via a fifth path. The control logic unit may control the binner unit such that the binner unit can be used for general purpose computation.
Type: Grant
Filed: December 2, 2020
Date of Patent: August 16, 2022
Inventors: David C. Tannenbaum, Keshavan Varadarajan, Veynu Narasiman
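A very rough software model of this data flow is shown below: control logic steers the selector so the binner consumes either shader-core output (first path) or data replayed from the storage subsystem (second path). The binning function, tile size, and path modeling are illustrative assumptions rather than the hardware design.

```python
# Illustrative sketch: a selector feeds the binner from one of two input paths
# under control-logic direction; the binner groups primitives into screen tiles.

def binner(primitives, tile_size=16):
    """Group primitives into screen-space bins by their (x, y) position."""
    bins = {}
    for prim in primitives:
        key = (prim["x"] // tile_size, prim["y"] // tile_size)
        bins.setdefault(key, []).append(prim["id"])
    return bins

def selector(use_storage_path: bool, shader_core_data, storage_data):
    """Control logic picks which input path the binner consumes."""
    return storage_data if use_storage_path else shader_core_data

if __name__ == "__main__":
    shader_out = [{"id": 0, "x": 3, "y": 5}, {"id": 1, "x": 40, "y": 8}]
    replayed   = [{"id": 2, "x": 20, "y": 22}]
    print(binner(selector(False, shader_out, replayed)))   # bins from the shader core
    print(binner(selector(True, shader_out, replayed)))    # bins from storage replay
```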
-
Publication number: 20220206737
Abstract: A system and a method are disclosed that allow multiple casting devices to work together to populate a large display screen according to the subject matter disclosed herein. The system includes a receiving device that includes two or more screen-cast receivers and a controller. Each screen-cast receiver receives from a corresponding casting device at least a portion of a frame of original content of the corresponding casting device generated in a native resolution of the corresponding casting device. The controller synchronizes each received portion of the frame of the original content of the corresponding casting device to form a video output signal that comprises a combination of each received portion, in addition to any internally generated content derived by the receiving display. A casting device may be a smartphone, a tablet, or a computing device, such as a laptop computer.
Type: Application
Filed: March 22, 2021
Publication date: June 30, 2022
Inventors: Gabriel T. DAGANI, David C. TANNENBAUM, Christopher P. FRASCATI, Michael PHILLIP
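The synchronization described here can be sketched as a receiver that releases a composite frame only once every registered caster has delivered its portion for the same frame number. The source names, frame tagging, and composition scheme are illustrative assumptions.

```python
# Illustrative sketch: synchronize frame portions from multiple casting devices
# and compose an output frame only when all sources have contributed.

class CastReceiver:
    def __init__(self, sources):
        self.sources = list(sources)   # composition order of the casting devices
        self.pending = {}              # frame_no -> {source: portion}

    def on_portion(self, source, frame_no, portion):
        frame = self.pending.setdefault(frame_no, {})
        frame[source] = portion
        if set(self.sources) <= frame.keys():   # all casters present for this frame
            return self.compose(self.pending.pop(frame_no))
        return None

    def compose(self, portions):
        return " | ".join(portions[s] for s in self.sources)

if __name__ == "__main__":
    rx = CastReceiver(sources=["phone", "laptop"])
    print(rx.on_portion("phone", 1, "left half"))     # None: still waiting
    print(rx.on_portion("laptop", 1, "right half"))   # composed output frame
```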
-
Publication number: 20220197976
Abstract: A graphics processing unit (GPU) and a method are disclosed that perform a convolution operation recast as a matrix multiplication operation. The GPU includes a register file, a processor and a state machine. The register file stores data of an input feature map and data of a filter weight kernel. The processor performs a convolution operation on data of the input feature map and data of the filter weight kernel as a matrix multiplication operation. The state machine facilitates performance of the convolution operation by unrolling the data of the input feature map and the data of the filter weight kernel in the register file. The state machine includes control registers that determine movement of data through the register file to perform the matrix multiplication operation on the data in the register file in an unrolled manner.
Type: Application
Filed: February 10, 2021
Publication date: June 23, 2022
Inventors: Christopher P. FRASCATI, Simon WATERS, Rama S.B HARIHARA, David C. TANNENBAUM
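The recasting itself is the familiar unroll-then-multiply pattern, sketched below with NumPy: each kernel-sized window of the feature map is unrolled into a row so the convolution becomes a single matrix multiplication. The shapes, unrolling order, and use of NumPy are assumptions for illustration; the register-file state machine is not modeled.

```python
# Illustrative sketch: recast a 2D convolution as a matrix multiplication by
# unrolling the input feature map (an im2col-style transform).
import numpy as np

def conv2d_as_matmul(feature_map, kernel):
    kh, kw = kernel.shape
    oh = feature_map.shape[0] - kh + 1
    ow = feature_map.shape[1] - kw + 1
    # Unroll every kh x kw window into one row of the matrix operand.
    unrolled = np.array([feature_map[y:y + kh, x:x + kw].ravel()
                         for y in range(oh) for x in range(ow)])
    return (unrolled @ kernel.ravel()).reshape(oh, ow)

if __name__ == "__main__":
    fmap = np.arange(16, dtype=float).reshape(4, 4)
    kern = np.ones((3, 3))
    print(conv2d_as_matmul(fmap, kern))   # equals a direct valid 3x3 convolution
```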
-
Patent number: 11360732
Abstract: A system and a method are disclosed that allow multiple casting devices to work together to populate a large display screen according to the subject matter disclosed herein. The system includes a receiving device that includes two or more screen-cast receivers and a controller. Each screen-cast receiver receives from a corresponding casting device at least a portion of a frame of original content of the corresponding casting device generated in a native resolution of the corresponding casting device. The controller synchronizes each received portion of the frame of the original content of the corresponding casting device to form a video output signal that comprises a combination of each received portion, in addition to any internally generated content derived by the receiving display. A casting device may be a smartphone, a tablet, or a computing device, such as a laptop computer.
Type: Grant
Filed: March 22, 2021
Date of Patent: June 14, 2022
Inventors: Gabriel T. Dagani, David C. Tannenbaum, Christopher P. Frascati, Michael Phillip
-
Publication number: 20220148122
Abstract: A binning subsystem of a GPU includes a storage subsystem, a shader core to output first data via a first path, a selector to receive the first data via the first path, and to receive second data from the storage subsystem via a second path. The storage subsystem includes a binner unit and a control logic unit. The control logic unit causes the selector to transfer the first data or the second data to the binner unit. The binner unit may transfer binner output data to the shader core via a third path. The binner unit may transfer the binner output data to one or more subsequent stages of a graphics pipeline via a fourth path. The binner unit may transfer the binner output data to the storage subsystem via a fifth path. The control logic unit may control the binner unit such that the binner unit can be used for general purpose computation.
Type: Application
Filed: December 2, 2020
Publication date: May 12, 2022
Inventors: David C. TANNENBAUM, Keshavan VARADARAJAN, Veynu NARASIMAN
-
Patent number: 11321907
Abstract: A system and a method are disclosed that optimize a graphics driver. The system may be embodied as a computing device that includes a storage that is internal to the computing device, a graphics processing unit (GPU) that includes a driver, and a controller. The controller may be configured to run a daemon process that optimizes a shader and/or a shader pipeline for an application that is resident on the computing device when the computing device is not running the application and stores at least one optimization for the shader in the storage. The at least one optimization may be based on the application. The daemon process may further receive a request from the driver of the GPU for an optimization for the shader/shader pipeline during a runtime compilation of the shader and provide the at least one optimization to the driver of the GPU from the storage.
Type: Grant
Filed: April 9, 2021
Date of Patent: May 3, 2022
Inventors: Gabriel T. Dagani, Raun M. Krisch, Zachary Neyland, Robert Metzger, David C. Tannenbaum
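The daemon/driver hand-off can be sketched as a background step that persists precomputed shader optimizations to local storage while the application is idle, and a driver-side lookup at runtime compile time. The directory name, file layout, and cache key are hypothetical details for illustration.

```python
# Illustrative sketch: a daemon precomputes shader optimizations into local
# storage, and the driver fetches them during runtime shader compilation.
import json
import pathlib

CACHE_DIR = pathlib.Path("shader_opt_cache")   # assumed local storage location

def daemon_optimize(app_name: str, shader_id: str, optimization: dict) -> None:
    """Runs while the application is not running; persists the optimization."""
    CACHE_DIR.mkdir(exist_ok=True)
    (CACHE_DIR / f"{app_name}.{shader_id}.json").write_text(json.dumps(optimization))

def driver_request(app_name: str, shader_id: str):
    """Called during runtime compilation; returns a cached optimization or None."""
    path = CACHE_DIR / f"{app_name}.{shader_id}.json"
    return json.loads(path.read_text()) if path.exists() else None

if __name__ == "__main__":
    daemon_optimize("game_a", "shader_42", {"unroll_loops": True, "precision": "fp16"})
    print(driver_request("game_a", "shader_42"))
```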
-
Publication number: 20220067876
Abstract: A method of processing a workload in a graphics processing unit (GPU) may include detecting a work item of the workload in the GPU, determining a cache policy for the work item, and operating at least a portion of a cache memory hierarchy in the GPU for at least a portion of the work item based on the cache policy. The work item may be detected based on information received from an application and/or monitoring one or more performance counters by a driver and/or hardware detection logic. The method may further include monitoring one or more performance counters, wherein the cache policy for the work item may be determined and/or changed based on the one or more performance counters. The cache policy for the work item may be selected based on a runtime learning model.
Type: Application
Filed: January 11, 2021
Publication date: March 3, 2022
Inventors: Sushant KONDGULI, Arun RADHAKRISHNAN, Zachary D. NEYLAND, David C. TANNENBAUM
-
Publication number: 20220066934
Abstract: A method for performing an atomic memory operation may include receiving an atomic input, receiving an address for an atomic memory location, and performing an atomic operation on the atomic memory location based on the atomic input, wherein performing the atomic operation may include performing a first operation on a first portion of the atomic input, and performing a second operation, which may be different from the first operation, on a second portion of the atomic input. The method may further include storing a result of the first operation in a first portion of the atomic memory location, and storing a result of the second operation in a second portion of the atomic memory location. The method may further include returning an original content of the first portion of the atomic memory location concatenated with an original content of the second portion of the atomic memory location.
Type: Application
Filed: October 30, 2020
Publication date: March 3, 2022
Inventors: David C. TANNENBAUM, Raun M. KRISCH, Christopher P. FRASCATI
-
Publication number: 20220036632
Abstract: A GPU includes one or more post-processing controllers, and a 3D graphics pipeline including a post-processing shader stage following a pixel shader stage. The one or more post-processing controllers may synchronize an execution of one or more post-processing stages including the post-processing shader stage. The 3D pipeline may include one or more pixel shaders, one or more tile buffers, and a direct communication link between the post-processing shader stage and the one or more tile buffers. The one or more post-processing controllers may synchronize communication between the one or more post-processing shaders and the one or more tile buffers.
Type: Application
Filed: February 26, 2021
Publication date: February 3, 2022
Inventors: Raun M. KRISCH, David C. TANNENBAUM, Moumine BALLO, Keshavan VARADARAJAN
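A toy model of the synchronization is sketched below: the controller launches the post-processing stage for a tile only after the pixel shader stage has fully written that tile's buffer, consuming it over the direct link. The stage bodies, tile contents, and ordering scheme are illustrative assumptions.

```python
# Illustrative sketch: a post-processing controller that runs the
# post-processing stage on each tile buffer only after pixel shading completes.

def pixel_shader_stage(tile_id):
    """Pretend to shade a tile; return its color buffer (toy 2x2 tile)."""
    return [tile_id * 10 + i for i in range(4)]

def post_process_stage(tile_buffer):
    """A simple post-process (tone-map-like scaling) applied to the tile buffer."""
    return [min(255, int(c * 1.5)) for c in tile_buffer]

def post_processing_controller(tile_ids):
    tile_buffers = {}
    for tid in tile_ids:
        tile_buffers[tid] = pixel_shader_stage(tid)
        # Synchronization point: post-processing starts only once this tile's
        # buffer is fully written, and reads it directly (no round trip to memory).
        tile_buffers[tid] = post_process_stage(tile_buffers[tid])
    return tile_buffers

if __name__ == "__main__":
    print(post_processing_controller([0, 1]))
```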
-
Publication number: 20220036631
Abstract: A GPU includes shader cores and a shader warp packer unit. The shader warp packer unit may receive a first primitive associated with a first partially covered quad, and a second primitive associated with a second partially covered quad. The shader warp packer unit may determine that the first partially covered quad and the second partially covered quad have non-overlapping coverage. The shader warp packer unit may pack the first partially covered quad and the second partially covered quad into a packed quad. The shader warp packer unit may send the packed quad to the shader cores. The first partially covered quad and the second partially covered quad may be spatially disjoint from each other. The shader cores may receive and process the packed quad with no loss of information relative to the shader cores individually processing the first partially covered quad and the second partially covered quad.
Type: Application
Filed: February 4, 2021
Publication date: February 3, 2022
Inventors: Keshavan VARADARAJAN, David C. TANNENBAUM, FNU GURUPAD
-
Publication number: 20220036634
Abstract: A method of packing coverage in a graphics processing unit (GPU) may include receiving an indication for a portion of an image, determining, based on the indication, a packing technique for the portion of the image, and packing coverage for the portion of the image based on the packing technique. The indication may include one or more of: an importance, a quality, a level of interest, a level of detail, or a variable-rate shading (VRS) level. The indication may be received from an application. The packing technique may include array merging. The array merging may include quad merging. The packing technique may include pixel piling. The packing technique may be a first packing technique, and the method may further include determining, based on the indication, a second packing technique for the portion of the image, and packing coverage for the portion of the image based on the second packing technique.
Type: Application
Filed: October 15, 2021
Publication date: February 3, 2022
Inventors: Keshavan VARADARAJAN, Veynu NARASIMAN, David C. TANNENBAUM
-
Publication number: 20210358191
Abstract: A GPU is disclosed, which may include a VRS interface to provide spatial information and/or primitive-specific information. The GPU may include one or more shader cores including a control logic section to determine a shading precision value based on the spatial information and/or the primitive-specific information. The control logic section may modulate a shading precision according to the shading precision value. A method for controlling shading precision by a GPU may include providing, by a VRS interface, the spatial information and/or primitive-specific information. The method may include determining, by a control logic section, a shading precision value based on the spatial information and/or the primitive-specific information. The method may include modulating a shading precision according to the shading precision value.
Type: Application
Filed: November 20, 2020
Publication date: November 18, 2021
Inventors: Christopher P. FRASCATI, Raun M. KRISCH, Derek J. LENTZ, David C. TANNENBAUM
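As a final sketch, the precision-modulation idea can be reduced to a small decision function: spatial (rate) information and a primitive-specific signal map to a shading precision value. The precision tiers, the importance score, and the thresholds are assumptions for illustration only.

```python
# Illustrative sketch: derive a shading precision value from VRS-style spatial
# information and a primitive-specific importance signal.

def shading_precision(spatial_rate, primitive_importance: float) -> str:
    """Coarser shading rates and less important primitives get lower precision."""
    coarse = spatial_rate != (1, 1)
    if coarse and primitive_importance < 0.5:
        return "fp16"
    if coarse or primitive_importance < 0.5:
        return "fp32_reduced"   # e.g., relaxed math or fewer shader iterations
    return "fp32"

if __name__ == "__main__":
    print(shading_precision((2, 2), 0.3))   # fp16
    print(shading_precision((1, 1), 0.9))   # fp32
```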