Patents by Inventor Sean Treichler

Sean Treichler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Application programming interface to wait on matrix multiply-accumulate

Patent number: 12204897

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

Type: Grant

Filed: November 30, 2022

Date of Patent: January 21, 2025

Assignee: NVIDIA CORPORATION

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO WAIT ON MATRIX MULTIPLY-ACCUMULATE

Publication number: 20240168762

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO SYNCHRONIZE MATRIX MULTIPLY-ACCUMULATE MEMORY TRANSACTIONS

Publication number: 20240169022

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE MATRIX MULTIPLY-ACCUMULATE

Publication number: 20240169023

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE OPERATIONS TO BE PERFORMED BY CORRESPONDING STREAMING MULTIPROCESSORS

Publication number: 20240168763

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
SOFTWARE-DIRECTED REGISTER FILE SHARING

Publication number: 20230144553

Abstract: A computing system including one or more processors and one or more memories that store application code that configures the processors to execute an application. The system includes logic to identify high and low register utilization regions of the application code and insert register acquire instructions and register release instructions in the application code by the compiler, such that when executed by the processor, the application code borrows and returns registers to an inter-block register pool when execution enters a high and low register utilization region, respectively.

Type: Application

Filed: March 17, 2022

Publication date: May 11, 2023

Inventors: Sana Damani, Sean Treichler, Mark Stephenson
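
The acquire/release insertion described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the method claimed in the application: the function name `insert_pool_ops`, the `baseline` per-block allocation, and the tuple-based "instruction stream" are all hypothetical.

```python
def insert_pool_ops(regions, baseline):
    """Insert ACQUIRE/RELEASE markers around regions of differing register demand.

    regions:  list of (region_name, registers_needed) in execution order
              (hypothetical representation of the application code).
    baseline: registers a block owns without borrowing (assumed parameter).
    """
    out = []
    borrowed = 0  # registers currently borrowed from the shared pool
    for name, needed in regions:
        extra = max(needed - baseline, 0)
        if extra > borrowed:
            # Entering a higher-utilization region: borrow the difference.
            out.append(("ACQUIRE", extra - borrowed))
        elif extra < borrowed:
            # Entering a lower-utilization region: return the surplus.
            out.append(("RELEASE", borrowed - extra))
        borrowed = extra
        out.append(("REGION", name))
    if borrowed:
        # Return anything still borrowed at the end of the code.
        out.append(("RELEASE", borrowed))
    return out


ops = insert_pool_ops([("load", 16), ("matmul", 48), ("store", 16)], baseline=32)
for op in ops:
    print(op)
```

In this toy stream only the `matmul` region exceeds the baseline, so it is the only region bracketed by an acquire and a release.
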
SOFTWARE-DIRECTED DIVERGENT BRANCH TARGET PRIORITIZATION

Publication number: 20230115044

Abstract: Instruction set architecture extensions to configure priority ordering of divergent target branch instructions on SIMT computing platforms to enable tools such as compilers (e.g., under influence of execution profilers) or human software developers to configure branch direction prioritization explicitly in code. Extensions for simple (two-way) branch instructions as well as multi-target (more than two branch target instructions) are disclosed.

Type: Application

Filed: January 4, 2022

Publication date: April 13, 2023

Applicant: NVIDIA Corp.

Inventors: Sana Damani, Sean Treichler, Mark Stephenson, Daniel Robert Johnson
System and method for runtime scheduling of GPU tasks

Patent number: 9891949

Abstract: A method for scheduling work for processing by a GPU is disclosed. The method includes accessing a work completion data structure and accessing a work tracking data structure. Dependency logic analysis is then performed using work completion data and work tracking data. Work items that have dependencies are then launched into the GPU by using a software work item launch interface.

Type: Grant

Filed: March 6, 2013

Date of Patent: February 13, 2018

Assignee: Nvidia Corporation

Inventors: Timothy Paul Lottes, Daniel Wexler, Craig Duttweiler, Sean Treichler, Luke Durant, Philip Cuadra
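
As a rough illustration of the dependency analysis this abstract describes, the sketch below keeps a work tracking structure (items and their dependencies) and a work completion structure (a set of finished items), and launches an item once everything it depends on is complete. That launch condition, the names `WorkItem`, `ready_items`, and `run_all`, and the `launch` callback are all assumptions made for the example. Completion is recorded immediately after launch, which a real GPU runtime would not do.

```python
from dataclasses import dataclass, field


@dataclass
class WorkItem:
    name: str
    deps: set = field(default_factory=set)  # names this item depends on


def ready_items(tracking, completed):
    """Names of unfinished items whose dependencies are all completed."""
    return [item.name for item in tracking
            if item.name not in completed and item.deps <= completed]


def run_all(tracking, launch):
    """Launch every item in dependency order; return the launch order."""
    completed = set()  # stand-in for the work completion data structure
    order = []
    while len(completed) < len(tracking):
        batch = ready_items(tracking, completed)
        if not batch:
            raise ValueError("dependency cycle or missing work item")
        for name in batch:
            launch(name)  # stand-in for the software work item launch interface
            order.append(name)
        # Simplification: treat launched items as finished right away.
        completed.update(batch)
    return order


tracking = [WorkItem("a"), WorkItem("b", {"a"}), WorkItem("c", {"a", "b"})]
print(run_all(tracking, launch=lambda name: None))
```
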
SYSTEM AND METHOD FOR RUNTIME SCHEDULING OF GPU TASKS

Publication number: 20140259016

Abstract: A method for scheduling work for processing by a GPU is disclosed. The method includes accessing a work completion data structure and accessing a work tracking data structure. Dependency logic analysis is then performed using work completion data and work tracking data. Work items that have dependencies are then launched into the GPU by using a software work item launch interface.

Type: Application

Filed: March 6, 2013

Publication date: September 11, 2014

Applicant: NVIDIA CORPORATION

Inventors: Timothy Paul Lottes, Daniel Wexler, Craig Duttweiler, Sean Treichler, Luke Durant, Philip Cuadra
Method and system for selecting a set of parameters

Patent number: 7809782

Abstract: A method for selecting a set of parameters from a parameter space of a contemplated implementation of a pipelined processor for configuring the processor to generate an output word in response to each of a set of input words. The method includes determining a mapping between each set of parameters in the parameter space and the area of an integrated circuit implementation of the processor, and searching the parameter space to select a preferred set of the parameters that minimizes the area of the integrated circuit implementation subject to the constraints that each of the input word and the output word has specified format and that the preferred set of the parameters results in no more than a specified maximum error between the function of each of the input values and the approximation of the function of said each of the input values.

Type: Grant

Filed: September 6, 2006

Date of Patent: October 5, 2010

Assignee: NVIDIA Corporation

Inventors: Nicholas J. Foskett, Robert J. Prevett, Jr., Sean Treichler
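
The constrained search in this abstract (minimize area subject to a maximum error) can be sketched as an exhaustive scan over a small parameter space. Everything concrete here is an assumption for illustration: the approximated function (1/x on [1, 2)), the degree-1 piecewise form, the two parameters (segment count and coefficient fraction bits), and above all the `area` cost model, which simply counts coefficient storage bits and is not taken from the patent.

```python
from itertools import product


def quantize(value, frac_bits):
    """Round value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def max_error(num_segments, frac_bits, samples=512):
    """Worst sampled error of a degree-1 piecewise approximation of 1/x."""
    width = 1.0 / num_segments
    worst = 0.0
    for j in range(samples):
        x = 1.0 + j / samples
        i = min(int((x - 1.0) * num_segments), num_segments - 1)
        x0 = 1.0 + (i + 0.5) * width  # segment midpoint
        c0 = quantize(1.0 / x0, frac_bits)
        c1 = quantize(-1.0 / x0 ** 2, frac_bits)
        worst = max(worst, abs(c0 + c1 * (x - x0) - 1.0 / x))
    return worst


def area(num_segments, frac_bits):
    # Hypothetical cost model: two stored coefficients per segment.
    return num_segments * 2 * frac_bits


def select_parameters(segment_options, bit_options, error_limit):
    """Return (area, segments, bits) of the cheapest feasible point, or None."""
    best = None
    for segs, bits in product(segment_options, bit_options):
        if max_error(segs, bits) > error_limit:
            continue  # violates the accuracy constraint
        cost = area(segs, bits)
        if best is None or cost < best[0]:
            best = (cost, segs, bits)
    return best


print(select_parameters([8, 16, 32, 64], [8, 10, 12, 14, 16], error_limit=1e-3))
```

An exhaustive scan is only practical for small spaces like this one; it is used here because it makes the constraint and the objective easy to see.
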
SCALABLE SHADER ARCHITECTURE

Publication number: 20080094405

Abstract: A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks.

Type: Application

Filed: December 14, 2007

Publication date: April 24, 2008

Inventors: Rui Bastos, Karim Abdalla, Christian Rouet, Michael Toksvig, Johnny Rhoades, Roger Allen, John Tynefield, Emmett Kilgariff, Gary Tarolli, Brian Cabral, Craig Wittenbrink, Sean Treichler
Method and system for performing pipelined reciprocal and reciprocal square root operations

Patent number: 7117238

Abstract: A pipelined circuit configured to generate a Taylor's series approximation of at least one function, preferably at least one of the reciprocal and the reciprocal square root, of an input value. The circuit is preloaded with or configured to generate a predetermined set of Taylor's series coefficients for each segment of the input value range. Other aspects of the invention are methods for determining preferred parameters for elements of such a circuit, a circuit designed in accordance with such a method, and a system (e.g., a pipelined graphics processor) for and method of pipelined graphics data processing using any embodiment of the circuit. The preferred parameters are determined by minimizing the circuit's size subject to constraints on input and output value format and output accuracy, assuming a specific function to be approximated and a specific degree for the approximation but allowing variation of parameters such as coefficient width and number of input value range segments.

Type: Grant

Filed: September 19, 2002

Date of Patent: October 3, 2006

Assignee: NVIDIA Corporation

Inventors: Nicholas J. Foskett, Robert J. Prevett, Jr., Sean Treichler
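
The per-segment Taylor coefficients mentioned in this abstract can be illustrated in software. The sketch below is not the patented circuit: it assumes inputs in [1, 2), equal-width segments, expansion about each segment midpoint, and floating-point rather than fixed-point arithmetic. The names `build_table` and `approx_reciprocal` are hypothetical. It relies on the fact that the k-th Taylor coefficient of 1/x about x0 is (-1)^k / x0^(k+1).

```python
def build_table(num_segments, degree=2):
    """Precompute Taylor coefficients of 1/x about each segment midpoint."""
    width = 1.0 / num_segments
    table = []
    for i in range(num_segments):
        x0 = 1.0 + (i + 0.5) * width
        # k-th Taylor coefficient of 1/x at x0 is (-1)**k / x0**(k + 1).
        coeffs = [(-1) ** k / x0 ** (k + 1) for k in range(degree + 1)]
        table.append((x0, coeffs))
    return table


def approx_reciprocal(x, table):
    """Approximate 1/x for 1 <= x < 2 using the segment's stored coefficients."""
    num_segments = len(table)
    i = min(int((x - 1.0) * num_segments), num_segments - 1)  # segment index
    x0, coeffs = table[i]
    dx = x - x0
    result = 0.0
    for c in reversed(coeffs):  # Horner evaluation of the polynomial in dx
        result = result * dx + c
    return result


table = build_table(num_segments=64)
worst = max(abs(approx_reciprocal(1.0 + j / 1000, table) - 1.0 / (1.0 + j / 1000))
            for j in range(1000))
print(worst)
```

With 64 segments and degree 2, the truncation error is bounded by (1/128)^3, a little under 5e-7, which shows how segment count trades table size against accuracy.
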
System and method for remotely configuring semiconductor functional circuits

Publication number: 20060004536

Abstract: The present invention systems and methods facilitate configuration of functional components included in a remotely located integrated circuit die. In one exemplary implementation, a die functional component reconfiguration request process is engaged in wherein a system requests a reconfiguration code from a remote centralized resource. A reconfiguration code production process is executed in which a request for a reconfiguration code and a permission indicator are received, validity of permission indicator is analyzed, and a reconfiguration code is provided if the permission indicator is valid. A die functional component configuration process is performed on the die when an appropriate reconfiguration code is received by the die. The functional component configuration process includes directing alteration of a functional component configuration. Workflow is diverted from disabled functional components to enabled functional components.

Type: Application

Filed: December 18, 2003

Publication date: January 5, 2006

Inventors: Michael Diamond, John Montrym, James Van Dyke, Michael Nagy, Sean Treichler
System and method for enhancing depth value processing in a graphics pipeline

Patent number: 6980208

Abstract: A system, method and computer program product are provided for performing depth testing and blending operations in a first mode and a second mode. In the first mode, a circuit processes a first number (m) of first pixels per clock cycle, each of the first pixels including both color values and depth values. In the second mode, the circuit processes a second number (n) of second pixels per clock cycle. Each of the second pixels includes the depth values and not the color values. Further, the second number (n) is greater than the first number (m).

Type: Grant

Filed: September 3, 2002

Date of Patent: December 27, 2005

Assignee: NVIDIA Corporation

Inventors: John Montrym, Jonah M. Alben, Sean Treichler, John M. Danskin, Gary Tarolli
Integrated circuit configuration system and method

Publication number: 20050261863

Abstract: The present invention systems and methods enable configuration of functional components in integrated circuits. A present invention system and method can flexibly change the operational characteristics of functional components in an integrated circuit die based upon a variety of factors, including if the die has a defective component. An indication of the defective functional component identification is received. A determination is made if the defective functional component is one of a plurality of similar functional components that can provide the same functionality. The other similar components can be examined to determine if they are parallel components to the defective functional component. The defective functional component is disabled if it is one of the plurality of similar functional components and another component can handle the workflow that would otherwise be assigned to the defective component. Workflow is diverted from the disabled component to other similar functional components.

Type: Application

Filed: December 18, 2003

Publication date: November 24, 2005

Inventors: James Van Dyke, John Montrym, Michael Nagy, Sean Treichler
System and method for increasing die yield

Publication number: 20050251358

Abstract: The present invention systems and methods facilitate increased die yields by flexibly changing the operational characteristics of functional components in an integrated circuit die. The present invention system and method enable integrated circuit chips with defective functional components to be salvaged. Defective functional components in the die are disabled in a manner that maintains the basic functionality of the chip. A chip is tested and a functional component configuration process is performed on the chip based upon results of the testing. If an indication of a defective functional component is received, the functional component is disabled. Workflow is diverted from disabled functional components to enabled functional components.

Type: Application

Filed: December 18, 2003

Publication date: November 10, 2005

Inventors: James Van Dyke, John Montrym, Michael Nagy, Sean Treichler
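
The idea of diverting workflow from disabled functional components to enabled ones can be shown with a toy distributor. This is a sketch under assumptions, not the mechanism in the application: the round-robin policy, the `enabled` flag list (imagined as the result of die testing), and the name `distribute` are all hypothetical.

```python
def distribute(work_items, enabled):
    """Assign work items round-robin to enabled components only.

    enabled: list of booleans, one per functional component; False marks a
             component disabled after testing (assumed representation).
    """
    active = [i for i, ok in enumerate(enabled) if ok]
    if not active:
        raise ValueError("no enabled functional components")
    assignment = {i: [] for i in active}
    for n, item in enumerate(work_items):
        assignment[active[n % len(active)]].append(item)
    return assignment


# Component 1 is marked defective, so its share goes to components 0, 2 and 3.
print(distribute(list(range(6)), [True, False, True, True]))
# → {0: [0, 3], 2: [1, 4], 3: [2, 5]}
```
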
Integrated circuit configuration system and method

Publication number: 20050251761

Abstract: The present invention systems and methods enable configuration of functional components in integrated circuits. A present invention system and method can flexibly change the operational characteristics of functional components in an integrated circuit die based upon a variety of factors. In one embodiment, manufacturing yields, compatibility characteristics, performance requirements, and system health (e.g., the number of components operating properly) are factored into changes to the operational characteristics of functional components. In one exemplary implementation, the changes to operational characteristics of a functional component are coordinated with changes to other functional components. Workflow scheduling and distribution is also adjusted based upon the changes to the operational characteristics of the functional components. For example, a functional component configuration controller changes the operational characteristics settings and provides an indication to a workflow distribution component.

Type: Application

Filed: December 18, 2003

Publication date: November 10, 2005

Inventors: Michael Diamond, John Montrym, James Van Dyke, Michael Nagy, Sean Treichler
System and method for a high bandwidth-low latency memory controller

Patent number: 6957298

Abstract: A memory controller system is provided including a plurality of memory controller subsystems each coupled between memory and one of a plurality of computer components. Each memory controller subsystem includes at least one queue for managing pages in the memory. In use, each memory controller subsystem is capable of being loaded from the associated computer component independent of the state of the memory. Since high bandwidth and low latency are conflicting requirements in high performance memory systems, the present invention separates references from various computer components into multiple command streams. Each stream thus can hide precharge and activate bank preparation commands within its own stream for maximum bandwidth. A page context switch technique may be employed that allows instantaneous switching from one look ahead stream to another to allow low latency and high bandwidth while preserving maximum bank state from the previous stream.

Type: Grant

Filed: September 8, 2003

Date of Patent: October 18, 2005

Assignee: NVIDIA Corporation

Inventors: James M. Van Dyke, Nicholas J. Foskett, Brad Simeral, Sean Treichler
Scalable shader architecture

Publication number: 20050225554

Abstract: A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks.

Type: Application

Filed: September 10, 2004

Publication date: October 13, 2005

Inventors: Rui Bastos, Karim Abdalla, Christian Rouet, Michael Toksvig, Johnny Rhoades, Roger Allen, John Tynefield, Emmett Kilgariff, Gary Tarolli, Brian Cabral, Craig Wittenbrink, Sean Treichler
High bandwidth-low latency memory controller

Patent number: 6647456

Abstract: A memory controller system is provided including a plurality of memory controller subsystems each coupled between memory and one of a plurality of computer components. Each memory controller subsystem includes at least one queue for managing pages in the memory. In use, each memory controller subsystem is capable of being loaded from the associated computer component independent of the state of the memory. Since high bandwidth and low latency are conflicting requirements in high performance memory systems, the present invention separates references from various computer components into multiple command streams. Each stream thus can hide precharge and activate bank preparation commands within its own stream for maximum bandwidth.

Type: Grant

Filed: February 23, 2001

Date of Patent: November 11, 2003

Assignee: NVIDIA Corporation

Inventors: James M. Van Dyke, Nicholas J. Foskett, Brad Simeral, Sean Treichler