Patents by Inventor Sean J. Treichler

Sean J. Treichler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210019185
    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
    Type: Application
    Filed: October 5, 2020
    Publication date: January 21, 2021
    Inventors: Jerome F. DULUK, JR., Lacky V. SHAH, Sean J. TREICHLER
  • Patent number: 10795722
    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: October 6, 2020
    Assignee: NVIDIA Corporation
    Inventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
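    The priority-grouped task lists described in the two entries above can be sketched in software. The Python sketch below is purely illustrative: the names TaskMetadata, TaskScheduler, and select_next, and the highest-priority-first selection scheme, are assumptions rather than the hardware's actual structures or policies.

        from collections import deque

        class TaskMetadata:
            """Illustrative stand-in for a TMD: the state and parameters of one compute task."""
            def __init__(self, task_id, priority, params):
                self.task_id = task_id
                self.priority = priority      # lower number = higher priority in this sketch
                self.params = params          # launch state: grid size, entry point, and so on

        class TaskScheduler:
            """Keeps one list of TMD references per priority level, echoing the linked lists of TMD pointers."""
            def __init__(self, num_priorities=4):
                self.groups = [deque() for _ in range(num_priorities)]

            def add_task(self, tmd):
                self.groups[tmd.priority].append(tmd)   # append at the tail of that priority's list

            def select_next(self):
                # One possible scheduling scheme: drain the highest-priority non-empty group first.
                for group in self.groups:
                    if group:
                        return group.popleft()
                return None

        sched = TaskScheduler()
        sched.add_task(TaskMetadata(1, priority=2, params={"grid": (64, 1, 1)}))
        sched.add_task(TaskMetadata(2, priority=0, params={"grid": (8, 8, 1)}))
        print(sched.select_next().task_id)   # prints 2: tasks are selected out of submission order, by priority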
  • Patent number: 9921873
    Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a “cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: March 20, 2018
    Assignee: NVIDIA Corporation
    Inventors: Lacky V. Shah, Karim M. Abdalla, Sean J. Treichler, Abraham B. de Waal
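    As a rough model of the work distribution parameters described in the abstract above, the sketch below launches CTAs once enough work queue entries have been written and advances the read pointer by a step size. The parameter names (entries_per_cta, ctas_per_group, step_size) are assumptions for illustration, not the actual TMD fields.

        class WorkDistribution:
            """Illustrative work-distribution parameters carried in a task's metadata (names assumed)."""
            def __init__(self, entries_per_cta, ctas_per_group, step_size):
                self.entries_per_cta = entries_per_cta   # queue entries needed before a CTA launches
                self.ctas_per_group = ctas_per_group     # CTAs launched to process the same entries
                self.step_size = step_size               # how far the read pointer advances per launch

        def launch_ctas(queue, written_count, read_ptr, dist):
            """Return (launches, new_read_ptr) once enough entries have been written."""
            launches = []
            while written_count - read_ptr >= dist.entries_per_cta:
                entries = queue[read_ptr:read_ptr + dist.entries_per_cta]
                for _ in range(dist.ctas_per_group):
                    launches.append(entries)             # each CTA in the group sees the same entries
                read_ptr += dist.step_size               # step size need not equal entries consumed
            return launches, read_ptr

        queue = list(range(8))
        dist = WorkDistribution(entries_per_cta=4, ctas_per_group=2, step_size=4)
        launches, read_ptr = launch_ctas(queue, written_count=8, read_ptr=0, dist=dist)
        print(len(launches), read_ptr)   # 4 CTA launches (2 entry groups x 2 CTAs each), read pointer at 8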
  • Patent number: 9442759
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context increasing utilization of parallel processing units.
    Type: Grant
    Filed: December 9, 2011
    Date of Patent: September 13, 2016
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Philip Browning Johnson, Jonathon Stuart Ramsay Evans
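    The sketch below models the channel-switching behavior described in the abstract above: channels in one time slice group share a single context, and a stalled channel simply yields to the next channel with independent work, with no context switch in between. The class and field names are illustrative assumptions.

        from collections import deque

        class Channel:
            """A stream of work; shares context with the other channels in its TSG (illustrative)."""
            def __init__(self, name, work):
                self.name = name
                self.work = deque(work)   # items are work labels or a "STALL" marker

        class TimeSliceGroup:
            def __init__(self, context, channels):
                self.context = context        # one context shared by every channel in the group
                self.channels = channels      # processed in a fixed, pre-determined order

            def run(self):
                """Cycle through the channels; switching channels requires no context switch."""
                pending = deque(self.channels)
                while pending:
                    ch = pending.popleft()
                    while ch.work and ch.work[0] != "STALL":
                        print(f"{ch.name}: {ch.work.popleft()} (context={self.context})")
                    if ch.work:                   # channel stalled: drop the marker and move on
                        ch.work.popleft()         # to a channel with independent work
                        pending.append(ch)

        tsg = TimeSliceGroup("ctx0", [Channel("chanA", ["a1", "STALL", "a2"]),
                                      Channel("chanB", ["b1", "b2"])])
        tsg.run()   # a1, then b1 and b2 while chanA is stalled, then a2, all under ctx0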
  • Patent number: 9147224
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: September 29, 2015
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
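    A minimal software model of the versioned state objects described above: each pipeline stage takes a reference on a version only if the version is relevant to it, assigns it to subsequent work, and forwards it downstream. Names such as StateVersion and relevant_keys are assumptions made for the sketch.

        class StateVersion:
            """One version of a state object flowing down the pipeline (illustrative)."""
            def __init__(self, version, fields):
                self.version = version
                self.fields = fields          # e.g. {"blend": "add"}
                self.ref_count = 0            # one count per stage still holding this version

        class PipelineStage:
            def __init__(self, name, relevant_keys, next_stage=None):
                self.name = name
                self.relevant_keys = relevant_keys
                self.current = None
                self.next_stage = next_stage

            def receive(self, state):
                # Only take a reference if the version actually matters to this stage.
                if self.relevant_keys.intersection(state.fields):
                    state.ref_count += 1
                    self.current = state       # assigned to work requests arriving after this point
                if self.next_stage:
                    self.next_stage.receive(state)   # transmit the version to the next stage

        raster = PipelineStage("raster", {"scissor"})
        geom = PipelineStage("geometry", {"blend", "scissor"}, next_stage=raster)
        v1 = StateVersion(1, {"blend": "add"})
        geom.receive(v1)
        print(v1.ref_count)   # 1: only the geometry stage found this version relevant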
  • Patent number: 9098383
    Abstract: One embodiment of the present invention sets forth a crossbar unit that is coupled to a plurality of client subsystems. The crossbar unit is configured to transmit data packets between the client subsystems and includes a high-bandwidth channel and a narrow-bandwidth channel. The high-bandwidth channel is used for transmitting large data packets, while the narrow-bandwidth channel is used for transmitting smaller data packets. The transmission of data packets may be prioritized based on the source and destination clients as well as the type of data being transmitted. Further, the crossbar unit includes a buffer mechanism for buffering data packets received from source clients until those data packets can be received by the destination clients.
    Type: Grant
    Filed: June 2, 2010
    Date of Patent: August 4, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Sean J. Treichler, Dane T. Mrazek, Yin Fung (David) Tang, David B. Glasco, Colyn Scott Case, Emmett M. Kilgariff
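    The channel selection, prioritization, and buffering described in the abstract above can be approximated in a few lines. The 32-byte threshold, the Crossbar class, and the priority tie-breaking below are all assumptions for illustration, not the actual hardware behavior.

        from collections import defaultdict, deque

        LARGE_PACKET_THRESHOLD = 32   # bytes; assumed cutoff for choosing a channel

        class Crossbar:
            """Illustrative crossbar: wide channel for large packets, narrow channel for small ones,
            with per-destination buffering until the destination client can accept data."""
            def __init__(self):
                self.buffers = defaultdict(deque)   # destination -> pending packets

            def send(self, src, dst, payload, priority=0):
                channel = "high-bandwidth" if len(payload) >= LARGE_PACKET_THRESHOLD else "narrow"
                self.buffers[dst].append((priority, src, channel, payload))

            def deliver(self, dst):
                """Destination pulls its highest-priority buffered packet when it is ready."""
                if not self.buffers[dst]:
                    return None
                packet = max(self.buffers[dst], key=lambda p: p[0])
                self.buffers[dst].remove(packet)
                return packet

        xbar = Crossbar()
        xbar.send("L2", "SM0", b"x" * 128)                 # large: routed over the wide channel
        xbar.send("host", "SM0", b"ack", priority=1)       # small but higher priority
        print(xbar.deliver("SM0")[2])                      # 'narrow': priority decides delivery order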
  • Patent number: 8984183
    Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system, which allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: March 17, 2015
    Assignee: Nvidia Corporation
    Inventors: Timothy John Purcell, Lacky V. Shah, Jerome F. Duluk, Jr., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris
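    The reflected-notification idea described above, where a write performed by a running task itself triggers a notification to the scheduling unit without CPU involvement, can be sketched loosely in software. SchedulingUnit and write_dynamic_task are illustrative names, and the queue stands in for the hardware signal.

        import queue

        class SchedulingUnit:
            """Illustrative scheduler that reacts to notifications rather than CPU calls."""
            def __init__(self):
                self.notifications = queue.Queue()

            def notify(self, event):
                self.notifications.put(event)    # stands in for the reflected notification signal

        def write_dynamic_task(task_queue, scheduler, task_data):
            """A running compute task writes a dynamic task; the write itself produces the notification."""
            task_queue.append(task_data)
            scheduler.notify(("new_task", len(task_queue) - 1))   # no CPU intervention modeled here

        scheduler = SchedulingUnit()
        tasks = []
        write_dynamic_task(tasks, scheduler, {"kernel": "child", "grid": (16, 1, 1)})
        print(scheduler.notifications.get())   # ('new_task', 0)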
  • Patent number: 8976185
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: March 10, 2015
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
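    A small sketch of the reference count object described above: the counter is incremented as stages take the version, decremented as they finish their work, and a caller-supplied operation runs exactly when the count reaches zero. The method names acquire and release are assumptions.

        class ReferenceCountObject:
            """Illustrative reference count tied to one version of a state object; runs a
            caller-supplied operation once every stage has finished its work."""
            def __init__(self, on_zero):
                self.count = 0
                self.on_zero = on_zero

            def acquire(self):
                self.count += 1

            def release(self):
                self.count -= 1
                if self.count == 0:
                    self.on_zero()            # e.g. retire the old version or signal completion

        ref = ReferenceCountObject(on_zero=lambda: print("version retired"))
        for _ in range(2):                    # two stages found the version relevant
            ref.acquire()
        ref.release()                         # first stage finishes its work
        ref.release()                         # last stage finishes: "version retired" prints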
  • Patent number: 8948167
    Abstract: One embodiment of the present invention is a control unit for distributing packets of work to one or more consumers of work. The control unit is configured to assign at least one processing domain from a set of processing domains to each consumer included in the one or more consumers, receive a plurality of packets of work from at least one producer of work, wherein each packet of work is associated with a processing domain from the set of processing domains, and a first packet of work associated with a first processing domain can be processed by the one or more consumers independently of a second packet of work associated with a second processing domain, identify a first consumer that has been assigned the first processing domain, and transmit the first packet of work to the first consumer for processing.
    Type: Grant
    Filed: September 15, 2011
    Date of Patent: February 3, 2015
    Assignee: NVIDIA Corporation
    Inventors: Lacky V. Shah, Sean J. Treichler, Abraham B. de Waal
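    The domain-based routing in the abstract above can be modeled in software: consumers register the processing domains they handle, and each packet of work is delivered to a consumer that owns the packet's domain. The ControlUnit class and its hash-based tie-breaking are illustrative assumptions.

        class ControlUnit:
            """Illustrative control unit: consumers are assigned processing domains, and each
            packet is routed to a consumer that owns the packet's domain."""
            def __init__(self):
                self.domain_to_consumers = {}

            def assign(self, consumer, domains):
                for d in domains:
                    self.domain_to_consumers.setdefault(d, []).append(consumer)

            def route(self, packet):
                domain, payload = packet
                consumers = self.domain_to_consumers[domain]
                # Packets in different domains are independent, so any owner of the domain will do.
                return consumers[hash(payload) % len(consumers)], payload

        cu = ControlUnit()
        cu.assign("consumer0", domains=["graphics"])
        cu.assign("consumer1", domains=["compute", "graphics"])
        print(cu.route(("compute", "kernelA")))   # routed to consumer1, the only owner of "compute"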
  • Patent number: 8941653
    Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
    Type: Grant
    Filed: November 18, 2013
    Date of Patent: January 27, 2015
    Assignee: NVIDIA Corporation
    Inventors: Steven E. Molnar, Emmett M. Kilgariff, John S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
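    The ordering guarantee described above, rasterizing primitives in parallel while keeping the API submission order per pixel, can be illustrated with a toy software model. Threads stand in for independent rasterizer units, and the sequence number attached to each primitive is what restores per-pixel order; everything else is an assumption of the sketch.

        from collections import defaultdict
        from concurrent.futures import ThreadPoolExecutor

        def rasterize(seq_and_prim):
            """Toy rasterizer: a primitive is just the set of pixels it covers plus a color."""
            seq, prim = seq_and_prim
            return [(pixel, seq, prim["color"]) for pixel in prim["pixels"]]

        primitives = [{"pixels": [(0, 0), (1, 0)], "color": "red"},    # submitted first
                      {"pixels": [(1, 0)], "color": "blue"}]           # submitted second

        # Rasterize concurrently on independent units (threads stand in for rasterizer units).
        with ThreadPoolExecutor(max_workers=2) as pool:
            fragments = [f for frags in pool.map(rasterize, enumerate(primitives)) for f in frags]

        # Per pixel, apply fragments in API submission order, whatever order they were produced in.
        per_pixel = defaultdict(list)
        for pixel, seq, color in fragments:
            per_pixel[pixel].append((seq, color))
        framebuffer = {pixel: max(frags)[1] for pixel, frags in per_pixel.items()}
        print(framebuffer[(1, 0)])   # 'blue': the later-submitted primitive wins the shared pixel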
  • Patent number: 8872833
    Abstract: The present invention systems and methods enable configuration of functional components in integrated circuits. A present invention system and method can flexibly change the operational characteristics of functional components in an integrated circuit die based upon a variety of factors, including if the die has a defective component. An indication identifying the defective functional component is received. A determination is made as to whether the defective functional component is one of a plurality of similar functional components that can provide the same functionality. The other similar components can be examined to determine if they are parallel components to the defective functional component. The defective functional component is disabled if it is one of the plurality of similar functional components and another component can handle the workflow that would otherwise be assigned to the defective component. Workflow is diverted from the disabled component to other similar functional components.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: October 28, 2014
    Assignee: Nvidia Corporation
    Inventors: James M. Van Dyke, John S. Montrym, Michael B. Nagy, Sean J. Treichler
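    The salvage flow shared by this entry and the related entries below can be sketched as follows: a defective unit is disabled only if a parallel unit of the same kind remains enabled, and work is then diverted to the enabled units. The class and function names are illustrative assumptions.

        class FunctionalUnit:
            """Illustrative functional component on a die, e.g. one of several identical pipelines."""
            def __init__(self, name, kind):
                self.name = name
                self.kind = kind          # units of the same kind can take over each other's work
                self.enabled = True

        def disable_defective(units, defective_name):
            """Disable the defective unit only if a parallel unit of the same kind can absorb its work."""
            defective = next(u for u in units if u.name == defective_name)
            substitutes = [u for u in units if u.kind == defective.kind
                           and u.name != defective.name and u.enabled]
            if substitutes:
                defective.enabled = False
            return substitutes

        def distribute(work, units):
            """Divert workflow away from disabled units to the remaining enabled ones."""
            enabled = [u for u in units if u.enabled]
            return {u.name: work[i::len(enabled)] for i, u in enumerate(enabled)}

        units = [FunctionalUnit("shader0", "shader"), FunctionalUnit("shader1", "shader")]
        disable_defective(units, "shader0")
        print(distribute(["job%d" % i for i in range(4)], units))   # all jobs land on shader1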
  • Patent number: 8788996
    Abstract: The present invention systems and methods enable configuration of functional components in integrated circuits. A present invention system and method can flexibly change the operational characteristics of functional components in an integrated circuit die based upon a variety of factors. In one embodiment, manufacturing yields, compatibility characteristics, performance requirements, and system health (e.g., the number of components operating properly) are factored into changes to the operational characteristics of functional components. In one exemplary implementation, the changes to operational characteristics of a functional component are coordinated with changes to other functional components. Workflow scheduling and distribution is also adjusted based upon the changes to the operational characteristics of the functional components. For example, a functional component configuration controller changes the operational characteristics settings and provides an indication to a workflow distribution component.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 22, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael B. Diamond, John S. Montrym, James M. Van Dyke, Michael B. Nagy, Sean J. Treichler
  • Patent number: 8775112
    Abstract: The present invention systems and methods facilitate increased die yields by flexibly changing the operational characteristics of functional components in an integrated circuit die. The present invention system and method enable integrated circuit chips with defective functional components to be salvaged. Defective functional components in the die are disabled in a manner that maintains the basic functionality of the chip. A chip is tested and a functional component configuration process is performed on the chip based upon results of the testing. If an indication of a defective functional component is received, the functional component is disabled. Workflow is diverted from disabled functional components to enabled functional components.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 8, 2014
    Assignee: Nvidia Corporation
    Inventors: James M. Van Dyke, John S. Montrym, Michael B. Nagy, Sean J. Treichler
  • Patent number: 8768642
    Abstract: The present invention systems and methods facilitate configuration of functional components included in a remotely located integrated circuit die. In one exemplary implementation, a die functional component reconfiguration request process is engaged in wherein a system requests a reconfiguration code from a remote centralized resource. A reconfiguration code production process is executed in which a request for a reconfiguration code and a permission indicator are received, the validity of the permission indicator is analyzed, and a reconfiguration code is provided if the permission indicator is valid. A die functional component configuration process is performed on the die when an appropriate reconfiguration code is received by the die. The functional component configuration process includes directing alteration of a functional component configuration. Workflow is diverted from disabled functional components to enabled functional components.
    Type: Grant
    Filed: December 18, 2003
    Date of Patent: July 1, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael B. Diamond, John S. Montrym, James M. Van Dyke, Michael B. Nagy, Sean J. Treichler
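    The remote reconfiguration flow in the abstract above can be approximated with a small sketch: the centralized resource validates a permission indicator before issuing a code, and the die applies the change only when the expected code arrives. The shared secret, the SHA-256 derivation, and the "licensed" check are all assumptions for illustration.

        import hashlib

        SECRET = "vendor-secret"   # assumed shared secret; purely illustrative

        def issue_reconfiguration_code(die_id, permission_indicator):
            """Centralized resource: validate the permission indicator, then issue a code."""
            if permission_indicator != "licensed":            # placeholder validity check
                raise PermissionError("invalid permission indicator")
            return hashlib.sha256(f"{die_id}:{SECRET}".encode()).hexdigest()[:16]

        def apply_reconfiguration(die, code, expected_code, component, enable):
            """On-die configuration process: only act on a code issued for this die."""
            if code != expected_code:
                raise ValueError("reconfiguration code rejected")
            die[component] = enable        # alter the functional component's configuration

        die = {"shader2": False}
        code = issue_reconfiguration_code("die-42", "licensed")
        apply_reconfiguration(die, code, expected_code=code, component="shader2", enable=True)
        print(die)   # {'shader2': True}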
  • Patent number: 8760460
    Abstract: One embodiment of the present invention sets forth a technique for using a shared memory to store hardware-managed virtual buffers. A circular buffer is allocated within a general-purpose multi-use cache for storage of primitive attribute data rather than having a dedicated buffer for the storage of the primitive attribute data. The general-purpose multi-use cache is also configured to store other graphics data since the space requirement for primitive attribute data storage is highly variable, depending on the number of attributes and the size of primitives. Entries in the circular buffer are allocated as needed and released and invalidated after the primitive attribute data has been consumed. An address to the circular buffer entry is transmitted along with primitive descriptors from object-space processing to the distributed processing in screen-space.
    Type: Grant
    Filed: May 4, 2010
    Date of Patent: June 24, 2014
    Assignee: NVIDIA Corporation
    Inventors: Emmett M. Kilgariff, Steven E. Molnar, Sean J. Treichler, Johnny S. Rhoades, Gernot Schaufler, Dale L. Kirkland, Cynthia Ann Edgeworth Allison, Karl M. Wurstner, Timothy John Purcell
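    A software approximation of the hardware-managed circular buffer described above: entries are allocated on demand, the returned slot index plays the role of the address that travels with the primitive descriptor, and entries are invalidated once consumed. The class and method names are assumptions.

        class CircularAttributeBuffer:
            """Illustrative circular buffer carved out of a shared cache for primitive attributes.
            Entries are allocated as needed and invalidated once their attributes are consumed."""
            def __init__(self, num_entries):
                self.entries = [None] * num_entries
                self.head = 0                      # next slot to allocate
                self.in_use = set()

            def allocate(self, attributes):
                slot = self.head
                if slot in self.in_use:
                    raise RuntimeError("buffer full: consumers have not released entries yet")
                self.entries[slot] = attributes
                self.in_use.add(slot)
                self.head = (slot + 1) % len(self.entries)
                return slot                        # this address travels with the primitive descriptor

            def consume(self, slot):
                attributes = self.entries[slot]
                self.entries[slot] = None          # release and invalidate after consumption
                self.in_use.discard(slot)
                return attributes

        cab = CircularAttributeBuffer(num_entries=4)
        slot = cab.allocate({"normal": (0, 0, 1), "uv": (0.5, 0.5)})
        print(slot, cab.consume(slot))             # 0 {'normal': (0, 0, 1), 'uv': (0.5, 0.5)}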
  • Publication number: 20140152652
    Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
    Type: Application
    Filed: November 18, 2013
    Publication date: June 5, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Steven E. MOLNAR, Emmett M. KILGARIFF, John S. RHOADES, Timothy John PURCELL, Sean J. TREICHLER, Ziyad S. HAKURA, Franklin C. CROW, James C. BOWMAN
  • Patent number: 8587581
    Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
    Type: Grant
    Filed: October 15, 2009
    Date of Patent: November 19, 2013
    Assignee: Nvidia Corporation
    Inventors: Steven E. Molnar, Emmett M. Kilgariff, Johnny S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
  • Patent number: 8547993
    Abstract: Methods, apparatuses, and systems are presented for performing asynchronous communications involving using an asynchronous interface to send signals between a source device and a plurality of client devices, the source device and the plurality of client devices being part of a processing unit capable of performing graphics operations, the source device being coupled to the plurality of client devices using the asynchronous interface, wherein the asynchronous interface includes at least one request signal, at least one address signal, at least one acknowledge signal, and at least one data signal, and wherein the asynchronous interface operates in accordance with at least one programmable timing characteristic associated with the source device.
    Type: Grant
    Filed: August 10, 2006
    Date of Patent: October 1, 2013
    Assignee: NVIDIA Corporation
    Inventors: Lincoln G. Garlick, Richard A. Silkebakken, Prakash G. Apte, Paolo E. Sabella, Samuel H. Duncan, Dennis K. Ma, Sean J. Treichler
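    The request/address/acknowledge/data handshake and the programmable timing characteristic described above can be sketched, very loosely, in software. The AsyncInterface class, the register map, and the ack_delay_s parameter are illustrative assumptions rather than the actual interface.

        import time

        class AsyncInterface:
            """Illustrative request/address/data/acknowledge handshake between a source device
            and its clients. The per-interface delay models a programmable timing characteristic."""
            def __init__(self, client_registers, ack_delay_s=0.001):
                self.client_registers = client_registers    # client -> {address: value}
                self.ack_delay_s = ack_delay_s              # programmable timing characteristic (assumed)

            def request(self, client, address, write_data=None):
                regs = self.client_registers[client]
                if write_data is not None:
                    regs[address] = write_data
                time.sleep(self.ack_delay_s)                # wait out the configured timing
                return {"ack": True, "data": regs.get(address)}

        bus = AsyncInterface({"clientA": {0x10: 0}}, ack_delay_s=0.0)
        print(bus.request("clientA", 0x10, write_data=7))   # {'ack': True, 'data': 7}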
  • Publication number: 20130198759
    Abstract: A technique for controlling the distribution of compute task processing in a multi-threaded system encodes each processing task as task metadata (TMD) stored in memory. The TMD includes work distribution parameters specifying how the processing task should be distributed for processing. Scheduling circuitry selects a task for execution when entries of a work queue for the task have been written. The work distribution parameters may define a number of work queue entries needed before a “cooperative thread array” (“CTA”) may be launched to process the work queue entries according to the compute task. The work distribution parameters may define a number of CTAs that are launched to process the same work queue entries. Finally, the work distribution parameters may define a step size that is used to update pointers to the work queue entries.
    Type: Application
    Filed: January 31, 2012
    Publication date: August 1, 2013
    Inventors: Lacky V. SHAH, Karim M. ABDALLA, Sean J. TREICHLER, Abraham B. DE WAAL
  • Publication number: 20130160021
    Abstract: One embodiment of the present invention sets forth a technique for enabling the insertion of generated tasks into a scheduling pipeline of a multiple processor system, which allows a compute task that is being executed to dynamically generate a dynamic task and notify a scheduling unit of the multiple processor system without intervention by a CPU. A reflected notification signal is generated in response to a write request when data for the dynamic task is written to a queue. Additional reflected notification signals are generated for other events that occur during execution of a compute task, e.g., to invalidate cache entries storing data for the compute task and to enable scheduling of another compute task.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Inventors: Timothy John PURCELL, Lacky V. Shah, Jerome F. Duluk, JR., Sean J. Treichler, Karim M. Abdalla, Philip Alexander Cuadra, Brian Pharris