Patents by Inventor Sean J. Treichler

Sean J. Treichler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20130152093
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the parallel processing unit can switch to the next channel with independent work so that it remains fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Phillip Browning Johnson, Jonathon Stuart Ramsay Evans
  • Publication number: 20130120413
    Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
  • Publication number: 20130120412
    Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
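
The reference-counting flow described in the abstract for publication 20130120412 can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation described in the application: the names `RefCount`, `StateVersion`, `Stage`, and `on_zero` are hypothetical, and a software callback stands in for the "operation specified by the reference count object".

```python
class RefCount:
    """Counter plus an operation to run when the counter returns to zero."""

    def __init__(self, on_zero):
        self.count = 0
        self.on_zero = on_zero  # hypothetical stand-in for the specified operation

    def acquire(self):
        self.count += 1

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.on_zero()


class StateVersion:
    """One version of a state object, tied to a reference count object."""

    def __init__(self, kind, payload, refcount):
        self.kind = kind
        self.payload = payload
        self.refcount = refcount


class Stage:
    """A pipeline stage that keeps only the state versions relevant to it."""

    def __init__(self, name, relevant_kinds, next_stage=None):
        self.name = name
        self.relevant_kinds = set(relevant_kinds)
        self.next_stage = next_stage
        self.held = []

    def receive(self, version):
        # Increment the counter only if this version matters to this stage.
        if version.kind in self.relevant_kinds:
            version.refcount.acquire()
            self.held.append(version)
        # Always pass the version on to the next stage.
        if self.next_stage is not None:
            self.next_stage.receive(version)

    def finish_work(self):
        # Simulate completing all work associated with the held versions.
        for version in self.held:
            version.refcount.release()
        self.held.clear()
```

In this sketch the callback fires only after every stage that took a reference has finished its work, which is the point at which the version is no longer in use anywhere in the pipeline.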
  • Publication number: 20130117751
    Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
    Type: Application
    Filed: November 9, 2011
    Publication date: May 9, 2013
    Inventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
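
The priority-grouped task lists described in the abstract for publication 20130117751 can be modeled as follows. This is a rough sketch under assumptions that are not in the abstract: the names `TaskMetadata` and `Scheduler` are hypothetical, a `deque` stands in for each linked list, priority 0 is treated as the highest, and the selection policy shown (highest priority first, oldest first within a group) is only one of the "different scheduling schemes" the abstract alludes to.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskMetadata:
    """Hypothetical stand-in for a TMD: state and parameters for one task."""
    name: str
    priority: int
    params: dict = field(default_factory=dict)


class Scheduler:
    """Keeps one list of pending tasks per priority level."""

    def __init__(self, num_priorities):
        # One queue per priority level; index 0 is assumed to be highest.
        self.groups = [deque() for _ in range(num_priorities)]

    def submit(self, tmd):
        self.groups[tmd.priority].append(tmd)

    def next_task(self):
        # Assumed policy: first non-empty group wins, oldest task first.
        for group in self.groups:
            if group:
                return group.popleft()
        return None
```

Because each task carries its own metadata, a task submitted later can run before an earlier one simply by sitting in a higher-priority group, which is the out-of-order behavior the abstract describes.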
  • Publication number: 20130070760
    Abstract: One embodiment of the present invention is a control unit for distributing packets of work to one or more consumers of work. The control unit is configured to assign at least one processing domain from a set of processing domains to each consumer included in the one or more consumers, receive a plurality of packets of work from at least one producer of work, wherein each packet of work is associated with a processing domain from the set of processing domains, and a first packet of work associated with a first processing domain can be processed by the one or more consumers independently of a second packet of work associated with a second processing domain, identify a first consumer that has been assigned the first processing domain, and transmit the first packet of work to the first consumer for processing.
    Type: Application
    Filed: September 15, 2011
    Publication date: March 21, 2013
    Inventors: Lacky V. Shah, Sean J. Treichler, Abraham B. de Waal
  • Patent number: 8327071
    Abstract: In a multiprocessor system, level 2 caches are positioned on the memory side of a routing crossbar rather than on the processor side of the routing crossbar. This configuration permits the processors to store messages directly into each other's caches rather than into system memory or their own coherent caches. Therefore, inter-processor communication latency is reduced.
    Type: Grant
    Filed: November 13, 2007
    Date of Patent: December 4, 2012
    Assignee: NVIDIA Corporation
    Inventors: John M. Danskin, Emmett M. Kilgariff, David B. Glasco, Sean J. Treichler
  • Patent number: 8307165
    Abstract: One embodiment of the invention sets forth a mechanism for increasing the number of read commands or write commands transmitted to an activated bank page in the DRAM. Read requests and dirty notifications are organized in a read request sorter or a dirty notification sorter, respectively, and each sorter includes multiple sets with entries that may be associated with different bank pages in the DRAM. Read requests and dirty notifications are stored in read request lists and dirty notification lists, where each list is associated with a specific bank page. When a bank page is activated to process read requests, read commands associated with read requests stored in a particular read request list are transmitted to the bank page. When a bank page is activated to process dirty notifications, write commands associated with dirty notifications stored in a particular dirty notification list are transmitted to the bank page.
    Type: Grant
    Filed: July 10, 2009
    Date of Patent: November 6, 2012
    Assignee: NVIDIA Corporation
    Inventors: Shane Keil, John H. Edmondson, Sean J. Treichler
  • Patent number: 8060765
    Abstract: A power monitor for electronic devices, such as computer chips, is used to estimate the power consumption and to compare the estimated power consumption against the power budget. The estimated power consumption is based on activity signals from various functional blocks of the computer chip. The activity signals that are monitored correlate accurately to the total number of flip-flops that are active at a given time. If the estimated power consumption exceeds the power budget, the speed of the clock signals supplied to the computer chip is reduced.
    Type: Grant
    Filed: November 2, 2006
    Date of Patent: November 15, 2011
    Assignee: NVIDIA Corporation
    Inventors: Hungse Cha, Robert J. Hasslen, III, John A. Robinson, Sean J. Treichler, Abdulkadir Utku Diril
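
The estimate-and-throttle loop described in the abstract for patent 8060765 can be illustrated with a short sketch. Everything numeric here is an assumption for illustration only: the linear weighting of activity signals, the per-block weights, the 10% clock reduction step, and the minimum clock are not taken from the patent, and `estimate_power` and `next_clock` are hypothetical names.

```python
def estimate_power(activity, weights, static_power=0.0):
    """Estimate power as a weighted sum of per-block activity levels.

    activity: block name -> fraction of time the block is active (0.0 to 1.0)
    weights:  block name -> assumed power cost of that block at full activity
    """
    return static_power + sum(weights[block] * activity[block] for block in activity)


def next_clock(clock_mhz, estimated, budget, step=0.9, min_mhz=100.0):
    """Reduce the clock when the estimate exceeds the budget, else keep it."""
    if estimated > budget:
        return max(min_mhz, clock_mhz * step)
    return clock_mhz
```

For example, with activity `{"shader": 0.8, "rop": 0.5}`, weights `{"shader": 50.0, "rop": 20.0}`, and a static term of 10, the estimate is about 60; against a budget of 55, `next_clock` lowers the clock, while against a budget of 65 it leaves the clock unchanged.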
  • Patent number: 8041841
    Abstract: A method and interface for configuring a link are described. A transceiver has configuration registers. The configuration registers are read to determine the capability of the transceiver. An application is selected, and the configuration registers of the transceiver are configured responsive to the application selected. A protocol having initialization, transmit and receive portions is described to facilitate configuration operations, such as reads and writes of configuration registers, for such a link.
    Type: Grant
    Filed: April 3, 2009
    Date of Patent: October 18, 2011
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Edward W. Liu
  • Patent number: 7966439
    Abstract: A system controller includes a memory controller and a host interface residing in different clock domains. There is a time delay between the time when the memory controller issues a read command to a memory and the data becoming present and available at the host interface. The memory controller generates an alarm message at or near the time that it issues the read command. The alarm message indicates to the host interface the time that the data is available for transfer to a host.
    Type: Grant
    Filed: November 24, 2004
    Date of Patent: June 21, 2011
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Brad W. Simeral, Roman Surgutchick, Anand Srinivasan, Dmitry Vyshetsky
  • Patent number: 7958483
    Abstract: An embodiment of the invention includes receiving an indicator of an activity-level of a functional block within an electronic chip. The functional block is configured to receive a clock signal from a clock signal generator. The clock signal to at least a portion of a functional block is disabled for a number of inactive clock cycles during a clock segment of the clock signal. The clock segment has a specified number of clock cycles and the number of inactive clock cycles is defined based on the activity-level and the specified number of clock cycles of the clock segment.
    Type: Grant
    Filed: December 21, 2006
    Date of Patent: June 7, 2011
    Assignee: NVIDIA Corporation
    Inventors: Jonah M. Alben, Robert J. Hasslen, III, Sean J. Treichler
  • Publication number: 20110090220
    Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
    Type: Application
    Filed: October 15, 2009
    Publication date: April 21, 2011
    Inventors: Steven E. Molnar, Emmett M. Kilgariff, Johnny S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
  • Patent number: 7852340
    Abstract: A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: December 14, 2010
    Assignee: NVIDIA Corporation
    Inventors: Rui M. Bastos, Karim M. Abdalla, Christian Rouet, Michael J.M. Toksvig, Johnny S. Rhoades, Roger L. Allen, John Douglas Tynefield, Jr., Emmett M. Kilgariff, Gary M. Tarolli, Brian Cabral, Craig Michael Wittenbrink, Sean J. Treichler
  • Patent number: 7821520
    Abstract: A new, useful, and non-obvious shader processor architecture having a shader register file that acts both as an internal storage register file for temporarily storing data within the shader processor and as a First-In First-Out (FIFO) buffer for a subsequent module. Some embodiments include automatic, programmable hardware conversion between numeric formats, for example, between floating point data and fixed point data.
    Type: Grant
    Filed: December 10, 2004
    Date of Patent: October 26, 2010
    Assignee: NVIDIA Corporation
    Inventors: Rui M. Bastos, Karim M. Abdalla, Sean J. Treichler, Emmett M. Kilgariff
  • Patent number: 7747915
    Abstract: A system and method for increasing the yield of integrated circuits containing memory partitions the memory into regions and then independently tests each region to determine which, if any, of the memory regions contain one or more memory failures. The test results are stored for later retrieval. Prior to using the memory, software retrieves the test results and uses only the memory sections that contain no memory failures. A consequence of this approach is that integrated circuits containing memory that would have been discarded for containing memory failures now may be used. This approach also does not significantly impact die area.
    Type: Grant
    Filed: January 5, 2009
    Date of Patent: June 29, 2010
    Assignee: NVIDIA Corporation
    Inventors: Anthony M. Tamasi, Oren Rubenstein, Srihari Vegesna, Jue Wu, Sean J. Treichler
  • Patent number: 7562205
    Abstract: A virtual address translation table and an on-chip address cache are usable for translating virtual addresses to physical addresses. Address translation information is provided using a cluster that is associated with some range of virtual addresses and that can be used to translate any virtual address in its range to a physical address, where the sizes of the ranges mapped by different clusters may be different. Clusters are stored in an address translation table that is indexed by virtual address so that, starting from any valid virtual address, the appropriate cluster for translating that address can be retrieved from the translation table. Recently retrieved clusters are stored in an on-chip cache, and a cached cluster can be used to translate any virtual address in its range without accessing the address translation table again.
    Type: Grant
    Filed: August 23, 2007
    Date of Patent: July 14, 2009
    Assignee: NVIDIA Corporation
    Inventors: Colyn S. Case, Dmitry Vyshetsky, Sean J. Treichler
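
The cluster-based translation described in the abstract for patent 7562205 can be sketched as below. This is a simplified model under several assumptions not stated in the abstract: each cluster maps its virtual range to one contiguous physical range, cluster bases and sizes are multiples of an assumed 4 KiB page, the table is a Python dictionary keyed by virtual page number, and the cache keeps the most recently fetched clusters. The names `Cluster`, `Translator`, and `table_reads` are hypothetical.

```python
from dataclasses import dataclass

PAGE = 4096  # assumed page size in bytes


@dataclass(frozen=True)
class Cluster:
    """Maps one range of virtual addresses; ranges may differ in size."""
    virt_base: int
    size: int
    phys_base: int

    def covers(self, vaddr):
        return self.virt_base <= vaddr < self.virt_base + self.size

    def translate(self, vaddr):
        return self.phys_base + (vaddr - self.virt_base)


class Translator:
    def __init__(self, clusters, cache_slots=2):
        # Every page in a cluster's range points at that cluster, so any
        # valid virtual address leads to the right cluster.
        self.table = {}
        for c in clusters:
            first = c.virt_base // PAGE
            last = (c.virt_base + c.size) // PAGE
            for page in range(first, last):
                self.table[page] = c
        self.cache = []            # most recently fetched clusters first
        self.cache_slots = cache_slots
        self.table_reads = 0       # counts accesses to the translation table

    def translate(self, vaddr):
        # A cached cluster translates any address in its range directly.
        for c in self.cache:
            if c.covers(vaddr):
                return c.translate(vaddr)
        # Miss: fetch from the table (raises KeyError if unmapped).
        self.table_reads += 1
        c = self.table[vaddr // PAGE]
        self.cache.insert(0, c)
        del self.cache[self.cache_slots:]
        return c.translate(vaddr)
```

In this model a single table read for a large cluster serves every later address in that cluster's range, which is how variable-sized clusters reduce table traffic.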
  • Publication number: 20090164841
    Abstract: A system and method for increasing the yield of integrated circuits containing memory partitions the memory into regions and then independently tests each region to determine which, if any, of the memory regions contain one or more memory failures. The test results are stored for later retrieval. Prior to using the memory, software retrieves the test results and uses only the memory sections that contain no memory failures. A consequence of this approach is that integrated circuits containing memory that would have been discarded for containing memory failures now may be used. This approach also does not significantly impact die area.
    Type: Application
    Filed: January 5, 2009
    Publication date: June 25, 2009
    Inventors: Anthony M. Tamasi, Oren Rubinstein, Srihari Vegesna, Jue Wu, Sean J. Treichler
  • Patent number: 7523209
    Abstract: A method and interface for configuring a link are described. A transceiver has configuration registers. The configuration registers are read to determine the capability of the transceiver. An application is selected, and the configuration registers of the transceiver are configured responsive to the application selected. A protocol having initialization, transmit and receive portions is described to facilitate configuration operations, such as reads and writes of configuration registers, for such a link.
    Type: Grant
    Filed: September 4, 2002
    Date of Patent: April 21, 2009
    Assignee: NVIDIA Corporation
    Inventors: Sean J. Treichler, Edward W. Liu
  • Patent number: 7478289
    Abstract: A system and method for increasing the yield of integrated circuits containing memory partitions the memory into regions and then independently tests each region to determine which, if any, of the memory regions contain one or more memory failures. The test results are stored for later retrieval. Prior to using the memory, software retrieves the test results and uses only the memory sections that contain no memory failures. A consequence of this approach is that integrated circuits containing memory that would have been discarded for containing memory failures now may be used. This approach also does not significantly impact die area.
    Type: Grant
    Filed: June 3, 2005
    Date of Patent: January 13, 2009
    Assignee: NVIDIA Corporation
    Inventors: Anthony M. Tamasi, Oren Rubenstein, Srihari Vegesna, Jue Wu, Sean J. Treichler
  • Patent number: 7406546
    Abstract: One embodiment of a long-distance synchronous bus includes a sending unit and a receiving unit. The sending unit and receiving unit are configured to use credit-based handshaking signals to regulate data flow between themselves. The receiving unit includes a skid buffer for storing data packets received from the sending unit. The sending unit transmits one data packet to the receiving unit for each credit in possession and consumes one credit for each such transmitted data packet. The receiving unit transmits one credit to the sending unit for each data packet that is read out of the skid buffer. In another embodiment, transmitted data may be broadcast to multiple receiving units by routing the data from the sending unit to the multiple receiving units and maintaining separate credit-based handshaking signals for each receiving unit.
    Type: Grant
    Filed: August 17, 2005
    Date of Patent: July 29, 2008
    Assignee: NVIDIA Corporation
    Inventors: Blaise A. Vignon, Sean J. Treichler
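
The credit-based handshaking described in the abstract for patent 7406546 can be sketched in Python. This is a minimal model under stated assumptions: wire latency is ignored, the sender starts with one credit per skid buffer entry, and the names `Sender`, `Receiver`, `try_send`, and `credit_returned` are hypothetical rather than taken from the patent.

```python
from collections import deque


class Sender:
    """Sends one packet per credit held and consumes a credit for each."""

    def __init__(self, credits):
        self.credits = credits

    def try_send(self, receiver, packet):
        if self.credits == 0:
            return False  # no credit in possession, so the packet must wait
        self.credits -= 1
        receiver.accept(packet)
        return True

    def credit_returned(self):
        self.credits += 1


class Receiver:
    """Stores incoming packets in a skid buffer of fixed depth."""

    def __init__(self, depth):
        self.depth = depth
        self.skid = deque()

    def accept(self, packet):
        if len(self.skid) >= self.depth:
            raise OverflowError("sender exceeded its credits")
        self.skid.append(packet)

    def read(self, sender):
        # Reading a packet out of the skid buffer returns one credit.
        packet = self.skid.popleft()
        sender.credit_returned()
        return packet
```

With a buffer depth of 2 and 2 initial credits, the sender can transmit two packets, is refused on the third, and can send again once the receiver reads one packet out. For the broadcast embodiment, a sender would keep a separate credit count per receiver and transmit only when every count is non-zero.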