Patents by Inventor Sean J. Treichler
Sean J. Treichler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20130152093
Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The channels belonging to a TSG are processed in a predetermined order. However, when a channel stalls during processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, no context switch operation is needed when processing of one channel in the TSG stops and processing of the next channel begins. Multiple independent streams of work can therefore run concurrently within a single context, increasing utilization of parallel processing units.
Type: Application
Filed: December 9, 2011
Publication date: June 13, 2013
Inventors: Samuel H. Duncan, Lacky V. Shah, Sean J. Treichler, Daniel Elliot Wexler, Jerome F. Duluk, Jr., Phillip Browning Johnson, Jonathon Stuart Ramsay Evans
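The switching behavior described above can be illustrated with a minimal sketch. This is not the patented implementation; the class and function names are illustrative assumptions.

```python
# Hypothetical sketch of time-slice-group scheduling: channels inside a
# TSG share one context, so switching among them requires no context switch.
class Channel:
    def __init__(self, name, work):
        self.name = name
        self.work = list(work)  # pending work items; empty means stalled

class TimeSliceGroup:
    def __init__(self, context_id, channels):
        self.context_id = context_id  # one shared context for all channels
        self.channels = channels

def run_tsg(tsg):
    """Process channels in order; when one runs out of independent work,
    move on to the next. No context switch occurs inside the TSG."""
    context_switches = 0            # stays 0 within a single TSG
    executed = []
    pending = [c for c in tsg.channels if c.work]
    while pending:
        for chan in list(pending):
            if chan.work:                       # channel has runnable work
                executed.append((chan.name, chan.work.pop(0)))
        pending = [c for c in pending if c.work]
    return executed, context_switches

executed, switches = run_tsg(
    TimeSliceGroup(0, [Channel("a", [1, 2]), Channel("b", [3])]))
assert switches == 0   # all work ran under one context
```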
-
Publication number: 20130120413
Abstract: One embodiment of the present invention sets forth a technique for receiving versions of state objects at one or more stages in a processing pipeline. The method includes receiving a first version of a state object at a first stage in the processing pipeline, determining that the first version of the state object is relevant to the first stage, incrementing a first reference counter associated with the first version of the state object, assigning the first version of the state object to work requests that arrive at the first stage subsequent to the receipt of the first version of the state object, and transmitting the first version of the state object to a second stage in the processing pipeline.
Type: Application
Filed: November 11, 2011
Publication date: May 16, 2013
Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
-
Publication number: 20130120412
Abstract: One embodiment of the present invention sets forth a technique for executing an operation once work associated with a version of a state object has been completed. The method includes receiving the version of the state object at a first stage in a processing pipeline, where the version of the state object is associated with a reference count object, determining that the version of the state object is relevant to the first stage, incrementing a counter included in the reference count object, transmitting the version of the state object to a second stage in the processing pipeline, processing work associated with the version of the state object, decrementing the counter, determining that the counter is equal to zero, and in response, executing an operation specified by the reference count object.
Type: Application
Filed: November 11, 2011
Publication date: May 16, 2013
Inventors: Sean J. Treichler, Lacky V. Shah, Daniel Elliot Wexler
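The reference-count mechanism in the abstract above can be sketched as follows. This is a simplified illustration, not the patented design; `RefCountObject` and its method names are assumptions.

```python
# Hypothetical sketch: a versioned state object carries a reference count
# object; when every stage's work completes and the count reaches zero,
# the operation attached to the reference count object is executed.
class RefCountObject:
    def __init__(self, on_zero):
        self.count = 0
        self.on_zero = on_zero   # operation to run when the count hits zero

    def incr(self):              # a stage finds the version relevant
        self.count += 1

    def decr(self):              # a stage finishes its associated work
        self.count -= 1
        if self.count == 0:
            self.on_zero()

fired = []
rc = RefCountObject(on_zero=lambda: fired.append("cleanup"))

rc.incr()          # stage 1 takes a reference, passes version downstream
rc.incr()          # stage 2 takes a reference
rc.decr()          # stage 1 completes its work
assert fired == []               # count is 1, operation not yet executed
rc.decr()          # stage 2 completes; count reaches zero
assert fired == ["cleanup"]      # operation specified by the object runs
```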
-
Publication number: 20130117751
Abstract: One embodiment of the present invention sets forth a technique for encapsulating compute task state that enables out-of-order scheduling and execution of the compute tasks. The scheduling circuitry organizes the compute tasks into groups based on priority levels. The compute tasks may then be selected for execution using different scheduling schemes. Each group is maintained as a linked list of pointers to compute tasks that are encoded as task metadata (TMD) stored in memory. A TMD encapsulates the state and parameters needed to initialize, schedule, and execute a compute task.
Type: Application
Filed: November 9, 2011
Publication date: May 9, 2013
Inventors: Jerome F. Duluk, Jr., Lacky V. Shah, Sean J. Treichler
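The priority-grouped task lists described above might look like the following sketch. It is illustrative only; the abstract's linked lists are modeled here with FIFO deques, and the highest-priority-first scheme is just one of the "different scheduling schemes" the abstract allows.

```python
# Hypothetical sketch of TMD-style scheduling: tasks are grouped into
# per-priority lists and selected highest priority first, FIFO within a level.
from collections import deque

class TaskMetadata:
    """Stand-in for a TMD: state and parameters needed to run a task."""
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority

class Scheduler:
    def __init__(self):
        self.groups = {}   # priority level -> FIFO list of TMD pointers

    def submit(self, tmd):
        self.groups.setdefault(tmd.priority, deque()).append(tmd)

    def next_task(self):
        # one possible scheme: highest priority first, FIFO within a group
        for prio in sorted(self.groups, reverse=True):
            if self.groups[prio]:
                return self.groups[prio].popleft()
        return None

sched = Scheduler()
sched.submit(TaskMetadata("low", 0))
sched.submit(TaskMetadata("high", 2))
assert sched.next_task().name == "high"   # out-of-order w.r.t. submission
assert sched.next_task().name == "low"
```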
-
Publication number: 20130070760
Abstract: One embodiment of the present invention is a control unit for distributing packets of work to one or more consumers of work. The control unit is configured to assign at least one processing domain from a set of processing domains to each of the one or more consumers; receive a plurality of packets of work from at least one producer of work, wherein each packet of work is associated with a processing domain from the set of processing domains, and a first packet of work associated with a first processing domain can be processed by the one or more consumers independently of a second packet of work associated with a second processing domain; identify a first consumer that has been assigned the first processing domain; and transmit the first packet of work to the first consumer for processing.
Type: Application
Filed: September 15, 2011
Publication date: March 21, 2013
Inventors: Lacky V. Shah, Sean J. Treichler, Abraham B. de Waal
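The domain-to-consumer routing described above can be sketched briefly. This is a toy model under assumed names (`ControlUnit`, `assign`, `dispatch`), not the claimed hardware.

```python
# Hypothetical sketch: a control unit assigns processing domains to
# consumers, then routes each packet of work to a consumer that owns
# the packet's domain. Packets in different domains are independent.
class ControlUnit:
    def __init__(self):
        self.domain_to_consumer = {}
        self.delivered = {}   # consumer -> packets transmitted to it

    def assign(self, consumer, domain):
        self.domain_to_consumer[domain] = consumer
        self.delivered.setdefault(consumer, [])

    def dispatch(self, packet, domain):
        consumer = self.domain_to_consumer[domain]  # identify the owner
        self.delivered[consumer].append(packet)     # transmit for processing

cu = ControlUnit()
cu.assign("consumer0", domain="A")
cu.assign("consumer1", domain="B")
cu.dispatch("pkt1", domain="A")   # independent of any packet in domain B
cu.dispatch("pkt2", domain="B")
assert cu.delivered == {"consumer0": ["pkt1"], "consumer1": ["pkt2"]}
```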
-
Patent number: 8327071
Abstract: In a multiprocessor system, level 2 caches are positioned on the memory side of a routing crossbar rather than on the processor side. This configuration permits the processors to store messages directly into each other's caches rather than into system memory or their own coherent caches. Therefore, inter-processor communication latency is reduced.
Type: Grant
Filed: November 13, 2007
Date of Patent: December 4, 2012
Assignee: NVIDIA Corporation
Inventors: John M. Danskin, Emmett M. Kilgariff, David B. Glasco, Sean J. Treichler
-
Patent number: 8307165
Abstract: One embodiment of the invention sets forth a mechanism for increasing the number of read commands or write commands transmitted to an activated bank page in the DRAM. Read requests and dirty notifications are organized in a read request sorter or a dirty notification sorter, respectively, and each sorter includes multiple sets with entries that may be associated with different bank pages in the DRAM. Read requests and dirty notifications are stored in read request lists and dirty notification lists, where each list is associated with a specific bank page. When a bank page is activated to process read requests, read commands associated with read requests stored in a particular read request list are transmitted to the bank page. When a bank page is activated to process dirty notifications, write commands associated with dirty notifications stored in a particular dirty notification list are transmitted to the bank page.
Type: Grant
Filed: July 10, 2009
Date of Patent: November 6, 2012
Assignee: Nvidia Corporation
Inventors: Shane Keil, John H. Edmondson, Sean J. Treichler
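The per-bank-page sorting idea can be sketched as follows. This is a minimal illustration of the read-request side only; the class and key names are assumptions, not the patented sorter's structure.

```python
# Hypothetical sketch: read requests are sorted into per-bank-page lists
# so that, once a page is activated, many read commands can be issued to
# it back to back instead of paying an activation per request.
from collections import defaultdict

class ReadRequestSorter:
    def __init__(self):
        self.lists = defaultdict(list)   # bank page -> pending read requests

    def add(self, bank_page, request):
        self.lists[bank_page].append(request)

    def activate(self, bank_page):
        """Activate a page and drain every read command queued for it."""
        return self.lists.pop(bank_page, [])

sorter = ReadRequestSorter()
sorter.add((0, 0x40), "read A")   # (bank, row) pair stands in for a page
sorter.add((0, 0x40), "read B")
sorter.add((1, 0x80), "read C")
assert sorter.activate((0, 0x40)) == ["read A", "read B"]
assert sorter.activate((1, 0x80)) == ["read C"]
```

A dirty-notification sorter would work the same way, draining write commands instead of reads on activation.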
-
Patent number: 8060765
Abstract: A power monitor for electronic devices, such as computer chips, is used to estimate the power consumption and to compare the estimated power consumption against the power budget. The estimated power consumption is based on activity signals from various functional blocks of the computer chip. The activity signals that are monitored correlate accurately to the total number of flip-flops that are active at a given time. If the estimated power consumption exceeds the power budget, the speed of the clock signals supplied to the computer chip is reduced.
Type: Grant
Filed: November 2, 2006
Date of Patent: November 15, 2011
Assignee: NVIDIA Corporation
Inventors: Hungse Cha, Robert J. Hasslen, III, John A. Robinson, Sean J. Treichler, Abdulkadir Utku Diril
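The estimate-and-throttle loop might look like the sketch below. The weighted-sum estimate and the proportional throttle are assumptions for illustration; the abstract only says the estimate is based on activity signals and that the clock speed is reduced when the budget is exceeded.

```python
# Hypothetical sketch: estimate power from per-block activity signals and
# reduce the clock when the estimate exceeds the budget.
def estimate_power(activity, weights):
    """Weighted sum of activity signals; each weight approximates how many
    flip-flops a block toggles when active (an assumed model)."""
    return sum(weights[block] * level for block, level in activity.items())

def adjust_clock(clock_mhz, activity, weights, budget):
    power = estimate_power(activity, weights)
    if power > budget:
        # Proportional throttling is an assumption, not the patented policy.
        return clock_mhz * budget / power
    return clock_mhz

weights = {"shader": 5.0, "memctrl": 2.0}
activity = {"shader": 0.9, "memctrl": 0.5}
assert estimate_power(activity, weights) == 5.5
assert adjust_clock(1000, activity, weights, budget=11.0) == 1000  # under budget
```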
-
Patent number: 8041841
Abstract: A method and interface for configuring a link are described. A transceiver has configuration registers, which are read to determine the capability of the transceiver. An application is selected, and the configuration registers of the transceiver are configured responsive to the selected application. A protocol having initialization, transmit, and receive portions is described to facilitate configuration operations, such as reads and writes of configuration registers, for such a link.
Type: Grant
Filed: April 3, 2009
Date of Patent: October 18, 2011
Assignee: NVIDIA Corporation
Inventors: Sean J. Treichler, Edward W. Liu
-
Patent number: 7966439
Abstract: A system controller includes a memory controller and a host interface residing in different clock domains. There is a time delay between the time when the memory controller issues a read command to a memory and the data becoming present and available at the host interface. The memory controller generates an alarm message at or near the time that it issues the read command. The alarm message indicates to the host interface the time that the data is available for transfer to a host.
Type: Grant
Filed: November 24, 2004
Date of Patent: June 21, 2011
Assignee: Nvidia Corporation
Inventors: Sean J. Treichler, Brad W. Simeral, Roman Surgutchick, Anand Srinivasan, Dmitry Vyshetsky
-
Patent number: 7958483
Abstract: An embodiment of the invention includes receiving an indicator of an activity-level of a functional block within an electronic chip. The functional block is configured to receive a clock signal from a clock signal generator. The clock signal to at least a portion of a functional block is disabled for a number of inactive clock cycles during a clock segment of the clock signal. The clock segment has a specified number of clock cycles and the number of inactive clock cycles is defined based on the activity-level and the specified number of clock cycles of the clock segment.
Type: Grant
Filed: December 21, 2006
Date of Patent: June 7, 2011
Assignee: NVIDIA Corporation
Inventors: Jonah M. Alben, Robert J. Hasslen, III, Sean J. Treichler
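The segment-based gating can be illustrated arithmetically. The linear mapping from activity level to inactive cycles is an assumed example; the abstract only requires that the count be derived from the activity level and the segment length.

```python
# Hypothetical sketch: within a fixed-length clock segment, gate the clock
# off for a number of cycles derived from the block's reported activity level.
def inactive_cycles(activity_level, segment_cycles):
    """activity_level in [0, 1]; returns cycles to disable this segment.
    A linear mapping is assumed for illustration."""
    if not 0.0 <= activity_level <= 1.0:
        raise ValueError("activity level must be between 0 and 1")
    return round((1.0 - activity_level) * segment_cycles)

def gated_segment(activity_level, segment_cycles):
    off = inactive_cycles(activity_level, segment_cycles)
    # clock enabled (1) for the active cycles, disabled (0) for the rest
    return [1] * (segment_cycles - off) + [0] * off

# A 16-cycle segment at 75% activity gates off 4 cycles.
assert inactive_cycles(0.75, 16) == 4
assert sum(gated_segment(0.5, 16)) == 8   # half the cycles remain enabled
```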
-
Publication number: 20110090220
Abstract: One embodiment of the present invention sets forth a technique for rendering graphics primitives in parallel while maintaining the API primitive ordering. Multiple, independent geometry units perform geometry processing concurrently on different graphics primitives. A primitive distribution scheme delivers primitives concurrently to multiple rasterizers at rates of multiple primitives per clock while maintaining the primitive ordering for each pixel. The multiple, independent rasterizer units perform rasterization concurrently on one or more graphics primitives, enabling the rendering of multiple primitives per system clock.
Type: Application
Filed: October 15, 2009
Publication date: April 21, 2011
Inventors: Steven E. Molnar, Emmett M. Kilgariff, Johnny S. Rhoades, Timothy John Purcell, Sean J. Treichler, Ziyad S. Hakura, Franklin C. Crow, James C. Bowman
-
Patent number: 7852340
Abstract: A scalable shader architecture is disclosed. In accord with that architecture, a shader includes multiple shader pipelines, each of which can perform processing operations on rasterized pixel data. Shader pipelines can be functionally removed as required, thus preventing a defective shader pipeline from causing a chip rejection. The shader includes a shader distributor that processes rasterized pixel data and then selectively distributes the processed rasterized pixel data to the various shader pipelines, beneficially in a manner that balances workloads. A shader collector formats the outputs of the various shader pipelines into proper order to form shaded pixel data. A shader instruction processor (scheduler) programs the individual shader pipelines to perform their intended tasks.
Type: Grant
Filed: December 14, 2007
Date of Patent: December 14, 2010
Assignee: NVIDIA Corporation
Inventors: Rui M. Bastos, Karim M. Abdalla, Christian Rouet, Michael J.M. Toksvig, Johnny S. Rhoades, Roger L. Allen, John Douglas Tynefield, Jr., Emmett M. Kilgariff, Gary M. Tarolli, Brian Cabral, Craig Michael Wittenbrink, Sean J. Treichler
-
Patent number: 7821520
Abstract: A shader processor architecture is disclosed in which a shader register file acts both as an internal storage register file for temporarily storing data within the shader processor and as a first-in, first-out (FIFO) buffer for a subsequent module. Some embodiments include automatic, programmable hardware conversion between numeric formats, for example, between floating-point data and fixed-point data.
Type: Grant
Filed: December 10, 2004
Date of Patent: October 26, 2010
Assignee: NVIDIA Corporation
Inventors: Rui M. Bastos, Karim M. Abdalla, Sean J. Treichler, Emmett M. Kilgariff
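The numeric-format conversion mentioned above can be shown with a small worked example. The round-to-nearest policy and the 8.8 format are assumptions; the patent's hardware conversion is programmable and not specified here.

```python
# Hypothetical sketch of float <-> fixed-point conversion of the kind the
# abstract mentions, using a signed fixed-point format with frac_bits of
# fraction (e.g. 8.8 when frac_bits=8).
def float_to_fixed(x, frac_bits):
    """Round a float to the nearest representable fixed-point integer."""
    return round(x * (1 << frac_bits))

def fixed_to_float(n, frac_bits):
    return n / (1 << frac_bits)

# 8.8 fixed-point round trip of an exactly representable value.
n = float_to_fixed(1.5, frac_bits=8)
assert n == 384                        # 1.5 * 256
assert fixed_to_float(n, frac_bits=8) == 1.5
```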
-
Patent number: 7747915
Abstract: A system and method for increasing the yield of integrated circuits containing memory partitions the memory into regions and then independently tests each region to determine which, if any, of the memory regions contain one or more memory failures. The test results are stored for later retrieval. Prior to using the memory, software retrieves the test results and uses only the memory sections that contain no memory failures. A consequence of this approach is that integrated circuits containing memory that would have been discarded for containing memory failures now may be used. This approach also does not significantly impact die area.
Type: Grant
Filed: January 5, 2009
Date of Patent: June 29, 2010
Assignee: NVIDIA Corporation
Inventors: Anthony M. Tamasi, Oren Rubenstein, Srihari Vegesna, Jue Wu, Sean J. Treichler
-
Patent number: 7562205
Abstract: A virtual address translation table and an on-chip address cache are usable for translating virtual addresses to physical addresses. Address translation information is provided using a cluster that is associated with some range of virtual addresses and that can be used to translate any virtual address in its range to a physical address, where the sizes of the ranges mapped by different clusters may be different. Clusters are stored in an address translation table that is indexed by virtual address so that, starting from any valid virtual address, the appropriate cluster for translating that address can be retrieved from the translation table. Recently retrieved clusters are stored in an on-chip cache, and a cached cluster can be used to translate any virtual address in its range without accessing the address translation table again.
Type: Grant
Filed: August 23, 2007
Date of Patent: July 14, 2009
Assignee: Nvidia Corporation
Inventors: Colyn S. Case, Dmitry Vyshetsky, Sean J. Treichler
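The cluster lookup described above can be sketched as follows. This is a simplified model: the real table is indexed by virtual address rather than searched linearly, and the cache has an eviction policy; both are omitted here as assumptions.

```python
# Hypothetical sketch of cluster-based translation: each cluster maps a
# variably sized virtual range to physical addresses, and recently used
# clusters are kept in a small on-chip cache.
class Cluster:
    def __init__(self, va_base, size, pa_base):
        self.va_base, self.size, self.pa_base = va_base, size, pa_base

    def covers(self, va):
        return self.va_base <= va < self.va_base + self.size

    def translate(self, va):
        return self.pa_base + (va - self.va_base)

class Translator:
    def __init__(self, table):
        self.table = table   # the in-memory address translation table
        self.cache = []      # on-chip cache of recently retrieved clusters

    def translate(self, va):
        for c in self.cache:          # cache hit: no table access needed
            if c.covers(va):
                return c.translate(va)
        for c in self.table:          # miss: consult the table, then cache
            if c.covers(va):
                self.cache.append(c)
                return c.translate(va)
        raise KeyError("invalid virtual address")

t = Translator([Cluster(0x1000, 0x2000, 0x8000)])
assert t.translate(0x1800) == 0x8800
assert len(t.cache) == 1      # cluster cached after the first lookup
```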
-
Publication number: 20090164841
Abstract: A system and method for increasing the yield of integrated circuits containing memory partitions the memory into regions and then independently tests each region to determine which, if any, of the memory regions contain one or more memory failures. The test results are stored for later retrieval. Prior to using the memory, software retrieves the test results and uses only the memory sections that contain no memory failures. A consequence of this approach is that integrated circuits containing memory that would have been discarded for containing memory failures now may be used. This approach also does not significantly impact die area.
Type: Application
Filed: January 5, 2009
Publication date: June 25, 2009
Inventors: Anthony M. Tamasi, Oren Rubinstein, Srihari Vegesna, Jue Wu, Sean J. Treichler
-
Patent number: 7523209
Abstract: A method and interface for configuring a link are described. A transceiver has configuration registers, which are read to determine the capability of the transceiver. An application is selected, and the configuration registers of the transceiver are configured responsive to the selected application. A protocol having initialization, transmit, and receive portions is described to facilitate configuration operations, such as reads and writes of configuration registers, for such a link.
Type: Grant
Filed: September 4, 2002
Date of Patent: April 21, 2009
Assignee: NVIDIA Corporation
Inventors: Sean J. Treichler, Edward W. Liu
-
Patent number: 7478289
Abstract: A system and method for increasing the yield of integrated circuits containing memory partitions the memory into regions and then independently tests each region to determine which, if any, of the memory regions contain one or more memory failures. The test results are stored for later retrieval. Prior to using the memory, software retrieves the test results and uses only the memory sections that contain no memory failures. A consequence of this approach is that integrated circuits containing memory that would have been discarded for containing memory failures now may be used. This approach also does not significantly impact die area.
Type: Grant
Filed: June 3, 2005
Date of Patent: January 13, 2009
Assignee: NVIDIA Corporation
Inventors: Anthony M. Tamasi, Oren Rubenstein, Srihari Vegesna, Jue Wu, Sean J. Treichler
-
Patent number: 7406546
Abstract: One embodiment of a long-distance synchronous bus includes a sending unit and a receiving unit. The sending unit and receiving unit are configured to use credit-based handshaking signals to regulate data flow between themselves. The receiving unit includes a skid buffer for storing data packets received from the sending unit. The sending unit transmits one data packet to the receiving unit for each credit in possession and consumes one credit for each such transmitted data packet. The receiving unit transmits one credit to the sending unit for each data packet that is read out of the skid buffer. In another embodiment, transmitted data may be broadcast to multiple receiving units by routing the data from the sending unit to the multiple receiving units and maintaining separate credit-based handshaking signals for each receiving unit.
Type: Grant
Filed: August 17, 2005
Date of Patent: July 29, 2008
Assignee: NVIDIA Corporation
Inventors: Blaise A. Vignon, Sean J. Treichler
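The credit-based handshake described above can be modeled in a few lines. This is an illustrative software analogy; the class names and the choice of initializing credits to the skid-buffer depth are assumptions.

```python
# Hypothetical sketch of the credit-based handshake: the sender spends one
# credit per transmitted packet; the receiver returns one credit each time
# a packet is read out of its skid buffer.
from collections import deque

class Receiver:
    def __init__(self, depth):
        self.skid = deque()   # skid buffer for packets in flight
        self.depth = depth

    def accept(self, pkt):
        assert len(self.skid) < self.depth   # credits guarantee room
        self.skid.append(pkt)

    def read(self, sender):
        pkt = self.skid.popleft()
        sender.credits += 1                  # return one credit per read
        return pkt

class Sender:
    def __init__(self, credits):
        self.credits = credits               # assumed: initial credits = depth

    def send(self, rx, pkt):
        if self.credits == 0:
            return False                     # must wait for a returned credit
        self.credits -= 1                    # consume one credit per packet
        rx.accept(pkt)
        return True

rx, tx = Receiver(depth=2), Sender(credits=2)
assert tx.send(rx, "p0") and tx.send(rx, "p1")
assert not tx.send(rx, "p2")     # out of credits: skid buffer is full
assert rx.read(tx) == "p0"       # reading frees a slot and returns a credit
assert tx.send(rx, "p2")
```

The broadcast embodiment would keep one `Sender`-side credit counter per receiving unit, sending only when every receiver still has a credit available.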