Patents by Inventor Samuel H. Duncan
Samuel H. Duncan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20140281255
Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
Type: Application
Filed: October 16, 2013
Publication date: September 18, 2014
Applicant: NVIDIA CORPORATION
Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
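To make the data flow concrete, here is a minimal C++ sketch of the fault-servicing loop the abstract describes: a copy engine drains a command queue, resolves each faulting virtual address against a page state directory, and installs the mapping in the local page table. All type and function names are illustrative assumptions, not drawn from the patent.

```cpp
// Hypothetical sketch of the fault-servicing flow described in the abstract.
#include <cstdint>
#include <queue>
#include <unordered_map>

struct FaultCommand { uint64_t virtualAddr; };                      // entry in the copy engine's command queue
using PageStateDirectory = std::unordered_map<uint64_t, uint64_t>;  // virtual page -> physical page
using PageTable          = std::unordered_map<uint64_t, uint64_t>;  // local page table of the processing unit

// The copy engine drains the command queue, resolves each faulting virtual
// address against the page state directory, and patches the local page table.
void serviceFaults(std::queue<FaultCommand>& commandQueue,
                   const PageStateDirectory& directory,
                   PageTable& pageTable)
{
    while (!commandQueue.empty()) {
        FaultCommand cmd = commandQueue.front();
        commandQueue.pop();
        auto it = directory.find(cmd.virtualAddr);
        if (it != directory.end()) {
            pageTable[cmd.virtualAddr] = it->second;   // install the missing mapping
        }
        // If no mapping exists, a real system would first migrate or allocate the page.
    }
}
```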
-
Publication number: 20140281358
Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
Type: Application
Filed: October 16, 2013
Publication date: September 18, 2014
Applicant: NVIDIA CORPORATION
Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
-
Publication number: 20140109102
Abstract: A multi-threaded processing unit includes a hardware pre-processor coupled to one or more processing engines (e.g., copy engines, GPCs, etc.) that implement pre-emption techniques by dividing tasks into smaller subtasks and scheduling subtasks on the processing engines based on the priority of the tasks. By limiting the size of the subtasks, higher priority tasks may be executed quickly without switching the context state of the processing engine. Tasks may be subdivided based on a threshold size or by taking into account other considerations such as physical boundaries of the memory system.
Type: Application
Filed: October 12, 2012
Publication date: April 17, 2014
Applicant: NVIDIA CORPORATION
Inventors: Samuel H. Duncan, Gary WARD, M. Wasiur RASHID, Lincoln G. GARLICK, Wojciech Jan Truty
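A minimal sketch of the subdivide-and-schedule-by-priority idea follows, assuming a software model of the scheduler; the names and structures are hypothetical rather than taken from the patent.

```cpp
// Illustrative sketch of priority-ordered subtask scheduling.
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

struct Subtask {
    int         priority;   // inherited from the parent task
    std::size_t offset;     // start of the work range
    std::size_t length;     // bounded so a running subtask never holds the engine for long
};

struct ByPriority {
    bool operator()(const Subtask& a, const Subtask& b) const { return a.priority < b.priority; }
};

// Divide a task into subtasks no larger than maxChunk and enqueue them;
// the scheduler always dispatches the highest-priority subtask next.
void enqueueTask(std::priority_queue<Subtask, std::vector<Subtask>, ByPriority>& ready,
                 int priority, std::size_t taskSize, std::size_t maxChunk)
{
    for (std::size_t off = 0; off < taskSize; off += maxChunk) {
        ready.push({priority, off, std::min(maxChunk, taskSize - off)});
    }
}
```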
-
Publication number: 20140095759
Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.
Type: Application
Filed: September 28, 2012
Publication date: April 3, 2014
Applicant: NVIDIA CORPORATION
Inventors: Lincoln G. GARLICK, Philip Browning JOHNSON, Rafal ZBOINSKI, Jeff TUCKEY, Samuel H. DUNCAN, Peter C. MILLS
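As a rough illustration only, the state-subset selection step could be modeled in software along the following lines; the parameter names and container types are assumptions, not the patented mechanism.

```cpp
// Hypothetical sketch: select only the state parameters an auxiliary operation
// needs and hand them to an engine that keeps no context of its own.
#include <string>
#include <unordered_map>
#include <vector>

using StateSet = std::unordered_map<std::string, unsigned>;

// Returns the subset of 'context' named by 'relevantKeys'.
StateSet selectRelevantState(const StateSet& context,
                             const std::vector<std::string>& relevantKeys)
{
    StateSet subset;
    for (const auto& key : relevantKeys) {
        auto it = context.find(key);
        if (it != context.end()) subset.insert(*it);
    }
    return subset;   // transmitted to the engine together with the operation itself
}
```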
-
Method and apparatus for equalizing a bandwidth impedance mismatch between a client and an interface
Patent number: 8683089
Abstract: One or more client engines issue write transactions to system memory or peer parallel processor (PP) memory across a peripheral component interconnect express (PCIe) interface. The client engines may issue write transactions faster than the PCIe interface can transport those transactions, causing write transactions to accumulate within the PCIe interface. To prevent the accumulation of write transactions within the PCIe interface, an arbiter throttles write transactions received from the client engines based on the number of write transactions currently being transported across the PCIe interface.
Type: Grant
Filed: December 30, 2009
Date of Patent: March 25, 2014
Assignee: Nvidia Corporation
Inventors: Raymond Hoi Man Wong, Samuel H. Duncan, Lukito Muliadi
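A minimal software sketch of the throttling idea described above, assuming a simple in-flight counter with a configurable limit; the class and method names are hypothetical.

```cpp
// Illustrative throttle: admit a write only while the count of writes still
// in flight on the interface is below a configured limit.
#include <cstddef>

class WriteThrottle {
public:
    explicit WriteThrottle(std::size_t limit) : limit_(limit) {}

    // The arbiter calls this before granting a client write; false means "hold the client".
    bool tryIssue() {
        if (inFlight_ >= limit_) return false;
        ++inFlight_;
        return true;
    }

    // Called when the interface reports that a write has been transported.
    void onComplete() { if (inFlight_ > 0) --inFlight_; }

private:
    std::size_t limit_;
    std::size_t inFlight_ = 0;
};
```

-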
Patent number: 8667200
Abstract: One embodiment of the present invention sets forth a technique for arbitrating between a set of requesters that transmit data transmission requests to the weighted LRU arbiter. Each data transmission request is associated with a specific amount of data to be transmitted over the crossbar unit. Based on the priority state associated with each requester, the weighted LRU arbiter then selects the requester in the set of requesters with the highest priority. The weighted LRU arbiter then decrements the weight associated with the selected requester stored in a corresponding weight store based on the size of the data to be transmitted. If the decremented weight is equal to or less than zero, then the priority associated with the selected requester is set to a lowest priority. If, however, the decremented weight is greater than zero, then the priority associated with the selected requester is not changed.
Type: Grant
Filed: February 24, 2010
Date of Patent: March 4, 2014
Assignee: NVIDIA Corporation
Inventors: Lukito Muliadi, Raymond Hoi Man Wong, Madhukiran V. Swarna, Samuel H. Duncan
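The arbitration rule in the abstract maps directly to a small routine: grant the highest-priority requester, charge its weight by the transfer size, and demote it to the lowest priority once its weight is exhausted. The following is an illustrative software model only, with assumed data structures; the arbiter itself is hardware.

```cpp
// Hypothetical sketch of the weighted LRU arbitration rule.
#include <cstddef>
#include <cstdint>
#include <vector>

struct Requester {
    int     priority;   // larger value = higher priority
    int64_t weight;     // remaining credit, e.g. in bytes
};

// Returns the index of the granted requester, or -1 if nobody is requesting.
int arbitrate(std::vector<Requester>& reqs, const std::vector<bool>& requesting,
              const std::vector<int64_t>& transferSize, int lowestPriority)
{
    int winner = -1;
    for (std::size_t i = 0; i < reqs.size(); ++i) {
        if (requesting[i] && (winner < 0 || reqs[i].priority > reqs[winner].priority))
            winner = static_cast<int>(i);
    }
    if (winner >= 0) {
        reqs[winner].weight -= transferSize[winner];   // charge by size of the data transmitted
        if (reqs[winner].weight <= 0)
            reqs[winner].priority = lowestPriority;    // weight exhausted: fall to the back
    }
    return winner;
}
```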
-
Publication number: 20140012904
Abstract: Non-contiguous or tiled payload data are efficiently transferred between peers over a fabric. Specifically, a client transfers a byte enable message to a peer device via a mailbox mechanism, where the byte enable message specifies which bytes of the payload data being transferred via the data packet are to be written to the frame buffer on the peer device and which bytes are not to be written. The client transfers the non-contiguous or tiled payload data to the peer device. Upon receiving the payload data, the peer device writes bytes from the payload data into the target frame buffer for only those bytes enabled via the byte enable message. One advantage of the present invention is that non-contiguous or tiled data are transferred over a fabric with improved efficiency.
Type: Application
Filed: July 3, 2012
Publication date: January 9, 2014
Inventors: Samuel H. DUNCAN, Dennis K. MA, Wei-Je HUANG, Gary WARD
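A minimal sketch of how a receiver might honor the byte enable message when committing payload bytes to its frame buffer; the buffer layout and names are assumptions made for illustration.

```cpp
// Illustrative only: write into the frame buffer just the bytes that the
// byte-enable message marked as valid.
#include <cstddef>
#include <cstdint>
#include <vector>

// 'byteEnable[i]' tells the receiver whether payload byte i should land in memory.
void applyByteEnabledWrite(std::vector<uint8_t>& frameBuffer, std::size_t dstOffset,
                           const std::vector<uint8_t>& payload,
                           const std::vector<bool>& byteEnable)
{
    for (std::size_t i = 0;
         i < payload.size() && i < byteEnable.size() && dstOffset + i < frameBuffer.size();
         ++i) {
        if (byteEnable[i]) frameBuffer[dstOffset + i] = payload[i];   // enabled byte: write it
        // Disabled bytes are skipped, so non-contiguous/tiled data needs no repacking.
    }
}
```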
-
Patent number: 8547993
Abstract: Methods, apparatuses, and systems are presented for performing asynchronous communications involving using an asynchronous interface to send signals between a source device and a plurality of client devices, the source device and the plurality of client devices being part of a processing unit capable of performing graphics operations, the source device being coupled to the plurality of client devices using the asynchronous interface, wherein the asynchronous interface includes at least one request signal, at least one address signal, at least one acknowledge signal, and at least one data signal, and wherein the asynchronous interface operates in accordance with at least one programmable timing characteristic associated with the source device.
Type: Grant
Filed: August 10, 2006
Date of Patent: October 1, 2013
Assignee: NVIDIA Corporation
Inventors: Lincoln G. Garlick, Richard A. Silkebakken, Prakash G. Apte, Paolo E. Sabella, Samuel H. Duncan, Dennis K. Ma, Sean J. Treichler
-
Patent number: 8539130
Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
Type: Grant
Filed: August 31, 2010
Date of Patent: September 17, 2013
Assignee: NVIDIA Corporation
Inventors: David B. Glasco, Dane T. Mrazek, Samuel H. Duncan, Patrick R. Marchand, Ravi Kiran Manyam, Yin Fung Tang, John H. Edmondson
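As an illustration of channel selection by traffic class, a tiny sketch follows; the specific policy shown (higher-priority traffic on the non-stallable channel) is an assumed example, since the actual mapping from client type and request type to channel is a hardware configuration choice.

```cpp
// Illustrative only: priority-based virtual channel selection.
enum class VirtualChannel { NonStallable, Stallable };

VirtualChannel selectChannel(bool highPriorityTraffic)
{
    // Higher-priority traffic rides a channel that cannot deadlock or stall;
    // everything else uses a channel that is allowed to back-pressure.
    return highPriorityTraffic ? VirtualChannel::NonStallable : VirtualChannel::Stallable;
}
```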
-
Publication number: 20130152093
Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG are processed in a pre-determined order. However, when a channel stalls while processing, the next channel with independent work can be switched to in order to fully load the parallel processing unit. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
Type: Application
Filed: December 9, 2011
Publication date: June 13, 2013
Inventors: Samuel H. DUNCAN, Lacky V. SHAH, Sean J. TREICHLER, Daniel Elliot WEXLER, Jerome F. DULUK, JR., Phillip Browning JOHNSON, Jonathon Stuart Ramsay EVANS
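A minimal scheduling sketch of the TSG idea, assuming a software model in which the channels of a group share one context and the scheduler simply looks for the next runnable channel; all names here are hypothetical.

```cpp
// Illustrative sketch: within one time slice group (TSG) channels share a
// context, so moving to the next runnable channel needs no context switch.
#include <cstddef>
#include <vector>

struct Channel { bool hasIndependentWork; bool stalled; };

struct TimeSliceGroup {
    int contextId;                  // one context shared by every channel below
    std::vector<Channel> channels;  // processed in a fixed order
};

// Returns the index of the next channel to run inside the TSG, or -1 if none
// is runnable (only then would the scheduler consider a real context switch).
int nextChannel(const TimeSliceGroup& tsg, std::size_t current)
{
    for (std::size_t step = 1; step <= tsg.channels.size(); ++step) {
        std::size_t idx = (current + step) % tsg.channels.size();
        const Channel& ch = tsg.channels[idx];
        if (ch.hasIndependentWork && !ch.stalled)
            return static_cast<int>(idx);   // same context, so no switch is required
    }
    return -1;
}
```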
-
Publication number: 20130132711
Abstract: One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
Type: Application
Filed: November 22, 2011
Publication date: May 23, 2013
Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Timothy John Purcell
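The threshold-based fallback can be sketched as a small decision function; the cycle estimates and names here are assumptions made for illustration.

```cpp
// Hypothetical decision sketch: prefer preempting at a compute thread array
// (CTA) boundary, but fall back to instruction-level preemption if draining
// the in-flight instructions would take too long.
enum class PreemptionLevel { ThreadArrayBoundary, InstructionLevel };

PreemptionLevel choosePreemption(unsigned estimatedDrainCycles, unsigned drainThresholdCycles)
{
    // CTA-boundary preemption stores less context state, but only pays off if
    // the pipeline can drain quickly enough; otherwise preempt immediately.
    return (estimatedDrainCycles > drainThresholdCycles)
               ? PreemptionLevel::InstructionLevel
               : PreemptionLevel::ThreadArrayBoundary;
}
```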
-
Publication number: 20130124838
Abstract: One embodiment of the present invention sets forth a technique for instruction level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
Type: Application
Filed: November 10, 2011
Publication date: May 16, 2013
Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
-
Patent number: 8392667
Abstract: Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
Type: Grant
Filed: December 12, 2008
Date of Patent: March 5, 2013
Assignee: NVIDIA Corporation
Inventors: Samuel H. Duncan, David B. Glasco, Wei-Je Huang, Atul Kalambur, Patrick R. Marchand, Dennis K. Ma
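A minimal sketch of the routing rule, assuming a software model of request tagging; the field names are hypothetical.

```cpp
// Illustrative sketch: reads issued by the parallel processor to system memory
// are tagged "special" so their completions return on virtual channel 1 and
// can never queue behind writes on virtual channel 0.
struct ReadRequest {
    unsigned long long addr;
    bool special;          // set for parallel-processor reads to system memory
};

int completionVirtualChannel(const ReadRequest& req)
{
    return req.special ? 1 : 0;   // VC1 completions cannot be stalled by VC0 writes
}
```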
-
Patent number: 8364851
Abstract: A system that supports a high performance, scalable, and efficient I/O port protocol to connect to I/O devices is disclosed. A distributed multiprocessing computer system contains a number of processors each coupled to an I/O bridge ASIC implementing the I/O port protocol. One or more I/O devices are coupled to the I/O bridge ASIC, each I/O device capable of accessing machine resources in the computer system by transmitting and receiving message packets. Machine resources in the computer system include data blocks, registers and interrupt queues. Each processor in the computer system is coupled to a memory module capable of storing data blocks shared between the processors. Coherence of the shared data blocks in this shared memory system is maintained using a directory based coherence protocol. Coherence of data blocks transferred during I/O device read and write accesses is maintained using the same coherence protocol as for the memory system.
Type: Grant
Filed: October 2, 2003
Date of Patent: January 29, 2013
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Richard E. Kessler, Samuel H. Duncan, David W. Hartwell, David A. J. Webb, Jr., Steve Lang
-
Patent number: 7937606
Abstract: Generally, the present disclosure concerns systems and methods for shadowing status for a circuit with a shadow unit. In one aspect, a system comprises a first circuit in a first dynamic clock domain of a plurality of dynamic clock domains, a processor configured to execute software instructions to generate a request for a status of the first circuit, and a second circuit coupled to the first circuit and to the processor. The second circuit, outside the first dynamic clock domain, is configured to shadow a status of the first circuit and to respond to the request for the status of the first circuit with the shadowed status.
Type: Grant
Filed: May 18, 2006
Date of Patent: May 3, 2011
Assignee: NVIDIA Corporation
Inventors: Lincoln G. Garlick, Paolo E. Sabella, Samuel H. Duncan, Robert J. Hasslen
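As a rough software analogue of the shadowing scheme, a shadow object can hold the last published status and answer reads without touching the shadowed circuit; the class below is illustrative only, not the patented hardware.

```cpp
// Hypothetical sketch of status shadowing: a shadow unit outside the dynamic
// clock domain keeps the last reported status, so status requests never have
// to touch (or wake) the clock-gated circuit itself.
class ShadowStatus {
public:
    // Called whenever the shadowed circuit publishes a new status value
    // (while its clock domain is active).
    void update(unsigned status) { shadowed_ = status; }

    // Called on behalf of the processor's status request; answered from the
    // shadow copy even if the circuit's clock is currently gated off.
    unsigned read() const { return shadowed_; }

private:
    unsigned shadowed_ = 0;
};
```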
-
Publication number: 20110072177
Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
Type: Application
Filed: August 31, 2010
Publication date: March 24, 2011
Inventors: David B. Glasco, Dane T. Mrazek, Samuel H. Duncan, Patrick R. Marchand, Ravi Kiran Manyam, Yin Fung Tang, John H. Edmondson
-
Publication number: 20100153658
Abstract: Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
Type: Application
Filed: December 12, 2008
Publication date: June 17, 2010
Inventors: Samuel H. Duncan, David B. Glasco, Wei-je Huang, Atul Kalambur, Patrick R. Marchand, Dennis K. Ma
-
Patent number: 7451259
Abstract: A method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The method and apparatus enable a first device to read and/or write data to/from a local memory of a second device by communicating read and write requests across the interconnectivity fabric. Such data transfer can be performed even when the communication protocol of the interconnectivity fabric does not permit such transfers.
Type: Grant
Filed: December 6, 2004
Date of Patent: November 11, 2008
Assignee: NVIDIA Corporation
Inventors: Samuel H. Duncan, Wei-Je Huang, John H. Edmondson
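A minimal sketch of the receive-side handling, assuming the peer-to-peer write is carried as an ordinary fabric message that the second device then applies to its own local memory; the message layout and names are assumptions, not the patented protocol.

```cpp
// Illustrative sketch: a peer-to-peer write wrapped in a plain message that
// the fabric does allow; the receiving device performs the local write itself.
#include <cstddef>
#include <cstdint>
#include <vector>

struct PeerWriteMessage {
    uint64_t             peerLocalAddr;   // address within the second device's local memory
    std::vector<uint8_t> payload;         // data the first device wants written there
};

// Runs on the second (receiving) device: apply the requested write locally.
void handlePeerWrite(std::vector<uint8_t>& localMemory, const PeerWriteMessage& msg)
{
    for (std::size_t i = 0; i < msg.payload.size() && msg.peerLocalAddr + i < localMemory.size(); ++i)
        localMemory[msg.peerLocalAddr + i] = msg.payload[i];
}
```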
-
Patent number: 7275123
Abstract: A method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The method and apparatus enable a first device to read and/or write data to/from a local memory of a second device by communicating read and write requests across the interconnectivity fabric. Such data transfer can be performed even when the communication protocol of the interconnectivity fabric does not permit such transfers.
Type: Grant
Filed: December 6, 2004
Date of Patent: September 25, 2007
Assignee: NVIDIA Corporation
Inventors: Samuel H. Duncan, Wei-Je Huang, John H. Edmondson
-
Patent number: 7099978
Abstract: A method and system for completing pending I/O device reads by periodically stalling the issuance of I/O device accesses by a program in a multiple-processor computer system.
Type: Grant
Filed: September 15, 2003
Date of Patent: August 29, 2006
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Samuel H. Duncan, Andrej Kocev, David T. Mayo
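As a rough illustration, the periodic-stall policy might be modeled as a simple counter that tells the issuing program when to pause and let pending reads complete; the names and structure are hypothetical.

```cpp
// Hypothetical sketch: after every N device accesses the issuing program
// pauses until reads that are still outstanding have completed.
#include <cstddef>

class IoIssuePolicy {
public:
    explicit IoIssuePolicy(std::size_t stallInterval) : interval_(stallInterval) {}

    // Returns true if the program should stall (drain pending reads) before
    // issuing its next I/O device access.
    bool shouldStallBeforeNextAccess() {
        ++issuedSinceStall_;
        if (issuedSinceStall_ >= interval_) {
            issuedSinceStall_ = 0;
            return true;   // give pending I/O device reads a chance to complete
        }
        return false;
    }

private:
    std::size_t interval_;
    std::size_t issuedSinceStall_ = 0;
};
```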