Patents by Inventor Samuel H. Duncan

Samuel H. Duncan is a named inventor on the patent filings listed below. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140281255
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Application
    Filed: October 16, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
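    The fault-servicing flow this abstract describes lends itself to a short illustration. Below is a minimal C sketch of that path, with the command queue reduced to a single entry serviced inline; every name (page_state_dir, FaultCmd, the page sizes) is an invented stand-in, not taken from the patent:
    ```c
    /* Hypothetical sketch: an MMU faults on an unmapped virtual address,
     * and a copy engine looks up the mapping in a page state directory
     * and installs it in the local page table. Names are illustrative. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define NUM_PAGES  256

    typedef struct { uint64_t phys; int valid; } PTE;

    static PTE page_table[NUM_PAGES];          /* per-unit page table      */
    static uint64_t page_state_dir[NUM_PAGES]; /* shared mapping directory */

    typedef struct { uint64_t vaddr; } FaultCmd;

    /* Copy engine: service one queued fault by copying the directory's
     * mapping into the faulting unit's page table. */
    static void copy_engine_service(const FaultCmd *cmd)
    {
        uint64_t vpn = cmd->vaddr >> PAGE_SHIFT;
        page_table[vpn].phys  = page_state_dir[vpn];
        page_table[vpn].valid = 1;
    }

    /* MMU: translate, raising a fault command when no mapping exists. */
    static uint64_t mmu_translate(uint64_t vaddr)
    {
        uint64_t vpn = vaddr >> PAGE_SHIFT;
        if (!page_table[vpn].valid) {
            FaultCmd cmd = { vaddr };      /* would be enqueued to the engine */
            copy_engine_service(&cmd);     /* serviced inline for the sketch  */
        }
        return page_table[vpn].phys | (vaddr & ((1 << PAGE_SHIFT) - 1));
    }

    int main(void)
    {
        page_state_dir[3] = 0xA000;        /* directory maps vpn 3 -> 0xA000 */
        printf("0x%llx\n", (unsigned long long)mmu_translate(0x3010));
        return 0;
    }
    ```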
  • Publication number: 20140281358
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Application
    Filed: October 16, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
  • Publication number: 20140109102
    Abstract: A multi-threaded processing unit includes a hardware pre-processor coupled to one or more processing engines (e.g., copy engines, GPCs, etc.) that implement preemption techniques by dividing tasks into smaller subtasks and scheduling the subtasks on the processing engines based on the priority of the tasks. By limiting the size of the subtasks, higher priority tasks may be executed quickly without switching the context state of the processing engine. Tasks may be subdivided based on a threshold size or by taking into account other considerations, such as physical boundaries of the memory system.
    Type: Application
    Filed: October 12, 2012
    Publication date: April 17, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Samuel H. Duncan, Gary WARD, M. Wasiur RASHID, Lincoln G. GARLICK, Wojciech Jan Truty
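    As a rough illustration of the subdivision idea, the following C sketch caps each scheduling grant at a fixed subtask size so a higher-priority task never waits longer than one subtask; the threshold value and task structure are assumptions for illustration:
    ```c
    #include <stdio.h>

    #define SUBTASK_MAX 64  /* threshold size: work units per subtask */

    typedef struct { int id; int priority; int remaining; } Task;

    /* Pick the highest-priority task that still has work. */
    static Task *pick_highest(Task *tasks, int n)
    {
        Task *best = NULL;
        for (int i = 0; i < n; i++)
            if (tasks[i].remaining > 0 &&
                (!best || tasks[i].priority > best->priority))
                best = &tasks[i];
        return best;
    }

    int main(void)
    {
        Task tasks[] = { {0, 1, 200}, {1, 5, 100} };  /* task 1 outranks task 0 */
        Task *t;
        while ((t = pick_highest(tasks, 2)) != NULL) {
            int chunk = t->remaining < SUBTASK_MAX ? t->remaining : SUBTASK_MAX;
            t->remaining -= chunk;  /* run one subtask; no context switch */
            printf("ran task %d for %d units\n", t->id, chunk);
        }
        return 0;
    }
    ```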
  • Publication number: 20140095759
    Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Lincoln G. GARLICK, Philip Browning JOHNSON, Rafal ZBOINSKI, Jeff TUCKEY, Samuel H. DUNCAN, Peter C. MILLS
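    A hedged C sketch of the state-subset idea follows: rather than switching a full context, only the parameters a copy operation needs are extracted and sent with it. The Context fields and CopyState struct are invented for illustration:
    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t src_base, dst_base;  /* needed by a copy operation   */
        uint32_t pitch;               /* needed by a copy operation   */
        uint32_t shader_state[16];    /* NOT needed by a copy engine  */
    } Context;

    typedef struct { uint64_t src, dst; uint32_t pitch; } CopyState;

    /* Extract just the copy-relevant subset of the full context. */
    static CopyState subset_for_copy(const Context *ctx)
    {
        CopyState s = { ctx->src_base, ctx->dst_base, ctx->pitch };
        return s;
    }

    int main(void)
    {
        Context ctx = { 0x1000, 0x2000, 256, {0} };
        CopyState s = subset_for_copy(&ctx);  /* sent ahead of the operation */
        printf("copy %llx -> %llx pitch %u\n",
               (unsigned long long)s.src, (unsigned long long)s.dst, s.pitch);
        return 0;
    }
    ```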
  • Patent number: 8683089
    Abstract: One or more client engines issues write transactions to system memory or peer parallel processor (PP) memory across a peripheral component interconnect express (PCIe) interface. The client engines may issue write transactions faster than the PCIe interface can transport those transactions, causing write transactions to accumulate within the PCIe interface. To prevent the accumulation of write transactions within the PCIe interface, an arbiter throttles write transactions received from the client engines based on the number of write transactions currently being transported across the PCIe interface.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: March 25, 2014
    Assignee: NVIDIA Corporation
    Inventors: Raymond Hoi Man Wong, Samuel H. Duncan, Lukito Muliadi
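    The throttling rule can be pictured as a simple credit counter, as in this C sketch; the in-flight limit and retire hook are illustrative assumptions, not details from the patent:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_IN_FLIGHT 8

    static int writes_in_flight;

    /* Arbiter grants a write only when the interface has capacity. */
    static bool grant_write(void)
    {
        if (writes_in_flight >= MAX_IN_FLIGHT)
            return false;              /* throttle: client must retry */
        writes_in_flight++;
        return true;
    }

    /* Interface signals completion, freeing a slot. */
    static void write_retired(void) { writes_in_flight--; }

    int main(void)
    {
        for (int i = 0; i < 10; i++)
            printf("write %d %s\n", i, grant_write() ? "granted" : "throttled");
        write_retired();
        printf("after retire: %s\n", grant_write() ? "granted" : "throttled");
        return 0;
    }
    ```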
  • Patent number: 8667200
    Abstract: One embodiment of the present invention sets forth a technique for arbitrating among a set of requesters that transmit data transmission requests to a weighted LRU (least-recently-used) arbiter. Each data transmission request is associated with a specific amount of data to be transmitted over the crossbar unit. Based on the priority state associated with each requester, the weighted LRU arbiter selects the requester in the set with the highest priority. The weighted LRU arbiter then decrements the weight associated with the selected requester, stored in a corresponding weight store, based on the size of the data to be transmitted. If the decremented weight is equal to or less than zero, then the priority associated with the selected requester is set to a lowest priority. If, however, the decremented weight is greater than zero, then the priority associated with the selected requester is not changed.
    Type: Grant
    Filed: February 24, 2010
    Date of Patent: March 4, 2014
    Assignee: NVIDIA Corporation
    Inventors: Lukito Muliadi, Raymond Hoi Man Wong, Madhukiran V. Swarna, Samuel H. Duncan
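    The abstract above fully specifies the arbitration rule, which the following C sketch reproduces: the winner's weight is charged by its transfer size, and exhausting the weight demotes it to the lowest priority. The concrete weights and sizes are illustrative:
    ```c
    #include <stdio.h>

    typedef struct { int id; int priority; int weight; } Requester;

    /* Set r's priority strictly below every current priority. */
    static void demote_to_lowest(Requester *r, Requester *all, int n)
    {
        int lowest = all[0].priority;
        for (int i = 1; i < n; i++)
            if (all[i].priority < lowest) lowest = all[i].priority;
        r->priority = lowest - 1;
    }

    /* Grant one request of `size` bytes among n requesters. */
    static Requester *arbitrate(Requester *reqs, int n, int size)
    {
        Requester *win = &reqs[0];
        for (int i = 1; i < n; i++)
            if (reqs[i].priority > win->priority) win = &reqs[i];
        win->weight -= size;                 /* charge winner by data size  */
        if (win->weight <= 0)
            demote_to_lowest(win, reqs, n);  /* exhausted: lowest priority  */
        return win;
    }

    int main(void)
    {
        Requester reqs[] = { {0, 3, 64}, {1, 2, 512} };
        for (int i = 0; i < 4; i++)
            printf("grant -> requester %d\n", arbitrate(reqs, 2, 48)->id);
        return 0;
    }
    ```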
  • Publication number: 20140012904
    Abstract: Non-contiguous or tiled payload data are efficiently transferred between peers over a fabric. Specifically, a client transfers a byte enable message to a peer device via a mailbox mechanism, where the byte enable message specifies which bytes of the payload data being transferred via the data packet are to be written to the frame buffer on the peer device and which bytes are not to be written. The client then transfers the non-contiguous or tiled payload data to the peer device. Upon receiving the payload data, the peer device writes bytes from the payload data into the target frame buffer only for those bytes enabled via the byte enable message. One advantage of the present invention is that non-contiguous or tiled data are transferred over a fabric with improved efficiency.
    Type: Application
    Filed: July 3, 2012
    Publication date: January 9, 2014
    Inventors: Samuel H. DUNCAN, Dennis K. MA, Wei-Je HUANG, Gary WARD
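    The byte-enable scheme reduces to a bitmask sent ahead of the payload, as in this C sketch; the 8-byte packet size and one-byte mailbox are simplifications for illustration:
    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PKT_BYTES 8

    static uint8_t frame_buffer[PKT_BYTES];
    static uint8_t mailbox_byte_enable;  /* one bit per payload byte */

    /* Peer side: commit only the bytes enabled by the mailbox mask. */
    static void peer_receive(const uint8_t *payload)
    {
        for (int i = 0; i < PKT_BYTES; i++)
            if (mailbox_byte_enable & (1u << i))
                frame_buffer[i] = payload[i];
    }

    int main(void)
    {
        uint8_t payload[PKT_BYTES] = {1, 2, 3, 4, 5, 6, 7, 8};
        memset(frame_buffer, 0xFF, sizeof frame_buffer);
        mailbox_byte_enable = 0x0F;   /* enable only the first four bytes */
        peer_receive(payload);
        for (int i = 0; i < PKT_BYTES; i++)
            printf("%02x ", frame_buffer[i]);
        printf("\n");
        return 0;
    }
    ```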
  • Patent number: 8547993
    Abstract: Methods, apparatuses, and systems are presented for performing asynchronous communications involving using an asynchronous interface to send signals between a source device and a plurality of client devices, the source device and the plurality of client devices being part of a processing unit capable of performing graphics operations, the source device being coupled to the plurality of client devices using the asynchronous interface, wherein the asynchronous interface includes at least one request signal, at least one address signal, at least one acknowledge signal, and at least one data signal, and wherein the asynchronous interface operates in accordance with at least one programmable timing characteristic associated with the source device.
    Type: Grant
    Filed: August 10, 2006
    Date of Patent: October 1, 2013
    Assignee: NVIDIA Corporation
    Inventors: Lincoln G. Garlick, Richard A. Silkebakken, Prakash G. Apte, Paolo E. Sabella, Samuel H. Duncan, Dennis K. Ma, Sean J. Treichler
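    As a loose software analogy (the patent describes hardware signals, not code), this C sketch models the request/address/acknowledge/data handshake with a programmable acknowledge delay standing in for the programmable timing characteristic; all names are invented:
    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        int      req, ack;      /* handshake wires       */
        uint32_t addr, data;    /* address/data signals  */
        int      ack_delay;     /* programmable timing   */
    } AsyncIface;

    /* Client: answer an outstanding request after the configured delay. */
    static void client_tick(AsyncIface *w, const uint32_t *regs)
    {
        static int wait;
        if (w->req && !w->ack) {
            if (++wait >= w->ack_delay) {    /* honor timing characteristic */
                w->data = regs[w->addr];
                w->ack  = 1;
                wait    = 0;
            }
        }
    }

    int main(void)
    {
        uint32_t regs[4] = {10, 20, 30, 40};
        AsyncIface w = { .req = 1, .addr = 2, .ack_delay = 3 };
        for (int cycle = 0; !w.ack; cycle++) {
            client_tick(&w, regs);
            printf("cycle %d: ack=%d\n", cycle, w.ack);
        }
        printf("read data %u\n", w.data);   /* source drops req after ack */
        return 0;
    }
    ```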
  • Patent number: 8539130
    Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: September 17, 2013
    Assignee: NVIDIA Corporation
    Inventors: David B. Glasco, Dane T. Mrazek, Samuel H. Duncan, Patrick R. Marchand, Ravi Kiran Manyam, Yin Fung Tang, John H. Edmondson
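    The selection rule might look like the following C sketch, which maps client type and request type to a stallable or non-stallable virtual channel; the client classes and channel numbering are assumptions for illustration:
    ```c
    #include <stdio.h>

    enum ClientType { CLIENT_ISO, CLIENT_BULK };   /* e.g. display vs. DMA */
    enum ReqType    { REQ_READ, REQ_WRITE };

    /* VC0 may stall; VC1 is reserved for traffic that must not deadlock. */
    static int select_vc(enum ClientType c, enum ReqType r)
    {
        if (c == CLIENT_ISO || r == REQ_READ)
            return 1;   /* high priority: non-stallable channel */
        return 0;       /* low priority: stallable channel      */
    }

    int main(void)
    {
        printf("iso write  -> VC%d\n", select_vc(CLIENT_ISO,  REQ_WRITE));
        printf("bulk read  -> VC%d\n", select_vc(CLIENT_BULK, REQ_READ));
        printf("bulk write -> VC%d\n", select_vc(CLIENT_BULK, REQ_WRITE));
        return 0;
    }
    ```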
  • Publication number: 20130152093
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG is processed in a pre-determined order. However, when a channel stalls while processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Inventors: Samuel H. DUNCAN, Lacky V. SHAH, Sean J. TREICHLER, Daniel Elliot WEXLER, Jerome F. DULUK, JR., Phillip Browning JOHNSON, Jonathon Stuart Ramsay EVANS
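    A small C sketch of the TSG idea follows: channels in one group share a context, so moving between them skips the context switch, while entering the group loads the context once. The structures are illustrative, not from the patent:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int id; bool stalled; } Channel;
    typedef struct { int context_id; Channel ch[3]; int n; } TSG;

    static int loaded_context = -1;

    static void run_group(TSG *g)
    {
        if (loaded_context != g->context_id) {
            printf("context switch to %d\n", g->context_id);  /* once per TSG */
            loaded_context = g->context_id;
        }
        for (int i = 0; i < g->n; i++)          /* pre-determined order */
            if (!g->ch[i].stalled)
                printf("  run channel %d (no context switch)\n", g->ch[i].id);
            else
                printf("  channel %d stalled, moving on\n", g->ch[i].id);
    }

    int main(void)
    {
        TSG g = { 7, { {0, false}, {1, true}, {2, false} }, 3 };
        run_group(&g);
        return 0;
    }
    ```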
  • Publication number: 20130132711
    Abstract: One embodiment of the present invention sets forth a technique for instruction-level and compute-thread-array-granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
    Type: Application
    Filed: November 22, 2011
    Publication date: May 23, 2013
    Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Timothy John Purcell
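    The dynamic policy can be sketched as a threshold test, as below in C; the drain-time estimate and threshold value are invented for illustration:
    ```c
    #include <stdio.h>

    #define DRAIN_THRESHOLD 100  /* max cycles we will wait for idle */

    /* Estimated cycles to finish in-flight instructions (toy model). */
    static int estimated_drain_cycles(int inflight) { return inflight * 4; }

    static const char *choose_preemption(int inflight)
    {
        if (estimated_drain_cycles(inflight) <= DRAIN_THRESHOLD)
            return "CTA boundary (drain, then save reduced state)";
        return "instruction level (stop issue, save full pipeline state)";
    }

    int main(void)
    {
        printf("10 in flight: %s\n", choose_preemption(10));
        printf("50 in flight: %s\n", choose_preemption(50));
        return 0;
    }
    ```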
  • Publication number: 20130124838
    Abstract: One embodiment of the present invention sets forth a technique for instruction-level and compute-thread-array-granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
  • Patent number: 8392667
    Abstract: Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: March 5, 2013
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, David B. Glasco, Wei-Je Huang, Atul Kalambur, Patrick R. Marchand, Dennis K. Ma
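    The routing rule is a one-bit decision, as in this C sketch; the request tag and the software representation of the virtual channels are illustrative:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { unsigned tag; bool special; } ReadRequest;

    /* Completions for "special" reads return on VC1 so they cannot
     * queue behind VC0 writes; everything else uses VC0. */
    static int completion_vc(const ReadRequest *r)
    {
        return r->special ? 1 : 0;
    }

    int main(void)
    {
        ReadRequest normal  = { 1, false };
        ReadRequest special = { 2, true  };
        printf("tag %u -> VC%d\n", normal.tag,  completion_vc(&normal));
        printf("tag %u -> VC%d\n", special.tag, completion_vc(&special));
        return 0;
    }
    ```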
  • Patent number: 8364851
    Abstract: A system that supports a high performance, scalable, and efficient I/O port protocol to connect to I/O devices is disclosed. A distributed multiprocessing computer system contains a number of processors each coupled to an I/O bridge ASIC implementing the I/O port protocol. One or more I/O devices are coupled to the I/O bridge ASIC, each I/O device capable of accessing machine resources in the computer system by transmitting and receiving message packets. Machine resources in the computer system include data blocks, registers and interrupt queues. Each processor in the computer system is coupled to a memory module capable of storing data blocks shared between the processors. Coherence of the shared data blocks in this shared memory system is maintained using a directory based coherence protocol. Coherence of data blocks transferred during I/O device read and write accesses is maintained using the same coherence protocol as for the memory system.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: January 29, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Richard E. Kessler, Samuel H. Duncan, David W. Hartwell, David A. J. Webb, Jr., Steve Lang
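    A heavily reduced C sketch of directory-based coherence with an I/O participant follows: the directory tracks sharers per block and invalidates them on any write, whether the writer is a CPU or the I/O bridge. Node counts and structures are assumptions for illustration:
    ```c
    #include <stdio.h>

    #define NODES 4   /* 3 CPUs + 1 I/O bridge, one directory bit each */

    typedef struct { unsigned sharers; int owner; } DirEntry;

    static DirEntry dir;  /* directory entry for one cache block */

    static void dir_read(int node)
    {
        dir.sharers |= 1u << node;          /* record the new sharer */
    }

    static void dir_write(int node)
    {
        for (int n = 0; n < NODES; n++)     /* invalidate other copies */
            if (n != node && (dir.sharers & (1u << n)))
                printf("invalidate node %d\n", n);
        dir.sharers = 1u << node;
        dir.owner   = node;
    }

    int main(void)
    {
        dir_read(0);       /* CPU 0 caches the block       */
        dir_read(1);       /* CPU 1 caches the block       */
        dir_write(3);      /* I/O bridge writes the block  */
        printf("owner now node %d\n", dir.owner);
        return 0;
    }
    ```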
  • Patent number: 7937606
    Abstract: Generally, the present disclosure concerns systems and methods for shadowing status for a circuit with a shadow unit. In one aspect, a system comprises a first circuit in a first dynamic clock domain of a plurality of dynamic clock domains, a processor configured to execute software instructions to generate a request for a status of the first circuit, and a second circuit coupled to the first circuit and to the processor. The second circuit, outside the first dynamic clock domain, is configured to shadow a status of the first circuit and to respond to the request for the status of the first circuit with the shadowed status.
    Type: Grant
    Filed: May 18, 2006
    Date of Patent: May 3, 2011
    Assignee: NVIDIA Corporation
    Inventors: Lincoln G. Garlick, Paolo E. Sabella, Samuel H. Duncan, Robert J. Hasslen
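    The shadowing idea can be pictured in a few lines of C: a mirror register outside the gated domain is refreshed while the domain's clock runs, and software status reads are answered from the mirror. All names are illustrative:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    static unsigned circuit_status;   /* lives in the gated clock domain */
    static unsigned shadow_status;    /* lives in an always-on domain    */
    static bool     domain_clocked = true;

    /* Each cycle the domain is clocked, the shadow copies the status. */
    static void shadow_tick(void)
    {
        if (domain_clocked)
            shadow_status = circuit_status;
    }

    /* Software reads are answered from the shadow, never the domain. */
    static unsigned read_status(void) { return shadow_status; }

    int main(void)
    {
        circuit_status = 0xBEEF;
        shadow_tick();
        domain_clocked = false;       /* clock gated: circuit unreadable */
        printf("status = 0x%X\n", read_status());
        return 0;
    }
    ```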
  • Publication number: 20110072177
    Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
    Type: Application
    Filed: August 31, 2010
    Publication date: March 24, 2011
    Inventors: David B. Glasco, Dane T. Mrazek, Samuel H. Duncan, Patrick R. Marchand, Ravi Kiran Manyam, Yin Fung Tang, John H. Edmondson
  • Publication number: 20100153658
    Abstract: Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
    Type: Application
    Filed: December 12, 2008
    Publication date: June 17, 2010
    Inventors: Samuel H. Duncan, David B. Glasco, Wei-je Huang, Atul Kalambur, Patrick R. Marchand, Dennis K. Ma
  • Patent number: 7451259
    Abstract: A method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The method and apparatus enable a first device to read and/or write data to/from a local memory of a second device by communicating read and write requests across the interconnectivity fabric. Such data transfer can be performed even when the communication protocol of the interconnectivity fabric does not permit such transfers.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: November 11, 2008
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Wei-Je Huang, John H. Edmondson
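    As a loose C sketch, peer accesses can be expressed as request messages executed against the target's local memory rather than as native fabric transactions; the message format and memory array are invented for illustration:
    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { OP_READ, OP_WRITE } Op;
    typedef struct { Op op; uint32_t addr; uint32_t data; } PeerMsg;

    static uint32_t peer_local_mem[16];  /* second device's local memory */

    /* Target device: execute a request received over the fabric. */
    static uint32_t peer_execute(const PeerMsg *m)
    {
        if (m->op == OP_WRITE) { peer_local_mem[m->addr] = m->data; return 0; }
        return peer_local_mem[m->addr];
    }

    int main(void)
    {
        PeerMsg wr = { OP_WRITE, 5, 1234 };
        PeerMsg rd = { OP_READ,  5, 0 };
        peer_execute(&wr);                       /* "sent" across the fabric */
        printf("peer read -> %u\n", peer_execute(&rd));
        return 0;
    }
    ```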
  • Patent number: 7275123
    Abstract: A method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The method and apparatus enable a first device to read and/or write data to/from a local memory of a second device by communicating read and write requests across the interconnectivity fabric. Such data transfer can be performed even when the communication protocol of the interconnectivity fabric does not permit such transfers.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: September 25, 2007
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Wei-Je Huang, John H. Edmondson
  • Patent number: 7099978
    Abstract: A method and system for completing pending I/O device reads by periodically stalling the issuance of I/O device accesses by a program in a multiple-processor computer system.
    Type: Grant
    Filed: September 15, 2003
    Date of Patent: August 29, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Samuel H. Duncan, Andrej Kocev, David T. Mayo
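    The abstract gives only the outline, but the scheme might be pictured as below in C: after a fixed burst of I/O accesses, the issuer stalls until outstanding reads drain. The burst length and drain model are assumptions, not details from the patent:
    ```c
    #include <stdio.h>

    #define BURST 4            /* accesses issued between stalls */

    static int pending_reads;

    static void issue_io_access(int i)
    {
        pending_reads++;
        printf("issued access %d (pending %d)\n", i, pending_reads);
    }

    static void stall_until_drained(void)
    {
        while (pending_reads > 0)
            pending_reads--;   /* device retires one read per "cycle" */
        printf("stalled: all pending reads completed\n");
    }

    int main(void)
    {
        for (int i = 0; i < 8; i++) {
            issue_io_access(i);
            if ((i + 1) % BURST == 0)
                stall_until_drained();  /* periodic stall */
        }
        return 0;
    }
    ```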