Patents by Inventor Samuel H. Duncan

Samuel H. Duncan is a named inventor on the patent filings listed below. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140281255
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Application
    Filed: October 16, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
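    The fault-servicing flow this abstract describes lends itself to a short illustration. Below is a minimal C sketch of that path, with the command queue reduced to a single entry serviced inline; every name (page_state_dir, FaultCmd, the page sizes) is an invented stand-in, not taken from the patent:
    ```c
    /* Hypothetical sketch: an MMU faults on an unmapped virtual address,
     * and a copy engine looks up the mapping in a page state directory
     * and installs it in the local page table. Names are illustrative. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define NUM_PAGES  256

    typedef struct { uint64_t phys; int valid; } PTE;

    static PTE page_table[NUM_PAGES];          /* per-unit page table      */
    static uint64_t page_state_dir[NUM_PAGES]; /* shared mapping directory */

    typedef struct { uint64_t vaddr; } FaultCmd;

    /* Copy engine: service one queued fault by copying the directory's
     * mapping into the faulting unit's page table. */
    static void copy_engine_service(const FaultCmd *cmd)
    {
        uint64_t vpn = cmd->vaddr >> PAGE_SHIFT;
        page_table[vpn].phys  = page_state_dir[vpn];
        page_table[vpn].valid = 1;
    }

    /* MMU: translate, raising a fault command when no mapping exists. */
    static uint64_t mmu_translate(uint64_t vaddr)
    {
        uint64_t vpn = vaddr >> PAGE_SHIFT;
        if (!page_table[vpn].valid) {
            FaultCmd cmd = { vaddr };      /* would be enqueued to the engine */
            copy_engine_service(&cmd);     /* serviced inline for the sketch  */
        }
        return page_table[vpn].phys | (vaddr & ((1 << PAGE_SHIFT) - 1));
    }

    int main(void)
    {
        page_state_dir[3] = 0xA000;        /* directory maps vpn 3 -> 0xA000 */
        printf("0x%llx\n", (unsigned long long)mmu_translate(0x3010));
        return 0;
    }
    ```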
  • Publication number: 20140281358
    Abstract: A system for managing virtual memory. The system includes a first processing unit configured to execute a first operation that references a first virtual memory address. The system also includes a first memory management unit (MMU) associated with the first processing unit and configured to generate a first page fault upon determining that a first page table that is stored in a first memory unit associated with the first processing unit does not include a mapping corresponding to the first virtual memory address. The system further includes a first copy engine associated with the first processing unit. The first copy engine is configured to read a first command queue to determine a first mapping that corresponds to the first virtual memory address and is included in a first page state directory. The first copy engine is also configured to update the first page table to include the first mapping.
    Type: Application
    Filed: October 16, 2013
    Publication date: September 18, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Jerome F. DULUK, JR., Cameron BUSCHARDT, Sherry CHEUNG, James Leroy DEMING, Samuel H. DUNCAN, Lucien DUNNING, Robert GEORGE, Arvind GOPALAKRISHNAN, Mark HAIRGROVE, Chenghuan JIA, John MASHEY
  • Publication number: 20140109102
    Abstract: A multi-threaded processing unit includes a hardware pre-processor coupled to one or more processing engines (e.g., copy engines, GPCs, etc.) that implement preemption techniques by dividing tasks into smaller subtasks and scheduling the subtasks on the processing engines based on the priority of the tasks. By limiting the size of the subtasks, higher priority tasks may be executed quickly without switching the context state of the processing engine. Tasks may be subdivided based on a threshold size or by taking into account other considerations, such as physical boundaries of the memory system.
    Type: Application
    Filed: October 12, 2012
    Publication date: April 17, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Samuel H. Duncan, Gary WARD, M. Wasiur RASHID, Lincoln G. GARLICK, Wojciech Jan Truty
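    As a rough illustration of the subdivision idea, the following C sketch caps each scheduling grant at a fixed subtask size so a higher-priority task never waits longer than one subtask; the threshold value and task structure are assumptions for illustration:
    ```c
    #include <stdio.h>

    #define SUBTASK_MAX 64  /* threshold size: work units per subtask */

    typedef struct { int id; int priority; int remaining; } Task;

    /* Pick the highest-priority task that still has work. */
    static Task *pick_highest(Task *tasks, int n)
    {
        Task *best = NULL;
        for (int i = 0; i < n; i++)
            if (tasks[i].remaining > 0 &&
                (!best || tasks[i].priority > best->priority))
                best = &tasks[i];
        return best;
    }

    int main(void)
    {
        Task tasks[] = { {0, 1, 200}, {1, 5, 100} };  /* task 1 outranks task 0 */
        Task *t;
        while ((t = pick_highest(tasks, 2)) != NULL) {
            int chunk = t->remaining < SUBTASK_MAX ? t->remaining : SUBTASK_MAX;
            t->remaining -= chunk;  /* run one subtask; no context switch */
            printf("ran task %d for %d units\n", t->id, chunk);
        }
        return 0;
    }
    ```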
  • Publication number: 20140095759
    Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Lincoln G. GARLICK, Philip Browning JOHNSON, Rafal ZBOINSKI, Jeff TUCKEY, Samuel H. DUNCAN, Peter C. MILLS
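    A hedged C sketch of the state-subset idea follows: rather than switching a full context, only the parameters a copy operation needs are extracted and sent with it. The Context fields and CopyState struct are invented for illustration:
    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint64_t src_base, dst_base;  /* needed by a copy operation   */
        uint32_t pitch;               /* needed by a copy operation   */
        uint32_t shader_state[16];    /* NOT needed by a copy engine  */
    } Context;

    typedef struct { uint64_t src, dst; uint32_t pitch; } CopyState;

    /* Extract just the copy-relevant subset of the full context. */
    static CopyState subset_for_copy(const Context *ctx)
    {
        CopyState s = { ctx->src_base, ctx->dst_base, ctx->pitch };
        return s;
    }

    int main(void)
    {
        Context ctx = { 0x1000, 0x2000, 256, {0} };
        CopyState s = subset_for_copy(&ctx);  /* sent ahead of the operation */
        printf("copy %llx -> %llx pitch %u\n",
               (unsigned long long)s.src, (unsigned long long)s.dst, s.pitch);
        return 0;
    }
    ```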
  • Patent number: 8683089
    Abstract: One or more client engines issues write transactions to system memory or peer parallel processor (PP) memory across a peripheral component interconnect express (PCIe) interface. The client engines may issue write transactions faster than the PCIe interface can transport those transactions, causing write transactions to accumulate within the PCIe interface. To prevent the accumulation of write transactions within the PCIe interface, an arbiter throttles write transactions received from the client engines based on the number of write transactions currently being transported across the PCIe interface.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: March 25, 2014
    Assignee: NVIDIA Corporation
    Inventors: Raymond Hoi Man Wong, Samuel H. Duncan, Lukito Muliadi
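    The throttling rule can be pictured as a simple credit counter, as in this C sketch; the in-flight limit and retire hook are illustrative assumptions, not details from the patent:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_IN_FLIGHT 8

    static int writes_in_flight;

    /* Arbiter grants a write only when the interface has capacity. */
    static bool grant_write(void)
    {
        if (writes_in_flight >= MAX_IN_FLIGHT)
            return false;              /* throttle: client must retry */
        writes_in_flight++;
        return true;
    }

    /* Interface signals completion, freeing a slot. */
    static void write_retired(void) { writes_in_flight--; }

    int main(void)
    {
        for (int i = 0; i < 10; i++)
            printf("write %d %s\n", i, grant_write() ? "granted" : "throttled");
        write_retired();
        printf("after retire: %s\n", grant_write() ? "granted" : "throttled");
        return 0;
    }
    ```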
  • Patent number: 8667200
    Abstract: One embodiment of the present invention sets forth a technique for arbitrating among a set of requesters that transmit data transmission requests to a weighted LRU (least-recently-used) arbiter. Each data transmission request is associated with a specific amount of data to be transmitted over the crossbar unit. Based on the priority state associated with each requester, the weighted LRU arbiter selects the requester in the set with the highest priority. The weighted LRU arbiter then decrements the weight associated with the selected requester, stored in a corresponding weight store, based on the size of the data to be transmitted. If the decremented weight is equal to or less than zero, then the priority associated with the selected requester is set to a lowest priority. If, however, the decremented weight is greater than zero, then the priority associated with the selected requester is not changed.
    Type: Grant
    Filed: February 24, 2010
    Date of Patent: March 4, 2014
    Assignee: NVIDIA Corporation
    Inventors: Lukito Muliadi, Raymond Hoi Man Wong, Madhukiran V. Swarna, Samuel H. Duncan
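    The abstract above fully specifies the arbitration rule, which the following C sketch reproduces: the winner's weight is charged by its transfer size, and exhausting the weight demotes it to the lowest priority. The concrete weights and sizes are illustrative:
    ```c
    #include <stdio.h>

    typedef struct { int id; int priority; int weight; } Requester;

    /* Set r's priority strictly below every current priority. */
    static void demote_to_lowest(Requester *r, Requester *all, int n)
    {
        int lowest = all[0].priority;
        for (int i = 1; i < n; i++)
            if (all[i].priority < lowest) lowest = all[i].priority;
        r->priority = lowest - 1;
    }

    /* Grant one request of `size` bytes among n requesters. */
    static Requester *arbitrate(Requester *reqs, int n, int size)
    {
        Requester *win = &reqs[0];
        for (int i = 1; i < n; i++)
            if (reqs[i].priority > win->priority) win = &reqs[i];
        win->weight -= size;                 /* charge winner by data size  */
        if (win->weight <= 0)
            demote_to_lowest(win, reqs, n);  /* exhausted: lowest priority  */
        return win;
    }

    int main(void)
    {
        Requester reqs[] = { {0, 3, 64}, {1, 2, 512} };
        for (int i = 0; i < 4; i++)
            printf("grant -> requester %d\n", arbitrate(reqs, 2, 48)->id);
        return 0;
    }
    ```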
  • Publication number: 20140012904
    Abstract: Non-contiguous or tiled payload data are efficiently transferred between peers over a fabric. Specifically, a client transfers a byte enable message to a peer device via a mailbox mechanism, where the byte enable message specifies which bytes of the payload data being transferred via the data packet are to be written to the frame buffer on the peer device and which bytes are not to be written. The client then transfers the non-contiguous or tiled payload data to the peer device. Upon receiving the payload data, the peer device writes bytes from the payload data into the target frame buffer only for those bytes enabled via the byte enable message. One advantage of the present invention is that non-contiguous or tiled data are transferred over a fabric with improved efficiency.
    Type: Application
    Filed: July 3, 2012
    Publication date: January 9, 2014
    Inventors: Samuel H. DUNCAN, Dennis K. MA, Wei-Je HUANG, Gary WARD
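    The byte-enable scheme reduces to a bitmask sent ahead of the payload, as in this C sketch; the 8-byte packet size and one-byte mailbox are simplifications for illustration:
    ```c
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define PKT_BYTES 8

    static uint8_t frame_buffer[PKT_BYTES];
    static uint8_t mailbox_byte_enable;  /* one bit per payload byte */

    /* Peer side: commit only the bytes enabled by the mailbox mask. */
    static void peer_receive(const uint8_t *payload)
    {
        for (int i = 0; i < PKT_BYTES; i++)
            if (mailbox_byte_enable & (1u << i))
                frame_buffer[i] = payload[i];
    }

    int main(void)
    {
        uint8_t payload[PKT_BYTES] = {1, 2, 3, 4, 5, 6, 7, 8};
        memset(frame_buffer, 0xFF, sizeof frame_buffer);
        mailbox_byte_enable = 0x0F;   /* enable only the first four bytes */
        peer_receive(payload);
        for (int i = 0; i < PKT_BYTES; i++)
            printf("%02x ", frame_buffer[i]);
        printf("\n");
        return 0;
    }
    ```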
  • Patent number: 8547993
    Abstract: Methods, apparatuses, and systems are presented for performing asynchronous communications involving using an asynchronous interface to send signals between a source device and a plurality of client devices, the source device and the plurality of client devices being part of a processing unit capable of performing graphics operations, the source device being coupled to the plurality of client devices using the asynchronous interface, wherein the asynchronous interface includes at least one request signal, at least one address signal, at least one acknowledge signal, and at least one data signal, and wherein the asynchronous interface operates in accordance with at least one programmable timing characteristic associated with the source device.
    Type: Grant
    Filed: August 10, 2006
    Date of Patent: October 1, 2013
    Assignee: NVIDIA Corporation
    Inventors: Lincoln G. Garlick, Richard A. Silkebakken, Prakash G. Apte, Paolo E. Sabella, Samuel H. Duncan, Dennis K. Ma, Sean J. Treichler
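    As a loose software analogy (the patent describes hardware signals, not code), this C sketch models the request/address/acknowledge/data handshake with a programmable acknowledge delay standing in for the programmable timing characteristic; all names are invented:
    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        int      req, ack;      /* handshake wires       */
        uint32_t addr, data;    /* address/data signals  */
        int      ack_delay;     /* programmable timing   */
    } AsyncIface;

    /* Client: answer an outstanding request after the configured delay. */
    static void client_tick(AsyncIface *w, const uint32_t *regs)
    {
        static int wait;
        if (w->req && !w->ack) {
            if (++wait >= w->ack_delay) {    /* honor timing characteristic */
                w->data = regs[w->addr];
                w->ack  = 1;
                wait    = 0;
            }
        }
    }

    int main(void)
    {
        uint32_t regs[4] = {10, 20, 30, 40};
        AsyncIface w = { .req = 1, .addr = 2, .ack_delay = 3 };
        for (int cycle = 0; !w.ack; cycle++) {
            client_tick(&w, regs);
            printf("cycle %d: ack=%d\n", cycle, w.ack);
        }
        printf("read data %u\n", w.data);   /* source drops req after ack */
        return 0;
    }
    ```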
  • Patent number: 8539130
    Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: September 17, 2013
    Assignee: NVIDIA Corporation
    Inventors: David B. Glasco, Dane T. Mrazek, Samuel H. Duncan, Patrick R. Marchand, Ravi Kiran Manyam, Yin Fung Tang, John H. Edmondson
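    The selection rule might look like the following C sketch, which maps client type and request type to a stallable or non-stallable virtual channel; the client classes and channel numbering are assumptions for illustration:
    ```c
    #include <stdio.h>

    enum ClientType { CLIENT_ISO, CLIENT_BULK };   /* e.g. display vs. DMA */
    enum ReqType    { REQ_READ, REQ_WRITE };

    /* VC0 may stall; VC1 is reserved for traffic that must not deadlock. */
    static int select_vc(enum ClientType c, enum ReqType r)
    {
        if (c == CLIENT_ISO || r == REQ_READ)
            return 1;   /* high priority: non-stallable channel */
        return 0;       /* low priority: stallable channel      */
    }

    int main(void)
    {
        printf("iso write  -> VC%d\n", select_vc(CLIENT_ISO,  REQ_WRITE));
        printf("bulk read  -> VC%d\n", select_vc(CLIENT_BULK, REQ_READ));
        printf("bulk write -> VC%d\n", select_vc(CLIENT_BULK, REQ_WRITE));
        return 0;
    }
    ```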
  • Publication number: 20130152093
    Abstract: A time slice group (TSG) is a grouping of different streams of work (referred to herein as “channels”) that share the same context information. The set of channels belonging to a TSG is processed in a pre-determined order. However, when a channel stalls while processing, the scheduler can switch to the next channel with independent work to keep the parallel processing unit fully loaded. Importantly, because each channel in the TSG shares the same context information, a context switch operation is not needed when the processing of a particular channel in the TSG stops and the processing of a next channel in the TSG begins. Therefore, multiple independent streams of work are allowed to run concurrently within a single context, increasing utilization of parallel processing units.
    Type: Application
    Filed: December 9, 2011
    Publication date: June 13, 2013
    Inventors: Samuel H. DUNCAN, Lacky V. SHAH, Sean J. TREICHLER, Daniel Elliot WEXLER, Jerome F. DULUK, JR., Phillip Browning JOHNSON, Jonathon Stuart Ramsay EVANS
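    A small C sketch of the TSG idea follows: channels in one group share a context, so moving between them skips the context switch, while entering the group loads the context once. The structures are illustrative, not from the patent:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int id; bool stalled; } Channel;
    typedef struct { int context_id; Channel ch[3]; int n; } TSG;

    static int loaded_context = -1;

    static void run_group(TSG *g)
    {
        if (loaded_context != g->context_id) {
            printf("context switch to %d\n", g->context_id);  /* once per TSG */
            loaded_context = g->context_id;
        }
        for (int i = 0; i < g->n; i++)          /* pre-determined order */
            if (!g->ch[i].stalled)
                printf("  run channel %d (no context switch)\n", g->ch[i].id);
            else
                printf("  channel %d stalled, moving on\n", g->ch[i].id);
    }

    int main(void)
    {
        TSG g = { 7, { {0, false}, {1, true}, {2, false} }, 3 };
        run_group(&g);
        return 0;
    }
    ```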
  • Publication number: 20130132711
    Abstract: One embodiment of the present invention sets forth a technique for instruction-level and compute-thread-array-granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
    Type: Application
    Filed: November 22, 2011
    Publication date: May 23, 2013
    Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Timothy John Purcell
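    The dynamic policy can be sketched as a threshold test, as below in C; the drain-time estimate and threshold value are invented for illustration:
    ```c
    #include <stdio.h>

    #define DRAIN_THRESHOLD 100  /* max cycles we will wait for idle */

    /* Estimated cycles to finish in-flight instructions (toy model). */
    static int estimated_drain_cycles(int inflight) { return inflight * 4; }

    static const char *choose_preemption(int inflight)
    {
        if (estimated_drain_cycles(inflight) <= DRAIN_THRESHOLD)
            return "CTA boundary (drain, then save reduced state)";
        return "instruction level (stop issue, save full pipeline state)";
    }

    int main(void)
    {
        printf("10 in flight: %s\n", choose_preemption(10));
        printf("50 in flight: %s\n", choose_preemption(50));
        return 0;
    }
    ```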
  • Publication number: 20130124838
    Abstract: One embodiment of the present invention sets forth a technique for instruction-level and compute-thread-array-granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline: no new instructions are issued, and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Inventors: Lacky V. SHAH, Gregory Scott Palmer, Gernot Schaufler, Samuel H. Duncan, Philip Browning Johnson, Shirish Gadre, Robert Ohannessian, Nicholas Wang, Christopher Lamb, Philip Alexander Cuadra, Timothy John Purcell
  • Patent number: 8392667
    Abstract: Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: March 5, 2013
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, David B. Glasco, Wei-Je Huang, Atul Kalambur, Patrick R. Marchand, Dennis K. Ma
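    The routing rule is a one-bit decision, as in this C sketch; the request tag and the software representation of the virtual channels are illustrative:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { unsigned tag; bool special; } ReadRequest;

    /* Completions for "special" reads return on VC1 so they cannot
     * queue behind VC0 writes; everything else uses VC0. */
    static int completion_vc(const ReadRequest *r)
    {
        return r->special ? 1 : 0;
    }

    int main(void)
    {
        ReadRequest normal  = { 1, false };
        ReadRequest special = { 2, true  };
        printf("tag %u -> VC%d\n", normal.tag,  completion_vc(&normal));
        printf("tag %u -> VC%d\n", special.tag, completion_vc(&special));
        return 0;
    }
    ```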
  • Patent number: 8364851
    Abstract: A system that supports a high performance, scalable, and efficient I/O port protocol to connect to I/O devices is disclosed. A distributed multiprocessing computer system contains a number of processors each coupled to an I/O bridge ASIC implementing the I/O port protocol. One or more I/O devices are coupled to the I/O bridge ASIC, each I/O device capable of accessing machine resources in the computer system by transmitting and receiving message packets. Machine resources in the computer system include data blocks, registers and interrupt queues. Each processor in the computer system is coupled to a memory module capable of storing data blocks shared between the processors. Coherence of the shared data blocks in this shared memory system is maintained using a directory based coherence protocol. Coherence of data blocks transferred during I/O device read and write accesses is maintained using the same coherence protocol as for the memory system.
    Type: Grant
    Filed: October 2, 2003
    Date of Patent: January 29, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Richard E. Kessler, Samuel H. Duncan, David W. Hartwell, David A. J. Webb, Jr., Steve Lang
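    A heavily reduced C sketch of directory-based coherence with an I/O participant follows: the directory tracks sharers per block and invalidates them on any write, whether the writer is a CPU or the I/O bridge. Node counts and structures are assumptions for illustration:
    ```c
    #include <stdio.h>

    #define NODES 4   /* 3 CPUs + 1 I/O bridge, one directory bit each */

    typedef struct { unsigned sharers; int owner; } DirEntry;

    static DirEntry dir;  /* directory entry for one cache block */

    static void dir_read(int node)
    {
        dir.sharers |= 1u << node;          /* record the new sharer */
    }

    static void dir_write(int node)
    {
        for (int n = 0; n < NODES; n++)     /* invalidate other copies */
            if (n != node && (dir.sharers & (1u << n)))
                printf("invalidate node %d\n", n);
        dir.sharers = 1u << node;
        dir.owner   = node;
    }

    int main(void)
    {
        dir_read(0);       /* CPU 0 caches the block       */
        dir_read(1);       /* CPU 1 caches the block       */
        dir_write(3);      /* I/O bridge writes the block  */
        printf("owner now node %d\n", dir.owner);
        return 0;
    }
    ```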
  • Patent number: 7937606
    Abstract: Generally, the present disclosure concerns systems and methods for shadowing status for a circuit with a shadow unit. In one aspect, a system comprises a first circuit in a first dynamic clock domain of a plurality of dynamic clock domains, a processor configured to execute software instructions to generate a request for a status of the first circuit, and a second circuit coupled to the first circuit and to the processor. The second circuit, outside the first dynamic clock domain, is configured to shadow a status of the first circuit and to respond to the request for the status of the first circuit with the shadowed status.
    Type: Grant
    Filed: May 18, 2006
    Date of Patent: May 3, 2011
    Assignee: NVIDIA Corporation
    Inventors: Lincoln G. Garlick, Paolo E. Sabella, Samuel H. Duncan, Robert J. Hasslen
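    The shadowing idea can be pictured in a few lines of C: a mirror register outside the gated domain is refreshed while the domain's clock runs, and software status reads are answered from the mirror. All names are illustrative:
    ```c
    #include <stdbool.h>
    #include <stdio.h>

    static unsigned circuit_status;   /* lives in the gated clock domain */
    static unsigned shadow_status;    /* lives in an always-on domain    */
    static bool     domain_clocked = true;

    /* Each cycle the domain is clocked, the shadow copies the status. */
    static void shadow_tick(void)
    {
        if (domain_clocked)
            shadow_status = circuit_status;
    }

    /* Software reads are answered from the shadow, never the domain. */
    static unsigned read_status(void) { return shadow_status; }

    int main(void)
    {
        circuit_status = 0xBEEF;
        shadow_tick();
        domain_clocked = false;       /* clock gated: circuit unreadable */
        printf("status = 0x%X\n", read_status());
        return 0;
    }
    ```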
  • Publication number: 20110072177
    Abstract: The invention sets forth a crossbar unit that includes multiple virtual channels, each virtual channel being a logical flow of data within the crossbar unit. Arbitration logic coupled to source client subsystems is configured to select a virtual channel for transmitting a data request or a data packet to a destination client subsystem based on the type of the source client subsystem and/or the type of data request. Higher priority traffic is transmitted over virtual channels that are configured to transmit data without causing deadlocks and/or stalls. Lower priority traffic is transmitted over virtual channels that can be stalled.
    Type: Application
    Filed: August 31, 2010
    Publication date: March 24, 2011
    Inventors: David B. Glasco, Dane T. Mrazek, Samuel H. Duncan, Patrick R. Marchand, Ravi Kiran Manyam, Yin Fung Tang, John H. Edmondson
  • Publication number: 20100153658
    Abstract: Deadlocks are avoided by marking read requests issued by a parallel processor to system memory as “special.” Read completions associated with read requests marked as special are routed on virtual channel 1 of the PCIe bus. Data returning on virtual channel 1 cannot become stalled by write requests in virtual channel 0, thus avoiding a potential deadlock.
    Type: Application
    Filed: December 12, 2008
    Publication date: June 17, 2010
    Inventors: Samuel H. Duncan, David B. Glasco, Wei-je Huang, Atul Kalambur, Patrick R. Marchand, Dennis K. Ma
  • Patent number: 7451259
    Abstract: A method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The method and apparatus enable a first device to read and/or write data to/from a local memory of a second device by communicating read and write requests across the interconnectivity fabric. Such data transfer can be performed even when the communication protocol of the interconnectivity fabric does not permit such transfers.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: November 11, 2008
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Wei-Je Huang, John H. Edmondson
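    As a loose C sketch, peer accesses can be expressed as request messages executed against the target's local memory rather than as native fabric transactions; the message format and memory array are invented for illustration:
    ```c
    #include <stdint.h>
    #include <stdio.h>

    typedef enum { OP_READ, OP_WRITE } Op;
    typedef struct { Op op; uint32_t addr; uint32_t data; } PeerMsg;

    static uint32_t peer_local_mem[16];  /* second device's local memory */

    /* Target device: execute a request received over the fabric. */
    static uint32_t peer_execute(const PeerMsg *m)
    {
        if (m->op == OP_WRITE) { peer_local_mem[m->addr] = m->data; return 0; }
        return peer_local_mem[m->addr];
    }

    int main(void)
    {
        PeerMsg wr = { OP_WRITE, 5, 1234 };
        PeerMsg rd = { OP_READ,  5, 0 };
        peer_execute(&wr);                       /* "sent" across the fabric */
        printf("peer read -> %u\n", peer_execute(&rd));
        return 0;
    }
    ```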
  • Patent number: 7275123
    Abstract: A method and apparatus for providing peer-to-peer data transfer through an interconnecting fabric. The method and apparatus enable a first device to read and/or write data to/from a local memory of a second device by communicating read and write requests across the interconnectivity fabric. Such data transfer can be performed even when the communication protocol of the interconnectivity fabric does not permit such transfers.
    Type: Grant
    Filed: December 6, 2004
    Date of Patent: September 25, 2007
    Assignee: NVIDIA Corporation
    Inventors: Samuel H. Duncan, Wei-Je Huang, John H. Edmondson
  • Patent number: 7099978
    Abstract: A method and system for completing pending I/O device reads by periodically stalling the issuance of I/O device accesses by a program in a multiple-processor computer system.
    Type: Grant
    Filed: September 15, 2003
    Date of Patent: August 29, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Samuel H. Duncan, Andrej Kocev, David T. Mayo
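    The abstract gives only the outline, but the scheme might be pictured as below in C: after a fixed burst of I/O accesses, the issuer stalls until outstanding reads drain. The burst length and drain model are assumptions, not details from the patent:
    ```c
    #include <stdio.h>

    #define BURST 4            /* accesses issued between stalls */

    static int pending_reads;

    static void issue_io_access(int i)
    {
        pending_reads++;
        printf("issued access %d (pending %d)\n", i, pending_reads);
    }

    static void stall_until_drained(void)
    {
        while (pending_reads > 0)
            pending_reads--;   /* device retires one read per "cycle" */
        printf("stalled: all pending reads completed\n");
    }

    int main(void)
    {
        for (int i = 0; i < 8; i++) {
            issue_io_access(i);
            if ((i + 1) % BURST == 0)
                stall_until_drained();  /* periodic stall */
        }
        return 0;
    }
    ```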