Patents by Inventor Peter C. Mills

Peter C. Mills has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Error containment for enabling local checkpoint and recovery

Patent number: 11720440

Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.

Type: Grant

Filed: July 12, 2021

Date of Patent: August 8, 2023

Assignee: NVIDIA CORPORATION

Inventors: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin
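
The containment behavior in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the `Word`, `Client`, and `copy_word` names, the one-bit `POISON` tag, and the dictionary-based memories are all hypothetical. It shows the two ideas the abstract names: a client that loads flagged data is blocked from storing, and the flag travels with the data when it is copied.

```python
from dataclasses import dataclass

POISON = 0b1  # hypothetical tag; the abstract only says "a specific bit pattern"


@dataclass
class Word:
    data: int
    tag: int = 0  # 0 = clean, POISON = a memory error was detected on this word


class Client:
    """A memory client that is blocked from storing once it loads flagged data."""

    def __init__(self, name):
        self.name = name
        self.stores_enabled = True

    def load(self, memory, addr):
        word = memory[addr]
        if word.tag == POISON:
            # Contain the error: this client may no longer store to memory.
            self.stores_enabled = False
        return word

    def store(self, memory, addr, word):
        if not self.stores_enabled:
            return False  # store is dropped while the client stalls
        memory[addr] = word
        return True


def copy_word(src, dst, addr):
    """Copy the data together with its tag so the error marker follows the data."""
    dst[addr] = Word(src[addr].data, src[addr].tag)
```

In this model only the client that touched the flagged word loses the ability to store; other clients keep loading and storing, which mirrors the abstract's statement that other memory clients continue to execute.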
ERROR CONTAINMENT FOR ENABLING LOCAL CHECKPOINT AND RECOVERY

Publication number: 20230011863

Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.

Type: Application

Filed: July 12, 2021

Publication date: January 12, 2023

Inventors: NAVEEN CHERUKURI, SAURABH HUKERIKAR, PAUL RACUNAS, NIRMAL RAJ SAXENA, DAVID CHARLES PATRICK, YIYANG FENG, ABHIJEET GHADGE, STEVEN JAMES HEINRICH, ADAM HENDRICKSON, GENTARO HIROTA, PRAVEEN JOGINIPALLY, VAISHALI KULKARNI, PETER C. MILLS, SANDEEP NAVADA, MANAN PATEL, LIANG YIN
Replicated stateless copy engine

Patent number: 10423424

Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.

Type: Grant

Filed: September 28, 2012

Date of Patent: September 24, 2019

Assignee: NVIDIA CORPORATION

Inventors: Lincoln G. Garlick, Philip Browning Johnson, Rafal Zboinski, Jeff Tuckey, Samuel H. Duncan, Peter C. Mills
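
The method steps in the abstract above (select the subset of context state an operation needs, send that subset, then send the operation) can be sketched as follows. The parameter names in `REQUIRED_STATE`, the `StatelessCopyEngine` class, and the dictionary-based context are hypothetical and chosen only for illustration; the abstract does not list specific state parameters.

```python
# Hypothetical mapping from operation kind to the state parameters it needs.
REQUIRED_STATE = {
    "copy": ("src_base", "dst_base", "page_table"),
    "fill": ("dst_base", "page_table"),
}


class StatelessCopyEngine:
    """Keeps no context between operations; state arrives with each operation."""

    def __init__(self):
        self.log = []  # record of (operation kind, state received)

    def execute(self, state, op):
        self.log.append((op["kind"], dict(state)))


def dispatch(engine, context, op):
    """Send only the state the operation needs, then the operation itself."""
    subset = {key: context[key] for key in REQUIRED_STATE[op["kind"]]}
    engine.execute(subset, op)
    return subset
```

Because the engine never stores a context of its own, it needs no context-switching logic, which is the source of the area and power savings the abstract claims.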
Systems and methods for voting among parallel threads

Patent number: 10152328

Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.

Type: Grant

Filed: May 31, 2012

Date of Patent: December 11, 2018

Assignee: NVIDIA CORPORATION

Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
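
A vote among a group of threads, as described in the abstract above, can be modeled in a few lines. The three modes below (all, any, ballot) are an assumption for illustration; the abstract does not enumerate modes. They are loosely analogous to the CUDA warp vote functions `__all_sync`, `__any_sync`, and `__ballot_sync`. Each list element stands for one thread's individual vote.

```python
def vote_all(predicates):
    """True only if every thread in the group votes true."""
    return all(predicates)


def vote_any(predicates):
    """True if at least one thread in the group votes true."""
    return any(predicates)


def vote_ballot(predicates):
    """Return a bit mask in which bit i holds thread i's vote."""
    mask = 0
    for lane, vote in enumerate(predicates):
        if vote:
            mask |= 1 << lane
    return mask
```

Every thread receives the same result from a single operation, so no shared-memory round trip is needed to agree on the outcome.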
Indirect function call instructions in a synchronous parallel thread processor

Patent number: 9639365

Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

Type: Grant

Filed: November 12, 2012

Date of Patent: May 2, 2017

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
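
The contrast the abstract above draws, between a sequential chain of tests and a single indirect branch through an address, can be sketched in software. The three `op_*` functions and `JUMP_TABLE` are hypothetical stand-ins; in the patent the target address comes from an address register rather than a Python tuple.

```python
def op_add(x):
    return x + 1


def op_double(x):
    return x * 2


def op_negate(x):
    return -x


def call_with_branch_chain(selector, x):
    """Sequential chain of tests and branches: cost grows with the number of cases."""
    if selector == 0:
        return op_add(x)
    if selector == 1:
        return op_double(x)
    if selector == 2:
        return op_negate(x)
    raise ValueError("unknown selector")


JUMP_TABLE = (op_add, op_double, op_negate)


def call_indirect(selector, x):
    """Indirect call: fetch the target once, then branch to it."""
    target = JUMP_TABLE[selector]  # plays the role of the address register
    return target(x)
```

Both functions return the same results; the indirect form does one table lookup regardless of how many targets exist, which is why it suits switch statements and virtual function calls.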
Technique for optimizing the phase of a data signal transmitted across a communication link

Patent number: 9407427

Abstract: A first transceiver is configured to transmit a first data signal to a second transceiver across a communication link. The second transceiver maintains clock data recovery (CDR) lock with the first signal by adjusting a sampling clock configured to sample the first data signal. When the communication link reverses directions, the second transceiver is configured to transmit a second data signal to the first transceiver with the phase of that second data signal adjusted based on the adjustments made to the sampling clock.

Type: Grant

Filed: February 20, 2013

Date of Patent: August 2, 2016

Assignee: NVIDIA Corporation

Inventors: Gregory Kodani, Gautam Bhatia, Peter C. Mills
Flexible threshold counter for clock-and-data recovery

Patent number: 9184907

Abstract: One embodiment provides a data-receiving device component comprising a phase shifter, timer logic, and control logic. The phase shifter is configured to release a train of clock pulses with a controlled phase shift. The timer logic is configured to receive data from a data-sending device, and for each transition of the data received, to determine whether a clock pulse from the train is early or late with respect to the transition, and to tally the late clock pulses relative to the early clock pulses. The control logic, operatively coupled to the phase shifter and to the timer logic, is configured to incrementally advance the phase shift when the late clock pulses outnumber the early clock pulses by a non-integer power of two.

Type: Grant

Filed: December 28, 2012

Date of Patent: November 10, 2015

Assignee: NVIDIA CORPORATION

Inventors: Peter C. Mills, Gautam Bhatia
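
The early/late tally in the abstract above can be sketched as a small counter. This is a behavioral model only: the class name, the default threshold of 3 (chosen because it is not a power of two), the reset of the tally after each adjustment, and the symmetric step in the opposite direction are all assumptions that the abstract does not specify.

```python
class ThresholdCounter:
    """Tallies late versus early clock pulses and steps the phase at a threshold."""

    def __init__(self, threshold=3):  # hypothetical value, not a power of two
        self.threshold = threshold
        self.tally = 0  # late pulses minus early pulses
        self.phase = 0  # phase shift, in abstract steps

    def observe(self, late):
        """Record one data transition; `late` is True if the clock pulse was late."""
        self.tally += 1 if late else -1
        if self.tally >= self.threshold:
            self.phase += 1  # advance the phase shift, as in the abstract
            self.tally = 0
        elif self.tally <= -self.threshold:
            self.phase -= 1  # assumption: the symmetric case steps the other way
            self.tally = 0
        return self.phase
```

Using a net tally rather than reacting to each transition filters out jitter: isolated early and late pulses cancel, and the phase moves only on a sustained bias.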
TECHNIQUE FOR OPTIMIZING THE PHASE OF A DATA SIGNAL TRANSMITTED ACROSS A COMMUNICATION LINK

Publication number: 20140233612

Abstract: A first transceiver is configured to transmit a first data signal to a second transceiver across a communication link. The second transceiver maintains clock data recovery (CDR) lock with the first signal by adjusting a sampling clock configured to sample the first data signal. When the communication link reverses directions, the second transceiver is configured to transmit a second data signal to the first transceiver with the phase of that second data signal adjusted based on the adjustments made to the sampling clock.

Type: Application

Filed: February 20, 2013

Publication date: August 21, 2014

Applicant: NVIDIA CORPORATION

Inventors: Gregory KODANI, Gautam BHATIA, Peter C. MILLS
FLEXIBLE THRESHOLD COUNTER FOR CLOCK-AND-DATA RECOVERY

Publication number: 20140185633

Abstract: One embodiment provides a data-receiving device component comprising a phase shifter, timer logic, and control logic. The phase shifter is configured to release a train of clock pulses with a controlled phase shift. The timer logic is configured to receive data from a data-sending device, and for each transition of the data received, to determine whether a clock pulse from the train is early or late with respect to the transition, and to tally the late clock pulses relative to the early clock pulses. The control logic, operatively coupled to the phase shifter and to the timer logic, is configured to incrementally advance the phase shift when the late clock pulses outnumber the early clock pulses by a non-integer power of two.

Type: Application

Filed: December 28, 2012

Publication date: July 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Peter C. Mills, Gautam Bhatia
Internal Logic Analyzer with Programmable Window Capture

Publication number: 20140164847

Abstract: One embodiment includes receiving a data signal transmitted to the processing unit, analyzing the data signal and generating feedback information related to the data signal, and capturing the data signal via a write enable during a plurality of clock cycles specified by a programmable controller included within the processing unit. One advantage of the disclosed technique is that the programmable controller can be used to set the capture window for one or more hardwired triggers included within the processing unit. Further, the programmable controller is able to set up additional triggers that are separate and apart from the hardwired triggers included within the processing unit and set the capture window for those triggers. Thus, the disclosed technique provides a highly flexible and adaptive approach for capturing and storing on-chip data and feedback information that can be analyzed later when performing diagnostic and debugging operations.

Type: Application

Filed: December 6, 2012

Publication date: June 12, 2014

Applicant: NVIDIA Corporation

Inventors: Peter C. Mills, Gautam Bhatia
REPLICATED STATELESS COPY ENGINE

Publication number: 20140095759

Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Applicant: NVIDIA CORPORATION

Inventors: Lincoln G. GARLICK, Philip Browning JOHNSON, Rafal ZBOINSKI, Jeff TUCKEY, Samuel H. DUNCAN, Peter C. MILLS
Shared single-access memory with management of multiple parallel requests

Patent number: 8645638

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

Type: Grant

Filed: May 7, 2012

Date of Patent: February 4, 2014

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
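
The serialization logic in the abstract above can be modeled as a loop that picks one target address per pass, serves every request for that address together, and defers the rest. The function name and the `(thread_id, address)` tuple format are hypothetical, and choosing the first pending request's address is an assumption; the abstract says only that one of the target addresses is selected.

```python
def serialize(requests):
    """Group (thread_id, address) requests into passes, one address per pass."""
    pending = list(requests)
    passes = []
    while pending:
        target = pending[0][1]  # assumption: take the first pending address
        served = [tid for tid, addr in pending if addr == target]
        # Requests for other addresses are deferred to a later pass.
        pending = [(tid, addr) for tid, addr in pending if addr != target]
        passes.append((target, served))
    return passes
```

The number of passes equals the number of distinct addresses in the group, so each different target address is accessed exactly once, as the abstract states.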
Dynamic load balancing of instructions for execution by heterogeneous processing engines

Patent number: 8578387

Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.

Type: Grant

Filed: July 31, 2007

Date of Patent: November 5, 2013

Assignee: NVIDIA Corporation

Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
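
The assignment unit described in the abstract above can be sketched as a function that routes each instruction by type. Types 1 and 3 are fixed to one engine each, while type 2 goes to whichever engine is less busy. The engine labels `"A"` and `"B"` and the "fewest instructions assigned so far" heuristic are assumptions; the abstract does not say how the balance is measured.

```python
def assign(instruction_types):
    """Return (placement list, per-engine load) for a stream of instruction types."""
    load = {"A": 0, "B": 0}
    placement = []
    for kind in instruction_types:
        if kind == 1:
            engine = "A"  # first type: only the first engine can execute it
        elif kind == 3:
            engine = "B"  # third type: only the second engine can execute it
        elif kind == 2:
            # Second type: either engine works, so pick the less loaded one.
            engine = "A" if load["A"] <= load["B"] else "B"
        else:
            raise ValueError("unknown instruction type")
        load[engine] += 1
        placement.append(engine)
    return placement, load
```

For the stream `[1, 1, 2, 3, 2, 2]`, the flexible type-2 instructions fill in behind the fixed ones and both engines end up with three instructions each.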
Lock mechanism to enable atomic updates to shared memory

Patent number: 8375176

Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.

Type: Grant

Filed: October 18, 2011

Date of Patent: February 12, 2013

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
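
The combined read-and-lock transaction in the abstract above can be modeled as follows. The class and method names are hypothetical, and the one-lock-per-address layout is an assumption made to keep the sketch short. The point illustrated is that a read returns the data and the lock status together, so no separate status check is needed before a read-modify-write.

```python
class LockedMemory:
    """Shared memory where a read can lock and a write can unlock in one step."""

    def __init__(self, size):
        self.data = [0] * size
        self.locked = [False] * size  # assumption: one lock bit per address

    def load_lock(self, addr):
        """Read and try to lock in one transaction; returns (value, acquired)."""
        acquired = not self.locked[addr]
        if acquired:
            self.locked[addr] = True
        return self.data[addr], acquired

    def store_unlock(self, addr, value):
        """Write and release the lock in one transaction."""
        self.data[addr] = value
        self.locked[addr] = False

    def load(self, addr):
        """Plain read: no lock operation, since nothing is modified."""
        return self.data[addr]


def atomic_add(mem, addr, delta):
    """One read-modify-write attempt; returns False if the caller should retry."""
    value, acquired = mem.load_lock(addr)
    if not acquired:
        return False
    mem.store_unlock(addr, value + delta)
    return True
```

A requester that fails to acquire the lock learns this from the same response that carries the read data and can simply retry.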
Indirect function call instructions in a synchronous parallel thread processor

Patent number: 8312254

Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

Type: Grant

Filed: March 24, 2008

Date of Patent: November 13, 2012

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
SYSTEMS AND METHODS FOR VOTING AMONG PARALLEL THREADS

Publication number: 20120239909

Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.

Type: Application

Filed: May 31, 2012

Publication date: September 20, 2012

Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
SHARED SINGLE-ACCESS MEMORY WITH MANAGEMENT OF MULTIPLE PARALLEL REQUESTS

Publication number: 20120221808

Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.

Type: Application

Filed: May 7, 2012

Publication date: August 30, 2012

Applicant: NVIDIA Corporation

Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
Scoreboard having size indicators for tracking sequential destination register usage in a multi-threaded processor

Patent number: 8225076

Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.

Type: Grant

Filed: September 18, 2008

Date of Patent: July 17, 2012

Assignee: NVIDIA Corporation

Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
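
The scoreboard check in the abstract above can be sketched with one small region per thread and a bit mask stored alongside each buffered instruction. The encoding used here (one mask bit per scoreboard slot), the slot count, and the dictionary format for buffered instructions are assumptions for illustration; the abstract says only that the comparison result is a multi-bit value.

```python
class Scoreboard:
    """Per-thread record of registers that still have pending writes."""

    def __init__(self, num_threads, slots=4):
        # One separate region per thread; None marks a free slot.
        self.slots = [[None] * slots for _ in range(num_threads)]

    def reserve(self, thread, reg):
        """Record a pending write to `reg`; raises ValueError if the region is full."""
        slot = self.slots[thread].index(None)
        self.slots[thread][slot] = reg
        return slot

    def dependency_mask(self, thread, registers):
        """Bit i is set if slot i of this thread matches one of the registers."""
        mask = 0
        for slot, reg in enumerate(self.slots[thread]):
            if reg is not None and reg in registers:
                mask |= 1 << slot
        return mask

    def complete(self, thread, slot, buffered):
        """Retire the write in `slot` and clear that bit in buffered instructions."""
        self.slots[thread][slot] = None
        for entry in buffered:
            if entry["thread"] == thread:
                entry["mask"] &= ~(1 << slot)


def can_issue(entry):
    """An instruction may issue once none of its registers has a pending write."""
    return entry["mask"] == 0
```

Because the mask is computed once when the instruction enters the buffer and then only has bits cleared, the issue-time check reduces to a comparison against zero.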
Systems and methods for voting among parallel threads

Patent number: 8214625

Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.

Type: Grant

Filed: November 26, 2008

Date of Patent: July 3, 2012

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
Systems and methods for voting among parallel threads

Patent number: 8200947

Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.

Type: Grant

Filed: March 24, 2008

Date of Patent: June 12, 2012

Assignee: NVIDIA Corporation

Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke