Patents by Inventor Peter C. Mills

Peter C. Mills has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO). Brief, hedged code sketches illustrating several of the listed techniques appear after the listing.

  • Patent number: 11720440
    Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.
    Type: Grant
    Filed: July 12, 2021
    Date of Patent: August 8, 2023
    Assignee: NVIDIA CORPORATION
    Inventors: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin
  • Publication number: 20230011863
    Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.
    Type: Application
    Filed: July 12, 2021
    Publication date: January 12, 2023
    Inventors: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin
  • Patent number: 10423424
    Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: September 24, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Lincoln G. Garlick, Philip Browning Johnson, Rafal Zboinski, Jeff Tuckey, Samuel H. Duncan, Peter C. Mills
  • Patent number: 10152328
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: May 31, 2012
    Date of Patent: December 11, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Patent number: 9639365
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: May 2, 2017
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
  • Patent number: 9407427
    Abstract: A first transceiver is configured to transmit a first data signal to a second transceiver across a communication link. The second transceiver maintains clock data recovery (CDR) lock with the first signal by adjusting a sampling clock configured to sample the first data signal. When the communication link reverses directions, the second transceiver is configured to transmit a second data signal to the first transceiver with the phase of that second data signal adjusted based on the adjustments made to the sampling clock.
    Type: Grant
    Filed: February 20, 2013
    Date of Patent: August 2, 2016
    Assignee: NVIDIA Corporation
    Inventors: Gregory Kodani, Gautam Bhatia, Peter C. Mills
  • Patent number: 9184907
    Abstract: One embodiment provides a data-receiving device component comprising a phase shifter, timer logic, and control logic. The phase shifter is configured to release a train of clock pulses with a controlled phase shift. The timer logic is configured to receive data from a data-sending device, and for each transition of the data received, to determine whether a clock pulse from the train is early or late with respect to the transition, and to tally the late clock pulses relative to the early clock pulses. The control logic, operatively coupled to the phase shifter and to the timer logic, is configured to incrementally advance the phase shift when the late clock pulses outnumber the early clock pulses by a non-integer power of two.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: November 10, 2015
    Assignee: NVIDIA CORPORATION
    Inventors: Peter C. Mills, Gautam Bhatia
  • Publication number: 20140233612
    Abstract: A first transceiver is configured to transmit a first data signal to a second transceiver across a communication link. The second transceiver maintains clock data recovery (CDR) lock with the first signal by adjusting a sampling clock configured to sample the first data signal. When the communication link reverses directions, the second transceiver is configured to transmit a second data signal to the first transceiver with the phase of that second data signal adjusted based on the adjustments made to the sampling clock.
    Type: Application
    Filed: February 20, 2013
    Publication date: August 21, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Gregory Kodani, Gautam Bhatia, Peter C. Mills
  • Publication number: 20140185633
    Abstract: One embodiment provides a data-receiving device component comprising a phase shifter, timer logic, and control logic. The phase shifter is configured to release a train of clock pulses with a controlled phase shift. The timer logic is configured to receive data from a data-sending device, and for each transition of the data received, to determine whether a clock pulse from the train is early or late with respect to the transition, and to tally the late clock pulses relative to the early clock pulses. The control logic, operatively coupled to the phase shifter and to the timer logic, is configured to incrementally advance the phase shift when the late clock pulses outnumber the early clock pulses by a non-integer power of two.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Peter C. Mills, Gautam Bhatia
  • Publication number: 20140164847
    Abstract: One embodiment includes receiving a data signal transmitted to the processing unit, analyzing the data signal and generating feedback information related to the data signal, and capturing the data signal via a write enable during a plurality of clock cycles specified by a programmable controller included within the processing unit. One advantage of the disclosed technique is that the programmable controller can be used to set the capture window for one or more hardwired triggers included within the processing unit. Further, the programmable controller is able to set up additional triggers that are separate and apart from the hardwired triggers included within the processing unit and to set the capture window for those triggers. Thus, the disclosed technique provides a highly flexible and adaptive approach for capturing and storing on-chip data and feedback information that can be analyzed later when performing diagnostic and debugging operations.
    Type: Application
    Filed: December 6, 2012
    Publication date: June 12, 2014
    Applicant: NVIDIA Corporation
    Inventors: Peter C. Mills, Gautam Bhatia
  • Publication number: 20140095759
    Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Applicant: NVIDIA CORPORATION
    Inventors: Lincoln G. Garlick, Philip Browning Johnson, Rafal Zboinski, Jeff Tuckey, Samuel H. Duncan, Peter C. Mills
  • Patent number: 8645638
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Grant
    Filed: May 7, 2012
    Date of Patent: February 4, 2014
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8578387
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: November 5, 2013
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
  • Patent number: 8375176
    Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
    Type: Grant
    Filed: October 18, 2011
    Date of Patent: February 12, 2013
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
  • Patent number: 8312254
    Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: November 13, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
  • Publication number: 20120239909
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Publication number: 20120221808
    Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
    Type: Application
    Filed: May 7, 2012
    Publication date: August 30, 2012
    Applicant: NVIDIA Corporation
    Inventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
  • Patent number: 8225076
    Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: July 17, 2012
    Assignee: NVIDIA Corporation
    Inventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
  • Patent number: 8214625
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: July 3, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
  • Patent number: 8200947
    Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: June 12, 2012
    Assignee: NVIDIA Corporation
    Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
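
The sketches that follow are independent illustrations of several of the techniques summarized above, written under stated assumptions rather than taken from the patents. This first one is a minimal host-side model (plain C++, written as CUDA-compatible code) of the error-containment idea in patent 11720440 and publication 20230011863: data read with an uncorrectable memory error carries a poison tag, and the tag is copied along with the data so a later consumer can still recognize it as corrupted. The Word struct, its fields, and the stall message are illustrative assumptions.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>

struct Word {
    uint32_t data;
    bool     poisoned;   // stands in for the "specific bit pattern" of the patents
};

// Copying preserves the poison tag so the error cannot silently propagate.
Word copyWord(const Word& src) { return src; }

int main() {
    std::vector<Word> memA = {{42u, false}, {7u, true /* error detected on load */}};
    std::vector<Word> memB;
    for (const Word& w : memA) memB.push_back(copyWord(w));   // tag travels with the data

    for (const Word& w : memB) {
        if (w.poisoned)
            std::printf("client stalls: word carries a memory-error tag\n");
        else
            std::printf("client consumes %u\n", w.data);
    }
    return 0;
}
```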
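
Next, a hedged host-side sketch of the approach in patent 10423424 and publication 20140095759: because the copy engine cannot context-switch, only the subset of context state that a given auxiliary operation actually needs is selected and transmitted along with the operation. The StateSet map, the CopyOp structure, and the parameter names are assumptions for illustration, not the patent's interfaces.

```cuda
#include <cstdio>
#include <map>
#include <string>
#include <vector>

using StateSet = std::map<std::string, int>;

struct CopyOp {
    std::string name;
    std::vector<std::string> relevantKeys;  // state parameters this operation depends on
};

// Extract just the state parameters the operation needs from the full context.
StateSet selectSubset(const StateSet& context, const CopyOp& op) {
    StateSet subset;
    for (const std::string& k : op.relevantKeys) {
        auto it = context.find(k);
        if (it != context.end()) subset[k] = it->second;
    }
    return subset;
}

int main() {
    StateSet context = {{"srcPitch", 512}, {"dstPitch", 512}, {"swizzle", 1}, {"zOrder", 0}};
    CopyOp op{"linearCopy", {"srcPitch", "dstPitch"}};

    StateSet toSend = selectSubset(context, op);     // transmit the subset, then the operation
    for (const auto& kv : toSend)
        std::printf("send %s = %d with operation %s\n", kv.first.c_str(), kv.second, op.name.c_str());
    return 0;
}
```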
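
CUDA exposes warp-wide vote intrinsics that closely mirror the vote mechanism described in patents 10152328, 8214625, and 8200947 (and publication 20120239909), so the sketch below uses the real __ballot_sync intrinsic to let each thread in a warp post a vote and read back the collected result in a single instruction. The kernel, its name, and the positive-value predicate are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread posts one vote; __ballot_sync collects all 32 votes of the warp
// into a single bit mask in one instruction, and lane 0 publishes the tally.
__global__ void countPositive(const float* data, int* result)
{
    int lane = threadIdx.x & 31;                            // lane index within the warp
    bool myVote = data[threadIdx.x] > 0.0f;                 // this thread's individual vote
    unsigned ballot = __ballot_sync(0xFFFFFFFFu, myVote);   // warp-wide vote result
    if (lane == 0)
        *result = __popc(ballot);                           // number of threads voting yes
}

int main()
{
    float hData[32];
    for (int i = 0; i < 32; ++i) hData[i] = (i % 3 == 0) ? -1.0f : 1.0f;

    float* dData; int* dResult; int hResult = 0;
    cudaMalloc(&dData, sizeof(hData));
    cudaMalloc(&dResult, sizeof(int));
    cudaMemcpy(dData, hData, sizeof(hData), cudaMemcpyHostToDevice);

    countPositive<<<1, 32>>>(dData, dResult);
    cudaMemcpy(&hResult, dResult, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("threads voting positive: %d\n", hResult);

    cudaFree(dData); cudaFree(dResult);
    return 0;
}
```

The related __any_sync and __all_sync intrinsics return the reduced vote directly to every thread in the warp.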
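
For the indirect branch capability of patents 9639365 and 8312254, the sketch below shows the effect at the CUDA source level: each thread selects a call target from data at run time and branches indirectly through that address instead of walking a chain of tests and branches. The function names, the two-entry table, and the selector array are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

typedef float (*UnaryOp)(float);                // a call target held as an address

__device__ float opSquare(float x) { return x * x; }
__device__ float opNegate(float x) { return -x; }

// Each thread indexes a small table of function addresses and calls through
// the selected address.
__global__ void applyOps(const float* in, const int* sel, float* out, int n)
{
    UnaryOp table[2] = { opSquare, opNegate };
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = table[sel[i]](in[i]);          // target depends on per-thread data
}

int main()
{
    const int n = 8;
    float hIn[n], hOut[n]; int hSel[n];
    for (int i = 0; i < n; ++i) { hIn[i] = float(i); hSel[i] = i & 1; }

    float *dIn, *dOut; int* dSel;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dSel, n * sizeof(int));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dSel, hSel, n * sizeof(int), cudaMemcpyHostToDevice);

    applyOps<<<1, n>>>(dIn, dSel, dOut, n);
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) std::printf("%.1f -> %.1f\n", hIn[i], hOut[i]);

    cudaFree(dIn); cudaFree(dOut); cudaFree(dSel);
    return 0;
}
```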
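
A hedged, purely numerical host-side model of the link-turnaround idea in patent 9407427 and publication 20140233612: while receiving, the transceiver accumulates the phase adjustments its clock data recovery loop applies, and when the link reverses direction it reuses that accumulated adjustment for the phase of its own transmitted signal. The first-order update constant and the unit-interval bookkeeping are assumptions.

```cuda
#include <cstdio>

struct Transceiver {
    double rxPhaseAdjust = 0.0;   // accumulated CDR correction, in unit intervals (UI)

    void trackIncomingEdge(double phaseError) {
        rxPhaseAdjust += 0.1 * phaseError;   // simple first-order CDR update
    }
    double transmitPhaseAfterTurnaround() const {
        return rxPhaseAdjust;                // reuse the RX correction for TX
    }
};

int main() {
    Transceiver xcvr;
    for (int i = 0; i < 5; ++i)
        xcvr.trackIncomingEdge(0.2);         // pretend each edge arrives 0.2 UI late
    std::printf("TX phase after turnaround: %.2f UI\n", xcvr.transmitPhaseAfterTurnaround());
    return 0;
}
```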
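
A similarly hedged model of the tally logic in patent 9184907 and publication 20140185633: each data transition classifies a clock pulse as early or late, and the phase shifter advances only when late pulses outnumber early ones by a configurable margin. THRESHOLD and the synthetic observation stream are illustrative; per the abstract, the margin need not be an integer power of two.

```cuda
#include <cstdio>
#include <vector>

int main() {
    const int THRESHOLD = 3;                       // illustrative margin, not a power of two
    int tally = 0;                                 // late minus early
    int phaseSteps = 0;

    // true = clock pulse late for that transition, false = early (synthetic data)
    std::vector<bool> observations = {true, true, false, true, true, true, false, true};

    for (bool late : observations) {
        tally += late ? 1 : -1;
        if (tally >= THRESHOLD) {                  // late pulses outnumber early ones
            ++phaseSteps;                          // incrementally advance the phase shift
            tally = 0;
        }
    }
    std::printf("phase advanced %d step(s)\n", phaseSteps);
    return 0;
}
```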
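
A hedged host-side model of the capture scheme in publication 20140164847: a trigger fires, a write enable opens a capture window whose length is set by a programmable controller, and samples of the data signal are stored for later diagnostic analysis. The trigger cycle, window length, and buffer are illustrative assumptions.

```cuda
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> signal = {0, 1, 1, 3, 7, 2, 0, 0, 5, 1};
    int triggerCycle = 3;        // cycle at which a (hardwired or programmed) trigger fires
    int windowCycles = 4;        // capture window length set by the programmable controller

    std::vector<int> captureBuffer;
    for (int cycle = 0; cycle < (int)signal.size(); ++cycle) {
        bool writeEnable = cycle >= triggerCycle && cycle < triggerCycle + windowCycles;
        if (writeEnable)
            captureBuffer.push_back(signal[cycle]);   // store on-chip data for later analysis
    }
    for (int v : captureBuffer) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```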
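
A hedged host-side model of the serialization logic in patent 8645638 and publication 20120221808: from a group of parallel memory requests, one target address is selected, every request to that address proceeds in the same pass, and the rest are deferred and replayed, so each distinct address is accessed exactly once. The lane count and addresses are made up for the example.

```cuda
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> addresses = {5, 9, 5, 5, 2, 9, 5, 2};   // one request per lane
    std::vector<bool> pending(addresses.size(), true);
    int passes = 0;

    // Replay until every request has been serviced; each pass touches one address.
    while (true) {
        int selected = -1;
        for (size_t i = 0; i < addresses.size(); ++i)
            if (pending[i]) { selected = addresses[i]; break; }   // pick a pending target
        if (selected < 0) break;

        for (size_t i = 0; i < addresses.size(); ++i)
            if (pending[i] && addresses[i] == selected)
                pending[i] = false;                // all matching requests proceed together
        ++passes;                                  // one distinct address per pass
    }
    std::printf("group satisfied in %d pass(es)\n", passes);
    return 0;
}
```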
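
A hedged host-side model of the assignment idea in patent 8578387: instructions of the first type can run only on one engine, instructions of the third type only on the other, and instructions of the second type are steered dynamically to whichever engine currently has the lighter load. The numeric type codes and load counters are assumptions, and the patent does not specify this particular balancing rule.

```cuda
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> program = {1, 2, 2, 3, 2, 1, 2, 3, 2, 2};  // instruction types
    int loadA = 0, loadB = 0;

    for (int type : program) {
        if (type == 1)             ++loadA;                 // engine A only
        else if (type == 3)        ++loadB;                 // engine B only
        else if (loadA <= loadB)   ++loadA;                 // type 2: balance the workload
        else                       ++loadB;
    }
    std::printf("engine A: %d instructions, engine B: %d instructions\n", loadA, loadB);
    return 0;
}
```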
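
A hedged CUDA sketch in the spirit of patent 8375176, where a lock attempt returns immediate feedback in the same operation so no separate status-check transaction is needed. The patent describes a hardware lock fused with the memory access itself; generic CUDA atomics only approximate that, and the kernel, launch shape, and fences below are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Try to take the lock; the atomicCAS return value is the immediate feedback:
// 0 means the lock was free and is now ours, anything else means it was held.
__global__ void lockedIncrement(int* lock, int* value)
{
    volatile int* v = value;                    // keep the protected word out of stale caches
    bool done = false;
    while (!done) {
        if (atomicCAS(lock, 0, 1) == 0) {       // lock attempt with immediate feedback
            __threadfence();                    // observe the previous owner's update
            *v = *v + 1;                        // read-modify-write under the lock
            __threadfence();                    // publish the update before releasing
            atomicExch(lock, 0);                // unlock
            done = true;
        }
    }
}

int main()
{
    int *dLock, *dValue, hValue = 0;
    cudaMalloc(&dLock, sizeof(int));
    cudaMalloc(&dValue, sizeof(int));
    cudaMemset(dLock, 0, sizeof(int));
    cudaMemset(dValue, 0, sizeof(int));

    lockedIncrement<<<4, 1>>>(dLock, dValue);   // one thread per block avoids intra-warp livelock
    cudaMemcpy(&hValue, dValue, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("value after 4 locked increments: %d\n", hValue);

    cudaFree(dLock); cudaFree(dValue);
    return 0;
}
```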
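
Finally, a hedged host-side model of the scoreboard in patent 8225076: a per-thread bit mask records which registers have pending writes, and an instruction is allowed to issue only when none of its registers are still marked pending. The 64-register limit and the Scoreboard interface are assumptions for illustration.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>

struct Scoreboard {
    uint64_t pendingWrites = 0;                     // one bit per register (up to 64 registers)

    void markPending(int reg)  { pendingWrites |=  (1ull << reg); }   // write issued
    void clearPending(int reg) { pendingWrites &= ~(1ull << reg); }   // write completed
    bool canIssue(const std::vector<int>& regs) const {
        for (int r : regs)
            if (pendingWrites & (1ull << r)) return false;   // dependency still outstanding
        return true;
    }
};

int main() {
    Scoreboard sb;
    sb.markPending(3);                               // an earlier instruction is writing r3
    std::printf("issue {r3,r5}? %s\n", sb.canIssue({3, 5}) ? "yes" : "no");
    sb.clearPending(3);                              // that write has landed
    std::printf("issue {r3,r5}? %s\n", sb.canIssue({3, 5}) ? "yes" : "no");
    return 0;
}
```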