Patents by Inventor Peter C. Mills
Peter C. Mills has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11720440Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.Type: GrantFiled: July 12, 2021Date of Patent: August 8, 2023Assignee: NVIDIA CORPORATIONInventors: Naveen Cherukuri, Saurabh Hukerikar, Paul Racunas, Nirmal Raj Saxena, David Charles Patrick, Yiyang Feng, Abhijeet Ghadge, Steven James Heinrich, Adam Hendrickson, Gentaro Hirota, Praveen Joginipally, Vaishali Kulkarni, Peter C. Mills, Sandeep Navada, Manan Patel, Liang Yin
-
Publication number: 20230011863Abstract: Various embodiments include a parallel processing computer system that detects memory errors as a memory client loads data from memory and disables the memory client from storing data to memory, thereby reducing the likelihood that the memory error propagates to other memory clients. The memory client initiates a stall sequence, while other memory clients continue to execute instructions and the memory continues to service memory load and store operations. When a memory error is detected, a specific bit pattern is stored in conjunction with the data associated with the memory error. When the data is copied from one memory to another memory, the specific bit pattern is also copied, in order to identify the data as having a memory error.Type: ApplicationFiled: July 12, 2021Publication date: January 12, 2023Inventors: NAVEEN CHERUKURI, SAURABH HUKERIKAR, PAUL RACUNAS, NIRMAL RAJ SAXENA, DAVID CHARLES PATRICK, YIYANG FENG, ABHIJEET GHADGE, STEVEN JAMES HEINRICH, ADAM HENDRICKSON, GENTARO HIROTA, PRAVEEN JOGINIPALLY, VAISHALI KULKARNI, PETER C. MILLS, SANDEEP NAVADA, MANAN PATEL, LIANG YIN
-
Patent number: 10423424Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.Type: GrantFiled: September 28, 2012Date of Patent: September 24, 2019Assignee: NVIDIA CORPORATIONInventors: Lincoln G. Garlick, Philip Browning Johnson, Rafal Zboinski, Jeff Tuckey, Samuel H. Duncan, Peter C. Mills
-
Patent number: 10152328Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.Type: GrantFiled: May 31, 2012Date of Patent: December 11, 2018Assignee: NVIDIA CORPORATIONInventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
-
Patent number: 9639365Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.Type: GrantFiled: November 12, 2012Date of Patent: May 2, 2017Assignee: NVIDIA CorporationInventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
-
Patent number: 9407427Abstract: A first transceiver is configured to transmit a first data signal to a second transceiver across a communication link. The second transceiver maintains clock data recovery (CDR) lock with the first signal by adjusting a sampling clock configured to sample the first data signal. When the communication link reverses directions, the second transceiver is configured to transmit a second data signal to the first transceiver with the phase of that second data signal adjusted based on the adjustments made to the sampling clock.Type: GrantFiled: February 20, 2013Date of Patent: August 2, 2016Assignee: NVIDIA CorporationInventors: Gregory Kodani, Guatam Bhatia, Peter C. Mills
-
Patent number: 9184907Abstract: One embodiment provides a data-receiving device component comprising a phase shifter, timer logic, and control logic. The phase shifter is configured to release a train of clock pulses with a controlled phase shift. The timer logic is configured to receive data from a data-sending device, and for each transition of the data received, to determine whether a clock pulse from the train is early or late with respect to the transition, and to tally the late clock pulses relative to the early clock pulses. The control logic, operatively coupled to the phase shifter and to the timer logic, is configured to incrementally advance the phase shift when the late clock pulses outnumber the early clock pulses by a non-integer power of two.Type: GrantFiled: December 28, 2012Date of Patent: November 10, 2015Assignee: NVIDIA CORPORATIONInventors: Peter C. Mills, Gautam Bhatia
-
Publication number: 20140233612Abstract: A first transceiver is configured to transmit a first data signal to a second transceiver across a communication link. The second transceiver maintains clock data recovery (CDR) lock with the first signal by adjusting a sampling clock configured to sample the first data signal. When the communication link reverses directions, the second transceiver is configured to transmit a second data signal to the first transceiver with the phase of that second data signal adjusted based on the adjustments made to the sampling clock.Type: ApplicationFiled: February 20, 2013Publication date: August 21, 2014Applicant: NVIDIA CORPORATIONInventors: Gregory KODANI, Guatam BHATIA, Peter C. MILLS
-
Publication number: 20140185633Abstract: One embodiment provides a data-receiving device component comprising a phase shifter, timer logic, and control logic. The phase shifter is configured to release a train of clock pulses with a controlled phase shift. The timer logic is configured to receive data from a data-sending device, and for each transition of the data received, to determine whether a clock pulse from the train is early or late with respect to the transition, and to tally the late clock pulses relative to the early clock pulses. The control logic, operatively coupled to the phase shifter and to the timer logic, is configured to incrementally advance the phase shift when the late clock pulses outnumber the early clock pulses by a non-integer power of two.Type: ApplicationFiled: December 28, 2012Publication date: July 3, 2014Applicant: NVIDIA CORPORATIONInventors: Peter C. Mills, Gautam Bhatia
-
Publication number: 20140164847Abstract: One embodiment includes receiving a data signal transmitted to the processing unit, analyzing the data signal and generating feedback information related to the data signal, and capturing the data signal via a write enable during a plurality of clock cycles specified by a programmable controller included within the processing unit. One advantage of the disclosed technique is that the programmable controller can be used to set the capture window for one or more hardwired triggers included within the processing unit. Further, the programmable controller is able to set up additional triggers that separate and apart from the hardwired triggers included within the processing unit and set the capture window for those triggers. Thus, the disclosed technique provides a highly flexible and adaptive approach for capturing and storing on-chip data and feedback information that can be analyzed later when performing diagnostic and debugging operations.Type: ApplicationFiled: December 6, 2012Publication date: June 12, 2014Applicant: NVIDIA CorporationInventors: Peter C. Mills, Gautam Bhatia
-
Publication number: 20140095759Abstract: Techniques are disclosed for performing an auxiliary operation via a compute engine associated with a host computing device. The method includes determining that the auxiliary operation is directed to the compute engine, and determining that the auxiliary operation is associated with a first context comprising a first set of state parameters. The method further includes determining a first subset of state parameters related to the auxiliary operation based on the first set of state parameters. The method further includes transmitting the first subset of state parameters to the compute engine, and transmitting the auxiliary operation to the compute engine. One advantage of the disclosed technique is that surface area and power consumption are reduced within the processor by utilizing copy engines that have no context switching capability.Type: ApplicationFiled: September 28, 2012Publication date: April 3, 2014Applicant: NVIDIA CORPORATIONInventors: Lincoln G. GARLICK, Philip Browning JOHNSON, Rafal ZBOINSKI, Jeff TUCKEY, Samuel H. DUNCAN, Peter C. MILLS
-
Patent number: 8645638Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.Type: GrantFiled: May 7, 2012Date of Patent: February 4, 2014Assignee: NVIDIA CorporationInventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
-
Patent number: 8578387Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.Type: GrantFiled: July 31, 2007Date of Patent: November 5, 2013Assignee: Nvidia CorporationInventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
-
Patent number: 8375176Abstract: A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.Type: GrantFiled: October 18, 2011Date of Patent: February 12, 2013Assignee: NVIDIA CorporationInventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills
-
Patent number: 8312254Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.Type: GrantFiled: March 24, 2008Date of Patent: November 13, 2012Assignee: NVIDIA CorporationInventors: Brett W. Coon, John R. Nickolls, Lars Nyland, Peter C. Mills, John Erik Lindholm
-
Publication number: 20120239909Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.Type: ApplicationFiled: May 31, 2012Publication date: September 20, 2012Inventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
-
Publication number: 20120221808Abstract: A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.Type: ApplicationFiled: May 7, 2012Publication date: August 30, 2012Applicant: NVIDIA CorporationInventors: Brett W. Coon, Ming Y. Siu, Weizhong Xu, Stuart F. Oberman, John R. Nickolls, Peter C. Mills
-
Patent number: 8225076Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.Type: GrantFiled: September 18, 2008Date of Patent: July 17, 2012Assignee: NVIDIA CorporationInventors: Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
-
Patent number: 8214625Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.Type: GrantFiled: November 26, 2008Date of Patent: July 3, 2012Assignee: NVIDIA CorporationInventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke
-
Patent number: 8200947Abstract: One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.Type: GrantFiled: March 24, 2008Date of Patent: June 12, 2012Assignee: NVIDIA CorporationInventors: John R. Nickolls, Lars Nyland, Peter C. Mills, Jeremy Sugerman, Timothy Foley, Brian Fahs, Michael Garland, David P. Luebke