Patents by Inventor Stephen W. Keckler

Stephen W. Keckler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

EFFICIENT NEURAL NETWORK ACCELERATOR DATAFLOWS

Publication number: 20200293867

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.

Type: Application

Filed: November 4, 2019

Publication date: September 17, 2020

Applicant: NVIDIA Corp.

Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
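
The weight-stationary dataflow named in the abstract above can be illustrated with a minimal Python sketch. It is not the applicants' design. It assumes a single processing element whose weight buffer holds one weight row per vector-MAC lane, and it ignores the multi-chip, tile-based organization and the global memory buffer. The class and method names are hypothetical.

```python
from typing import List


class ProcessingElement:
    """Hypothetical model of one PE using a weight-stationary dataflow.

    The weight buffer is filled once; activation vectors then stream past the
    stationary weights through a vector multiply-accumulate (MAC) step.
    """

    def __init__(self, weights: List[List[float]]) -> None:
        # Weight buffer: one row of weights per vector-MAC lane (output channel).
        self.weight_buffer = [list(row) for row in weights]

    def vector_mac(self, activations: List[float]) -> List[float]:
        # Every lane multiplies its stationary weight row by the same
        # activation vector and accumulates the products (one dot product per lane).
        for row in self.weight_buffer:
            if len(row) != len(activations):
                raise ValueError("weight row and activation vector lengths differ")
        return [sum(w * a for w, a in zip(row, activations))
                for row in self.weight_buffer]

    def run(self, activation_buffer: List[List[float]]) -> List[List[float]]:
        # The same resident weights are reused for every activation vector.
        return [self.vector_mac(acts) for acts in activation_buffer]


pe = ProcessingElement([[1, 2], [3, 4]])
print(pe.run([[1, 1], [2, 0]]))  # [[3, 7], [2, 6]]
```

The point of keeping weights stationary is reuse: each weight is read into the buffer once and then participates in every multiply-accumulate for the streamed activations.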

SYSTEM AND METHODS FOR HARDWARE-SOFTWARE COOPERATIVE PIPELINE ERROR DETECTION

Publication number: 20200210276

Abstract: An error reporting system utilizes a parity checker to receive data results from execution of an original instruction and a parity bit for the data. A decoder receives an error correcting code (ECC) for data resulting from execution of a shadow instruction of the original instruction, and data error correction is initiated on the original instruction result on condition of a mismatch between the parity bit and the original instruction result, and the decoder asserting a correctable error in the original instruction result.

Type: Application

Filed: March 6, 2020

Publication date: July 2, 2020

Applicant: NVIDIA Corp.

Inventors: Michael Sullivan, Siva Hari, Brian Zimmer, Timothy Tsai, Stephen W. Keckler
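
The check-then-correct flow in the abstract above (parity mismatch first, then an ECC decoder deciding whether the error is correctable) can be sketched in Python. The sketch makes several assumptions. Both the parity bit and the ECC check bits are taken to come from the redundant shadow result. The code is a simple single-error-correcting Hamming-style code over 8 data bits. The function names are hypothetical.

```python
from typing import Tuple

DATA_BITS = 8
# Hamming-style syndrome column for each data bit: the non-power-of-two
# integers 3, 5, 6, 7, 9, 10, 11, 12, so every single-bit error is unique.
COLUMNS = [p for p in range(3, 64) if p & (p - 1)][:DATA_BITS]


def parity(value: int) -> int:
    """Even-parity bit of the data word."""
    return bin(value).count("1") & 1


def ecc_check_bits(value: int) -> int:
    """Check bits of a single-error-correcting Hamming-style code."""
    check = 0
    for i, column in enumerate(COLUMNS):
        if (value >> i) & 1:
            check ^= column
    return check


def check_result(original: int, parity_bit: int, shadow_ecc: int) -> Tuple[str, int]:
    """Validate the original instruction's result.

    Assumption: parity_bit and shadow_ecc were both produced from the shadow
    instruction's (redundant) result.
    """
    if parity(original) == parity_bit:
        return "ok", original
    # Parity mismatch: ask the ECC decoder whether the error is correctable.
    syndrome = ecc_check_bits(original) ^ shadow_ecc
    if syndrome in COLUMNS:
        return "corrected", original ^ (1 << COLUMNS.index(syndrome))
    return "uncorrectable", original


shadow = 0b10110010
faulty = shadow ^ (1 << 3)  # single-bit fault in the original result
print(check_result(faulty, parity(shadow), ecc_check_bits(shadow)))  # ('corrected', 178)
```

Because only a parity mismatch triggers the decoder in this sketch, an even number of flipped bits would pass unnoticed; the sketch shows the control flow, not the coverage of any real implementation.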

Data multicasting with router replication and target instruction identification in a distributed multi-core processing architecture

Patent number: 10698859

Abstract: Methods, procedures, apparatuses, computer programs, computer-accessible mediums, processing arrangements and systems generally related to data multi-casting in a distributed processor architecture are described. Various implementations may include identifying a plurality of target instructions that are configured to receive a first message from a source; providing target routing instructions to the first message for each of the target instructions including selected information commonly shared by the target instructions; and, when two of the identified target instructions are located in different directions from one another relative to a router, replicating the first message and routing the replicated messages to each of the identified target instructions in the different directions.

Type: Grant

Filed: September 18, 2009

Date of Patent: June 30, 2020

Assignee: The Board of Regents of the University of Texas System

Inventors: Doug Burger, Stephen W. Keckler, Dong Li
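
The replicate-only-where-paths-diverge behavior in the abstract above can be sketched in Python. This is a minimal model under stated assumptions: a 2D mesh with dimension-order (X then Y) routing. Targets are plain router coordinates rather than target instructions, and the shared routing information carried in the message is not modeled. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def direction(router: Coord, target: Coord) -> str:
    """Dimension-order (X first, then Y) routing decision at one router."""
    (rx, ry), (tx, ty) = router, target
    if tx != rx:
        return "E" if tx > rx else "W"
    if ty != ry:
        return "N" if ty > ry else "S"
    return "LOCAL"


def multicast(source: Coord, targets: List[Coord]) -> Tuple[List[Coord], int]:
    """Deliver one message to every target; return (delivered, link traversals)."""
    delivered: List[Coord] = []
    links = 0
    in_flight = [(source, list(targets))]
    while in_flight:
        router, dests = in_flight.pop()
        by_port: Dict[str, List[Coord]] = defaultdict(list)
        for dest in dests:
            by_port[direction(router, dest)].append(dest)
        for port, group in by_port.items():
            if port == "LOCAL":
                delivered.extend(group)
                continue
            # One copy leaves per output direction, so the message is only
            # replicated where the target set splits across directions.
            dx, dy = STEP[port]
            in_flight.append(((router[0] + dx, router[1] + dy), group))
            links += 1
    return delivered, links


print(multicast((0, 0), [(2, 0), (2, 1)]))  # ([(2, 0), (2, 1)], 3)
```

In this example the two targets share the eastbound path, so the copy splits only at router (2, 0); two separate unicast messages would use 2 + 3 = 5 link traversals instead of 3.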

System and methods for hardware-software cooperative pipeline error detection

Patent number: 10621022

Abstract: A family of software-hardware cooperative mechanisms to accelerate intra-thread duplication leverage the register file error detection hardware to implicitly check the data from duplicate instructions, avoiding the overheads of instruction checking and enforcing low-latency error detection with strict error containment guarantees.

Type: Grant

Filed: December 18, 2017

Date of Patent: April 14, 2020

Assignee: NVIDIA Corp.

Inventors: Michael Sullivan, Siva Hari, Brian Zimmer, Timothy Tsai, Stephen W. Keckler

SCALABLE MULTI-DIE DEEP LEARNING SYSTEM

Publication number: 20200082246

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.

Type: Application

Filed: July 19, 2019

Publication date: March 12, 2020

Applicant: NVIDIA Corp.

Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R. Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany

DEEP NEURAL NETWORK ACCELERATOR WITH FINE-GRAINED PARALLELISM DISCOVERY

Publication number: 20190370645

Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.

Type: Application

Filed: January 23, 2019

Publication date: December 5, 2019

Inventors: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
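
The pair-matching step described in the abstract above can be illustrated with a minimal Python sketch. It simplifies the sparse convolution to a dot product, so a pair is "reducible" when its activation index equals its weight index. Compaction keeps (index, value) pairs for non-zeros only. The function names are hypothetical, and the nested loop stands in for a comparison grid that hardware could evaluate in parallel.

```python
from typing import List, Tuple

Compressed = List[Tuple[int, float]]  # (original index, non-zero value)


def compact(dense: List[float]) -> Compressed:
    """Drop zeros, keeping each surviving value's original index."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def discover_pairs(acts: Compressed, weights: Compressed) -> List[Tuple[float, float]]:
    """Find reducible (activation, weight) pairs by comparing every index pair.

    Hardware could evaluate this comparison grid in parallel; here it is a
    nested loop.
    """
    return [(a, w) for ai, a in acts for wi, w in weights if ai == wi]


def sparse_dot(acts_dense: List[float], weights_dense: List[float]) -> float:
    # Only matched non-zero pairs are multiplied and accumulated.
    pairs = discover_pairs(compact(acts_dense), compact(weights_dense))
    return sum(a * w for a, w in pairs)


print(sparse_dot([0, 2, 0, 3, 1], [4, 5, 0, 0, 2]))  # 12
```

Only two of the five positions produce useful products here; the dense computation would spend three of its five multiplies on zeros.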

SYSTEM AND METHODS FOR HARDWARE-SOFTWARE COOPERATIVE PIPELINE ERROR DETECTION

Publication number: 20190102242

Abstract: A family of software-hardware cooperative mechanisms to accelerate intra-thread duplication leverage the register file error detection hardware to implicitly check the data from duplicate instructions, avoiding the overheads of instruction checking and enforcing low-latency error detection with strict error containment guarantees.

Type: Application

Filed: December 18, 2017

Publication date: April 4, 2019

Inventors: Michael Sullivan, Siva Hari, Brian Zimmer, Timothy Tsai, Stephen W. Keckler

OPTIMIZING SOFTWARE-DIRECTED INSTRUCTION REPLICATION FOR GPU ERROR DETECTION

Publication number: 20190102180

Abstract: Software-only and software-hardware optimizations to reduce the overhead of intra-thread instruction duplication on a GPU or other instruction processor are disclosed. The optimizations trade off error containment for performance and include ISA extensions with limited hardware changes and area costs.

Type: Application

Filed: October 3, 2018

Publication date: April 4, 2019

Inventors: Siva Hari, Michael Sullivan, Timothy Tsai, Stephen W. Keckler, Abdulrahman Mahmoud
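
The trade-off between error containment and checking overhead mentioned in the abstract above can be shown with a toy Python model of intra-thread instruction duplication. The model is hypothetical: a three-opcode register machine where every instruction also runs on shadow register copies. The `check_at` parameter chooses where original and shadow state are compared. It does not model the ISA extensions the application describes.

```python
import operator
from typing import Dict, List, Optional, Set, Tuple

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
Instr = Tuple[str, str, str, str]  # (opcode, dest, src1, src2)


def run_duplicated(program: List[Instr], regs: Dict[str, int],
                   check_at: Optional[Set[int]] = None,
                   fault_at: Optional[int] = None) -> Dict[str, int]:
    """Run every instruction twice (original and shadow register copies).

    check_at=None compares after every instruction (tight containment);
    a smaller set of check points runs fewer comparisons but lets an error
    propagate further before it is caught.
    """
    regs, shadow = dict(regs), dict(regs)
    for pc, (op, dest, a, b) in enumerate(program):
        regs[dest] = OPS[op](regs[a], regs[b])
        shadow[dest] = OPS[op](shadow[a], shadow[b])  # duplicated instruction
        if pc == fault_at:
            regs[dest] ^= 1  # inject a single-bit fault into the original copy
        if check_at is None or pc in check_at:
            if any(regs[r] != shadow[r] for r in regs):
                raise RuntimeError(f"error detected at instruction {pc}")
    return regs


prog = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")]
print(run_duplicated(prog, {"r0": 2, "r1": 3}))  # {'r0': 2, 'r1': 3, 'r2': 5, 'r3': 25}
```

With a fault injected at instruction 0, checking after every instruction reports it at instruction 0. Checking only at instruction 1 still catches it, but only after the corrupted `r2` has already fed `r3`, which is the weaker containment the abstract refers to.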

Combined branch target and predicate prediction

Patent number: 9703565

Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.

Type: Grant

Filed: March 25, 2015

Date of Patent: July 11, 2017

Assignee: The Board of Regents of the University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler
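
The prediction tree described in the abstract above (internal nodes for predicate instructions, leaves for branch targets) can be walked as in this Python sketch. The prediction generator is an assumption, not the patented one. It is a hypothetical table that maps the last few control-point outcomes to the path last observed after them. Following that path through the tree yields both the predicate predictions and the branch-target prediction. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    target: str  # branch target: the next block to fetch


@dataclass
class PredNode:
    predicate: str  # predicate instruction inside the block
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, PredNode]
History = Tuple[bool, ...]


class PathPredictor:
    """Hypothetical prediction generator: maps recent control-point history
    to the path of predicate outcomes last observed after that history."""

    def __init__(self, history_len: int = 4) -> None:
        if history_len < 1:
            raise ValueError("history_len must be at least 1")
        self.history_len = history_len
        self.table: Dict[History, List[bool]] = {}

    def _key(self, history: History) -> History:
        return tuple(history[-self.history_len:])

    def predict(self, tree: Node, history: History) -> Tuple[Dict[str, bool], str]:
        path = self.table.get(self._key(history), [])
        predicates: Dict[str, bool] = {}
        node, depth = tree, 0
        # Walk from the root: each internal node yields a predicate prediction,
        # and the leaf reached yields the branch-target prediction.
        while isinstance(node, PredNode):
            outcome = path[depth] if depth < len(path) else False
            predicates[node.predicate] = outcome
            node = node.if_true if outcome else node.if_false
            depth += 1
        return predicates, node.target

    def update(self, history: History, actual_path: List[bool]) -> None:
        self.table[self._key(history)] = list(actual_path)


tree = PredNode("p1", PredNode("p2", Leaf("B1"), Leaf("B2")), Leaf("B3"))
predictor = PathPredictor()
predictor.update((True, False), [True, False])
print(predictor.predict(tree, (True, False)))  # ({'p1': True, 'p2': False}, 'B2')
```

Because one walk produces every predicate value on the path and the exit target, a single lookup covers both kinds of prediction; an unseen history falls back to predicting each predicate false.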

Method and apparatus for congestion-aware routing in a computer interconnection network

Patent number: 9571399

Abstract: The present disclosure relates to an example of a method for a first router to adaptively determine status within a network. The network may include the first router, a second router and a third router. The method for the first router may comprise determining status information regarding the second router located in the network, and transmitting the status information to the third router located in the network. The second router and the third router may be indirectly coupled to one another.

Type: Grant

Filed: April 7, 2014

Date of Patent: February 14, 2017

Assignee: The Board of Regents of the University of Texas System

Inventors: Paul Gratz, Boris Grot, Stephen W. Keckler
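
The idea in the abstract above, that a router forwards status about one router to another router it is only indirectly coupled to, can be sketched in Python for a single row of routers. Several details are assumptions, not taken from the patent: the equal 0.5 blending weight, the one-dimensional layout, and the "least congested port" selection rule. The function names are hypothetical.

```python
from typing import Dict, List


def propagate_congestion(local_load: List[float], weight: float = 0.5) -> List[float]:
    """Routers 0..n-1 in a row, each estimating congestion toward the east.

    Router i blends its east neighbor's local load with the estimate that
    neighbor received from further east, so status about a router that is
    only indirectly coupled to router i still reaches it.
    """
    n = len(local_load)
    estimate = [0.0] * n
    for i in range(n - 2, -1, -1):  # status information flows from east to west
        east = i + 1
        estimate[i] = weight * local_load[east] + (1 - weight) * estimate[east]
    return estimate


def choose_port(port_estimates: Dict[str, float]) -> str:
    """Adaptive routing: among the allowed output ports, pick the least congested."""
    return min(port_estimates, key=port_estimates.__getitem__)


print(propagate_congestion([0.0, 0.0, 8.0]))  # [2.0, 4.0, 0.0]
print(choose_port({"E": 2.0, "N": 0.5}))      # N
```

Router 0 has no direct link to the busy router 2, yet its eastward estimate is non-zero because router 1 relays what it learned.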

COMBINED BRANCH TARGET AND PREDICATE PREDICTION

Publication number: 20150199199

Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.

Type: Application

Filed: March 25, 2015

Publication date: July 16, 2015

Inventors: Douglas C. Burger, Stephen W. Keckler

Combined branch target and predicate prediction for instruction blocks

Patent number: 9021241

Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.

Type: Grant

Filed: June 18, 2010

Date of Patent: April 28, 2015

Assignee: The Board of Regents of The University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler

Unordered load/store queue

Patent number: 8447911

Abstract: A method and processor for providing full load/store queue functionality to an unordered load/store queue for a processor with out-of-order execution. Load and store instructions are inserted in a load/store queue in execution order. Each entry in the load/store queue includes an identification corresponding to a program order. Conflict detection in such an unordered load/store queue may be performed by searching a first CAM for all addresses that are the same or overlap with the address of the load or store instruction to be executed. A further search may be performed in a second CAM to identify those entries that are associated with younger or older instructions with respect to the sequence number of the load or store instruction to be executed. The output results of the Address CAM and Age CAM are logically ANDed.

Type: Grant

Filed: July 2, 2008

Date of Patent: May 21, 2013

Assignee: Board of Regents, University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler, Robert McDonald, Lakshminarasimhan Sethumadhavan, Franziska Roesner
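
The conflict search in the abstract above (an address CAM match ANDed with an age CAM match over entries stored in execution order) can be modeled in Python. CAMs are represented as boolean match vectors. One filter is an added assumption not stated in the abstract: loads look only at stores and stores look only at loads. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LSQEntry:
    seq: int        # program-order sequence number
    is_store: bool
    addr: int
    size: int       # access width in bytes


class UnorderedLSQ:
    """Hypothetical model: entries sit in execution order, not program order."""

    def __init__(self) -> None:
        self.entries: List[LSQEntry] = []

    def insert(self, entry: LSQEntry) -> None:
        self.entries.append(entry)  # appended as instructions execute

    def conflicts(self, seq: int, is_store: bool, addr: int, size: int) -> List[int]:
        # "Address CAM": entries whose byte range overlaps [addr, addr + size).
        address_hits = [e.addr < addr + size and addr < e.addr + e.size
                        for e in self.entries]
        # "Age CAM": older entries for a load, younger entries for a store.
        age_hits = [(e.seq > seq) if is_store else (e.seq < seq)
                    for e in self.entries]
        # Assumed extra filter: loads check stores, stores check loads.
        kind_hits = [e.is_store != is_store for e in self.entries]
        # Logical AND of the match vectors selects the conflicting entries.
        return [e.seq for e, a, g, k in zip(self.entries, address_hits, age_hits, kind_hits)
                if a and g and k]


lsq = UnorderedLSQ()
for e in [LSQEntry(5, True, 100, 4), LSQEntry(2, False, 100, 4),
          LSQEntry(8, False, 102, 4), LSQEntry(3, True, 200, 4)]:
    lsq.insert(e)
print(lsq.conflicts(seq=7, is_store=False, addr=100, size=4))  # [5]
print(lsq.conflicts(seq=4, is_store=True, addr=100, size=4))   # [8]
```

In this model, load 7 finds older store 5 as a possible forwarding source. Store 4 finds younger load 8, which already read overlapping bytes and would need to be replayed. Neither search depends on where the entries sit in the queue.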

Method, system and computer-accessible medium for providing a distributed predicate prediction

Patent number: 8433885

Abstract: Examples of a system, method and computer accessible medium are provided to generate a predicate prediction for a distributed multi-core architecture. Using such system, method and computer accessible medium, it is possible to intelligently encode approximate predicate path information on branch instructions. Using this statically generated information, distributed predicate predictors can generate dynamic predicate histories that can facilitate an accurate prediction of high-confidence predicates, while minimizing the communication between the cores.

Type: Grant

Filed: September 9, 2009

Date of Patent: April 30, 2013

Assignee: Board of Regents of the University of Texas System

Inventors: Doug Burger, Stephen W. Keckler, Hadi Esmaeilzadeh

COMBINED BRANCH TARGET AND PREDICATE PREDICTION

Publication number: 20130086370

Abstract: Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.

Type: Application

Filed: June 18, 2010

Publication date: April 4, 2013

Applicant: The Board of Regents of The University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler

Scalable bus-based on-chip interconnection networks

Patent number: 8307116

Abstract: The present disclosure generally relates to systems for routing data across a multinodal network. Example systems include a multinodal array having a plurality of nodes and a plurality of physical communication channels connecting the nodes. At least one of the physical communication channels may be configured to route data from a first node to two or more other destination nodes of the plurality of nodes. The present disclosure also generally relates to methods for routing data across a multinodal network and computer accessible mediums having stored thereon computer executable instructions for performing techniques for routing data across a multinodal network.

Type: Grant

Filed: June 19, 2009

Date of Patent: November 6, 2012

Assignee: Board of Regents of the University of Texas System

Inventors: Stephen W. Keckler, Boris Grot

Method and apparatus for congestion-aware routing in a computer interconnection network

Patent number: 8285900

Abstract: The present disclosure relates to an example of a method for a first router to adaptively determine status within a network. The network may include the first router, a second router and a third router. The method for the first router may comprise determining status information regarding the second router located in the network, and transmitting the status information to the third router located in the network. The second router and the third router may be indirectly coupled to one another.

Type: Grant

Filed: February 17, 2009

Date of Patent: October 9, 2012

Assignee: The Board of Regents of the University of Texas System

Inventors: Paul Gratz, Boris Grot, Stephen W. Keckler

Dynamically composing processor cores to form logical processors

Patent number: 8180997

Abstract: A method, system and computer program product for dynamically composing processor cores to form logical processors. Processor cores are composable in that the processor cores are dynamically allocated to form a logical processor to handle a change in the operating status. Once a change in the operating status is detected, a mechanism may be triggered to recompose one or more processor cores into a logical processor to handle the change in the operating status. An analysis may be performed as to how one or more processor cores should be recomposed to handle the change in the operating status. After the analysis, the one or more processor cores are recomposed into the logical processor to handle the change in the operating status. By dynamically allocating the processor cores to handle the change in the operating status, performance and power efficiency are improved.

Type: Grant

Filed: July 2, 2008

Date of Patent: May 15, 2012

Assignee: Board of Regents, University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler, Robert McDonald, Paul Gratz, Nitya Ranganathan, Lakshminarasimhan Sethumadhavan, Karthikeyan Sankaralingam, Ramadass Nagarajan, Changkyu Kim, Haiming Liu

Control-flow prediction using multiple independent predictors

Patent number: 8127119

Abstract: The present disclosure generally describes computing systems with a multi-core processor comprising one or more branch predictor arrangements. The branch predictors are configured to predict a single and complete flow of program instructions associated therewith and to be performed on at least one processor core of the computing system. Overall processor performance and physical scalability may be improved by the described methods.

Type: Grant

Filed: December 5, 2008

Date of Patent: February 28, 2012

Assignee: The Board of Regents of the University of Texas System

Inventors: Doug Burger, Stephen W. Keckler, Nitya Ranganathan

Computing nodes for executing groups of instructions

Patent number: 8055881

Abstract: A computation node according to various embodiments of the invention includes at least one input port capable of being coupled to at least one first other computation node, a first store coupled to the input port(s) to store input data, a second store to receive and store instructions, an instruction wakeup unit to match the input data to the instructions, at least one execution unit to execute the instructions, using the input data to produce output data, and at least one output port capable of being coupled to at least one second other computation node. The node may also include a router to direct the output data from the output port(s) to the second other node. A system according to various embodiments of the invention includes an external instruction sequencer to fetch a group of instructions, and one or more interconnected, preselected computational nodes.

Type: Grant

Filed: June 10, 2008

Date of Patent: November 8, 2011

Assignee: Board of Regents, University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler, Karthikeyan Sankaralingam, Ramadass Nagarajan
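
The node structure in the abstract above can be sketched in Python as a small dataflow simulation: an operand store, an instruction store, a wakeup step that fires an instruction once its inputs are matched, and a network that routes results to target nodes. The model rests on several assumptions. Every instruction takes exactly two operands, and the instruction sequencer is replaced by preloaded instruction tables. Node id -1 stands for "program output". All names are hypothetical.

```python
import operator
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional, Tuple

Target = Tuple[int, int, int]  # (node id, instruction slot, operand index)


@dataclass
class Instruction:
    op: Callable[[int, int], int]
    targets: List[Target]  # where the result is routed


@dataclass
class Node:
    instructions: Dict[int, Instruction] = field(default_factory=dict)  # instruction store
    operands: Dict[int, Dict[int, int]] = field(default_factory=dict)   # input-data store

    def receive(self, slot: int, index: int, value: int) -> Optional[Tuple[int, List[Target]]]:
        """Buffer an operand; fire the instruction once both operands are present."""
        ready = self.operands.setdefault(slot, {})
        ready[index] = value
        if len(ready) == 2:  # wakeup: all operands matched to the instruction
            inst = self.instructions[slot]
            result = inst.op(ready[0], ready[1])
            del self.operands[slot]
            return result, inst.targets
        return None


def run(nodes: Dict[int, Node], inputs: List[Tuple[Target, int]]) -> List[int]:
    """Route messages between nodes until none remain; node -1 collects outputs."""
    network: Deque[Tuple[Target, int]] = deque(inputs)
    outputs: List[int] = []
    while network:
        (node_id, slot, index), value = network.popleft()
        if node_id == -1:
            outputs.append(value)
            continue
        fired = nodes[node_id].receive(slot, index, value)
        if fired is not None:
            result, targets = fired
            network.extend((t, result) for t in targets)
    return outputs


# (a + b) * (c - d) split across two nodes.
nodes = {
    0: Node({0: Instruction(operator.add, [(1, 0, 0)]),
             1: Instruction(operator.sub, [(1, 0, 1)])}),
    1: Node({0: Instruction(operator.mul, [(-1, 0, 0)])}),
}
print(run(nodes, [((0, 0, 0), 2), ((0, 0, 1), 3), ((0, 1, 0), 10), ((0, 1, 1), 4)]))  # [30]
```

No instruction is issued in program order here; each one runs as soon as its operands arrive, which is the behavior the wakeup unit in the abstract provides.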
