Patents by Inventor Stephen W. Keckler

Stephen W. Keckler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Hierarchical network for stacked memory system

Patent number: 11977766

Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile's local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.

Type: Grant

Filed: February 28, 2022

Date of Patent: May 7, 2024

Assignee: NVIDIA Corporation

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor
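Illustrative sketch (not taken from the patent): the abstract for patent 11977766 names four access paths, from a tile's own local memory block out to memory in a different device. The Python below shows one way such a request could be classified. The `TileAddress` fields and the level names are assumptions made for illustration only.

```python
from typing import NamedTuple


class TileAddress(NamedTuple):
    """Hypothetical address of a tile: which device, die stack, and tile."""
    device: int
    stack: int
    tile: int


def access_level(requester: TileAddress, target: TileAddress) -> str:
    """Classify a memory access by the widest boundary it has to cross."""
    if requester.device != target.device:
        return "different device"
    if requester.stack != target.stack:
        return "different die stack"
    if requester.tile != target.tile:
        return "different tile, same die"
    # Same device, stack, and tile: the vertically aligned local block.
    return "local memory block"


me = TileAddress(device=0, stack=0, tile=3)
print(access_level(me, TileAddress(0, 0, 3)))
print(access_level(me, TileAddress(0, 1, 3)))
```

Checking the outermost boundary first means each request is labeled with the most distant path it needs, which is the path that would dominate its cost.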
Adversarial scenarios for safety testing of autonomous vehicles

Patent number: 11977386

Abstract: Techniques to generate driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. Nodes of the path are assigned a time for action to avoid collision from the node. The generated scenarios may be simulated in a computer.

Type: Grant

Filed: November 18, 2022

Date of Patent: May 7, 2024

Assignee: NVIDIA CORP.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler, Alejandro Troccoli
Deep neural network accelerator with fine-grained parallelism discovery

Patent number: 11966835

Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.

Type: Grant

Filed: January 23, 2019

Date of Patent: April 23, 2024

Assignee: NVIDIA CORP.

Inventors: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
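Illustrative sketch (not taken from the patent): the abstract for patent 11966835 describes finding matching pairs of non-zero activations and weights in compacted arrays. The Python below is a sequential software analogue for the simplest case, a dot product, where a pair is assumed to be reducible when both operands share the same index. The patented parallelism discovery unit searches in parallel in hardware; the function names here are hypothetical.

```python
def compact(dense):
    """Keep only the non-zero entries, as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(dense) if v != 0]


def matching_pairs(acts, weights):
    """Two-pointer merge over index-sorted compacted arrays.

    Returns the (activation, weight) value pairs whose indices match.
    """
    pairs = []
    i = j = 0
    while i < len(acts) and j < len(weights):
        a_idx, a_val = acts[i]
        w_idx, w_val = weights[j]
        if a_idx == w_idx:
            pairs.append((a_val, w_val))
            i += 1
            j += 1
        elif a_idx < w_idx:
            i += 1
        else:
            j += 1
    return pairs


def sparse_dot(dense_acts, dense_weights):
    """Multiply-accumulate over matched non-zero pairs only."""
    pairs = matching_pairs(compact(dense_acts), compact(dense_weights))
    return sum(a * w for a, w in pairs)


print(sparse_dot([0, 2, 0, 3], [1, 4, 0, 5]))  # 2*4 + 3*5 = 23
```

Only positions where both operands are non-zero contribute a multiplication, which is the work a sparse accelerator aims to isolate.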
AUGMENTING LEGACY NEURAL NETWORKS FOR FLEXIBLE INFERENCE

Publication number: 20230325670

Abstract: A technique for dynamically configuring and executing an augmented neural network in real-time according to performance constraints also maintains the legacy neural network execution path. A neural network model that has been trained for a task is augmented with low-compute “shallow” phases paired with each legacy phase and the legacy phases of the neural network model are held constant (e.g., unchanged) while the shallow phases are trained. During inference, one or more of the shallow phases can be selectively executed in place of the corresponding legacy phase. Compared with the legacy phases, the shallow phases are typically less accurate, but have reduced latency and consume less power. Therefore, processing using one or more of the shallow phases in place of one or more of the legacy phases enables the augmented neural network to dynamically adapt to changes in the execution environment (e.g., processing load or performance requirement).

Type: Application

Filed: August 18, 2022

Publication date: October 12, 2023

Inventors: Jason Lavar Clemons, Stephen W. Keckler, Iuri Frosio, Jose Manuel Alvarez Lopez, Maying Shen
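Illustrative sketch (not taken from the application): the abstract for publication 20230325670 says shallow phases can be executed in place of legacy phases to adapt to the execution environment. The Python below shows one possible selection policy under a latency budget. The `Phase` structure, the cost numbers, and the greedy largest-saving-first rule are all assumptions; the abstract does not specify how the choice is made.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    """A legacy phase paired with its low-compute shallow replacement."""
    legacy: Callable[[float], float]
    shallow: Callable[[float], float]
    legacy_cost: float   # hypothetical latency of the legacy phase
    shallow_cost: float  # hypothetical latency of the shallow phase


def choose_paths(phases: List[Phase], budget: float) -> List[bool]:
    """Start all-legacy, then swap in shallow phases until within budget.

    Phases with the largest latency saving are swapped first.
    Returns one flag per phase: True means run the shallow phase.
    """
    use_shallow = [False] * len(phases)
    total = sum(p.legacy_cost for p in phases)
    by_saving = sorted(
        range(len(phases)),
        key=lambda i: phases[i].legacy_cost - phases[i].shallow_cost,
        reverse=True,
    )
    for i in by_saving:
        if total <= budget:
            break
        use_shallow[i] = True
        total -= phases[i].legacy_cost - phases[i].shallow_cost
    return use_shallow


def run(phases: List[Phase], use_shallow: List[bool], x: float) -> float:
    """Execute the network, picking one path per phase."""
    for phase, shallow in zip(phases, use_shallow):
        x = (phase.shallow if shallow else phase.legacy)(x)
    return x
```

With an unconstrained budget every flag stays False, so the original legacy execution path is preserved, as the abstract requires.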
APPLICATION PARTITIONING FOR LOCALITY IN A STACKED MEMORY SYSTEM

Publication number: 20230315651

Abstract: Embodiments of the present disclosure relate to application partitioning for locality in a stacked memory system. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles and each memory die includes multiple memory tiles. Vertically aligned memory tiles are directly coupled to and comprise the local memory block for a corresponding processing tile. An application program that operates on dense multi-dimensional arrays (matrices) may partition the dense arrays into sub-arrays associated with program tiles. Each program tile is executed by a processing tile using the processing tile's local memory block to process the associated sub-array. Data associated with each sub-array is stored in a local memory block and the processing tile corresponding to the local memory block executes the program tile to process the sub-array data.

Type: Application

Filed: March 30, 2022

Publication date: October 5, 2023

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor
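Illustrative sketch (not taken from the application): the abstract for publication 20230315651 describes partitioning a dense array into sub-arrays, each processed by the tile whose local memory block holds it. The Python below models that with plain lists. The grid shape, the function names, and the requirement that the matrix divide evenly are assumptions made for illustration.

```python
def partition(matrix, grid_rows, grid_cols):
    """Split a dense 2-D array into one sub-array per processing tile.

    Returns a dict mapping a (tile_row, tile_col) coordinate to the
    sub-array that would live in that tile's local memory block.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % grid_rows or cols % grid_cols:
        raise ValueError("matrix must divide evenly across the tile grid")
    block_h, block_w = rows // grid_rows, cols // grid_cols
    return {
        (r, c): [
            row[c * block_w:(c + 1) * block_w]
            for row in matrix[r * block_h:(r + 1) * block_h]
        ]
        for r in range(grid_rows)
        for c in range(grid_cols)
    }


def run_program_tiles(local_blocks, program_tile):
    """Run the same program tile on each sub-array, touching local data only."""
    return {tile: program_tile(sub) for tile, sub in local_blocks.items()}


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
blocks = partition(matrix, 2, 2)
sums = run_program_tiles(blocks, lambda sub: sum(map(sum, sub)))
print(sums[(0, 0)], sums[(1, 1)])  # 14 54
```

Because each program tile reads only its own sub-array, no call in `run_program_tiles` reaches outside the block assigned to its tile, which is the locality property the abstract is after.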
Scalable multi-die deep learning system

Patent number: 11769040

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture implemented on a semiconductor package. The package includes multiple chips, each with a central processing element, a global memory buffer, and processing elements. Each processing element includes a weight buffer, an activation buffer, and multiply-accumulate units to combine, in parallel, the weight values and the activation values.

Type: Grant

Filed: July 19, 2019

Date of Patent: September 26, 2023

Assignee: NVIDIA CORP.

Inventors: Yakun Shao, Rangharajan Venkatesan, Nan Jiang, Brian Matthew Zimmer, Jason Clemons, Nathaniel Pinckney, Matthew R Fojtik, William James Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
HIERARCHICAL NETWORK FOR STACKED MEMORY SYSTEM

Publication number: 20230297269

Abstract: A hierarchical network enables access for a stacked memory system including one or more memory dies that each include multiple memory tiles. The processor die includes multiple processing tiles that are stacked with the one or more memory dies. The memory tiles that are vertically aligned with a processing tile are directly coupled to the processing tile and comprise the local memory block for the processing tile. The hierarchical network provides access paths for each processing tile to access the processing tile’s local memory block, the local memory block coupled to a different processing tile within the same processing die, memory tiles in a different die stack, and memory tiles in a different device. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50x for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10x.

Type: Application

Filed: February 28, 2022

Publication date: September 21, 2023

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O’Connor
MEMORY STACKED ON PROCESSOR FOR HIGH BANDWIDTH

Publication number: 20230275068

Abstract: Embodiments of the present disclosure relate to memory stacked on processor for high bandwidth. Systems and methods are disclosed for providing a one-level memory for a processing system by stacking bulk memory on a processor die. In an embodiment, one or more memory dies are stacked on the processor die. The processor die includes multiple processing tiles, where each tile includes a processing unit, mapper, and tile network. Each memory die includes multiple memory tiles. The processing tile is coupled to each memory tile that is above or below the processing tile. The vertically aligned memory tiles comprise the local memory block for the processing tile. The ratio of memory bandwidth (byte) to floating-point operation (B:F) may improve 50× for accessing the local memory block compared with conventional memory. Additionally, the energy consumed to transfer each bit may be reduced by 10×.

Type: Application

Filed: February 28, 2022

Publication date: August 31, 2023

Inventors: William James Dally, Carl Thomas Gray, Stephen W. Keckler, James Michael O'Connor
AUGMENTING AND DYNAMICALLY CONFIGURING A NEURAL NETWORK MODEL FOR REAL-TIME SYSTEMS

Publication number: 20230111375

Abstract: A neural network model is augmented for dynamic configuration and execution in real-time according to performance constraints. In an embodiment, the neural network model is a transformer neural network model. The performance constraints may include a metric, such as inferencing execution time or energy consumption and a target value for the metric. The augmented neural network model is characterized for various configurations and settings are determined corresponding to a variety of the performance constraints. One or more performance constraints may be provided as an input to dynamically select a configuration of the augmented neural network model. Through dynamic configuration, the augmented neural network model may adapt to real-time changes in the performance constraints. However, the trained weights for an original (before augmentation) neural network model may be used by the augmented neural network model without modification.

Type: Application

Filed: April 20, 2022

Publication date: April 13, 2023

Inventors: Jason Lavar Clemons, Kavya Sreedhar, Stephen W. Keckler
ADVERSARIAL SCENARIOS FOR SAFETY TESTING OF AUTONOMOUS VEHICLES

Publication number: 20230079196

Abstract: Techniques to generate driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. Nodes of the path are assigned a time for action to avoid collision from the node. The generated scenarios may be simulated in a computer.

Type: Application

Filed: November 18, 2022

Publication date: March 16, 2023

Applicant: NVIDIA Corp.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler, Alejandro Troccoli
Adversarial scenarios for safety testing of autonomous vehicles

Patent number: 11550325

Abstract: Techniques to generate driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. Nodes of the path are assigned a time for action to avoid collision from the node. The generated scenarios may be simulated in a computer.

Type: Grant

Filed: June 10, 2020

Date of Patent: January 10, 2023

Assignee: NVIDIA CORP.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler, Alejandro Troccoli
SYSTEM AND METHODS FOR HARDWARE-SOFTWARE COOPERATIVE PIPELINE ERROR DETECTION

Publication number: 20220269558

Abstract: An error reporting system utilizes a parity checker to receive data results from execution of an original instruction and a parity bit for the data. A decoder receives an error correcting code (ECC) for data resulting from execution of a shadow instruction of the original instruction, and data error correction is initiated on the original instruction result on condition of a mismatch between the parity bit and the original instruction result, and the decoder asserting a correctable error in the original instruction result.

Type: Application

Filed: May 5, 2022

Publication date: August 25, 2022

Applicant: NVIDIA Corp.

Inventors: Michael Sullivan, Siva Kumar Sastry Hari, Brian Matthew Zimmer, Timothy Tsai, Stephen W. Keckler
System and methods for hardware-software cooperative pipeline error detection

Patent number: 11409597

Abstract: An error reporting system utilizes a parity checker to receive data results from execution of an original instruction and a parity bit for the data. A decoder receives an error correcting code (ECC) for data resulting from execution of a shadow instruction of the original instruction, and data error correction is initiated on the original instruction result on condition of a mismatch between the parity bit and the original instruction result, and the decoder asserting a correctable error in the original instruction result.

Type: Grant

Filed: March 6, 2020

Date of Patent: August 9, 2022

Assignee: NVIDIA Corp.

Inventors: Michael Sullivan, Siva Hari, Brian Zimmer, Timothy Tsai, Stephen W. Keckler
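Illustrative sketch (not taken from the patent): the abstract for patent 11409597 describes initiating correction when the original result disagrees with its parity bit and the decoder, fed an ECC from the shadow instruction's result, reports a correctable error. The Python below mimics that decision flow for an 8-bit result. The Hamming-style position table, the function names, and the status strings are assumptions; the patent's actual code and datapath are not given in the abstract.

```python
# Assumed Hamming codeword positions for 8 data bits (the non-powers of two).
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]


def parity(value: int) -> int:
    """Even-parity bit: 1 if the value has an odd number of set bits."""
    return bin(value).count("1") & 1


def ecc(value: int) -> int:
    """Check bits: XOR of the codeword positions of the set data bits."""
    code = 0
    for bit, pos in enumerate(DATA_POSITIONS):
        if (value >> bit) & 1:
            code ^= pos
    return code


def check_result(original: int, parity_bit: int, shadow_ecc: int):
    """Return (value, status) for an original-instruction result.

    Correction is attempted only when parity mismatches AND the syndrome
    against the shadow ECC points at a single data bit.
    """
    if parity(original) == parity_bit:
        return original, "ok"
    syndrome = ecc(original) ^ shadow_ecc
    if syndrome in DATA_POSITIONS:
        flipped_bit = DATA_POSITIONS.index(syndrome)
        return original ^ (1 << flipped_bit), "corrected"
    return original, "uncorrectable"


good = 0b10110010
faulty = good ^ (1 << 4)  # inject a single-bit error
print(check_result(faulty, parity(good), ecc(good)))
```

A single parity bit cannot see an even number of flipped bits, so this sketch only covers the single-bit case.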
Tensor-based driving scenario characterization

Patent number: 11390301

Abstract: Techniques to characterize driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. The scenarios may be characterized using a tree-based or tensor-based approach.

Type: Grant

Filed: June 10, 2020

Date of Patent: July 19, 2022

Assignee: NVIDIA Corp.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler
Efficient Neural Network Accelerator Dataflows

Publication number: 20220076110

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.

Type: Application

Filed: November 19, 2021

Publication date: March 10, 2022

Applicant: NVIDIA Corp.

Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
Efficient neural network accelerator dataflows

Patent number: 11270197

Abstract: A distributed deep neural net (DNN) utilizing a distributed, tile-based architecture includes multiple chips, each with a central processing element, a global memory buffer, and a plurality of additional processing elements. Each additional processing element includes a weight buffer, an activation buffer, and vector multiply-accumulate units to combine, in parallel, the weight values and the activation values using stationary data flows.

Type: Grant

Filed: November 4, 2019

Date of Patent: March 8, 2022

Assignee: NVIDIA Corp.

Inventors: Yakun Shao, Rangharajan Venkatesan, Miaorong Wang, Daniel Smith, William James Dally, Joel Emer, Stephen W. Keckler, Brucek Khailany
ADVERSARIAL SCENARIOS FOR SAFETY TESTING OF AUTONOMOUS VEHICLES

Publication number: 20210389769

Abstract: Techniques to generate driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. Nodes of the path are assigned a time for action to avoid collision from the node. The generated scenarios may be simulated in a computer.

Type: Application

Filed: June 10, 2020

Publication date: December 16, 2021

Applicant: NVIDIA Corp.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler, Alejandro Troccoli
TENSOR-BASED DRIVING SCENARIO CHARACTERIZATION

Publication number: 20210387643

Abstract: Techniques to characterize driving scenarios for autonomous vehicles characterize a path in a driving scenario according to metrics such as narrowness and effort. The scenarios may be characterized using a tree-based or tensor-based approach.

Type: Application

Filed: June 10, 2020

Publication date: December 16, 2021

Applicant: NVIDIA Corp.

Inventors: Siva Kumar Sastry Hari, Iuri Frosio, Zahra Ghodsi, Anima Anandkumar, Timothy Tsai, Stephen W. Keckler
OPTIMIZING SOFTWARE-DIRECTED INSTRUCTION REPLICATION FOR GPU ERROR DETECTION

Publication number: 20210004235

Abstract: A thread execution method in a processor includes executing original instructions of a first thread in a first execution lane of the processor, and interleaving execution of duplicated instructions of the first thread with execution of original instructions of a second thread in a second execution lane of the processor.

Type: Application

Filed: September 17, 2020

Publication date: January 7, 2021

Applicant: NVIDIA Corp.

Inventors: Siva Kumar Sastry Hari, Michael Sullivan, Timothy Tsai, Stephen W. Keckler
Optimizing software-directed instruction replication for GPU error detection

Patent number: 10817289

Abstract: Software-only and software-hardware optimizations to reduce the overhead of intra-thread instruction duplication on a GPU or other instruction processor are disclosed. The optimizations trade off error containment for performance and include ISA extensions with limited hardware changes and area costs.

Type: Grant

Filed: October 3, 2018

Date of Patent: October 27, 2020

Assignee: NVIDIA Corp.

Inventors: Siva Hari, Michael Sullivan, Timothy Tsai, Stephen W. Keckler, Abdulrahman Mahmoud
