Patents by Inventor Stephen William Keckler

Stephen William Keckler has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Sparse convolutional neural network accelerator

Patent number: 11847550

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Grant

Filed: December 4, 2020

Date of Patent: December 19, 2023

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
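The coordinate arithmetic described in the abstract above can be mimicked in a few lines of Python. This is an illustrative sketch only, not the patented instruction: the function names, the (row, column) layout, the all-pairs summation, and the row-major linearization are assumptions made for the example.

```python
from itertools import product

def sum_coordinate_sets(first_coords, second_coords):
    """Sum every coordinate set of the first array with every set of the second.

    All-pairs summation is an assumption; the abstract only says the sets are summed.
    """
    return [(r1 + r2, c1 + c2)
            for (r1, c1), (r2, c2) in product(first_coords, second_coords)]

def to_linear_indices(coords, width):
    """Convert (row, col) pairs to row-major linear indices (assumed layout)."""
    return [row * width + col for row, col in coords]

# Hypothetical positions of non-zero elements in two small arrays.
first = [(0, 1), (2, 0)]
second = [(1, 1), (0, 2)]

output_coords = sum_coordinate_sets(first, second)
linear = to_linear_indices(output_coords, width=8)
```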
LOCATING A MEMORY UNIT ASSOCIATED WITH A MEMORY ADDRESS UTILIZING A MAPPER

Publication number: 20230297499

Abstract: A mapper within a single-level memory system may facilitate memory localization to reduce the energy and latency of memory accesses within the single-level memory system. The mapper may translate a memory request received from a processor for implementation at a data storage entity, where the translating identifies a data storage entity and a starting location within the data storage entity where the data associated with the memory request is located. This data storage entity may be co-located with the processor that sent the request, which may enable the localization of memory and significantly improve the performance of memory usage by reducing the energy of data access and increasing data bandwidth.

Type: Application

Filed: January 21, 2022

Publication date: September 21, 2023

Inventors: William James Dally, Stephen William Keckler, Carl Thomas Gray, James Michael O’Connor
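One way to picture the translation step in the abstract above is a lookup from an address to a storage entity plus a starting location inside it. The sketch below is a minimal assumption-laden model: the `Region` and `Mapper` classes, the region table, and the entity names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    base: int    # first address covered by this region
    size: int    # number of bytes covered
    entity: str  # hypothetical name of the data storage entity

class Mapper:
    """Translate a flat address into (storage entity, offset within that entity)."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda region: region.base)

    def translate(self, address):
        for region in self.regions:
            if region.base <= address < region.base + region.size:
                return region.entity, address - region.base
        raise ValueError(f"unmapped address {address:#x}")

# Hypothetical layout: one entity next to the processor, one farther away.
mapper = Mapper([
    Region(base=0x0000, size=0x1000, entity="local_dram_0"),
    Region(base=0x1000, size=0x1000, entity="remote_dram_1"),
])
entity, offset = mapper.translate(0x1234)
```

A linear scan keeps the sketch short; a real mapper would more plausibly use a hardware table or an address hash.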
SINGLE-CYCLE BYTE CORRECTING AND MULTI-BYTE DETECTING ERROR CODE

Publication number: 20230089736

Abstract: A memory device and a system that implements a single symbol correction, double symbol detection (SSC-DSD+) error correction scheme are provided. The scheme is implemented by calculating four syndrome symbols in accordance with a Reed-Solomon (RS) codeword; determining three location bytes in accordance with three corresponding pairs of syndrome symbols in the four syndrome symbols; and generating an output based on a comparison of the three location bytes. The output may include: corrected data responsive to determining that the three location bytes match; an indication of a detected-and-corrected error (DCE) responsive to determining that two of the three location bytes match; or an indication of a detected-yet-uncorrected error (DUE) responsive to determining that none of the three location bytes match. A variant of the SSC-DSD+ decoder may be implemented using a carry-free subtraction operation to perform sanity checking.

Type: Application

Filed: September 21, 2022

Publication date: March 23, 2023

Inventors: Michael Brendan Sullivan, Nirmal R. Saxena, Stephen William Keckler
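The comparison of three location bytes in the abstract above can be sketched in software. For a single corrupted symbol, each ratio of adjacent syndromes equals the same locator, so the three values agree. The code below is a simplified illustration under stated assumptions: the field GF(2^8) with primitive polynomial 0x11D, syndromes evaluated at alpha^0 through alpha^3, the all-zero word standing in for a valid codeword, and the handling of zero syndromes are all choices made for the example, not details from the application.

```python
# GF(2^8) log/antilog tables. The primitive polynomial 0x11D and generator 2
# are assumptions chosen for illustration.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 510):
    EXP[power] = EXP[power - 255]

def gf_mul(a, b):
    """Multiply two field elements."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Divide a by a non-zero b."""
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def syndromes(word):
    """Four syndromes S_j = sum over k of word[k] * alpha^(j*k), for j = 0..3."""
    result = []
    for j in range(4):
        s = 0
        for k, symbol in enumerate(word):
            s ^= gf_mul(symbol, EXP[(j * k) % 255])
        result.append(s)
    return result

def decode(word):
    """Return (status, word) after comparing three locators from syndrome pairs."""
    s = syndromes(word)
    if not any(s):
        return "clean", list(word)
    if 0 in s:
        # Assumption: a single-symbol error makes all four syndromes non-zero,
        # so a mix of zero and non-zero syndromes is reported as uncorrectable.
        return "DUE", list(word)
    locators = [gf_div(s[j + 1], s[j]) for j in range(3)]
    distinct = len(set(locators))
    if distinct == 1:
        position = LOG[locators[0]]
        if position >= len(word):
            return "DUE", list(word)
        fixed = list(word)
        fixed[position] ^= s[0]  # S_0 equals the error magnitude
        return "corrected", fixed
    # Two of three matching is labelled DCE, none matching DUE, as in the abstract;
    # returning the word unchanged in both cases is a simplification.
    return ("DCE" if distinct == 2 else "DUE"), list(word)

codeword = [0] * 18          # the all-zero word is a valid codeword of any linear code
single = list(codeword)
single[7] ^= 0x5A            # corrupt one symbol
double = list(codeword)
double[3] ^= 0x11            # corrupt two symbols
double[12] ^= 0x22

single_result = decode(single)
double_status = decode(double)[0]
```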
Techniques for configuring parallel processors for different application domains

Patent number: 11609879

Abstract: In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.

Type: Grant

Filed: July 1, 2021

Date of Patent: March 21, 2023

Assignee: NVIDIA Corporation

Inventors: Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, Stephen William Keckler, David Nellans
TECHNIQUES FOR CONFIGURING PARALLEL PROCESSORS FOR DIFFERENT APPLICATION DOMAINS

Publication number: 20220276984

Abstract: In various embodiments, a parallel processor includes a parallel processor module implemented within a first die and a memory system module implemented within a second die. The memory system module is coupled to the parallel processor module via an on-package link. The parallel processor module includes multiple processor cores and multiple cache memories. The memory system module includes a memory controller for accessing a DRAM. Advantageously, the performance of the parallel processor module can be effectively tailored for memory bandwidth demands that typify one or more application domains via the memory system module.

Type: Application

Filed: July 1, 2021

Publication date: September 1, 2022

Inventors: Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, Stephen William Keckler, David Nellans
FLEXIBLE ACCELERATOR FOR A TENSOR WORKLOAD

Publication number: 20220083314

Abstract: Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator will be specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus will exhibit sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or an FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which a dot product length of its dot product sub-units is configurable based on a target compute throughput that is less than or equal to a maximum throughput of the flexible DPU.

Type: Application

Filed: June 9, 2021

Publication date: March 17, 2022

Inventors: Po An Tsai, Neal Crago, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler
FLEXIBLE ACCELERATOR FOR A TENSOR WORKLOAD

Publication number: 20220083500

Abstract: Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator will be specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus will exhibit sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or an FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which a dot product length of its dot product sub-units is configurable based on a target compute throughput.

Type: Application

Filed: June 9, 2021

Publication date: March 17, 2022

Inventors: Po An Tsai, Neal Crago, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler
Sparse convolutional neural network accelerator

Patent number: 10997496

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.

Type: Grant

Filed: March 14, 2017

Date of Patent: May 4, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20210089864

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: December 4, 2020

Publication date: March 25, 2021

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Sparse convolutional neural network accelerator

Patent number: 10891538

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Grant

Filed: July 25, 2017

Date of Patent: January 12, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Sparse convolutional neural network accelerator

Patent number: 10860922

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: November 18, 2019

Date of Patent: December 8, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
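The data flow in the abstract above (multiply every non-zero weight by every non-zero activation, combine their positions, and route each product to an accumulator chosen by position) can be modelled in software. This is a rough sketch, not the hardware design: the tuple layouts, the rule for combining positions (output channel kept, spatial offsets subtracted), and the dictionary standing in for the accumulator array are assumptions, and bounds checks on output positions are omitted.

```python
from collections import defaultdict
from itertools import product

def sparse_multiply_accumulate(weights, activations):
    """Accumulate products of non-zero weights and activations by output position.

    weights:     list of (value, (k, r, s)) with positions in a 3D space
    activations: list of (value, (x, y)) with positions in a 2D space
    """
    accumulators = defaultdict(int)  # stands in for the accumulator array
    for (w, (k, r, s)), (a, (x, y)) in product(weights, activations):
        out_pos = (k, x - r, y - s)  # assumed position-combination rule
        accumulators[out_pos] += w * a
    return dict(accumulators)

# Hypothetical compressed vectors holding only non-zero values.
weights = [(2, (0, 0, 0)), (3, (0, 1, 0))]
activations = [(5, (1, 1)), (7, (2, 1))]

sums = sparse_multiply_accumulate(weights, activations)
```

Two of the four products land on the same output position, which is why the products are routed to accumulators rather than written out directly.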
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20200082254

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: November 18, 2019

Publication date: March 12, 2020

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Sparse convolutional neural network accelerator

Patent number: 10528864

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: March 14, 2017

Date of Patent: January 7, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Patch memory system

Patent number: 9934153

Abstract: A patch memory system for accessing patches from a memory is disclosed. A patch is an abstraction that refers to a contiguous array of data that is a subset of an N-dimensional array of data. The patch memory system includes a tile cache, and is configured to fetch data associated with a patch by determining one or more tiles associated with an N-dimensional array of data corresponding to the patch, and loading data for the one or more tiles from the memory into the tile cache. The N-dimensional array of data may be a two-dimensional (2D) digital image comprising a plurality of pixels. A patch of the 2D digital image may refer to a 2D subset of the image.

Type: Grant

Filed: June 30, 2015

Date of Patent: April 3, 2018

Assignee: NVIDIA Corporation

Inventors: Jason Lavar Clemons, Chih-Chi Cheng, Daniel Robert Johnson, Stephen William Keckler, Iuri Frosio, Yun-Ta Tsai
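For the 2D image case in the abstract above, determining which tiles a patch touches and loading only the missing ones might look like the following. It is a minimal sketch under assumptions: fixed-size rectangular tiles, a patch given by its top-left corner and size, and the `TileCache` class and `load_tile` callback are hypothetical names introduced for the example.

```python
def tiles_for_patch(x0, y0, width, height, tile_w, tile_h):
    """Return (tile_col, tile_row) ids of every tile overlapping the patch."""
    first_col, last_col = x0 // tile_w, (x0 + width - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y0 + height - 1) // tile_h
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]

class TileCache:
    """Keep loaded tiles so overlapping patches do not reload them."""

    def __init__(self, load_tile):
        self._load_tile = load_tile  # callback that reads one tile from memory
        self._tiles = {}

    def fetch_patch(self, x0, y0, width, height, tile_w, tile_h):
        needed = tiles_for_patch(x0, y0, width, height, tile_w, tile_h)
        for tile_id in needed:
            if tile_id not in self._tiles:
                self._tiles[tile_id] = self._load_tile(tile_id)
        return {tile_id: self._tiles[tile_id] for tile_id in needed}

loads = []  # records which tiles were actually read from memory

def load_tile(tile_id):
    loads.append(tile_id)
    return f"tile{tile_id}"

cache = TileCache(load_tile)
first = cache.fetch_patch(6, 2, 5, 4, tile_w=4, tile_h=4)   # spans four tiles
second = cache.fetch_patch(4, 0, 4, 4, tile_w=4, tile_h=4)  # already cached
```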
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046916

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.

Type: Application

Filed: March 14, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046900

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: July 25, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046906

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: March 14, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
System and method for configuring a channel

Patent number: 9886409

Abstract: An integrated circuit device comprises pin resources, a memory controller circuit, a network interface controller circuit, and transmitter circuitry. The pin resources comprise pads coupled to off-chip pins of the integrated circuit device. The memory controller circuit comprises a first interface and the network interface controller circuit comprises a second interface. The transmitter circuitry is configurable to selectively couple either a first signal of the first interface or a second signal of the second interface to a first pad of the pin resources based on a pin distribution between the first interface and the second interface.

Type: Grant

Filed: May 18, 2015

Date of Patent: February 6, 2018

Assignee: NVIDIA Corporation

Inventors: Stephen William Keckler, William J. Dally, Steven Lee Scott, Brucek Kurdo Khailany, Michael Allen Parker
Approach to adaptive allocation of shared resources in computer systems

Patent number: 9742869

Abstract: A request management subsystem is configured to establish service classes for clients that issue requests for a shared resource on a computer system. The subsystem also is configured to determine the state of the system with respect to bandwidth, current latency, frequency and voltage levels, among other characteristics. Further, the subsystem is configured to evaluate the requirements of each client with respect to latency sensitivity and required bandwidth, among other characteristics. Finally, the subsystem is configured to schedule access to shared resources, based on the priority class of each client, the demands of the application, and the state of the system. With this approach, the subsystem may enable all clients to perform optimally or, alternatively, may cause all clients to experience an equal reduction in performance.

Type: Grant

Filed: December 9, 2013

Date of Patent: August 22, 2017

Assignee: NVIDIA Corporation

Inventors: Evgeny Bolotin, Zvi Guz, Adwait Jog, Stephen William Keckler, Michael Allen Parker
SYSTEM AND METHOD FOR CONFIGURING A CHANNEL

Publication number: 20170212857

Abstract: An integrated circuit device comprises pin resources, a memory controller circuit, a network interface controller circuit, and transmitter circuitry. The pin resources comprise pads coupled to off-chip pins of the integrated circuit device. The memory controller circuit comprises a first interface and the network interface controller circuit comprises a second interface. The transmitter circuitry is configurable to selectively couple either a first signal of the first interface or a second signal of the second interface to a first pad of the pin resources based on a pin distribution between the first interface and the second interface.

Type: Application

Filed: May 18, 2015

Publication date: July 27, 2017

Inventors: Stephen William Keckler, William J. Dally, Steven Lee Scott, Brucek Kurdo Khailany, Michael Allen Parker