Patents by Inventor Angshuman Parashar

Angshuman Parashar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Deep neural network accelerator with fine-grained parallelism discovery

Patent number: 11966835

Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.

Type: Grant

Filed: January 23, 2019

Date of Patent: April 23, 2024

Assignee: NVIDIA CORP.

Inventors: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
Sparse convolutional neural network accelerator

Patent number: 11847550

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Grant

Filed: December 4, 2020

Date of Patent: December 19, 2023

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
FLEXIBLE ACCELERATOR FOR A TENSOR WORKLOAD

Publication number: 20220083500

Abstract: Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator will be specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus will exhibit sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or a FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which a dot product length of its dot product sub-units is configurable based on a target compute throughput.

Type: Application

Filed: June 9, 2021

Publication date: March 17, 2022

Inventors: Po An Tsai, Neal Crago, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler
FLEXIBLE ACCELERATOR FOR A TENSOR WORKLOAD

Publication number: 20220083314

Abstract: Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator will be specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus will exhibit sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or a FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which a dot product length of its dot product sub-units is configurable based on a target compute throughput that is less than or equal to a maximum throughput of the flexible DPU.

Type: Application

Filed: June 9, 2021

Publication date: March 17, 2022

Inventors: Po An Tsai, Neal Crago, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler
Sparse convolutional neural network accelerator

Patent number: 10997496

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.

Type: Grant

Filed: March 14, 2017

Date of Patent: May 4, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20210089864

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: December 4, 2020

Publication date: March 25, 2021

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Sparse convolutional neural network accelerator

Patent number: 10891538

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Grant

Filed: July 25, 2017

Date of Patent: January 12, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Sparse convolutional neural network accelerator

Patent number: 10860922

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: November 18, 2019

Date of Patent: December 8, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Executing distributed memory operations using processing elements connected by distributed channels

Patent number: 10853276

Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.

Type: Grant

Filed: June 17, 2019

Date of Patent: December 1, 2020

Assignee: Intel Corporation

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20200082254

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: November 18, 2019

Publication date: March 12, 2020

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
Sparse convolutional neural network accelerator

Patent number: 10528864

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: March 14, 2017

Date of Patent: January 7, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
DEEP NEURAL NETWORK ACCELERATOR WITH FINE-GRAINED PARALLELISM DISCOVERY

Publication number: 20190370645

Abstract: A sparse convolutional neural network accelerator system that dynamically and efficiently identifies fine-grained parallelism in sparse convolution operations. The system determines matching pairs of non-zero input activations and weights from the compacted input activation and weight arrays utilizing a scalable, dynamic parallelism discovery unit (PDU) that performs a parallel search on the input activation array and the weight array to identify reducible input activation and weight pairs.

Type: Application

Filed: January 23, 2019

Publication date: December 5, 2019

Inventors: Ching-En Lee, Yakun Shao, Angshuman Parashar, Joel Emer, Stephen W. Keckler
EXECUTING DISTRIBUTED MEMORY OPERATIONS USING PROCESSING ELEMENTS CONNECTED BY DISTRIBUTED CHANNELS

Publication number: 20190303312

Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.

Type: Application

Filed: June 17, 2019

Publication date: October 3, 2019

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
Executing distributed memory operations using processing elements connected by distributed channels

Patent number: 10331583

Abstract: A processing device for executing distributed memory operations using spatial processing units (SPU) connected by distributed channels is disclosed. A distributed channel may or may not be associated with memory operations, such as load operations or store operations. Distributed channel information is obtained for an algorithm to be executed by a group of spatially distributed processing elements. The group of spatially distributed processing elements can be connected to a shared memory controller. For each distributed channel in the distributed channel information, one or more of the group of spatially distributed processing elements may be associated with the distributed channel based on the algorithm. By associating the spatially distributed processing elements to a distributed channel, the functionality of the processing element can vary depending on the algorithm mapped onto the SPU.

Type: Grant

Filed: September 26, 2013

Date of Patent: June 25, 2019

Assignee: Intel Corporation

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046916

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.

Type: Application

Filed: March 14, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046900

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: July 25, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046906

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: March 14, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison
DISTRIBUTED MEMORY OPERATIONS

Publication number: 20150089162

Abstract: A technology for implementing a method for distributed memory operations. A method of the disclosure includes obtaining distributed channel information for an algorithm to be executed by a plurality of spatially distributed processing elements. For each distributed channel in the distributed channel information, the method further associates one or more of the plurality of spatially distributed processing elements with the distributed channel based on the algorithm.

Type: Application

Filed: September 26, 2013

Publication date: March 26, 2015

Inventors: Bushra Ahsan, Michael C. Adler, Neal C. Crago, Joel S. Emer, Aamer Jaleel, Angshuman Parashar, Michael I. Pellauer
METHOD FOR DETERMINING INSTRUCTION ORDER USING TRIGGERS

Publication number: 20140201506

Abstract: A processing engine includes separate hardware components for control processing and data processing. The instruction execution order in such a processing engine may be efficiently determined in a control processing engine based on inputs received by the control processing engine. For each instruction of a data processing engine: a status of the instruction may be set to “ready” based on a trigger for the instruction and the input received in the control processing engine; and execution of the instruction in the data processing engine may be enabled if the status of the instruction is set to “ready” and at least one processing element of the data processing engine is available. The trigger for each instruction may be a function of one or more predicate register of the control processing engine, FIFO status signals or information regarding tags.

Type: Application

Filed: December 30, 2011

Publication date: July 17, 2014

Inventors: Angshuman Parashar, Michael Pellauer, Michael Adler, Joel Emer