Patents by Inventor Joel Springer

Joel Springer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Sparse convolutional neural network accelerator

Patent number: 11847550

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.
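The coordinate step in this abstract can be illustrated with a small Python sketch. It is only a reading of the abstract, not the patented implementation: the encoding of the index vectors (row-major linear positions here), the all-pairs pairing of the two coordinate sets, and the names `decode_index_vector` and `output_linear_indices` are assumptions made for illustration.

```python
def decode_index_vector(index_vector, width):
    """Decode an index vector into (row, col) coordinate sets.

    Assumption: each entry is the row-major linear position of a
    non-zero element in an array with `width` columns.
    """
    return [(i // width, i % width) for i in index_vector]


def output_linear_indices(first_idx, first_width, second_idx, second_width, out_width):
    """Sum every first coordinate set with every second coordinate set,
    then convert each output coordinate set into a linear index."""
    first_coords = decode_index_vector(first_idx, first_width)
    second_coords = decode_index_vector(second_idx, second_width)
    linear = []
    for r1, c1 in first_coords:
        for r2, c2 in second_coords:
            r, c = r1 + r2, c1 + c2          # summed output coordinate set
            linear.append(r * out_width + c)  # row-major linear index
    return linear


# Non-zeros at (0,0) and (1,1) in a 3-wide array, and at (0,1) and (1,0)
# in a 2-wide array, mapped into a 4-wide output array.
print(output_linear_indices([0, 4], 3, [1, 2], 2, 4))
```

In this sketch the output width must be large enough to hold every summed coordinate; that sizing is left to the caller.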

Type: Grant

Filed: December 4, 2020

Date of Patent: December 19, 2023

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

PRUNING AND ACCELERATING NEURAL NETWORKS WITH HIERARCHICAL FINE-GRAINED STRUCTURED SPARSITY

Publication number: 20230062503

Abstract: Hierarchical structured sparse parameter pruning and processing improves runtime performance and energy efficiency of neural networks. In contrast with conventional (non-structured) pruning which allows for any distribution of the non-zero values within a matrix that achieves the desired sparsity degree (e.g., 50%) and is consequently difficult to accelerate, structured hierarchical sparsity requires each multi-element unit at the coarsest granularity of the hierarchy to be pruned to the desired sparsity degree. The global desired sparsity degree is a function of the per-level sparsity degrees. Distribution of non-zero values within each multi-element unit is constrained according to the per-level sparsity degree at the particular level of the hierarchy. Each level of the hierarchy may be associated with a hardware (e.g., logic or circuit) structure that can be enabled or disabled according to the per-level sparsity.
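A minimal Python sketch of the per-level constraint described in this abstract follows. The abstract says only that the global sparsity degree is a function of the per-level degrees; the multiplicative composition used here, the "keep at most N of every M units" form of each level, and the names `global_sparsity` and `satisfies_hierarchy` are assumptions for illustration.

```python
from fractions import Fraction


def global_sparsity(levels):
    """Combine per-level (keep, group) ratios into one sparsity degree.

    Assumption: densities compose multiplicatively across levels.
    """
    density = Fraction(1)
    for keep, group in levels:
        density *= Fraction(keep, group)
    return 1 - density


def satisfies_hierarchy(values, levels):
    """Check a flat list of values against levels ordered finest to coarsest.

    At each level, consecutive units are gathered into groups of `group`,
    and each group may contain at most `keep` non-zero units. A group
    counts as a non-zero unit at the next level if any member is non-zero.
    """
    units = [v != 0 for v in values]
    for keep, group in levels:
        if len(units) % group != 0:
            return False
        groups = [units[i:i + group] for i in range(0, len(units), group)]
        if any(sum(g) > keep for g in groups):
            return False
        units = [any(g) for g in groups]
    return True


levels = [(2, 4), (1, 2)]  # keep 2 of 4 elements, then 1 of 2 blocks
print(global_sparsity(levels))                                  # 3/4
print(satisfies_hierarchy([1, 0, 2, 0, 0, 0, 0, 0], levels))    # True
print(satisfies_hierarchy([1, 0, 2, 0, 0, 3, 0, 0], levels))    # False
```

The second vector fails because both four-element blocks contain non-zero values, which violates the coarser "1 of 2 blocks" level.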

Type: Application

Filed: February 28, 2022

Publication date: March 2, 2023

Inventors: Yannan Wu, Po-An Tsai, Saurav Muralidharan, Joel Springer Emer

FLEXIBLE ACCELERATOR FOR A TENSOR WORKLOAD

Publication number: 20220083314

Abstract: Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator will be specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus will exhibit sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or a FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which a dot product length of its dot product sub-units is configurable based on a target compute throughput that is less than or equal to a maximum throughput of the flexible DPU.

Type: Application

Filed: June 9, 2021

Publication date: March 17, 2022

Inventors: Po An Tsai, Neal Crago, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler

FLEXIBLE ACCELERATOR FOR A TENSOR WORKLOAD

Publication number: 20220083500

Abstract: Accelerators are generally utilized to provide high performance and energy efficiency for tensor algorithms. Currently, an accelerator will be specifically designed around the fundamental properties of the tensor algorithm and shape it supports, and thus will exhibit sub-optimal performance when used for other tensor algorithms and shapes. The present disclosure provides a flexible accelerator for tensor workloads. The flexible accelerator can be a flexible tensor accelerator or a FPGA having a dynamically configurable inter-PE network supporting different tensor shapes and different tensor algorithms including at least a GEMM algorithm, a 2D CNN algorithm, and a 3D CNN algorithm, and/or having a flexible DPU in which a dot product length of its dot product sub-units is configurable based on a target compute throughput.

Type: Application

Filed: June 9, 2021

Publication date: March 17, 2022

Inventors: Po An Tsai, Neal Crago, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler

Sparse convolutional neural network accelerator

Patent number: 10997496

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.

Type: Grant

Filed: March 14, 2017

Date of Patent: May 4, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20210089864

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: December 4, 2020

Publication date: March 25, 2021

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Sparse convolutional neural network accelerator

Patent number: 10891538

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Grant

Filed: July 25, 2017

Date of Patent: January 12, 2021

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Sparse convolutional neural network accelerator

Patent number: 10860922

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.
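The data flow in this abstract (multiply all non-zero pairs, combine their positions, route each product to an accumulator) can be sketched in Python as follows. This is an illustrative software model, not the hardware design: the abstract says only that positions are "combined", so the coordinate-wise addition used here, the `(k, r, s)` and `(y, x)` position layouts, and the function name `sparse_outer_product` are assumptions.

```python
from collections import defaultdict


def sparse_outer_product(weights, activations):
    """Model the multiplier array and accumulator array in software.

    weights:     list of (value, (k, r, s)), non-zero weights with 3D positions
    activations: list of (value, (y, x)), non-zero activations with 2D positions
    Returns a dict mapping each output position to its accumulated sum.
    """
    # Third vector: every non-zero weight times every non-zero activation.
    products = [w * a for w, _ in weights for a, _ in activations]
    # Fourth vector: one combined position per product, in the same order.
    positions = [
        (k, r + y, s + x)
        for _, (k, r, s) in weights
        for _, (y, x) in activations
    ]
    # Accumulator array: each product is added at its associated position.
    accumulators = defaultdict(float)
    for product, position in zip(products, positions):
        accumulators[position] += product
    return dict(accumulators)


weights = [(2.0, (0, 0, 0)), (3.0, (0, 0, 1))]
activations = [(1.0, (0, 1)), (4.0, (0, 0))]
print(sparse_outer_product(weights, activations))
```

Two of the four products in this example land on the same output position, which is why the products are accumulated rather than simply stored.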

Type: Grant

Filed: November 18, 2019

Date of Patent: December 8, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20200082254

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: November 18, 2019

Publication date: March 12, 2020

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Sparse convolutional neural network accelerator

Patent number: 10528864

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Grant

Filed: March 14, 2017

Date of Patent: January 7, 2020

Assignee: NVIDIA Corporation

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046900

Abstract: A method, computer program product, and system perform computations using a processor. A first instruction including a first index vector operand and a second index vector operand is received and the first index vector operand is decoded to produce first coordinate sets for a first array, each first coordinate set including at least a first coordinate and a second coordinate of a position of a non-zero element in the first array. The second index vector operand is decoded to produce second coordinate sets for a second array, each second coordinate set including at least a third coordinate and a fourth coordinate of a position of a non-zero element in the second array. The first coordinate sets are summed with the second coordinate sets to produce output coordinate sets and the output coordinate sets are converted into a set of linear indices.

Type: Application

Filed: July 25, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046906

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. A first vector comprising only non-zero weight values and first associated positions of the non-zero weight values within a 3D space is received. A second vector comprising only non-zero input activation values and second associated positions of the non-zero input activation values within a 2D space is received. The non-zero weight values are multiplied with the non-zero input activation values, within a multiplier array, to produce a third vector of products. The first associated positions are combined with the second associated positions to produce a fourth vector of positions, where each position in the fourth vector is associated with a respective product in the third vector. The products in the third vector are transmitted to adders in an accumulator array, based on the position associated with each one of the products.

Type: Application

Filed: March 14, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

SPARSE CONVOLUTIONAL NEURAL NETWORK ACCELERATOR

Publication number: 20180046916

Abstract: A method, computer program product, and system perform computations using a sparse convolutional neural network accelerator. Compressed-sparse data is received for input to a processing element, wherein the compressed-sparse data encodes non-zero elements and corresponding multi-dimensional positions. The non-zero elements are processed in parallel by the processing element to produce a plurality of result values. The corresponding multi-dimensional positions are processed in parallel by the processing element to produce destination addresses for each result value in the plurality of result values. Each result value is transmitted to a destination accumulator associated with the destination address for the result value.

Type: Application

Filed: March 14, 2017

Publication date: February 15, 2018

Inventors: William J. Dally, Angshuman Parashar, Joel Springer Emer, Stephen William Keckler, Larry Robert Dennison

Applicant screening

Publication number: 20070038497

Abstract: Systems and methods for screening applicants are disclosed herein. A method of screening applicants is performed by a screening server. The server begins by receiving a selection of screening services and an applicant profile that identifies an applicant. The screening continues by generating screening results specified by the selection of screening services based on the applicant profile. A property manager is then notified that the screening results are available for the applicant based upon the applicant profile. The screening results are then provided to the property manager based upon the applicant profile. Based on these screening results, the screener or property manager can make a decision about the applicant and communicate a decision action to the applicant.

Type: Application

Filed: July 21, 2006

Publication date: February 15, 2007

Inventors: Michael Britti, Robert Thornley, Joel Springer, Michael Mauseth, Michael Collins

Screening using a personal identification code

Publication number: 20070022297

Abstract: A system of screening servers, screener client computers, and screening kiosks distribute an applicant screening process among multiple sites and multiple participants. To facilitate and secure communications of screening results and applicant actions, a personal identification code is provided that identifies individual sets of screening results. In this manner, the applicant is authenticated and can then enter appropriate applicant profile data into a secure screening account, such as via a screening kiosk. Screening results may be generated by the applicant in association with a unique personal identification code. This code can then be communicated to the screener, who can access the screening results along with a recommendation, if desired, by sending the code to a screening server. The screener can also enter appropriate screening information into another secure screening account.

Type: Application

Filed: July 25, 2005

Publication date: January 25, 2007

Inventors: Michael Britti, Michael Mauseth, Joel Springer, Robert Thornley

Implementation of a conditional move instruction in an out-of-order processor

Patent number: 6449713

Abstract: A technique for handling a conditional move instruction in an out-of-order data processor. The technique involves detecting a conditional move instruction within an instruction stream, and generating multiple instructions according to the detected conditional move instruction. The technique further involves replacing the conditional move instruction within the instruction stream with the generated multiple instructions. The generated multiple instructions are generated such that each of the generated multiple instructions executes using no more than two input ports of an execution unit. The generated multiple instructions include a first generated instruction that produces a condition result indicating whether a condition exists, and a second generated instruction that inputs the condition result as a portion of an operand which identifies a register of the out-of-order data processor.
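The replacement step in this abstract can be modeled with a short Python sketch. It is a simplified illustration under stated assumptions, not the patented mechanism: the opcode names `cmoveq`, `cmov1`, and `cmov2`, the temporary register `t0`, and the choice to carry the condition result as a flag stored alongside the old destination value are all hypothetical. The point shown is that a conditional move implicitly reads three values (the condition register, the source register, and the old destination), while each generated instruction reads at most two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    srcs: tuple


def expand_cmov(stream, temp="t0"):
    """Replace each conditional move with two instructions of <= 2 inputs."""
    out = []
    for ins in stream:
        if ins.op == "cmoveq":
            cond_reg, value_reg = ins.srcs
            # First generated instruction: evaluate the condition and carry
            # the old destination value along with the condition result.
            out.append(Instr("cmov1", temp, (cond_reg, ins.dest)))
            # Second generated instruction: pick the new or the old value.
            out.append(Instr("cmov2", ins.dest, (temp, value_reg)))
        else:
            out.append(ins)
    return out


def run(stream, regs):
    """Tiny interpreter used only to compare the two instruction streams."""
    regs = dict(regs)
    for ins in stream:
        if ins.op == "cmoveq":   # dest = value if cond == 0, else unchanged
            cond_reg, value_reg = ins.srcs
            if regs[cond_reg] == 0:
                regs[ins.dest] = regs[value_reg]
        elif ins.op == "cmov1":
            cond_reg, old_reg = ins.srcs
            regs[ins.dest] = (regs[cond_reg] == 0, regs[old_reg])
        elif ins.op == "cmov2":
            temp_reg, value_reg = ins.srcs
            condition_met, old_value = regs[temp_reg]
            regs[ins.dest] = regs[value_reg] if condition_met else old_value
    return regs


program = [Instr("cmoveq", "r3", ("r1", "r2"))]
expanded = expand_cmov(program)
print(expanded)
print(run(expanded, {"r1": 0, "r2": 7, "r3": 9})["r3"])
```

Running the original and the expanded stream on the same register values should leave the destination register with the same contents in both the taken and not-taken cases.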

Type: Grant

Filed: November 18, 1998

Date of Patent: September 10, 2002

Assignee: Compaq Information Technologies Group, L.P.

Inventors: Joel Springer Emer, Bruce Edwards, Daniel Lawrence Leibholz, Edward J. McLellan, Derrick R. Meyer

IMPLEMENTATION OF A CONDITIONAL MOVE INSTRUCTION IN AN OUT-OF-ORDER PROCESSOR

Publication number: 20020112142

Abstract: A technique for handling a conditional move instruction in an out-of-order data processor. The technique involves detecting a conditional move instruction within an instruction stream, and generating multiple instructions according to the detected conditional move instruction. The technique further involves replacing the conditional move instruction within the instruction stream with the generated multiple instructions. The generated multiple instructions are generated such that each of the generated multiple instructions executes using no more than two input ports of an execution unit. The generated multiple instructions include a first generated instruction that produces a condition result indicating whether a condition exists, and a second generated instruction that inputs the condition result as a portion of an operand which identifies a register of the out-of-order data processor.

Type: Application

Filed: November 18, 1998

Publication date: August 15, 2002

Inventors: Joel Springer Emer, Bruce Edwards, Daniel Lawrence Leibholz, Edward J. McLellan, Derrick R. Meyer