Patents by Inventor Kailash Gopalakrishnan

Kailash Gopalakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10769238
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: September 8, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
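
A hedged sketch of the flow in the entry above: input data bits of the first matrix sit in per-element registers, activation bits of the second matrix stream through, and partial sums of the third matrix are streamed out. The weight-stationary scheduling and the function name below are illustrative assumptions, not the patented circuit.

```python
import numpy as np

def systolic_matmul(W, A):
    """Toy weight-stationary systolic schedule: PE (i, j) holds W[i, j] in
    its first register; columns of A stream through, each PE adds its
    product into the partial sum passing along the row, and the finished
    partial sum is streamed out of the last PE."""
    n, m = W.shape
    m2, p = A.shape
    assert m == m2, "inner dimensions must match"
    out = np.zeros((n, p))
    for k in range(p):                # one streamed activation column per step
        for i in range(n):            # rows of processing elements
            partial = 0.0             # partial sum entering the row
            for j in range(m):
                partial += W[i, j] * A[j, k]
            out[i, k] = partial       # streamed out of the array
    return out

W = np.arange(6, dtype=float).reshape(2, 3)
A = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(systolic_matmul(W, A), W @ A)
```
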
  • Publication number: 20200249910
    Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.
    Type: Application
    Filed: February 6, 2019
    Publication date: August 6, 2020
    Applicant: International Business Machines Corporation
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
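
The abstract above does not fix the two formats. As an illustration only, the sketch below rounds forward-pass values onto a (1, 4, 3) sign/exponent/mantissa grid and backward-pass gradients onto a (1, 5, 2) grid; these bit splits, the helper name, and the omission of subnormal/NaN handling are all assumptions.

```python
import numpy as np

def quantize_float(x, exp_bits, man_bits):
    """Round x to the nearest value representable with the given exponent
    and mantissa widths (subnormals, NaN, and overflow are ignored)."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    out = np.zeros_like(x)
    nz = x != 0
    e = np.floor(np.log2(np.abs(x[nz])))
    e = np.clip(e, 1 - bias, bias)      # clamp to the format's exponent range
    step = 2.0 ** (e - man_bits)        # grid spacing within each binade
    out[nz] = np.round(x[nz] / step) * step
    return out

rng = np.random.default_rng(0)
activations = rng.standard_normal(4)
gradients = 1e-4 * rng.standard_normal(4)
fwd = quantize_float(activations, exp_bits=4, man_bits=3)  # forward format
bwd = quantize_float(gradients, exp_bits=5, man_bits=2)    # backward format
print(fwd, bwd)
```
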
  • Publication number: 20200233642
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Application
    Filed: April 6, 2020
    Publication date: July 23, 2020
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
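
Concretely, the layout above totals 1 + 6 + 9 = 16 bits. The decoder below is a sketch under stated assumptions: the IEEE-style exponent bias, the particular bit pattern reserved for the merged NaN/infinity point, and the zero encoding in the lowest binade are guesses, since the abstract does not pin them down.

```python
# Assumed layout from the abstract: 1 sign, 6 exponent, 9 fraction bits.
EXP_BITS, FRAC_BITS = 6, 9
BIAS = 2 ** (EXP_BITS - 1) - 1        # 31; IEEE-style bias is an assumption

def decode(bits16):
    """Decode a 16-bit pattern under the assumed (1, 6, 9) layout."""
    sign = (bits16 >> 15) & 0x1
    exp = (bits16 >> 9) & 0x3F
    frac = bits16 & 0x1FF
    # One data point of the highest binade is the merged NaN/infinity
    # symbol; which point it is, is an assumption here.
    if exp == 0x3F and frac == 0x1FF:
        return float("nan")
    # The lowest binade holds zero and normal numbers per the abstract;
    # assume the all-zero code in it encodes zero.
    if exp == 0 and frac == 0:
        return 0.0
    value = (1 + frac / 2 ** FRAC_BITS) * 2.0 ** (exp - BIAS)
    return -value if sign else value

print(decode(0b0_011111_000000000))   # exponent == bias, so this is 1.0
print(decode(0b1_011111_100000000))   # -1.5
```
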
  • Publication number: 20200226459
    Abstract: A processor receives input data and provides the input data to a first neural network including a first neural network model. The first neural network model has a first numerical precision level. A first feature vector is generated from the input data using the first neural network. The input data is provided to a second neural network including a second neural network model. The second neural network model has a second numerical precision level different from the first numerical precision level. A second feature vector is generated from the input data using the second neural network. A difference metric is computed between the first feature vector and the second feature vector. The difference metric is indicative of whether the input data includes adversarial data.
    Type: Application
    Filed: January 11, 2019
    Publication date: July 16, 2020
    Applicant: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Pin-Yu Chen, Pierce I-Jen Chuang, Richard Chen, Jungwook Choi, Kailash Gopalakrishnan
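
A minimal sketch of the detection idea above: run the same feature extractor at two numerical precision levels and flag inputs whose feature vectors diverge. The linear stand-in for the networks, the relative-L2 difference metric, and the threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))    # stand-in for a trained feature extractor

def features(x, precision):
    """Feature vector from the same model run at a given precision level."""
    return (x.astype(precision) @ W.astype(precision)).astype(np.float64)

def adversarial_score(x, threshold=1e-2):
    f_hi = features(x, np.float64)  # first numerical precision level
    f_lo = features(x, np.float16)  # second numerical precision level
    d = np.linalg.norm(f_hi - f_lo) / (np.linalg.norm(f_hi) + 1e-12)
    return d, bool(d > threshold)   # a large gap suggests adversarial input

x = rng.standard_normal(16)
print(adversarial_score(x))
```
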
  • Patent number: 10657442
    Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: May 19, 2020
    Assignee: International Business Machines Corporation
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
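
The dataflow above can be mimicked in a few lines: each pass over a chunk plays the role of the FPU-based compute matrix producing a chunk dot-product, and a running scalar stands in for the external accumulator array. The chunk size and names are illustrative.

```python
import numpy as np

def chunked_dot(u, v, chunk=4):
    """Dot product accumulated chunk-by-chunk, mirroring a compute matrix
    of multiply-accumulate units feeding an external accumulator."""
    acc = 0.0                                    # external accumulator
    for start in range(0, len(u), chunk):
        # One pass of the compute matrix: a chunk dot-product.
        acc += float(np.dot(u[start:start + chunk], v[start:start + chunk]))
    return acc

u, v = np.arange(10.0), np.ones(10)
assert chunked_dot(u, v) == np.dot(u, v)
```
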
  • Patent number: 10656913
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Grant
    Filed: June 5, 2018
    Date of Patent: May 19, 2020
    Assignee: International Business Machines Corporation
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
  • Publication number: 20200134105
    Abstract: A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters, as input data, into a processor on a computer that formalizes a design space exploration of a convolution mapping on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, which serve as proxy estimates of the performance of different possible design choices for implementing the predefined convolution processing.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Swagath Venkataramani, Jintao Zhang
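
A hedged sketch of the idea above: given convolution-layer parameters, compute cheap analytical proxies (MAC count, data traffic, operations per element moved) and compare them across candidate design points. The specific metrics and parameterization are assumptions; the abstract does not enumerate them.

```python
def conv_proxy_metrics(N, C, K, H, W, R, S, stride=1):
    """Analytical proxy estimates for one convolution design point."""
    Ho = (H - R) // stride + 1
    Wo = (W - S) // stride + 1
    macs = N * K * C * Ho * Wo * R * S                         # compute work
    traffic = N * C * H * W + K * C * R * S + N * K * Ho * Wo  # data moved
    return {"macs": macs, "ops_per_elem": macs / traffic}

# Tiny design space exploration: compare candidate kernel shapes.
for R in (1, 3, 5):
    print(R, conv_proxy_metrics(N=1, C=64, K=64, H=56, W=56, R=R, S=R))
```
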
  • Patent number: 10592208
    Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportioned into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, the fraction being the inverse of the number of discrete values that can be represented using only the mantissa bits.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: March 17, 2020
    Assignee: International Business Machines Corporation
    Inventors: Naigang Wang, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
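
The key sentence above says the stored approximation is built from multiples of a fraction equal to 1/2**p, the inverse of the number of discrete values the mantissa bits can represent. The sketch below makes that concrete for an assumed (1, 5, 2) sub-16-bit split; the split and the lack of subnormal handling are assumptions.

```python
import math

def vlp_approx(x, p=2, e_bits=5):
    """Approximate x as (integer multiple of 1/2**p) * 2**e."""
    if x == 0:
        return 0.0
    sign = math.copysign(1.0, x)
    bias = 2 ** (e_bits - 1) - 1
    e = math.floor(math.log2(abs(x)))
    e = max(1 - bias, min(bias, e))     # clamp to the exponent range
    frac = 1 / 2 ** p                   # inverse of the 2**p mantissa values
    k = round(abs(x) / 2 ** e / frac)   # nearest multiple of the fraction
    return sign * k * frac * 2 ** e

print(vlp_approx(3.7))                  # 3.5 on the (1, 5, 2) grid
print(vlp_approx(0.1))                  # 0.09375
```
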
  • Publication number: 20200012706
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: September 19, 2019
    Publication date: January 9, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10528356
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration-level commits (ILC) in a coarse-grained reconfigurable architecture (CGRA). The apparatus includes hardware structures that connect all of the multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which are in flight, and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The PEs, LSU, and control unit are configured to commit instructions, and to save and restore context, at loop iteration boundaries. In doing so, the apparatus tracks and buffers the state of in-flight iterations and detects conditions that prevent an iteration from completing.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: January 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
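
The bookkeeping the abstract above assigns to the LSU and control unit (which iterations have completed, which are in flight, which are yet to begin, with commits at iteration boundaries) can be modeled as a small scoreboard. This is an illustrative software model, not the patented hardware.

```python
class IterationScoreboard:
    """Toy tracker for simultaneous multiple iterations with in-order,
    iteration-level commit (a modeling assumption, not the patent's RTL)."""

    def __init__(self, total):
        self.total = total
        self.next_to_issue = 0        # iterations yet to begin start here
        self.in_flight = set()        # issued but not yet completed
        self.committed = 0            # iterations retired in program order

    def issue(self):
        if self.next_to_issue < self.total:
            self.in_flight.add(self.next_to_issue)
            self.next_to_issue += 1

    def complete(self, it):
        self.in_flight.discard(it)
        # Iteration-level commit: retire contiguous finished iterations,
        # so context can be saved/restored at an iteration boundary.
        while (self.committed < self.next_to_issue
               and self.committed not in self.in_flight):
            self.committed += 1

sb = IterationScoreboard(total=4)
sb.issue(); sb.issue(); sb.issue()
sb.complete(1)                        # out-of-order completion is buffered
sb.complete(0)
print(sb.committed, sorted(sb.in_flight))   # 2 [2]
```
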
  • Publication number: 20190385050
    Abstract: Techniques for statistics-aware weight quantization are presented. To facilitate reducing the bit precision of weights, for a set of weights, a quantizer management component can estimate a quantization scale value to apply to a weight as a linear or non-linear function of the mean of a square of a weight value of the weight and the mean of an absolute value of the weight value, wherein the quantization scale value is determined to have a smaller quantization error than all, or at least almost all, other quantization errors associated with other quantization scale values. A quantizer component applies the quantization scale value to symmetrically and/or uniformly quantize weights of a layer of the set of weights to generate quantized weights, the weights being quantized using rounding. The respective quantized weights can be used to facilitate training and inference of a deep learning system.
    Type: Application
    Filed: June 13, 2018
    Publication date: December 19, 2019
    Inventors: Zhuo Wang, Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang
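
A hedged sketch of the estimator above: choose the clip scale as a linear function of sqrt(E[w**2]) and E[|w|], then quantize the weights symmetrically and uniformly with rounding. The coefficients c1 and c2 below are illustrative placeholders, not the patent's fitted values.

```python
import numpy as np

def stats_aware_quantize(w, bits=4, c1=3.2, c2=2.1):
    """Symmetric uniform quantization with a statistics-derived clip scale."""
    scale = c1 * np.sqrt(np.mean(w ** 2)) - c2 * np.mean(np.abs(w))
    levels = 2 ** (bits - 1) - 1               # symmetric integer grid
    q = np.clip(np.round(w / scale * levels), -levels, levels)
    return q * scale / levels

w = 0.05 * np.random.default_rng(1).standard_normal(1000)
wq = stats_aware_quantize(w)
print("rms quantization error:", np.sqrt(np.mean((w - wq) ** 2)))
```
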
  • Publication number: 20190369960
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Application
    Filed: June 5, 2018
    Publication date: December 5, 2019
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
  • Patent number: 10489484
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: November 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10476016
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: November 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Publication number: 20190339938
    Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportioned into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, the fraction being the inverse of the number of discrete values that can be represented using only the mantissa bits.
    Type: Application
    Filed: May 7, 2018
    Publication date: November 7, 2019
    Applicant: International Business Machines Corporation
    Inventors: Naigang Wang, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
  • Publication number: 20190325301
    Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
    Type: Application
    Filed: April 19, 2018
    Publication date: October 24, 2019
    Applicant: International Business Machines Corporation
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
  • Publication number: 20190236113
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: April 11, 2019
    Publication date: August 1, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190171935
    Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.
    Type: Application
    Filed: December 4, 2017
    Publication date: June 6, 2019
    Inventors: Ankur Agrawal, Daniel Brand, Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan
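
A toy version of the scheme above: add the new gradient to the carried residue, divide the residue vector into uniform bins, transmit one quantized representative per bin, and carry the untransmitted remainder forward as the next residue. The bin size and the largest-magnitude-per-bin rule are assumptions.

```python
import numpy as np

def compress_residue(grad, residue, bin_size=4):
    """One round of residual-gradient compression for a single learner."""
    current = grad + residue                    # current residue vector
    sent = np.zeros_like(current)
    for start in range(0, current.size, bin_size):
        b = current[start:start + bin_size]
        i = start + int(np.argmax(np.abs(b)))   # quantize a subset: keep the
        sent[i] = current[i]                    # largest entry in each bin
    return sent, current - sent                 # payload, new residue

g = np.random.default_rng(2).standard_normal(12)
payload, res = compress_residue(g, np.zeros_like(g))
print(np.count_nonzero(payload), "of", g.size, "values transmitted")
```
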
  • Publication number: 20190164050
    Abstract: A system, having a memory that stores computer executable components and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality in weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.
    Type: Application
    Filed: November 30, 2017
    Publication date: May 30, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
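
A sketch of the pipeline above under stated assumptions: segment the weight matrix into sub-blocks, apply an orthonormal DCT-II as the frequency transform, and keep only the largest coefficients. The DCT choice, the 8x8 block size, and the top-k rule are illustrative; the abstract names only "a transform".

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are frequencies)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1 / np.sqrt(2)
    return M * np.sqrt(2 / n)

def compress_block(block, keep=8):
    """Transform one square weight sub-block, zero all but the largest
    coefficients, and invert to get the lossy reconstruction."""
    D = dct_matrix(block.shape[0])
    coef = D @ block @ D.T
    thresh = np.sort(np.abs(coef), axis=None)[-keep]
    coef[np.abs(coef) < thresh] = 0.0
    return D.T @ coef @ D               # inverse of an orthonormal transform

rng = np.random.default_rng(3)
W = 0.1 * rng.standard_normal((8, 8))   # one segmented sub-component
W_hat = compress_block(W)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```
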
  • Publication number: 20190122116
    Abstract: Techniques that facilitate improving an efficiency of a neural network are described. In one embodiment, a system is provided that comprises a memory that stores computer-executable components and a processor that executes computer-executable components stored in the memory. In one implementation, the computer-executable components comprise an initialization component that selects an initial value of an output limit, wherein the output limit indicates a range for an output of an activation function of a neural network. The computer-executable components further comprise a training component that modifies the initial value of the output limit during training to a second value of the output limit, the second value of the output limit being provided as a parameter to the activation function. The computer-executable components further comprise an activation function component that determines the output of the activation function based on the second value of the output limit as the parameter.
    Type: Application
    Filed: October 24, 2017
    Publication date: April 25, 2019
    Inventors: Jungwook Choi, Kailash Gopalakrishnan, Charbel Sakr, Swagath Venkataramani, Zhuo Wang
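
A minimal sketch of the trainable output limit described above, assuming the common clipped-ReLU formulation: the limit (called alpha here) starts at an initial value and is updated by gradient descent, so the activation's range is learned during training. The toy target, learning rate, and iteration count are illustrative.

```python
import numpy as np

def clipped_relu(x, alpha):
    """Activation whose output is limited to the range [0, alpha]."""
    return np.clip(x, 0.0, alpha)

def grad_alpha(x, alpha, upstream):
    """d/d_alpha of clip(x, 0, alpha) is 1 exactly where x >= alpha."""
    return float(np.sum(upstream * (x >= alpha)))

rng = np.random.default_rng(4)
x = 4.0 * rng.standard_normal(1000)
alpha = 6.0                                   # initial value of the limit
for _ in range(100):                          # toy training loop
    # Squared-error loss against a fixed target clipped at 3.0.
    up = 2.0 * (clipped_relu(x, alpha) - np.clip(x, 0.0, 3.0)) / x.size
    alpha -= 0.5 * grad_alpha(x, alpha, up)   # gradient step on the limit
print(round(alpha, 2))                        # converges toward 3.0
```
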