Patents by Inventor Kailash Gopalakrishnan

Kailash Gopalakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10769238
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: September 8, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
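
A hedged sketch of the flow in the entry above: input data bits of the first matrix sit in per-element registers, activation bits of the second matrix stream through, and partial sums of the third matrix are streamed out. The weight-stationary scheduling and the function name below are illustrative assumptions, not the patented circuit.

```python
import numpy as np

def systolic_matmul(W, A):
    """Toy weight-stationary systolic schedule: PE (i, j) holds W[i, j] in
    its first register; columns of A stream through, each PE adds its
    product into the partial sum passing along the row, and the finished
    partial sum is streamed out of the last PE."""
    n, m = W.shape
    m2, p = A.shape
    assert m == m2, "inner dimensions must match"
    out = np.zeros((n, p))
    for k in range(p):                # one streamed activation column per step
        for i in range(n):            # rows of processing elements
            partial = 0.0             # partial sum entering the row
            for j in range(m):
                partial += W[i, j] * A[j, k]
            out[i, k] = partial       # streamed out of the array
    return out

W = np.arange(6, dtype=float).reshape(2, 3)
A = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(systolic_matmul(W, A), W @ A)
```
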
  • Publication number: 20200249910
    Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.
    Type: Application
    Filed: February 6, 2019
    Publication date: August 6, 2020
    Applicant: International Business Machines Corporation
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
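
The abstract above does not fix the two formats. As an illustration only, the sketch below rounds forward-pass values onto a (1, 4, 3) sign/exponent/mantissa grid and backward-pass gradients onto a (1, 5, 2) grid; these bit splits, the helper name, and the omission of subnormal/NaN handling are all assumptions.

```python
import numpy as np

def quantize_float(x, exp_bits, man_bits):
    """Round x to the nearest value representable with the given exponent
    and mantissa widths (subnormals, NaN, and overflow are ignored)."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (exp_bits - 1) - 1
    out = np.zeros_like(x)
    nz = x != 0
    e = np.floor(np.log2(np.abs(x[nz])))
    e = np.clip(e, 1 - bias, bias)      # clamp to the format's exponent range
    step = 2.0 ** (e - man_bits)        # grid spacing within each binade
    out[nz] = np.round(x[nz] / step) * step
    return out

rng = np.random.default_rng(0)
activations = rng.standard_normal(4)
gradients = 1e-4 * rng.standard_normal(4)
fwd = quantize_float(activations, exp_bits=4, man_bits=3)  # forward format
bwd = quantize_float(gradients, exp_bits=5, man_bits=2)    # backward format
print(fwd, bwd)
```
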
  • Publication number: 20200233642
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Application
    Filed: April 6, 2020
    Publication date: July 23, 2020
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
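
Concretely, the layout above totals 1 + 6 + 9 = 16 bits. The decoder below is a sketch under stated assumptions: the IEEE-style exponent bias, the particular bit pattern reserved for the merged NaN/infinity point, and the zero encoding in the lowest binade are guesses, since the abstract does not pin them down.

```python
# Assumed layout from the abstract: 1 sign, 6 exponent, 9 fraction bits.
EXP_BITS, FRAC_BITS = 6, 9
BIAS = 2 ** (EXP_BITS - 1) - 1        # 31; IEEE-style bias is an assumption

def decode(bits16):
    """Decode a 16-bit pattern under the assumed (1, 6, 9) layout."""
    sign = (bits16 >> 15) & 0x1
    exp = (bits16 >> 9) & 0x3F
    frac = bits16 & 0x1FF
    # One data point of the highest binade is the merged NaN/infinity
    # symbol; which point it is, is an assumption here.
    if exp == 0x3F and frac == 0x1FF:
        return float("nan")
    # The lowest binade holds zero and normal numbers per the abstract;
    # assume the all-zero code in it encodes zero.
    if exp == 0 and frac == 0:
        return 0.0
    value = (1 + frac / 2 ** FRAC_BITS) * 2.0 ** (exp - BIAS)
    return -value if sign else value

print(decode(0b0_011111_000000000))   # exponent == bias, so this is 1.0
print(decode(0b1_011111_100000000))   # -1.5
```
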
  • Publication number: 20200226459
    Abstract: A processor receives input data and provides the input data to a first neural network including a first neural network model. The first neural network model has a first numerical precision level. A first feature vector is generated from the input data using the first neural network. The input data is provided to a second neural network including a second neural network model. The second neural network model has a second numerical precision level different from the first numerical precision level. A second feature vector is generated from the input data using the second neural network. A difference metric is computed between the first feature vector and the second feature vector. The difference metric is indicative of whether the input data includes adversarial data.
    Type: Application
    Filed: January 11, 2019
    Publication date: July 16, 2020
    Applicant: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Pin-Yu Chen, Pierce I-Jen Chuang, Richard Chen, Jungwook Choi, Kailash Gopalakrishnan
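
A minimal sketch of the detection idea above: run the same feature extractor at two numerical precision levels and flag inputs whose feature vectors diverge. The linear stand-in for the networks, the relative-L2 difference metric, and the threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 8))    # stand-in for a trained feature extractor

def features(x, precision):
    """Feature vector from the same model run at a given precision level."""
    return (x.astype(precision) @ W.astype(precision)).astype(np.float64)

def adversarial_score(x, threshold=1e-2):
    f_hi = features(x, np.float64)  # first numerical precision level
    f_lo = features(x, np.float16)  # second numerical precision level
    d = np.linalg.norm(f_hi - f_lo) / (np.linalg.norm(f_hi) + 1e-12)
    return d, bool(d > threshold)   # a large gap suggests adversarial input

x = rng.standard_normal(16)
print(adversarial_score(x))
```
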
  • Patent number: 10657442
    Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: May 19, 2020
    Assignee: International Business Machines Corporation
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
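
The dataflow above can be mimicked in a few lines: each pass over a chunk plays the role of the FPU-based compute matrix producing a chunk dot-product, and a running scalar stands in for the external accumulator array. The chunk size and names are illustrative.

```python
import numpy as np

def chunked_dot(u, v, chunk=4):
    """Dot product accumulated chunk-by-chunk, mirroring a compute matrix
    of multiply-accumulate units feeding an external accumulator."""
    acc = 0.0                                    # external accumulator
    for start in range(0, len(u), chunk):
        # One pass of the compute matrix: a chunk dot-product.
        acc += float(np.dot(u[start:start + chunk], v[start:start + chunk]))
    return acc

u, v = np.arange(10.0), np.ones(10)
assert chunked_dot(u, v) == np.dot(u, v)
```
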
  • Patent number: 10656913
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Grant
    Filed: June 5, 2018
    Date of Patent: May 19, 2020
    Assignee: International Business Machines Corporation
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
  • Publication number: 20200134105
    Abstract: A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters, as input data, into a processor on a computer that formalizes a design space exploration of a convolution mapping on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, which serve as proxy estimates of the performance of different possible design choices for implementing the predefined convolution processing.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Swagath Venkataramani, Jintao Zhang
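
A hedged sketch of the idea above: given convolution-layer parameters, compute cheap analytical proxies (MAC count, data traffic, operations per element moved) and compare them across candidate design points. The specific metrics and parameterization are assumptions; the abstract does not enumerate them.

```python
def conv_proxy_metrics(N, C, K, H, W, R, S, stride=1):
    """Analytical proxy estimates for one convolution design point."""
    Ho = (H - R) // stride + 1
    Wo = (W - S) // stride + 1
    macs = N * K * C * Ho * Wo * R * S                         # compute work
    traffic = N * C * H * W + K * C * R * S + N * K * Ho * Wo  # data moved
    return {"macs": macs, "ops_per_elem": macs / traffic}

# Tiny design space exploration: compare candidate kernel shapes.
for R in (1, 3, 5):
    print(R, conv_proxy_metrics(N=1, C=64, K=64, H=56, W=56, R=R, S=R))
```
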
  • Patent number: 10592208
    Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportioned into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, the fraction being the inverse of the number of discrete values that can be represented using only the mantissa bits.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: March 17, 2020
    Assignee: International Business Machines Corporation
    Inventors: Naigang Wang, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
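
The key sentence above says the stored approximation is built from multiples of a fraction equal to 1/2**p, the inverse of the number of discrete values the mantissa bits can represent. The sketch below makes that concrete for an assumed (1, 5, 2) sub-16-bit split; the split and the lack of subnormal handling are assumptions.

```python
import math

def vlp_approx(x, p=2, e_bits=5):
    """Approximate x as (integer multiple of 1/2**p) * 2**e."""
    if x == 0:
        return 0.0
    sign = math.copysign(1.0, x)
    bias = 2 ** (e_bits - 1) - 1
    e = math.floor(math.log2(abs(x)))
    e = max(1 - bias, min(bias, e))     # clamp to the exponent range
    frac = 1 / 2 ** p                   # inverse of the 2**p mantissa values
    k = round(abs(x) / 2 ** e / frac)   # nearest multiple of the fraction
    return sign * k * frac * 2 ** e

print(vlp_approx(3.7))                  # 3.5 on the (1, 5, 2) grid
print(vlp_approx(0.1))                  # 0.09375
```
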
  • Publication number: 20200012706
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: September 19, 2019
    Publication date: January 9, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10528356
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration-level commits (ILC) in a coarse-grained reconfigurable architecture (CGRA). The apparatus includes hardware structures that connect all of the multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which are in flight, and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The PEs, LSU, and control unit are configured to commit instructions, and to save and restore context, at loop iteration boundaries. In doing so, the apparatus tracks and buffers the state of in-flight iterations and detects conditions that prevent an iteration from completing.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: January 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
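
The bookkeeping the abstract above assigns to the LSU and control unit (which iterations have completed, which are in flight, which are yet to begin, with commits at iteration boundaries) can be modeled as a small scoreboard. This is an illustrative software model, not the patented hardware.

```python
class IterationScoreboard:
    """Toy tracker for simultaneous multiple iterations with in-order,
    iteration-level commit (a modeling assumption, not the patent's RTL)."""

    def __init__(self, total):
        self.total = total
        self.next_to_issue = 0        # iterations yet to begin start here
        self.in_flight = set()        # issued but not yet completed
        self.committed = 0            # iterations retired in program order

    def issue(self):
        if self.next_to_issue < self.total:
            self.in_flight.add(self.next_to_issue)
            self.next_to_issue += 1

    def complete(self, it):
        self.in_flight.discard(it)
        # Iteration-level commit: retire contiguous finished iterations,
        # so context can be saved/restored at an iteration boundary.
        while (self.committed < self.next_to_issue
               and self.committed not in self.in_flight):
            self.committed += 1

sb = IterationScoreboard(total=4)
sb.issue(); sb.issue(); sb.issue()
sb.complete(1)                        # out-of-order completion is buffered
sb.complete(0)
print(sb.committed, sorted(sb.in_flight))   # 2 [2]
```
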
  • Publication number: 20190385050
    Abstract: Techniques for statistics-aware weight quantization are presented. To facilitate reducing the bit precision of weights, for a set of weights, a quantizer management component can estimate a quantization scale value to apply to a weight as a linear or non-linear function of the mean of a square of a weight value of the weight and the mean of an absolute value of the weight value, wherein the quantization scale value is determined to have a smaller quantization error than all, or at least almost all, other quantization errors associated with other quantization scale values. A quantizer component applies the quantization scale value to symmetrically and/or uniformly quantize weights of a layer of the set of weights to generate quantized weights, the weights being quantized using rounding. The respective quantized weights can be used to facilitate training and inference of a deep learning system.
    Type: Application
    Filed: June 13, 2018
    Publication date: December 19, 2019
    Inventors: Zhuo Wang, Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang
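
A hedged sketch of the estimator above: choose the clip scale as a linear function of sqrt(E[w**2]) and E[|w|], then quantize the weights symmetrically and uniformly with rounding. The coefficients c1 and c2 below are illustrative placeholders, not the patent's fitted values.

```python
import numpy as np

def stats_aware_quantize(w, bits=4, c1=3.2, c2=2.1):
    """Symmetric uniform quantization with a statistics-derived clip scale."""
    scale = c1 * np.sqrt(np.mean(w ** 2)) - c2 * np.mean(np.abs(w))
    levels = 2 ** (bits - 1) - 1               # symmetric integer grid
    q = np.clip(np.round(w / scale * levels), -levels, levels)
    return q * scale / levels

w = 0.05 * np.random.default_rng(1).standard_normal(1000)
wq = stats_aware_quantize(w)
print("rms quantization error:", np.sqrt(np.mean((w - wq) ** 2)))
```
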
  • Publication number: 20190369960
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Application
    Filed: June 5, 2018
    Publication date: December 5, 2019
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
  • Patent number: 10489484
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: November 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10476016
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Grant
    Filed: September 21, 2018
    Date of Patent: November 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Publication number: 20190339938
    Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportioned into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, the fraction being the inverse of the number of discrete values that can be represented using only the mantissa bits.
    Type: Application
    Filed: May 7, 2018
    Publication date: November 7, 2019
    Applicant: International Business Machines Corporation
    Inventors: Naigang Wang, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
  • Publication number: 20190325301
    Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
    Type: Application
    Filed: April 19, 2018
    Publication date: October 24, 2019
    Applicant: International Business Machines Corporation
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
  • Publication number: 20190236113
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: April 11, 2019
    Publication date: August 1, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190171935
    Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.
    Type: Application
    Filed: December 4, 2017
    Publication date: June 6, 2019
    Inventors: Ankur Agrawal, Daniel Brand, Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan
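
A toy version of the scheme above: add the new gradient to the carried residue, divide the residue vector into uniform bins, transmit one quantized representative per bin, and carry the untransmitted remainder forward as the next residue. The bin size and the largest-magnitude-per-bin rule are assumptions.

```python
import numpy as np

def compress_residue(grad, residue, bin_size=4):
    """One round of residual-gradient compression for a single learner."""
    current = grad + residue                    # current residue vector
    sent = np.zeros_like(current)
    for start in range(0, current.size, bin_size):
        b = current[start:start + bin_size]
        i = start + int(np.argmax(np.abs(b)))   # quantize a subset: keep the
        sent[i] = current[i]                    # largest entry in each bin
    return sent, current - sent                 # payload, new residue

g = np.random.default_rng(2).standard_normal(12)
payload, res = compress_residue(g, np.zeros_like(g))
print(np.count_nonzero(payload), "of", g.size, "values transmitted")
```
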
  • Publication number: 20190164050
    Abstract: A system, having a memory that stores computer executable components and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality in weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.
    Type: Application
    Filed: November 30, 2017
    Publication date: May 30, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
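
A sketch of the pipeline above under stated assumptions: segment the weight matrix into sub-blocks, apply an orthonormal DCT-II as the frequency transform, and keep only the largest coefficients. The DCT choice, the 8x8 block size, and the top-k rule are illustrative; the abstract names only "a transform".

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are frequencies)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1 / np.sqrt(2)
    return M * np.sqrt(2 / n)

def compress_block(block, keep=8):
    """Transform one square weight sub-block, zero all but the largest
    coefficients, and invert to get the lossy reconstruction."""
    D = dct_matrix(block.shape[0])
    coef = D @ block @ D.T
    thresh = np.sort(np.abs(coef), axis=None)[-keep]
    coef[np.abs(coef) < thresh] = 0.0
    return D.T @ coef @ D               # inverse of an orthonormal transform

rng = np.random.default_rng(3)
W = 0.1 * rng.standard_normal((8, 8))   # one segmented sub-component
W_hat = compress_block(W)
print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```
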
  • Publication number: 20190122116
    Abstract: Techniques that facilitate improving an efficiency of a neural network are described. In one embodiment, a system is provided that comprises a memory that stores computer-executable components and a processor that executes computer-executable components stored in the memory. In one implementation, the computer-executable components comprise an initialization component that selects an initial value of an output limit, wherein the output limit indicates a range for an output of an activation function of a neural network. The computer-executable components further comprise a training component that modifies the initial value of the output limit during training to a second value of the output limit, the second value of the output limit being provided as a parameter to the activation function. The computer-executable components further comprise an activation function component that determines the output of the activation function based on the second value of the output limit as the parameter.
    Type: Application
    Filed: October 24, 2017
    Publication date: April 25, 2019
    Inventors: Jungwook Choi, Kailash Gopalakrishnan, Charbel Sakr, Swagath Venkataramani, Zhuo Wang
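
A minimal sketch of the trainable output limit described above, assuming the common clipped-ReLU formulation: the limit (called alpha here) starts at an initial value and is updated by gradient descent, so the activation's range is learned during training. The toy target, learning rate, and iteration count are illustrative.

```python
import numpy as np

def clipped_relu(x, alpha):
    """Activation whose output is limited to the range [0, alpha]."""
    return np.clip(x, 0.0, alpha)

def grad_alpha(x, alpha, upstream):
    """d/d_alpha of clip(x, 0, alpha) is 1 exactly where x >= alpha."""
    return float(np.sum(upstream * (x >= alpha)))

rng = np.random.default_rng(4)
x = 4.0 * rng.standard_normal(1000)
alpha = 6.0                                   # initial value of the limit
for _ in range(100):                          # toy training loop
    # Squared-error loss against a fixed target clipped at 3.0.
    up = 2.0 * (clipped_relu(x, alpha) - np.clip(x, 0.0, 3.0)) / x.size
    alpha -= 0.5 * grad_alpha(x, alpha, up)   # gradient step on the limit
print(round(alpha, 2))                        # converges toward 3.0
```
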