Patents by Inventor Kailash Gopalakrishnan

Kailash Gopalakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190339938
    Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportion into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, where the fraction is an inverse of a number of discrete values that can be represented using only the mantissa bits.
    Type: Application
    Filed: May 7, 2018
    Publication date: November 7, 2019
    Applicant: International Business Machines Corporation
    Inventors: NAIGANG WANG, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
  • Publication number: 20190325301
    Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.
    Type: Application
    Filed: April 19, 2018
    Publication date: October 24, 2019
    Applicant: International Business Machines Corporation
    Inventors: NAIGANG WANG, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
  • Publication number: 20190236113
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: April 11, 2019
    Publication date: August 1, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190171935
    Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.
    Type: Application
    Filed: December 4, 2017
    Publication date: June 6, 2019
    Inventors: Ankur Agrawal, Daniel Brand, Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan
  • Publication number: 20190164050
    Abstract: A system, having a memory that stores computer executable components, and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality to weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.
    Type: Application
    Filed: November 30, 2017
    Publication date: May 30, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
  • Publication number: 20190122116
    Abstract: Techniques that facilitate improving an efficiency of a neural network are described. In one embodiment, a system is provided that comprises a memory that stores computer-executable components and a processor that executes computer-executable components stored in the memory. In one implementation, the computer-executable components comprise an initialization component that selects an initial value of an output limit, wherein the output limit indicates a range for an output of an activation function of a neural network. The computer-executable components further comprise a training component that modifies the initial value of the output limit during training to a second value of the output limit, the second value of the output limit being provided as a parameter to the activation function. The computer-executable components further comprise an activation function component that determines the output of the activation function based on the second value of the output limit as the parameter.
    Type: Application
    Filed: October 24, 2017
    Publication date: April 25, 2019
    Inventors: Jungwook Choi, Kailash Gopalakrishnan, Charbel Sakr, Swagath Venkataramani, Zhuo Wang
  • Patent number: 10261978
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: December 14, 2017
    Date of Patent: April 16, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10241972
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: March 16, 2017
    Date of Patent: March 26, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190036053
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Application
    Filed: September 21, 2018
    Publication date: January 31, 2019
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Patent number: 10120685
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a course grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
  • Patent number: 10115916
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Grant
    Filed: April 11, 2015
    Date of Patent: October 30, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Publication number: 20180267938
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: December 14, 2017
    Publication date: September 20, 2018
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20180267936
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: March 16, 2017
    Publication date: September 20, 2018
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 9972797
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Grant
    Filed: July 15, 2017
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Patent number: 9812638
    Abstract: A device has a M8XY6 layer in between a first conductive layer on the top and a second conductive layer on the bottom, wherein (i) M includes at least one element selected from the following: Cu, Ag, Li, and Zn, (ii) X includes at least one Group XIV element, and (iii) Y includes at least one Group XVI element. Another device has MaXbYc material contacted on opposite sides by respective layers of conductive material, wherein: (i) M includes at least one element selected from the following: Cu, Ag, Li, and Zn, (ii) X includes at least one Group XIV element, and (iii) Y includes at least one Group XVI element, and a is in the range of 48-60 atomic percent, b is in the range of 4-10 atomic percent, c is in the range of 30-45 atomic percent, and a+b+c is at least 90 atomic percent.
    Type: Grant
    Filed: March 19, 2010
    Date of Patent: November 7, 2017
    Assignee: GLOBALFOUNDRIES Inc.
    Inventors: Donald S Bethune, Kailash Gopalakrishnan, Andrew J Kellock, Rohit S Shenoy
  • Publication number: 20170317304
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Application
    Filed: July 15, 2017
    Publication date: November 2, 2017
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Patent number: 9735383
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Grant
    Filed: September 12, 2016
    Date of Patent: August 15, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
  • Publication number: 20170123795
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a course grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
    Type: Application
    Filed: November 4, 2015
    Publication date: May 4, 2017
    Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
  • Publication number: 20170123794
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration level commits (ILC) in a course grained reconfigurable architecture (CGRA). The apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The processing elements, LSU and control unit are configured to commit instructions, and save and restore context at loop iteration boundaries. In doing so, the apparatus tracks and buffers state of in-flight iterations, and detects conditions that prevents an iteration from completion.
    Type: Application
    Filed: November 4, 2015
    Publication date: May 4, 2017
    Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
  • Publication number: 20170005281
    Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.
    Type: Application
    Filed: September 12, 2016
    Publication date: January 5, 2017
    Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari