Patents by Inventor Kailash Gopalakrishnan
Kailash Gopalakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190339938Abstract: A specialized circuit is configured for floating point computations using numbers represented by a very low precision format (VLP format). The VLP format includes less than sixteen bits and is apportion into a sign bit, exponent bits (e), and mantissa bits (p). The configured specialized circuit is operated to store an approximation of a numeric value in the VLP format, where the approximation is represented as a function of a multiple of a fraction, where the fraction is an inverse of a number of discrete values that can be represented using only the mantissa bits.Type: ApplicationFiled: May 7, 2018Publication date: November 7, 2019Applicant: International Business Machines CorporationInventors: NAIGANG WANG, Kailash Gopalakrishnan, Jungwook Choi, Silvia M. Mueller, Ankur Agrawal, Daniel Brand
-
Publication number: 20190325301Abstract: A compute matrix is configured to include a set of compute units, each compute unit including a multiplier and an accumulator, each of the multiplier and the accumulator formed using at least one floating point unit (FPU). An accumulator array is configured to include a set of external accumulators. The compute matrix is operated to produce a chunk dot-product using a first chunk of a first input vector and a first chunk of a second input vector. The accumulator array is operated to output a dot-product of the first input vector and the second input vector using the chunk dot-product.Type: ApplicationFiled: April 19, 2018Publication date: October 24, 2019Applicant: International Business Machines CorporationInventors: NAIGANG WANG, Jungwook Choi, Kailash Gopalakrishnan, Daniel Brand
-
Publication number: 20190236113Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.Type: ApplicationFiled: April 11, 2019Publication date: August 1, 2019Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
-
Publication number: 20190171935Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.Type: ApplicationFiled: December 4, 2017Publication date: June 6, 2019Inventors: Ankur Agrawal, Daniel Brand, Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan
-
Publication number: 20190164050Abstract: A system, having a memory that stores computer executable components, and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality to weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.Type: ApplicationFiled: November 30, 2017Publication date: May 30, 2019Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
-
Publication number: 20190122116Abstract: Techniques that facilitate improving an efficiency of a neural network are described. In one embodiment, a system is provided that comprises a memory that stores computer-executable components and a processor that executes computer-executable components stored in the memory. In one implementation, the computer-executable components comprise an initialization component that selects an initial value of an output limit, wherein the output limit indicates a range for an output of an activation function of a neural network. The computer-executable components further comprise a training component that modifies the initial value of the output limit during training to a second value of the output limit, the second value of the output limit being provided as a parameter to the activation function. The computer-executable components further comprise an activation function component that determines the output of the activation function based on the second value of the output limit as the parameter.Type: ApplicationFiled: October 24, 2017Publication date: April 25, 2019Inventors: Jungwook Choi, Kailash Gopalakrishnan, Charbel Sakr, Swagath Venkataramani, Zhuo Wang
-
Patent number: 10261978Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.Type: GrantFiled: December 14, 2017Date of Patent: April 16, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
-
Patent number: 10241972Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.Type: GrantFiled: March 16, 2017Date of Patent: March 26, 2019Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
-
Publication number: 20190036053Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.Type: ApplicationFiled: September 21, 2018Publication date: January 31, 2019Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
-
Patent number: 10120685Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a course grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.Type: GrantFiled: November 4, 2015Date of Patent: November 6, 2018Assignee: International Business Machines CorporationInventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
-
Patent number: 10115916Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.Type: GrantFiled: April 11, 2015Date of Patent: October 30, 2018Assignee: International Business Machines CorporationInventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
-
Publication number: 20180267938Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.Type: ApplicationFiled: December 14, 2017Publication date: September 20, 2018Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
-
Publication number: 20180267936Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.Type: ApplicationFiled: March 16, 2017Publication date: September 20, 2018Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
-
Patent number: 9972797Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.Type: GrantFiled: July 15, 2017Date of Patent: May 15, 2018Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
-
Patent number: 9812638Abstract: A device has a M8XY6 layer in between a first conductive layer on the top and a second conductive layer on the bottom, wherein (i) M includes at least one element selected from the following: Cu, Ag, Li, and Zn, (ii) X includes at least one Group XIV element, and (iii) Y includes at least one Group XVI element. Another device has MaXbYc material contacted on opposite sides by respective layers of conductive material, wherein: (i) M includes at least one element selected from the following: Cu, Ag, Li, and Zn, (ii) X includes at least one Group XIV element, and (iii) Y includes at least one Group XVI element, and a is in the range of 48-60 atomic percent, b is in the range of 4-10 atomic percent, c is in the range of 30-45 atomic percent, and a+b+c is at least 90 atomic percent.Type: GrantFiled: March 19, 2010Date of Patent: November 7, 2017Assignee: GLOBALFOUNDRIES Inc.Inventors: Donald S Bethune, Kailash Gopalakrishnan, Andrew J Kellock, Rohit S Shenoy
-
Publication number: 20170317304Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.Type: ApplicationFiled: July 15, 2017Publication date: November 2, 2017Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
-
Patent number: 9735383Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.Type: GrantFiled: September 12, 2016Date of Patent: August 15, 2017Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari
-
Publication number: 20170123795Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a course grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.Type: ApplicationFiled: November 4, 2015Publication date: May 4, 2017Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
-
Publication number: 20170123794Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration level commits (ILC) in a course grained reconfigurable architecture (CGRA). The apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The processing elements, LSU and control unit are configured to commit instructions, and save and restore context at loop iteration boundaries. In doing so, the apparatus tracks and buffers state of in-flight iterations, and detects conditions that prevents an iteration from completion.Type: ApplicationFiled: November 4, 2015Publication date: May 4, 2017Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
-
Publication number: 20170005281Abstract: Device architectures based on trapping and de-trapping holes or electrons and/or recombination of both types of carriers are obtained by carrier trapping either in near-interface deep ambipolar states or in quantum wells/dots, either serving as ambipolar traps in semiconductor layers or in gate dielectric/barrier layers. In either case, the potential barrier for trapping is small and retention is provided by carrier confinement in the deep trap states and/or quantum wells/dots. The device architectures are usable as three terminal or two terminal devices.Type: ApplicationFiled: September 12, 2016Publication date: January 5, 2017Inventors: Ali Afzali-Ardakani, Tze-Chiang Chen, Kailash Gopalakrishnan, Bahman Hekmatshoartabari