Patents by Inventor Kailash Gopalakrishnan

Kailash Gopalakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11977974
    Abstract: A system, having a memory that stores computer executable components, and a processor that executes the computer executable components, reduces data size in connection with training a neural network by exploiting spatial locality to weight matrices and effecting frequency transformation and compression. A receiving component receives neural network data in the form of a compressed frequency-domain weight matrix. A segmentation component segments the initial weight matrix into original sub-components, wherein respective original sub-components have spatial weights. A sampling component applies a generalized weight distribution to the respective original sub-components to generate respective normalized sub-components. A transform component applies a transform to the respective normalized sub-components.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: May 7, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Suyog Gupta, Pritish Narayanan
  • Publication number: 20240134601
    Abstract: Provided are a floating-point unit, a system, and method for fused multiply-add logic to process input operands including floating-point values and integer values. A first input operand comprising an integer value and second and third input operands comprising floating-point values are received. The first, second, and third input operands are processed to produce a floating-point result.
    Type: Application
    Filed: November 11, 2022
    Publication date: April 25, 2024
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ankur AGRAWAL, Kailash GOPALAKRISHNAN
  • Publication number: 20240134600
    Abstract: Provided are a floating-point unit, a system, and method for generating binary integer output or floating-point output based on a selector. A first input operand, a second input operand, a third input operand, and a result format selector value are received. The first input operand, the second input operand, and the third input operand comprise floating-point values. The first input operand, the second input operand, and the third input operand are processed to produce a final result comprising one of a binary integer value and a floating point value based on the result format selector value.
    Type: Application
    Filed: December 30, 2022
    Publication date: April 25, 2024
    Inventors: Ankur AGRAWAL, Kailash GOPALAKRISHNAN, Hung Hoang TRAN, Vijayalakshmi SRINIVASAN
  • Patent number: 11941111
    Abstract: Indices of non-zero weights may be stored in an index register file included within each of a plurality of processor elements in a systolic array. Non-zero weights may be stored in a register file associated with the index register file. Input values (e.g., dense input values) corresponding to a single block in a data structure may be sent to the plurality of processor elements. Those of the input values corresponding to the indices of non-zero weights in the index register file may be selected for performing multiply-accumulate (“MAC”) operation based on sending the plurality of input values to one or more of the plurality of processor elements. The indices of the plurality of non-zero weight are stored in an index data stick. The values of the plurality of non-zero weights are stored in a value data stick.
    Type: Grant
    Filed: July 31, 2021
    Date of Patent: March 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: Sanchari Sen, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, Sunil K. Shukla
  • Publication number: 20240083341
    Abstract: A device is provided for compensating cut-off line drift of a lamp in a vehicle. An adjustment unit adjusts the lighting direction of the lamp. A carrier frame carries a lighting unit of the lamp. A triangularly shaped connecting means mechanically connects the adjustment unit with the carrier frame. The connecting means includes a first joint at a first corner of the connecting means, a second joint at a second corner of the connecting means, a third joint at a third corner of the connecting means, a first conjunction unit connected between the first joint and the second joint, a second conjunction unit connected between the first joint and the third joint, and a third conjunction unit connected between the second joint and the third joint. The joints move the conjunction units relative to each other for compensating the cut-off line drift.
    Type: Application
    Filed: November 14, 2023
    Publication date: March 14, 2024
    Inventors: Kailash Gopalakrishnan, Prethyush Tazath Pullaikudy
  • Publication number: 20230344667
    Abstract: Embodiments for providing single-producer-multiple consumers synchronization and multicast data transfer by a processor are disclosed. Multicast data transfer is synchronized based on an identification tag and a request from each one of a plurality of recipients for the multicast data. The multicast data is transferred to each of the plurality of recipients based on the identification tag, the request from each one of the plurality of recipients, and a list of the plurality of recipients.
    Type: Application
    Filed: April 22, 2022
    Publication date: October 26, 2023
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vijayalakshmi SRINIVASAN, Scot RIDER, Swagath VENKATARAMANI, Kailash GOPALAKRISHNAN, Sunil K. SHUKLA, Brian William CURRAN, Martin A. LUTZ
  • Patent number: 11775257
    Abstract: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
    Type: Grant
    Filed: April 6, 2020
    Date of Patent: October 3, 2023
    Assignee: International Business Machines Corporation
    Inventors: Silvia Melitta Mueller, Ankur Agrawal, Bruce Fleischer, Kailash Gopalakrishnan, Dongsoo Lee
  • Publication number: 20230267003
    Abstract: Processing input data for transmittal to a data consumer such as an artificial intelligence engine is performed by arranging the input data into a uniform structure made up of sticks of data combined to form pages of sticks. A stick is any well-sized set of input data elements whereby the size of the stick is fixed. A masking pattern is established for sticks of data having certain ranges of invalid data for consumption of partial sticks while maintaining validity of the input data being transferred. The mask pattern is derived based on set-active-mask-and-value (SAMV) instructions. The derived mask pattern is carried forward for subsequent load instructions to the data consumer.
    Type: Application
    Filed: February 23, 2022
    Publication date: August 24, 2023
    Inventors: Cedric Lichtenau, Vijayalakshmi Srinivasan, Sunil K Shukla, Swagath Venkataramani, Kailash Gopalakrishnan, Holger Horbach, Razvan Peter Figuli, Wei Wang, YULONG LI, Martin A Lutz
  • Patent number: 11669489
    Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: June 6, 2023
    Assignee: International Business Machines Corporation
    Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan
  • Publication number: 20230109301
    Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
    Type: Application
    Filed: September 30, 2021
    Publication date: April 6, 2023
    Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan
  • Patent number: 11620105
    Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: April 4, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
  • Patent number: 11604647
    Abstract: An apparatus includes a memory and a processor coupled to the memory. The processor includes first and second sets of arithmetic units having first and second precision for floating-point computations, the second precision being lower than the first precision. The processor is configured to obtain a machine learning model trained in the first precision, to utilize the second set of arithmetic units to perform inference on input data, to utilize the first set of arithmetic units to generate feedback for updating parameters of the second set of arithmetic units based on the inference performed on the input data by the second set of arithmetic units, to tune parameters of the second set of arithmetic units based at least in part on the feedback generated by the first set of arithmetic units, and to utilize the second set of arithmetic units with the tuned parameters to generate inference results.
    Type: Grant
    Filed: September 3, 2019
    Date of Patent: March 14, 2023
    Assignee: International Business Machines Corporation
    Inventors: Xiao Sun, Chia-Yu Chen, Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan
  • Publication number: 20230030287
    Abstract: Indices of non-zero weights may be stored in an index register file included within each of a plurality of processor elements in a systolic array. Non-zero weights may be stored in a register file associated with the index register file. Input values (e.g., dense input values) corresponding to a single block in a data structure may be sent to the plurality of processor elements. Those of the input values corresponding to the indices of non-zero weights in the index register file may be selected for performing multiply-accumulate (“MAC”) operation based on sending the plurality of input values to one or more of the plurality of processor elements. The indices of the plurality of non-zero weight are stored in an index data stick. The values of the plurality of non-zero weights are stored in a value data stick.
    Type: Application
    Filed: July 31, 2021
    Publication date: February 2, 2023
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sanchari SEN, Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Kailash GOPALAKRISHNAN, Sunil K. SHUKLA
  • Patent number: 11551054
    Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: January 10, 2023
    Assignee: International Business Machines Corporation
    Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
  • Patent number: 11551077
    Abstract: Techniques for statistics-aware weight quantization are presented. To facilitate reducing the bit precision of weights, for a set of weights, a quantizer management component can estimate a quantization scale value to apply to a weight as a linear or non-linear function of the mean of a square of a weight value of the weight and the mean of an absolute value of the weight value, wherein the quantization scale value is determined to have a smaller quantization error than all, or at least almost all, other quantization errors associated with other quantization scale values. A quantizer component applies the quantization scale value to symmetrically and/or uniformly quantize weights of a layer of the set of weights to generate quantized weights, the weights being quantized using rounding. The respective quantized weights can be used to facilitate training and inference of a deep learning system.
    Type: Grant
    Filed: June 13, 2018
    Date of Patent: January 10, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Zhuo Wang, Jungwook Choi, Kailash Gopalakrishnan, Pierce I-Jen Chuang
  • Publication number: 20220405348
    Abstract: A tensor of a first select dimension is reformatted to provide one or more sub-tensors of a second select dimension. The reformatting includes determining a number of sub-tensors to be used to represent the tensor. The reformatting further includes creating the number of sub-tensors, in which a sub-tensor is to start on a boundary of a memory unit. Data of the tensor is rearranged to fit within the number of sub-tensors.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Anthony Saporito, Sunil K. Shukla, Swagath Venkataramani
  • Publication number: 20220405556
    Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a matrix multiplication of a first tensor and a second tensor to obtain one or more intermediate results. The second tensor includes an adjusted weight tensor created using a multiplier. Values of a bias tensor are added to the one or more intermediate results to obtain one or more results for the combined function. The one or more results are at least a part of an output tensor.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani
  • Publication number: 20220405555
    Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a convolution using a first tensor and a second tensor to obtain one or more intermediate results, in which the second tensor includes an adjusted weight tensor created using a plurality of multipliers. Values of a bias tensor are added to the one or more intermediate results to obtain one or more combined function results for the combined function.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani
  • Patent number: 11455142
    Abstract: Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two product having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.
    Type: Grant
    Filed: June 5, 2019
    Date of Patent: September 27, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ankur Agrawal, Silvia Mueller, Kailash Gopalakrishnan, Bruce Fleischer, Balaram Sinharoy, Mingu Kang
  • Publication number: 20220180171
    Abstract: An apparatus includes a floating-point gradient register; an integer register; a memory bank; and an array of processing units. Each of the units includes a plurality of binary shifters having an integer input configured to obtain corresponding bits of a 4-bit integer multiplicand, and a shift-specifying input configured to obtain corresponding bits in an exponent field of a 4-bit floating point multiplier. The multiplier is specified in a mantissaless four-bit floating point format including a sign bit, three exponent bits, and no mantissa bits. An adder tree has a plurality of inputs coupled to outputs of the plurality of shifters, and a rounder has an input coupled to an output of the adder tree. The integer inputs are connected to the integer register; the shift-specifying inputs are connected to the floating-point gradient register; and outputs of the rounders are coupled to the memory bank.
    Type: Application
    Filed: December 4, 2020
    Publication date: June 9, 2022
    Inventors: Xiao Sun, Ankur Agrawal, Kailash Gopalakrishnan, Naigang Wang, Chia-Yu Chen, Jiamin Ni