Patents by Inventor Kailash Gopalakrishnan

Kailash Gopalakrishnan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Reduced precision based programmable and SIMD dataflow architecture

Patent number: 11347517

Abstract: A reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture includes reduced precision execution units with a majority of the execution units operating at reduced precision and a minority of the execution units are capable of operating at higher precision. The execution units operate in parallel within a programmable execution element to share instruction fetch, decode, and issue pipelines and operate on the same instruction in lock-step to minimize instruction-related overhead.

Type: Grant

Filed: June 20, 2019

Date of Patent: May 31, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash Gopalakrishnan, Sunil Shukla, Jungwook Choi, Silvia Mueller, Bruce Fleischer, Vijayalakshmi Srinivasan, Ankur Agrawal, Jinwook Oh
Robust gradient weight compression schemes for deep learning applications

Patent number: 11295208

Abstract: Embodiments of the present invention provide a computer-implemented method for adaptive residual gradient compression for training of a deep learning neural network (DNN). The method includes obtaining, by a first learner, a current gradient vector for a neural network layer of the DNN, in which the current gradient vector includes gradient weights of parameters of the neural network layer that are calculated from a mini-batch of training data. A current residue vector is generated that includes residual gradient weights for the mini-batch. A compressed current residue vector is generated based on dividing the residual gradient weights of the current residue vector into a plurality of bins of a uniform size and quantizing a subset of the residual gradient weights of one or more bins of the plurality of bins. The compressed current residue vector is then transmitted to a second learner of the plurality of learners or to a parameter server.

Type: Grant

Filed: December 4, 2017

Date of Patent: April 5, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur Agrawal, Daniel Brand, Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan
Instruction initialization in a dataflow architecture

Patent number: 11223703

Abstract: Various embodiments are provided for implementing instruction initialization in a dataflow architecture in a computing environment. A data packet may be transmitted from a selected node to one or more of a plurality of nodes using one or more existing data paths in an initialization network. A determination operation is performed to determine whether one or more of a plurality of nodes is a target node intended for the data packet. Those of the plurality of nodes determined to be a target node initialize one or more components of the target node using the data packet. The data packet may be forwarded by each of the one or more of a plurality of nodes to a subsequent node in the initialization network.

Type: Grant

Filed: March 19, 2019

Date of Patent: January 11, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian Curran, Bruce Fleischer, Kailash Gopalakrishnan, Sunil K Shukla
Facilitating data processing using SIMD reduction operations across SIMD lanes

Patent number: 11216281

Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.

Type: Grant

Filed: May 14, 2019

Date of Patent: January 4, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil Shukla, Silvia Mueller
Facilitating neural network efficiency

Patent number: 11195096

Abstract: Techniques that facilitate improving an efficiency of a neural network are described. In one embodiment, a system is provided that comprises a memory that stores computer-executable components and a processor that executes computer-executable components stored in the memory. In one implementation, the computer-executable components comprise an initialization component that selects an initial value of an output limit, wherein the output limit indicates a range for an output of an activation function of a neural network. The computer-executable components further comprise a training component that modifies the initial value of the output limit during training to a second value of the output limit, the second value of the output limit being provided as a parameter to the activation function. The computer-executable components further comprise an activation function component that determines the output of the activation function based on the second value of the output limit as the parameter.

Type: Grant

Filed: October 24, 2017

Date of Patent: December 7, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jungwook Choi, Kailash Gopalakrishnan, Charbel Sakr, Swagath Venkataramani, Zhuo Wang
Binary floating-point multiply and scale operation for compute-intensive numerical applications and apparatuses

Patent number: 11182127

Abstract: Techniques facilitating binary floating-point multiply and scale operation for compute-intensive numerical applications and apparatuses are provided. An embodiment relates to a system that can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a receiver component that receives an instruction to perform a multiply and scale operation of the first floating point operand value, the second floating point operand value, and the integer operand value, wherein the multiplication component obtains the floating-point product in response to the instruction to perform the multiply and scale operation. The multiplication can be performed as a single instruction.

Type: Grant

Filed: March 25, 2019

Date of Patent: November 23, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Silvia Melitta Mueller, Bruce Fleischer, Ankur Agrawal, Kailash Gopalakrishnan
Floating point unit for exponential function implementation

Patent number: 11163533

Abstract: A computer-implemented method for performing an exponential calculation using only two fully-pipelined instructions in a floating point unit that includes. The method includes computing an intermediate value y? by multiplying an input operand with a predetermined constant value. The input operand is received in floating point representation. The method further includes computing an exponential result for the input operand by executing a fused instruction. The fused instructions includes converting the intermediate value y? to an integer representation z represented by v most significant bits (MSB), and w least significant bits (LSB). The fused instruction further includes determining exponent bits of the exponential result based on the v MSB from the integer representation z. The method further includes determining mantissa bits of the exponential result according to a piece-wise linear mapping function using a predetermined number of segments based on the w LSB from the integer representation z.

Type: Grant

Filed: July 18, 2019

Date of Patent: November 2, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Xiao Sun, Ankur Agrawal, Kailash Gopalakrishnan, Silvia Melitta Mueller, Kerstin Claudia Schelm
Loop management in multi-processor dataflow architecture

Patent number: 11138010

Abstract: Embodiments of the present invention include a computer system that manages execution of one or more programs with one or more loops where each loop having a loop level. Embodiments that manage loops that can skip execution and the number of loops changing during execution are also disclosed. A loop level register (LLEV) stores the loop level for a currently executing loop. A Loop-Back Program Counter Register (LBPR) has a table of one or more Loop-Back Registers. Each Loop-Back Register stores the loop level for a LBPR respective loop and a loop back PC location for the LBPR respective loop. A Program Counter points back to the PC location for each iteration of the loop. A Loop Current Count Register table (LCCR) tracks a number of iterations remaining to executed for of the loop. A loop management process causes one of the CPUs to execute all the one or more instructions of an iteration of the currently executing program loop.

Type: Grant

Filed: October 1, 2020

Date of Patent: October 5, 2021

Assignee: International Business Machines Corporation

Inventors: Chia-Yu Chen, Jungwook Choi, Brian William Curran, Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan
HYBRID FLOATING POINT REPRESENTATION FOR DEEP LEARNING ACCELERATION

Publication number: 20210109709

Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.

Type: Application

Filed: December 21, 2020

Publication date: April 15, 2021

Applicant: International Business Machines Corporation

Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
Hybrid floating point representation for deep learning acceleration

Patent number: 10963219

Abstract: In an embodiment, a method includes configuring a specialized circuit for floating point computations using numbers represented by a hybrid format, wherein the hybrid format includes a first format and a second format. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a numeric value in the first format during a forward pass for training a deep learning network. In the embodiment, the method includes operating the further configured specialized circuit to store an approximation of a second numeric value in the second format during a backward pass for training the deep learning network.

Type: Grant

Filed: February 6, 2019

Date of Patent: March 30, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan, Ankur Agrawal, Silvia Melitta Mueller
SYSTEM-AWARE SELECTIVE QUANTIZATION FOR PERFORMANCE OPTIMIZED DISTRIBUTED DEEP LEARNING

Publication number: 20210064954

Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
NEURAL NETWORK CIRCUITRY HAVING FLOATING POINT FORMAT WITH ASYMMETRIC RANGE

Publication number: 20210064976

Abstract: An apparatus includes circuitry for a neural network that is configured to perform forward propagation neural network operations on floating point numbers having a first n-bit floating point format. The first n-bit floating point format has a configuration consisting of a sign bit, m exponent bits and p mantissa bits where m is greater than p. The circuitry is further configured to perform backward propagation neural network operations on floating point numbers having a second n-bit floating point format that is different than the first n-bit floating point format. The second n-bit floating point format has a configuration consisting of a sign bit, q exponent bits and r mantissa bits where q is greater than m and r is less than p.

Type: Application

Filed: September 3, 2019

Publication date: March 4, 2021

Inventors: Xiao Sun, Jungwook Choi, Naigang Wang, Chia-Yu Chen, Kailash Gopalakrishnan
MIXED PRECISION CAPABLE HARDWARE FOR TUNING A MACHINE LEARNING MODEL

Publication number: 20210064372

Abstract: An apparatus includes a memory and a processor coupled to the memory. The processor includes first and second sets of arithmetic units having first and second precision for floating-point computations, the second precision being lower than the first precision. The processor is configured to obtain a machine learning model trained in the first precision, to utilize the second set of arithmetic units to perform inference on input data, to utilize the first set of arithmetic units to generate feedback for updating parameters of the second set of arithmetic units based on the inference performed on the input data by the second set of arithmetic units, to tune parameters of the second set of arithmetic units based at least in part on the feedback generated by the first set of arithmetic units, and to utilize the second set of arithmetic units with the tuned parameters to generate inference results.

Type: Application

Filed: September 3, 2019

Publication date: March 4, 2021

Inventors: Xiao Sun, Chia-Yu Chen, Naigang Wang, Jungwook Choi, Kailash Gopalakrishnan
MACHINE LEARNING HARDWARE HAVING REDUCED PRECISION PARAMETER COMPONENTS FOR EFFICIENT PARAMETER UPDATE

Publication number: 20210064985

Abstract: An apparatus for training and inferencing a neural network includes circuitry that is configured to generate a first weight having a first format including a first number of bits based at least in part on a second weight having a second format including a second number of bits and a residual having a third format including a third number of bits. The second number of bits and the third number of bits are each less than the first number of bits. The circuitry is further configured to update the second weight based at least in part on the first weight and to update the residual based at least in part on the updated second weight and the first weight. The circuitry is further configured to update the first weight based at least in part on the updated second weight and the updated residual.

Type: Application

Filed: September 3, 2019

Publication date: March 4, 2021

Inventors: Xiao Sun, Jungwook Choi, Naigang Wang, Chia-Yu Chen, Kailash Gopalakrishnan
FLOATING POINT UNIT FOR EXPONENTIAL FUNCTION IMPLEMENTATION

Publication number: 20210019116

Abstract: A computer-implemented method for performing an exponential calculation using only two fully-pipelined instructions in a floating point unit that includes. The method includes computing an intermediate value y? by multiplying an input operand with a predetermined constant value. The input operand is received in floating point representation. The method further includes computing an exponential result for the input operand by executing a fused instruction. The fused instructions includes converting the intermediate value y? to an integer representation z represented by v most significant bits (MSB), and w least significant bits (LSB). The fused instruction further includes determining exponent bits of the exponential result based on the v MSB from the integer representation z. The method further includes determining mantissa bits of the exponential result according to a piece-wise linear mapping function using a predetermined number of segments based on the w LSB from the integer representation z.

Type: Application

Filed: July 18, 2019

Publication date: January 21, 2021

Inventors: Xiao Sun, Ankur Agrawal, Kailash Gopalakrishnan, Silvia Melitta Mueller, Kerstin Claudia Schelm
REDUCED PRECISION BASED PROGRAMMABLE AND SIMD DATAFLOW ARCHITECTURE

Publication number: 20200401413

Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).

Type: Application

Filed: June 20, 2019

Publication date: December 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash GOPALAKRISHNAN, Sunil SHUKLA, Jungwook CHOI, Silvia MUELLER, Bruce FLEISCHER, Vijayalakshmi SRINIVASAN, Ankur AGRAWAL, Jinwook OH
ULTRA-LOW PRECISION FLOATING-POINT FUSED MULTIPLY-ACCUMULATE UNIT

Publication number: 20200387351

Abstract: Embodiments for implementing a fused multiply-multiply-accumulate (“FMMA”) unit by one or more processors in a computing system. Mantissas for two products, an exponent difference of the two products serving as an alignment shift amount for a product of the two products having a smallest exponent, and an alignment shift amount for an addend relative to an alternative product of the two product having a larger exponent may be determined in parallel. The addend may be aligned relative to the alternative product having the larger exponent. The product having the smallest exponent may be aligned relative to the alternative product having the larger exponent according to the alignment shift amount.

Type: Application

Filed: June 5, 2019

Publication date: December 10, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Ankur AGRAWAL, Silvia MUELLER, Kailash GOPALAKRISHNAN, Bruce FLEISCHER, Balaram SINHAROY, Mingu KANG
FACILITATING DATA PROCESSING USING SIMD REDUCTION OPERATIONS ACROSS SIMD LANES

Publication number: 20200364056

Abstract: Various embodiments are provided for facilitating data processing by one or more processors in a computing system. An instruction to be executed may be obtained. The instruction is a single instruction multiple data (SIMD) reduction operation of an operand vector with a plurality of vector elements. The SIMD reduction operation may be executed to produce a result vector with a plurality of alternative vector elements. One or more reduction functions may be performed on each of a pair of vector elements from the plurality of vector elements of the operand vector and a result of the one or more reduction functions may be placed in a corresponding vector element of the result vector.

Type: Application

Filed: May 14, 2019

Publication date: November 19, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce FLEISCHER, Kailash GOPALAKRISHNAN, Jinwook OH, Sunil SHUKLA, Silvia MUELLER
BINARY FLOATING-POINT MULTIPLY AND SCALE OPERATION FOR COMPUTE-INTENSIVE NUMERICAL APPLICATIONS AND APPARATUSES

Publication number: 20200310755

Abstract: Techniques facilitating binary floating-point multiply and scale operation for compute-intensive numerical applications and apparatuses are provided. An embodiment relates to a system that can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a receiver component that receives an instruction to perform a multiply and scale operation of the first floating point operand value, the second floating point operand value, and the integer operand value, wherein the multiplication component obtains the floating-point product in response to the instruction to perform the multiply and scale operation. The multiplication can be performed as a single instruction.

Type: Application

Filed: March 25, 2019

Publication date: October 1, 2020

Inventors: Silvia Melitta Mueller, Bruce Fleischer, Ankur Agrawal, Kailash Gopalakrishnan
INSTRUCTION INITIALIZATION IN A DATAFLOW ARCHITECTURE

Publication number: 20200304598

Abstract: Various embodiments are provided for implementing instruction initialization in a dataflow architecture in a computing environment. A data packet may be transmitted from a selected node to one or more of a plurality of nodes using one or more existing data paths in an initialization network. A determination operation is performed to determine whether one or more of a plurality of nodes is a target node intended for the data packet. Those of the plurality of nodes determined to be a target node initialize one or more components of the target node using the data packet. The data packet may be forwarded by each of the one or more of a plurality of nodes to a subsequent node in the initialization network.

Type: Application

Filed: March 19, 2019

Publication date: September 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian CURRAN, Bruce FLEISCHER, Kailash GOPALAKRISHNAN, Sunil K SHUKLA

prev 1 2 3 4 5 6 … next