Patents by Inventor Vijayalakshmi Srinivasan

Vijayalakshmi Srinivasan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Fused Convolutions for Fast Deep Neural Network

Publication number: 20240143982

Abstract: Fused channel and/or fused filter convolutions for fast deep neural network execution are provided. In one aspect, a system includes: a processor, connected to a memory, configured to: implement an approximated datapath in a deep neural network having a sequence of adders and multipliers for adding up operands to provide accumulated sums for two or more groups of neurons in the deep neural network, and multiplying the accumulated sums to obtain a product; and make an inference using the deep neural network based on the product from the approximated datapath. A method for approximation in a deep neural network is also provided.

Type: Application

Filed: October 26, 2022

Publication date: May 2, 2024

Inventors: Swagath Venkataramani, Sarada Krithivasan, Vijayalakshmi Srinivasan

FLOATING-POINT UNIT WITH A FUSED MULTIPLY-ADD (FMA) ENGINE FOR GENERATING BINARY INTEGER OUTPUT OR FLOATING POINT OUTPUT BASED ON A SELECTOR

Publication number: 20240134600

Abstract: Provided are a floating-point unit, a system, and method for generating binary integer output or floating-point output based on a selector. A first input operand, a second input operand, a third input operand, and a result format selector value are received. The first input operand, the second input operand, and the third input operand comprise floating-point values. The first input operand, the second input operand, and the third input operand are processed to produce a final result comprising one of a binary integer value and a floating point value based on the result format selector value.

Type: Application

Filed: December 30, 2022

Publication date: April 25, 2024

Inventors: Ankur AGRAWAL, Kailash GOPALAKRISHNAN, Hung Hoang TRAN, Vijayalakshmi SRINIVASAN

Exploiting fine-grained structured weight sparsity in systolic arrays

Patent number: 11941111

Abstract: Indices of non-zero weights may be stored in an index register file included within each of a plurality of processor elements in a systolic array. Non-zero weights may be stored in a register file associated with the index register file. Input values (e.g., dense input values) corresponding to a single block in a data structure may be sent to the plurality of processor elements. Those of the input values corresponding to the indices of non-zero weights in the index register file may be selected for performing a multiply-accumulate (“MAC”) operation based on sending the plurality of input values to one or more of the plurality of processor elements. The indices of the plurality of non-zero weights are stored in an index data stick. The values of the plurality of non-zero weights are stored in a value data stick.
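
As a rough illustration of the selection step in this abstract, the following Python sketch models a single processor element. The function names `compress_block` and `sparse_mac` are hypothetical, and plain lists stand in for the index register file and the weight register file; none of this is taken from the patent itself.

```python
from typing import List, Sequence, Tuple


def compress_block(dense_weights: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Split one block of weights into (indices, values) of its non-zero entries.

    Assumption: the two returned lists play the role of the index register
    file and the associated weight register file.
    """
    indices = [i for i, w in enumerate(dense_weights) if w != 0]
    values = [dense_weights[i] for i in indices]
    return indices, values


def sparse_mac(inputs: Sequence[float],
               indices: Sequence[int],
               values: Sequence[float]) -> float:
    """Multiply-accumulate using only the inputs selected by the stored indices."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    acc = 0.0
    for idx, weight in zip(indices, values):
        # Only inputs at positions with a non-zero weight are ever read.
        acc += inputs[idx] * weight
    return acc
```

With the weight block `[0.0, 2.0, 0.0, -1.0]`, only the inputs at positions 1 and 3 contribute to the accumulated sum; the two zero-weight positions are never multiplied.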

Type: Grant

Filed: July 31, 2021

Date of Patent: March 26, 2024

Assignee: International Business Machines Corporation

Inventors: Sanchari Sen, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, Sunil K. Shukla

MULTICHANNEL MEMORY TO AUGMENT LOCAL MEMORY

Publication number: 20240029786

Abstract: A memory system, a method of assembling the memory system, and a computer system. The memory system includes a global memory device coupled to a plurality of processing elements. The global memory device is positioned external to a chip on which the plurality of processing elements reside. The memory system also includes at least one main scratchpad coupled to at least one processing element of the plurality of processing elements and to the global memory device. The memory system further includes a plurality of auxiliary scratchpads coupled to the plurality of processing elements and the global memory device. The auxiliary scratchpads are configured to store static tensors. At least a portion of the plurality of auxiliary scratchpads is configured as a unitary multichannel device.

Type: Application

Filed: July 22, 2022

Publication date: January 25, 2024

Inventors: Ravi Nair, Swagath Venkataramani, Vijayalakshmi Srinivasan, Arvind Kumar

STICKIFICATION USING ANYWHERE PADDING TO ACCELERATE DATA MANIPULATION

Publication number: 20240028899

Abstract: Embodiments are provided for efficient realization of memory-bound operations in a computing system by a processor. Data may be read from and written to a memory at a granular level using a stickification operation. One or more regions of activation and weight tensor data on the memory may be annotated by coupling the stickification operation with padding.

Type: Application

Filed: July 25, 2022

Publication date: January 25, 2024

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Shubham JAIN, Sarada KRITHIVASAN, Sanchari SEN

Programmable multicast protocol for ring-topology based artificial intelligence systems

Patent number: 11831467

Abstract: Embodiments for providing enhanced multicast data transfer for ring topology based artificial intelligence systems are disclosed. Multicast data is sent to a plurality of disjointed cores in a multicast group according to a first multicast mode, a second multicast mode, or a third multicast mode, where the first multicast mode sends a first half of the multicast data on a first multicast ring and a second half on a second multicast ring, the second multicast mode sends the multicast data on either the first multicast ring or the second multicast ring, and the third multicast mode replicates the multicast data and sends the multicast data to both the first multicast ring and the second multicast ring.
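
The three modes in this abstract can be pictured with a small Python sketch. The names `MulticastMode` and `plan_rings`, the `ring` parameter, and the handling of odd-length payloads are all assumptions made for illustration; the abstract does not specify them.

```python
from enum import Enum
from typing import Tuple


class MulticastMode(Enum):
    SPLIT = 1      # first mode: half of the data on each ring
    SINGLE = 2     # second mode: all of the data on one ring
    REPLICATE = 3  # third mode: a full copy on both rings


def plan_rings(data: bytes, mode: MulticastMode, ring: int = 0) -> Tuple[bytes, bytes]:
    """Return the payloads placed on (first ring, second ring)."""
    if mode is MulticastMode.SPLIT:
        # Assumption: for odd lengths the extra byte goes to the second ring.
        mid = len(data) // 2
        return data[:mid], data[mid:]
    if mode is MulticastMode.SINGLE:
        if ring not in (0, 1):
            raise ValueError("ring must be 0 or 1")
        return (data, b"") if ring == 0 else (b"", data)
    if mode is MulticastMode.REPLICATE:
        return data, data
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `plan_rings(b"abcdef", MulticastMode.SPLIT)` places `b"abc"` on the first ring and `b"def"` on the second, while `MulticastMode.REPLICATE` places the full payload on both.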

Type: Grant

Filed: May 13, 2022

Date of Patent: November 28, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Sunil K Shukla, Martin A Lutz

PROGRAMMABLE MULTICAST PROTOCOL FOR RING-TOPOLOGY BASED ARTIFICIAL INTELLIGENCE SYSTEMS

Publication number: 20230370304

Abstract: Embodiments for providing enhanced multicast data transfer for ring topology based artificial intelligence systems are disclosed. Multicast data is sent to a plurality of disjointed cores in a multicast group according to a first multicast mode, a second multicast mode, or a third multicast mode, where the first multicast mode sends a first half of the multicast data on a first multicast ring and a second half on a second multicast ring, the second multicast mode sends the multicast data on either the first multicast ring or the second multicast ring, and the third multicast mode replicates the multicast data and sends the multicast data to both the first multicast ring and the second multicast ring.

Type: Application

Filed: May 13, 2022

Publication date: November 16, 2023

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shubham JAIN, Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Sunil K. SHUKLA, Martin A. LUTZ

SINGLE-PRODUCER-MULTIPLE CONSUMERS SYNCHRONIZATION AND MULTICAST DATA TRANSFER

Publication number: 20230344667

Abstract: Embodiments for providing single-producer-multiple consumers synchronization and multicast data transfer by a processor are disclosed. Multicast data transfer is synchronized based on an identification tag and a request from each one of a plurality of recipients for the multicast data. The multicast data is transferred to each of the plurality of recipients based on the identification tag, the request from each one of the plurality of recipients, and a list of the plurality of recipients.

Type: Application

Filed: April 22, 2022

Publication date: October 26, 2023

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Vijayalakshmi SRINIVASAN, Scot RIDER, Swagath VENKATARAMANI, Kailash GOPALAKRISHNAN, Sunil K. SHUKLA, Brian William CURRAN, Martin A. LUTZ

PADDING INPUT DATA FOR ARTIFICIAL INTELLIGENCE ACCELERATORS

Publication number: 20230267003

Abstract: Processing input data for transmittal to a data consumer such as an artificial intelligence engine is performed by arranging the input data into a uniform structure made up of sticks of data combined to form pages of sticks. A stick is any well-sized set of input data elements whereby the size of the stick is fixed. A masking pattern is established for sticks of data having certain ranges of invalid data for consumption of partial sticks while maintaining validity of the input data being transferred. The mask pattern is derived based on set-active-mask-and-value (SAMV) instructions. The derived mask pattern is carried forward for subsequent load instructions to the data consumer.
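
To make the idea of fixed-size sticks with a validity mask concrete, here is a minimal Python sketch. The function name `to_sticks`, the `pad` value, and the representation of the mask as a list of booleans are assumptions for illustration only; the SAMV instructions and the grouping of sticks into pages mentioned in the abstract are not modeled.

```python
from typing import List, Sequence, Tuple


def to_sticks(data: Sequence[int],
              stick_size: int,
              pad: int = 0) -> List[Tuple[List[int], List[bool]]]:
    """Pack data into fixed-size sticks; each stick carries a validity mask.

    A mask entry is True where the stick holds real input data and False
    where the position was filled with padding.
    """
    if stick_size <= 0:
        raise ValueError("stick_size must be positive")
    sticks = []
    for start in range(0, len(data), stick_size):
        chunk = list(data[start:start + stick_size])
        valid = len(chunk)
        padding = stick_size - valid
        sticks.append((chunk + [pad] * padding,
                       [True] * valid + [False] * padding))
    return sticks
```

Packing five elements into sticks of four yields one full stick and one partial stick whose mask marks only the first position as valid, so a consumer can ignore the three padded positions.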

Type: Application

Filed: February 23, 2022

Publication date: August 24, 2023

Inventors: Cedric Lichtenau, Vijayalakshmi Srinivasan, Sunil K Shukla, Swagath Venkataramani, Kailash Gopalakrishnan, Holger Horbach, Razvan Peter Figuli, Wei Wang, Yulong Li, Martin A Lutz

Sparse systolic array design

Patent number: 11669489

Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
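
The skip behavior in this abstract can be sketched in a few lines of Python. The names `skip_module` and `processing_element` are hypothetical, and the way the processing element uses the index (to look up a matching second operand) is an assumption; the abstract states only that the operand vector includes an index.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def skip_module(operands: Iterable[float]) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) operand vectors for non-zero operands only."""
    for index, value in enumerate(operands):
        if value != 0:
            yield index, value


def processing_element(operand_vectors: Iterable[Tuple[int, float]],
                       other: Sequence[float]) -> float:
    """Accumulate products, using each index to find the matching second operand.

    Assumption: `other` holds the second operand stream in its original order.
    """
    return sum(value * other[index] for index, value in operand_vectors)
```

For the operand stream `[0.0, 1.5, 0.0, 2.0]`, only two operand vectors reach the processing element, so two of the four multiplications are skipped.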

Type: Grant

Filed: September 30, 2021

Date of Patent: June 6, 2023

Assignee: International Business Machines Corporation

Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan

SPARSE SYSTOLIC ARRAY DESIGN

Publication number: 20230109301

Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.

Type: Application

Filed: September 30, 2021

Publication date: April 6, 2023

Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan

Reusing an operand received from a first-in-first-out (FIFO) buffer according to an operand specifier value specified in a predefined field of an instruction

Patent number: 11620132

Abstract: Various embodiments are provided for reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.

Type: Grant

Filed: May 8, 2019

Date of Patent: April 4, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce Fleischer, Sunil Shukla, Vijayalakshmi Srinivasan, Jungwook Choi

EXPLOITING FINE-GRAINED STRUCTURED WEIGHT SPARSITY IN SYSTOLIC ARRAYS

Publication number: 20230030287

Abstract: Indices of non-zero weights may be stored in an index register file included within each of a plurality of processor elements in a systolic array. Non-zero weights may be stored in a register file associated with the index register file. Input values (e.g., dense input values) corresponding to a single block in a data structure may be sent to the plurality of processor elements. Those of the input values corresponding to the indices of non-zero weights in the index register file may be selected for performing a multiply-accumulate (“MAC”) operation based on sending the plurality of input values to one or more of the plurality of processor elements. The indices of the plurality of non-zero weights are stored in an index data stick. The values of the plurality of non-zero weights are stored in a value data stick.

Type: Application

Filed: July 31, 2021

Publication date: February 2, 2023

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sanchari SEN, Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Kailash GOPALAKRISHNAN, Sunil K. SHUKLA

Hybrid data-model parallelism for efficient deep learning

Patent number: 11556450

Abstract: The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques is used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether it is more feature heavy or weight heavy. Depending on this characterization, the workload of an NN layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight heavy than feature heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of weight reduction) while model parallelism is used in the direction with the smaller bandwidth.

Type: Grant

Filed: October 11, 2019

Date of Patent: January 17, 2023

Assignee: International Business Machines Corporation

Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Philip Heidelberger

System-aware selective quantization for performance optimized distributed deep learning

Patent number: 11551054

Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.

Type: Grant

Filed: August 27, 2019

Date of Patent: January 10, 2023

Assignee: International Business Machines Corporation

Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

SINGLE FUNCTION TO PERFORM COMBINED CONVOLUTION AND SELECT OPERATIONS

Publication number: 20220405555

Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a convolution using a first tensor and a second tensor to obtain one or more intermediate results, in which the second tensor includes an adjusted weight tensor created using a plurality of multipliers. Values of a bias tensor are added to the one or more intermediate results to obtain one or more combined function results for the combined function.

Type: Application

Filed: June 17, 2021

Publication date: December 22, 2022

Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani

SINGLE FUNCTION TO PERFORM COMBINED MATRIX MULTIPLICATION AND BIAS ADD OPERATIONS

Publication number: 20220405556

Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a matrix multiplication of a first tensor and a second tensor to obtain one or more intermediate results. The second tensor includes an adjusted weight tensor created using a multiplier. Values of a bias tensor are added to the one or more intermediate results to obtain one or more results for the combined function. The one or more results are at least a part of an output tensor.

Type: Application

Filed: June 17, 2021

Publication date: December 22, 2022

Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani

REFORMATTING OF TENSORS TO PROVIDE SUB-TENSORS

Publication number: 20220405348

Abstract: A tensor of a first select dimension is reformatted to provide one or more sub-tensors of a second select dimension. The reformatting includes determining a number of sub-tensors to be used to represent the tensor. The reformatting further includes creating the number of sub-tensors, in which a sub-tensor is to start on a boundary of a memory unit. Data of the tensor is rearranged to fit within the number of sub-tensors.

Type: Application

Filed: June 17, 2021

Publication date: December 22, 2022

Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Anthony Saporito, Sunil K. Shukla, Swagath Venkataramani

Dynamically resizing minibatch in neural network execution

Patent number: 11354573

Abstract: A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
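
The sizing rule in this abstract, choosing a minibatch that fits within on-chip memory, can be sketched as simple arithmetic. The function name `resize_minibatch` and the memory model (a fixed number of bytes per sample for each layer) are assumptions for illustration; the overlap of one layer's execution with the next is not modeled.

```python
from typing import List, Sequence


def resize_minibatch(on_chip_bytes: int,
                     bytes_per_sample: Sequence[int],
                     full_batch: int) -> List[int]:
    """Pick, for each layer, the largest minibatch that fits in on-chip memory.

    Assumption: bytes_per_sample[i] is the on-chip footprint of one sample
    in layer i, and footprints scale linearly with the minibatch size.
    """
    sizes = []
    for layer, per_sample in enumerate(bytes_per_sample):
        if per_sample <= 0:
            raise ValueError(f"layer {layer}: bytes per sample must be positive")
        fit = on_chip_bytes // per_sample
        if fit < 1:
            raise ValueError(f"layer {layer}: one sample does not fit on chip")
        # Never exceed the original batch, even if more would fit.
        sizes.append(min(full_batch, fit))
    return sizes
```

With 1024 bytes of on-chip memory, per-sample footprints of 64, 256, and 512 bytes, and an original batch of 8, the sketch returns minibatch sizes of 8, 4, and 2 for the three layers.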

Type: Grant

Filed: March 25, 2019

Date of Patent: June 7, 2022

Assignee: International Business Machines Corporation

Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi

Reduced precision based programmable and SIMD dataflow architecture

Patent number: 11347517

Abstract: A reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture includes reduced precision execution units, with a majority of the execution units operating at reduced precision and a minority of the execution units capable of operating at higher precision. The execution units operate in parallel within a programmable execution element to share instruction fetch, decode, and issue pipelines and operate on the same instruction in lock-step to minimize instruction-related overhead.

Type: Grant

Filed: June 20, 2019

Date of Patent: May 31, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash Gopalakrishnan, Sunil Shukla, Jungwook Choi, Silvia Mueller, Bruce Fleischer, Vijayalakshmi Srinivasan, Ankur Agrawal, Jinwook Oh
