Patents by Inventor Vijayalakshmi Srinivasan

Vijayalakshmi Srinivasan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11551054
    Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: January 10, 2023
    Assignee: International Business Machines Corporation
    Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
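A minimal Python sketch (not the patented method itself) of the idea in patent 11551054 above: per-layer activation and weight precisions change at a chosen transition layer. The layer count, transition index, and bit widths are illustrative assumptions.

```python
# A hedged sketch: assign different activation/weight precisions to layers
# before and after a chosen "transition layer". All values are illustrative.

def assign_precisions(num_layers, transition, act_bits=(4, 8), wt_bits=(2, 4)):
    """Return per-layer (activation_bits, weight_bits) pairs.

    Layers [0, transition) get the first precision of each pair;
    layers [transition, num_layers) get the second.
    """
    plan = []
    for layer in range(num_layers):
        early = layer < transition
        plan.append((act_bits[0] if early else act_bits[1],
                     wt_bits[0] if early else wt_bits[1]))
    return plan

# Example: a 6-layer network with the transition after layer 3.
for i, (a, w) in enumerate(assign_precisions(6, transition=4)):
    print(f"layer {i}: activations {a}-bit, weights {w}-bit")
```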
  • Publication number: 20220405556
    Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a matrix multiplication of a first tensor and a second tensor to obtain one or more intermediate results. The second tensor includes an adjusted weight tensor created using a multiplier. Values of a bias tensor are added to the one or more intermediate results to obtain one or more results for the combined function. The one or more results are at least a part of an output tensor.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani
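The combined function in publication 20220405556 above can be illustrated with a small NumPy sketch: one call scales the weight tensor by a multiplier, performs the matrix multiplication, and adds the bias. The shapes and multiplier value are assumptions for illustration, not from the application.

```python
import numpy as np

def fused_matmul_bias(x, w, bias, multiplier):
    # One "combined function" invocation: adjust the weight tensor with the
    # multiplier, multiply, then add the bias tensor to the intermediate result.
    w_adj = w * multiplier
    intermediate = x @ w_adj
    return intermediate + bias

x = np.random.rand(2, 4).astype(np.float32)
w = np.random.rand(4, 3).astype(np.float32)
bias = np.zeros(3, dtype=np.float32)
out = fused_matmul_bias(x, w, bias, multiplier=0.5)
print(out.shape)  # (2, 3)
```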
  • Publication number: 20220405348
    Abstract: A tensor of a first select dimension is reformatted to provide one or more sub-tensors of a second select dimension. The reformatting includes determining a number of sub-tensors to be used to represent the tensor. The reformatting further includes creating the number of sub-tensors, in which a sub-tensor is to start on a boundary of a memory unit. Data of the tensor is rearranged to fit within the number of sub-tensors.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Anthony Saporito, Sunil K. Shukla, Swagath Venkataramani
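A hedged sketch of the reformatting described in publication 20220405348 above: a tensor is split into sub-tensors, each padded so that the next sub-tensor starts on a memory-unit boundary. The 4 KB page size and element counts are assumptions.

```python
import numpy as np

PAGE = 4096  # assumed memory-unit size in bytes

def reformat_to_subtensors(tensor, sub_elems):
    """Split a flat tensor into sub-tensors, each padded so the next one
    starts on a PAGE boundary."""
    flat = tensor.ravel()
    n_sub = -(-flat.size // sub_elems)                      # ceil division
    elems_per_page = PAGE // flat.itemsize
    padded = -(-sub_elems // elems_per_page) * elems_per_page
    out = np.zeros(n_sub * padded, dtype=flat.dtype)
    for i in range(n_sub):                                  # rearrange the data
        chunk = flat[i * sub_elems:(i + 1) * sub_elems]
        out[i * padded:i * padded + chunk.size] = chunk
    return out.reshape(n_sub, padded)

subs = reformat_to_subtensors(np.arange(10000, dtype=np.float32), 3000)
print(subs.shape)  # (4, 3072): four sub-tensors, each page-aligned
```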
  • Publication number: 20220405555
    Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a convolution using a first tensor and a second tensor to obtain one or more intermediate results, in which the second tensor includes an adjusted weight tensor created using a plurality of multipliers. Values of a bias tensor are added to the one or more intermediate results to obtain one or more combined function results for the combined function.
    Type: Application
    Filed: June 17, 2021
    Publication date: December 22, 2022
    Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani
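Publication 20220405555 above is the convolution analogue of the combined matrix-multiply function; a brief sketch, with an assumed 1-D convolution and illustrative per-output-channel multipliers:

```python
import numpy as np

def fused_conv_bias(x, w, bias, multipliers):
    """1-D convolution with weights pre-scaled per output channel,
    then bias addition, as a single combined call."""
    out = []
    for oc in range(w.shape[0]):
        w_adj = w[oc] * multipliers[oc]              # adjusted weight tensor
        out.append(np.convolve(x, w_adj, mode="valid") + bias[oc])
    return np.stack(out)

y = fused_conv_bias(np.ones(8), np.ones((2, 3)), np.zeros(2), [0.5, 2.0])
print(y.shape)  # (2, 6)
```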
  • Patent number: 11354573
    Abstract: A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
    Type: Grant
    Filed: March 25, 2019
    Date of Patent: June 7, 2022
    Assignee: International Business Machines Corporation
    Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi
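The resizing rule in patent 11354573 above can be sketched as a simple calculation: shrink the requested minibatch until the largest per-layer working set fits in on-chip memory. The footprint numbers below are invented for illustration.

```python
def fit_minibatch(bytes_per_sample_by_layer, on_chip_bytes, requested):
    """Shrink the requested minibatch until every layer's working set
    fits within on-chip memory."""
    worst = max(bytes_per_sample_by_layer)
    return max(1, min(requested, on_chip_bytes // worst))

# Three layers whose per-sample footprints differ; 2 MB of on-chip memory.
size = fit_minibatch([48_000, 96_000, 24_000], 2 * 1024 * 1024, requested=64)
print(size)  # 21
```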
  • Patent number: 11347517
    Abstract: A reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture includes reduced precision execution units, with a majority of the execution units operating at reduced precision and a minority capable of operating at higher precision. The execution units operate in parallel within a programmable execution element to share instruction fetch, decode, and issue pipelines and operate on the same instruction in lock-step to minimize instruction-related overhead.
    Type: Grant
    Filed: June 20, 2019
    Date of Patent: May 31, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kailash Gopalakrishnan, Sunil Shukla, Jungwook Choi, Silvia Mueller, Bruce Fleischer, Vijayalakshmi Srinivasan, Ankur Agrawal, Jinwook Oh
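A toy model of the lock-step behavior described in patent 11347517 above: one decoded instruction is applied across all execution-unit lanes, with a majority of lanes at reduced precision and a minority at higher precision. The lane mix and the float16/float32 choice are assumptions, not the patented datapath.

```python
import numpy as np

# One instruction is fetched/decoded/issued once, then applied in lock-step
# across all execution units. Most lanes use reduced precision (float16 here);
# a few use higher precision (float32). Lane counts are illustrative.
LANES = [np.float16] * 6 + [np.float32] * 2

def lockstep_fma(a, b, c):
    """Apply the same fused multiply-add to every lane at its own precision."""
    return [dt(dt(a) * dt(b) + dt(c)) for dt in LANES]

print(lockstep_fma(1.0 / 3.0, 3.0, 1e-4))
```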
  • Patent number: 11263518
    Abstract: A method is provided for forming a Deep Neural Network (DNN). The method includes quantizing deep learning data structures of the DNN into at least two modes using at least two scale factors, respectively. Each of the at least two modes corresponds to a respective one of the at least two scale factors. The method further includes identifying which of the at least two scale factors to use for a given one of the data structures based on a data distribution of the given one of the data structures. The quantizing step includes identifying when a tail of the given one of the data structures starts by (i) building a histogram of values in the given one of the data structures using successive bins; (ii) identifying a ratio of density between the successive bins; and (iii) checking whether the ratio of density is greater than a ratio of density threshold.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: March 1, 2022
    Assignee: International Business Machines Corporation
    Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Leland Chang
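The tail-detection step of patent 11263518 above maps naturally to a short sketch: build a histogram over a data structure's values, walk successive bins, and flag where the density ratio between bins exceeds a threshold. The bin count and threshold are illustrative, and the subsequent choice between the two scale factors is omitted.

```python
import numpy as np

def find_tail_start(values, bins=64, ratio_threshold=8.0):
    """Walk successive histogram bins and report where the density drops
    sharply, i.e. where the distribution's tail begins."""
    counts, edges = np.histogram(np.abs(values), bins=bins)
    for i in range(1, bins):
        if counts[i] > 0 and counts[i - 1] / counts[i] > ratio_threshold:
            return edges[i]
    return edges[-1]

data = np.concatenate([np.random.normal(0, 1, 100_000),
                       np.random.normal(0, 8, 500)])   # heavy-tailed mixture
print(find_tail_start(data))
```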
  • Patent number: 11188820
    Abstract: A Deep Neural Networks (DNN) analysis method, system, and computer program product include characterizing a space of possible configurations for a DNN, evaluating a metric-of-interest for a configuration of the possible configurations, and searching the space to identify a configuration of the possible configurations that maximizes the metric-of-interest.
    Type: Grant
    Filed: September 8, 2017
    Date of Patent: November 30, 2021
    Assignee: International Business Machines Corporation
    Inventors: Jungwook Choi, Vijayalakshmi Srinivasan, Swagath Venkataramani
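A minimal sketch of the analysis flow in patent 11188820 above: enumerate a configuration space, evaluate a metric-of-interest for each configuration, and keep the maximizer. The space and the metric below are hypothetical stand-ins.

```python
import itertools

# Hypothetical DNN-hardware configuration space; the patented method
# characterizes such a space, evaluates a metric-of-interest, and searches
# for the configuration that maximizes it.
space = {"bit_width": [4, 8, 16], "array_size": [16, 32], "batch": [8, 32]}

def metric_of_interest(cfg):
    # Stand-in metric (e.g., throughput per unit cost); purely illustrative.
    return cfg["array_size"] * cfg["batch"] / cfg["bit_width"]

configs = [dict(zip(space, vals)) for vals in itertools.product(*space.values())]
best = max(configs, key=metric_of_interest)
print(best, metric_of_interest(best))
```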
  • Patent number: 11138010
    Abstract: Embodiments of the present invention include a computer system that manages execution of one or more programs with one or more loops, each loop having a loop level. Embodiments that manage loops that can skip execution, and in which the number of loops changes during execution, are also disclosed. A loop level register (LLEV) stores the loop level for the currently executing loop. A Loop-Back Program Counter Register (LBPR) has a table of one or more Loop-Back Registers. Each Loop-Back Register stores the loop level for its respective loop and a loop-back PC location for that loop. A Program Counter points back to the PC location for each iteration of the loop. A Loop Current Count Register (LCCR) table tracks the number of iterations remaining to be executed for the loop. A loop management process causes one of the CPUs to execute all of the one or more instructions of an iteration of the currently executing program loop.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: October 5, 2021
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Brian William Curran, Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan
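A toy Python model of the registers named in patent 11138010 above (LLEV, the LBPR table, and the LCCR table), assuming illustrative semantics for loop entry and loop-back:

```python
class LoopRegisters:
    """Toy model of the loop-management registers in the abstract above:
    LLEV (current loop level), an LBPR table of loop-back PC locations per
    level, and an LCCR table of remaining iteration counts per level."""

    def __init__(self):
        self.llev = 0          # loop level of the currently executing loop
        self.lbpr = {}         # level -> loop-back PC location
        self.lccr = {}         # level -> iterations remaining

    def enter_loop(self, level, back_pc, count):
        self.llev, self.lbpr[level], self.lccr[level] = level, back_pc, count

    def at_loop_end(self, pc):
        """Return the next PC: branch back while iterations remain."""
        lvl = self.llev
        self.lccr[lvl] -= 1
        if self.lccr[lvl] > 0:
            return self.lbpr[lvl]    # iterate the current loop again
        self.llev = max(0, lvl - 1)  # fall through to the enclosing loop
        return pc + 1

regs = LoopRegisters()
regs.enter_loop(level=1, back_pc=10, count=3)
print([regs.at_loop_end(20) for _ in range(3)])  # [10, 10, 21]
```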
  • Publication number: 20210110247
    Abstract: The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques is used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether it is more feature heavy or weight heavy. Depending on this characterization, the workload of an NN layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight heavy than feature heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of weight reduction) while model parallelism is used in the direction with the smaller bandwidth.
    Type: Application
    Filed: October 11, 2019
    Publication date: April 15, 2021
    Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Philip Heidelberger
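The per-layer decision in publication 20210110247 above can be sketched as a simple rule, with byte counts standing in for the weight-heavy/feature-heavy characterization:

```python
def choose_parallelism(weight_bytes, feature_bytes):
    """Pick data parallelism along the high-bandwidth array dimension for
    weight-heavy layers (minimizing the cost of weight reduction) and model
    parallelism along the low-bandwidth dimension; reverse for feature-heavy
    layers. The byte-count comparison is an illustrative proxy."""
    if weight_bytes > feature_bytes:     # weight heavy
        return {"high_bw_dim": "data", "low_bw_dim": "model"}
    return {"high_bw_dim": "model", "low_bw_dim": "data"}

# A fully connected layer (weight heavy) vs. an early conv layer (feature heavy).
print(choose_parallelism(weight_bytes=64e6, feature_bytes=2e6))
print(choose_parallelism(weight_bytes=0.3e6, feature_bytes=12e6))
```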
  • Publication number: 20210103799
    Abstract: A method is provided for forming a Deep Neural Network (DNN). The method includes quantizing deep learning data structures of the DNN into at least two modes using at least two scale factors, respectively. Each of the at least two modes corresponds to a respective one of the at least two scale factors. The method further includes identifying which of the at least two scale factors to use for a given one of the data structures based on a data distribution of the given one of the data structures. The quantizing step includes identifying when a tail of the given one of the data structures starts by (i) building a histogram of values in the given one of the data structures using successive bins; (ii) identifying a ratio of density between the successive bins; and (iii) checking whether the ratio of density is greater than a ratio of density threshold.
    Type: Application
    Filed: October 4, 2019
    Publication date: April 8, 2021
    Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Leland Chang
  • Patent number: 10963387
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: March 30, 2021
    Assignee: International Business Machines Corporation
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
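A hedged sketch of the RECAP idea in patent 10963387 above: group addresses into coarse-grain regions, remember which regions a virtual machine touched, and produce a region prefetch list for its next activation. The 4 KB region size and the always-prefetch policy are simplifying assumptions; the patented scheme additionally predicts which regions still contain useful blocks.

```python
REGION_BITS = 12          # 4 KB regions; illustrative

def region_of(addr):
    return addr >> REGION_BITS

class Recap:
    """Toy region-based restore prefetcher: remember which regions a VM
    touched, then prefetch those regions on the next context switch."""

    def __init__(self):
        self.touched = {}                      # vm_id -> set of regions

    def record(self, vm_id, addr):
        self.touched.setdefault(vm_id, set()).add(region_of(addr))

    def prefetch_list(self, vm_id):
        return sorted(self.touched.get(vm_id, ()))

r = Recap()
for a in (0x1000, 0x1040, 0x9000):             # two distinct 4 KB regions
    r.record("vmA", a)
print(r.prefetch_list("vmA"))                  # [1, 9]
```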
  • Publication number: 20210064954
    Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.
    Type: Application
    Filed: August 27, 2019
    Publication date: March 4, 2021
    Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
  • Patent number: 10936319
    Abstract: In a decode stage of hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses.
    Type: Grant
    Filed: June 16, 2018
    Date of Patent: March 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
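Patent 10936319 above describes a decode-stage predictor; below is a classic two-level sketch with a global history register indexing a pattern history table of 2-bit counters. The history length and counter policy are assumptions, not the patented design.

```python
class MissPredictor:
    """Sketch of a decode-stage miss predictor: a global history of recent
    hit/miss outcomes indexes a pattern history table of 2-bit counters."""

    def __init__(self, history_bits=4):
        self.history = 0
        self.mask = (1 << history_bits) - 1
        self.pht = [1] * (1 << history_bits)   # 2-bit counters, weakly "hit"

    def predict_miss(self):
        return self.pht[self.history] >= 2

    def update(self, was_miss):
        # Train the counter for the current history, then shift in the outcome.
        ctr = self.pht[self.history]
        self.pht[self.history] = min(3, ctr + 1) if was_miss else max(0, ctr - 1)
        self.history = ((self.history << 1) | int(was_miss)) & self.mask

p = MissPredictor()
for outcome in [True, True, False, True]:      # observed misses/hits
    print(p.predict_miss())
    p.update(outcome)
```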
  • Publication number: 20200401413
    Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions are shared between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).
    Type: Application
    Filed: June 20, 2019
    Publication date: December 24, 2020
    Applicant: International Business Machines Corporation
    Inventors: Kailash Gopalakrishnan, Sunil Shukla, Jungwook Choi, Silvia Mueller, Bruce Fleischer, Vijayalakshmi Srinivasan, Ankur Agrawal, Jinwook Oh
  • Patent number: 10838868
    Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.
    Type: Grant
    Filed: March 7, 2019
    Date of Patent: November 17, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Brian Curran, Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan, Swagath Venkataramani
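A toy model of the communicating memory in patent 10838868 above: scratchpad buffers sit between external memory and processing elements, and separate load and store agents each run an independent program naming the destination of every transfer. The names and queue-based buffers are illustrative.

```python
from queue import Queue

# Toy communicating memory: on-chip scratchpad buffers between external
# memory and processing elements; each agent runs its own small "program"
# that names the destination of every data transfer.
scratchpads = {"sp0": Queue(), "sp1": Queue()}

def load_agent(program, external_memory):
    for addr, dest in program:                 # independent agent program
        scratchpads[dest].put(external_memory[addr])

def store_agent(program, external_memory):
    for src, addr in program:
        external_memory[addr] = scratchpads[src].get()

ext = {0: 11, 1: 22, 2: 0}
load_agent([(0, "sp0"), (1, "sp1")], ext)
store_agent([("sp1", 2)], ext)
print(ext[2])  # 22
```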
  • Publication number: 20200356371
    Abstract: Various embodiments are provided reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.
    Type: Application
    Filed: May 8, 2019
    Publication date: November 12, 2020
    Applicant: International Business Machines Corporation
    Inventors: Bruce Fleischer, Sunil Shukla, Vijayalakshmi Srinivasan, Jungwook Choi
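A sketch of the operand-reuse idea in publication 20200356371 above: an instruction can mark an operand so that the value retained from a previous instruction is used instead of a fresh register read. The encoding below is invented for illustration.

```python
# Hypothetical encoding: REUSE in an operand slot means "retain the operand
# data the previous instruction used for this slot".
REUSE = object()

def execute(program):
    held = {}                                  # operand slot -> retained value
    regs = {"r1": 3, "r2": 4, "r3": 0, "r4": 0}
    for op, dst, a, b in program:
        va = held["a"] if a is REUSE else regs[a]
        vb = held["b"] if b is REUSE else regs[b]
        held["a"], held["b"] = va, vb          # retain for a later instruction
        regs[dst] = va + vb if op == "add" else va * vb
    return regs

# The second instruction reuses the first instruction's "a" operand (r1 == 3).
print(execute([("add", "r3", "r1", "r2"), ("mul", "r4", REUSE, "r2")]))
```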
  • Publication number: 20200311536
    Abstract: A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
    Type: Application
    Filed: March 25, 2019
    Publication date: October 1, 2020
    Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi
  • Publication number: 20200285579
    Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.
    Type: Application
    Filed: March 7, 2019
    Publication date: September 10, 2020
    Applicant: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Brian Curran, Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan, Swagath Venkataramani
  • Patent number: 10769238
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: September 8, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
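Patent 10769238 above concerns matrix multiplication on a systolic array; below is a naive output-stationary emulation in which each processing element holds one partial-sum register and accumulates as operands stream through, one step per cycle. The dataflow is a simplification of the patented scheme.

```python
import numpy as np

def systolic_matmul(A, B):
    """Emulate an output-stationary systolic matrix multiply: PE (i, j)
    accumulates a partial sum of C as A and B stream through."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m), dtype=A.dtype)        # one partial-sum register per PE
    for step in range(k):                      # one streaming "wavefront" per cycle
        for i in range(n):
            for j in range(m):
                C[i, j] += A[i, step] * B[step, j]
    return C

A = np.arange(6, dtype=np.float64).reshape(2, 3)
B = np.arange(12, dtype=np.float64).reshape(3, 4)
print(np.allclose(systolic_matmul(A, B), A @ B))  # True
```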