Patents by Inventor Vijayalakshmi Srinivasan

Vijayalakshmi Srinivasan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Bi-scaled deep neural networks

Patent number: 11263518

Abstract: A method is provided for forming a Deep Neural Network (DNN). The method includes quantizing deep learning data structures of the DNN into at least two modes using at least two scale factors, respectively. Each of the at least two modes corresponds to a respective one of the at least two scale factors. The method further includes identifying which of the at least two scale factors to use for a given one of the data structures based on a data distribution of the given one of the data structures. The quantizing step includes identifying when a tail of the given one of the data structures starts by (i) building a histogram of values in the given one of the data structures using successive bins; (ii) identifying a ratio of density between the successive bins; and (iii) checking whether the ratio of density is greater than a ratio of density threshold.

Type: Grant

Filed: October 4, 2019

Date of Patent: March 1, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Leland Chang
Deep neural network performance analysis on shared memory accelerator systems

Patent number: 11188820

Abstract: A Deep Neural Networks (DNN) analysis method, system, and computer program product include characterizing a space of possible configurations for a DNN, evaluating a metric-of-interest for a configuration of the possible configurations, and searching the space to identify a configuration of the possible configurations that maximizes the metric-of-interest.

Type: Grant

Filed: September 8, 2017

Date of Patent: November 30, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jungwook Choi, Vijayalakshmi Srinivasan, Swagath Venkataramani
Loop management in multi-processor dataflow architecture

Patent number: 11138010

Abstract: Embodiments of the present invention include a computer system that manages execution of one or more programs with one or more loops where each loop having a loop level. Embodiments that manage loops that can skip execution and the number of loops changing during execution are also disclosed. A loop level register (LLEV) stores the loop level for a currently executing loop. A Loop-Back Program Counter Register (LBPR) has a table of one or more Loop-Back Registers. Each Loop-Back Register stores the loop level for a LBPR respective loop and a loop back PC location for the LBPR respective loop. A Program Counter points back to the PC location for each iteration of the loop. A Loop Current Count Register table (LCCR) tracks a number of iterations remaining to executed for of the loop. A loop management process causes one of the CPUs to execute all the one or more instructions of an iteration of the currently executing program loop.

Type: Grant

Filed: October 1, 2020

Date of Patent: October 5, 2021

Assignee: International Business Machines Corporation

Inventors: Chia-Yu Chen, Jungwook Choi, Brian William Curran, Bruce Fleischer, Kailash Gopalakrishnan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan
HYBRID DATA-MODEL PARALLELISM FOR EFFICIENT DEEP LEARNING

Publication number: 20210110247

Abstract: The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques are used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether they are more feature heavy or weight heavy. Depending on this characterization, the workload of an NN layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight heavy than feature heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of weight reduction) while model parallelism is used in the direction with the smaller bandwidth.

Type: Application

Filed: October 11, 2019

Publication date: April 15, 2021

Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Philip Heidelberger
BI-SCALED DEEP NEURAL NETWORKS

Publication number: 20210103799

Abstract: A method is provided for forming a Deep Neural Network (DNN). The method includes quantizing deep learning data structures of the DNN into at least two modes using at least two scale factors, respectively. Each of the at least two modes corresponds to a respective one of the at least two scale factors. The method further includes identifying which of the at least two scale factors to use for a given one of the data structures based on a data distribution of the given one of the data structures. The quantizing step includes identifying when a tail of the given one of the data structures starts by (i) building a histogram of values in the given one of the data structures using successive bins; (ii) identifying a ratio of density between the successive bins; and (iii) checking whether the ratio of density is greater than a ratio of density threshold.

Type: Application

Filed: October 4, 2019

Publication date: April 8, 2021

Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Leland Chang
Methods of cache preloading on a partition or a context switch

Patent number: 10963387

Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.

Type: Grant

Filed: March 14, 2019

Date of Patent: March 30, 2021

Assignee: International Business Machines Corporation

Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
SYSTEM-AWARE SELECTIVE QUANTIZATION FOR PERFORMANCE OPTIMIZED DISTRIBUTED DEEP LEARNING

Publication number: 20210064954

Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.

Type: Application

Filed: August 27, 2019

Publication date: March 4, 2021

Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
Predicting cache misses using data access behavior and instruction address

Patent number: 10936319

Abstract: In a decode stage of hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses.

Type: Grant

Filed: June 16, 2018

Date of Patent: March 2, 2021

Assignee: International Business Machines Corporation

Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
REDUCED PRECISION BASED PROGRAMMABLE AND SIMD DATAFLOW ARCHITECTURE

Publication number: 20200401413

Abstract: Various embodiments are provided for using a reduced precision based programmable and single instruction multiple data (SIMD) dataflow architecture in a computing environment. One or more instructions between a plurality of execution units (EUs) operating in parallel within each one of a plurality of execution elements (EEs).

Type: Application

Filed: June 20, 2019

Publication date: December 24, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kailash GOPALAKRISHNAN, Sunil SHUKLA, Jungwook CHOI, Silvia MUELLER, Bruce FLEISCHER, Vijayalakshmi SRINIVASAN, Ankur AGRAWAL, Jinwook OH
Programmable data delivery by load and store agents on a processing chip interfacing with on-chip memory components and directing data to external memory components

Patent number: 10838868

Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.

Type: Grant

Filed: March 7, 2019

Date of Patent: November 17, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Brian Curran, Bruce Fleischer, Kailash Gopalakrishan, Jinwook Oh, Sunil K Shukla, Vijayalakshmi Srinivasan, Swagath Venkataramani
REUSING AN OPERAND IN AN INSTRUCTION SET ARCHITECTURE (ISA)

Publication number: 20200356371

Abstract: Various embodiments are provided reusing an operand in an instruction set architecture (ISA) by one or more processors in a computing system. An instruction may specify that an operand register for a selected operand retain operand data used by a previous instruction. The operand data in the operand register may be reused by the instruction.

Type: Application

Filed: May 8, 2019

Publication date: November 12, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bruce FLEISCHER, Sunil SHUKLA, Vijayalakshmi SRINIVASAN, Jungwook CHOI
DYNAMICALLY RESIZING MINIBATCH IN NEURAL NETWORK EXECUTION

Publication number: 20200311536

Abstract: A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.

Type: Application

Filed: March 25, 2019

Publication date: October 1, 2020

Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi
PROGRAMMABLE DATA DELIVERY TO A SYSTEM OF SHARED PROCESSING ELEMENTS WITH SHARED MEMORY

Publication number: 20200285579

Abstract: Embodiments for implementing a communicating memory between a plurality of computing components are provided. In one embodiment, an apparatus comprises a plurality of memory components residing on a processing chip, the plurality of memory components interconnected between a plurality of processing elements of at least one processing core of the processing chip and at least one external memory component external to the processing chip. The apparatus further comprises a plurality of load agents and a plurality of store agents on the processing chip, each interfacing with the plurality of memory components. Each of the plurality of load agents and the plurality of store agents execute an independent program specifying a destination of data transacted between the plurality of memory components, the at least one external memory component, and the plurality of processing elements.

Type: Application

Filed: March 7, 2019

Publication date: September 10, 2020

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu CHEN, Jungwook CHOI, Brian CURRAN, Bruce FLEISCHER, Kailash GOPALAKRISHAN, Jinwook OH, Sunil K. SHUKLA, Vijayalakshmi SRINIVASAN, Swagath VENKATARAMANI
Matrix multiplication on a systolic array

Patent number: 10769238

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Grant

Filed: September 19, 2019

Date of Patent: September 8, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
Method to Map Convolutional Layers of Deep Neural Network on a Plurality of Processing Elements with SIMD Execution Units, Private Memories, and Connected as a 2D Systolic Processor Array

Publication number: 20200134105

Abstract: A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters, as input data into a processor on a computer that formalizes a design space exploration of a convolution mapping, on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the predefined convolution processing.

Type: Application

Filed: October 31, 2018

Publication date: April 30, 2020

Inventors: Chia-Yu CHEN, Jungwook Choi, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Swagath Venkataramani, Jintao Zhang
Processor and memory transparent convolutional lowering and auto zero padding for deep neural network implementations

Patent number: 10565285

Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps location in a lowered matrix to an equivalent location in a non-lowered matrix and provides auto zero padding in computational heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix to matrix multiplications (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.

Type: Grant

Filed: December 18, 2017

Date of Patent: February 18, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
MATRIX MULTIPLICATION ON A SYSTOLIC ARRAY

Publication number: 20200012706

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Application

Filed: September 19, 2019

Publication date: January 9, 2020

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits

Patent number: 10528356

Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration level commits (ILC) in a course grained reconfigurable architecture (CGRA). The apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The PEs, LSU and control unit are configured to commit instructions, and save and restore context at loop iteration boundaries. In doing so, the apparatus tracks and buffers state of in-flight iterations, and detects conditions that prevent an iteration from completing.

Type: Grant

Filed: November 4, 2015

Date of Patent: January 7, 2020

Assignee: International Business Machines Corporation

Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
LOW PRECISION DEEP NEURAL NETWORK ENABLED BY COMPENSATION INSTRUCTIONS

Publication number: 20200005125

Abstract: A compensated deep neural network (compensated-DNN) is provided. A first vector having a set of components and a second vector having a set of corresponding components are received. A component of the first vector includes a first quantized value and a first compensation instruction, and a corresponding component of the second vector includes a second quantized value and a second compensation instruction. The first quantized value is multiplied with the second quantized value to compute a raw product value. The raw product value is compensated for a quantization error according to the first and second compensation instructions to produce a compensated product value. The compensated product value is added into an accumulated value for the dot product. The accumulated value is converted into an output vector of the dot product. The output vector includes an output quantized value and an output compensation instruction.

Type: Application

Filed: June 27, 2018

Publication date: January 2, 2020

Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Jungwook Choi, Leland Chang
Matrix multiplication on a systolic array

Patent number: 10489484

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Grant

Filed: April 11, 2019

Date of Patent: November 26, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang

prev 1 2 3 4 5 6 … next