Patents by Inventor Vijayalakshmi Srinivasan

Vijayalakshmi Srinivasan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20200134105
    Abstract: A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters, as input data, into a processor on a computer that formalizes a design space exploration of a convolution mapping on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, as proxy estimates of the performance of different possible design choices to implement the predefined convolution processing.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Swagath Venkataramani, Jintao Zhang
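The abstract above describes computing performance metrics analytically as proxy estimates for different convolution-mapping design choices. As a hedged illustration only, the sketch below enumerates candidate loop tilings for one convolution layer and scores each with a simple compute-versus-bandwidth cost model; the parameter names (num_pes, buffer_bytes, dram_bw_bytes_per_cycle) and the cost model itself are assumptions, not the patented method.

```python
# Illustrative design-space exploration for a convolution mapping (assumptions only).
from itertools import product

def conv_macs(N, C, K, H, W, R, S):
    """Total multiply-accumulates for one convolution layer."""
    return N * K * C * H * W * R * S

def estimate_metrics(tile_k, tile_h, N, C, K, H, W, R, S,
                     num_pes=64, macs_per_pe=1, buffer_bytes=2**20,
                     dram_bw_bytes_per_cycle=16, elem_bytes=2):
    """Proxy performance estimate for one candidate mapping (tiling)."""
    # Working set of one tile: weights + input halo + output tile.
    weights = tile_k * C * R * S
    inputs = C * (tile_h + R - 1) * W
    outputs = tile_k * tile_h * W
    if (weights + inputs + outputs) * elem_bytes > buffer_bytes:
        return None  # this tile does not fit in the on-chip buffer
    # Compute-bound cycles: total MACs over peak MAC throughput.
    compute_cycles = conv_macs(N, C, K, H, W, R, S) / (num_pes * macs_per_pe)
    # Memory-bound cycles: each tile's working set is fetched once per tile.
    num_tiles = N * -(-K // tile_k) * -(-H // tile_h)   # ceiling divisions
    traffic = num_tiles * (weights + inputs + outputs) * elem_bytes
    memory_cycles = traffic / dram_bw_bytes_per_cycle
    return max(compute_cycles, memory_cycles), traffic

def explore(N=1, C=64, K=128, H=56, W=56, R=3, S=3):
    """Enumerate candidate tilings and return the best proxy estimate found."""
    best = None
    for tile_k, tile_h in product([8, 16, 32, 64, 128], [2, 4, 7, 8, 14, 28, 56]):
        m = estimate_metrics(tile_k, tile_h, N, C, K, H, W, R, S)
        if m and (best is None or m[0] < best[0]):
            best = (m[0], m[1], tile_k, tile_h)
    return best  # (cycles, DRAM bytes, tile_k, tile_h)

if __name__ == "__main__":
    cycles, traffic, tk, th = explore()
    print(f"K-tile={tk}, H-tile={th}: ~{cycles:.0f} cycles, {traffic / 1e6:.1f} MB of DRAM traffic")
```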
  • Patent number: 10565285
    Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps a location in a lowered matrix to the equivalent location in a non-lowered matrix and provides automatic zero padding in computationally heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix-matrix multiplication (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.
    Type: Grant
    Filed: December 18, 2017
    Date of Patent: February 18, 2020
    Assignee: International Business Machines Corporation
    Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
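As a rough illustration of the address-translation idea in the entry above, the sketch below maps a (row, column) index of an im2col-lowered matrix back to a (channel, y, x) location in the non-lowered input tensor, returning an implicit zero for padding positions. The assumed data layout, function names, and parameters are hypothetical, not the claimed hardware.

```python
def lowered_to_nonlowered(row, col, H, W, R, S, W_out, stride=1, pad=1):
    """Translate an index in the lowered (im2col) matrix to the non-lowered tensor.

    Assumed lowered layout: rows index output pixels (oy * W_out + ox) and
    columns index (c * R * S + r * S + s). Returns (channel, y, x), or None
    when the element is an implicit zero pad."""
    oy, ox = divmod(row, W_out)
    c, rs = divmod(col, R * S)
    r, s = divmod(rs, S)
    y = oy * stride + r - pad
    x = ox * stride + s - pad
    if 0 <= y < H and 0 <= x < W:
        return (c, y, x)      # real element: the request goes to memory
    return None               # padding element: supply zero, no memory access

def lowered_element(input_chw, row, col, H, W, R, S, W_out, stride=1, pad=1):
    """Fetch a lowered-matrix element without materializing the lowered matrix."""
    idx = lowered_to_nonlowered(row, col, H, W, R, S, W_out, stride, pad)
    return 0.0 if idx is None else input_chw[idx[0]][idx[1]][idx[2]]
```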
  • Publication number: 20200012706
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: September 19, 2019
    Publication date: January 9, 2020
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
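The following is a minimal behavioral sketch of the systolic matrix multiply described above: elements of a first matrix are held in per-PE registers, elements of a second matrix stream through, and partial sums of a third (output) matrix stream out. The timing and dataflow here are deliberately simplified and are not taken from the patent.

```python
def systolic_matmul(A, B):
    """Behavioral model of C = A @ B on a len(A) x len(B[0]) grid of PEs."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    # Each PE (i, j) holds a register accumulating its partial sum of C[i][j].
    psum = [[0.0] * m for _ in range(n)]
    # One wavefront per element of the reduction dimension: B[t][j] moves down
    # column j while A[i][t] moves across row i.
    for t in range(k):
        for i in range(n):
            a = A[i][t]                    # operand resident in the PE row this step
            for j in range(m):
                psum[i][j] += a * B[t][j]  # multiply-accumulate inside PE (i, j)
    # Partial sums stream out of the array as the result matrix.
    return psum

if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(systolic_matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```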
  • Patent number: 10528356
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration level commits (ILC) in a coarse-grained reconfigurable architecture (CGRA). The apparatus includes: hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which are in flight, and which are yet to begin; and a control unit including hardware structures that are used to maintain synchronization and to initiate and terminate loops within the PEs. The PEs, LSU, and control unit are configured to commit instructions, and to save and restore context, at loop iteration boundaries. In doing so, the apparatus tracks and buffers the state of in-flight iterations and detects conditions that prevent an iteration from completing.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: January 7, 2020
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
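As a hedged sketch of the iteration bookkeeping described in the entry above, the class below tracks which loop iterations are completed, in flight, or not yet started, commits them in order at iteration boundaries, and squashes iterations that cannot complete. The class and method names are illustrative, not the hardware structures claimed.

```python
class IterationTracker:
    """Tracks which loop iterations are completed, in flight, or not started,
    committing only whole iterations and only in order."""

    def __init__(self, total_iterations):
        self.total = total_iterations
        self.next_to_issue = 0       # first iteration not yet begun
        self.in_flight = set()       # issued but not yet finished
        self.finished = set()        # finished but not yet committed
        self.committed = 0           # number of iterations committed in order

    def issue(self):
        """Begin the next iteration, if any remain."""
        if self.next_to_issue < self.total:
            self.in_flight.add(self.next_to_issue)
            self.next_to_issue += 1

    def finish(self, it):
        """Mark an in-flight iteration finished; commit any in-order prefix."""
        self.in_flight.discard(it)
        self.finished.add(it)
        while self.committed in self.finished:   # commit at iteration boundaries only
            self.finished.discard(self.committed)
            self.committed += 1

    def squash(self, it):
        """A condition prevented iteration `it` from completing: discard it and
        all younger uncommitted iterations so they restart from saved context."""
        self.in_flight = {i for i in self.in_flight if i < it}
        self.finished = {i for i in self.finished if i < it}
        self.next_to_issue = it
```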
  • Publication number: 20200005125
    Abstract: A compensated deep neural network (compensated-DNN) is provided. A first vector having a set of components and a second vector having a set of corresponding components are received. A component of the first vector includes a first quantized value and a first compensation instruction, and a corresponding component of the second vector includes a second quantized value and a second compensation instruction. The first quantized value is multiplied with the second quantized value to compute a raw product value. The raw product value is compensated for a quantization error according to the first and second compensation instructions to produce a compensated product value. The compensated product value is added into an accumulated value for the dot product. The accumulated value is converted into an output vector of the dot product. The output vector includes an output quantized value and an output compensation instruction.
    Type: Application
    Filed: June 27, 2018
    Publication date: January 2, 2020
    Inventors: Swagath Venkataramani, Shubham Jain, Vijayalakshmi Srinivasan, Jungwook Choi, Leland Chang
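To make the compensation idea above concrete, here is a hedged sketch of a dot product in which each element carries a quantized value plus an estimate of its quantization error, and the error estimates are folded back into the accumulated product. The explicit signed error field is an illustrative stand-in for the compensation "instructions" named in the abstract.

```python
from dataclasses import dataclass

@dataclass
class CompensatedValue:
    q: int        # quantized value (fixed-point integer)
    err: float    # estimated quantization error: true value ~= q * scale + err

def quantize(x, scale=1 / 16):
    q = round(x / scale)
    return CompensatedValue(q, x - q * scale)

def compensated_dot(xs, ys, scale=1 / 16):
    acc = 0.0
    for a, b in zip(xs, ys):
        raw = (a.q * b.q) * scale * scale              # raw quantized product
        # First-order compensation: add the cross terms with the error
        # estimates, dropping the (tiny) err * err term.
        acc += raw + a.q * scale * b.err + b.q * scale * a.err
    return quantize(acc, scale)                         # output again carries q + err

if __name__ == "__main__":
    xs = [quantize(v) for v in (0.30, -0.72, 0.11)]
    ys = [quantize(v) for v in (0.55, 0.20, -0.90)]
    exact = 0.30 * 0.55 - 0.72 * 0.20 - 0.11 * 0.90
    out = compensated_dot(xs, ys)
    print(exact, out.q * (1 / 16) + out.err)            # compensated result tracks the exact one
```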
  • Patent number: 10489484
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: April 11, 2019
    Date of Patent: November 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190236113
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: April 11, 2019
    Publication date: August 1, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190213132
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Application
    Filed: March 14, 2019
    Publication date: July 11, 2019
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
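A minimal sketch of the region-based bookkeeping described in the abstract above: demand accesses set bits in per-region vectors for the running virtual machine, and on a switch back to that machine the densest saved regions drive prefetches. The region size, thresholds, and the per-region bit vectors standing in for the compressed representation are illustrative assumptions.

```python
BLOCK_BYTES = 64           # bytes per cache block (assumed)
REGION_BLOCKS = 64         # blocks per region, i.e. 4 KB regions (assumed)

class RegionPrefetcher:
    def __init__(self):
        # Per virtual machine: region index -> bitmask of blocks touched
        # during that machine's last execution interval.
        self.profiles = {}

    def record_access(self, vm, addr):
        """Called on each demand access while `vm` is running."""
        region, block = divmod(addr // BLOCK_BYTES, REGION_BLOCKS)
        vm_profile = self.profiles.setdefault(vm, {})
        vm_profile[region] = vm_profile.get(region, 0) | (1 << block)

    def restoration_list(self, vm, max_regions=32, min_blocks=4):
        """On a context switch back to `vm`, return block addresses to prefetch.
        Densest regions go first; sparse regions are skipped as poor payoff."""
        regions = sorted(self.profiles.get(vm, {}).items(),
                         key=lambda kv: bin(kv[1]).count("1"), reverse=True)
        addrs = []
        for region, mask in regions[:max_regions]:
            if bin(mask).count("1") < min_blocks:
                continue
            for b in range(REGION_BLOCKS):
                if (mask >> b) & 1:
                    addrs.append((region * REGION_BLOCKS + b) * BLOCK_BYTES)
        return addrs
```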
  • Publication number: 20190188240
    Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps a location in a lowered matrix to the equivalent location in a non-lowered matrix and provides automatic zero padding in computationally heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix-matrix multiplication (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.
    Type: Application
    Filed: December 18, 2017
    Publication date: June 20, 2019
    Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
  • Patent number: 10268588
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Grant
    Filed: October 10, 2017
    Date of Patent: April 23, 2019
    Assignee: International Business Machines Corporation
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
  • Patent number: 10261978
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: December 14, 2017
    Date of Patent: April 16, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10241972
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: March 16, 2017
    Date of Patent: March 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190080232
    Abstract: A Deep Neural Networks (DNN) analysis method, system, and computer program product include characterizing a space of possible configurations for a DNN, evaluating a metric-of-interest for a configuration of the possible configurations, and searching the space to identify a configuration of the possible configurations that maximizes the metric-of-interest.
    Type: Application
    Filed: September 8, 2017
    Publication date: March 14, 2019
    Inventors: Jungwook Choi, Vijayalakshmi Srinivasan, Swagath Venkataramani
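As an illustration of the three steps named in the abstract above (characterize the space, evaluate a metric of interest, search for its maximizer), the sketch below enumerates a toy configuration space, scores each configuration with a placeholder metric, and returns the best one. The knobs, the metric, and the exhaustive search strategy are invented for the example.

```python
from itertools import product

def characterize_space():
    """Enumerate candidate DNN configurations (hypothetical knobs)."""
    depths = [8, 16, 32]
    widths = [64, 128, 256]
    precisions = [4, 8, 16]            # bits
    return [dict(depth=d, width=w, bits=b)
            for d, w, b in product(depths, widths, precisions)]

def metric_of_interest(cfg):
    """Placeholder metric: reward model capacity, penalize compute cost
    (a stand-in for a measured accuracy/efficiency trade-off)."""
    capacity = cfg["depth"] * cfg["width"]
    cost = cfg["depth"] * cfg["width"] ** 2 * cfg["bits"]
    return capacity / (1 + cost / 1e6)

def search(space, metric):
    """Return the configuration that maximizes the metric of interest."""
    return max(space, key=metric)

if __name__ == "__main__":
    best = search(characterize_space(), metric_of_interest)
    print("best configuration:", best)
```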
  • Patent number: 10120685
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a coarse-grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which are in flight, and which are yet to begin; and a control unit including hardware structures that are used to maintain synchronization and to initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any in-flight iteration. If instructions from multiple iterations are ready for execution (and are pre-decoded), the hardware selects the lowest iteration number ready for execution. If, in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Chia-Yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
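The selection rule quoted in the abstract above (issue from the lowest-numbered iteration that is ready, without letting a stalled lower iteration block a ready higher one) can be sketched as follows; the data structures and instruction strings are illustrative only.

```python
def select_next(ready, stalled):
    """Pick the instruction to issue this cycle.

    `ready` maps an in-flight iteration number to its next pre-decoded, ready
    instruction; `stalled` is the set of iterations that cannot issue now."""
    candidates = [it for it in ready if it not in stalled]
    if not candidates:
        return None                      # nothing can issue this cycle
    it = min(candidates)                 # lowest iteration number wins
    return it, ready[it]

if __name__ == "__main__":
    ready = {3: "mul t2, t1, t0", 5: "add t4, t3, t2", 7: "ld t1, 0(t6)"}
    print(select_next(ready, stalled={3}))   # iteration 3 is stalled -> (5, 'add t4, t3, t2')
```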
  • Publication number: 20180300141
    Abstract: In a decode stage of a hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses.
    Type: Application
    Filed: June 16, 2018
    Publication date: October 18, 2018
    Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
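A behavioral sketch of the two-level predictor described above: a global history of actual hit/miss outcomes indexes a pattern history table of saturating counters, which supplies the prediction at decode time. The table size and counter width are assumed values, not those of the patent.

```python
class CacheMissPredictor:
    """Two-level hit/miss predictor indexed by global hit/miss history."""

    def __init__(self, history_bits=8, counter_bits=2):
        self.history_bits = history_bits
        self.history = 0                       # global history buffer: 1 bit per outcome, 1 = miss
        self.max_count = (1 << counter_bits) - 1
        # Pattern history table of saturating counters, initialized to "weakly hit".
        self.pht = [self.max_count // 2] * (1 << history_bits)

    def predict_miss(self):
        """Consulted at decode for an instruction known to need a memory access."""
        return self.pht[self.history] > self.max_count // 2

    def update(self, was_miss):
        """Called once the actual cache outcome for that access is known."""
        ctr = self.pht[self.history]
        self.pht[self.history] = min(ctr + 1, self.max_count) if was_miss else max(ctr - 1, 0)
        self.history = ((self.history << 1) | int(was_miss)) & ((1 << self.history_bits) - 1)
```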
  • Publication number: 20180267938
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: December 14, 2017
    Publication date: September 20, 2018
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20180267936
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: March 16, 2017
    Publication date: September 20, 2018
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10007523
    Abstract: In a decode stage of a hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses.
    Type: Grant
    Filed: May 2, 2011
    Date of Patent: June 26, 2018
    Assignee: International Business Machines Corporation
    Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
  • Publication number: 20180032438
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Application
    Filed: October 10, 2017
    Publication date: February 1, 2018
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
  • Patent number: 9804967
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: October 31, 2017
    Assignee: International Business Machines Corporation
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk