Patents by Inventor Vijayalakshmi Srinivasan

Vijayalakshmi Srinivasan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MATRIX MULTIPLICATION ON A SYSTOLIC ARRAY

Publication number: 20190236113

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Application

Filed: April 11, 2019

Publication date: August 1, 2019

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
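
To illustrate the dataflow this abstract describes, the sketch below simulates a weight-stationary systolic array cycle by cycle. Each processing element (PE) holds one value of the first matrix in its register. Values of the second matrix stream in from the left with a per-row skew, and partial sums of the output matrix stream down and out of the bottom row. This is a minimal sketch under stated assumptions, not the claimed method. The function name `systolic_matmul`, the weight-stationary layout, and the use of whole integers instead of individual data bits are all our own choices.

```python
from typing import List

Matrix = List[List[int]]

def systolic_matmul(x: Matrix, w: Matrix) -> Matrix:
    """Simulate a K x N weight-stationary systolic array computing x @ w (assumed layout)."""
    m_rows, k_dim, n_cols = len(x), len(w), len(w[0])
    assert all(len(row) == k_dim for row in x), "inner dimensions must match"
    # Output registers of every PE from the previous cycle.
    a_reg = [[0] * n_cols for _ in range(k_dim)]   # activation passed to the right
    p_reg = [[0] * n_cols for _ in range(k_dim)]   # partial sum passed downward
    out = [[0] * n_cols for _ in range(m_rows)]
    for t in range(m_rows + k_dim + n_cols - 2):
        new_a = [[0] * n_cols for _ in range(k_dim)]
        new_p = [[0] * n_cols for _ in range(k_dim)]
        for k in range(k_dim):
            for n in range(n_cols):
                if n == 0:
                    m = t - k  # row k of the array is fed k cycles late (skew)
                    new_a[k][n] = x[m][k] if 0 <= m < m_rows else 0
                else:
                    new_a[k][n] = a_reg[k][n - 1]
                p_in = p_reg[k - 1][n] if k > 0 else 0
                # Multiply-accumulate with the stationary value held in this PE.
                new_p[k][n] = p_in + new_a[k][n] * w[k][n]
        a_reg, p_reg = new_a, new_p
        # A finished partial sum leaves the bottom row of each column.
        for n in range(n_cols):
            m = t - (k_dim - 1) - n
            if 0 <= m < m_rows:
                out[m][n] = p_reg[k_dim - 1][n]
    return out
```

The skew (array row k starts k cycles late) is what makes each partial sum meet the matching activation at every PE it passes through. With this schedule, an M x K by K x N product drains after M + K + N - 2 cycles.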

METHODS OF CACHE PRELOADING ON A PARTITION OR A CONTEXT SWITCH

Publication number: 20190213132

Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.

Type: Application

Filed: March 14, 2019

Publication date: July 11, 2019

Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
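
The region-grouping and compression ideas can be shown with a small model. Block accesses are recorded as one bitmask per coarse-grain region, so a region is stored as a (region number, bitmask) pair instead of a list of block addresses. On restore, the masks are expanded back into prefetch addresses. This is a hypothetical sketch, not RECAP itself. The block and region sizes, the `RegionTracker` class, and the rule that only regions with at least `min_blocks` touched blocks are predicted useful are all assumptions, because the abstract does not give the actual prediction policy.

```python
from typing import Dict, List

BLOCK_SIZE = 64                          # bytes per cache block (assumed)
REGION_BLOCKS = 16                       # blocks per coarse-grain region (assumed)
REGION_SIZE = BLOCK_SIZE * REGION_BLOCKS

class RegionTracker:
    """Hypothetical per-VM record of touched blocks, one bitmask per region."""

    def __init__(self) -> None:
        self.masks: Dict[int, int] = {}

    def access(self, addr: int) -> None:
        region, offset = divmod(addr, REGION_SIZE)
        bit = offset // BLOCK_SIZE
        self.masks[region] = self.masks.get(region, 0) | (1 << bit)

    def save(self, min_blocks: int = 2) -> Dict[int, int]:
        """Compressed snapshot: keep only regions predicted useful (dense enough)."""
        return {r: m for r, m in self.masks.items() if bin(m).count("1") >= min_blocks}

def restore(saved: Dict[int, int]) -> List[int]:
    """Expand (region, bitmask) pairs into block addresses to prefetch."""
    addrs = []
    for region, mask in sorted(saved.items()):
        for bit in range(REGION_BLOCKS):
            if (mask >> bit) & 1:
                addrs.append(region * REGION_SIZE + bit * BLOCK_SIZE)
    return addrs
```

With 16 blocks per region, a dense region in this sketch costs one region number and a 16-bit mask instead of up to 16 separate addresses. This is how spatial locality keeps the saved state, and the prefetch traffic it drives, small.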

PROCESSOR AND MEMORY TRANSPARENT CONVOLUTIONAL LOWERING AND AUTO ZERO PADDING FOR DEEP NEURAL NETWORK IMPLEMENTATIONS

Publication number: 20190188240

Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps a location in a lowered matrix to the equivalent location in a non-lowered matrix and provides auto zero padding in computationally heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix-to-matrix multiplication (GEMM) operations, and identifies a subset of the processing components that store DNN features in a non-lowered form. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.

Type: Application

Filed: December 18, 2017

Publication date: June 20, 2019

Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
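
At its core, convolutional lowering (often called im2col) is an index mapping. Each row of the lowered matrix is one output pixel, and each column is one (channel, kernel row, kernel column) tap. A minimal sketch of the translation the abstract describes appears below. It assumes a channel-row-column input layout, a single output channel, and plain Python lists. The mapping is computed on the fly, so the lowered matrix is never materialized, and taps that fall in the padding border return zero instead of reading memory. The names `lowered_to_input`, `read_lowered`, and `conv_via_gemm` are hypothetical.

```python
from typing import List, Optional, Tuple

Tensor3 = List[List[List[float]]]  # input feature map, indexed [channel][row][col]

def lowered_to_input(row: int, col: int, H: int, W: int, KH: int, KW: int,
                     stride: int = 1, pad: int = 0) -> Optional[Tuple[int, int, int]]:
    """Translate a lowered (im2col) index to a (c, h, w) location in the input.

    Rows of the lowered matrix are output pixels (oh, ow); columns are taps
    (c, kh, kw). Returns None for taps that land in the zero-padding border.
    """
    out_w = (W + 2 * pad - KW) // stride + 1
    oh, ow = divmod(row, out_w)
    c, rem = divmod(col, KH * KW)
    kh, kw = divmod(rem, KW)
    h = oh * stride + kh - pad
    w = ow * stride + kw - pad
    if 0 <= h < H and 0 <= w < W:
        return c, h, w
    return None

def read_lowered(x: Tensor3, row: int, col: int, KH: int, KW: int,
                 stride: int = 1, pad: int = 0) -> float:
    """Serve a read of the lowered matrix from the non-lowered tensor."""
    loc = lowered_to_input(row, col, len(x[0]), len(x[0][0]), KH, KW, stride, pad)
    if loc is None:
        return 0.0  # auto zero padding: no memory access needed
    c, h, w = loc
    return x[c][h][w]

def conv_via_gemm(x: Tensor3, kernel: Tensor3, stride: int = 1, pad: int = 0):
    """Single-output-channel convolution as (lowered input) x (flattened kernel)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    KH, KW = len(kernel[0]), len(kernel[0][0])
    out_h = (H + 2 * pad - KH) // stride + 1
    out_w = (W + 2 * pad - KW) // stride + 1
    # Flatten the kernel in the same (c, kh, kw) order used for lowered columns.
    flat_k = [kernel[c][i][j] for c in range(C) for i in range(KH) for j in range(KW)]
    flat_out = [sum(read_lowered(x, r, col, KH, KW, stride, pad) * k
                    for col, k in enumerate(flat_k))
                for r in range(out_h * out_w)]
    return [flat_out[i * out_w:(i + 1) * out_w] for i in range(out_h)]
```

Because the kernel is flattened in the same (c, kh, kw) order used for the lowered columns, each output pixel is just a dot product of one lowered row with the flattened kernel. That is the GEMM formulation, served here directly from the non-lowered tensor.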

Methods of cache preloading on a partition or a context switch

Patent number: 10268588

Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.

Type: Grant

Filed: October 10, 2017

Date of Patent: April 23, 2019

Assignee: International Business Machines Corporation

Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk

Matrix multiplication on a systolic array

Patent number: 10261978

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Grant

Filed: December 14, 2017

Date of Patent: April 16, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang

Matrix multiplication on a systolic array

Patent number: 10241972

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Grant

Filed: March 16, 2017

Date of Patent: March 26, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang

DEEP NEURAL NETWORK PERFORMANCE ANALYSIS ON SHARED MEMORY ACCELERATOR SYSTEMS

Publication number: 20190080232

Abstract: A deep neural network (DNN) analysis method, system, and computer program product include characterizing a space of possible configurations for a DNN, evaluating a metric-of-interest for a configuration of the possible configurations, and searching the space to identify the configuration that maximizes the metric-of-interest.

Type: Application

Filed: September 8, 2017

Publication date: March 14, 2019

Inventors: Jungwook Choi, Vijayalakshmi Srinivasan, Swagath Venkataramani
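
The three steps in this abstract (characterize the configuration space, evaluate a metric-of-interest, search for the maximizing configuration) map onto a generic search loop. The sketch below uses exhaustive enumeration. The abstract does not say how the space is searched, so exhaustive search is our choice, and a real tool might prune or use heuristics instead. The parameter names (`batch`, `accelerators`) and the `toy_throughput` cost model are hypothetical stand-ins for a real performance model of a shared-memory accelerator system.

```python
import itertools
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Config = Dict[str, int]

def config_space(options: Dict[str, List[int]]) -> Iterator[Config]:
    """Characterize the space: every combination of the per-parameter options."""
    keys = list(options)
    for values in itertools.product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))

def search(options: Dict[str, List[int]],
           metric: Callable[[Config], float]) -> Tuple[Optional[Config], float]:
    """Evaluate the metric-of-interest everywhere and keep the best configuration."""
    best_cfg: Optional[Config] = None
    best_val = float("-inf")
    for cfg in config_space(options):
        val = metric(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

def toy_throughput(cfg: Config) -> float:
    """Hypothetical model: samples per unit time for one step.

    Compute time shrinks as work is split across accelerators, while traffic
    to the shared memory grows with both batch size and accelerator count.
    """
    compute = cfg["batch"] / cfg["accelerators"]
    shared_memory = 0.5 * cfg["accelerators"] + 0.1 * cfg["batch"]
    return cfg["batch"] / (compute + shared_memory)

options = {"batch": [1, 2, 4, 8, 16], "accelerators": [1, 2, 4]}
best_cfg, best_value = search(options, toy_throughput)
```

Keeping the metric as a plain callable means the same loop works for any metric-of-interest (throughput, latency negated, energy efficiency) without changes to the search itself.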

Tightly coupled processor arrays using coarse grained reconfigurable architecture with iteration level commits

Patent number: 10120685

Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a coarse-grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes hardware structures that connect all of the multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which are in flight, and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and to initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.

Type: Grant

Filed: November 4, 2015

Date of Patent: November 6, 2018

Assignee: International Business Machines Corporation

Inventors: Chia-Yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
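
The issue rule in the abstract (among ready iterations, prefer the lowest iteration number) and the LSU bookkeeping it relies on can be modeled in a few lines. This is a sketch only. The in-flight `window`, the `IterationTracker` and `select_iteration` names, and in-order iteration-level commit are our assumptions. The abstract is also cut off mid-sentence, so letting a higher-numbered ready iteration issue when a lower one is stalled is an assumption as well, not something the text states.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

def select_iteration(ready: Set[int], stalled: Set[int]) -> Optional[int]:
    """Pick the lowest-numbered iteration that is ready and not stalled this cycle."""
    candidates = ready - stalled
    return min(candidates) if candidates else None

@dataclass
class IterationTracker:
    """Hypothetical LSU bookkeeping: not started, in flight, completed, committed."""
    total: int                      # loop trip count
    window: int                     # max iterations allowed in flight at once
    next_to_start: int = 0
    in_flight: Set[int] = field(default_factory=set)
    completed: Set[int] = field(default_factory=set)
    committed_upto: int = 0         # iterations below this number have committed

    def start_ready(self) -> None:
        """Admit new iterations while the in-flight window has room."""
        while len(self.in_flight) < self.window and self.next_to_start < self.total:
            self.in_flight.add(self.next_to_start)
            self.next_to_start += 1

    def finish(self, iteration: int) -> None:
        """Mark an iteration done, then commit completed iterations in order."""
        self.in_flight.discard(iteration)
        self.completed.add(iteration)
        while self.committed_upto in self.completed:
            self.committed_upto += 1
```

In this model, iteration 1 may finish before iteration 0, but neither commits until iteration 0 is done. This is one plausible reading of "iteration level commits" in the title.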

PREDICTING CACHE MISSES USING DATA ACCESS BEHAVIOR AND INSTRUCTION ADDRESS

Publication number: 20180300141

Abstract: In a decode stage of a hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least the most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores the actual results of previous memory accesses as either cache hits or cache misses.

Type: Application

Filed: June 16, 2018

Publication date: October 18, 2018

Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
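
The pattern history table (PHT) and global history buffer (GHB) work much like a branch predictor, with cache hit or miss taking the place of taken or not taken. The sketch below shows that idea. Two details are borrowed from branch prediction and are assumptions, not taken from the abstract. The first is a gshare-style index that XORs the instruction address with the recent hit/miss history. The second is 2-bit saturating counters in the PHT. The `MissPredictor` class and its sizes are hypothetical.

```python
class MissPredictor:
    """Hypothetical decode-stage cache-miss predictor.

    pht: pattern history table of 2-bit saturating counters (>= 2 means "miss").
    ghb: global history of recent outcomes, one bit per access (1 = miss).
    """

    def __init__(self, table_bits: int = 10, history_bits: int = 4) -> None:
        self.table_mask = (1 << table_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.pht = [1] * (1 << table_bits)  # start weakly predicting "hit"
        self.ghb = 0

    def _index(self, pc: int) -> int:
        # gshare-style hash of the instruction address and the global history.
        return ((pc >> 2) ^ self.ghb) & self.table_mask

    def predict_miss(self, pc: int) -> bool:
        return self.pht[self._index(pc)] >= 2

    def update(self, pc: int, missed: bool) -> None:
        """Train on the actual outcome, then shift it into the global history."""
        i = self._index(pc)  # index with the history that was used to predict
        if missed:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        self.ghb = ((self.ghb << 1) | int(missed)) & self.history_mask
```

Because the index depends on both the instruction address and the most recent outcomes, the same load can receive different predictions depending on the access pattern leading up to it.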

MATRIX MULTIPLICATION ON A SYSTOLIC ARRAY

Publication number: 20180267936

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Application

Filed: March 16, 2017

Publication date: September 20, 2018

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang

MATRIX MULTIPLICATION ON A SYSTOLIC ARRAY

Publication number: 20180267938

Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.

Type: Application

Filed: December 14, 2017

Publication date: September 20, 2018

Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang

Predicting cache misses using data access behavior and instruction address

Patent number: 10007523

Abstract: In a decode stage of a hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least the most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores the actual results of previous memory accesses as either cache hits or cache misses.

Type: Grant

Filed: May 2, 2011

Date of Patent: June 26, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky

METHODS OF CACHE PRELOADING ON A PARTITION OR A CONTEXT SWITCH

Publication number: 20180032438

Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.

Type: Application

Filed: October 10, 2017

Publication date: February 1, 2018

Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk

Methods of cache preloading on a partition or a context switch

Patent number: 9804967

Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.

Type: Grant

Filed: December 12, 2016

Date of Patent: October 31, 2017

Assignee: International Business Machines Corporation

Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk

Thread-based cache content saving for task switching

Patent number: 9766937

Abstract: Embodiments relate to thread-based cache content saving for task switching in a computer processor. An aspect includes determining a cache entry in a cache of the computer processor that is owned by a first thread, wherein the determination is made based on a hardware thread identifier (ID) of the first thread matching a hardware thread ID in the cache entry. Another aspect includes determining whether the determined cache entry is eligible for prefetching. Yet another aspect includes, based on determining that the determined cache entry is eligible for prefetching, setting a marker in the cache entry to active.

Type: Grant

Filed: August 26, 2016

Date of Patent: September 19, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Harold W. Cain, III, David M. Daly, Brian R. Prasky, Vijayalakshmi Srinivasan
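
The three aspects in this abstract (match ownership by hardware thread ID, check eligibility, set a marker) can be sketched as a single pass over the cache. The eligibility rule here ("valid and recently used") is our assumption, since the abstract does not define it. Returning the marked addresses as the content to prefetch when the task is switched back in is also an assumption. `CacheEntry`, `is_eligible`, and `mark_for_task_switch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CacheEntry:
    address: int
    thread_id: int                # hardware thread ID of the owning thread
    valid: bool = True
    recently_used: bool = False   # hypothetical eligibility signal
    marker: bool = False          # "save on task switch" marker

def is_eligible(entry: CacheEntry) -> bool:
    """Assumed eligibility rule: valid lines that were recently used."""
    return entry.valid and entry.recently_used

def mark_for_task_switch(cache: List[CacheEntry], thread_id: int) -> List[int]:
    """Set the marker on eligible entries owned by the thread; return their addresses.

    The returned list stands in for the saved cache content that would be
    prefetched when the thread's task is switched back in.
    """
    saved = []
    for entry in cache:
        if entry.thread_id == thread_id and is_eligible(entry):
            entry.marker = True
            saved.append(entry.address)
    return saved
```

In this sketch, entries owned by other hardware threads are never touched, so one thread's task switch does not disturb the saved state of its SMT siblings.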

Private memory table for reduced memory coherence traffic

Patent number: 9760489

Abstract: A mechanism is provided for memory coherence in a multiple processor system. Responsive to a memory operation from a processing core of the multiple processor system resulting in a cache miss, the mechanism checks a private region table associated with the processing core. The memory operation attempts to access a memory region. Responsive to determining the memory region corresponds to an entry in the private region table, the mechanism performs a remote memory controller snoop of a remote memory controller without snooping the multiple processor system.

Type: Grant

Filed: July 15, 2016

Date of Patent: September 12, 2017

Assignee: International Business Machines Corporation

Inventors: David M. Daly, Vijayalakshmi Srinivasan
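
The traffic saving comes from one lookup on the miss path. If the missed address falls in a region the core's private region table lists, only the remote memory controller is snooped. Otherwise, the usual system-wide snoop is needed. The sketch below counts messages under those two cases. It is hypothetical: the region size, the LRU-managed table capacity, and the one-message-per-target cost model are assumptions, and the abstract does not say how regions are first classified as private.

```python
from collections import OrderedDict

REGION_SIZE = 4096  # bytes per tracked memory region (assumed)

class PrivateRegionTable:
    """Hypothetical per-core table of regions only this core is using, LRU-managed."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.regions: "OrderedDict[int, None]" = OrderedDict()

    def add(self, region: int) -> None:
        self.regions[region] = None
        self.regions.move_to_end(region)
        if len(self.regions) > self.capacity:
            self.regions.popitem(last=False)  # evict the least recently used region

    def contains(self, addr: int) -> bool:
        region = addr // REGION_SIZE
        if region in self.regions:
            self.regions.move_to_end(region)
            return True
        return False

def coherence_messages_on_miss(table: PrivateRegionTable, addr: int, num_cores: int) -> int:
    """Count snoop messages for one cache miss (assumed cost model).

    Private region: a single snoop to the remote memory controller.
    Otherwise: a snoop to every other core plus the memory controller.
    """
    if table.contains(addr):
        return 1
    return (num_cores - 1) + 1
```

Under this cost model, the saving grows with the core count: on an 8-core system, a miss to a private region sends 1 message instead of 8.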

Private memory table for reduced memory coherence traffic

Patent number: 9760490

Abstract: A mechanism is provided for memory coherence in a multiple processor system. Responsive to a memory operation from a processing core of the multiple processor system resulting in a cache miss, the mechanism checks a private region table associated with the processing core. The memory operation attempts to access a memory region. Responsive to determining the memory region corresponds to an entry in the private region table, the mechanism performs a remote memory controller snoop of a remote memory controller without snooping the multiple processor system.

Type: Grant

Filed: July 15, 2016

Date of Patent: September 12, 2017

Assignee: International Business Machines Corporation

Inventors: David M. Daly, Vijayalakshmi Srinivasan

Processor with memory-embedded pipeline for table-driven computation

Patent number: 9740497

Abstract: A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.

Type: Grant

Filed: October 15, 2013

Date of Patent: August 22, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Pradip Bose, Alper Buyuktosunoglu, Xiaochen Guo, Hillery C. Hunter, Jude A. Rivers, Vijayalakshmi Srinivasan
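
A reuse table is a hardware form of memoization. If an instruction with the same opcode and operands was computed recently, its result is read from the table and the functional unit is skipped. The sketch below models a single-instruction version with a direct-mapped table. The table organization, its size, the `ReuseTable` name, and the (opcode, operand, operand) key are assumptions, and the abstract's option of one entry covering a set of instructions is not modeled.

```python
from typing import Callable, List, Optional, Tuple

Key = Tuple[str, int, int]

class ReuseTable:
    """Hypothetical direct-mapped table memoizing (opcode, operands) -> result."""

    def __init__(self, entries: int = 256) -> None:
        self.entries = entries
        self.slots: List[Optional[Tuple[Key, int]]] = [None] * entries
        self.hits = 0
        self.misses = 0

    def execute(self, op: str, a: int, b: int,
                functional_unit: Callable[[int, int], int]) -> int:
        key = (op, a, b)
        index = hash(key) % self.entries
        slot = self.slots[index]
        if slot is not None and slot[0] == key:
            self.hits += 1           # reuse: skip the functional unit entirely
            return slot[1]
        self.misses += 1
        result = functional_unit(a, b)
        self.slots[index] = (key, result)  # fill (or overwrite) the table entry
        return result
```

Storing the full key in each slot is what makes a hit safe. Two different operations that hash to the same slot simply overwrite each other instead of returning a wrong result.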

Processor with memory-embedded pipeline for table-driven computation

Patent number: 9740496

Abstract: A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.

Type: Grant

Filed: September 6, 2013

Date of Patent: August 22, 2017

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Pradip Bose, Alper Buyuktosunoglu, Xiaochen Guo, Hillery C. Hunter, Jude A. Rivers, Vijayalakshmi Srinivasan

DYNAMIC TUNING OF A SIMULTANEOUS MULTITHREADING METERING ARCHITECTURE

Publication number: 20170212824

Abstract: The disclosure herein relates to a method of dynamic simultaneous multithreading metering for a plurality of independent threads being multithreaded. The method is executable by a processor. The method includes collecting attributes from the processor and building a model utilizing the attributes. The method also includes performing the dynamic simultaneous multithreading metering in accordance with the model to output metering estimates for a first thread of the plurality of independent threads, and updating the model based on the metering estimates.

Type: Application

Filed: January 21, 2016

Publication date: July 27, 2017

Inventors: Emrah Acar, Jane H. Bartik, Alper Buyuktosunoglu, Brian R. Prasky, Vijayalakshmi Srinivasan, John-David Wellman
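
The collect-attributes, build-model, estimate, update loop in this abstract resembles an online regression. The sketch below uses a linear model trained with least-mean-squares (LMS) updates. Everything specific here is an assumption: the linear form, the LMS rule, and the idea that a reference measurement is occasionally available to compute the estimate's error (our reading of "updating the model based on the metering estimates"). Any attribute names a caller might use, such as per-thread IPC or co-runner occupancy, are also hypothetical.

```python
from typing import List, Sequence

class OnlineMeteringModel:
    """Hypothetical per-thread linear model: processor attributes -> metering estimate.

    Trained online with least-mean-squares (LMS) updates.
    """

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def estimate(self, attrs: Sequence[float]) -> float:
        """Metering estimate for one thread from its collected attributes."""
        return self.bias + sum(w * x for w, x in zip(self.weights, attrs))

    def update(self, attrs: Sequence[float], reference: float) -> float:
        """Nudge the model toward a reference measurement; return the estimate's error."""
        error = reference - self.estimate(attrs)
        step = self.learning_rate * error
        self.weights = [w + step * x for w, x in zip(self.weights, attrs)]
        self.bias += step
        return error
```

Because each update costs only a few multiply-adds, a model of this shape could in principle be retuned continuously while threads run. That is the "dynamic" part of the title under this sketch's assumptions.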
