Patents by Inventor Vijayalakshmi Srinivasan

Vijayalakshmi Srinivasan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190236113
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: April 11, 2019
    Publication date: August 1, 2019
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190213132
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Application
    Filed: March 14, 2019
    Publication date: July 11, 2019
    Inventors: Harold W. CAIN, III, Vijayalakshmi SRINIVASAN, Jason ZEBCHUK
  • Publication number: 20190188240
    Abstract: A convolutional lowering component (CoLor component) between processor and memory units (or within a memory hierarchy) maps location in a lowered matrix to an equivalent location in a non-lowered matrix and provides auto zero padding in computational heavy convolutional layers. An identification component identifies processing components that execute computations in deep neural networks (DNNs) in which convolutions are realized as general matrix to matrix multiplications (GEMM) operations, and identifies a subset of the processing components that store deep neural network (DNN) features in a non-lowered form component that determines output for successively larger neural networks of a set. An address translation component translates address requests, generated by the subset of processing components to a memory subsystem, from a lowered index form to a non-lowered index form.
    Type: Application
    Filed: December 18, 2017
    Publication date: June 20, 2019
    Inventors: Jungwook Choi, Bruce Fleischer, Vijayalakshmi Srinivasan, Swagath Venkataramani
  • Patent number: 10268588
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Grant
    Filed: October 10, 2017
    Date of Patent: April 23, 2019
    Assignee: International Business Machines Corporation
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
  • Patent number: 10261978
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: December 14, 2017
    Date of Patent: April 16, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10241972
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Grant
    Filed: March 16, 2017
    Date of Patent: March 26, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20190080232
    Abstract: A Deep Neural Networks (DNN) analysis method, system, and computer program product include characterizing a space of possible configurations for a DNN, evaluating a metric-of-interest for a configuration of the possible configurations, and searching the space to identify a configuration of the possible configurations that maximizes the metric-of-interest.
    Type: Application
    Filed: September 8, 2017
    Publication date: March 14, 2019
    Inventors: Jungwook Choi, Vijayalakshmi Srinivasan, Swagath Venkataramani
  • Patent number: 10120685
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a course grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: Hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
  • Publication number: 20180300141
    Abstract: In a decode stage of hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses.
    Type: Application
    Filed: June 16, 2018
    Publication date: October 18, 2018
    Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
  • Publication number: 20180267936
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: March 16, 2017
    Publication date: September 20, 2018
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Publication number: 20180267938
    Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
    Type: Application
    Filed: December 14, 2017
    Publication date: September 20, 2018
    Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
  • Patent number: 10007523
    Abstract: In a decode stage of hardware processor pipeline, one particular instruction of a plurality of instructions is decoded. It is determined that the particular instruction requires a memory access. Responsive to such determination, it is predicted whether the memory access will result in a cache miss. The predicting in turn includes accessing one of a plurality of entries in a pattern history table stored as a hardware table in the decode stage. The accessing is based, at least in part, upon at least a most recent entry in a global history buffer. The pattern history table stores a plurality of predictions. The global history buffer stores actual results of previous memory accesses as one of cache hits and cache misses.
    Type: Grant
    Filed: May 2, 2011
    Date of Patent: June 26, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Vijayalakshmi Srinivasan, Brian R. Prasky
  • Publication number: 20180032438
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Application
    Filed: October 10, 2017
    Publication date: February 1, 2018
    Inventors: Harold W. CAIN, III, Vijayalakshmi SRINIVASAN, Jason ZEBCHUK
  • Patent number: 9804967
    Abstract: A scheme referred to as a “Region-based cache restoration prefetcher” (RECAP) is employed for cache preloading on a partition or a context switch. The RECAP exploits spatial locality to provide a bandwidth-efficient prefetcher to reduce the “cold” cache effect caused by multiprogrammed virtualization. The RECAP groups cache blocks into coarse-grain regions of memory, and predicts which regions contain useful blocks that should be prefetched the next time the current virtual machine executes. Based on these predictions, and using a simple compression technique that also exploits spatial locality, the RECAP provides a robust prefetcher that improves performance without excessive bandwidth overhead or slowdown.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: October 31, 2017
    Assignee: International Business Machines Corporation
    Inventors: Harold W. Cain, III, Vijayalakshmi Srinivasan, Jason Zebchuk
  • Patent number: 9766937
    Abstract: Embodiments relate to thread-based cache content savings for task switching in a computer processor. An aspect includes determining a cache entry in a cache of the computer processor that is owned by the first thread, wherein the determination is made based on a hardware thread identifier (ID) of the first thread matching a hardware thread ID in the cache entry. Another aspect includes determining whether the determined cache entry is eligible for prefetching. Yet another aspect includes, based on determining that the determined cache entry is eligible for prefetching, setting a marker in the cache entry to active.
    Type: Grant
    Filed: August 26, 2016
    Date of Patent: September 19, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Harold W. Cain, III, David M. Daly, Brian R. Prasky, Vijayalakshmi Srinivasan
  • Patent number: 9760489
    Abstract: A mechanism is provided for memory coherence in a multiple processor system. Responsive to a memory operation from a processing core of the multiple processor system resulting in a cache miss, the mechanism checks a private region table associated with the processing core. The memory operation attempts to access a memory region. Responsive to determining the memory region corresponds to an entry in the private region table, the mechanism performs a remote memory controller snoop of a remote memory controller without snooping the multiple processor system.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: September 12, 2017
    Assignee: International Business Machines Corporation
    Inventors: David M. Daly, Vijayalakshmi Srinivasan
  • Patent number: 9760490
    Abstract: A mechanism is provided for memory coherence in a multiple processor system. Responsive to a memory operation from a processing core of the multiple processor system resulting in a cache miss, the mechanism checks a private region table associated with the processing core. The memory operation attempts to access a memory region. Responsive to determining the memory region corresponds to an entry in the private region table, the mechanism performs a remote memory controller snoop of a remote memory controller without snooping the multiple processor system.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: September 12, 2017
    Assignee: International Business Machines Corporation
    Inventors: David M. Daly, Vijayalakshmi Srinivasan
  • Patent number: 9740497
    Abstract: A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.
    Type: Grant
    Filed: October 15, 2013
    Date of Patent: August 22, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Alper Buyuktosunoglu, Xiaochen Guo, Hillery C. Hunter, Jude A. Rivers, Vijayalakshmi Srinivasan
  • Patent number: 9740496
    Abstract: A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: August 22, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Pradip Bose, Alper Buyuktosunoglu, Xiaochen Guo, Hillery C. Hunter, Jude A. Rivers, Vijayalakshmi Srinivasan
  • Publication number: 20170212824
    Abstract: The disclosed herein relates to a method of dynamic simultaneous multithreading metering for a plurality of independent threads being multithreaded. The method is executable by a processor. The method includes collecting attributes from processor and building a model utilizing the attributes. The method also includes performing the dynamic simultaneous multithreading metering in accordance with the model to output metering estimates for a first thread of the plurality of independent threads being multithreaded and updating the model based on the metering estimates.
    Type: Application
    Filed: January 21, 2016
    Publication date: July 27, 2017
    Inventors: EMRAH ACAR, JANE H. BARTIK, ALPER BUYUKTOSUNOGLU, BRIAN R. PRASKY, VIJAYALAKSHMI SRINIVASAN, JOHN-DAVID WELLMAN