Patents by Inventor Karthikeyan Sankaralingam

Karthikeyan Sankaralingam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11853244
    Abstract: A reconfigurable hardware accelerator for computers combines a high-speed dataflow processor, having programmable functional units rapidly reconfigured in a network of programmable switches, with a stream processor that may autonomously access memory in predefined access patterns after receiving simple stream instructions. The result is a compact, high-speed processor that may exploit parallelism associated with many application-specific programs susceptible to acceleration.
    Type: Grant
    Filed: January 26, 2017
    Date of Patent: December 26, 2023
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
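    Sketch: A minimal Python sketch of the stream-plus-dataflow idea in this abstract, assuming a toy flat memory, a StreamInstruction record (start, stride, length), and a "dataflow graph" reduced to a configurable chain of functional units; none of these names or structures are taken from the patent itself.
        # Illustrative model: a stream engine walks memory in a predefined pattern and
        # feeds values to a dataflow graph configured once at setup time.
        from dataclasses import dataclass

        @dataclass
        class StreamInstruction:
            start: int    # first index to access
            stride: int   # distance between accessed elements
            length: int   # number of elements to stream

        def stream_engine(memory, inst):
            """Autonomously generate the predefined access pattern."""
            for i in range(inst.length):
                yield memory[inst.start + i * inst.stride]

        def configure_dataflow(ops):
            """Return a tiny reconfigurable pipeline of functional units (unary ops)."""
            def run(value):
                for op in ops:
                    value = op(value)
                return value
            return run

        if __name__ == "__main__":
            memory = list(range(100))
            inst = StreamInstruction(start=0, stride=4, length=8)
            graph = configure_dataflow([lambda x: x * 3, lambda x: x + 1])
            print([graph(v) for v in stream_engine(memory, inst)])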
  • Patent number: 11513805
    Abstract: A computer architecture employs multiple special-purpose processors having different affinities for program execution to execute substantial portions of general-purpose programs to provide improved performance with respect to a general-purpose processor executing the general-purpose program alone.
    Type: Grant
    Filed: August 19, 2016
    Date of Patent: November 29, 2022
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki
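    Sketch: One plausible, simplified reading of the affinity idea above: program regions carry a few traits, and a dispatcher sends each region to the engine whose strengths best match them. The trait names, engine names, and weights below are invented for illustration only.
        # Illustrative dispatcher that routes program regions to processors by affinity.
        REGIONS = [
            {"name": "dense_loop",    "traits": {"data_parallel": 0.9, "irregular_memory": 0.1}},
            {"name": "pointer_chase", "traits": {"data_parallel": 0.1, "irregular_memory": 0.8}},
        ]

        ENGINES = {
            "simd_engine":    {"data_parallel": 1.0, "irregular_memory": 0.0},
            "latency_engine": {"data_parallel": 0.0, "irregular_memory": 1.0},
            "general_core":   {"data_parallel": 0.3, "irregular_memory": 0.3},
        }

        def affinity(traits, profile):
            # Higher score = better match between a region and an engine's strengths.
            return sum(traits.get(k, 0.0) * w for k, w in profile.items())

        def dispatch(region):
            return max(ENGINES, key=lambda e: affinity(region["traits"], ENGINES[e]))

        if __name__ == "__main__":
            for r in REGIONS:
                print(r["name"], "->", dispatch(r))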
  • Patent number: 11151077
    Abstract: A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide the accessed data to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: October 19, 2021
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
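    Sketch: A minimal sketch contrasting this design with the reconfigurable variant above: here the dataflow element computes one fixed function (a dot product, chosen purely for illustration) while the stream processor supplies its operands from simple stream instructions.
        # Illustrative model: two streams feed a stand-alone, fixed-program dataflow element.
        def stream(memory, start, stride, length):
            """Predefined access pattern issued from one simple stream instruction."""
            return [memory[start + i * stride] for i in range(length)]

        def fixed_dot_product(a_vals, b_vals):
            """Fixed-function dataflow element: always computes a dot product."""
            return sum(x * y for x, y in zip(a_vals, b_vals))

        if __name__ == "__main__":
            memory = [float(i) for i in range(64)]
            a = stream(memory, start=0,  stride=1, length=16)  # contiguous stream
            b = stream(memory, start=32, stride=2, length=16)  # strided stream
            print(fixed_dot_product(a, b))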
  • Patent number: 11048661
    Abstract: A dataflow accelerator including a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA) according to an exemplary embodiment is disclosed. The scratchpad may include a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA may receive data from the input vector port interface and includes a plurality of interconnects and a plurality of functional units.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: June 29, 2021
    Assignee: SimpleMachines Inc.
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
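    Sketch: A rough Python sketch of the data path named in this abstract, assuming a list-backed scratchpad, a FIFO standing in for the input vector port interface, and a two-stage function standing in for the CGRA; the class names and the add-then-scale stages are illustrative assumptions, not the patented design.
        # Illustrative model: a scratchpad write controller pushes a vector through an
        # input vector port into a toy CGRA (two functional units joined by an interconnect).
        class InputVectorPort:
            def __init__(self):
                self.fifo = []
            def push(self, vector):
                self.fifo.append(vector)
            def pop(self):
                return self.fifo.pop(0)

        class Scratchpad:
            def __init__(self, data, port):
                self.data = data
                self.port = port
            def write_controller(self, offset, width):
                """Transmit one vector of scratchpad data to the input vector port."""
                self.port.push(self.data[offset:offset + width])

        def toy_cgra(vector):
            stage1 = [v + 1 for v in vector]   # functional unit 1
            stage2 = [v * 2 for v in stage1]   # functional unit 2, fed via the interconnect
            return stage2

        if __name__ == "__main__":
            port = InputVectorPort()
            spad = Scratchpad(list(range(16)), port)
            spad.write_controller(offset=4, width=4)
            print(toy_cgra(port.pop()))   # [10, 12, 14, 16]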
  • Patent number: 11042797
    Abstract: According to exemplary embodiments, a method, processor, and system for accelerating a recurrent neural network are presented. A method of accelerating a recurrent neural network may include distributing, from a first master core to each of a plurality of processing cores, the same relative one or more columns of weight matrix data for each of a plurality of gates in the neural network, broadcasting a current input vector from the first master core to each of the processing cores, and processing each column of weight matrix data in parallel at each of the respective processing cores.
    Type: Grant
    Filed: January 6, 2020
    Date of Patent: June 22, 2021
    Assignee: SimpleMachines Inc.
    Inventors: Karthikeyan Sankaralingam, Yunfeng Li, Vinay Gangadhar, Anthony Nowatzki
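    Sketch: One plausible reading of the column-distribution scheme, shown for a single gate: the master assigns each core the same relative slice of columns, broadcasts the input vector, each core forms a partial sum over its columns, and the partials are reduced. Where the reduction happens is not taken from the patent; it is an assumption of this sketch.
        # Illustrative column-split matrix-vector product for one gate of a recurrent network.
        def matvec_columns(weight, x, cols):
            """Partial product using only the given column indices of the weight matrix."""
            partial = [0.0] * len(weight)
            for j in cols:
                for i in range(len(weight)):
                    partial[i] += weight[i][j] * x[j]
            return partial

        def accelerate_gate(weight, x, num_cores):
            n_cols = len(weight[0])
            # "Distribute": core c owns columns c, c + num_cores, c + 2*num_cores, ...
            assignments = [range(c, n_cols, num_cores) for c in range(num_cores)]
            # "Broadcast" x and process each core's columns (sequential here, parallel in hardware).
            partials = [matvec_columns(weight, x, cols) for cols in assignments]
            # Reduce the partial sums into the gate's pre-activation vector.
            return [sum(p[i] for p in partials) for i in range(len(weight))]

        if __name__ == "__main__":
            W = [[1.0, 2.0, 3.0, 4.0],
                 [5.0, 6.0, 7.0, 8.0]]
            x = [1.0, 1.0, 1.0, 1.0]
            print(accelerate_gate(W, x, num_cores=2))   # [10.0, 26.0]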
  • Patent number: 10963384
    Abstract: A method for performing acceleration of simultaneous access to shared data may include providing a plurality of groups of cores and a plurality of shared memory structures, providing a pod comprising the plurality of groups of cores linked by a common broadcast channel, and coordinating each shared memory structure to provide a logically unified memory structure. Each memory structure may be associated with a group of cores, and each group of cores may include one or more cores. The common broadcast channel may be operatively coupled to each shared memory structure. The coordinating each shared memory structure may include identifying a simultaneous read-reuse load to a first shared memory structure, fetching data corresponding to the simultaneous read-reuse load, and forwarding the data to shared memory structures other than the first shared memory structure and to groups of cores other than a first group of cores via the broadcast channel.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: March 30, 2021
    Assignee: SimpleMachines Inc.
    Inventors: Karthikeyan Sankaralingam, Vinay Gangadhar, Anthony Nowatzki, Yunfeng Li
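    Sketch: A small Python model of the read-reuse broadcast described above, assuming a pod of three shared-memory slices and a broadcast step reduced to a loop; the SharedSlice and Pod names, and the dict-based memory, are assumptions for illustration.
        # Illustrative model: one fetch satisfies a simultaneous read-reuse load for every
        # group of cores in the pod via the common broadcast channel.
        class SharedSlice:
            def __init__(self, owned_data):
                self.data = dict(owned_data)   # addresses this slice owns
                self.forwarded = {}            # lines received over the broadcast channel

        class Pod:
            def __init__(self, slices):
                self.slices = slices
            def read_reuse_load(self, owner_idx, addr):
                """Fetch once from the owning slice, then forward to every other slice."""
                value = self.slices[owner_idx].data[addr]
                for i, s in enumerate(self.slices):
                    if i != owner_idx:
                        s.forwarded[addr] = value   # one fetch, many consumers
                return value

        if __name__ == "__main__":
            pod = Pod([SharedSlice({0: 111}), SharedSlice({100: 222}), SharedSlice({200: 333})])
            pod.read_reuse_load(owner_idx=0, addr=0)
            print([s.forwarded for s in pod.slices])   # slices 1 and 2 now hold address 0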
  • Patent number: 10936536
    Abstract: Aspects of the present invention provide a memory system comprising a plurality of stacked memory layers, each memory layer divided into memory sections, wherein each memory section connects to a neighboring memory section in an adjacent memory layer, and a logic layer stacked among the plurality of memory layers, the logic layer divided into logic sections, each logic section including a memory processing core, wherein each logic section connects to a neighboring memory section in an adjacent memory layer to form a memory vault of connected logic and memory sections, and wherein each logic section is configured to communicate directly or indirectly with a host processor. Accordingly, each memory processing core may be configured to respond to a procedure call from the host processor by processing data stored in its respective memory vault and providing a result to the host processor. As a result, increased performance may be provided.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: March 2, 2021
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Jaikrishnan Menon, Lorenzo De Carli
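    Sketch: A minimal sketch of the vault idea, assuming each vault is reduced to a Python list plus a core that answers simple procedure calls; the "sum" and "max" operations are placeholders, not operations named in the patent.
        # Illustrative model: each memory vault pairs a memory section with a processing core;
        # the host issues one procedure call per vault and collects per-vault results.
        class MemoryVault:
            def __init__(self, vault_id, data):
                self.vault_id = vault_id
                self.data = data               # this vault's slice of memory
            def procedure_call(self, op):
                """The vault's core processes only data stored in its own vault."""
                if op == "sum":
                    return sum(self.data)
                if op == "max":
                    return max(self.data)
                raise ValueError("unsupported op")

        class Host:
            def __init__(self, vaults):
                self.vaults = vaults
            def offload(self, op):
                # The host sees only per-vault results, not the raw data movement.
                return {v.vault_id: v.procedure_call(op) for v in self.vaults}

        if __name__ == "__main__":
            vaults = [MemoryVault(i, list(range(i * 10, i * 10 + 10))) for i in range(4)]
            print(Host(vaults).offload("sum"))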
  • Patent number: 10754744
    Abstract: The amount of speed-up that can be obtained by optimizing a program to run on a different architecture is determined from static measurements of the program. Multiple such static measurements are processed by a machine learning system after being discretized to trade precision for accuracy. Static analysis requires less analysis overhead and permits analysis of individual program portions, helping to optimize the allocation of porting resources across a large program.
    Type: Grant
    Filed: March 15, 2016
    Date of Patent: August 25, 2020
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Newsha Ardalani, Urmish Thakker
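    Sketch: A toy version of the pipeline implied by this abstract: static measurements of a program portion are discretized into coarse bins and handed to a model that predicts a speedup band. The feature names, bin edges, and the rule standing in for the trained machine-learning model are all invented for illustration.
        # Illustrative static-feature discretization feeding a placeholder speedup predictor.
        import bisect

        FEATURES = ["branch_density", "memory_ops_fraction", "loop_depth"]
        BIN_EDGES = {
            "branch_density":      [0.05, 0.15, 0.30],
            "memory_ops_fraction": [0.20, 0.40, 0.60],
            "loop_depth":          [1, 2, 4],
        }

        def discretize(raw):
            """Map each raw static measurement to a coarse bin index."""
            return tuple(bisect.bisect(BIN_EDGES[f], raw[f]) for f in FEATURES)

        def predict_speedup_band(bins):
            """Stand-in for a trained model: deeper loops and fewer branches score higher."""
            branch_bin, mem_bin, loop_bin = bins
            score = loop_bin + mem_bin - branch_bin
            return ["low", "moderate", "high"][max(0, min(2, score // 2))]

        if __name__ == "__main__":
            portion = {"branch_density": 0.04, "memory_ops_fraction": 0.55, "loop_depth": 3}
            bins = discretize(portion)
            print(bins, "->", predict_speedup_band(bins))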
  • Publication number: 20200218965
    Abstract: According to exemplary embodiments, a method, processor, and system for accelerating a recurrent neural network are presented. A method of accelerating a recurrent neural network may include distributing, from a first master core to each of a plurality of processing cores, the same relative one or more columns of weight matrix data for each of a plurality of gates in the neural network, broadcasting a current input vector from the first master core to each of the processing cores, and processing each column of weight matrix data in parallel at each of the respective processing cores.
    Type: Application
    Filed: January 6, 2020
    Publication date: July 9, 2020
    Applicant: SimpleMachines Inc.
    Inventors: Karthikeyan Sankaralingam, Yunfeng Li, Vinay Gangadhar, Anthony Nowatzki
  • Publication number: 20200201690
    Abstract: A method for performing acceleration of simultaneous access to shared data may include providing a plurality of groups of cores and a plurality of shared memory structures, providing a pod comprising the plurality of groups of cores linked by a common broadcast channel, and coordinating each shared memory structure to provide a logically unified memory structure. Each memory structure may be associated with a group of cores, and each group of cores may include one or more cores. The common broadcast channel may be operatively coupled to each shared memory structure. The coordinating each shared memory structure may include identifying a simultaneous read-reuse load to a first shared memory structure, fetching data corresponding to the simultaneous read-reuse load, and forwarding the data to shared memory structures other than the first shared memory structure and to groups of cores other than a first group of cores via the broadcast channel.
    Type: Application
    Filed: December 18, 2019
    Publication date: June 25, 2020
    Applicant: SimpleMachines Inc.
    Inventors: Karthikeyan Sankaralingam, Vinay Gangadhar, Anthony Nowatzki, Yunfeng Li
  • Patent number: 10591983
    Abstract: A specialized memory access processor is placed between a main processor and accelerator hardware to handle memory access for the accelerator hardware. The architecture of the memory access processor is designed to allow lower energy memory accesses than can be obtained by the main processor in providing data to the hardware accelerator while providing the hardware accelerator with a sufficiently high bandwidth memory channel. In some embodiments, the main processor may enter a sleep state during accelerator calculations to substantially lower energy consumption.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: March 17, 2020
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Chen-Han Ho, Karthikeyan Sankaralingam, Sung Kim
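    Sketch: A compact sketch of the split described above, assuming the main processor's only job is to hand a list of access descriptors to a lean access engine and wait; the AccessDescriptor fields and the squared-sum "accelerator" are illustrative, not taken from the patent.
        # Illustrative model: a specialized access engine performs all loads on behalf of the
        # accelerator while the main processor could sit in a low-power state.
        from dataclasses import dataclass

        @dataclass
        class AccessDescriptor:
            start: int
            stride: int
            count: int

        def access_processor(memory, descriptors):
            """Stream operands to the accelerator according to the descriptors."""
            for d in descriptors:
                for i in range(d.count):
                    yield memory[d.start + i * d.stride]

        def accelerator(values):
            """Placeholder accelerator computation over the streamed operands."""
            return sum(v * v for v in values)

        if __name__ == "__main__":
            memory = list(range(256))
            descriptors = [AccessDescriptor(0, 2, 8), AccessDescriptor(128, 1, 8)]
            print(accelerator(access_processor(memory, descriptors)))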
  • Publication number: 20190317770
    Abstract: According to some embodiments, a dataflow accelerator comprises a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA). The scratchpad comprises a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA receives data from the input vector port interface and comprises a plurality of interconnects and a plurality of functional units.
    Type: Application
    Filed: April 15, 2019
    Publication date: October 17, 2019
    Applicant: SimpleMachines Inc.
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
  • Publication number: 20190258601
    Abstract: Aspects of the present invention provide a memory system comprising a plurality of stacked memory layers, each memory layer divided into memory sections, wherein each memory section connects to a neighboring memory section in an adjacent memory layer, and a logic layer stacked among the plurality of memory layers, the logic layer divided into logic sections, each logic section including a memory processing core, wherein each logic section connects to a neighboring memory section in an adjacent memory layer to form a memory vault of connected logic and memory sections, and wherein each logic section is configured to communicate directly or indirectly with a host processor. Accordingly, each memory processing core may be configured to respond to a procedure call from the host processor by processing data stored in its respective memory vault and providing a result to the host processor. As a result, increased performance may be provided.
    Type: Application
    Filed: April 30, 2019
    Publication date: August 22, 2019
    Inventors: Karthikeyan Sankaralingam, Jaikrishnan Menon, Lorenzo De Carli
  • Patent number: 10289604
    Abstract: Aspects of the present invention provide a memory system comprising a plurality of stacked memory layers, each memory layer divided into memory sections, wherein each memory section connects to a neighboring memory section in an adjacent memory layer, and a logic layer stacked among the plurality of memory layers, the logic layer divided into logic sections, each logic section including a memory processing core, wherein each logic section connects to a neighboring memory section in an adjacent memory layer to form a memory vault of connected logic and memory sections, and wherein each logic section is configured to communicate directly or indirectly with a host processor. Accordingly, each memory processing core may be configured to respond to a procedure call from the host processor by processing data stored in its respective memory vault and providing a result to the host processor. As a result, increased performance may be provided.
    Type: Grant
    Filed: August 7, 2014
    Date of Patent: May 14, 2019
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Jaikrishnan Menon, Lorenzo De Carli
  • Patent number: 10216693
    Abstract: A dataflow computer processor is teamed with a general computer processor so that program portions of an application program particularly suited to dataflow execution may be transferred to the dataflow processor during portions of the execution of the application program by the general computer processor. During this time the general computer processor may be placed in partial shutdown for energy conservation.
    Type: Grant
    Filed: July 30, 2015
    Date of Patent: February 26, 2019
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Anthony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam
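    Sketch: A minimal sketch of the hand-off, assuming the offloaded region is represented as a plain function and the live-in/live-out values as a dictionary; the element-wise multiply chosen as the "dataflow-friendly" region is an arbitrary example.
        # Illustrative model: the general-purpose core ships a region's live-in values to a
        # dataflow unit, idles (where partial shutdown would occur), then resumes with the results.
        def dataflow_unit(live_ins):
            """Stand-in for the dataflow processor executing the offloaded region."""
            a, b = live_ins["a"], live_ins["b"]
            return {"c": [x * y for x, y in zip(a, b)]}

        def general_core(program_state):
            live_ins = {"a": program_state["a"], "b": program_state["b"]}  # prepare live-ins
            live_outs = dataflow_unit(live_ins)      # offload; CPU could power down here
            program_state.update(live_outs)          # resume with the region's live-outs
            return program_state

        if __name__ == "__main__":
            state = {"a": [1, 2, 3], "b": [4, 5, 6]}
            print(general_core(state))   # adds "c": [4, 10, 18]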
  • Publication number: 20190004995
    Abstract: A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide the accessed data to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.
    Type: Application
    Filed: June 28, 2017
    Publication date: January 3, 2019
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
  • Publication number: 20180210730
    Abstract: A reconfigurable hardware accelerator for computers combines a high-speed dataflow processor, having programmable functional units rapidly reconfigured in a network of programmable switches, with a stream processor that may autonomously access memory in predefined access patterns after receiving simple stream instructions. The result is a compact, high-speed processor that may exploit parallelism associated with many application-specific programs susceptible to acceleration.
    Type: Application
    Filed: January 26, 2017
    Publication date: July 26, 2018
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
  • Publication number: 20180052693
    Abstract: A computer architecture employs multiple special-purpose processors having different affinities for program execution to execute substantial portions of general-purpose programs to provide improved performance with respect to a general-purpose processor executing the general-purpose program alone.
    Type: Application
    Filed: August 19, 2016
    Publication date: February 22, 2018
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki
  • Publication number: 20170270424
    Abstract: The amount of speed-up that can be obtained by optimizing a program to run on a different architecture is determined from static measurements of the program. Multiple such static measurements are processed by a machine learning system after being discretized to trade precision for accuracy. Static analysis requires less analysis overhead and permits analysis of individual program portions, helping to optimize the allocation of porting resources across a large program.
    Type: Application
    Filed: March 15, 2016
    Publication date: September 21, 2017
    Inventors: Karthikeyan Sankaralingam, Newsha Ardalani, Urmish Thakker
  • Patent number: 9619233
    Abstract: A computer architecture allows for simplified exception handling by restarting the program after exceptions at the beginning of idempotent regions, the idempotent regions allowing re-execution without the need for restoring complex state information from checkpoints. Recovery from mis-speculation may be provided by a similar mechanism but using smaller idempotent regions reflecting a more frequent occurrence of mis-speculation. A compiler generating different idempotent regions for speculation and exception handling is also disclosed.
    Type: Grant
    Filed: February 19, 2016
    Date of Patent: April 11, 2017
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Jaikrishnan Menon, Marc Asher De Kruijf, Karthikeyan Sankaralingam
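    Sketch: A small Python sketch of the recovery idea, assuming an idempotent region is modelled as a function that never mutates its inputs, so a faulting region can simply be re-executed from its start; the transient RuntimeError and the retry limit are assumptions of the sketch.
        # Illustrative model: restart a failed idempotent region instead of restoring a checkpoint.
        import random

        def run_region(region, inputs, max_retries=3):
            """Inputs are never overwritten by the region, so replaying it is always safe."""
            for _ in range(max_retries):
                try:
                    return region(inputs)
                except RuntimeError:
                    continue   # no state to restore: just restart the region
            raise RuntimeError("region failed repeatedly")

        def flaky_square_all(inputs):
            """An idempotent region: reads inputs, writes only fresh outputs, may fault transiently."""
            if random.random() < 0.5:
                raise RuntimeError("transient exception")
            return [x * x for x in inputs]

        if __name__ == "__main__":
            random.seed(0)
            print(run_region(flaky_square_all, [1, 2, 3]))   # [1, 4, 9]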