Patents by Inventor Karthikeyan Sankaralingam
Karthikeyan Sankaralingam has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11853244
Abstract: A reconfigurable hardware accelerator for computers combines a high-speed dataflow processor, having programmable functional units rapidly reconfigured in a network of programmable switches, with a stream processor that may autonomously access memory in predefined access patterns after receiving simple stream instructions. The result is a compact, high-speed processor that may exploit parallelism associated with many application-specific programs susceptible to acceleration.
Type: Grant
Filed: January 26, 2017
Date of Patent: December 26, 2023
Assignee: Wisconsin Alumni Research Foundation
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
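As a rough illustration of the "predefined access patterns after receiving simple stream instructions" idea, the sketch below models a stream instruction as a (start, stride, length) descriptor that the stream processor expands into a full address sequence on its own. The names `StreamInstruction` and `generate_addresses` are hypothetical, not from the patent.

```python
# Hypothetical sketch: a stream processor is configured once with
# (start, stride, length) and then generates the whole memory access
# pattern autonomously, with no further instructions from the core.
from dataclasses import dataclass

@dataclass
class StreamInstruction:
    start: int    # base address of the stream
    stride: int   # distance between consecutive accesses
    length: int   # number of elements to fetch

def generate_addresses(s: StreamInstruction):
    """Expand one stream instruction into its full access pattern."""
    return [s.start + i * s.stride for i in range(s.length)]

# A strided read of 4 elements starting at address 100 with stride 8:
print(generate_addresses(StreamInstruction(start=100, stride=8, length=4)))
```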
-
Patent number: 11513805
Abstract: A computer architecture employs multiple special-purpose processors having different affinities for program execution to execute substantial portions of general-purpose programs to provide improved performance with respect to a general-purpose processor executing the general-purpose program alone.
Type: Grant
Filed: August 19, 2016
Date of Patent: November 29, 2022
Assignee: Wisconsin Alumni Research Foundation
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki
-
Patent number: 11151077
Abstract: A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide them to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.
Type: Grant
Filed: June 28, 2017
Date of Patent: October 19, 2021
Assignee: Wisconsin Alumni Research Foundation
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
-
Patent number: 11048661
Abstract: A dataflow accelerator including a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA) according to an exemplary embodiment is disclosed. The scratchpad may include a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA may receive data from the input vector port interface and includes a plurality of interconnects and a plurality of functional units.
Type: Grant
Filed: April 15, 2019
Date of Patent: June 29, 2021
Assignee: SIMPLE MACHINES INC.
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
-
Patent number: 11042797
Abstract: According to exemplary embodiments, a method, processor, and system for accelerating a recurrent neural network are presented. A method of accelerating a recurrent neural network may include distributing from a first master core to each of a plurality of processing cores a same relative one or more columns of weight matrix data for each of a plurality of gates in the neural network, broadcasting a current input vector from the first master core to each of the processing cores, and processing each column of weight matrix data in parallel, at each of the respective processing cores.
Type: Grant
Filed: January 6, 2020
Date of Patent: June 22, 2021
Assignee: SIMPLEMACHINES INC.
Inventors: Karthikeyan Sankaralingam, Yunfeng Li, Vinay Gangadhar, Anthony Nowatzki
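The core computation this abstract describes, partitioning a weight matrix column-wise across cores while the input vector is broadcast to all of them, can be sketched as below. This is a minimal functional model, not the patented hardware method; the function name and the use of NumPy are illustrative assumptions.

```python
# Sketch of column-parallel matrix-vector multiply: each "core" holds a
# slice of the weight matrix's columns, receives the broadcast input
# vector, and computes a partial result; partials are summed at the end.
import numpy as np

def column_parallel_matvec(W, x, num_cores):
    # Split columns across cores (each core gets a contiguous chunk).
    col_chunks = np.array_split(np.arange(W.shape[1]), num_cores)
    # Each core multiplies its columns by the matching input elements.
    partials = [W[:, cols] @ x[cols] for cols in col_chunks]
    # Reduction: partial outputs sum to the full product W @ x.
    return np.sum(partials, axis=0)

W = np.arange(12.0).reshape(3, 4)     # stand-in for one gate's weights
x = np.array([1.0, 2.0, 3.0, 4.0])    # broadcast input vector
assert np.allclose(column_parallel_matvec(W, x, num_cores=2), W @ x)
```

In an RNN context the same partitioning would be applied once per gate (e.g. the input, forget, cell, and output gates of an LSTM), which is why the abstract distributes "the same relative" columns for each gate to a given core.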
-
Patent number: 10963384
Abstract: A method for performing acceleration of simultaneous access to shared data may include providing a plurality of groups of cores and a plurality of shared memory structures, providing a pod comprising the plurality of groups of cores linked by a common broadcast channel, and coordinating each shared memory structure to provide a logically unified memory structure. Each memory structure may be associated with a group of cores, and each group of cores may include one or more cores. The common broadcast channel may be operatively coupled to each shared memory structure. The coordinating each shared memory structure may include identifying a simultaneous read-reuse load to a first shared memory structure, fetching data corresponding to the simultaneous read-reuse load, and forwarding the data to shared memory structures other than the first shared memory structure and to groups of cores other than a first group of cores via the broadcast channel.
Type: Grant
Filed: December 18, 2019
Date of Patent: March 30, 2021
Assignee: SimpleMachines Inc.
Inventors: Karthikeyan Sankaralingam, Vinay Gangadhar, Anthony Nowatzki, Yunfeng Li
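The read-reuse mechanism in this abstract, one structure fetches the data and forwards it to its peers over the broadcast channel, can be modeled in a few lines. The `Pod` class and its fields are hypothetical stand-ins for the pod, shared memory structures, and broadcast channel named in the claim.

```python
# Hypothetical model: a read-reuse load arrives at one shared memory
# structure, which fetches the line once from backing memory and
# broadcasts it so every structure in the pod ends up holding a copy.
class Pod:
    def __init__(self, num_structures, backing_memory):
        self.memory = backing_memory                     # address -> data
        self.structures = [dict() for _ in range(num_structures)]
        self.fetch_count = 0                             # memory traffic counter

    def read_reuse_load(self, struct_id, addr):
        # struct_id identifies the structure that services the load.
        data = self.memory[addr]                         # single backing fetch
        self.fetch_count += 1
        for s in self.structures:                        # broadcast channel:
            s[addr] = data                               # forward to all structures
        return data

pod = Pod(num_structures=4, backing_memory={0x10: "line"})
pod.read_reuse_load(0, 0x10)
# All four structures hold the line after exactly one memory fetch.
assert all(0x10 in s for s in pod.structures) and pod.fetch_count == 1
```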
-
Patent number: 10936536
Abstract: Aspects of the present invention provide a memory system comprising a plurality of stacked memory layers, each memory layer divided into memory sections, wherein each memory section connects to a neighboring memory section in an adjacent memory layer, and a logic layer stacked among the plurality of memory layers, the logic layer divided into logic sections, each logic section including a memory processing core, wherein each logic section connects to a neighboring memory section in an adjacent memory layer to form a memory vault of connected logic and memory sections, and wherein each logic section is configured to communicate directly or indirectly with a host processor. Accordingly, each memory processing core may be configured to respond to a procedure call from the host processor by processing data stored in its respective memory vault and providing a result to the host processor. As a result, increased performance may be provided.
Type: Grant
Filed: April 30, 2019
Date of Patent: March 2, 2021
Assignee: Wisconsin Alumni Research Foundation
Inventors: Karthikeyan Sankaralingam, Jaikrishnan Menon, Lorenzo De Carli
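The host-to-vault interaction described here, each memory processing core answers a procedure call by computing over the data in its own vault and returning only the result, can be sketched as follows. `MemoryVault` and the `"sum"` procedure are illustrative assumptions, not part of the patent.

```python
# Hypothetical sketch of processing-in-memory vaults: each logic
# section owns a slice of memory and answers a procedure call from the
# host by computing near its data, returning only the small result.
class MemoryVault:
    def __init__(self, data):
        self.data = data                 # memory sections stacked above this logic section

    def procedure_call(self, op):
        if op == "sum":                  # compute inside the vault
            return sum(self.data)
        raise ValueError(f"unknown procedure: {op}")

# Four vaults, each holding a quarter of a 16-element dataset.
vaults = [MemoryVault(list(range(i, i + 4))) for i in range(0, 16, 4)]
# The host issues one call per vault and combines the small results,
# instead of pulling all 16 elements across the memory bus.
total = sum(v.procedure_call("sum") for v in vaults)
assert total == sum(range(16))
```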
-
Patent number: 10754744
Abstract: The amount of speed-up that can be obtained by optimizing the program to run on a different architecture is determined by static measurements of the program. Multiple such static measurements are processed by a machine learning system after being discretized to alter their accuracy vs precision. Static analysis requires less analysis overhead and permits analysis of program portions to optimize allocation of porting resources on a large program.
Type: Grant
Filed: March 15, 2016
Date of Patent: August 25, 2020
Assignee: Wisconsin Alumni Research Foundation
Inventors: Karthikeyan Sankaralingam, Newsha Ardalani, Urmish Thakker
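A small sketch of the discretization step mentioned in the abstract: continuous static measurements of a program (e.g. instruction-mix ratios) are coarsened into bins before being handed to the learned predictor, trading precision for robustness of the model's inputs. The `discretize` function and the bin setup are illustrative assumptions.

```python
# Hypothetical sketch: coarsen continuous static program measurements
# into bin indices before feeding them to a machine learning model.
import numpy as np

def discretize(features, num_bins, lo, hi):
    """Map continuous static measurements to integer bin indices."""
    edges = np.linspace(lo, hi, num_bins + 1)
    # np.digitize returns 1-based bin positions; shift and clip so every
    # value lands in a valid bin index 0 .. num_bins - 1.
    return np.clip(np.digitize(features, edges) - 1, 0, num_bins - 1)

# e.g. instruction-mix ratios in [0, 1] coarsened into 4 bins:
print(discretize(np.array([0.05, 0.30, 0.55, 0.99]), num_bins=4, lo=0.0, hi=1.0))
```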
-
Publication number: 20200218965
Abstract: According to exemplary embodiments, a method, processor, and system for accelerating a recurrent neural network are presented. A method of accelerating a recurrent neural network may include distributing from a first master core to each of a plurality of processing cores a same relative one or more columns of weight matrix data for each of a plurality of gates in the neural network, broadcasting a current input vector from the first master core to each of the processing cores, and processing each column of weight matrix data in parallel, at each of the respective processing cores.
Type: Application
Filed: January 6, 2020
Publication date: July 9, 2020
Applicant: SimpleMachines Inc.
Inventors: Karthikeyan Sankaralingam, Yunfeng Li, Vinay Gangadhar, Anthony Nowatzki
-
Publication number: 20200201690
Abstract: A method for performing acceleration of simultaneous access to shared data may include providing a plurality of groups of cores and a plurality of shared memory structures, providing a pod comprising the plurality of groups of cores linked by a common broadcast channel, and coordinating each shared memory structure to provide a logically unified memory structure. Each memory structure may be associated with a group of cores, and each group of cores may include one or more cores. The common broadcast channel may be operatively coupled to each shared memory structure. The coordinating each shared memory structure may include identifying a simultaneous read-reuse load to a first shared memory structure, fetching data corresponding to the simultaneous read-reuse load, and forwarding the data to shared memory structures other than the first shared memory structure and to groups of cores other than a first group of cores via the broadcast channel.
Type: Application
Filed: December 18, 2019
Publication date: June 25, 2020
Applicant: SimpleMachines Inc.
Inventors: Karthikeyan Sankaralingam, Vinay Gangadhar, Anthony Nowatzki, Yunfeng Li
-
Patent number: 10591983
Abstract: A specialized memory access processor is placed between a main processor and accelerator hardware to handle memory access for the accelerator hardware. The architecture of the memory access processor is designed to allow lower energy memory accesses than can be obtained by the main processor in providing data to the hardware accelerator while providing the hardware accelerator with a sufficiently high bandwidth memory channel. In some embodiments, the main processor may enter a sleep state during accelerator calculations to substantially lower energy consumption.
Type: Grant
Filed: March 14, 2014
Date of Patent: March 17, 2020
Assignee: Wisconsin Alumni Research Foundation
Inventors: Chen-Han Ho, Karthikeyan Sankaralingam, Sung Kim
-
Publication number: 20190317770
Abstract: According to some embodiments, a dataflow accelerator comprises a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA). The scratchpad comprises a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA receives data from the input vector port interface, where the CGRA comprises a plurality of interconnects and a plurality of functional units.
Type: Application
Filed: April 15, 2019
Publication date: October 17, 2019
Applicant: SimpleMachines Inc.
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
-
Publication number: 20190258601
Abstract: Aspects of the present invention provide a memory system comprising a plurality of stacked memory layers, each memory layer divided into memory sections, wherein each memory section connects to a neighboring memory section in an adjacent memory layer, and a logic layer stacked among the plurality of memory layers, the logic layer divided into logic sections, each logic section including a memory processing core, wherein each logic section connects to a neighboring memory section in an adjacent memory layer to form a memory vault of connected logic and memory sections, and wherein each logic section is configured to communicate directly or indirectly with a host processor. Accordingly, each memory processing core may be configured to respond to a procedure call from the host processor by processing data stored in its respective memory vault and providing a result to the host processor. As a result, increased performance may be provided.
Type: Application
Filed: April 30, 2019
Publication date: August 22, 2019
Inventors: Karthikeyan Sankaralingam, Jaikrishnan Menon, Lorenzo De Carli
-
Patent number: 10289604
Abstract: Aspects of the present invention provide a memory system comprising a plurality of stacked memory layers, each memory layer divided into memory sections, wherein each memory section connects to a neighboring memory section in an adjacent memory layer, and a logic layer stacked among the plurality of memory layers, the logic layer divided into logic sections, each logic section including a memory processing core, wherein each logic section connects to a neighboring memory section in an adjacent memory layer to form a memory vault of connected logic and memory sections, and wherein each logic section is configured to communicate directly or indirectly with a host processor. Accordingly, each memory processing core may be configured to respond to a procedure call from the host processor by processing data stored in its respective memory vault and providing a result to the host processor. As a result, increased performance may be provided.
Type: Grant
Filed: August 7, 2014
Date of Patent: May 14, 2019
Assignee: Wisconsin Alumni Research Foundation
Inventors: Karthikeyan Sankaralingam, Jaikrishnan Menon, Lorenzo De Carli
-
Patent number: 10216693
Abstract: A dataflow computer processor is teamed with a general computer processor so that program portions of an application program particularly suited to dataflow execution may be transferred to the dataflow processor during portions of the execution of the application program by the general computer processor. During this time the general computer processor may be placed in partial shutdown for energy conservation.
Type: Grant
Filed: July 30, 2015
Date of Patent: February 26, 2019
Assignee: Wisconsin Alumni Research Foundation
Inventors: Anthony Nowatzki, Vinay Gangadhar, Karthikeyan Sankaralingam
-
Publication number: 20190004995
Abstract: A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide them to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.
Type: Application
Filed: June 28, 2017
Publication date: January 3, 2019
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
-
Publication number: 20180210730
Abstract: A reconfigurable hardware accelerator for computers combines a high-speed dataflow processor, having programmable functional units rapidly reconfigured in a network of programmable switches, with a stream processor that may autonomously access memory in predefined access patterns after receiving simple stream instructions. The result is a compact, high-speed processor that may exploit parallelism associated with many application-specific programs susceptible to acceleration.
Type: Application
Filed: January 26, 2017
Publication date: July 26, 2018
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
-
Publication number: 20180052693
Abstract: A computer architecture employs multiple special-purpose processors having different affinities for program execution to execute substantial portions of general-purpose programs to provide improved performance with respect to a general-purpose processor executing the general-purpose program alone.
Type: Application
Filed: August 19, 2016
Publication date: February 22, 2018
Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki
-
Publication number: 20170270424
Abstract: The amount of speed-up that can be obtained by optimizing the program to run on a different architecture is determined by static measurements of the program. Multiple such static measurements are processed by a machine learning system after being discretized to alter their accuracy vs precision. Static analysis requires less analysis overhead and permits analysis of program portions to optimize allocation of porting resources on a large program.
Type: Application
Filed: March 15, 2016
Publication date: September 21, 2017
Inventors: Karthikeyan Sankaralingam, Newsha Ardalani, Urmish Thakker
-
Patent number: 9619233
Abstract: A computer architecture allows for simplified exception handling by restarting the program after exceptions at the beginning of idempotent regions, the idempotent regions allowing re-execution without the need for restoring complex state information from checkpoints. Recovery from mis-speculation may be provided by a similar mechanism but using smaller idempotent regions reflecting a more frequent occurrence of mis-speculation. A compiler generating different idempotent regions for speculation and exception handling is also disclosed.
Type: Grant
Filed: February 19, 2016
Date of Patent: April 11, 2017
Assignee: Wisconsin Alumni Research Foundation
Inventors: Jaikrishnan Menon, Marc Asher De Kruijf, Karthikeyan Sankaralingam
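The recovery idea here, a region that never overwrites its own inputs can simply be re-executed from its start after a fault, with no checkpoint to restore, can be illustrated with a short sketch. The function names and the simulated transient fault are hypothetical, not taken from the patent.

```python
# Hypothetical sketch: an idempotent region reads its inputs and writes
# only fresh outputs, so recovery from an exception is just re-execution
# from the start of the region -- no checkpointed state is restored.
def run_idempotent_region(region_fn, inputs, max_retries=3):
    for _ in range(max_retries):
        try:
            return region_fn(inputs)   # region leaves `inputs` untouched
        except RuntimeError:
            continue                   # safe to simply restart the region
    raise RuntimeError("region failed after retries")

faults = {"remaining": 2}              # simulate two transient faults
def region(xs):
    if faults["remaining"] > 0:
        faults["remaining"] -= 1
        raise RuntimeError("transient fault")
    return [x * x for x in xs]         # outputs produced only on success

assert run_idempotent_region(region, [1, 2, 3]) == [1, 4, 9]
```

The trade-off the abstract mentions falls out of this model: smaller regions mean less work repeated per restart, which suits frequent mis-speculation, while larger regions with less bookkeeping suffice for rare exceptions.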