Patents by Inventor Sitanshu Gupta

Sitanshu Gupta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Flow control for reconfigurable processors

Patent number: 12236220

Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.

Type: Grant

Filed: June 7, 2023

Date of Patent: February 25, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Chaphekar, Ajit Punj, Sumti Jairath
Dynamically-sized data structures on data flow architectures

Patent number: 12189564

Abstract: A data processing system for implementing operations that generate a dynamically-sized output is presented. The data processing system includes a reconfigurable processor that is configured to implement a first operation, a second operation, a recording unit, and a control unit. The first operation generates an output, wherein a size of the output is unknown during a configuration phase. The second operation receives the output of the first operation as an input. The recording unit generates control data that is indicative of the size of the output. The control unit that provides the control data to the second operation, wherein the second operation processes the input based on the control data.

Type: Grant

Filed: February 14, 2023

Date of Patent: January 7, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Abhishek Srivastava, Matthew Vilim, Raghu Prabhakar, Sankar Rachuru, Zhekun Zhang, Matheen Musaddiq, Apurv Vivek, Sitanshu Gupta, Ayesha Siddiqua
System for Implementing Operations Having Dynamically-Sized Data Structures on a Reconfigurable Processor.

Publication number: 20250004971

Abstract: A data processing system for implementing operations that generate a dynamically-sized output includes a reconfigurable processor that is configured to implement first and second operations and control circuitry. The first operation generates an output, whose size is unknown during a configuration phase. The second operation receives the output as an input. During a write operation, the first operation is enabled to write a first portion of the output to a first portion of a buffer, while the second operation reads a first portion of the input that is different than the first portion of the output from a second portion of the buffer that is different than the first portion of the buffer. The control circuity includes a control unit that: directs the second operation during a read operation following the write operation to read data as the input from the buffer that was stored during the write operation.

Type: Application

Filed: September 13, 2024

Publication date: January 2, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
Coupling Operations on Dynamically-Sized Data Structures in Data Flow Architectures

Publication number: 20250004972

Abstract: A data processing system for implementing operations that generate a dynamically-sized output comprises a reconfigurable processor and a compiler. The compiler generates configuration data for configuring the reconfigurable processor to implement first and second operations and first and second connections. The first operation generates an output, and the second operation receives the output of the first operation as an input. The size of the output is unknown when generating the configuration data, and the output comprises a number of elements that is smaller than or equal to a predetermined maximum number of elements. The first connection for the output and the second connection for the input are both suitable for a transmission of the predetermined maximum number of elements. The reconfigurable processor is configured with the configuration data such that the reconfigurable processor implements the first operation, the second operation, the first connection, and the second connection.

Type: Application

Filed: September 13, 2024

Publication date: January 2, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
HANDLING DYNAMIC TENSOR LENGTHS IN A RECONFIGURABLE PROCESSOR THAT INCLUDES MULTIPLE MEMORY UNITS

Publication number: 20240427727

Abstract: In some aspects, a program is executed on a coarse-grained reconfigurable (CGR) processor. The CGR determines that the program produces an output that includes a variable length tensor, determines a maximum size of the variable length tensor and sets, based on the maximum size, a maximum of a counter associated with the program. The counter is set to an initial value of zero. The CGR initiates execution of the program, causing the program to receive an input tensor. Based on determining that the program is operating on a first portion of the input tensor, the CGR performs an update to the counter, to create an updated counter, and communicates the updated counter to one or more consumers within the program. After determining that the program has completed operating on the input tensor, a final size of the output is communicated to one or more downstream consumers external to the program.

Type: Application

Filed: June 23, 2023

Publication date: December 26, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA
Instrumentation networks for data flow graphs

Patent number: 11841811

Abstract: A reconfigurable processor comprises an array of processing units and an instrumentation network. The array of processing units is configured to execute runtime events to execute an application. The instrumentation network is operatively coupled to the array of processing units. The instrumentation network comprises a control bus configured to form control signal routes in the instrumentation network. The instrumentation network further comprises a plurality of instrumentation counters having inputs and outputs connected to the control bus and to the processing units. Instrumentation counters in the plurality instrumentation units are configurable to consume control signals on the inputs and produce counts of the runtime events on the outputs.

Type: Grant

Filed: September 20, 2021

Date of Patent: December 12, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
FLOW CONTROL FOR RECONFIGURABLE PROCESSORS

Publication number: 20230325163

Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.

Type: Application

Filed: June 7, 2023

Publication date: October 12, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
Compile time instrumentation of data flow graphs

Patent number: 11782856

Abstract: A data processing system comprises memory, compile time logic, runtime logic, and instrumentation profiling logic. The memory stores a dataflow graph for an application. The dataflow graph has a plurality of compute nodes that are configured to be producers to produce data for execution of the application, and to be consumers to consume the data for execution of the application. The compile time logic partitions execution of the dataflow graph into stages. Each of the stages has one or more compute nodes, one or more producers, and one or more consumers. The runtime logic determines a processing latency for each of the stages by calculating time elapsed between producers of a particular stage receiving input data and consumers of the particular stage receiving output data. The instrumentation profiling logic generates performance statistics for the dataflow graph based on the processing latency determined for each of the stages.

Type: Grant

Filed: September 20, 2021

Date of Patent: October 10, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
Dynamically-Sized Data Structures on Data Flow Architectures

Publication number: 20230259477

Abstract: A data processing system for implementing operations that generate a dynamically-sized output is presented. The data processing system includes a reconfigurable processor that is configured to implement a first operation, a second operation, a recording unit, and a control unit. The first operation generates an output, wherein a size of the output is unknown during a configuration phase. The second operation receives the output of the first operation as an input. The recording unit generates control data that is indicative of the size of the output. The control unit that provides the control data to the second operation, wherein the second operation processes the input based on the control data.

Type: Application

Filed: February 14, 2023

Publication date: August 17, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
Matrix Multiplication on Coarse-grained Computing Grids

Publication number: 20230244748

Abstract: A method for multiplying matrices in a coarse-grained computing grid includes assigning each compute unit c of C compute units to a unique submatrix Rc of a result matrix R, wherein the C compute units are arranged in a 2D computing grid, configuring one or more source memory units to provide relevant matrix A data and matrix B data to the C compute units via a plurality of packets, configuring each compute unit c to produce the unique submatrix Rc and send the unique submatrix Rc to one or more desired memory units. The method also includes initiating data flow in the computing grid to produce the result matrix R within the desired memory units. To reduce packet traffic, Matrix B data corresponding to a column of compute units may be narrow-casted to each column of compute units. A corresponding system and computer-readable medium are also disclosed herein.

Type: Application

Filed: May 25, 2022

Publication date: August 3, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Pramod Natarja, Sitanshu Gupta, Ram Sivaramakrishnan, Ajit Punj
Anti-congestion flow control for reconfigurable processors

Patent number: 11709664

Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.

Type: Grant

Filed: June 2, 2020

Date of Patent: July 25, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
Tensor partitioning and partition access order

Patent number: 11561925

Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.

Type: Grant

Filed: September 16, 2021

Date of Patent: January 24, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
Tensor Partitioning and Partition Access Order

Publication number: 20220309029

Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.

Type: Application

Filed: September 16, 2021

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Raghu PRABHAKAR, Nathan Francis SHEELEY, Matheen MUSADDIQ, Scott Layson BURSON, Sitanshu GUPTA, Sumti JAIRATH, Pramod NATARAJA, Ajit PUNJ
Sparse matrix multiplier in hardware and a reconfigurable data processor including same

Patent number: 11443014

Abstract: The technology disclosed relates to matrix multiplication where the multiplier can be a sparse matrix. In particular, a multiplication device includes first circuitry configured to obtain the multiplicand matrix and an index of columns of the multiplier matrix and to generate an intermediate matrix that has one row per entry in the index copied from a respective row of the multiplicand matrix based on a value of a corresponding entry in the index. The device also includes second circuitry configured to receive the intermediate matrix from the first circuitry, obtain non-zero values of the multiplier matrix and a list of a number of non-zero entries per row of the multiplier matrix, and generate a product matrix as a result of multiplies of the non-zero values of the multiplier matrix and the intermediate matrix.

Type: Grant

Filed: November 5, 2021

Date of Patent: September 13, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Mingran Wang, Raghu Prabhakar, Darshan Dhimantkumar Gandhi, Maulik Subhash Desai, Nathan Francis Sheeley, Scott Layson Burson, Sitanshu Gupta
Instrumentation Networks for Data Flow Graphs

Publication number: 20220261365

Abstract: A reconfigurable processor comprises an array of processing units and an instrumentation network. The array of processing units is configured to execute runtime events to execute an application. The instrumentation network is operatively coupled to the array of processing units. The instrumentation network comprises a control bus configured to form control signal routes in the instrumentation network. The instrumentation network further comprises a plurality of instrumentation counters having inputs and outputs connected to the control bus and to the processing units. Instrumentation counters in the plurality instrumentation units are configurable to consume control signals on the inputs and produce counts of the runtime events on the outputs.

Type: Application

Filed: September 20, 2021

Publication date: August 18, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Raghu PRABHAKAR, Matthew Thomas GRIMM, Sumti JAIRATH, Kin Hing LEUNG, Sitanshu GUPTA, Yuan LIN, Luca BOASSO
Compile Time Instrumentation of Data Flow Graphs

Publication number: 20220261364

Abstract: A data processing system comprises memory, compile time logic, runtime logic, and instrumentation profiling logic. The memory stores a dataflow graph for an application. The dataflow graph has a plurality of compute nodes that are configured to be producers to produce data for execution of the application, and to be consumers to consume the data for execution of the application. The compile time logic partitions execution of the dataflow graph into stages. Each of the stages has one or more compute nodes, one or more producers, and one or more consumers. The runtime logic determines a processing latency for each of the stages by calculating time elapsed between producers of a particular stage receiving input data and consumers of the particular stage receiving output data. The instrumentation profiling logic generates performance statistics for the dataflow graph based on the processing latency determined for each of the stages.

Type: Application

Filed: September 20, 2021

Publication date: August 18, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Raghu PRABHAKAR, Matthew Thomas GRIMM, Sumti JAIRATH, Kin Hing LEUNG, Sitanshu GUPTA, Yuan LIN, Luca BOASSO
Multi-headed multi-buffer for buffering data for processing

Patent number: 11366783

Abstract: An integrated circuit includes a plurality of configurable units, each configurable unit having two or more corresponding sections. The plurality of configurable units is arranged in a serial arrangement to form a chain of sections of the configurable units. A data bus is connected to the plurality of configurable units which communicates data at a clock rate. The chain of sections is to receive and write a series of tensors at the clock rate at a first end section of the chain of sections, and sequentially propagate the series of tensors through individual sections within the chain of sections at the clock rate. The chain of sections is to output the series of tensors at a second end section of the chain of sections. The chain of sections is to also output the series of tensors at an intermediate section of the chain of sections.

Type: Grant

Filed: March 29, 2021

Date of Patent: June 21, 2022

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Amitabh Menon, Sitanshu Gupta, Sumti Jairath, Matheen Musaddiq
Tensor partitioning and partition access order

Patent number: 11204889

Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.

Type: Grant

Filed: March 29, 2021

Date of Patent: December 21, 2021

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
Anti-Congestion Flow Control for Reconfigurable Processors

Publication number: 20210373867

Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.

Type: Application

Filed: June 2, 2020

Publication date: December 2, 2021

Applicant: SambaNova Systems, Inc.

Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi Arun CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
Instrumentation profiling for reconfigurable processors

Patent number: 11126574

Abstract: A data processing system comprises compile time logic, runtime logic, a control bus, and instrumentation units operatively coupled to processing units of an array. The compile time logic is configured to generate configuration files for a dataflow graph. The runtime logic is configured to execute the configuration files on the array, and to trigger start and stop events, as defined by the configuration files, in response to implementation of compute and memory operations of the dataflow graph on the array. A control bus is configured to form event routes in the array. The instrumentation units have inputs and outputs connected to the control bus and to the processing units. The instrumentation units are configured to consume the start events on the inputs and start counting clock cycles, consume the stop events on the inputs and stop counting the clock cycles, and report the counted clock cycles on the outputs.

Type: Grant

Filed: February 12, 2021

Date of Patent: September 21, 2021

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso