Patents by Inventor Sitanshu Gupta

Sitanshu Gupta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11841811
    Abstract: A reconfigurable processor comprises an array of processing units and an instrumentation network. The array of processing units is configured to execute runtime events to execute an application. The instrumentation network is operatively coupled to the array of processing units. The instrumentation network comprises a control bus configured to form control signal routes in the instrumentation network. The instrumentation network further comprises a plurality of instrumentation counters having inputs and outputs connected to the control bus and to the processing units. Instrumentation counters in the plurality instrumentation units are configurable to consume control signals on the inputs and produce counts of the runtime events on the outputs.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: December 12, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
  • Publication number: 20230325163
    Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
    Type: Application
    Filed: June 7, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
  • Patent number: 11782856
    Abstract: A data processing system comprises memory, compile time logic, runtime logic, and instrumentation profiling logic. The memory stores a dataflow graph for an application. The dataflow graph has a plurality of compute nodes that are configured to be producers to produce data for execution of the application, and to be consumers to consume the data for execution of the application. The compile time logic partitions execution of the dataflow graph into stages. Each of the stages has one or more compute nodes, one or more producers, and one or more consumers. The runtime logic determines a processing latency for each of the stages by calculating time elapsed between producers of a particular stage receiving input data and consumers of the particular stage receiving output data. The instrumentation profiling logic generates performance statistics for the dataflow graph based on the processing latency determined for each of the stages.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
  • Publication number: 20230259477
    Abstract: A data processing system for implementing operations that generate a dynamically-sized output is presented. The data processing system includes a reconfigurable processor that is configured to implement a first operation, a second operation, a recording unit, and a control unit. The first operation generates an output, wherein a size of the output is unknown during a configuration phase. The second operation receives the output of the first operation as an input. The recording unit generates control data that is indicative of the size of the output. The control unit that provides the control data to the second operation, wherein the second operation processes the input based on the control data.
    Type: Application
    Filed: February 14, 2023
    Publication date: August 17, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
  • Publication number: 20230244748
    Abstract: A method for multiplying matrices in a coarse-grained computing grid includes assigning each compute unit c of C compute units to a unique submatrix Rc of a result matrix R, wherein the C compute units are arranged in a 2D computing grid, configuring one or more source memory units to provide relevant matrix A data and matrix B data to the C compute units via a plurality of packets, configuring each compute unit c to produce the unique submatrix Rc and send the unique submatrix Rc to one or more desired memory units. The method also includes initiating data flow in the computing grid to produce the result matrix R within the desired memory units. To reduce packet traffic, Matrix B data corresponding to a column of compute units may be narrow-casted to each column of compute units. A corresponding system and computer-readable medium are also disclosed herein.
    Type: Application
    Filed: May 25, 2022
    Publication date: August 3, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod Natarja, Sitanshu Gupta, Ram Sivaramakrishnan, Ajit Punj
  • Patent number: 11709664
    Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
    Type: Grant
    Filed: June 2, 2020
    Date of Patent: July 25, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
  • Patent number: 11561925
    Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
    Type: Grant
    Filed: September 16, 2021
    Date of Patent: January 24, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
  • Publication number: 20220309029
    Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
    Type: Application
    Filed: September 16, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Raghu PRABHAKAR, Nathan Francis SHEELEY, Matheen MUSADDIQ, Scott Layson BURSON, Sitanshu GUPTA, Sumti JAIRATH, Pramod NATARAJA, Ajit PUNJ
  • Patent number: 11443014
    Abstract: The technology disclosed relates to matrix multiplication where the multiplier can be a sparse matrix. In particular, a multiplication device includes first circuitry configured to obtain the multiplicand matrix and an index of columns of the multiplier matrix and to generate an intermediate matrix that has one row per entry in the index copied from a respective row of the multiplicand matrix based on a value of a corresponding entry in the index. The device also includes second circuitry configured to receive the intermediate matrix from the first circuitry, obtain non-zero values of the multiplier matrix and a list of a number of non-zero entries per row of the multiplier matrix, and generate a product matrix as a result of multiplies of the non-zero values of the multiplier matrix and the intermediate matrix.
    Type: Grant
    Filed: November 5, 2021
    Date of Patent: September 13, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Mingran Wang, Raghu Prabhakar, Darshan Dhimantkumar Gandhi, Maulik Subhash Desai, Nathan Francis Sheeley, Scott Layson Burson, Sitanshu Gupta
  • Publication number: 20220261364
    Abstract: A data processing system comprises memory, compile time logic, runtime logic, and instrumentation profiling logic. The memory stores a dataflow graph for an application. The dataflow graph has a plurality of compute nodes that are configured to be producers to produce data for execution of the application, and to be consumers to consume the data for execution of the application. The compile time logic partitions execution of the dataflow graph into stages. Each of the stages has one or more compute nodes, one or more producers, and one or more consumers. The runtime logic determines a processing latency for each of the stages by calculating time elapsed between producers of a particular stage receiving input data and consumers of the particular stage receiving output data. The instrumentation profiling logic generates performance statistics for the dataflow graph based on the processing latency determined for each of the stages.
    Type: Application
    Filed: September 20, 2021
    Publication date: August 18, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Raghu PRABHAKAR, Matthew Thomas GRIMM, Sumti JAIRATH, Kin Hing LEUNG, Sitanshu GUPTA, Yuan LIN, Luca BOASSO
  • Publication number: 20220261365
    Abstract: A reconfigurable processor comprises an array of processing units and an instrumentation network. The array of processing units is configured to execute runtime events to execute an application. The instrumentation network is operatively coupled to the array of processing units. The instrumentation network comprises a control bus configured to form control signal routes in the instrumentation network. The instrumentation network further comprises a plurality of instrumentation counters having inputs and outputs connected to the control bus and to the processing units. Instrumentation counters in the plurality instrumentation units are configurable to consume control signals on the inputs and produce counts of the runtime events on the outputs.
    Type: Application
    Filed: September 20, 2021
    Publication date: August 18, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Raghu PRABHAKAR, Matthew Thomas GRIMM, Sumti JAIRATH, Kin Hing LEUNG, Sitanshu GUPTA, Yuan LIN, Luca BOASSO
  • Patent number: 11366783
    Abstract: An integrated circuit includes a plurality of configurable units, each configurable unit having two or more corresponding sections. The plurality of configurable units is arranged in a serial arrangement to form a chain of sections of the configurable units. A data bus is connected to the plurality of configurable units which communicates data at a clock rate. The chain of sections is to receive and write a series of tensors at the clock rate at a first end section of the chain of sections, and sequentially propagate the series of tensors through individual sections within the chain of sections at the clock rate. The chain of sections is to output the series of tensors at a second end section of the chain of sections. The chain of sections is to also output the series of tensors at an intermediate section of the chain of sections.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: June 21, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Amitabh Menon, Sitanshu Gupta, Sumti Jairath, Matheen Musaddiq
  • Patent number: 11204889
    Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: December 21, 2021
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
  • Publication number: 20210373867
    Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
    Type: Application
    Filed: June 2, 2020
    Publication date: December 2, 2021
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi Arun CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
  • Patent number: 11126574
    Abstract: A data processing system comprises compile time logic, runtime logic, a control bus, and instrumentation units operatively coupled to processing units of an array. The compile time logic is configured to generate configuration files for a dataflow graph. The runtime logic is configured to execute the configuration files on the array, and to trigger start and stop events, as defined by the configuration files, in response to implementation of compute and memory operations of the dataflow graph on the array. A control bus is configured to form event routes in the array. The instrumentation units have inputs and outputs connected to the control bus and to the processing units. The instrumentation units are configured to consume the start events on the inputs and start counting clock cycles, consume the stop events on the inputs and stop counting the clock cycles, and report the counted clock cycles on the outputs.
    Type: Grant
    Filed: February 12, 2021
    Date of Patent: September 21, 2021
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso