Patents by Inventor Raghu Prabhakar

Raghu Prabhakar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240020264
    Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for determining scaled logical edge bandwidths in an operation unit graph in preparation of placing and routing the operation unit graph onto a reconfigurable processor. The cost estimation tool may be configured to receive the operation unit graph, divide the operation unit graph in first and second subgraphs, determine maximum latencies of the first and second subgraphs, and determine a scaled logical edge bandwidth of a logical edge that couples a first logical unit of M logical units in the first subgraph with a second logical unit of N logical units in the first subgraph based on M, N, and scaled bandwidth limits of the M and N logical units.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Joshua BROT, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20240020170
    Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for estimating a cost of implementing an operation unit graph. The operation unit graph may include first and second logical units that perform first and second data operations and have first and second ports, respectively, coupled by a logical edge, on a reconfigurable processor. The method includes receiving the operation unit graph, determining first and second upper bandwidth limits of the first and second ports, respectively, determining a logical edge bandwidth of the logical edge based on the first and second upper bandwidth limits, determining a timing group for the logical edge, and providing the logical edge bandwidth and the timing group as a cost estimation of implementing the operation unit graph on the reconfigurable processor.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20230409233
    Abstract: Buffer assignment in a contiguous area in a coarse-grained reconfigurable (CGR) array is optimized by temporarily assigning a first buffer portion and a second buffer portion to first and second physical memory units, routing connections in the contiguous area, and calculating a first cost. A list of candidates for a third physical memory unit is created, and a best cost and a best candidate are initialized. For each candidate, the first and second buffer are reassigned to the candidate, connections for data and dataflow control information in the contiguous area are routed, and a second cost is calculated. If the second cost is better than the best cost, the best cost and the best candidate are updated.
    Type: Application
    Filed: November 15, 2022
    Publication date: December 21, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Nathan Francis SHEELEY, Raghu PRABHAKAR, David Alan KOEPLINGER
  • Patent number: 11841811
    Abstract: A reconfigurable processor comprises an array of processing units and an instrumentation network. The array of processing units is configured to execute runtime events to execute an application. The instrumentation network is operatively coupled to the array of processing units. The instrumentation network comprises a control bus configured to form control signal routes in the instrumentation network. The instrumentation network further comprises a plurality of instrumentation counters having inputs and outputs connected to the control bus and to the processing units. Instrumentation counters in the plurality instrumentation units are configurable to consume control signals on the inputs and produce counts of the runtime events on the outputs.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: December 12, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
  • Publication number: 20230367844
    Abstract: A computing method comprises generating an integrated matrix having (K+P) number of columns, columns 1 through K of the integrated matrix comprising columns 1 through K of a multiplicand matrix and columns (K+1) though P of the integrated matrix comprising addend columns. The method computes K number of products of elements of a row of the integrated matrix multiplied by elements of a column of a second multiplicand matrix; computes a (K+1) product comprising an element of an addend column multiplied by a constant; and, computes a sum of the K number of products added to the (K+1) product. The sum is equivalent to a sum of products of a column of the M×K matrix multiplied by a row of the K×N matrix added to the an element of an addend column of the integrated matrix. A computing system and a computer program product can implement the method.
    Type: Application
    Filed: July 24, 2023
    Publication date: November 16, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
  • Publication number: 20230367845
    Abstract: A method comprises executing (K+P) number of transposition cycles to generate a transpose-extended matrix having N rows and (K+P) columns, in which columns 1 to K comprise a transposition of a first matrix having K rows and N columns, and columns (K+1) to (K+P) comprise constants or elements of an N×1 matrix. The method includes computing a sum-product of a row of a second matrix, having M rows and N columns, multiplied by a column among columns 1 to K of the transpose-extended matrix; and, computing a second sum-product of the row of the second matrix multiplied by a column among columns (K+1) to (K+P) of the transpose-extended matrix. The sum-products can comprise gradients of input matrices. A transpose processing unit can execute the transposition cycles to read K rows of the first matrix and insert P number of constant or N×1 columns to generate the transpose-extended matrix.
    Type: Application
    Filed: July 24, 2023
    Publication date: November 16, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR, David Brian JACKSON, Ram SIVARAMAKRISHNAN
  • Publication number: 20230325163
    Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
    Type: Application
    Filed: June 7, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
  • Patent number: 11782729
    Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
  • Patent number: 11782856
    Abstract: A data processing system comprises memory, compile time logic, runtime logic, and instrumentation profiling logic. The memory stores a dataflow graph for an application. The dataflow graph has a plurality of compute nodes that are configured to be producers to produce data for execution of the application, and to be consumers to consume the data for execution of the application. The compile time logic partitions execution of the dataflow graph into stages. Each of the stages has one or more compute nodes, one or more producers, and one or more consumers. The runtime logic determines a processing latency for each of the stages by calculating time elapsed between producers of a particular stage receiving input data and consumers of the particular stage receiving output data. The instrumentation profiling logic generates performance statistics for the dataflow graph based on the processing latency determined for each of the stages.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
  • Publication number: 20230289310
    Abstract: A reconfigurable data processor comprises an array of configurable units and a bus system. The bus system is connected to the array of configurable units. The bus system includes a top level network and an array level network. The top level network is connected to an external data interface for communication with memory outside of the array of configurable units. The array level network is connected to configurable units in the array of configurable units.
    Type: Application
    Filed: May 18, 2023
    Publication date: September 14, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Gregory Frederick GROHOSKI, Sumti JAIRATH, Mark LUTTRELL, Raghu PRABHAKAR, Ram SIVARAMAKRISHNAN, Manish K. SHAH
  • Publication number: 20230259477
    Abstract: A data processing system for implementing operations that generate a dynamically-sized output is presented. The data processing system includes a reconfigurable processor that is configured to implement a first operation, a second operation, a recording unit, and a control unit. The first operation generates an output, wherein a size of the output is unknown during a configuration phase. The second operation receives the output of the first operation as an input. The recording unit generates control data that is indicative of the size of the output. The control unit that provides the control data to the second operation, wherein the second operation processes the input based on the control data.
    Type: Application
    Filed: February 14, 2023
    Publication date: August 17, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
  • Publication number: 20230252106
    Abstract: A method generates pairs of split matrices based on a left and a right matrix sharing dimension K. A first column-split matrix comprises columns 1 to Q of the left matrix and a second column-split matrix comprises columns Q+1 to Q+P of the left matrix. A first row-split matrix comprises rows 1 to Q of the right matrix and a second row-split matrix comprises columns rows Q+1 to Q+P of the right matrix. The method multiplies the first column-matrix and first row matrix to compute a first dot product, and multiplies the second column-matrix and second row matrix to compute a second dot product. The method adds the dot products to compute a third dot product. The method can compute the first and second dot products concurrently. A computing system can comprise a matrix splitter to generate the matrices and can comprise matrix processing units to compute the dot products.
    Type: Application
    Filed: February 3, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR
  • Patent number: 11714780
    Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: August 1, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
  • Patent number: 11709664
    Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.
    Type: Grant
    Filed: June 2, 2020
    Date of Patent: July 25, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
  • Publication number: 20230229623
    Abstract: A coarse-grained reconfigurable (CGR) processor includes a configurable unit comprising a fracturable data path with a plurality of sub-paths. The fracturable data path includes multiple stages that each include an arithmetic logic unit (ALU), selection logic to select two or more inputs for the ALU, and sub-path pipeline registers. The fracturable data path also includes a first output configurable to provide first data selected from any one of the sub-path pipeline registers and a second output configurable to provide second data selected from any one of the sub-path pipeline registers. The configurable unit includes a configuration store to store configuration data to provide a two or more immediate data fields for each stage of the fracturable data path and configuration information for the ALUs, the selection logic, and to select the first data and the second data for the first output and the second output.
    Type: Application
    Filed: January 19, 2023
    Publication date: July 20, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Raghu PRABHAKAR, David Brian JACKSON
  • Publication number: 20230229407
    Abstract: A complier produces a configuration file to configure a fracturable data path of a configurable unit in a coarse-grained reconfigurable processor to concurrently generate different address sequences generated using different address associated with different operations. The fracturable data path includes multiple computation stages respectively including a pipeline register. The compiler analyzes a first address calculation and a second address calculation and assigns a first set of stages to the first operation to generate the first address sequence and a second set of stages to the second operation to generate the second address sequence using the second set of stages, based on the analysis. A configuration file for the configurable unit is generated by the compiler that assigns the first set of stages to the first operation and the second set of stages to the second operation and includes two or more immediate values for each computation stage.
    Type: Application
    Filed: January 19, 2023
    Publication date: July 20, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Raghu PRABHAKAR, David Brian JACKSON, Scott BURSON
  • Publication number: 20230205501
    Abstract: The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
    Type: Application
    Filed: December 27, 2022
    Publication date: June 29, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER
  • Publication number: 20230195686
    Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.
    Type: Application
    Filed: February 14, 2023
    Publication date: June 22, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Raghu PRABHAKAR, Manish K. SHAH, Ram SIVARAMAKRISHNAN, Pramod NATARAJA, David Brian JACKSON, Gregory Frederick GROHOSKI
  • Publication number: 20230195478
    Abstract: In a Coarse-Grained Reconfigurable Architecture (CGRA) system, two configuration files are used. The CGRA system has an array of configurable units that includes a plurality of switches, a print configurable unit, a source configurable unit, and one or more sink configurable units, The first configuration file, upon being executed by the CGRA system, configures the CGRA system to send output data directly from the source configurable unit to the one or more sink configurable units through the plurality of switches. The second configuration file, upon being executed into the CGRA system, configures the CGRA system to send the output data from the source configurable unit to the print configurable unit through the plurality of switches, send the output data from the print configurable unit to both a memory that is accessible by a host computing unit, and the one or more sink configurable units.
    Type: Application
    Filed: December 15, 2022
    Publication date: June 22, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Joshua BROT, Raghu PRABHAKAR, Subhra MAZUMDAR, James DECKER, Tram TRAN
  • Patent number: 11681645
    Abstract: A reconfigurable data processor includes a plurality of configurable units, and a configuration controller. The configuration controller is configured to start execution of a first application graph in a first set of configurable units. Then, concurrently with the execution of the first application graph in the first set of configurable units, the configuration controllers receive a command to load a configuration file into a second set of configurable units and obtain the configuration file. The configuration file contains information to configure the second set of configurable units to execute a second application graph. The configuration file is then loaded into the second set of configurable units and execution of the second application graph is started in the second set of configurable units.
    Type: Grant
    Filed: January 31, 2022
    Date of Patent: June 20, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory Frederick Grohoski, Sumti Jairath, Mark Luttrell, Raghu Prabhakar, Ram Sivaramakrishnan, Manish K. Shah