Patents by Inventor Raghu Prabhakar

Raghu Prabhakar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240094794
    Abstract: An integrated circuit (IC) includes an array of statically reconfigurable compute units for separation into mutually exclusive groups. Each group includes a statically reconfigurable number of compute units. Each compute unit includes a register statically reconfigurable with a group identifier that identifies which group the compute unit belongs to, a counter statically reconfigurable to synchronously increment with the counters of all the other compute units such that all the counters have the same value each clock cycle, and control circuitry that prevents the compute unit from starting to process data until the counter value matches the identifier. According to operation of the register, the counter, and the control circuitry, no more than the statically reconfigurable number of the compute units are allowed to start processing data concurrently to mitigate supply voltage droop caused by a time rate of change of current drawn by the IC through inductive loads of the IC.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 21, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Patent number: 11934343
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: March 19, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
  • Publication number: 20240085967
    Abstract: An integrated circuit (IC) includes an array of compute units. Each compute unit is configured such that, when transitioning from not processing data to processing data, the compute unit makes an individual contribution to an aggregate time rate of change of current drawn by the IC. Control circuitry is configurable to, for each compute unit of the array of compute units, control when the compute unit is eligible to transition from not processing data to processing data relative to when the other compute units start processing data to mitigate supply voltage overshoot caused by the aggregate time rate of change of current drawn by the IC through inductive loads of the IC.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Publication number: 20240085965
    Abstract: An integrated circuit (IC) includes an array of compute units. Each compute unit is configured such that, when transitioning from not processing data to processing data, the compute unit makes an individual contribution to an aggregate time rate of change of current drawn by the IC. Control circuitry is configurable to, for each compute unit of the array of compute units, control when the compute unit is eligible to transition from not processing data to processing data relative to when the other compute units start processing data to mitigate supply voltage droop caused by the aggregate time rate of change of current drawn by the IC through inductive loads of the IC.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Publication number: 20240085966
    Abstract: A method includes analyzing a dataflow graph to generate configuration information loadable into an integrated circuit. The dataflow graph specifies operations to be performed and data dependencies between the operations. The configuration information is usable by the integrated circuit to configure compute units of the integrated circuit to perform respective one or more of the operations of the dataflow graph, control data flow between the compute units to accomplish the data dependencies between the respective operations performed by the compute units, and control when each compute unit starts to perform the respective operations on the data to mitigate supply voltage droop caused by a time rate of change of current drawn by the integrated circuit through inductive loads of the integrated circuit.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Patent number: 11928512
    Abstract: A reconfigurable data processor comprises an array of configurable units configurable to allocate a plurality of sets of configurable units in the array to implement respective execution fragments of the data processing operation. Quiesce logic is coupled to configurable units in the array, configurable to respond to a quiesce control signal to quiesce the sets of configurable units in the array on quiesce boundaries of the respective execution fragments, and to forward quiesce ready signals for the respective execution fragments when the corresponding sets of processing units are ready. An array quiesce controller distributes the quiesce control signal to configurable units in the array, and receives quiesce ready signals for the respective execution fragments from the quiesce logic.
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: March 12, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Manish K. Shah, Pramod Nataraja, David Brian Jackson, Kin Hing Leung, Ram Sivaramakrishnan, Sumti Jairath, Gregory Frederick Grohoski
  • Patent number: 11928445
    Abstract: A compiler produces a configuration file to configure a fracturable data path of a configurable unit in a coarse-grained reconfigurable processor to concurrently generate different address sequences generated using different address associated with different operations. The fracturable data path includes multiple computation stages respectively including a pipeline register. The compiler analyzes a first address calculation and a second address calculation and assigns a first set of stages to the first operation to generate the first address sequence and a second set of stages to the second operation to generate the second address sequence using the second set of stages, based on the analysis. A configuration file for the configurable unit is generated by the compiler that assigns the first set of stages to the first operation and the second set of stages to the second operation and includes two or more immediate values for each computation stage.
    Type: Grant
    Filed: January 19, 2023
    Date of Patent: March 12, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, David Brian Jackson, Scott Burson
  • Publication number: 20240073129
    Abstract: A computing system is disclosed, comprising a plurality of interconnected reconfigurable dataflow units (RDUs). Each RDU includes configurable units, internal networks, and external interfaces. The first configurable unit of the first RDU sends a request to access an external memory attached to the second RDU over its first internal network. The second configurable unit of the first RDU obtains a memory address for the request, determines an identifier for the second RDU, and sends the request, identifier, and memory address to the third configurable unit of the first RDU over its second internal network. The third configurable unit of the first RDU generates a routable address on the external network, synthesizes a payload, and sends it through an external network interface. The third configurable unit of the second RDU receives the payload, and the fourth configurable unit of the second RDU uses the address to access the external memory.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240070106
    Abstract: A reconfigurable processing unit includes a first and second internal network, an interface to an external network, a first configurable unit coupled to the first internal network, a second configurable unit coupled to both internal networks, and a third configurable unit coupled to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload containing a transaction type identifier and an identifier of the second configurable unit through the interface to the external network, and send a first packet including the transaction type identifier to the second configurable unit over the second internal network. The second configurable unit is configured to increment a counter in response to a particular transaction type identifier, and send a token to the first configurable unit over the first internal network while the counter is non-zero and the first configurable unit is executing.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240069770
    Abstract: A system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor including an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU is configured to perform an operation. A PMU comprises a plurality of data structures including a plurality of portions of operation-specific data related to the operation. The PMU is coupled to the PCU via a multi-segment datapath pipeline. The CGR processor is coupled to configure a segment of the datapath pipeline using a set of configuration bits corresponding to a portion of the operation-specific data related to the operation to activate the segment, to further communicate the operation-specific data to the PCU via the activated segment. The CGR processor is coupled to switch among multiple PMU contexts in various segments sequentially or concurrently.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240070111
    Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit connected to the first internal network, a second configurable unit connected to both the first internal network and the second internal network, and a third configurable unit connected to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload from the external network and send the transaction type identifier and the source application ID to the second configurable unit over the second internal network. The second configurable unit sends information to the first configurable unit based on the transaction type identifier and the source application ID matching the local application ID retrieved from the register.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240073136
    Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit sending a request to access an external memory over the first internal network, a second configurable unit receiving the request on the first internal network, obtaining a memory address, determining an identifier for the target reconfigurable processing unit, and sending the request, identifier, and memory address over the second internal network, and a third configurable unit receiving the request, identifier, and memory address on the second internal network, determining a routable address on the external network based on the identifier, synthesizing a payload with the request, address, and identifier, and sending the payload to the routable address on the external network.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240070113
    Abstract: A data processing system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor. The CGR processor includes an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU comprises a plurality of single-instruction multiple data (SIMD) units configurable to form a datapath. The CGR processor is coupled to configure a datapath including a SIMD, using a set of configuration bits corresponding to an operation related to the task. The CGR processor is coupled to switch among the plurality of tasks and their corresponding PCU contexts during execution of the dataflow graph. The CGR processor is coupled to switch among tasks via static switching or dynamic switching, in response to the triggering of a task complete event generated by a preset counter, indicating completion of a current task.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240069959
    Abstract: A system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor including an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU is configured to perform an operation. A PMU includes operation-specific data related to the operation. The PMU is coupled to the PCU via a multi-segment datapath pipeline. The CGR processor is coupled to configure a segment of the datapath pipeline using a set of configuration bits corresponding to the operation-specific data to activate the segment, to communicate the operation-specific data to the PCU via the activated segment. A finite state machine (FSM) is configured to progress through a plurality of states corresponding to the plurality of PMU contexts and allow the PMU to switch among multiple PMU contexts sequentially or concurrently.
    Type: Application
    Filed: August 23, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240037182
    Abstract: In a method, based on a left side matrix and a right side matrix having a shared dimension, a first Multiply Accumulate Arithmetic Logic Unit (MACC ALU) receives elements of a row of a first column-split matrix and elements of a column of a first row-split matrix. A second MACC ALU receives elements of a row of the second column-split matrix and elements of a column of the second row-split matrix. The first and second column-split matrices comprise columns of the left side matrix and the first and second row-split matrices comprise rows of the right side matrix. The first and second MACC ALU concurrently compute partial dot products of the column and row elements and the second MACC ALU computes a sum of the partial dot products. A computing system can include the MACC ALUs in a matrix processing unit and can implement the method.
    Type: Application
    Filed: October 10, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR
  • Publication number: 20240037181
    Abstract: In a method, a first and a second column-split matrix comprise columns of a left side matrix and a first and a second row-split matrix comprise rows of a right side matrix. A first Matrix Processing Unit (MPU) receives column elements of a row of the first column-split matrix and row elements of a column of the first row-split matrix. A second MPU receives column elements of a row of the second column-split matrix and row elements of a column of the second row-split matrix. The first and second MPU concurrently compute partial dot products of the column and row elements and a third MPU computes a sum of the partial dot products. A computing system can include the MPUs and can implement the method.
    Type: Application
    Filed: October 10, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR
  • Patent number: 11886931
    Abstract: The technology disclosed relates to inter-node execution of configuration files on reconfigurable processors using network interface controller (NIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and application data for applications using a first reconfigurable processor connected to a first host, and a second reconfigurable processor connected to a second host. The first reconfigurable processor is configured to push input data for the applications in a first plurality of buffers. The first host is configured to cause a first network interface controller (NIC) to stream the input data to a second plurality of buffers from the first plurality of buffers. The second host is configured to cause a second NIC to stream the input data to the second reconfigurable processor from the second plurality of buffers.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: January 30, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
  • Patent number: 11886930
    Abstract: The technology disclosed relates to runtime execution of functions across reconfigurable processors. In particular, the technology disclosed relates to a runtime logic that is configured to execute a first set of functions in a plurality of functions and/or data therefor on a first reconfigurable processor, and a second set of functions in the plurality of functions and/or data therefor on additional reconfigurable processors. Functions in the second set of functions and/or the data therefor are transmitted to the additional reconfigurable processors using one or more of a first reconfigurable processor-to-additional reconfigurable processors buffers, and results of executing the functions and/or the data therefor on the additional reconfigurable processors are transmitted to the first reconfigurable processor using one or more of additional reconfigurable processors-to-first reconfigurable processor buffers.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: January 30, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
  • Publication number: 20240020265
    Abstract: A system with a cost estimation tool for estimating a realized bandwidth consumption of a logical edge between a logical producer unit and a logical consumer unit of an operation unit graph during placement and routing of the logical producer unit, the logical consumer unit, and the logical edge onto a reconfigurable processor is presented as well as a method of operating such a cost estimation tool and a non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to operate such a cost estimation tool. The cost estimation tool may be configured to determine the realized bandwidth consumption of the tentative assignment based on an upper bandwidth limit of the logical edge, an end-to-end bandwidth, a scaling factor of a realized bandwidth, and a congestion estimation of the physical link.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Likun HAO, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20240020264
    Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for determining scaled logical edge bandwidths in an operation unit graph in preparation of placing and routing the operation unit graph onto a reconfigurable processor. The cost estimation tool may be configured to receive the operation unit graph, divide the operation unit graph into first and second subgraphs, determine maximum latencies of the first and second subgraphs, and determine a scaled logical edge bandwidth of a logical edge that couples a first logical unit of M logical units in the first subgraph with a second logical unit of N logical units in the first subgraph based on M, N, and scaled bandwidth limits of the M and N logical units.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Joshua BROT, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
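
Illustrative note on publications 20240037182 and 20240037181: both abstracts rely on splitting a matrix product along the shared dimension, so that two units compute partial dot products concurrently and the partial results are then summed. The arithmetic behind that idea can be sketched in a few lines of Python. This is only a minimal software sketch of the underlying identity, not the claimed hardware method; the function names (`matmul`, `split_k_matmul`) and the `split` parameter are hypothetical and do not appear in the filings.

```python
def matmul(a, b):
    """Plain matrix product of two matrices given as lists of rows."""
    shared = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(shared)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def split_k_matmul(left, right, split):
    """Multiply left @ right by splitting the shared dimension at `split`.

    `split` is a hypothetical parameter chosen for illustration; it must
    leave at least one column/row on each side of the split.
    """
    shared = len(right)
    if not 0 < split < shared:
        raise ValueError("split must fall strictly inside the shared dimension")

    # Column-split the left matrix, row-split the right matrix.
    left_1 = [row[:split] for row in left]
    left_2 = [row[split:] for row in left]
    right_1 = right[:split]
    right_2 = right[split:]

    # Each pair could be handled by a separate unit; here they run one
    # after the other, since this sketch only demonstrates the arithmetic.
    partial_1 = matmul(left_1, right_1)
    partial_2 = matmul(left_2, right_2)

    # Summing the partial products recovers the full product.
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(partial_1, partial_2)]


left = [[1, 2, 3], [4, 5, 6]]
right = [[7, 8], [9, 10], [11, 12]]
print(split_k_matmul(left, right, split=1) == matmul(left, right))
```

The sketch works because each output element is a dot product over the shared dimension, and a dot product can be computed as the sum of dot products over disjoint slices of that dimension.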