Patents Assigned to SambaNova Systems, Inc.
  • Publication number: 20240070111
    Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit connected to the first internal network, a second configurable unit connected to both the first internal network and the second internal network, and a third configurable unit connected to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload from the external network and send the transaction type identifier and the source application ID to the second configurable unit over the second internal network. The second configurable unit sends information to the first configurable unit based on the transaction type identifier and the source application ID matching the local application ID retrieved from the register.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240070113
    Abstract: A data processing system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor. The CGR processor includes an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU comprises a plurality of single-instruction multiple data (SIMD) units configurable to form a datapath. The CGR processor is coupled to configure a datapath including a SIMD unit, using a set of configuration bits corresponding to an operation related to the task. The CGR processor is coupled to switch among the plurality of tasks and their corresponding PCU contexts during execution of the dataflow graph. The CGR processor is coupled to switch among tasks via static switching or dynamic switching, in response to the triggering of a task complete event generated by a preset counter, indicating completion of a current task.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240073136
    Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit sending a request to access an external memory over the first internal network, a second configurable unit receiving the request on the first internal network, obtaining a memory address, determining an identifier for the target reconfigurable processing unit, and sending the request, identifier, and memory address over the second internal network, and a third configurable unit receiving the request, identifier, and memory address on the second internal network, determining a routable address on the external network based on the identifier, synthesizing a payload with the request, address, and identifier, and sending the payload to the routable address on the external network.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240069959
    Abstract: A system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor including an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU is configured to perform an operation. A PMU includes operation-specific data related to the operation. The PMU is coupled to the PCU via a multi-segment datapath pipeline. The CGR processor is coupled to configure a segment of the datapath pipeline using a set of configuration bits corresponding to the operation-specific data to activate the segment, to communicate the operation-specific data to the PCU via the activated segment. A finite state machine (FSM) is configured to progress through a plurality of states corresponding to the plurality of PMU contexts and allow the PMU to switch among multiple PMU contexts sequentially or concurrently.
    Type: Application
    Filed: August 23, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240070106
    Abstract: A reconfigurable processing unit includes a first and second internal network, an interface to an external network, a first configurable unit coupled to the first internal network, a second configurable unit coupled to both internal networks, and a third configurable unit coupled to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload containing a transaction type identifier and an identifier of the second configurable unit through the interface to the external network, and send a first packet including the transaction type identifier to the second configurable unit over the second internal network. The second configurable unit is configured to increment a counter in response to a particular transaction type identifier, and send a token to the first configurable unit over the first internal network while the counter is non-zero and the first configurable unit is executing.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
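The counter-and-token behavior described in the abstract above can be sketched in software. This is a minimal illustration, not the patented hardware: the class name, the method names, the `"WRITE_DONE"` transaction type, and the choice to decrement the counter each time a token is sent are all assumptions made for the example.

```python
class SecondConfigurableUnit:
    """Toy model of a unit that counts matching transactions and emits tokens."""

    def __init__(self, tracked_type: str) -> None:
        self.tracked_type = tracked_type  # hypothetical transaction type to count
        self.counter = 0

    def on_packet(self, transaction_type: str) -> None:
        # Increment only for the particular transaction type being tracked.
        if transaction_type == self.tracked_type:
            self.counter += 1

    def try_send_token(self, first_unit_executing: bool) -> bool:
        # A token is sent only while the counter is non-zero and the
        # first configurable unit is executing.
        if self.counter > 0 and first_unit_executing:
            self.counter -= 1  # assumption: each token consumes one count
            return True
        return False
```

In this sketch a packet that arrives while the first unit is idle is not lost: the count is held until the unit starts executing, at which point a token can be sent.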
  • Publication number: 20240069770
    Abstract: A system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor including an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU is configured to perform an operation. A PMU comprises a plurality of data structures including a plurality of portions of operation-specific data related to the operation. The PMU is coupled to the PCU via a multi-segment datapath pipeline. The CGR processor is coupled to configure a segment of the datapath pipeline using a set of configuration bits corresponding to a portion of the operation-specific data related to the operation to activate the segment, to further communicate the operation-specific data to the PCU via the activated segment. The CGR processor is coupled to switch among multiple PMU contexts in various segments sequentially or concurrently.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240069880
    Abstract: In a method, a computer-implemented efficiency analyzer selects operators from an intermediate representation of a dataflow program. The operators are included in a mapping of the operators to hardware of a computing system to execute the dataflow program. Based on the mapping and a description of the hardware, the efficiency analyzer computes an execution metric associated with executing the operators on the hardware. Based on the execution metric and hardware description, the efficiency analyzer determines an inefficiency metric, and based on the inefficiency metric, the efficiency analyzer determines an inefficiency associated with the dataflow program. The computing system to execute the dataflow program can comprise a coarse grain computing system and the hardware can include a reconfigurable processor of the computing system. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: November 8, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Blaine RISTER, Qingjian LI, Bowen YANG, Junjue WANG, Chen LIU, Zhuo CHEN, Arvind SUJEETH, Sumti JAIRATH
  • Publication number: 20240073129
    Abstract: A computing system is disclosed, comprising a plurality of interconnected reconfigurable dataflow units (RDUs). Each RDU includes configurable units, internal networks, and external interfaces. The first configurable unit of the first RDU sends a request to access an external memory attached to the second RDU over its first internal network. The second configurable unit of the first RDU obtains a memory address for the request, determines an identifier for the second RDU, and sends the request, identifier, and memory address to the third configurable unit of the first RDU over its second internal network. The third configurable unit of the first RDU generates a routable address on the external network, synthesizes a payload, and sends it through an external network interface. The third configurable unit of the second RDU receives the payload, and the fourth configurable unit of the second RDU uses the address to access the external memory.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Patent number: 11916559
    Abstract: A DLL includes a delay line with two phase outputs, a gater coupled with the delay line phase outputs, a PFD coupled with gater outputs, a PD coupled with PFD outputs, a retimer coupled with PD outputs, and a loop filter with inputs coupled with the retimer and a speed control output coupled with the delay line. The gater passes signals on its two inputs to its two outputs, apart from a first pulse on its first input. The PD determines if the second gated signal leads or lags the first gated signal. The retimer retimes PD output signals to be aligned with a delay line input signal. The loop filter uses the retimed PD output signals to determine if the delay line should delay more or delay less, and outputs a speed control signal to control the delay line speed.
    Type: Grant
    Filed: December 29, 2022
    Date of Patent: February 27, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Fahim Ur Rahman, Jinuk Shin
  • Publication number: 20240054099
    Abstract: A method for placing, routing and using compute units and memory units in a reconfigurable computing grid includes receiving a placement graph for a computing task that defines a set of unplaced memory units, a set of unplaced compute units and data connections between the unplaced memory units and the unplaced compute units, the data connections comprising primary connections corresponding to the primary ports of the unplaced compute units and secondary connections corresponding to the secondary ports of the unplaced compute units. The method also includes forming a subgraph for each unplaced memory unit having a primary connection, each subgraph comprising the unplaced memory unit and each unplaced compute unit connected to the unplaced memory unit via a primary connection. The method also includes placing each formed subgraph as a cluster on the reconfigurable computing grid. A corresponding computer program product and system are also disclosed herein.
    Type: Application
    Filed: December 16, 2022
    Publication date: February 15, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Kin Hing LEUNG, Feng SHENG, Ajit PUNJ
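The subgraph-forming step in the abstract above can be illustrated with a short sketch. It assumes the placement graph is given as a list of `(memory_unit, compute_unit, kind)` tuples, where `kind` is either `"primary"` or `"secondary"`; that representation and the function name `form_subgraphs` are hypothetical and chosen only for this example. Placing each cluster on the grid is not shown.

```python
from typing import Dict, List, Tuple

Connection = Tuple[str, str, str]  # (memory unit, compute unit, "primary" | "secondary")


def form_subgraphs(connections: List[Connection]) -> Dict[str, List[str]]:
    """Group each memory unit with the compute units attached via primary connections."""
    clusters: Dict[str, List[str]] = {}
    for memory_unit, compute_unit, kind in connections:
        if kind != "primary":
            continue  # secondary connections do not pull a unit into a subgraph
        # The cluster starts with the memory unit itself, then collects its compute units.
        clusters.setdefault(memory_unit, [memory_unit]).append(compute_unit)
    return clusters
```

A memory unit that has only secondary connections produces no subgraph in this sketch, which matches the abstract's wording that a subgraph is formed for each unplaced memory unit having a primary connection.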
  • Patent number: 11893424
    Abstract: A system for training parameters of a neural network includes a processing node with a processor reconfigurable at a first level of configuration granularity and a controller reconfigurable at a finer level of configuration granularity. The processor is configured to execute a first dataflow segment of the neural network with training data to generate a predicted output value using a set of neural network parameters, calculate a first intermediate result for a parameter based on the predicted output value, a target output value, and a parameter gradient, and provide the first intermediate result to the controller. The controller is configured to receive a second intermediate result over a network, and execute a second dataflow segment, dependent upon the first intermediate result and the second intermediate result, to generate a third intermediate result indicative of an update of the parameter.
    Type: Grant
    Filed: January 24, 2022
    Date of Patent: February 6, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Martin Russell Raumann, Qi Zheng, Bandish B. Shah, Ravinder Kumar, Kin Hing Leung, Sumti Jairath, Gregory Frederick Grohoski
  • Publication number: 20240037181
    Abstract: In a method, a first and a second column-split matrix comprise columns of a left side matrix and a first and a second row-split matrix comprise rows of a right side matrix. A first Matrix Processing Unit (MPU) receives column elements of a row of the first column-split matrix and row elements of a column of the first row-split matrix. A second MPU receives column elements of a row of the second column-split matrix and row elements of a column of the second row-split matrix. The first and second MPUs concurrently compute partial dot products of the column and row elements and a third MPU computes a sum of the partial dot products. A computing system can include the MPUs and can implement the method.
    Type: Application
    Filed: October 10, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR
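The arithmetic behind the split-matrix method above can be shown in plain Python. This sketch only demonstrates that splitting the left matrix by columns and the right matrix by rows along the shared dimension yields partial dot products that sum to the full product; the function names and the split index `k` are assumptions, and the two partial products are computed one after the other here rather than concurrently on separate MPUs.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_columns(m: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Split a matrix into two column-split matrices at column index k."""
    return [row[:k] for row in m], [row[k:] for row in m]


def split_rows(m: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Split a matrix into two row-split matrices at row index k."""
    return m[:k], m[k:]


def partial_dot(left_row: List[float], right_part: Matrix, col: int) -> float:
    """Dot product of one row of a column-split matrix with one column of a row-split matrix."""
    return sum(a * right_part[i][col] for i, a in enumerate(left_row))


def split_matmul(left: Matrix, right: Matrix, k: int) -> Matrix:
    l1, l2 = split_columns(left, k)
    r1, r2 = split_rows(right, k)
    out = [[0.0] * len(right[0]) for _ in left]
    for i in range(len(left)):
        for j in range(len(right[0])):
            p1 = partial_dot(l1[i], r1, j)  # role of the first MPU
            p2 = partial_dot(l2[i], r2, j)  # role of the second MPU
            out[i][j] = p1 + p2             # role of the third MPU: sum the partials
    return out
```

Because each partial product touches a disjoint slice of the shared dimension, the result does not depend on where the split index falls.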
  • Publication number: 20240037063
    Abstract: A placer and router for an iterative placement and routing of a sorted operation unit graph on a reconfigurable processor is presented as well as a method of operating a placer and router for an iterative placement and routing of a sorted operation unit graph on a reconfigurable processor. The placer and router is configured to receive an architectural specification of the reconfigurable processor and the sorted operation unit graph having an ordered sequence of nodes and edges that interconnect nodes in the ordered sequence of nodes. The placer and router is further configured to iteratively assign nodes of the sorted operation unit graph to locations on the reconfigurable processor followed by an assignment of edges that connect nodes that were assigned in the current iteration and nodes that were assigned in previous iterations to interconnection resources of the reconfigurable processor.
    Type: Application
    Filed: July 25, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Hong SUH, Sumti JAIRATH
  • Publication number: 20240037061
    Abstract: A sorting tool for determining an ordered sequence of nodes in an operation unit graph for placing and routing the operation unit graph onto a reconfigurable processor is presented as well as a method of operating a sorting tool for determining an ordered sequence of nodes in an operation unit graph for placing and routing the operation unit graph onto a reconfigurable processor. The sorting tool is configured to receive the operation unit graph including a set of unsorted nodes and edges that interconnect nodes in the set of unsorted nodes, determine an ordered sequence of the nodes in the operation unit graph, and provide the ordered sequence of nodes for the placing and routing of the operation unit graph onto the reconfigurable processor.
    Type: Application
    Filed: July 25, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Hong SUH, Sumti JAIRATH
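The abstract above does not say which ordering the sorting tool produces. As one plausible illustration, the sketch below assumes a topological order, so that every producer node appears before its consumers, and computes it with Kahn's algorithm. The function name `ordered_sequence` and the edge representation as `(source, destination)` pairs are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence, Tuple


def ordered_sequence(
    nodes: Sequence[Hashable], edges: Sequence[Tuple[Hashable, Hashable]]
) -> List[Hashable]:
    """Return the nodes in a topological order (Kahn's algorithm)."""
    successors: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    indegree: Dict[Hashable, int] = {n: 0 for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start with nodes that have no incoming edges.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order: List[Hashable] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; no topological order exists")
    return order
```

An ordering of this kind would let a placer and router process nodes one at a time while routing only edges whose endpoints have already been assigned.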
  • Publication number: 20240037182
    Abstract: In a method, based on a left side matrix and a right side matrix having a shared dimension, a first Multiply Accumulate Arithmetic Logic Unit (MACC ALU) receives elements of a row of a first column-split matrix and elements of a column of a first row-split matrix. A second MACC ALU receives elements of a row of a second column-split matrix and elements of a column of a second row-split matrix. The first and second column-split matrices comprise columns of the left side matrix and the first and second row-split matrices comprise rows of the right side matrix. The first and second MACC ALUs concurrently compute partial dot products of the column and row elements and the second MACC ALU computes a sum of the partial dot products. A computing system can include the MACC ALUs in a matrix processing unit and can implement the method.
    Type: Application
    Filed: October 10, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR
  • Publication number: 20240036871
    Abstract: A placer and router for an iterative placement and routing of a sorted operation unit graph on a reconfigurable processor is presented as well as a method of operating a placer and router for an iterative placement and routing of a sorted operation unit graph on a reconfigurable processor. The placer and router is configured to receive an architectural specification of the reconfigurable processor and the sorted operation unit graph having an ordered sequence of nodes and edges that interconnect nodes in the ordered sequence of nodes. The placer and router is further configured to provide an assignment of nodes of the sorted operation unit graph to locations on the reconfigurable processor and an assignment of edges of the sorted operation unit graph to physical links and switches of the reconfigurable processor.
    Type: Application
    Filed: July 25, 2023
    Publication date: February 1, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Hong SUH, Sumti JAIRATH
  • Patent number: 11886930
    Abstract: The technology disclosed relates to runtime execution of functions across reconfigurable processors. In particular, the technology disclosed relates to a runtime logic that is configured to execute a first set of functions in a plurality of functions and/or data therefor on a first reconfigurable processor, and a second set of functions in the plurality of functions and/or data therefor on additional reconfigurable processors. Functions in the second set of functions and/or the data therefor are transmitted to the additional reconfigurable processors using one or more of a first reconfigurable processor-to-additional reconfigurable processors buffers, and results of executing the functions and/or the data therefor on the additional reconfigurable processors are transmitted to the first reconfigurable processor using one or more of additional reconfigurable processors-to-first reconfigurable processor buffers.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: January 30, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
  • Patent number: 11886931
    Abstract: The technology disclosed relates to inter-node execution of configuration files on reconfigurable processors using network interface controller (NIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and application data for applications using a first reconfigurable processor connected to a first host, and a second reconfigurable processor connected to a second host. The first reconfigurable processor is configured to push input data for the applications in a first plurality of buffers. The first host is configured to cause a first network interface controller (NIC) to stream the input data to a second plurality of buffers from the first plurality of buffers. The second host is configured to cause a second NIC to stream the input data to the second reconfigurable processor from the second plurality of buffers.
    Type: Grant
    Filed: November 9, 2021
    Date of Patent: January 30, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
  • Publication number: 20240020265
    Abstract: A system with a cost estimation tool for estimating a realized bandwidth consumption of a logical edge between a logical producer unit and a logical consumer unit of an operation unit graph during placement and routing of the logical producer unit, the logical consumer unit, and the logical edge onto a reconfigurable processor is presented as well as a method of operating such a cost estimation tool and a non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to operate such a cost estimation tool. The cost estimation tool may be configured to determine the realized bandwidth consumption of the tentative assignment based on an upper bandwidth limit of the logical edge, an end-to-end bandwidth, a scaling factor of a realized bandwidth, and a congestion estimation of the physical link.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Likun HAO, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20240020264
    Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for determining scaled logical edge bandwidths in an operation unit graph in preparation of placing and routing the operation unit graph onto a reconfigurable processor. The cost estimation tool may be configured to receive the operation unit graph, divide the operation unit graph into first and second subgraphs, determine maximum latencies of the first and second subgraphs, and determine a scaled logical edge bandwidth of a logical edge that couples a first logical unit of M logical units in the first subgraph with a second logical unit of N logical units in the first subgraph based on M, N, and scaled bandwidth limits of the M and N logical units.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Joshua BROT, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR