Patents Assigned to SambaNova Systems, Inc.
  • Patent number: 11809908
    Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: November 7, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Ravinder Kumar, Conrad Alexander Turlik, Arnav Goel, Qi Zheng, Raghunath Shenbagam, Anand Misra, Ananda Reddy Vayyala
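The runtime allocator this abstract describes can be pictured with a minimal sketch: virtual dataflow resource requests from a configuration file are mapped onto physical units and memory drawn from a shared pool. All names and the pool structure below are invented for illustration; this is not SambaNova's implementation.

```python
# Toy pool of "physical configurable units" and memory, with an allocator
# that reserves resources for one configuration file's virtual requirements.

class ResourcePool:
    def __init__(self, num_units, memory_bytes):
        self.free_units = list(range(num_units))  # physical unit ids
        self.free_memory = memory_bytes

    def allocate(self, config):
        """Reserve physical units and memory for one configuration file,
        or return None if the pool cannot satisfy the request."""
        need_units = config["virtual_units"]
        need_mem = config["memory_bytes"]
        if len(self.free_units) < need_units or self.free_memory < need_mem:
            return None
        units = [self.free_units.pop() for _ in range(need_units)]
        self.free_memory -= need_mem
        return {"units": units, "memory_bytes": need_mem}

    def release(self, alloc):
        """Return resources to the pool when the application finishes."""
        self.free_units.extend(alloc["units"])
        self.free_memory += alloc["memory_bytes"]

pool = ResourcePool(num_units=8, memory_bytes=1 << 20)
alloc = pool.allocate({"virtual_units": 3, "memory_bytes": 4096})
```

In the real system the loaded configuration file would then drive execution on the allocated units; here the allocation record simply stands in for that binding.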
  • Publication number: 20230350822
    Abstract: A method for integrating buffer views into buffer access operations in a reconfigurable computing environment includes detecting, in an instruction stream for a reconfigurable dataflow unit (RDU), a buffer allocation statement comprising a tensor indexing expression, a buffer view indicator, and one or more buffer view parameters. The method also includes lowering the buffer view parameters into the indexing expression according to the buffer view indicator to produce a modified tensor indexing expression, removing the buffer view indicator from the buffer allocation statement to produce a modified buffer allocation statement, and allocating a buffer according to the modified buffer allocation statement. The modified buffer allocation statement may include the modified tensor indexing expression. A corresponding system and computer readable medium are also disclosed herein.
    Type: Application
    Filed: October 13, 2022
    Publication date: November 2, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Yaqi ZHANG, Matthew FELDMAN
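The lowering step in this abstract can be illustrated with a toy transformation: fold a view's parameters into the indexing expression, then drop the view indicator from the allocation statement. The statement encoding and the particular parameters (offset, stride) are assumptions for illustration only.

```python
# Hypothetical buffer allocation statement modeled as a dict; a "view" carries
# an offset and a stride that get lowered into the symbolic index expression.

def lower_view(index_expr, view_params):
    """Rewrite a symbolic index like 'i' into 'offset + i*stride'."""
    offset = view_params.get("offset", 0)
    stride = view_params.get("stride", 1)
    return f"({offset} + ({index_expr}) * {stride})"

def lower_allocation(stmt):
    """Fold view parameters into the indexing expression and remove the
    view indicator, yielding the modified allocation statement."""
    if not stmt.get("is_view"):
        return stmt  # no view indicator: nothing to lower
    modified = dict(stmt)
    modified["index_expr"] = lower_view(stmt["index_expr"], stmt["view_params"])
    modified.pop("is_view")
    modified.pop("view_params")
    return modified

stmt = {"is_view": True, "index_expr": "i",
        "view_params": {"offset": 16, "stride": 2}}
lowered = lower_allocation(stmt)
```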
  • Publication number: 20230333879
    Abstract: A data processing system is presented that is configured as a server in a client-server configuration for executing applications that a client in the client-server configuration can offload as execution tasks for execution on the server. The data processing system includes a reconfigurable processor, a storage device that stores configuration files for the applications, and a host processor that is coupled to the storage device and to the reconfigurable processor. The host processor is configured to receive an execution task of the execution tasks with an identifier of an application from the client, retrieve a configuration file that is associated with the application from the storage device using the identifier of the application, configure the reconfigurable processor with the configuration file, and start execution of the application on the reconfigurable processor, whereby the reconfigurable processor provides output data of the execution of the application to the client.
    Type: Application
    Filed: April 12, 2023
    Publication date: October 19, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Arnav GOEL, Ravinder KUMAR, Qi ZHENG, Milad SHARIF, Jiayu BAI, Neal SANGHVI
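The host-processor flow above (receive task with application identifier, retrieve configuration file, configure, execute, return output) can be sketched as a dispatch loop. Everything below is an illustrative stand-in, not an actual SambaNova interface.

```python
# Hypothetical server: a store maps application ids to configuration files;
# the processor is reconfigured only when a different file must be loaded.

class Server:
    def __init__(self, config_store):
        self.config_store = config_store   # app id -> configuration file
        self.loaded = None                 # config currently on the processor

    def handle_task(self, app_id, input_data):
        config = self.config_store.get(app_id)
        if config is None:
            raise KeyError(f"no configuration file for application {app_id!r}")
        if self.loaded is not config:      # reconfigure only when needed
            self.loaded = config
        return config["run"](input_data)   # output goes back to the client

store = {"square": {"name": "square", "run": lambda x: x * x}}
server = Server(store)
result = server.handle_task("square", 7)
```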
  • Publication number: 20230325346
    Abstract: A method in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising one or more logical operations executable via dataflow through compute units. These one or more logical operations are preceded or followed by a buffer, each buffer corresponding to one or more memory units. The method includes determining whether splitting a selected buffer yields a reduced cost and then, in response to the determining step, splitting the selected buffer to produce first and second buffers. Dataflow through memory units corresponding to the first and second buffers is controlled by one or more memory units within the grid of memory units. Buffer splitting optimization reduces memory unit consumption.
    Type: Application
    Filed: April 4, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Weihang FAN
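The cost test at the heart of this abstract can be sketched with an invented model: assume a buffer deeper than one memory unit's capacity pays a chaining penalty, so splitting it into two shallower buffers can cost less overall. Both the capacity and the penalty are assumptions for illustration.

```python
# Toy buffer-splitting decision: split only when the split configuration's
# estimated cost is strictly lower than keeping one large buffer.

def buffer_cost(depth, unit_capacity, chain_penalty=1):
    """Memory units consumed, plus an (invented) penalty when a buffer
    spans more than one unit and the units must be chained."""
    units = -(-depth // unit_capacity)     # ceiling division
    return units + (chain_penalty if units > 1 else 0)

def maybe_split(depth, first_depth, unit_capacity):
    """Return (first, second) depths if splitting reduces cost, else None."""
    second_depth = depth - first_depth
    whole = buffer_cost(depth, unit_capacity)
    split = (buffer_cost(first_depth, unit_capacity)
             + buffer_cost(second_depth, unit_capacity))
    return (first_depth, second_depth) if split < whole else None
```

With a unit capacity of 8, a depth-10 buffer (two chained units plus penalty) splits into two depth-5 buffers, while a depth-6 buffer stays whole.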
  • Publication number: 20230325163
    Abstract: The technology disclosed relates to storing a dataflow graph with a plurality of compute nodes that transmit data along data connections, and controlling data transmission between compute nodes in the plurality of compute nodes along the data connections by using control connections to control writing of data.
    Type: Application
    Filed: June 7, 2023
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Weiwei CHEN, Raghu PRABHAKAR, David Alan KOEPLINGER, Sitanshu GUPTA, Ruddhi CHAPHEKAR, Ajit PUNJ, Sumti JAIRATH
  • Publication number: 20230325312
    Abstract: A method for merging buffers and associated operations includes receiving a compute graph for a reconfigurable dataflow computing system and conducting a buffer allocation and merging process responsive to determining that a first operation specified by a first operation node is a memory indexing operation and that the first operation node is a producer for exactly one consuming node that specifies a second operation. The buffer allocation and merging process may include replacing the first operation node and the consuming node with a merged buffer node within the graph responsive to determining that the first operation and the second operation can be merged into a merged indexing operation and that the resource cost of the merged node is less than the sum of the resource costs of separate buffer nodes. A corresponding system and computer readable medium are also disclosed herein.
    Type: Application
    Filed: October 27, 2022
    Publication date: October 12, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: David Alan KOEPLINGER, Adam BORDELON, Weihang FAN, Kevin BROWN, Weiwei CHEN
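The merge condition described here, in which a producer indexing node and its sole consumer are replaced by one merged node when the operations compose and the merged cost beats the separate costs, can be sketched as follows. The affine composition rule and the cost model are invented for illustration.

```python
# Toy graph nodes carry an indexing op; two affine ops compose into one.

def compose_index_ops(first, second):
    """Compose two affine indexing ops: second(first(i)) = (i*s1+o1)*s2+o2."""
    if first["kind"] != "affine" or second["kind"] != "affine":
        return None  # not mergeable into a single indexing operation
    return {"kind": "affine",
            "stride": first["stride"] * second["stride"],
            "offset": first["offset"] * second["stride"] + second["offset"]}

def try_merge(producer, consumer, cost_of):
    """Return the merged buffer node if the ops compose and the merged
    resource cost is below the sum of the separate nodes' costs."""
    merged_op = compose_index_ops(producer["op"], consumer["op"])
    if merged_op is None:
        return None
    merged = {"op": merged_op}
    if cost_of(merged) < cost_of(producer) + cost_of(consumer):
        return merged
    return None

cost = lambda node: 1  # toy cost: one memory unit per buffer node
p = {"op": {"kind": "affine", "stride": 2, "offset": 1}}
c = {"op": {"kind": "affine", "stride": 3, "offset": 4}}
merged = try_merge(p, c, cost)
```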
  • Patent number: 11782729
    Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.
    Type: Grant
    Filed: August 18, 2020
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
  • Patent number: 11782856
    Abstract: A data processing system comprises memory, compile time logic, runtime logic, and instrumentation profiling logic. The memory stores a dataflow graph for an application. The dataflow graph has a plurality of compute nodes that are configured to be producers to produce data for execution of the application, and to be consumers to consume the data for execution of the application. The compile time logic partitions execution of the dataflow graph into stages. Each of the stages has one or more compute nodes, one or more producers, and one or more consumers. The runtime logic determines a processing latency for each of the stages by calculating time elapsed between producers of a particular stage receiving input data and consumers of the particular stage receiving output data. The instrumentation profiling logic generates performance statistics for the dataflow graph based on the processing latency determined for each of the stages.
    Type: Grant
    Filed: September 20, 2021
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Matthew Thomas Grimm, Sumti Jairath, Kin Hing Leung, Sitanshu Gupta, Yuan Lin, Luca Boasso
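The latency calculation this abstract describes, time elapsed between a stage's producers receiving input and its consumers receiving output, reduces to simple timestamp arithmetic. The stage names and the summary statistics below are illustrative.

```python
# Per-stage latency profiling from recorded timestamps (seconds).

def stage_latency(producer_input_times, consumer_output_times):
    """Latency of one stage: first input arrival to last output arrival."""
    return max(consumer_output_times) - min(producer_input_times)

def profile(stages):
    """Per-stage latencies plus simple performance statistics."""
    latencies = {name: stage_latency(t_in, t_out)
                 for name, (t_in, t_out) in stages.items()}
    return {"per_stage": latencies,
            "total": sum(latencies.values()),
            "slowest": max(latencies, key=latencies.get)}

stats = profile({
    "embed":   ([0.0, 0.1], [0.5]),   # (producer inputs, consumer outputs)
    "matmul":  ([0.5], [2.0, 1.9]),
    "softmax": ([2.0], [2.3]),
})
```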
  • Patent number: 11782760
    Abstract: A method for executing applications in a system comprising general hardware and reconfigurable hardware includes accessing a first execution file comprising metadata storing a first priority indicator associated with a first application, and a second execution file comprising metadata storing a second priority indicator associated with a second application. In an example, use of the reconfigurable hardware is interleaved between the first application and the second application, and the interleaving is scheduled to take into account (i) workload of the reconfigurable hardware and (ii) the first priority indicator and the second priority indicator associated with the first application and the second application, respectively. In an example, when the reconfigurable hardware is used by one of the first and second applications, the general hardware is used by another of the first and second applications.
    Type: Grant
    Filed: February 25, 2021
    Date of Patent: October 10, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Anand Misra, Arnav Goel, Qi Zheng, Raghunath Shenbagam, Ravinder Kumar
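One way to picture the priority-weighted interleaving above: grant time slices on the reconfigurable hardware in proportion to each application's priority indicator, with the application not holding the reconfigurable hardware running on the general hardware. The credit scheme below is an invented scheduling heuristic, not the patented mechanism.

```python
# Toy two-application interleaver: each slice goes to the app with the most
# accumulated priority credit; the other app runs on the general hardware.

def build_schedule(apps, num_slices):
    """apps: {name: priority}. Returns per-slice (rdu_app, cpu_app) pairs."""
    credits = {name: 0.0 for name in apps}
    schedule = []
    for _ in range(num_slices):
        for name, prio in apps.items():
            credits[name] += prio            # accumulate priority credit
        rdu = max(credits, key=credits.get)  # winner takes the RDU slice
        credits[rdu] -= sum(apps.values())   # pay for the slice just taken
        cpu = next(n for n in apps if n != rdu)  # assumes two applications
        schedule.append((rdu, cpu))
    return schedule

sched = build_schedule({"training": 3, "inference": 1}, num_slices=4)
```

Over four slices the 3:1 priorities yield three reconfigurable-hardware slices for "training" and one for "inference".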
  • Publication number: 20230315410
    Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling costs. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Bowen YANG, Zhuo CHEN, Chen LIU, Fei WANG, Ruobing WANG, Qinghua LI, Weiwei CHEN, Junjue WANG, Sumti JAIRATH
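The tiling decision this abstract describes can be sketched with an invented cost model: a tile must fit in on-chip memory, and among tiles that fit, fewer tiles (less slicing overhead) is better. The capacity limit and cost formula are assumptions for illustration.

```python
# Toy tiling cost model and decision: pick the candidate tile shape whose
# cost improves on the untiled baseline, if any does.

def tiling_cost(tensor_shape, tile_shape, on_chip_capacity):
    """Invented cost: number of tiles, or infinity if one tile overflows
    the on-chip capacity (in elements)."""
    tile_elems, num_tiles = 1, 1
    for dim, t in zip(tensor_shape, tile_shape):
        tile_elems *= t
        num_tiles *= -(-dim // t)          # ceiling division per dimension
    return float("inf") if tile_elems > on_chip_capacity else num_tiles

def choose_tiling(tensor_shape, candidates, capacity):
    """Return (best_tile, cost); best_tile is None if nothing beats baseline."""
    baseline = tiling_cost(tensor_shape, tensor_shape, capacity)  # untiled
    best, best_cost = None, baseline
    for tile in candidates:
        c = tiling_cost(tensor_shape, tile, capacity)
        if c < best_cost:
            best, best_cost = tile, c
    return best, best_cost
```

For a 64x64 tensor and a 1024-element capacity, the untiled baseline overflows, 16x16 tiles cost 16, and 32x32 tiles cost 4, so 32x32 wins.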
  • Publication number: 20230315407
    Abstract: According to a computing method a compiler determines a recompute node included in a dataflow application and a checkpoint tensor produced by the recompute node. The compiler determines a recompute cost to recompute the checkpoint tensor, and a memory cost to checkpoint the checkpoint tensor in a memory. Based on the recompute cost and/or the memory cost, the compiler determines a solution cost and compares the solution cost to a solution threshold. Based on comparing the solution cost to the solution threshold, the compiler determines a checkpoint solution to execute the dataflow application. The checkpoint solution can comprise recomputing or checkpointing the checkpoint tensor. In some implementations, the compiler can determine a recompute ratio of the recompute cost to the memory cost and can compare the recompute ratio to the solution threshold. A computer program product and a computing system can implement aspects of the method.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Bowen YANG, Zhuo CHEN, Fei WANG, Venkat Krishna SRINIVASAN, Chen LIU, Junjue WANG, Arvind Krishna SUJEETH, Sumti JAIRATH
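The final variant in this abstract, comparing the ratio of recompute cost to memory cost against a threshold, is simple enough to state directly. The cost units and threshold value below are illustrative.

```python
# Toy recompute-versus-checkpoint decision based on the cost ratio.

def checkpoint_solution(recompute_cost, memory_cost, threshold):
    """Return 'recompute' when recomputing the checkpoint tensor is cheap
    relative to storing it; otherwise 'checkpoint' it in memory."""
    if memory_cost == 0:
        return "checkpoint"                # storing is free: always keep it
    ratio = recompute_cost / memory_cost
    return "recompute" if ratio <= threshold else "checkpoint"
```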
  • Publication number: 20230315802
    Abstract: A method comprises a compiler generating a MI (mixed integer) model to determine mapping decisions to map a dataflow application to hardware of a computing system to execute the application. The MI model comprises MI equations to solve by an MI solver. The MI equations include equations of an objective function corresponding to an optimization objective. The MI equations can comprise decision variables and equations and constraint variables and equations. The compiler outputs the MI model to the MI solver and invokes the MI solver to compute an MI solution comprising solutions to equations among the equations included in the MI model. The compiler receives the MI solution and generates a globally optimized mapping decision based on the MI solution. The MI solver can comprise a commercial program to solve MI linear equations. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: March 29, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Junjue WANG, Blaine Burton RISTER, Zhichao MA, Zhuo CHEN, Andrew DENG, Sumti JAIRATH, Arvind Krishna SUJEETH
  • Publication number: 20230315406
    Abstract: In a method a compiler performs a trial compilation to a low level (LL) intermediate representation (IR) of a high level (HL) decision to execute a dataflow application on a computing system. The LLIR comprises hardware resources to execute the application based on the HL decision, and the compiler determines a trial result based on LL execution metrics associated with the trial compilation. The compiler performs a trial compilation of a second HL decision to a second LLIR and determines a trial result based on LL execution metrics associated with the second trial compilation. The compiler evaluates the trial results and, based on the evaluations, selects one or both of the HL decisions for executing the dataflow application. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Blaine RISTER, Haocheng DONG, David Alan KOEPLINGER, Yaqi ZHANG, Junjue WANG, Zhuo CHEN, Arvind SUJEETH
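The trial-compilation loop above can be sketched as: lower each high-level decision to a low-level form, score it with execution metrics, and keep the best. The lowering and metric functions below are stand-ins invented for illustration.

```python
# Toy trial compilation: an HL decision is a parallelization factor; the LL
# "resources" grow with the factor while estimated cycles shrink.

def trial_compile(decision, lower, metrics):
    """Lower one HL decision to LLIR and score it."""
    llir = lower(decision)
    return decision, metrics(llir)

def select_decision(decisions, lower, metrics):
    """Trial-compile every candidate and pick the lowest-cost decision."""
    trials = [trial_compile(d, lower, metrics) for d in decisions]
    return min(trials, key=lambda t: t[1])[0]

lower = lambda par: {"units": par, "cycles": 1000 // par}
metrics = lambda llir: llir["units"] * 10 + llir["cycles"]

best = select_decision([1, 2, 4, 8, 16], lower, metrics)
```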
  • Publication number: 20230315414
    Abstract: A method comprises a compiler determining operators and matrices of an application model. The compiler generates a dimension-based search space (DBSS) comprising Named Nodes corresponding to the operators. The Named Nodes comprise a Named DIM corresponding to a matrix associated with an operator. The Named DIM comprises a DIM Name corresponding to a dimension of a row or column of the matrix. The DBSS comprises an application programming interface (API) to determine operators, matrices, and/or attributes of operators/matrices of the application model using the DIM Names. The method includes the compiler determining the operator, the matrix, and the Named DIM and generating an entry in the DBSS that includes a Named Node corresponding to the operator, a Named DIM corresponding to the matrix and including the DIM Name. A computing system and/or a computer program product can implement the method.
    Type: Application
    Filed: March 29, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Bowen YANG, Fei WANG, Shengyue HUO
  • Publication number: 20230315411
    Abstract: A method for improving throughput in a reconfigurable computing system includes detecting, in an algebraic representation of a computing task for a reconfigurable dataflow processor, an outer meta-pipeline loop, detecting an inner meta-pipeline loop nested within the outer meta-pipeline loop, and determining that the inner meta-pipeline loop and the outer meta-pipeline loop each conduct a common operation. The method also includes fusing the common operation for the inner meta-pipeline loop and the outer meta-pipeline loop into a single operation within the inner meta-pipeline loop. The instances of the common operation may be fused if the output of a first instance of the common operation is the source for a second instance of the common operation. Examples of the common operation include an accumulator operation, a re-read operation, and a temporal (chip buffer synchronized) operation such as a temporal concatenation operation and a temporal slicing operation.
    Type: Application
    Filed: April 4, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Fei WANG, Weihang FAN, David Alan KOEPLINGER
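The fusion condition stated above, that two instances of a common operation may be fused when the first instance's output is the source for the second, can be sketched on a pair of operation records. The record fields and the accumulate example are assumptions for illustration.

```python
# Toy fusion test for a common operation appearing in both an outer and an
# inner meta-pipeline loop.

def fuse_common_op(outer_op, inner_op):
    """Fuse when both instances are the same kind of operation and the
    outer instance's output feeds the inner instance; the fused operation
    lives in the inner loop."""
    if outer_op["kind"] != inner_op["kind"]:
        return None
    if outer_op["output"] != inner_op["input"]:
        return None
    return {"kind": inner_op["kind"], "input": outer_op["input"],
            "output": inner_op["output"], "loop": "inner"}

fused = fuse_common_op(
    {"kind": "accumulate", "input": "partial", "output": "acc0"},
    {"kind": "accumulate", "input": "acc0", "output": "total"},
)
```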
  • Publication number: 20230315322
    Abstract: A system includes a reconfigurable dataflow processor that comprises an array of compute units and an array of memory units interconnected with a switching fabric. The reconfigurable dataflow processor can be configured to execute a plurality of tensor indexing expressions and access the array of memory units according to a memory unit partitioning solution.
    Type: Application
    Filed: June 12, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Matthew FELDMAN, Yaqi ZHANG
  • Publication number: 20230315624
    Abstract: A processor has multiple memory interfaces and a memory interleaver controlling access to the memory interfaces. The memory interfaces may each couple with one or more memory devices. The number of memory devices coupled to the different memory interfaces may be unequal. The memory interleaver determines a memory region and a region-relative address from a logical address. It determines the interleave factor IF corresponding to the memory region. It performs an integer division to obtain a device line address, and a modulo operation to obtain an uncorrected channel address. The memory interleaver may add a region start address associated with the memory region to the device line address to obtain a physical line address. It may correct the uncorrected channel address, based on the memory region, to obtain a physical channel address. Some implementations use configuration memories to allow flexibility, while other implementations are hardwired for a particular memory architecture.
    Type: Application
    Filed: June 7, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Paul JORDAN, Manish K. SHAH
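The address arithmetic in this abstract follows directly: find the region containing the logical address, take the region-relative address, divide by the region's interleave factor IF for the device line address, take the modulo for the uncorrected channel address, and add the region's start line. The region table and sizes below are invented; channel correction is omitted for brevity.

```python
# Toy memory interleaver: each region has its own interleave factor, so
# regions with fewer populated channels still map cleanly.

REGIONS = [  # (logical_start, logical_end, interleave_factor, phys_start_line)
    (0,    4096, 4, 0),      # region 0: 4-way interleave
    (4096, 8192, 3, 1024),   # region 1: only 3 channels populated
]

def translate(logical_addr):
    """Return (physical_line_address, uncorrected_channel_address)."""
    for start, end, IF, phys_start in REGIONS:
        if start <= logical_addr < end:
            rel = logical_addr - start       # region-relative address
            device_line = rel // IF          # integer division
            channel = rel % IF               # modulo: uncorrected channel
            return phys_start + device_line, channel
    raise ValueError(f"address {logical_addr:#x} not in any region")
```

For example, logical address 4100 falls in region 1 with relative address 4, giving line 1024 + 4 // 3 = 1025 on channel 4 % 3 = 1.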
  • Publication number: 20230305860
    Abstract: Argument registers in a reconfigurable processor are loaded from a runtime program running on a host processor. The runtime program stores a configuration file in a memory. A program load controller reads the configuration file from the memory and distributes it to configurable units in the reconfigurable processor, which sequentially shift it into a shift register of the configuration data store. The runtime program stores an argument load file in the memory, and a fast argument load (FAL) controller reads the argument load file from memory and distributes (value, control) tuples to the configurable units in the reconfigurable processor. The configurable units process the tuples by writing the value directly into an argument register, made up of a portion of the shift register in the configuration data store specified by the control of the tuple, without shifting the value through the shift register.
    Type: Application
    Filed: February 2, 2023
    Publication date: September 28, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Gregory Frederick GROHOSKI
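The contrast this abstract draws, a full configuration load that shifts the whole file through the register versus a fast argument load that writes a (value, control) tuple directly at the slot the control selects, can be modeled in a few lines. The word-granular shift register is a simplification invented for illustration.

```python
# Toy configuration data store modeled as a flat shift register of words.

class ConfigStore:
    def __init__(self, length):
        self.regs = [0] * length

    def shift_load(self, config_words):
        """Full configuration load: shift every word through the register."""
        for word in config_words:
            self.regs = self.regs[1:] + [word]   # one shift per word

    def fast_arg_load(self, tuples):
        """Write each value straight into the slot its control selects,
        without shifting the value through the register."""
        for value, control in tuples:
            self.regs[control] = value

store = ConfigStore(4)
store.shift_load([10, 11, 12, 13])   # regs end up as [10, 11, 12, 13]
store.fast_arg_load([(99, 2)])       # patch one argument register in place
```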
  • Publication number: 20230305881
    Abstract: A data processing system is presented that includes a communication link, a runtime processor, and one or more reconfigurable processors. A reconfigurable processor includes first and second dies arranged in a package, having respective K and L arrays of coarse-grained reconfigurable (CGR) units, and respective first and second communication link interfaces coupled to the communication link. The runtime processor is adapted for configuring the first communication link interface to provide access to the K arrays of CGR units through the communication link from a first physical function driver and from up to M virtual function drivers, and for configuring the second communication link interface to provide access to the K arrays of CGR units of the first die and to the L arrays of CGR units of the second die through the communication link from a second physical function driver and from up to N virtual function drivers.
    Type: Application
    Filed: February 1, 2023
    Publication date: September 28, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Paul JORDAN, Maran WILSON, Ravinder KUMAR
  • Publication number: 20230305823
    Abstract: A method in a reconfigurable computing system includes connecting a plurality of tensor consumers to their corresponding tensor producers via skip-buffers, thereby generating a plurality of skip-buffers. The method includes determining that at least one skip-buffer of the plurality of skip-buffers corresponding to a first set of tensor consumers and at least one skip-buffer of the plurality of skip-buffers corresponding to a second set of tensor consumers are compatible to wholly or partially merge. The method also includes merging, wholly or partially, the compatible skip-buffers to produce a merged skip-buffer having a minimal buffer depth. The described method may reduce memory unit consumption and latency.
    Type: Application
    Filed: March 27, 2023
    Publication date: September 28, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Fei WANG, David Alan KOEPLINGER, Kevin BROWN, Weiwei CHEN
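A simplified picture of the merge above: treat skip-buffers fed by the same producer as compatible, and give the merged buffer the deeper of the two depths so it still covers every consumer. Both the compatibility rule and the depth rule are simplifications invented for illustration.

```python
# Toy skip-buffer merging: same-producer buffers share storage; the merged
# buffer keeps the maximum depth and the union of consumers.

def merge_skip_buffers(buffers):
    """Greedily merge same-producer skip-buffers into minimal-depth buffers."""
    merged = {}
    for buf in buffers:
        key = buf["producer"]
        if key in merged:
            merged[key]["depth"] = max(merged[key]["depth"], buf["depth"])
            merged[key]["consumers"].extend(buf["consumers"])
        else:
            merged[key] = {"producer": key, "depth": buf["depth"],
                           "consumers": list(buf["consumers"])}
    return list(merged.values())

result = merge_skip_buffers([
    {"producer": "conv1", "depth": 3, "consumers": ["add"]},
    {"producer": "conv1", "depth": 5, "consumers": ["concat"]},
    {"producer": "conv2", "depth": 2, "consumers": ["add"]},
])
```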