Patents Assigned to SambaNova Systems, Inc.
  • Patent number: 11983141
    Abstract: A system for executing an application on a pool of reconfigurable processors with first and second reconfigurable processors having first and second architectures that are different from each other is presented. The system comprises an archive of configuration files with first and second configuration files for executing the application on the first and second reconfigurable processors, respectively, and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises a runtime processor that allocates reconfigurable processors for executing the application and an auto-discovery module that is configured to perform discovery of whether the reconfigurable processors include at least one of the first reconfigurable processors and whether the reconfigurable processors include at least one of the second reconfigurable processors.
    Type: Grant
    Filed: September 9, 2022
    Date of Patent: May 14, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Greg Dykema, Maran Wilson, Guoyao Feng, Kuan Zhou, Tianyu Sun, Taylor Lee, Kin Hing Leung, Arnav Goel, Conrad Turlik, Milad Sharif
  • Patent number: 11983509
    Abstract: A floating-point accumulator circuit includes an addend input register having an addend exponent and an addend significand and an accumulation register with a first portion to hold a representation of an accumulation exponent and a second portion to hold a representation of an accumulation significand. A control circuit is also included to generate an accumulator zero control signal and an addend zero control signal based on the addend exponent and the accumulation exponent. It also includes an adder circuit with an output coupled to an input of the accumulation register. A first zeroing circuit sends either a zero or a value based on the addend significand to a first input of the adder circuit based on the addend zero control signal, and a second zeroing circuit sends either zeros or a value based on the accumulator significand to a second input of the adder circuit, based on the accumulator zero control signal.
    Type: Grant
    Filed: September 12, 2022
    Date of Patent: May 14, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Vojin G. Oklobdzija, Matthew M. Kim
  • Patent number: 11983140
    Abstract: A reconfigurable data processor comprises a bus system, and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. A configuration unload controller connected to the bus system includes logic to execute an array configuration unload process, including distributing a command to a plurality of the configurable units in the array to unload the unit files particular to the corresponding configurable units, the unit files each comprising a plurality of ordered sub-files, receiving sub-files via the bus system from the array of configurable units, and assembling an unload configuration file by arranging the received sub-files in memory according to the configurable unit of the unit file of which the sub-file is a part, and the order of the sub-file in the unit file.
    Type: Grant
    Filed: November 22, 2021
    Date of Patent: May 14, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David B. Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
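The assembly step described in the abstract above (arranging received sub-files by configurable unit and by order within the unit file) can be illustrated with a minimal sketch. It is not the patented implementation: the function name `assemble_unload_file`, the fixed sub-file size, the `(unit, order, payload)` tagging, and the unit-major memory layout are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def assemble_unload_file(
    sub_files: Iterable[Tuple[int, int, bytes]],
    num_units: int,
    sub_files_per_unit: int,
    sub_file_size: int,
) -> bytes:
    """Place each (unit, order, payload) sub-file into its slot of one buffer.

    Hypothetical layout: all sub-files of unit 0 first (in order), then unit 1,
    and so on. Sub-files may arrive from the bus in any order.
    """
    buf = bytearray(num_units * sub_files_per_unit * sub_file_size)
    seen = set()
    for unit, order, payload in sub_files:
        if not (0 <= unit < num_units and 0 <= order < sub_files_per_unit):
            raise ValueError(f"sub-file ({unit}, {order}) is out of range")
        if len(payload) != sub_file_size:
            raise ValueError(f"sub-file ({unit}, {order}) has the wrong size")
        if (unit, order) in seen:
            raise ValueError(f"sub-file ({unit}, {order}) received twice")
        seen.add((unit, order))
        # The slot depends only on the unit and the order, not on arrival time.
        offset = (unit * sub_files_per_unit + order) * sub_file_size
        buf[offset:offset + sub_file_size] = payload
    if len(seen) != num_units * sub_files_per_unit:
        raise ValueError("some sub-files were never received")
    return bytes(buf)


# Sub-files arrive interleaved and out of order from two units.
arrivals = [(1, 0, b"B0"), (0, 1, b"A1"), (1, 1, b"B1"), (0, 0, b"A0")]
unload_file = assemble_unload_file(arrivals, num_units=2,
                                   sub_files_per_unit=2, sub_file_size=2)
print(unload_file)  # b'A0A1B0B1'
```

Because each slot is computed from the unit and order alone, the result is independent of arrival order, which is the property the abstract relies on.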
  • Patent number: 11971846
    Abstract: A logic unit in an array of processing units is configurable to consume source tokens and a status signal and to produce barrier tokens and an enable signal based on the source tokens and the status signal.
    Type: Grant
    Filed: February 14, 2023
    Date of Patent: April 30, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Manish K. Shah, Ram Sivaramakrishnan, Pramod Nataraja, David Brian Jackson, Gregory Frederick Grohoski
  • Patent number: 11967955
    Abstract: A clocked storage element comprises a first latch having an input data node, a clock input node and a first latch output data node, and a second latch having an input connected to the first latch output data node, a clock input node and a second latch output data node. The first and second latches can have a clocked pull-up current path consisting of two p-channel transistors between their respective output data nodes and the VDD supply line, and a clocked pull-down current path consisting of two n-channel transistors between their respective output data nodes and the VSS supply line.
    Type: Grant
    Filed: January 12, 2023
    Date of Patent: April 23, 2024
    Assignee: SambaNova Systems, Inc.
    Inventor: Vojin G. Oklobdzija
  • Patent number: 11961575
    Abstract: An integrated circuit (IC) includes first and second scan latches that are enabled to load data during a first part of a clock period. A clocking circuit outputs latch clocks with one latch clock driven to an active state during a second part of the clock period dependent on a first address input. A set of storage elements have inputs coupled to the output of the first scan latch and are respectively coupled to a latch clock to load data during a time that their respective latch clock is in an active state. A selector circuit is coupled to outputs of the first set of storage elements and outputs a value from one output based on a second address input. The second scan latch then loads data from the selector's output during the first part of the input clock period.
    Type: Grant
    Filed: September 9, 2022
    Date of Patent: April 16, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Thomas A. Ziaja, Uma Durairajan, Dinesh R. Amirtharaj
  • Patent number: 11954053
    Abstract: A method for integrating buffer views into buffer access operations in a reconfigurable computing environment includes detecting, in an instruction stream for a reconfigurable dataflow unit (RDU), a buffer allocation statement comprising a tensor indexing expression, a buffer view indicator and one or more buffer view parameters. The method also includes lowering the buffer view parameters into the indexing expression according to the buffer view indicator to produce a modified tensor indexing expression, removing the buffer view indicator from the buffer allocation statement to produce a modified buffer allocation statement and allocating a buffer according to the modified buffer allocation statement. The modified buffer allocation statement may include the modified tensor indexing expression. A corresponding system and computer readable medium are also disclosed herein.
    Type: Grant
    Filed: October 13, 2022
    Date of Patent: April 9, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Yaqi Zhang, Matthew Feldman
  • Publication number: 20240094794
    Abstract: An integrated circuit (IC) includes an array of statically reconfigurable compute units for separation into mutually exclusive groups. Each group includes a statically reconfigurable number of compute units. Each compute unit includes a register statically reconfigurable with a group identifier that identifies which group the compute unit belongs to, a counter statically reconfigurable to synchronously increment with the counters of all the other compute units such that all the counters have the same value each clock cycle, and control circuitry that prevents the compute unit from starting to process data until the counter value matches the identifier. According to operation of the register, the counter, and the control circuitry, no more than the statically reconfigurable number of the compute units are allowed to start processing data concurrently to mitigate supply voltage droop caused by a time rate of change of current drawn by the IC through inductive loads of the IC.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 21, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
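The staggered-start scheme in the abstract above (a group identifier register, a counter that holds the same value in every unit each cycle, and a gate that holds a unit until the counter matches its identifier) can be modeled in a few lines. This is a behavioral sketch only, not the circuit from the application: the helper names `assign_groups` and `simulate_staggered_start`, the contiguous group assignment, and the counter starting at zero are assumptions.

```python
from typing import Dict, List


def assign_groups(num_units: int, group_size: int) -> List[int]:
    """Hypothetical static configuration: consecutive units share a group id."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [unit // group_size for unit in range(num_units)]


def simulate_staggered_start(group_ids: List[int]) -> Dict[int, List[int]]:
    """Return {cycle: [units that begin processing on that cycle]}."""
    if any(gid < 0 for gid in group_ids):
        raise ValueError("group ids must be non-negative")
    started = [False] * len(group_ids)
    starts: Dict[int, List[int]] = {}
    counter = 0  # models the value every unit's counter holds this cycle
    while not all(started):
        for unit, gid in enumerate(group_ids):
            # Control gate: a unit may start only when the counter matches its id.
            if not started[unit] and counter == gid:
                started[unit] = True
                starts.setdefault(counter, []).append(unit)
        counter += 1  # all counters increment synchronously
    return starts


group_ids = assign_groups(num_units=8, group_size=3)
starts = simulate_staggered_start(group_ids)
print(starts)  # {0: [0, 1, 2], 1: [3, 4, 5], 2: [6, 7]}
print(max(len(units) for units in starts.values()))  # 3
```

In this model, the number of units that start on any one cycle never exceeds the group size, which is the bound the abstract uses to limit the rate of change of supply current.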
  • Patent number: 11934343
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles an intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles an output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: March 19, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
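The idea of padding the input once and only then tiling it, as described in the abstract above, can be shown on a deliberately small case. The sketch below assumes a single 1-D layer with stride 1 and a sliding dot product (cross-correlation), so there are no intermediate tiles; the function names and the zero-padding choice are illustrative assumptions, not details taken from the patent.

```python
from typing import List


def pad_input(x: List[float], kernel_size: int) -> List[float]:
    """Apply all padding up front so the output has the same length as x."""
    left = (kernel_size - 1) // 2
    right = kernel_size - 1 - left
    return [0.0] * left + list(x) + [0.0] * right


def valid_conv(x: List[float], kernel: List[float]) -> List[float]:
    """Sliding dot product with no further padding."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def tile_padded_input(padded: List[float], kernel_size: int,
                      out_tile: int) -> List[List[float]]:
    """Cut the padded input into equal-size tiles that overlap by kernel_size - 1."""
    n_out = len(padded) - kernel_size + 1
    if n_out % out_tile != 0:
        raise ValueError("output length must be a multiple of out_tile")
    in_tile = out_tile + kernel_size - 1
    return [padded[s:s + in_tile] for s in range(0, n_out, out_tile)]


def tiled_conv(x: List[float], kernel: List[float], out_tile: int) -> List[float]:
    """Pad once, tile, process each tile independently, then concatenate."""
    padded = pad_input(x, len(kernel))
    out: List[float] = []
    for tile in tile_padded_input(padded, len(kernel), out_tile):
        out.extend(valid_conv(tile, kernel))  # each tile yields out_tile values
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 0.0, -1.0]
assert tiled_conv(x, kernel, out_tile=4) == valid_conv(pad_input(x, 3), kernel)
```

Every input tile has the same size (`out_tile + kernel_size - 1`), no tile needs padding of its own, and the output tiles do not overlap, so the per-tile results concatenate into the untiled result.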
  • Publication number: 20240085965
    Abstract: An integrated circuit (IC) includes an array of compute units. Each compute unit is configured such that, when transitioning from not processing data to processing data, the compute unit makes an individual contribution to an aggregate time rate of change of current drawn by the IC. Control circuitry is configurable to, for each compute unit of the array of compute units, control when the compute unit is eligible to transition from not processing data to processing data relative to when the other compute units start processing data to mitigate supply voltage droop caused by the aggregate time rate of change of current drawn by the IC through inductive loads of the IC.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Publication number: 20240085966
    Abstract: A method includes analyzing a dataflow graph to generate configuration information loadable into an integrated circuit. The dataflow graph specifies operations to be performed and data dependencies between the operations. The configuration information is usable by the integrated circuit to configure compute units of the integrated circuit to perform respective one or more of the operations of the dataflow graph, control data flow between the compute units to accomplish the data dependencies between the respective operations performed by the compute units, and control when each compute unit starts to perform the respective operations on the data to mitigate supply voltage droop caused by a time rate of change of current drawn by the integrated circuit through inductive loads of the integrated circuit.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Publication number: 20240085967
    Abstract: An integrated circuit (IC) includes an array of compute units. Each compute unit is configured such that, when transitioning from not processing data to processing data, the compute unit makes an individual contribution to an aggregate time rate of change of current drawn by the IC. Control circuitry is configurable to, for each compute unit of the array of compute units, control when the compute unit is eligible to transition from not processing data to processing data relative to when the other compute units start processing data to mitigate supply voltage overshoot caused by the aggregate time rate of change of current drawn by the IC through inductive loads of the IC.
    Type: Application
    Filed: April 8, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Darshan GANDHI, Manish K. SHAH, Raghu PRABHAKAR, Gregory Frederick GROHOSKI, Youngmoon CHOI, Jinuk SHIN
  • Publication number: 20240086235
    Abstract: Reconfigurable dataflow architecture is an emerging design for deep learning training accelerators. This architecture maps model operators to an accelerator in a spatial way, enabling pipeline parallelization for high throughput. An essential ingredient to exploit this throughput advantage is compiler Performance Optimization (PO) which searches for optimal model mappings. The convention in industry-leading dataflow compilation uses hand-tuned rules to guide PO, requiring immense engineering cost to develop. This paper challenges this convention and asks if data-driven learned performance optimization can reduce the engineering cost while improving training throughput over hand-tuned rules. We present a workflow which guides PO using simple machine learning models trained from throughput observations of randomly generated mappings.
    Type: Application
    Filed: September 13, 2023
    Publication date: March 14, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Tianxiao JIANG, Jian ZHANG, Etash Kumar GUHA, Andrew DENG, Muthiah ANNAMALAI
  • Patent number: 11928512
    Abstract: A reconfigurable data processor comprises an array of configurable units configurable to allocate a plurality of sets of configurable units in the array to implement respective execution fragments of the data processing operation. Quiesce logic is coupled to configurable units in the array, configurable to respond to a quiesce control signal to quiesce the sets of configurable units in the array on quiesce boundaries of the respective execution fragments, and to forward quiesce ready signals for the respective execution fragments when the corresponding sets of processing units are ready. An array quiesce controller distributes the quiesce control signal to configurable units in the array, and receives quiesce ready signals for the respective execution fragments from the quiesce logic.
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: March 12, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, Manish K. Shah, Pramod Nataraja, David Brian Jackson, Kin Hing Leung, Ram Sivaramakrishnan, Sumti Jairath, Gregory Frederick Grohoski
  • Patent number: 11928445
    Abstract: A compiler produces a configuration file to configure a fracturable data path of a configurable unit in a coarse-grained reconfigurable processor to concurrently generate different address sequences generated using different address associated with different operations. The fracturable data path includes multiple computation stages respectively including a pipeline register. The compiler analyzes a first address calculation and a second address calculation and assigns a first set of stages to the first operation to generate the first address sequence and a second set of stages to the second operation to generate the second address sequence using the second set of stages, based on the analysis. A configuration file for the configurable unit is generated by the compiler that assigns the first set of stages to the first operation and the second set of stages to the second operation and includes two or more immediate values for each computation stage.
    Type: Grant
    Filed: January 19, 2023
    Date of Patent: March 12, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Raghu Prabhakar, David Brian Jackson, Scott Burson
  • Publication number: 20240078098
    Abstract: In a method, in response to an interface, a computer-implemented analysis assistant initiates a presentation of inefficiency results, determined by an efficiency analyzer based on a mapping of a dataflow program to execute on hardware of a computing system. The assistant receives an inefficiency included among the inefficiency results and composes formatted inefficiency results comprising a presentation format of the inefficiency to assist a developer of the dataflow program to interpret the inefficiency. The analysis assistant outputs the formatted inefficiency results to an interface, which can comprise an interface to output the formatted inefficiency results for use by the developer to improve the dataflow program in association with the inefficiency. In implementations, the presentation can comprise an interactive presentation with a developer of the dataflow program. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: November 8, 2023
    Publication date: March 7, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Blaine RISTER, Qingjian LI, Bowen YANG, Junjue WANG, Chen LIU, Zhuo CHEN, Arvind SUJEETH, Sumti JAIRATH
  • Publication number: 20240069770
    Abstract: A system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor including an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU is configured to perform an operation. A PMU comprises a plurality of data structures including a plurality of portions of operation-specific data related to the operation. The PMU is coupled to the PCU via a multi-segment datapath pipeline. The CGR processor is coupled to configure a segment of the datapath pipeline using a set of configuration bits corresponding to a portion of the operation-specific data related to the operation to activate the segment, to further communicate the operation-specific data to the PCU via the activated segment. The CGR processor is coupled to switch among multiple PMU contexts in various segments sequentially or concurrently.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240069959
    Abstract: A system includes a coarse-grained reconfigurable (CGR) processor and a compiler configured to generate one or more configuration files for an application for execution on the CGR processor including an array of pattern compute units (PCUs) and pattern memory units (PMUs). A PCU is configured to perform an operation. A PMU includes operation-specific data related to the operation. The PMU is coupled to the PCU via a multi-segment datapath pipeline. The CGR processor is coupled to configure a segment of the datapath pipeline using a set of configuration bits corresponding to the operation-specific data to activate the segment, to communicate the operation-specific data to the PCU via the activated segment. A finite state machine (FSM) is configured to progress through a plurality of states corresponding to the plurality of PMU contexts and allow the PMU to switch among multiple PMU contexts sequentially or concurrently.
    Type: Application
    Filed: August 23, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventor: Raghu PRABHAKAR
  • Publication number: 20240073136
    Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit sending a request to access an external memory over the first internal network, a second configurable unit receiving the request on the first internal network, obtaining a memory address, determining an identifier for the target reconfigurable processing unit, and sending the request, identifier, and memory address over the second internal network, and a third configurable unit receiving the request, identifier, and memory address on the second internal network, determining a routable address on the external network based on the identifier, synthesizing a payload with the request, address, and identifier, and sending the payload to the routable address on the external network.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR
  • Publication number: 20240070111
    Abstract: A reconfigurable processing unit is disclosed, comprising a first internal network and a second internal network with different protocols, an interface to an external network with a different protocol, a first configurable unit connected to the first internal network, a second configurable unit connected to both the first internal network and the second internal network, and a third configurable unit connected to both the second internal network and the interface to the external network. The third configurable unit is configured to receive a payload from the external network and send the transaction type identifier and the source application ID to the second configurable unit over the second internal network. The second configurable unit sends information to the first configurable unit based on the transaction type identifier and the source application ID matching the local application ID retrieved from the register.
    Type: Application
    Filed: October 25, 2023
    Publication date: February 29, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Ram SIVARAMAKRISHNAN, Gregory Frederick GROHOSKI, Raghu PRABHAKAR