Patents Assigned to SambaNova Systems, Inc.
  • Publication number: 20230297349
    Abstract: A computer-implemented method of transforming a high-level program for mapping onto a coarse-grained reconfigurable (CGR) processor with an array of CGR units includes sectioning a dataflow graph into a plurality of sections; extracting performance information for each of the plurality of sections; on a CGR unit: assigning to a section at least two computations dependent on a first data element; scheduling an additional load of the first data element in response to available memory bandwidth for that section; eliminating a buffer between the additional load of the first data element and one of the two computations, for that section; generating configuration data for the CGR units and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.
    Type: Application
    Filed: March 15, 2023
    Publication date: September 21, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Gao DENG, Weihang FAN, Fei WANG, Yun DU
  • Patent number: 11762665
    Abstract: A system includes a multidimensional array of homogeneous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: September 19, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory F. Grohoski, Manish K. Shah, Kin Hing Leung
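    The placement change described in this abstract can be illustrated with a simplified, hypothetical model: the tagged unusable subarray is assumed to be one whole row of FCUs, the parameter store is represented by a plain set of row indices, and logical rows of a configuration are shifted onto the remaining usable physical rows. The patent achieves the equivalent effect by changing routing through the switch array, which this sketch does not model.

```python
from typing import Dict, Set


def remap_rows(physical_rows: int, unusable_rows: Set[int], logical_rows: int) -> Dict[int, int]:
    """Map each logical FCU row to a usable physical row, skipping tagged rows.

    Hypothetical model: `unusable_rows` stands in for the on-device parameter store.
    """
    usable = [row for row in range(physical_rows) if row not in unusable_rows]
    if logical_rows > len(usable):
        raise ValueError("not enough usable rows for this configuration")
    # Logical row i lands on the i-th usable physical row, preserving order.
    return {logical: usable[logical] for logical in range(logical_rows)}
```

    For example, with five physical rows and row 2 tagged, `remap_rows(5, {2}, 4)` places logical rows 0 to 3 on physical rows 0, 1, 3 and 4.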
  • Publication number: 20230289310
    Abstract: A reconfigurable data processor comprises an array of configurable units and a bus system. The bus system is connected to the array of configurable units. The bus system includes a top level network and an array level network. The top level network is connected to an external data interface for communication with memory outside of the array of configurable units. The array level network is connected to configurable units in the array of configurable units.
    Type: Application
    Filed: May 18, 2023
    Publication date: September 14, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Gregory Frederick GROHOSKI, Sumti JAIRATH, Mark LUTTRELL, Raghu PRABHAKAR, Ram SIVARAMAKRISHNAN, Manish K. SHAH
  • Publication number: 20230281156
    Abstract: A method for partitioning executable operations for a reconfigurable computing system includes receiving a set of expressions comprising a plurality of operations and dependencies for those operations, and partitioning the plurality of operations into selected executable partitions, wherein each selected executable partition conforms to resource constraints for a reconfigurable unit of the reconfigurable computing system. Partitioning the plurality of operations into selected executable partitions may include seeding a candidate partition with an operation, recursively generating an additional candidate partition for each operation adjacent to the candidate partition whose dependencies are already within the candidate partition or a previously selected partition, and selecting a best candidate partition based on resource cost. A corresponding system and computer-readable medium are also disclosed herein.
    Type: Application
    Filed: August 23, 2022
    Publication date: September 7, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Yaqi Zhang, Mark Wagner, Matthew Feldman, Weiwei Chen
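    The seed-and-grow partitioning in this abstract can be sketched in a few lines. Assumptions (not from the source): the dependency graph is acyclic, each operation has a single integer resource cost, a partition fits if its summed cost is at most `capacity`, and the "best" candidate is the one using the most resources. The names `partition_ops`, `deps`, `cost` and `capacity` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def partition_ops(deps: Dict[str, Set[str]], cost: Dict[str, int],
                  capacity: int) -> List[FrozenSet[str]]:
    """Split an acyclic dependency graph into partitions whose total cost fits `capacity`.

    deps[op] is the set of operations that op depends on.
    """
    placed: Set[str] = set()
    partitions: List[FrozenSet[str]] = []

    def grow(cand: FrozenSet[str], used: int, seen: Set[FrozenSet[str]], best: list) -> None:
        if cand in seen:
            return
        seen.add(cand)
        if used > best[1]:  # hypothetical scoring: prefer higher resource usage
            best[0], best[1] = cand, used
        for op in deps:
            adjacent = bool(deps[op] & cand)
            ready = deps[op] <= (placed | cand)  # all dependencies already placed
            if op not in placed and op not in cand and adjacent and ready \
                    and used + cost[op] <= capacity:
                grow(cand | {op}, used + cost[op], seen, best)

    while len(placed) < len(deps):
        # Seed with the first unplaced operation whose dependencies are all placed.
        seed = next(op for op in deps if op not in placed and deps[op] <= placed)
        if cost[seed] > capacity:
            raise ValueError(f"operation {seed!r} alone exceeds capacity")
        best = [frozenset({seed}), cost[seed]]
        grow(frozenset({seed}), cost[seed], set(), best)
        partitions.append(best[0])
        placed |= best[0]
    return partitions
```

    The exhaustive recursion is exponential in the worst case; it is kept only to mirror the "generate every candidate, then pick one" structure the abstract describes.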
  • Publication number: 20230273879
    Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a user program for execution on a reconfigurable dataflow computing system, comprising a grid of compute units and a grid of memory units interconnected with a switching array. The user program includes multiple tensor-based algebraic expressions that are converted to an intermediate representation comprising multiple stages. Each stage includes one or more logical operations executable via dataflow through compute units, and each stage is preceded by and followed by a buffer, each buffer corresponding to one or more memory units. The method includes detecting a memory mapping operation within a critical stage and moving the memory mapping operation to an adjacent stage, wherein the memory mapping operation is executable by memory units within the adjacent stage and dataflow through the buffer is controlled by one or more memory units within the grid of memory units.
    Type: Application
    Filed: February 28, 2023
    Publication date: August 31, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Adam BORDELON, David Alan KOEPLINGER
  • Patent number: 11740911
    Abstract: A system includes a multidimensional array of homogeneous Functional Configurable Units (FCUs), coupled using a multidimensional array of switches, and a parameter store on the device which stores parameters that tag a subarray of FCUs as unusable. Technologies are described which change the pattern of placement of configuration data, in dependence on the tagged subarray, by changing the routing through the array of switches. As a result, a multidimensional array of FCUs having unusable elements can still be used.
    Type: Grant
    Filed: May 6, 2022
    Date of Patent: August 29, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: Gregory F. Grohoski, Manish K. Shah, Kin Hing Leung
  • Publication number: 20230259823
    Abstract: In a method, an orchestrator of a computing system determines that results of machine learning model computations are available and dispatches a worker to perform model computations that include computing gradients of the results. The orchestrator determines that a set of gradients of the results is available and dispatches a gradient worker to compute a sum of the gradients. The orchestrator determines that a second set of gradients of the results is available and dispatches a second gradient worker to compute a sum of the second set of gradients. The orchestrator determines that the sums of the first and second sets of gradients are available and dispatches a third gradient worker to compute synchronized gradients. The gradient workers compute the sums and synchronized gradients concurrently with training workers computing additional model computation results and/or gradients. A computer program product can include the method and a computing system can include the orchestrator.
    Type: Application
    Filed: February 13, 2023
    Publication date: August 17, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Greg DYKEMA, Fansheng CHENG, Kuan ZHOU, Arnav GOEL, Subhra MAZUMDAR, Milad SHARIF, Po-Yu WU, Bowen YANG, Qi ZHENG
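    The dispatch pattern in this abstract (two gradient workers summing separate gradient sets, then a third combining their sums) can be sketched with Python's standard thread pool. Assumptions: gradients are plain lists of floats, waiting on a future stands in for the orchestrator "determining" that a result is available, and the synchronized gradient is taken to be the mean over all contributing gradients, which the abstract does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Vector = List[float]


def sum_gradients(gradients: List[Vector]) -> Vector:
    """Gradient worker: element-wise sum of a set of gradient vectors."""
    return [sum(values) for values in zip(*gradients)]


def synchronize(partial_sums: List[Vector], num_gradients: int) -> Vector:
    """Third gradient worker: combine partial sums (taking the mean is a hypothetical choice)."""
    total = sum_gradients(partial_sums)
    return [value / num_gradients for value in total]


def orchestrate(first_set: List[Vector], second_set: List[Vector]) -> Vector:
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Dispatch one gradient worker per available set of gradients.
        first_sum = pool.submit(sum_gradients, first_set)
        second_sum = pool.submit(sum_gradients, second_set)
        # Training workers could keep producing results meanwhile; .result()
        # blocks until each partial sum is available.
        partials = [first_sum.result(), second_sum.result()]
        synced = pool.submit(synchronize, partials, len(first_set) + len(second_set))
        return synced.result()
```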
  • Publication number: 20230259477
    Abstract: A data processing system for implementing operations that generate a dynamically-sized output is presented. The data processing system includes a reconfigurable processor that is configured to implement a first operation, a second operation, a recording unit, and a control unit. The first operation generates an output, wherein a size of the output is unknown during a configuration phase. The second operation receives the output of the first operation as an input. The recording unit generates control data that is indicative of the size of the output. The control unit provides the control data to the second operation, wherein the second operation processes the input based on the control data.
    Type: Application
    Filed: February 14, 2023
    Publication date: August 17, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Abhishek SRIVASTAVA, Matthew VILIM, Raghu PRABHAKAR, Sankar RACHURU, Zhekun ZHANG, Matheen MUSADDIQ, Apurv VIVEK, Sitanshu GUPTA, Ayesha Siddiqua
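    The role of the recording and control units can be shown with a data-dependent filter. In this hypothetical sketch the first operation writes into a buffer preallocated at its maximum size (the only size known at configuration time), its returned count plays the part of the recording unit's control data, and the second operation uses that count to read only the valid entries.

```python
from typing import List


def filter_positive(data: List[int], out: List[int]) -> int:
    """First operation: how many values it keeps depends on the data.

    Writes kept values into `out` and returns the count (the recorded output size).
    """
    count = 0
    for value in data:
        if value > 0:
            out[count] = value
            count += 1
    return count


def sum_valid(buffer: List[int], valid_count: int) -> int:
    """Second operation: consumes only the entries the control data marks as valid."""
    return sum(buffer[:valid_count])
```

    With `buffer = [0] * 5`, calling `filter_positive([3, -1, 4, -1, 5], buffer)` records a size of 3, and `sum_valid(buffer, 3)` ignores the two unused slots.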
  • Publication number: 20230252106
    Abstract: A method generates pairs of split matrices based on a left and a right matrix sharing dimension K. A first column-split matrix comprises columns 1 to Q of the left matrix and a second column-split matrix comprises columns Q+1 to Q+P of the left matrix. A first row-split matrix comprises rows 1 to Q of the right matrix and a second row-split matrix comprises rows Q+1 to Q+P of the right matrix. The method multiplies the first column-split matrix and the first row-split matrix to compute a first dot product, and multiplies the second column-split matrix and the second row-split matrix to compute a second dot product. The method adds the dot products to compute a third dot product. The method can compute the first and second dot products concurrently. A computing system can comprise a matrix splitter to generate the matrices and can comprise matrix processing units to compute the dot products.
    Type: Application
    Filed: February 3, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod NATARAJA, Raghu PRABHAKAR
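    The split along the shared dimension works because a matrix product is a sum over K, so it can be broken into the sum of two partial products over indices 1..Q and Q+1..Q+P. A minimal pure-Python sketch (the function names are hypothetical) that checks this identity:

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an M x K matrix and a K x N matrix."""
    return [[sum(a_row[t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for a_row in a]


def split_k_matmul(left: Matrix, right: Matrix, q: int) -> Matrix:
    """Split the shared dimension K into Q and P = K - Q and sum the two partial products."""
    if not 0 < q < len(right):
        raise ValueError("q must leave a non-empty split on both sides")
    left_1 = [row[:q] for row in left]   # columns 1..Q of the left matrix
    left_2 = [row[q:] for row in left]   # columns Q+1..Q+P of the left matrix
    right_1, right_2 = right[:q], right[q:]  # rows 1..Q and Q+1..Q+P of the right matrix
    # The two partial products are independent, so hardware could compute them concurrently.
    partial_1 = matmul(left_1, right_1)
    partial_2 = matmul(left_2, right_2)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(partial_1, partial_2)]
```

    For `left = [[1, 2, 3], [4, 5, 6]]` and `right = [[7, 8], [9, 10], [11, 12]]`, splitting at `q = 2` gives the same result as the unsplit product, `[[58, 64], [139, 154]]`.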
  • Publication number: 20230251683
    Abstract: A timing margin sensor circuit includes one or more time-to-digital converters (TDCs), a predictor, and a translation circuit. The TDC(s) measure(s) progress of a clock signal through one or more chains of delay stages. The progress depends on sense conditions acting upon the delay chain, such as the supply voltage and the temperature. The predictor receives the measured progress. If the delay chain becomes slower, the predictor extrapolates a predicted progress value. If the delay chain becomes faster, the predictor outputs the actual progress value. The translator translates the predictor output value to sense information that can be used in a clock stretcher circuit. The timing margin sensor may further have an averager/selector to average or select from the results of multiple TDCs. The timing margin sensor may further have a calibrator to compensate for nominal sense conditions, and one or more tunable delay circuits.
    Type: Application
    Filed: January 31, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Mahmood KHAYATZADEH, Satyajit SARKAR, Jinuk SHIN
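    The predictor's asymmetric rule (extrapolate when the chain slows, pass through when it speeds up) can be sketched on integer TDC readings. Assumptions: "progress" is the number of delay stages the clock edge traversed in one sample, extrapolation is linear over one sample, and the translation to sense information is a hypothetical signed margin against a calibrated nominal value.

```python
def predict_progress(previous: int, current: int) -> int:
    """Predictor stage for a time-to-digital converter reading.

    A falling reading means the delay chain is slowing, so the trend is
    extrapolated one sample ahead (linear extrapolation is an assumption);
    otherwise the actual reading is passed through unchanged.
    """
    if current < previous:
        return max(0, 2 * current - previous)
    return current


def margin_code(predicted: int, nominal: int) -> int:
    """Hypothetical translation: signed margin in delay stages relative to nominal."""
    return predicted - nominal
```

    Extrapolating only in the slowing direction errs on the side of reporting less margin, which is the safe direction for a clock stretcher.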
  • Publication number: 20230251993
    Abstract: A coarse-grained reconfigurable (CGR) processor includes agents coupled to a first network, an array of CGR units connected by a second network, and a tile agent coupled between the first and second networks. The tile agent includes links to receive requests for transactions on the first network, request queues respectively associated with the links, credit counters associated with respective agents, a first arbiter, and a second arbiter. The first arbiter selects a request from the received requests for transactions and enters the selected request into a request queue associated with a link that received the selected request. The second arbiter chooses a request from an oldest entry of each request queue based on the credit counters, sends a transaction based on the chosen request over the first network, and removes the chosen request from its respective request queue.
    Type: Application
    Filed: February 9, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, John Philipp BAXLEY
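    The two-arbiter flow in this abstract can be modeled behaviorally. In this sketch the first arbiter picks one incoming request per call round-robin and queues it on the link it arrived on, and the second arbiter examines only the oldest entry of each link queue and sends the first one whose destination agent still has credit. Round-robin ordering, the request encoding (link id to destination name) and the class name `TileAgentModel` are all hypothetical choices.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class TileAgentModel:
    """Behavioral sketch of a two-arbiter tile agent (names and policies are hypothetical)."""

    def __init__(self, num_links: int, credits: Dict[str, int]) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_links)]
        self.credits = dict(credits)  # credit counter per destination agent
        self._in_ptr = 0   # round-robin pointer for the first arbiter
        self._out_ptr = 0  # round-robin pointer for the second arbiter

    def accept(self, incoming: Dict[int, str]) -> Optional[int]:
        """First arbiter: select one request and enter it into its link's queue.

        Requests that lose arbitration are assumed to be retried by the sender.
        """
        n = len(self.queues)
        for offset in range(n):
            link = (self._in_ptr + offset) % n
            if link in incoming:
                self.queues[link].append(incoming[link])
                self._in_ptr = (link + 1) % n
                return link
        return None

    def send(self) -> Optional[Tuple[int, str]]:
        """Second arbiter: send the oldest entry of a queue whose destination has credit."""
        n = len(self.queues)
        for offset in range(n):
            link = (self._out_ptr + offset) % n
            queue = self.queues[link]
            if queue and self.credits[queue[0]] > 0:
                destination = queue.popleft()
                self.credits[destination] -= 1
                self._out_ptr = (link + 1) % n
                return link, destination
        return None

    def return_credit(self, destination: str) -> None:
        """Called when a destination agent frees a buffer entry."""
        self.credits[destination] += 1
```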
  • Publication number: 20230251989
    Abstract: A data processing system is presented that includes multiple local buses, a host processor, a network interface controller (NIC) for connecting to external storage via a network, one or more reconfigurable processors, and a bus switch. The bus switch couples the multiple local buses, thereby operatively coupling the one or more reconfigurable processors, the host processor, and the NIC. The one or more reconfigurable processors are configured to implement a virtual function that uses a virtual address for a memory access operation. The host processor is configured to implement an application programming interface (API) that translates the virtual address into a physical address, and the NIC uses the physical address to initiate a direct data access operation at the external storage that moves data directly between the one or more reconfigurable processors and the external storage, wherein the data bypasses the host processor.
    Type: Application
    Filed: February 9, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Subhra MAZUMDAR, Guoyao FENG, Neal SANGHVI
  • Publication number: 20230251994
    Abstract: A reconfigurable processor includes an array of configurable units connected by a bus system. Each configurable unit has a configuration data store, organized as a shift register, to store configuration data. The configuration data store also includes individually addressable argument registers, each made up of a word-sized portion of the shift register, to provide arguments to the configurable unit. The configurable unit also includes program load logic to shift data into the configuration data store, and argument load logic to load received argument data directly into the argument registers without shifting it through the shift register. A program load controller is associated with the array to respond to a program load command by executing a program load process, and a fast argument load (FAL) controller is associated with the array to respond to an FAL command by executing an FAL process.
    Type: Application
    Filed: February 2, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Gregory Frederick GROHOSKI
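    The difference between a full program load and a fast argument load can be illustrated with a word-level model. In this hypothetical sketch the configuration store is a list of words that can only be filled by shifting, while each argument register is a fixed word position that can be overwritten in place; the class name, slot mapping and word granularity are assumptions.

```python
from typing import Dict, List


class ConfigDataStore:
    """Sketch of a configuration store organized as a word-wide shift register."""

    def __init__(self, num_words: int, argument_slots: Dict[str, int]) -> None:
        self.words: List[int] = [0] * num_words
        # Hypothetical mapping from argument register name to its word position.
        self.argument_slots = dict(argument_slots)

    def program_load(self, bitstream: List[int]) -> int:
        """Shift every word in; returns the number of shift steps taken."""
        for word in bitstream:
            # Each step moves every stored word one position toward index 0.
            self.words = self.words[1:] + [word]
        return len(bitstream)

    def fast_argument_load(self, name: str, value: int) -> None:
        """Write one argument register directly, with no shifting."""
        self.words[self.argument_slots[name]] = value
```

    Updating one argument through `program_load` would cost a shift step for every word in the store, whereas `fast_argument_load` touches a single word.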
  • Publication number: 20230251839
    Abstract: A coarse-grained reconfigurable (CGR) processor comprises a first network and a second network; a plurality of agents coupled to the first network; an array of CGR units coupled together by the second network; and a tile agent coupled between the first network and the second network. The tile agent comprises a plurality of links, a plurality of credit counters associated with respective agents of the plurality of agents, a plurality of credit-hog counters associated with respective links of the plurality of links, and an arbiter to manage access to the first network from the plurality of links based on their associated credit-hog counters. Furthermore, a credit-hog counter of the plurality of credit-hog counters changes in response to processing a request for a transaction from its associated link.
    Type: Application
    Filed: February 9, 2023
    Publication date: August 10, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, John Philipp BAXLEY
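    One way a credit-hog counter can steer arbitration is sketched below, under hypothetical assumptions: the arbiter grants the pending link with the lowest hog count (ties to the lowest link id), and the granted link's counter increments as its request is processed. The abstract says only that the counter "changes"; the increment, the tie-break and any decay or reset policy are assumptions.

```python
from typing import Dict, Set


def grant(pending: Set[int], hog_counts: Dict[int, int]) -> int:
    """Grant the pending link with the lowest credit-hog count (ties: lowest link id).

    Incrementing the granted link's counter means a link that keeps consuming
    credits loses priority. Decay or reset of the counters is not modeled.
    """
    link = min(pending, key=lambda candidate: (hog_counts[candidate], candidate))
    hog_counts[link] += 1
    return link
```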
  • Publication number: 20230244515
    Abstract: A system is presented that includes a communication link, a runtime processor, and a reconfigurable processor. The reconfigurable processor is adapted for generating an interrupt to the runtime processor in response to a predetermined event and includes first and second dies arranged in a package, having respective first and second arrays of coarse-grained reconfigurable (CGR) units, and respective first and second communication link interfaces coupled to the communication link. The runtime processor is adapted for configuring the first and second communication link interfaces to provide access to the first and second arrays of coarse-grained reconfigurable units from first and second physical function drivers and from at least one virtual function driver, and the reconfigurable processor is adapted for sending the interrupt to the first or to the second physical function driver and for sending the interrupt to a virtual function driver of the at least one virtual function driver.
    Type: Application
    Filed: March 7, 2023
    Publication date: August 3, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Paul JORDAN, Maran WILSON, Ravinder KUMAR
  • Publication number: 20230244748
    Abstract: A method for multiplying matrices in a coarse-grained computing grid includes assigning each compute unit c of C compute units to a unique submatrix Rc of a result matrix R, wherein the C compute units are arranged in a 2D computing grid; configuring one or more source memory units to provide relevant matrix A data and matrix B data to the C compute units via a plurality of packets; and configuring each compute unit c to produce the unique submatrix Rc and send the unique submatrix Rc to one or more desired memory units. The method also includes initiating data flow in the computing grid to produce the result matrix R within the desired memory units. To reduce packet traffic, matrix B data corresponding to a column of compute units may be narrow-casted to each column of compute units.  A corresponding system and computer-readable medium are also disclosed herein.
    Type: Application
    Filed: May 25, 2022
    Publication date: August 3, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Pramod Natarja, Sitanshu Gupta, Ram Sivaramakrishnan, Ajit Punj
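    The output-stationary assignment in this abstract (each compute unit owns one block of R, and every unit in a grid column shares the same slice of B) can be simulated sequentially. Assumptions: rows of R are split evenly across grid rows and columns of R across grid columns, the loops stand in for parallel compute units, and all names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def block_bounds(size: int, parts: int, index: int) -> range:
    """Indices owned by block `index` when `size` is split into `parts` blocks."""
    step = -(-size // parts)  # ceiling division
    return range(min(index * step, size), min((index + 1) * step, size))


def grid_matmul(a: Matrix, b: Matrix, grid_rows: int, grid_cols: int) -> Matrix:
    """Simulate a grid_rows x grid_cols array of compute units computing R = A B."""
    m, k, n = len(a), len(b), len(b[0])
    r = [[0] * n for _ in range(m)]
    for gc in range(grid_cols):
        cols = block_bounds(n, grid_cols, gc)
        # One slice of B is narrow-cast to every unit in this column of the grid.
        b_slice = [[row[j] for j in cols] for row in b]
        for gr in range(grid_rows):
            # Compute unit (gr, gc) produces its unique submatrix of R.
            for i in block_bounds(m, grid_rows, gr):
                for jj, j in enumerate(cols):
                    r[i][j] = sum(a[i][t] * b_slice[t][jj] for t in range(k))
    return r
```

    Because the B slice depends only on the grid column, it is extracted once per column rather than once per unit, which mirrors the packet-traffic saving the abstract attributes to narrow-casting.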
  • Publication number: 20230244462
    Abstract: A system is presented that includes a communication link, a runtime processor coupled to the communication link, and a reconfigurable processor. The reconfigurable processor is adapted for generating an interrupt to the runtime processor in response to a predetermined event and includes multiple arrays of coarse-grained reconfigurable (CGR) units and an interface to the communication link that couples the reconfigurable processor to the runtime processor via the communication link. The runtime processor is adapted for configuring the interface to the communication link to provide access to the multiple arrays of coarse-grained reconfigurable units from a physical function driver and from at least one virtual function driver, and the reconfigurable processor is adapted for sending the interrupt to the physical function driver and to a virtual function driver of the at least one virtual function driver within the runtime processor.
    Type: Application
    Filed: March 7, 2023
    Publication date: August 3, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Paul JORDAN, Maran WILSON, Ravinder KUMAR
  • Publication number: 20230244461
    Abstract: A data processing system is presented that includes a communication link, a runtime processor coupled to the communication link, and one or more reconfigurable processors. A reconfigurable processor of the one or more reconfigurable processors is adapted for generating an interrupt to the runtime processor in response to a predetermined event and includes arrays of coarse-grained reconfigurable (CGR) units and an interface to the communication link that couples the reconfigurable processor to the runtime processor via the communication link. The runtime processor is adapted for configuring the interface to the communication link to provide access to the arrays of CGR units through the communication link from a physical function driver and from a virtual function driver.
    Type: Application
    Filed: February 1, 2023
    Publication date: August 3, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Manish K. SHAH, Paul JORDAN, Maran WILSON, Ravinder KUMAR
  • Patent number: 11714780
    Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: August 1, 2023
    Assignee: SambaNova Systems, Inc.
    Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
  • Publication number: 20230237013
    Abstract: A system for a data-parallel execution of at least two implementations of an application on reconfigurable processors with different layouts is presented. The system comprises a pool of reconfigurable data flow resources with data transfer resources that interconnect first and second reconfigurable processors having first and second layouts that impose respective first and second constraints for the data-parallel execution of the application. The system further comprises an archive of configuration files and a host system that is operatively coupled to the first and second reconfigurable processors. The host system comprises first and second compilers that generate for the application, based on the respective first and second constraints, first and second configuration files that are stored in the archive of configuration files and are adapted for compatible data-parallel execution on the respective first and second reconfigurable processors.
    Type: Application
    Filed: September 9, 2022
    Publication date: July 27, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Greg Dykema, Maran Wilson, Guoyao Feng, Kuan Zhou, Tianyu Sun, Taylor Lee, Kin Hing LEUNG, Arnav Goel, Conrad Turlik, Milad Sharif