Patents by Inventor Sumti Jairath
Sumti Jairath has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11782729Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.Type: GrantFiled: August 18, 2020Date of Patent: October 10, 2023Assignee: SambaNova Systems, Inc.Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
-
Publication number: 20230315410Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling costs. A computer program product and a computing system can implement the method.Type: ApplicationFiled: March 31, 2023Publication date: October 5, 2023Applicant: SambaNova Systems, Inc.Inventors: Bowen YANG, Zhuo CHEN, Chen LIU, Fei WANG, Ruobing WANG, Qinghua Li, Weiwei CHEN, Junjue WANG, Sumti JAIRATH
-
Publication number: 20230315802Abstract: A method comprises a compiler generating a MI (mixed integer) model to determine mapping decisions to map a dataflow application to hardware of a computing system to execute the application. The MI model comprises MI equations to solve by an MI solver. The MI equations include equations of an objective function corresponding to an optimization objective. The MI equations can comprise decision variables and equations and constraint variables and equations. The compiler outputs the MI model to the MI solver and invokes the MI solver to compute an MI solution comprising solutions to equations among the equations included in the MI model. The compiler receives the MI solution and generates a globally optimized mapping decision based on the MI solution. The MI solver can comprise a commercial program to solve MI linear equations. A computer program product and a computing system can implement the method.Type: ApplicationFiled: March 29, 2023Publication date: October 5, 2023Applicant: SambaNova Systems, Inc.Inventors: Junjue WANG, Blaine Burton RISTER, Zhichao MA, Zhuo CHEN, Andrew DENG, Sumti JAIRATH, Arvind Krishna SUJEETH
-
Publication number: 20230315407Abstract: According to a computing method a compiler determines a recompute node included in a dataflow application and a checkpoint tensor produced by the recompute node. The compiler determines a recompute cost to recompute the checkpoint tensor, and a memory cost to checkpoint the checkpoint tensor in a memory. Based on the recompute cost and/or the memory cost, the compiler determines a solution cost and compares the solution cost to a solution threshold. Based on comparing the solution cost to the solution threshold, the compiler determines a checkpoint solution to execute the dataflow application. The checkpoint solution can comprise recomputing or checkpointing the checkpoint tensor. In some implementations, the compiler can determine a recompute ratio of the recompute cost to the memory cost and can compare the recompute ratio to the solution threshold. A computer program product and a computing system can implement aspects of the method.Type: ApplicationFiled: March 31, 2023Publication date: October 5, 2023Applicant: SambaNova Systems, Inc.Inventors: Bowen YANG, Zhuo CHEN, Fei WANG, Venkat Krishna SRINIVASAN, Chen LIU, Junjue WANG, Arvind Krishna SUJEETH, Sumti JAIRATH
-
Publication number: 20230289310Abstract: A reconfigurable data processor comprises an array of configurable units and a bus system. The bus system is connected to the array of configurable units. The bus system includes a top level network and an array level network. The top level network is connected to an external data interface for communication with memory outside of the array of configurable units. The array level network is connected to configurable units in the array of configurable units.Type: ApplicationFiled: May 18, 2023Publication date: September 14, 2023Applicant: SambaNova Systems, Inc.Inventors: Gregory Frederick GROHOSKI, Sumti JAIRATH, Mark LUTTRELL, Raghu PRABHAKAR, Ram SIVARAMAKRISHNAN, Manish K. SHAH
-
Patent number: 11714780Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.Type: GrantFiled: May 20, 2021Date of Patent: August 1, 2023Assignee: SambaNova Systems, Inc.Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
-
Patent number: 11709664Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.Type: GrantFiled: June 2, 2020Date of Patent: July 25, 2023Assignee: SambaNova Systems, Inc.Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
-
Patent number: 11681645Abstract: A reconfigurable data processor includes a plurality of configurable units, and a configuration controller. The configuration controller is configured to start execution of a first application graph in a first set of configurable units. Then, concurrently with the execution of the first application graph in the first set of configurable units, the configuration controllers receive a command to load a configuration file into a second set of configurable units and obtain the configuration file. The configuration file contains information to configure the second set of configurable units to execute a second application graph. The configuration file is then loaded into the second set of configurable units and execution of the second application graph is started in the second set of configurable units.Type: GrantFiled: January 31, 2022Date of Patent: June 20, 2023Assignee: SambaNova Systems, Inc.Inventors: Gregory Frederick Grohoski, Sumti Jairath, Mark Luttrell, Raghu Prabhakar, Ram Sivaramakrishnan, Manish K. Shah
-
Patent number: 11625284Abstract: The technology disclosed relates to inter-node execution of configuration files on reconfigurable processors using smart network interface controller (SmartNIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and process application data for applications using a first reconfigurable processor on a first node, and a second host processor on a second node. The execution includes streaming configuration data in the configuration files and the application data between the first reconfigurable processor and the second host processor using one or more SmartNIC buffers.Type: GrantFiled: November 9, 2021Date of Patent: April 11, 2023Assignee: SambaNova Systems, Inc.Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
-
Patent number: 11625283Abstract: The technology disclosed relates to inter-processor execution of configuration files on reconfigurable processors using smart network interface controller (SmartNIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and process application data for applications using a first reconfigurable processor and a second reconfigurable processor. The execution includes streaming configuration data in the configuration files and the application data between the first reconfigurable processor and the second reconfigurable processor using one or more SmartNIC buffers.Type: GrantFiled: November 9, 2021Date of Patent: April 11, 2023Assignee: SambaNova Systems, Inc.Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
-
Patent number: 11609769Abstract: A reconfigurable data processor comprises a bus system, and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. Configurable units in the plurality of configurable units each include logic to execute a unit configuration load process, including receiving via the bus system, sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit. A configuration load controller connected to the bus system, including logic to execute an array configuration load process, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array.Type: GrantFiled: November 9, 2020Date of Patent: March 21, 2023Assignee: SambaNova Systems, Inc.Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David B. Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
-
Patent number: 11609798Abstract: The technology disclosed relates to runtime execution of configuration files on reconfigurable processors with varying configuration granularity. In particular, the technology disclosed relates to a runtime logic that is configured to receive a set of configuration files for an application, and load and execute a first subset of configuration files in the set of configuration files and associated application data on a first reconfigurable processor. The first reconfigurable processor has a first level of configurable granularity. The runtime logic is further configured to load and execute a second subset of configuration files in the set of configuration files and associated application data on a second reconfigurable processor. The second reconfigurable processor has a second level of configurable granularity that is different from the first level of configurable granularity.Type: GrantFiled: November 9, 2021Date of Patent: March 21, 2023Assignee: SambaNova Systems, Inc.Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
-
Patent number: 11561925Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.Type: GrantFiled: September 16, 2021Date of Patent: January 24, 2023Assignee: SambaNova Systems, Inc.Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
-
Publication number: 20220374695Abstract: The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, determining a pipeline number of the PCUs and/or the PMUs required to process the operation unit graph, and iteratively, initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in a next iteration, a new intermediate stage compute processing time taking into account whether the pipeline number of the PCUs and/or the PMUs produced for a prior intermediate stage compute processing time in a previous iteration is lower or higher than the available PCUs and/or PMUs.Type: ApplicationFiled: August 8, 2022Publication date: November 24, 2022Applicant: SambaNova Systems, Inc.Inventors: Zhuo CHEN, Sumti JAIRATH
-
Publication number: 20220309324Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.Type: ApplicationFiled: March 21, 2022Publication date: September 29, 2022Applicant: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
-
Publication number: 20220309322Abstract: A data processing system receives a graph that includes a sequence of layers and executes graph cuts between a preceding layer in the graph and a succeeding layer in the graph that succeeds the preceding layer. The preceding layer generates a set of tiles on a tile-by-tile basis and the succeeding layer processes a tensor that includes multiple tiles in the set of tiles. Thus the graph is partitioned into a sequence of subgraphs, and a subgraph in the sequence of subgraphs including a sub-sequence of layers in the sequence of layers. One or more configuration files is generated to configure runtime logic to execute the sequence of subgraphs and the one or more configuration files are stored on a computer-readable media.Type: ApplicationFiled: March 4, 2022Publication date: September 29, 2022Applicant: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
-
Publication number: 20220309318Abstract: Disclosed is a method that includes generating by an output processing node of a first section of a processing graph, a plurality of output tiles of an output tensor. The plurality of output tiles of the output tensor is written in a memory, where the writing includes zero-padding the plurality of output tiles of the output tensor in the memory. The zero-padded plurality of output tiles of the output tensor are tiled, to generate a plurality of input tiles of an input tensor. The plurality of input tiles of the input tensor is processed in a second section of the processing graph.Type: ApplicationFiled: June 30, 2021Publication date: September 29, 2022Applicant: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
-
Publication number: 20220309028Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.Type: ApplicationFiled: July 23, 2021Publication date: September 29, 2022Applicant: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
-
Publication number: 20220309029Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.Type: ApplicationFiled: September 16, 2021Publication date: September 29, 2022Applicant: SambaNova Systems, Inc.Inventors: Raghu PRABHAKAR, Nathan Francis SHEELEY, Matheen MUSADDIQ, Scott Layson BURSON, Sitanshu GUPTA, Sumti JAIRATH, Pramod NATARAJA, Ajit PUNJ
-
Publication number: 20220309325Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configured the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.Type: ApplicationFiled: April 4, 2022Publication date: September 29, 2022Applicant: SambaNova Systems, Inc.Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH