Patents by Inventor Sumti Jairath

Sumti Jairath has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Runtime patching of configuration files

Patent number: 11782729

Abstract: A data processing system comprises a pool of reconfigurable data flow resources and a runtime processor. The pool of reconfigurable data flow resources includes arrays of physical configurable units and memory. The runtime processor includes logic to receive a plurality of configuration files for user applications. The configuration files include configurations of virtual data flow resources required to execute the user applications. The runtime processor also includes logic to allocate physical configurable units and memory in the pool of reconfigurable data flow resources to the virtual data flow resources and load the configuration files to the allocated physical configurable units. The runtime processor further includes logic to execute the user applications using the allocated physical configurable units and memory.

Type: Grant

Filed: August 18, 2020

Date of Patent: October 10, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Gregory Frederick Grohoski, Manish K. Shah, Raghu Prabhakar, Mark Luttrell, Ravinder Kumar, Kin Hing Leung, Ranen Chatterjee, Sumti Jairath, David Alan Koeplinger, Ram Sivaramakrishnan, Matthew Thomas Grimm
OPTIMIZING TENSOR TILING IN NEURAL NETWORKS BASED ON A TILING COST MODEL

Publication number: 20230315410

Abstract: A method comprises a compiler analyzing a graph to determine a pipeline of operators based on a shared dimension of input and output tensors among the operators. The operators are included in the graph and the graph corresponds to a dataflow application. The compiler determines a tiling decision associated with the pipeline and a tiling cost associated with the tiling decision. The tiling decision can comprise a tile shape to slice tensors of operators of the pipeline. Based on the tiling cost, the compiler determines that the tiling decision improves an optimization objective and includes the pipeline and tiling decision in mapping decisions associated with executing the application on a computing system. The compiler can apply a tiling cost model to determine the tiling costs. A computer program product and a computing system can implement the method.

Type: Application

Filed: March 31, 2023

Publication date: October 5, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Bowen YANG, Zhuo CHEN, Chen LIU, Fei WANG, Ruobing WANG, Qinghua Li, Weiwei CHEN, Junjue WANG, Sumti JAIRATH
COMPILER OPTIMIZATION OF DATAFLOW APPLICATIONS USING MIXED INTEGER EQUATIONS

Publication number: 20230315802

Abstract: A method comprises a compiler generating a MI (mixed integer) model to determine mapping decisions to map a dataflow application to hardware of a computing system to execute the application. The MI model comprises MI equations to solve by an MI solver. The MI equations include equations of an objective function corresponding to an optimization objective. The MI equations can comprise decision variables and equations and constraint variables and equations. The compiler outputs the MI model to the MI solver and invokes the MI solver to compute an MI solution comprising solutions to equations among the equations included in the MI model. The compiler receives the MI solution and generates a globally optimized mapping decision based on the MI solution. The MI solver can comprise a commercial program to solve MI linear equations. A computer program product and a computing system can implement the method.

Type: Application

Filed: March 29, 2023

Publication date: October 5, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Junjue WANG, Blaine Burton RISTER, Zhichao MA, Zhuo CHEN, Andrew DENG, Sumti JAIRATH, Arvind Krishna SUJEETH
TENSOR CHECKPOINT OPTIMIZATION IN DATAFLOW COMPUTING APPLICATIONS

Publication number: 20230315407

Abstract: According to a computing method a compiler determines a recompute node included in a dataflow application and a checkpoint tensor produced by the recompute node. The compiler determines a recompute cost to recompute the checkpoint tensor, and a memory cost to checkpoint the checkpoint tensor in a memory. Based on the recompute cost and/or the memory cost, the compiler determines a solution cost and compares the solution cost to a solution threshold. Based on comparing the solution cost to the solution threshold, the compiler determines a checkpoint solution to execute the dataflow application. The checkpoint solution can comprise recomputing or checkpointing the checkpoint tensor. In some implementations, the compiler can determine a recompute ratio of the recompute cost to the memory cost and can compare the recompute ratio to the solution threshold. A computer program product and a computing system can implement aspects of the method.

Type: Application

Filed: March 31, 2023

Publication date: October 5, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Bowen YANG, Zhuo CHEN, Fei WANG, Venkat Krishna SRINIVASAN, Chen LIU, Junjue WANG, Arvind Krishna SUJEETH, Sumti JAIRATH
TOP LEVEL NETWORK AND ARRAY LEVEL NETWORK FOR RECONFIGURABLE DATA PROCESSORS

Publication number: 20230289310

Abstract: A reconfigurable data processor comprises an array of configurable units and a bus system. The bus system is connected to the array of configurable units. The bus system includes a top level network and an array level network. The top level network is connected to an external data interface for communication with memory outside of the array of configurable units. The array level network is connected to configurable units in the array of configurable units.

Type: Application

Filed: May 18, 2023

Publication date: September 14, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Gregory Frederick GROHOSKI, Sumti JAIRATH, Mark LUTTRELL, Raghu PRABHAKAR, Ram SIVARAMAKRISHNAN, Manish K. SHAH
Compiler flow logic for reconfigurable architectures

Patent number: 11714780

Abstract: The technology disclosed partitions a dataflow graph of a high-level program into memory allocations and execution fragments. The memory allocations represent creation of logical memory spaces in on-processor and/or off-processor memories for data required to implement the dataflow graph. The execution fragments represent operations on the data. The technology disclosed designates the memory allocations to virtual memory units and the execution fragments to virtual compute units. The technology disclosed partitions the execution fragments into memory fragments and compute fragments, and assigns the memory fragments to the virtual memory units and the compute fragments to the virtual compute units. The technology disclosed then allocates the virtual memory units to physical memory units and the virtual compute units to physical compute units.

Type: Grant

Filed: May 20, 2021

Date of Patent: August 1, 2023

Assignee: SambaNova Systems, Inc.

Inventors: David Alan Koeplinger, Raghu Prabhakar, Sumti Jairath
Anti-congestion flow control for reconfigurable processors

Patent number: 11709664

Abstract: A compiler configured to configure memory nodes with a ready-to-read credit counter and a write credit counter. The ready-to-read credit counter of a particular upstream memory node initialized with as many read credits as a buffer depth of a corresponding downstream memory node. The ready-to-read credit counter configured to decrement when a buffer data unit is written by the particular upstream memory node into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a read ready token. The write credit counter of the particular upstream memory node initialized with one or more write credits and configured to decrement when the particular upstream memory node begins writing the buffer data unit into the corresponding downstream memory node, and to increment when the particular upstream memory node receives from the corresponding downstream memory node a write done token.

Type: Grant

Filed: June 2, 2020

Date of Patent: July 25, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger, Sitanshu Gupta, Ruddhi Arun Chaphekar, Ajit Punj, Sumti Jairath
Independent control of multiple concurrent application graphs in a reconfigurable data processor

Patent number: 11681645

Abstract: A reconfigurable data processor includes a plurality of configurable units, and a configuration controller. The configuration controller is configured to start execution of a first application graph in a first set of configurable units. Then, concurrently with the execution of the first application graph in the first set of configurable units, the configuration controllers receive a command to load a configuration file into a second set of configurable units and obtain the configuration file. The configuration file contains information to configure the second set of configurable units to execute a second application graph. The configuration file is then loaded into the second set of configurable units and execution of the second application graph is started in the second set of configurable units.

Type: Grant

Filed: January 31, 2022

Date of Patent: June 20, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Gregory Frederick Grohoski, Sumti Jairath, Mark Luttrell, Raghu Prabhakar, Ram Sivaramakrishnan, Manish K. Shah
Inter-node execution of configuration files on reconfigurable processors using smart network interface controller (smartnic) buffers

Patent number: 11625284

Abstract: The technology disclosed relates to inter-node execution of configuration files on reconfigurable processors using smart network interface controller (SmartNIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and process application data for applications using a first reconfigurable processor on a first node, and a second host processor on a second node. The execution includes streaming configuration data in the configuration files and the application data between the first reconfigurable processor and the second host processor using one or more SmartNIC buffers.

Type: Grant

Filed: November 9, 2021

Date of Patent: April 11, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
Inter-processor execution of configuration files on reconfigurable processors using smart network interface controller (SmartNIC) buffers

Patent number: 11625283

Abstract: The technology disclosed relates to inter-processor execution of configuration files on reconfigurable processors using smart network interface controller (SmartNIC) buffers. In particular, the technology disclosed relates to a runtime logic that is configured to execute configuration files that define applications and process application data for applications using a first reconfigurable processor and a second reconfigurable processor. The execution includes streaming configuration data in the configuration files and the application data between the first reconfigurable processor and the second reconfigurable processor using one or more SmartNIC buffers.

Type: Grant

Filed: November 9, 2021

Date of Patent: April 11, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
Configuration of a reconfigurable data processor using sub-files

Patent number: 11609769

Abstract: A reconfigurable data processor comprises a bus system, and an array of configurable units connected to the bus system, configurable units in the array including configuration data stores to store unit files comprising a plurality of sub-files of configuration data particular to the corresponding configurable units. Configurable units in the plurality of configurable units each include logic to execute a unit configuration load process, including receiving via the bus system, sub-files of a unit file particular to the configurable unit, and loading the received sub-files into the configuration store of the configurable unit. A configuration load controller connected to the bus system, including logic to execute an array configuration load process, including distributing a configuration file comprising unit files for a plurality of the configurable units in the array.

Type: Grant

Filed: November 9, 2020

Date of Patent: March 21, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Manish K. Shah, Ram Sivaramakrishnan, Mark Luttrell, David B. Jackson, Raghu Prabhakar, Sumti Jairath, Gregory Frederick Grohoski, Pramod Nataraja
Runtime execution of configuration files on reconfigurable processors with varying configuration granularity

Patent number: 11609798

Abstract: The technology disclosed relates to runtime execution of configuration files on reconfigurable processors with varying configuration granularity. In particular, the technology disclosed relates to a runtime logic that is configured to receive a set of configuration files for an application, and load and execute a first subset of configuration files in the set of configuration files and associated application data on a first reconfigurable processor. The first reconfigurable processor has a first level of configurable granularity. The runtime logic is further configured to load and execute a second subset of configuration files in the set of configuration files and associated application data on a second reconfigurable processor. The second reconfigurable processor has a second level of configurable granularity that is different from the first level of configurable granularity.

Type: Grant

Filed: November 9, 2021

Date of Patent: March 21, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Ram Sivaramakrishnan, Sumti Jairath, Emre Ali Burhan, Manish K. Shah, Raghu Prabhakar, Ravinder Kumar, Arnav Goel, Ranen Chatterjee, Gregory Frederick Grohoski, Kin Hing Leung, Dawei Huang, Manoj Unnikrishnan, Martin Russell Raumann, Bandish B. Shah
Tensor partitioning and partition access order

Patent number: 11561925

Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.

Type: Grant

Filed: September 16, 2021

Date of Patent: January 24, 2023

Assignee: SambaNova Systems, Inc.

Inventors: Raghu Prabhakar, Nathan Francis Sheeley, Matheen Musaddiq, Scott Layson Burson, Sitanshu Gupta, Sumti Jairath, Pramod Nataraja, Ajit Punj
PERFORMANCE ESTIMATION-BASED RESOURCE ALLOCATION FOR RECONFIGURABLE ARCHITECTURES

Publication number: 20220374695

Abstract: The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, determining a pipeline number of the PCUs and/or the PMUs required to process the operation unit graph, and iteratively, initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in a next iteration, a new intermediate stage compute processing time taking into account whether the pipeline number of the PCUs and/or the PMUs produced for a prior intermediate stage compute processing time in a previous iteration is lower or higher than the available PCUs and/or PMUs.

Type: Application

Filed: August 8, 2022

Publication date: November 24, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Zhuo CHEN, Sumti JAIRATH
LOSSLESS TILING IN CONVOLUTION NETWORKS - GRAPH METADATA GENERATION

Publication number: 20220309324

Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.

Type: Application

Filed: March 21, 2022

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
LOSSLESS TILING IN CONVOLUTION NETWORKS - SECTION CUTS

Publication number: 20220309322

Abstract: A data processing system receives a graph that includes a sequence of layers and executes graph cuts between a preceding layer in the graph and a succeeding layer in the graph that succeeds the preceding layer. The preceding layer generates a set of tiles on a tile-by-tile basis and the succeeding layer processes a tensor that includes multiple tiles in the set of tiles. Thus the graph is partitioned into a sequence of subgraphs, and a subgraph in the sequence of subgraphs including a sub-sequence of layers in the sequence of layers. One or more configuration files is generated to configure runtime logic to execute the sequence of subgraphs and the one or more configuration files are stored on a computer-readable media.

Type: Application

Filed: March 4, 2022

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
Lossless Tiling in Convolution Networks - Padding and Re-Tilling at Section Boundaries

Publication number: 20220309318

Abstract: Disclosed is a method that includes generating by an output processing node of a first section of a processing graph, a plurality of output tiles of an output tensor. The plurality of output tiles of the output tensor is written in a memory, where the writing includes zero-padding the plurality of output tiles of the output tensor in the memory. The zero-padded plurality of output tiles of the output tensor are tiled, to generate a plurality of input tiles of an input tensor. The plurality of input tiles of the input tensor is processed in a second section of the processing graph.

Type: Application

Filed: June 30, 2021

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
Lossless Tiling in Convolution Networks - Materialization of Tensors

Publication number: 20220309028

Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.

Type: Application

Filed: July 23, 2021

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
Tensor Partitioning and Partition Access Order

Publication number: 20220309029

Abstract: A method of processing partitions of a tensor in a target order includes receiving, by a reorder unit and from two or more producer units, a plurality of partitions of a tensor in a first order that is different from the target order, storing the plurality of partitions in the reorder unit, and providing, from the reorder unit, the plurality of partitions in the target order to one or more consumer units. In an example, the one or more consumer units process the plurality of partitions in the target order.

Type: Application

Filed: September 16, 2021

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Raghu PRABHAKAR, Nathan Francis SHEELEY, Matheen MUSADDIQ, Scott Layson BURSON, Sitanshu GUPTA, Sumti JAIRATH, Pramod NATARAJA, Ajit PUNJ
LOSSLESS TILING IN CONVOLUTION NETWORKS - RESETTING OVERLAP FACTOR TO ZERO AT SECTION BOUNDARIES

Publication number: 20220309325

Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configured the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.

Type: Application

Filed: April 4, 2022

Publication date: September 29, 2022

Applicant: SambaNova Systems, Inc.

Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH

prev 1 2 3 4 5 6 7 next