Patents by Inventor Arvind Krishna Sujeeth

Arvind Krishna Sujeeth has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11934343
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: March 19, 2024
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
  • Publication number: 20240020170
    Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for estimating a cost of implementing an operation unit graph. The operation unit graph may include first and second logical units that perform first and second data operations and have first and second ports, respectively, coupled by a logical edge, on a reconfigurable processor. The method includes receiving the operation unit graph, determining first and second upper bandwidth limits of the first and second ports, respectively, determining a logical edge bandwidth of the logical edge based on the first and second upper bandwidth limits, determining a timing group for the logical edge, and providing the logical edge bandwidth and the timing group as a cost estimation of implementing the operation unit graph on the reconfigurable processor.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20240020264
    Abstract: A cost estimation tool in a system for implementing an operation unit graph on a reconfigurable processor is presented as well as a method of operating a cost estimation tool for determining scaled logical edge bandwidths in an operation unit graph in preparation of placing and routing the operation unit graph onto a reconfigurable processor. The cost estimation tool may be configured to receive the operation unit graph, divide the operation unit graph in first and second subgraphs, determine maximum latencies of the first and second subgraphs, and determine a scaled logical edge bandwidth of a logical edge that couples a first logical unit of M logical units in the first subgraph with a second logical unit of N logical units in the first subgraph based on M, N, and scaled bandwidth limits of the M and N logical units.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Joshua BROT, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20240020265
    Abstract: A system with a cost estimation tool for estimating a realized bandwidth consumption of a logical edge between a logical producer unit and a logical consumer unit of an operation unit graph during placement and routing of the logical producer unit, the logical consumer unit, and the logical edge onto a reconfigurable processor is presented as well as a method of operating such a cost estimation tool and a non-transitory computer-readable storage medium including instructions that, when executed by a processing unit, cause the processing unit to operate such a cost estimation tool The cost estimation tool may be configured to determine the realized bandwidth consumption of the tentative assignment based on an upper bandwidth limit of the logical edge, an end-to-end bandwidth, a scaling factor of a realized bandwidth, and a congestion estimation of the physical link.
    Type: Application
    Filed: July 13, 2023
    Publication date: January 18, 2024
    Applicant: SambaNova Systems, Inc.
    Inventors: Yue FU, Kin Hing LEUNG, Likun HAO, Arvind Krishna SUJEETH, Sumti JAIRATH, Andrew DENG, Chris RÉ, Raghu PRABHAKAR
  • Publication number: 20230315407
    Abstract: According to a computing method a compiler determines a recompute node included in a dataflow application and a checkpoint tensor produced by the recompute node. The compiler determines a recompute cost to recompute the checkpoint tensor, and a memory cost to checkpoint the checkpoint tensor in a memory. Based on the recompute cost and/or the memory cost, the compiler determines a solution cost and compares the solution cost to a solution threshold. Based on comparing the solution cost to the solution threshold, the compiler determines a checkpoint solution to execute the dataflow application. The checkpoint solution can comprise recomputing or checkpointing the checkpoint tensor. In some implementations, the compiler can determine a recompute ratio of the recompute cost to the memory cost and can compare the recompute ratio to the solution threshold. A computer program product and a computing system can implement aspects of the method.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Bowen YANG, Zhuo CHEN, Fei WANG, Venkat Krishna SRINIVASAN, Chen LIU, Junjue WANG, Arvind Krishna SUJEETH, Sumti JAIRATH
  • Publication number: 20230315802
    Abstract: A method comprises a compiler generating a MI (mixed integer) model to determine mapping decisions to map a dataflow application to hardware of a computing system to execute the application. The MI model comprises MI equations to solve by an MI solver. The MI equations include equations of an objective function corresponding to an optimization objective. The MI equations can comprise decision variables and equations and constraint variables and equations. The compiler outputs the MI model to the MI solver and invokes the MI solver to compute an MI solution comprising solutions to equations among the equations included in the MI model. The compiler receives the MI solution and generates a globally optimized mapping decision based on the MI solution. The MI solver can comprise a commercial program to solve MI linear equations. A computer program product and a computing system can implement the method.
    Type: Application
    Filed: March 29, 2023
    Publication date: October 5, 2023
    Applicant: SambaNova Systems, Inc.
    Inventors: Junjue WANG, Blaine Burton RISTER, Zhichao MA, Zhuo CHEN, Andrew DENG, Sumti JAIRATH, Arvind Krishna SUJEETH
  • Publication number: 20220309317
    Abstract: Disclosed is a method that includes sectioning a graph into a sequence of sections, the sequence of sections including at least a first section followed by a second section. The first section is configured to generate a first output in a first target tiling configuration in response to processing a first input in a first input tiling configuration. The graph is configured to reconfigure the first output in the first target tiling configuration to a second input in a second input tiling configuration. The second section is configured to generate a second output in a second target tiling configuration in response to processing the second input in the second input tiling configuration.
    Type: Application
    Filed: June 30, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309318
    Abstract: Disclosed is a method that includes generating by an output processing node of a first section of a processing graph, a plurality of output tiles of an output tensor. The plurality of output tiles of the output tensor is written in a memory, where the writing includes zero-padding the plurality of output tiles of the output tensor in the memory. The zero-padded plurality of output tiles of the output tensor are tiled, to generate a plurality of input tiles of an input tensor. The plurality of input tiles of the input tensor is processed in a second section of the processing graph.
    Type: Application
    Filed: June 30, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309322
    Abstract: A data processing system receives a graph that includes a sequence of layers and executes graph cuts between a preceding layer in the graph and a succeeding layer in the graph that succeeds the preceding layer. The preceding layer generates a set of tiles on a tile-by-tile basis and the succeeding layer processes a tensor that includes multiple tiles in the set of tiles. Thus the graph is partitioned into a sequence of subgraphs, and a subgraph in the sequence of subgraphs including a sub-sequence of layers in the sequence of layers. One or more configuration files is generated to configure runtime logic to execute the sequence of subgraphs and the one or more configuration files are stored on a computer-readable media.
    Type: Application
    Filed: March 4, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309323
    Abstract: A data processing system includes memory and reconfigurable processors, operatively coupled to the memory, configured to execute a sequence of subgraphs of a graph. The sequence of subgraphs includes a preceding subgraph and a succeeding subgraph. The data processing system also includes data flow logic, operatively coupled to the reconfigurable processors and the memory, configured to store a tiled output of the preceding subgraph as a composed input in the memory and make available parts of the composed input for processing by the succeeding subgraph.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309325
    Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configured the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.
    Type: Application
    Filed: April 4, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309324
    Abstract: A processing graph of an application with a sequence of processing nodes is obtained which processes an input and generates an intermediate representation a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309316
    Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections including a first section and a second section. The compile time logic is to configure the first section with a first topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the first section, and configure the second section with a second topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the second section. The data processing system further includes runtime logic configured with the compile time logic to execute the first section to generate the inputs, intermediate outputs, and final outputs of the first section in the first topology of tiling configurations, and execute the second section to generate the inputs, intermediate outputs, and final outputs of the second section in the second topology of tiling configurations.
    Type: Application
    Filed: June 30, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309027
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Application
    Filed: July 23, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309028
    Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.
    Type: Application
    Filed: July 23, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309319
    Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections, configure a first section to generate a first set of output tiles in a first target tiling configuration in response to processing a first set of input tiles in a first input tiling configuration, and configure a second section to generate a second set of output tiles in a second target tiling configuration in response to processing the first set of output tiles in a second input tiling configuration. Runtime logic is configured to pad a first input into a first padded input, read the first set of input tiles from the first padded input in the first input tiling configuration, and process the first set of input tiles through the first section to generate the first set of output tiles in the first target tiling configuration.
    Type: Application
    Filed: September 16, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Patent number: 11263170
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: March 1, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
  • Patent number: 11232360
    Abstract: Disclosed is a data processing system that includes compile time logic configured to process a processing graph to generate a modified processing graph, which includes a plurality of forward processing nodes of a forward pass and a plurality of backward processing nodes of a backward pass. The data processing system also includes runtime logic configured with the compile time logic to execute the modified processing graph to generate, at a backward processing node of the plurality of backward processing nodes, a plurality of partial weight gradients, based on processing a corresponding plurality of gradient tiles of a gradient tensor, and generate, based on the plurality of partial weight gradients, a final weight gradient corresponding to the gradient tensor.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: January 25, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
  • Patent number: 11227207
    Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections, configure a first section to generate a first set of output tiles in a first target tiling configuration in response to processing a first set of input tiles in a first input tiling configuration, and configure a second section to generate a second set of output tiles in a second target tiling configuration in response to processing the first set of output tiles in a second input tiling configuration. Runtime logic is configured to pad a first input into a first padded input, read the first set of input tiles from the first padded input in the first input tiling configuration, and process the first set of input tiles through the first section to generate the first set of output tiles in the first target tiling configuration.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: January 18, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
  • Patent number: 11195080
    Abstract: Disclosed is a data processing system that includes compile time logic configured to section a graph into a sequence of sections, and configure each section of the sequence of sections such that an input layer of a section processes an input, one or more intermediate layers of the corresponding section processes corresponding one or more intermediate outputs, and a final layer of the corresponding section generates a final output. The final output has a non-overlapping final tiling configuration, the one or more intermediate outputs have corresponding one or more overlapping intermediate tiling configurations, and the input has an overlapping input tiling configuration. The compile time logic is further to determine the various tiling configurations by starting from the final layer and reverse traversing through the one or more intermediate layers, and ending with the input layer.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: December 7, 2021
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth