Patents by Inventor Sujeeth

Sujeeth has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220309325
    Abstract: A data processing system includes compile time logic to section a graph into a sequence of sections, including a first section followed by a second section. The compile time logic configures the first section to generate a first output in a first non-overlapping target configuration in response to processing an input in a first overlapping input configuration, and configures the second section to generate a second output in a second non-overlapping target configuration in response to processing the first output in a second overlapping input configuration. The compile time logic also creates a set of computer instructions to execute the first section and the second section on a target processing system.
    Type: Application
    Filed: April 4, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
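
The core idea in the abstract above, overlapping input tiles producing non-overlapping output tiles, can be pictured with a toy example. The sketch below is illustrative only: the 1-D "same" convolution standing in for a graph section, the tile size, and all function names are assumptions, not details taken from the patent.

```python
# Minimal sketch: a graph section consumes overlapping (haloed) input tiles and emits
# non-overlapping output tiles; stitching the output tiles reproduces the untiled result.
import numpy as np

def conv1d_same(x, k):
    """Stride-1 'same' convolution used as a stand-in for one section of the graph."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])

def run_section_tiled(x, k, tile=4):
    """Compute non-overlapping output tiles from overlapping (haloed) input tiles."""
    halo = len(k) // 2
    out = np.empty_like(x)
    for start in range(0, len(x), tile):
        stop = min(start + tile, len(x))
        lo, hi = max(0, start - halo), min(len(x), stop + halo)  # overlapping input tile
        tile_out = conv1d_same(x[lo:hi], k)                      # process the haloed tile
        out[start:stop] = tile_out[start - lo:start - lo + (stop - start)]  # interior only
    return out

x = np.arange(10, dtype=float)
k = np.array([1.0, 2.0, 1.0])
assert np.allclose(run_section_tiled(x, k), conv1d_same(x, k))
```
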
  • Publication number: 20220309322
    Abstract: A data processing system receives a graph that includes a sequence of layers and executes graph cuts between a preceding layer in the graph and a succeeding layer that follows it. The preceding layer generates a set of tiles on a tile-by-tile basis, and the succeeding layer processes a tensor that includes multiple tiles in the set. The graph is thus partitioned into a sequence of subgraphs, with each subgraph in the sequence including a sub-sequence of layers in the sequence of layers. One or more configuration files are generated to configure runtime logic to execute the sequence of subgraphs, and the configuration files are stored on computer-readable media.
    Type: Application
    Filed: March 4, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309316
    Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections including a first section and a second section. The compile time logic is to configure the first section with a first topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the first section, and configure the second section with a second topology of tiling configurations in which to tile inputs, intermediate outputs, and final outputs of the second section. The data processing system further includes runtime logic configured with the compile time logic to execute the first section to generate the inputs, intermediate outputs, and final outputs of the first section in the first topology of tiling configurations, and execute the second section to generate the inputs, intermediate outputs, and final outputs of the second section in the second topology of tiling configurations.
    Type: Application
    Filed: June 30, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309318
    Abstract: Disclosed is a method that includes generating, by an output processing node of a first section of a processing graph, a plurality of output tiles of an output tensor. The plurality of output tiles of the output tensor is written in a memory, where the writing includes zero-padding the plurality of output tiles of the output tensor in the memory. The zero-padded plurality of output tiles of the output tensor is then tiled to generate a plurality of input tiles of an input tensor. The plurality of input tiles of the input tensor is processed in a second section of the processing graph.
    Type: Application
    Filed: June 30, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
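
The flow in the abstract above, writing output tiles into zero-padded memory and then re-tiling that buffer as input tiles for the next section, can be sketched in a few lines. Everything below (the tile grid, halo width, and buffer layout) is invented for illustration and is not taken from the patent.

```python
# Output tiles from one section are written into a zero-padded buffer in memory; the
# padded buffer is then re-tiled into haloed input tiles for the next section.
import numpy as np

tile_h = tile_w = 4
pad = 1                                   # halo the next section needs on every side
out_tiles = {(r, c): np.full((tile_h, tile_w), r * 2 + c, dtype=float)
             for r in range(2) for c in range(2)}    # 2x2 grid of output tiles

# Write the tiles into memory with zero padding around the full output tensor.
H, W = 2 * tile_h, 2 * tile_w
buf = np.zeros((H + 2 * pad, W + 2 * pad))
for (r, c), t in out_tiles.items():
    buf[pad + r * tile_h: pad + (r + 1) * tile_h,
        pad + c * tile_w: pad + (c + 1) * tile_w] = t

# Re-tile the zero-padded buffer into input tiles (each tile plus its halo).
in_tiles = {(r, c): buf[r * tile_h: r * tile_h + tile_h + 2 * pad,
                        c * tile_w: c * tile_w + tile_w + 2 * pad]
            for r in range(2) for c in range(2)}
assert in_tiles[(0, 0)].shape == (tile_h + 2 * pad, tile_w + 2 * pad)
```
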
  • Publication number: 20220309317
    Abstract: Disclosed is a method that includes sectioning a graph into a sequence of sections, the sequence of sections including at least a first section followed by a second section. The first section is configured to generate a first output in a first target tiling configuration in response to processing a first input in a first input tiling configuration. The graph is configured to reconfigure the first output in the first target tiling configuration to a second input in a second input tiling configuration. The second section is configured to generate a second output in a second target tiling configuration in response to processing the second input in the second input tiling configuration.
    Type: Application
    Filed: June 30, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309324
    Abstract: A processing graph of an application with a sequence of processing nodes is obtained; the graph processes an input and generates an intermediate representation, a further intermediate representation, and an output representation of the input at stages in the sequence of processing nodes. Graph metadata is generated that specifies a non-overlapping target tiling configuration for the output representation, an overlapping tiling configuration for the input, an overlapping tiling configuration for the intermediate representation, and a third tiling configuration for the further intermediate representation. The processing graph is modified based on the graph metadata to conform to the parameters specified by the graph metadata. A set of computer instructions is then created to execute the modified processing graph on a target processing system.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309028
    Abstract: Disclosed is a data processing system that includes a plurality of reconfigurable processors and processor memory. Runtime logic, operatively coupled to the plurality of reconfigurable processors and the processor memory, is configured to configure at least one reconfigurable processor in the plurality of reconfigurable processors with a first subgraph in a sequence of subgraphs of a graph; load an input onto the processor memory; on a tile-by-tile basis, process a first set of input tiles from the input through the first subgraph and generate a first set of intermediate tiles, load the first set of intermediate tiles onto the processor memory, and process the first set of intermediate tiles through the first subgraph and generate a first set of output tiles; and compose output tiles in the first set of output tiles into a first composed input, and load the first composed input onto the processor memory.
    Type: Application
    Filed: July 23, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
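
The runtime flow described above, processing input tiles one at a time through a subgraph and composing the resulting output tiles into the input for the next subgraph, is sketched below with a trivial elementwise "subgraph". The shapes and function names are assumptions made for the example.

```python
# Tile-by-tile execution of one subgraph, followed by composing the output tiles into
# a single tensor that becomes the input loaded for the next subgraph.
import numpy as np

def subgraph(tile):
    # Stand-in for a subgraph configured onto a reconfigurable processor.
    return tile * 2.0 + 1.0

input_tensor = np.arange(16, dtype=float).reshape(4, 4)
tile_size = 2

# Process the input on a tile-by-tile basis.
output_tiles = {}
for r in range(0, 4, tile_size):
    for c in range(0, 4, tile_size):
        output_tiles[(r, c)] = subgraph(input_tensor[r:r + tile_size, c:c + tile_size])

# Compose the output tiles into a single tensor for the next subgraph.
composed_input = np.zeros_like(input_tensor)
for (r, c), t in output_tiles.items():
    composed_input[r:r + tile_size, c:c + tile_size] = t

assert np.allclose(composed_input, subgraph(input_tensor))
```
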
  • Publication number: 20220309319
    Abstract: Disclosed is a data processing system that includes compile time logic to section a graph into a sequence of sections, configure a first section to generate a first set of output tiles in a first target tiling configuration in response to processing a first set of input tiles in a first input tiling configuration, and configure a second section to generate a second set of output tiles in a second target tiling configuration in response to processing the first set of output tiles in a second input tiling configuration. Runtime logic is configured to pad a first input into a first padded input, read the first set of input tiles from the first padded input in the first input tiling configuration, and process the first set of input tiles through the first section to generate the first set of output tiles in the first target tiling configuration.
    Type: Application
    Filed: September 16, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309323
    Abstract: A data processing system includes memory and reconfigurable processors, operatively coupled to the memory, configured to execute a sequence of subgraphs of a graph. The sequence of subgraphs includes a preceding subgraph and a succeeding subgraph. The data processing system also includes data flow logic, operatively coupled to the reconfigurable processors and the memory, configured to store a tiled output of the preceding subgraph as a composed input in the memory and make available parts of the composed input for processing by the succeeding subgraph.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
  • Publication number: 20220309027
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles an intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles an output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Application
    Filed: July 23, 2021
    Publication date: September 29, 2022
    Applicant: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu NAMA, Ruddhi CHAPHEKAR, Ram SIVARAMAKRISHNAN, Raghu PRABHAKAR, Sumti JAIRATH, Junjue WANG, Kaizhao LIANG, Adi FUCHS, Matheen MUSADDIQ, Arvind Krishna SUJEETH
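
The "cumulative input padding" idea in this abstract (shared with granted patent 11263170 below) can be illustrated with a toy example: all halo padding is applied to the input once, so every pre-padded input tile has the same size even at tensor borders, and the corresponding output tiles are non-overlapping and uniformly sized. The sizes and halo width below are invented for illustration.

```python
# Cumulative input padding: push all padding onto the input once, then tile uniformly.
import numpy as np

H = W = 8
tile = 4
halo = 2                       # total halo an interior stack of layers would need

x = np.arange(H * W, dtype=float).reshape(H, W)
x_padded = np.pad(x, halo)     # cumulative input padding, applied once to the input

# Post-padding tiling: every pre-padded input tile has the same size, even at the
# borders, because the padding already lives in the input.
input_tiles = [x_padded[r:r + tile + 2 * halo, c:c + tile + 2 * halo]
               for r in range(0, H, tile)
               for c in range(0, W, tile)]
assert all(t.shape == (tile + 2 * halo, tile + 2 * halo) for t in input_tiles)
# The matching output tiles are non-overlapping tile x tile blocks of the output tensor.
```
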
  • Patent number: 11449752
    Abstract: Methods for gradient accumulation with free momentum are performed by systems and devices during neural network model training. An accumulator that includes a processor circuit and a memory element generates free momentum between passes of a neural network model training process. The processor circuit receives a difference weight (gradient) and generates a first input by applying a weighting parameter thereto. The processor circuit obtains a prior weight from the memory element and generates a second input by applying another weighting parameter thereto. The processor circuit generates a filtered input with momentum by filtering the first and second inputs. The memory element generates a stored next pass weight by accumulating the filtered input with the prior weight. A computing resource then processes the next pass of the neural network model training using the stored next pass weight. The methods, systems, and devices are applicable to pipelined model parallelism training processes.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: September 20, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Andrew Wagner, Marc Tremblay, Saurabh M. Kulkarni, Tiyasa Mitra, Sujeeth S. Bharadwaj
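
One plausible reading of the accumulator structure in the abstract above is sketched below. The abstract does not specify the filter or the weighting parameters, so the class name, parameter values, and the simple additive filter are assumptions; this is a structural illustration, not the patented method.

```python
# Structural sketch of the accumulator: weight the incoming gradient, weight the stored
# prior weight, filter (combine) them, and accumulate the result into the stored weight
# used for the next training pass. Parameter values and the filter are assumptions.
import numpy as np

class MomentumAccumulator:
    def __init__(self, initial_weight, grad_scale=0.1, weight_scale=0.01):
        self.weight = np.asarray(initial_weight, dtype=float)  # memory element
        self.grad_scale = grad_scale        # weighting parameter for the gradient
        self.weight_scale = weight_scale    # weighting parameter for the prior weight

    def step(self, gradient):
        first_input = self.grad_scale * np.asarray(gradient, dtype=float)
        second_input = self.weight_scale * self.weight
        filtered = first_input + second_input     # "filtered input with momentum"
        self.weight = self.weight + filtered      # stored next-pass weight
        return self.weight

acc = MomentumAccumulator(np.zeros(3))
print(acc.step(np.array([1.0, -2.0, 0.5])))
```
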
  • Publication number: 20220283820
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Application
    Filed: May 24, 2022
    Publication date: September 8, 2022
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
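
The execution pattern described above, with only one model portion resident on the device at a time and the input split into microbatches grouped into a minibatch, can be sketched as follows. The parameter-server dictionary, layer shapes, and function names are invented for the example.

```python
# Simplified sketch: the model's master copy lives on a parameter server, one portion
# (layer) is downloaded to the device at a time, and a minibatch is processed as a
# sequence of microbatches before moving on to the next portion.
import numpy as np

parameter_server = {                       # master copy of the model: one matrix per layer
    f"layer{i}": np.random.randn(8, 8) * 0.1 for i in range(3)
}

def fetch_portion(name):
    """Stand-in for downloading one portion of the model to the target device."""
    return parameter_server[name]

samples = np.random.randn(32, 8)           # a minibatch of input samples
microbatches = np.split(samples, 8)        # 8 microbatches of 4 samples each

activations = list(microbatches)
for name in sorted(parameter_server):      # execute one portion of the model at a time
    weights = fetch_portion(name)          # only this portion is resident in device memory
    activations = [np.tanh(a @ weights) for a in activations]  # run every microbatch

minibatch_output = np.concatenate(activations)
print(minibatch_output.shape)              # (32, 8)
```
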
  • Patent number: 11436491
    Abstract: Improved convolutional neural network-based machine learning models are disclosed herein. A convolutional neural network is configured to decompose feature maps generated based on a data item to be classified. The feature maps are decomposed into first and second subsets. The first subset is representative of high frequency components of the data item, and the second subset is representative of low frequency components of the data item. The second subset is upsampled and is combined with the first subset. The combined feature maps are convolved with a filter to extract a set of features associated with the data item. The first subset is also downsampled and combined with the second subset. The combined feature maps are convolved with a filter to extract another set of features. The data item is classified based on the sets of features extracted by the convolution operations.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: September 6, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Sujeeth S. Bharadwaj, Bharadwaj Pudipeddi, Marc Tremblay
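
The feature-map decomposition described above can be illustrated in plain NumPy: a high-frequency subset at full resolution and a low-frequency subset at half resolution exchange information through upsampling and downsampling before each path is convolved. The pooling, nearest-neighbour upsampling, and averaging kernel below are assumptions chosen for brevity, not the network from the patent.

```python
# High/low frequency feature-map paths with cross-resolution exchange before convolution.
import numpy as np

def downsample(x):                 # 2x2 average pooling
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(x):                   # nearest-neighbour upsampling
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def conv3x3(x, k):                 # stride-1 'same' 2-D convolution as a stand-in filter
    xp = np.pad(x, 1)
    return np.array([[np.sum(xp[i:i + 3, j:j + 3] * k)
                      for j in range(x.shape[1])] for i in range(x.shape[0])])

feature_map = np.random.randn(8, 8)
high = feature_map                         # high-frequency subset (full resolution)
low = downsample(feature_map)              # low-frequency subset (half resolution)
kernel = np.full((3, 3), 1.0 / 9.0)

# High-frequency path: combine high with the upsampled low subset, then convolve.
high_features = conv3x3(high + upsample(low), kernel)
# Low-frequency path: combine low with the downsampled high subset, then convolve.
low_features = conv3x3(low + downsample(high), kernel)
print(high_features.shape, low_features.shape)    # (8, 8) (4, 4)
```
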
  • Patent number: 11436019
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: September 6, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
  • Publication number: 20220249076
    Abstract: A device having a sheath, a medical device, a stylet, a handle, and a plunger device. The handle includes portions connected to proximal ends of the sheath, the medical device, and the stylet, and a chamber portion connected to the actuator. The chamber portion includes a volume of space configured to volumetrically connect to a lumen of the medical device, the sheath, or the stylet, and a plunger device configured to be slidably received within the chamber portion. The plunger device is able to pneumatically isolate a proximal portion of the volume of space from a distal portion of the volume of space. Proximal movement of the stylet and the plunger device causes a suction effect (i.e., reduced pressure) at the distal end of the sheath, the medical device, and/or the stylet.
    Type: Application
    Filed: March 29, 2022
    Publication date: August 11, 2022
    Inventors: Hugo X. Gonzalez, Sujeeth Parthiban, Chenhao Fu, Michael S. Smith
  • Patent number: 11406365
    Abstract: A device having a sheath, a medical device, a stylet, a handle having a chamber portion, and a plunger device. The handle connects to proximal ends of the sheath, the medical device, and the stylet. The chamber portion includes a volume of space configured to volumetrically connect to a lumen of the medical device, the sheath, or the stylet. The plunger device slides and moves within the chamber portion. The plunger device is able to pneumatically or hydraulically isolate a proximal portion of the volume of space from a distal portion of the volume of space. Proximal movement of the stylet and the plunger device causes a suction effect (i.e., negative or reduced pressure) at the distal end of the sheath and/or the medical device.
    Type: Grant
    Filed: March 23, 2017
    Date of Patent: August 9, 2022
    Assignee: Gyrus ACMI, Inc.
    Inventors: Hugo X. Gonzalez, Sujeeth Parthiban, Chenhao Fu, Michael S. Smith
  • Patent number: 11354579
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: June 7, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
  • Patent number: 11263170
    Abstract: Disclosed is a data processing system to receive a processing graph of an application. A compile time logic is configured to modify the processing graph and generate a modified processing graph. The modified processing graph is configured to apply a post-padding tiling after applying a cumulative input padding that confines padding to an input. The cumulative input padding pads the input into a padded input. The post-padding tiling tiles the padded input into a set of pre-padded input tiles with a same tile size, tiles an intermediate representation of the input into a set of intermediate tiles with a same tile size, and tiles an output representation of the input into a set of non-overlapping output tiles with a same tile size. Runtime logic is configured with the compile time logic to execute the modified processing graph to execute the application.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: March 1, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
  • Patent number: 11250061
    Abstract: Disclosed is a data processing system which includes compile time logic configured to section a graph into a sequence of subgraphs, the sequence of subgraphs including at least a first subgraph. The compile time logic configures the first subgraph to generate a plurality of output tiles of an output tensor. Runtime logic, configured with the compile time logic, executes the sequence of subgraphs to generate, at the output of the first subgraph, the plurality of output tiles of the output tensor, and to write the plurality of output tiles in a memory in an overlapping configuration. In an example, an overlapping region between any two neighboring output tiles of the plurality of output tiles comprises a summation of a corresponding region of a first neighboring output tile and a corresponding region of a second neighboring output tile.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: February 15, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Arun Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Sujeeth
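
The overlapping write described in the abstract above, where the region shared by two neighbouring output tiles holds the sum of their corresponding regions, amounts to an overlap-add into memory. The 1-D tile and overlap sizes below are invented for illustration.

```python
# Overlap-add write: each output tile is written into memory in an overlapping
# configuration, and the overlap between neighbouring tiles accumulates by summation.
import numpy as np

tile_len, overlap = 6, 2
tiles = [np.ones(tile_len) * (i + 1) for i in range(3)]    # three output tiles

stride = tile_len - overlap
out = np.zeros(stride * len(tiles) + overlap)
for i, t in enumerate(tiles):
    out[i * stride: i * stride + tile_len] += t            # overlaps sum in memory

print(out)
# Positions covered by both tile 1 and tile 2 hold 1 + 2 = 3; singly covered positions
# keep the value of the single tile that wrote them.
```
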
  • Patent number: 11232360
    Abstract: Disclosed is a data processing system that includes compile time logic configured to process a processing graph to generate a modified processing graph, which includes a plurality of forward processing nodes of a forward pass and a plurality of backward processing nodes of a backward pass. The data processing system also includes runtime logic configured with the compile time logic to execute the modified processing graph to generate, at a backward processing node of the plurality of backward processing nodes, a plurality of partial weight gradients, based on processing a corresponding plurality of gradient tiles of a gradient tensor, and generate, based on the plurality of partial weight gradients, a final weight gradient corresponding to the gradient tensor.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: January 25, 2022
    Assignee: SambaNova Systems, Inc.
    Inventors: Tejas Nagendra Babu Nama, Ruddhi Chaphekar, Ram Sivaramakrishnan, Raghu Prabhakar, Sumti Jairath, Junjue Wang, Kaizhao Liang, Adi Fuchs, Matheen Musaddiq, Arvind Krishna Sujeeth
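
The backward-pass idea in the last abstract, computing a partial weight gradient per gradient tile and summing the partials into the final weight gradient, can be checked with a small linear-layer example. The layer, shapes, and tiling below are assumptions; they only demonstrate that tile-wise partial sums reproduce the untiled gradient.

```python
# Partial weight gradients per gradient tile summed into the final weight gradient,
# using a toy linear layer y = x @ W as the backward node.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 5))           # layer input (12 samples)
grad_y = rng.standard_normal((12, 3))      # gradient tensor w.r.t. the layer output

# Reference: full weight gradient computed in one shot.
full_grad_w = x.T @ grad_y

# Tiled: process the gradient tensor tile by tile and accumulate partial weight gradients.
tile_rows = 4
partial_grads = [x[r:r + tile_rows].T @ grad_y[r:r + tile_rows]
                 for r in range(0, x.shape[0], tile_rows)]
final_grad_w = sum(partial_grads)

assert np.allclose(final_grad_w, full_grad_w)
```
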