Patents by Inventor Bharadwaj Pudipeddi

Bharadwaj Pudipeddi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230385118
    Abstract: Systems and methods for selective execution of workloads using hardware accelerators are described. A method includes a client application submitting a command for execution of a workload directly to a hardware accelerator, where the command includes an indication of a performance expectation from the hardware accelerator, and where the workload can be executed either by a compute core accessible to the client application or by the hardware accelerator. The method further includes upon receiving a retry response from the hardware accelerator, the client application executing the workload using the compute core accessible to the client application, where the hardware accelerator is configured to provide the retry response directly to the client application after determining that the hardware accelerator is unable to meet the performance expectation.
    Type: Application
    Filed: May 26, 2023
    Publication date: November 30, 2023
    Inventors: John Allen TARDIF, Bharadwaj PUDIPEDDI
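The control flow described in this abstract can be sketched as follows. This is a hypothetical illustration, not the patented implementation: the `Command`, `Accelerator`, and `run` names, and the use of `None` as the retry response, are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Command:
    workload: Callable[[], object]   # work that can run on either device
    deadline_ms: float               # the performance expectation

class Accelerator:
    def __init__(self, est_latency_ms: float):
        self.est_latency_ms = est_latency_ms

    def submit(self, cmd: Command) -> Optional[object]:
        # Return None (the "retry" response) when the accelerator
        # determines it cannot meet the stated performance expectation.
        if self.est_latency_ms > cmd.deadline_ms:
            return None
        return cmd.workload()

def run(cmd: Command, accel: Accelerator):
    """Submit directly to the accelerator; on a retry response,
    fall back to the compute core accessible to the client."""
    result = accel.submit(cmd)
    if result is None:
        return ("cpu", cmd.workload())
    return ("accelerator", result)
```

In this sketch the accelerator's admission decision is a simple latency estimate; the abstract leaves the actual decision mechanism to the claims.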
  • Publication number: 20230274130
    Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
    Type: Application
    Filed: May 3, 2023
    Publication date: August 31, 2023
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
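As a rough illustration of the burst-cycle data movement described above, the following sketch uses plain SGD in place of the hardware gradient optimizer; the function name and the dict-based memory and buffer layout are assumptions for the sketch.

```python
import numpy as np

def burst_cycle(weight_memory, layer, gradient_buffers, lr=0.1):
    """One burst cycle (hypothetical sketch): stream gradients from every
    worker's buffer and the layer's weights into the optimizer, write the
    new weights back to memory, and return them for broadcast to workers."""
    # Move gradients from each worker's gradient buffer and reduce them.
    grads = np.mean([buf[layer] for buf in gradient_buffers], axis=0)
    # Move the layer's weights from memory to the optimizer.
    w = weight_memory[layer]
    # Plain SGD stands in for the gradient optimizer here.
    new_w = w - lr * grads
    weight_memory[layer] = new_w   # write-back during the same burst cycle
    return new_w                   # transmitted to each of the workers
```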
  • Publication number: 20230244945
    Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
    Type: Application
    Filed: April 11, 2023
    Publication date: August 3, 2023
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
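The storage-format conversion described above can be illustrated with a hypothetical sketch in which the two momentum sets are stored in float16 and converted to float32 (the training format) for an Adam-style dual-momentum update; the names, hyperparameters, and choice of formats are all assumptions.

```python
import numpy as np

def dual_momentum_step(memory, grads, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Hypothetical sketch: both momentum sets live in a compact storage
    format (float16) and are converted to the float32 training format
    only for the gradient-optimization step, then converted back."""
    m = memory["m"].astype(np.float32)   # first momentum: storage -> training
    v = memory["v"].astype(np.float32)   # second momentum: storage -> training
    m = b1 * m + (1 - b1) * grads
    v = b2 * v + (1 - b2) * grads ** 2
    memory["w"] = memory["w"] - lr * m / (np.sqrt(v) + eps)
    # Convert back to the compact storage format before writing out.
    memory["m"] = m.astype(np.float16)
    memory["v"] = v.astype(np.float16)
    return memory
```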
  • Patent number: 11681905
    Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
    Type: Grant
    Filed: March 23, 2020
    Date of Patent: June 20, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
  • Patent number: 11675654
    Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model in the artificial intelligence processor, wherein the model corresponds to a same model processed by the plurality of artificial intelligence processors during a previous processing iteration by the plurality of artificial intelligence processors on data from the data set.
    Type: Grant
    Filed: December 16, 2021
    Date of Patent: June 13, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Saurabh M. Kulkarni, Marc Tremblay, Matthias Baenninger, Nuno Claudino Pereira Lopes
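The recovery flow in this abstract (detect a computing error, restart the failed processor, reload the model the group used in the previous iteration) can be sketched as follows; the `AIProcessor` class and `recover` helper are hypothetical names, not the patented interface.

```python
class AIProcessor:
    """Stand-in for one of a plurality of AI processors."""
    def __init__(self):
        self.model = None

    def restart(self):
        self.model = None        # restart clears device state

    def load_model(self, model_state):
        self.model = model_state

def recover(processors, failed_idx, prev_iteration_model):
    """On a computing error, restart only the failed processor and load
    the same model the plurality processed in the previous iteration,
    keeping the whole group consistent (hypothetical sketch)."""
    proc = processors[failed_idx]
    proc.restart()
    proc.load_model(prev_iteration_model)
    return proc
```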
  • Patent number: 11651228
    Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
    Type: Grant
    Filed: April 17, 2020
    Date of Patent: May 16, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
  • Patent number: 11615301
    Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
    Type: Grant
    Filed: September 3, 2019
    Date of Patent: March 28, 2023
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
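A minimal sketch of the field-splitting and lookup-table encoding described above (not the patented method): each float32 parameter is split into its bit fields, and each field is replaced by an index into a table of the unique values seen. The function names are assumptions, and the sign bit is folded into the exponent field for brevity.

```python
import numpy as np

def compress(params):
    """Split float32 values into mantissa and (sign+)exponent fields,
    then build a LUT of unique values plus per-value indices for each."""
    bits = params.astype(np.float32).view(np.uint32)
    exponents = bits >> 23          # 9-bit sign+exponent field
    mantissas = bits & 0x7FFFFF     # 23-bit mantissa field (kept exact here)
    exp_lut, exp_idx = np.unique(exponents, return_inverse=True)
    man_lut, man_idx = np.unique(mantissas, return_inverse=True)
    return exp_lut, exp_idx, man_lut, man_idx

def decompress(exp_lut, exp_idx, man_lut, man_idx):
    """Rebuild the original float32 values from the LUTs and indices."""
    bits = (exp_lut[exp_idx].astype(np.uint64) << 23) | man_lut[man_idx]
    return bits.astype(np.uint32).view(np.float32)
```

Because both fields are kept exactly, this round-trips losslessly; the patent additionally compresses the mantissa set, which this sketch does not attempt.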
  • Patent number: 11520592
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be manually or automatically adjusted to reduce the communication overhead.
    Type: Grant
    Filed: September 20, 2019
    Date of Patent: December 6, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Gautham Popuri, Layali Rashid, Tiyasa Mitra, Mohit Mittal, Maral Mesmakhosroshahi
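The layer-at-a-time, microbatch-sequenced execution described above can be sketched roughly as follows; the `parameter_server.fetch` interface and list-based batches are assumptions for illustration.

```python
def run_minibatch(parameter_server, layer_ids, minibatch, microbatch_size):
    """Hypothetical sketch: download one portion (layer) of the model at a
    time from the parameter server, and run the whole minibatch through it
    as a sequence of microbatches before fetching the next portion."""
    # Divide the input samples into microbatches; together they form
    # the minibatch.
    activations = [minibatch[i:i + microbatch_size]
                   for i in range(0, len(minibatch), microbatch_size)]
    for layer_id in layer_ids:
        layer = parameter_server.fetch(layer_id)  # only one portion resident
        activations = [layer(mb) for mb in activations]
    return [y for mb in activations for y in mb]
```

Tuning `microbatch_size` is the knob the abstract mentions for trading memory use against communication overhead.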
  • Publication number: 20220283820
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Application
    Filed: May 24, 2022
    Publication date: September 8, 2022
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
  • Patent number: 11436019
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: September 6, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
  • Patent number: 11436491
    Abstract: Improved convolutional neural network-based machine learning models are disclosed herein. A convolutional neural network is configured to decompose feature maps generated based on a data item to be classified. The feature maps are decomposed into first and second subsets. The first subset is representative of high frequency components of the data item, and the second subset is representative of low frequency components of the data item. The second subset is upsampled and is combined with the first subset. The combined feature maps are convolved with a filter to extract a set of features associated with the data item. The first subset is also downsampled and combined with the second subset. The combined feature maps are convolved with a filter to extract another set of features. The data item is classified based on the sets of features extracted based on the convolution operations.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: September 6, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Sujeeth S. Bharadwaj, Bharadwaj Pudipeddi, Marc Tremblay
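A toy sketch of the upsample/downsample exchange between the two frequency subsets described above, using single-channel NumPy arrays, nearest-neighbour upsampling, and 2x2 average pooling; the convolutions that follow each merge are omitted. All names and choices here are illustrative, not the patented model.

```python
import numpy as np

def frequency_decompose_merge(high, low):
    """Hypothetical sketch: `high` holds high-frequency feature maps at
    full resolution, `low` holds low-frequency maps at half resolution.
    Each branch receives a resampled copy of the other before its
    convolution (not shown)."""
    # Upsample the low-frequency subset and combine it with the high one.
    low_up = low.repeat(2, axis=0).repeat(2, axis=1)
    high_branch = high + low_up
    # Downsample the high-frequency subset (2x2 average pool) and
    # combine it with the low one.
    h, w = high.shape
    high_down = high.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    low_branch = low + high_down
    return high_branch, low_branch
```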
  • Publication number: 20220276871
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be manually or automatically adjusted to reduce the communication overhead.
    Type: Application
    Filed: May 18, 2022
    Publication date: September 1, 2022
    Inventors: Bharadwaj PUDIPEDDI, Marc TREMBLAY, Gautham POPURI, Layali RASHID, Tiyasa MITRA, Mohit MITTAL, Maral MESMAKHOSROSHAHI
  • Patent number: 11354579
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: June 7, 2022
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
  • Publication number: 20220108209
    Abstract: Techniques for shared memory spaces in data and model parallelism are provided to improve memory efficiency and memory access speed. A shared memory space may be established at a host system or in a hardware memory agent. The shared memory may store training data or model parameters for an artificial intelligence model at a memory address in one or more memory circuits. Data for the artificial intelligence model may be processed across a plurality of artificial intelligence accelerators using the training data or the model parameters of the shared memory space. That is, multiple accelerators access one copy of the data from the shared memory space instead of accessing their own separate memory space.
    Type: Application
    Filed: October 5, 2020
    Publication date: April 7, 2022
    Inventors: Bharadwaj PUDIPEDDI, Jinwen XI, Maral MESMAKHOSROSHAHI, Gurupurna VASISHT
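The single-shared-copy idea in this abstract can be sketched minimally as follows; the class names and dict-backed store are assumptions, standing in for a host- or hardware-agent-managed shared memory space.

```python
class SharedParameterSpace:
    """Hypothetical sketch: one copy of training data or model parameters,
    addressed by name, shared by every attached accelerator."""
    def __init__(self, params):
        self._store = dict(params)   # the single shared copy

    def read(self, name):
        return self._store[name]     # every accelerator sees the same data

class Accelerator:
    def __init__(self, space):
        self.space = space           # attach to the shared space; no local copy

    def forward(self, name, x):
        return self.space.read(name) * x
```

The point of the sketch is only that both accelerators hold a reference to the same space rather than each materializing its own copy.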
  • Publication number: 20220107864
    Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model in the artificial intelligence processor, wherein the model corresponds to a same model processed by the plurality of artificial intelligence processors during a previous processing iteration by the plurality of artificial intelligence processors on data from the data set.
    Type: Application
    Filed: December 16, 2021
    Publication date: April 7, 2022
    Inventors: Bharadwaj PUDIPEDDI, Maral MESMAKHOSROSHAHI, Jinwen XI, Saurabh M. KULKARNI, Marc TREMBLAY, Matthias BAENNINGER, Nuno CLAUDINO PEREIRA LOPES
  • Publication number: 20220027724
    Abstract: Embodiments of the present disclosure include systems and methods for training neural networks. In one embodiment, data for an artificial intelligence model is processed in a first plurality of stages and in a second plurality of stages. The first and second pluralities of stages form a pipeline. One or more of the first plurality of stages uses at least one memory associated with a corresponding one or more of the second plurality of stages to balance memory across the pipeline.
    Type: Application
    Filed: July 27, 2020
    Publication date: January 27, 2022
    Inventors: Bharadwaj PUDIPEDDI, Juliana Patrícia Vicente FRANCO
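One hedged reading of the memory-balancing idea above, sketched as code: pair each stage of the first plurality with a stage of the second plurality and let it use that stage's buffer, so memory is shared across the pipeline rather than duplicated in one half. The pairing order and the dict-based stage representation are pure assumptions.

```python
def balance_buffers(first_stages, second_stages):
    """Hypothetical sketch: each stage in the first plurality uses a
    memory buffer associated with a corresponding stage in the second
    plurality, balancing memory across the pipeline."""
    for fwd, bwd in zip(first_stages, reversed(second_stages)):
        fwd["buffer"] = bwd["buffer"]   # stage borrows its partner's memory
    return first_stages
```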
  • Patent number: 11226859
    Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model in the artificial intelligence processor, wherein the model corresponds to a same model processed by the plurality of artificial intelligence processors during a previous processing iteration by the plurality of artificial intelligence processors on data from the data set.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: January 18, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Saurabh M. Kulkarni, Marc Tremblay, Matthias Baenninger, Nuno Claudino Pereira Lopes
  • Publication number: 20210326711
    Abstract: Systems and methods related to dual-momentum gradient optimization with reduced memory requirements are described. An example method in a system comprising a gradient optimizer and a memory configured to store momentum values associated with a neural network model comprising L layers is described. The method includes retrieving from the memory a first set of momentum values and a second set of momentum values, corresponding to a layer of the neural network model, having a selected storage format. The method further includes converting the first set of momentum values to a third set of momentum values having a training format associated with the gradient optimizer and converting the second set of momentum values to a fourth set of momentum values having a training format associated with the gradient optimizer. The method further includes performing gradient optimization using the third set of momentum values and the fourth set of momentum values.
    Type: Application
    Filed: April 17, 2020
    Publication date: October 21, 2021
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
  • Publication number: 20210295141
    Abstract: Systems and methods related to hardware-assisted gradient optimization using streamed gradients are described. An example method in a system comprising a memory configured to store weights associated with a neural network model comprising L layers, where L is an integer greater than one, a gradient optimizer, and a plurality of workers is described. The method includes during a single burst cycle moving a first set of gradients, received from each of the plurality of workers, from at least one gradient buffer to the gradient optimizer and moving weights from at least one buffer, coupled to the memory, to the gradient optimizer. The method further includes during the single burst cycle writing back the new weights, calculated by the gradient optimizer, to the memory. The method further includes during the single burst cycle transmitting the new weights, from the gradient optimizer, to each of the plurality of workers.
    Type: Application
    Filed: March 23, 2020
    Publication date: September 23, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jinwen XI, Bharadwaj PUDIPEDDI, Marc TREMBLAY
  • Publication number: 20210287083
    Abstract: Improved convolutional neural network-based machine learning models are disclosed herein. A convolutional neural network is configured to decompose feature maps generated based on a data item to be classified. The feature maps are decomposed into first and second subsets. The first subset is representative of high frequency components of the data item, and the second subset is representative of low frequency components of the data item. The second subset is upsampled and is combined with the first subset. The combined feature maps are convolved with a filter to extract a set of features associated with the data item. The first subset is also downsampled and combined with the second subset. The combined feature maps are convolved with a filter to extract another set of features. The data item is classified based on the sets of features extracted based on the convolution operations.
    Type: Application
    Filed: March 13, 2020
    Publication date: September 16, 2021
    Inventors: Sujeeth S. Bharadwaj, Bharadwaj Pudipeddi, Marc Tremblay