Patents by Inventor Urmish Ajit Thakker
Urmish Ajit Thakker has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250148205
Abstract: Embodiments described herein provide systems and techniques for training large language models. In one aspect, a process for performing in-context few-shot training for a transformer-based language model is disclosed. This process may begin by receiving the transformer-based language model having a context window of a predetermined size, as well as a training dataset comprising a set of prompt/completion examples. The process then constructs a training sequence based on the training dataset. Next, the process performs a single forward pass using the training sequence as input. The process subsequently performs a set of backward passes from a subset of examples in the training sequence, wherein each backward pass is conditioned on a selected subset of prompt/completion examples in the training sequence.
Type: Application
Filed: November 6, 2023
Publication date: May 8, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Zoltan Csaki, Bo Li, Urmish Ajit Thakker, Venkat Krishna Srinivasan
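The construction described in this abstract can be illustrated with a minimal sketch. This is not the patented implementation; the function names, the separator, and the use of the preceding examples as each backward pass's conditioning subset are illustrative assumptions based only on the abstract's wording.

```python
# Sketch: pack prompt/completion examples into one training sequence, run a
# single forward pass over it, then perform one backward pass per example,
# each conditioned on a subset (here, the preceding examples) of the sequence.

def build_training_sequence(examples, sep="\n"):
    """Concatenate prompt/completion pairs into one training sequence."""
    return sep.join(f"{p} {c}" for p, c in examples)

def backward_pass_contexts(examples):
    """For the k-th example, pair it with the subset of examples that
    condition its backward pass (here, those preceding it)."""
    return [(examples[:k], examples[k]) for k in range(len(examples))]

examples = [("2+2=", "4"), ("3+3=", "6"), ("5+1=", "6")]
seq = build_training_sequence(examples)      # input to the single forward pass
passes = backward_pass_contexts(examples)    # one backward pass per example
```

A real training loop would compute the forward pass once over `seq` and reuse its activations for every backward pass, rather than re-running the model per example.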
-
Publication number: 20250148276
Abstract: Embodiments described herein provide systems and techniques for training large language models. In one aspect, a process for performing in-context training of a language model is disclosed. This process may begin by receiving a language model that includes a context window of a predetermined size, as well as receiving a set of in-context prompt/completion pairs prepared for a target task. The process then constructs a first token sequence based on the set of in-context prompt/completion pairs. Next, the process fits the first token sequence into the context window. The process subsequently performs a first in-context training pass using the first token sequence to train the language model to generate a next token in accordance with the target task.
Type: Application
Filed: November 6, 2023
Publication date: May 8, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Zoltan Csaki, Bo Li, Urmish Ajit Thakker, Venkat Krishna Srinivasan
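The "fits the token sequence into the context window" step can be sketched as a simple greedy packing routine. This is a hypothetical illustration, not the claimed method: the greedy strategy, the `tok_len` callback, and a whitespace token count are all assumptions for the sake of a runnable example.

```python
# Sketch: greedily pack prompt/completion pairs into token sequences so that
# each sequence fits within the model's fixed-size context window.

def pack_into_context_window(pairs, window, tok_len):
    """pairs: prompt/completion tuples; window: max tokens per sequence;
    tok_len: callable giving the token count of one pair."""
    sequences, current, used = [], [], 0
    for pair in pairs:
        n = tok_len(pair)
        if current and used + n > window:  # pair would overflow the window
            sequences.append(current)      # start a new sequence
            current, used = [], 0
        current.append(pair)
        used += n
    if current:
        sequences.append(current)
    return sequences

pairs = [("a b", "c"), ("d", "e"), ("f g h", "i")]
word_count = lambda pair: len((pair[0] + " " + pair[1]).split())
seqs = pack_into_context_window(pairs, window=5, tok_len=word_count)
```

With a window of 5 "tokens" (words, in this toy tokenizer), the first two pairs (3 + 2 tokens) share one sequence and the third pair starts another.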
-
Patent number: 11663814
Abstract: The present disclosure advantageously provides a system and a method for skipping recurrent neural network (RNN) state updates using a skip predictor. Sequential input data are received and divided into sequences of input data values, each input data value being associated with a different time step for a pre-trained RNN model. At each time step, the hidden state vector for a prior time step is received from the pre-trained RNN model, and a determination, based on the input data value and the hidden state vector for at least one prior time step, is made whether to provide or not provide the input data value associated with the time step to the pre-trained RNN model for processing. When the input data value is not provided, the pre-trained RNN model does not update its hidden state vector. Importantly, the skip predictor is trained without retraining the pre-trained RNN model.
Type: Grant
Filed: April 22, 2020
Date of Patent: May 30, 2023
Assignee: Arm Limited
Inventors: Urmish Ajit Thakker, Jin Tao, Ganesh Suryanarayan Dasika, Jesse Garrett Beu
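The control flow in this abstract — consult a predictor at each time step, and leave the hidden state untouched on a skip — can be shown with a toy sketch. The scalar "RNN cell" and the threshold rule are stand-ins invented for illustration; the patented skip predictor is a learned model, not a fixed threshold.

```python
# Sketch: at each time step a skip predictor looks at the input value and the
# prior hidden state and decides whether to feed the input to the pre-trained
# RNN. Skipped steps leave the hidden state vector unchanged.

def rnn_cell(x, h):
    # toy stand-in for the pre-trained RNN's state update
    return 0.5 * h + 0.5 * x

def skip_predictor(x, h, threshold=0.1):
    # toy rule: skip when the input would barely move the state
    return abs(x - h) < threshold  # True -> skip this time step

def run_with_skips(inputs, h0=0.0):
    h, skipped = h0, 0
    for x in inputs:
        if skip_predictor(x, h):
            skipped += 1        # hidden state is NOT updated
        else:
            h = rnn_cell(x, h)  # normal RNN state update
    return h, skipped

h, skipped = run_with_skips([1.0, 0.52, 0.8])
```

Because the predictor only gates what the frozen RNN sees, it can be trained on its own, matching the abstract's point that the pre-trained RNN is never retrained.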
-
Publication number: 20220405597
Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to adapt a computing device to classify physical features in a deployment environment. In a particular implementation, computing resources may be selectively de-allocated from at least one of one or more elements of a computing architecture based, at least in part, on assessed impacts to the one or more elements of the computing architecture.
Type: Application
Filed: June 16, 2021
Publication date: December 22, 2022
Inventors: Urmish Ajit Thakker, Jesse Garrett Beu, Dibakar Gope, Mark John O'Connor
-
Patent number: 11507841
Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
Type: Grant
Filed: February 10, 2020
Date of Patent: November 22, 2022
Assignee: Arm Limited
Inventors: Urmish Ajit Thakker, Ganesh Suryanarayan Dasika
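The delta-weight idea in this abstract — a small set of delta weights, each mapped to one fixed weight — can be sketched in a few lines. How the accelerator actually combines a delta with its fixed weight is not stated in the abstract; the additive rule and the sparse-dictionary representation below are assumptions for illustration only.

```python
# Sketch: the NLM's effective weights are its fixed weights adjusted by a
# much smaller set of delta weights, each indexed to the fixed weight it
# corresponds to (additive combination assumed).

def effective_weights(fixed, deltas):
    """fixed: list of NLM fixed weights; deltas: sparse {index: delta}."""
    out = list(fixed)
    for i, d in deltas.items():
        out[i] = out[i] + d  # each delta adjusts its corresponding fixed weight
    return out

fixed = [0.1, 0.2, 0.3, 0.4]
deltas = {1: 0.05, 3: -0.1}  # far fewer deltas than fixed weights
w = effective_weights(fixed, deltas)
```

Storing only the sparse deltas in the second memory, while the larger fixed-weight set stays in the first, is what lets the accelerator specialize the NLM cheaply per task.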
-
Patent number: 11468305
Abstract: The present disclosure advantageously provides a hybrid memory artificial neural network hardware accelerator that includes a communication bus interface, a static memory, a non-refreshed dynamic memory, a controller and a computing engine. The static memory stores at least a portion of an ANN model. The ANN model includes an input layer, one or more hidden layers and an output layer, ANN basis weights, input data and output data. The non-refreshed dynamic memory is configured to store ANN custom weights for the input, hidden and output layers, and output data. For each layer or layer portion, the computing engine generates the ANN custom weights based on the ANN basis weights, stores the ANN custom weights in the non-refreshed dynamic memory, executes the layer or layer portion, based on inputs and the ANN custom weights, to generate layer output data, and stores the layer output data.
Type: Grant
Filed: March 18, 2020
Date of Patent: October 11, 2022
Assignee: Arm Limited
Inventors: Urmish Ajit Thakker, Shidhartha Das, Ganesh Suryanarayan Dasika
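The per-layer flow in this abstract — generate custom weights from basis weights, use them to execute the layer, store the output — can be sketched as follows. The abstract does not say how custom weights are derived from basis weights; the linear combination with per-layer coefficients below, and the toy dot-product "layer", are purely illustrative assumptions.

```python
# Sketch: per layer, custom weights are generated from shared basis weights
# (assumed here to be a linear combination), used to execute the layer, and
# then discarded. Their short lifetime is why a non-refreshed dynamic memory
# suffices to hold them.

def generate_custom_weights(basis, coeffs):
    """basis: list of basis weight vectors; coeffs: per-layer coefficients."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]

def execute_layer(inputs, weights):
    # toy layer: dot product of the inputs with the generated custom weights
    return sum(x * w for x, w in zip(inputs, weights))

basis = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # stored in static memory
custom = generate_custom_weights(basis, [2.0, 3.0])  # held briefly in DRAM
out = execute_layer([1.0, 1.0, 1.0], custom)
```

Because each layer's custom weights are regenerated just before use and consumed within one layer's execution, the dynamic memory never needs to retain them long enough to require refresh.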
-
Publication number: 20210295137
Abstract: The present disclosure advantageously provides a hybrid memory artificial neural network hardware accelerator that includes a communication bus interface, a static memory, a non-refreshed dynamic memory, a controller and a computing engine. The static memory stores at least a portion of an ANN model. The ANN model includes an input layer, one or more hidden layers and an output layer, ANN basis weights, input data and output data. The non-refreshed dynamic memory is configured to store ANN custom weights for the input, hidden and output layers, and output data. For each layer or layer portion, the computing engine generates the ANN custom weights based on the ANN basis weights, stores the ANN custom weights in the non-refreshed dynamic memory, executes the layer or layer portion, based on inputs and the ANN custom weights, to generate layer output data, and stores the layer output data.
Type: Application
Filed: March 18, 2020
Publication date: September 23, 2021
Applicant: Arm Limited
Inventors: Urmish Ajit Thakker, Shidhartha Das, Ganesh Suryanarayan Dasika
-
Publication number: 20210248008
Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
Type: Application
Filed: February 10, 2020
Publication date: August 12, 2021
Inventors: Urmish Ajit Thakker, Ganesh Suryanarayan Dasika
-
Publication number: 20210056422
Abstract: The present disclosure advantageously provides a system and a method for skipping recurrent neural network (RNN) state updates using a skip predictor. Sequential input data are received and divided into sequences of input data values, each input data value being associated with a different time step for a pre-trained RNN model. At each time step, the hidden state vector for a prior time step is received from the pre-trained RNN model, and a determination, based on the input data value and the hidden state vector for at least one prior time step, is made whether to provide or not provide the input data value associated with the time step to the pre-trained RNN model for processing. When the input data value is not provided, the pre-trained RNN model does not update its hidden state vector. Importantly, the skip predictor is trained without retraining the pre-trained RNN model.
Type: Application
Filed: April 22, 2020
Publication date: February 25, 2021
Applicant: Arm Limited
Inventors: Urmish Ajit Thakker, Jin Tao, Ganesh Suryanarayan Dasika, Jesse Garrett Beu