Patents by Inventor Urmish Ajit Thakker
Urmish Ajit Thakker has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250148205
Abstract: Embodiments described herein provide systems and techniques for training large language models. In one aspect, a process for performing in-context few-shot training for a transformer-based language model is disclosed. This process may begin by receiving the transformer-based language model having a context window of a predetermined size, as well as a training dataset comprising a set of prompt/completion examples. The process then constructs a training sequence based on the training dataset. Next, the process performs a single forward pass using the training sequence as input. The process subsequently performs a set of backward passes from a subset of examples in the training sequence, wherein each backward pass is conditioned on a selected subset of prompt/completion examples in the training sequence.
Type: Application
Filed: November 6, 2023
Publication date: May 8, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Zoltan Csaki, Bo Li, Urmish Ajit Thakker, Venkat Krishna Srinivasan
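The construction described in this abstract can be illustrated with a minimal sketch. This is not the patented implementation; the function names, the separator, and the use of the preceding examples as each backward pass's conditioning subset are illustrative assumptions based only on the abstract's wording.

```python
# Sketch: pack prompt/completion examples into one training sequence, run a
# single forward pass over it, then perform one backward pass per example,
# each conditioned on a subset (here, the preceding examples) of the sequence.

def build_training_sequence(examples, sep="\n"):
    """Concatenate prompt/completion pairs into one training sequence."""
    return sep.join(f"{p} {c}" for p, c in examples)

def backward_pass_contexts(examples):
    """For the k-th example, pair it with the subset of examples that
    condition its backward pass (here, those preceding it)."""
    return [(examples[:k], examples[k]) for k in range(len(examples))]

examples = [("2+2=", "4"), ("3+3=", "6"), ("5+1=", "6")]
seq = build_training_sequence(examples)      # input to the single forward pass
passes = backward_pass_contexts(examples)    # one backward pass per example
```

A real training loop would compute the forward pass once over `seq` and reuse its activations for every backward pass, rather than re-running the model per example.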
-
Publication number: 20250148276
Abstract: Embodiments described herein provide systems and techniques for training large language models. In one aspect, a process for performing in-context training of a language model is disclosed. This process may begin by receiving a language model that includes a context window of a predetermined size, as well as receiving a set of in-context prompt/completion pairs prepared for a target task. The process then constructs a first token sequence based on the set of in-context prompt/completion pairs. Next, the process fits the first token sequence into the context window. The process subsequently performs a first in-context training pass using the first token sequence to train the language model to generate a next token in accordance with the target task.
Type: Application
Filed: November 6, 2023
Publication date: May 8, 2025
Applicant: SambaNova Systems, Inc.
Inventors: Zoltan Csaki, Bo Li, Urmish Ajit Thakker, Venkat Krishna Srinivasan
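The "fits the token sequence into the context window" step can be sketched as a simple greedy packing routine. This is a hypothetical illustration, not the claimed method: the greedy strategy, the `tok_len` callback, and a whitespace token count are all assumptions for the sake of a runnable example.

```python
# Sketch: greedily pack prompt/completion pairs into token sequences so that
# each sequence fits within the model's fixed-size context window.

def pack_into_context_window(pairs, window, tok_len):
    """pairs: prompt/completion tuples; window: max tokens per sequence;
    tok_len: callable giving the token count of one pair."""
    sequences, current, used = [], [], 0
    for pair in pairs:
        n = tok_len(pair)
        if current and used + n > window:  # pair would overflow the window
            sequences.append(current)      # start a new sequence
            current, used = [], 0
        current.append(pair)
        used += n
    if current:
        sequences.append(current)
    return sequences

pairs = [("a b", "c"), ("d", "e"), ("f g h", "i")]
word_count = lambda pair: len((pair[0] + " " + pair[1]).split())
seqs = pack_into_context_window(pairs, window=5, tok_len=word_count)
```

With a window of 5 "tokens" (words, in this toy tokenizer), the first two pairs (3 + 2 tokens) share one sequence and the third pair starts another.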
-
Patent number: 11663814
Abstract: The present disclosure advantageously provides a system and a method for skipping recurrent neural network (RNN) state updates using a skip predictor. Sequential input data are received and divided into sequences of input data values, each input data value being associated with a different time step for a pre-trained RNN model. At each time step, the hidden state vector for a prior time step is received from the pre-trained RNN model, and a determination, based on the input data value and the hidden state vector for at least one prior time step, is made whether to provide or not provide the input data value associated with the time step to the pre-trained RNN model for processing. When the input data value is not provided, the pre-trained RNN model does not update its hidden state vector. Importantly, the skip predictor is trained without retraining the pre-trained RNN model.
Type: Grant
Filed: April 22, 2020
Date of Patent: May 30, 2023
Assignee: Arm Limited
Inventors: Urmish Ajit Thakker, Jin Tao, Ganesh Suryanarayan Dasika, Jesse Garrett Beu
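The control flow in this abstract — consult a predictor at each time step, and leave the hidden state untouched on a skip — can be shown with a toy sketch. The scalar "RNN cell" and the threshold rule are stand-ins invented for illustration; the patented skip predictor is a learned model, not a fixed threshold.

```python
# Sketch: at each time step a skip predictor looks at the input value and the
# prior hidden state and decides whether to feed the input to the pre-trained
# RNN. Skipped steps leave the hidden state vector unchanged.

def rnn_cell(x, h):
    # toy stand-in for the pre-trained RNN's state update
    return 0.5 * h + 0.5 * x

def skip_predictor(x, h, threshold=0.1):
    # toy rule: skip when the input would barely move the state
    return abs(x - h) < threshold  # True -> skip this time step

def run_with_skips(inputs, h0=0.0):
    h, skipped = h0, 0
    for x in inputs:
        if skip_predictor(x, h):
            skipped += 1        # hidden state is NOT updated
        else:
            h = rnn_cell(x, h)  # normal RNN state update
    return h, skipped

h, skipped = run_with_skips([1.0, 0.52, 0.8])
```

Because the predictor only gates what the frozen RNN sees, it can be trained on its own, matching the abstract's point that the pre-trained RNN is never retrained.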
-
Publication number: 20220405597
Abstract: Example methods, apparatuses, and/or articles of manufacture are disclosed that may be implemented, in whole or in part, using one or more computing devices to adapt a computing device to classify physical features in a deployment environment. In a particular implementation, computing resources may be selectively de-allocated from at least one of one or more elements of a computing architecture based, at least in part, on assessed impacts to the one or more elements of the computing architecture.
Type: Application
Filed: June 16, 2021
Publication date: December 22, 2022
Inventors: Urmish Ajit Thakker, Jesse Garrett Beu, Dibakar Gope, Mark John O'Connor
-
Patent number: 11507841
Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
Type: Grant
Filed: February 10, 2020
Date of Patent: November 22, 2022
Assignee: Arm Limited
Inventors: Urmish Ajit Thakker, Ganesh Suryanarayan Dasika
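The delta-weight idea in this abstract — a small set of delta weights, each mapped to one fixed weight — can be sketched in a few lines. How the accelerator actually combines a delta with its fixed weight is not stated in the abstract; the additive rule and the sparse-dictionary representation below are assumptions for illustration only.

```python
# Sketch: the NLM's effective weights are its fixed weights adjusted by a
# much smaller set of delta weights, each indexed to the fixed weight it
# corresponds to (additive combination assumed).

def effective_weights(fixed, deltas):
    """fixed: list of NLM fixed weights; deltas: sparse {index: delta}."""
    out = list(fixed)
    for i, d in deltas.items():
        out[i] = out[i] + d  # each delta adjusts its corresponding fixed weight
    return out

fixed = [0.1, 0.2, 0.3, 0.4]
deltas = {1: 0.05, 3: -0.1}  # far fewer deltas than fixed weights
w = effective_weights(fixed, deltas)
```

Storing only the sparse deltas in the second memory, while the larger fixed-weight set stays in the first, is what lets the accelerator specialize the NLM cheaply per task.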
-
Patent number: 11468305
Abstract: The present disclosure advantageously provides a hybrid memory artificial neural network hardware accelerator that includes a communication bus interface, a static memory, a non-refreshed dynamic memory, a controller and a computing engine. The static memory stores at least a portion of an ANN model. The ANN model includes an input layer, one or more hidden layers and an output layer, ANN basis weights, input data and output data. The non-refreshed dynamic memory is configured to store ANN custom weights for the input, hidden and output layers, and output data. For each layer or layer portion, the computing engine generates the ANN custom weights based on the ANN basis weights, stores the ANN custom weights in the non-refreshed dynamic memory, executes the layer or layer portion, based on inputs and the ANN custom weights, to generate layer output data, and stores the layer output data.
Type: Grant
Filed: March 18, 2020
Date of Patent: October 11, 2022
Assignee: Arm Limited
Inventors: Urmish Ajit Thakker, Shidhartha Das, Ganesh Suryanarayan Dasika
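The per-layer flow in this abstract — generate custom weights from basis weights, use them to execute the layer, store the output — can be sketched as follows. The abstract does not say how custom weights are derived from basis weights; the linear combination with per-layer coefficients below, and the toy dot-product "layer", are purely illustrative assumptions.

```python
# Sketch: per layer, custom weights are generated from shared basis weights
# (assumed here to be a linear combination), used to execute the layer, and
# then discarded. Their short lifetime is why a non-refreshed dynamic memory
# suffices to hold them.

def generate_custom_weights(basis, coeffs):
    """basis: list of basis weight vectors; coeffs: per-layer coefficients."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]

def execute_layer(inputs, weights):
    # toy layer: dot product of the inputs with the generated custom weights
    return sum(x * w for x, w in zip(inputs, weights))

basis = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # stored in static memory
custom = generate_custom_weights(basis, [2.0, 3.0])  # held briefly in DRAM
out = execute_layer([1.0, 1.0, 1.0], custom)
```

Because each layer's custom weights are regenerated just before use and consumed within one layer's execution, the dynamic memory never needs to retain them long enough to require refresh.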
-
Publication number: 20210295137
Abstract: The present disclosure advantageously provides a hybrid memory artificial neural network hardware accelerator that includes a communication bus interface, a static memory, a non-refreshed dynamic memory, a controller and a computing engine. The static memory stores at least a portion of an ANN model. The ANN model includes an input layer, one or more hidden layers and an output layer, ANN basis weights, input data and output data. The non-refreshed dynamic memory is configured to store ANN custom weights for the input, hidden and output layers, and output data. For each layer or layer portion, the computing engine generates the ANN custom weights based on the ANN basis weights, stores the ANN custom weights in the non-refreshed dynamic memory, executes the layer or layer portion, based on inputs and the ANN custom weights, to generate layer output data, and stores the layer output data.
Type: Application
Filed: March 18, 2020
Publication date: September 23, 2021
Applicant: Arm Limited
Inventors: Urmish Ajit Thakker, Shidhartha Das, Ganesh Suryanarayan Dasika
-
Publication number: 20210248008
Abstract: The present disclosure advantageously provides a hardware accelerator for a natural language processing application including a first memory, a second memory, and a computing engine (CE). The first memory is configured to store a configurable NLM and a set of NLM fixed weights. The second memory is configured to store an ANN model, a set of ANN weights, a set of NLM delta weights, input data and output data. The set of NLM delta weights may be smaller than the set of NLM fixed weights, and each NLM delta weight corresponds to an NLM fixed weight. The CE is configured to execute the NLM, based on the input data, the set of NLM fixed weights and the set of NLM delta weights, to generate intermediate output data, and execute the ANN model, based on the intermediate output data and the set of ANN weights, to generate the output data.
Type: Application
Filed: February 10, 2020
Publication date: August 12, 2021
Inventors: Urmish Ajit Thakker, Ganesh Suryanarayan Dasika
-
Publication number: 20210056422
Abstract: The present disclosure advantageously provides a system and a method for skipping recurrent neural network (RNN) state updates using a skip predictor. Sequential input data are received and divided into sequences of input data values, each input data value being associated with a different time step for a pre-trained RNN model. At each time step, the hidden state vector for a prior time step is received from the pre-trained RNN model, and a determination, based on the input data value and the hidden state vector for at least one prior time step, is made whether to provide or not provide the input data value associated with the time step to the pre-trained RNN model for processing. When the input data value is not provided, the pre-trained RNN model does not update its hidden state vector. Importantly, the skip predictor is trained without retraining the pre-trained RNN model.
Type: Application
Filed: April 22, 2020
Publication date: February 25, 2021
Applicant: Arm Limited
Inventors: Urmish Ajit Thakker, Jin Tao, Ganesh Suryanarayan Dasika, Jesse Garrett Beu