Patents by Inventor Christopher Lott

Christopher Lott has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260119905
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for generating a response to a query input into a generative artificial intelligence model. The method generally includes generating, based on an input query and a first generative model, a plurality of sets of tokens, each set of tokens in the plurality of sets of tokens corresponding to a candidate response to the input query; outputting, to a second generative model, the plurality of sets of tokens for verification; receiving, from the second generative model, an indication of a selected set of tokens from the plurality of sets of tokens based on the input query and the plurality of sets of tokens; and outputting the selected set of tokens as a response to the input query.
    Type: Application
    Filed: October 2, 2023
    Publication date: April 30, 2026
    Inventors: Christopher LOTT, Mingu LEE, Joseph Binamira SORIAGA, Jilei HOU
  • Publication number: 20260099683
    Abstract: Disclosed are systems, apparatuses, processes, and computer-readable media for model training. A device may process, using a linear layer, an embedding generated from a first output token and input features to generate first features, wherein the first output token is generated by a previous iteration of a token predictor and wherein the input features are generated by a previous iteration of a decoding layer. A device may process, using the decoding layer, the first features to generate second features having first dimensions. A device may process, using a down-projection layer, the second features to generate third features having second dimensions smaller than the first dimensions. A device may generate, using the token predictor and the third features, a second output token.
    Type: Application
    Filed: February 11, 2025
    Publication date: April 9, 2026
    Inventors: Mingu LEE, Wonseok JEON, Junyoung PARK, Kanghoon YOON, Christopher LOTT
  • Publication number: 20260099673
    Abstract: Disclosed are systems, apparatuses, processes, and computer-readable media for model training. A device may process, using a linear layer, an embedding generated from a first output token and input features to generate first features, wherein the first output token is generated by a previous iteration of a token predictor and wherein the input features are generated by a previous iteration of a decoding layer. A device may process, using the decoding layer, the first features to generate second features having first dimensions. A device may process, using a down-projection layer, the second features to generate third features having second dimensions smaller than the first dimensions. A device may generate, using the token predictor and the third features, a second output token.
    Type: Application
    Filed: February 11, 2025
    Publication date: April 9, 2026
    Inventors: Mingu LEE, Wonseok JEON, Junyoung PARK, Kanghoon YOON, Christopher LOTT
  • Publication number: 20260087374
    Abstract: Certain aspects provide techniques and apparatus for executing queries in a computing system using machine learning models. An example method generally includes receiving a plan to satisfy a request in the computing system and event log data associated with execution of the plan. The plan generally specifies a first plurality of function calls at a first level of granularity. Using a plan refinement machine learning model, a refined plan is generated when the event log data indicates that execution of the generated plan results in one or more execution errors and the one or more execution errors are solvable. Generally, the refined plan specifies a second plurality of function calls at a second level of granularity, the second level of granularity being finer than the first level of granularity.
    Type: Application
    Filed: September 20, 2024
    Publication date: March 26, 2026
    Inventors: Amr Mamoun MARTINI, Arvind Vardarajan SANTHANAM, Swagarika Jaharlal GIRI, Christopher LOTT
  • Patent number: 12579063
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for machine learning. In an example method, an input prompt comprising a set of tokens is accessed as input to a generative machine learning model. A first key tensor and a first value tensor are generated for a first token of the set of tokens, and the first key tensor and the first value tensor are stored in a memory. A first retention score is generated, for the first token, based on the first key tensor, the first value tensor, and a second token of the set of tokens. The first key tensor and the first value tensor are evicted from the memory in response to determining that the first retention score is a lowest retention score of the memory.
    Type: Grant
    Filed: September 5, 2024
    Date of Patent: March 17, 2026
    Assignee: QUALCOMM Incorporated
    Inventors: Raghavv Goel, Mukul Gagrani, Junyoung Park, Dalton James Jones, Mingu Lee, Wonseok Jeon, Matthew James Morse, Matthew Harper Langston, Christopher Lott
  • Publication number: 20260065048
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for generating a response to a query input in a generative artificial intelligence model. An example method generally includes receiving an input prompt for processing; generating a set of forecasted parameters for the input prompt using a parameter prediction model; generating, using a generative artificial intelligence model, a response to the input prompt based on the input prompt and the set of forecasted parameters; and outputting the generated response.
    Type: Application
    Filed: December 18, 2024
    Publication date: March 5, 2026
    Inventors: Mingu LEE, Raghavv GOEL, Wonseok JEON, Mukul GAGRANI, Junyoung PARK, Christopher LOTT
  • Publication number: 20260050766
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for efficient inferencing using a machine learning model. An example method generally includes receiving an input including a set of tokens for processing by a transformer neural network. The set of tokens for processing by the transformer neural network is partitioned into a first set of tokens and a second set of tokens. Using at least one state space model, at least one compressed token representing the first set of tokens is generated. An output token is generated, using the transformer neural network, based on the compressed token and the second set of tokens. A response to the input is generated based on the output token.
    Type: Application
    Filed: January 7, 2025
    Publication date: February 19, 2026
    Inventors: Mukul GAGRANI, Junyoung PARK, Raghavv GOEL, Dalton James JONES, Wonseok JEON, Matthew James MORSE, Matthew Harper LANGSTON, Mingu LEE, Christopher LOTT
  • Publication number: 20260044745
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for machine learning. In an example method, a machine learning model comprising a plurality of layers, and a set of input data for the machine learning model, are accessed. A combination of hyperparameters for the machine learning model is selected based on the set of input data, comprising selecting, for each respective layer of the plurality of layers, a respective cache size based on the input data. The machine learning model is deployed according to the combination of hyperparameters.
    Type: Application
    Filed: August 8, 2024
    Publication date: February 12, 2026
    Inventors: Dalton James JONES, Junyoung PARK, Matthew James MORSE, Raghavv GOEL, Mukul GAGRANI, Mingu LEE, Matthew Harper LANGSTON, Pierre-David LETOURNEAU, Christopher LOTT
  • Publication number: 20260044449
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for machine learning. In an example method, an input prompt comprising a set of tokens is accessed as input to a generative machine learning model. A first key tensor and a first value tensor are generated for a first token of the set of tokens, and the first key tensor and the first value tensor are stored in a memory. A first retention score is generated, for the first token, based on the first key tensor, the first value tensor, and a second token of the set of tokens. The first key tensor and the first value tensor are evicted from the memory in response to determining that the first retention score is a lowest retention score of the memory.
    Type: Application
    Filed: October 21, 2025
    Publication date: February 12, 2026
    Inventors: Raghavv GOEL, Mukul GAGRANI, Junyoung PARK, Dalton James JONES, Mingu LEE, Wonseok JEON, Matthew James MORSE, Matthew Harper LANGSTON, Christopher LOTT
  • Publication number: 20260017192
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for machine learning. In an example method, an input prompt comprising a set of tokens is accessed as input to a generative machine learning model. A first key tensor and a first value tensor are generated for a first token of the set of tokens, and the first key tensor and the first value tensor are stored in a memory. A first retention score is generated, for the first token, based on the first key tensor, the first value tensor, and a second token of the set of tokens. The first key tensor and the first value tensor are evicted from the memory in response to determining that the first retention score is a lowest retention score of the memory.
    Type: Application
    Filed: September 5, 2024
    Publication date: January 15, 2026
    Inventors: Raghavv GOEL, Mukul GAGRANI, Junyoung PARK, Dalton James JONES, Mingu LEE, Wonseok JEON, Matthew James MORSE, Matthew Harper LANGSTON, Christopher LOTT
  • Publication number: 20260017323
    Abstract: Techniques and apparatus for efficiently adapting a machine learning model to perform a variety of tasks using different adapters are provided. An example method generally includes receiving an input including a sequence of tokens associated with at least an input prompt into a neural network. The sequence of tokens is generated by a transformer block and a first set of adapters associated with the transformer block. A second set of adapters associated with the transformer block is loaded. An output of the transformer block is generated based on a key-value cache associated with the input and on weights associated with the transformer block. An output of the second set of adapters associated with the transformer block is generated based on the key-value cache associated with the input and on adapter weights associated with the second set of adapters.
    Type: Application
    Filed: December 4, 2024
    Publication date: January 15, 2026
    Inventors: Amr Mamoun MARTINI, Arvind Vardarajan SANTHANAM, Christopher LOTT
  • Publication number: 20260017564
    Abstract: Techniques and apparatus for efficiently adapting a machine learning model to perform tasks using adapters are provided. An example method generally includes receiving an input for processing by a transformer block in a neural network. An output of the transformer block is generated based on the received input and weights associated with the transformer block. An output of an adapter associated with the transformer block is generated based on a copy of the received input and adapter weights associated with the adapter. Key-value data associated with the output of the transformer block and key-value data associated with a combination of the output of the transformer block and the output of the adapter are stored in a cache for subsequent inferencing rounds. A response to the input is generated based on the combination of the output of the transformer block and the output of the adapter.
    Type: Application
    Filed: December 4, 2024
    Publication date: January 15, 2026
    Inventors: Amr Mamoun MARTINI, Arvind Vardarajan SANTHANAM, Christopher LOTT
  • Patent number: 12524405
    Abstract: Certain aspects provide techniques and apparatus for executing queries in a computing system using machine learning models. An example method generally includes receiving a plan to satisfy a request in the computing system and event log data associated with execution of the plan. The plan generally specifies a first plurality of actions to be performed by the computing system at a first level of granularity. Using a plan refinement machine learning model, a refined plan is generated when the event log data indicates that execution of the generated plan results in one or more execution errors and the one or more execution errors are solvable. Generally, the refined plan specifies a second plurality of actions to be performed by the computing system at a second level of granularity, the second level of granularity being finer than the first level of granularity.
    Type: Grant
    Filed: September 20, 2024
    Date of Patent: January 13, 2026
    Assignee: QUALCOMM Incorporated
    Inventors: Amr Mamoun Martini, Arvind Vardarajan Santhanam, Christopher Lott
  • Patent number: 12493827
    Abstract: A method for optimizing the compilation of a machine learning model to be executed on target edge devices is provided. Compute nodes of a plurality of compute nodes are allocated to a compiler optimization process for a compiler of said machine learning model. The machine learning model has a compute graph representation having nodes that are kernel operators necessary to execute the machine learning model and edges that connect said kernel operators to define precedence constraints. A round of optimization is scheduled for the process amongst the allocated compute nodes. At each allocated compute node a sequencing and scheduling solution is applied per round to obtain a performance metric for the machine learning model. From each compute node the performance metric is received and a solution that has the best performance metric is identified and implemented for execution of the machine learning model on the target edge devices.
    Type: Grant
    Filed: November 17, 2022
    Date of Patent: December 9, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Weiliang Zeng, Christopher Lott, Edward Teague, Yang Yang, Joseph Binamira Soriaga
  • Publication number: 20250356184
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. In an example method, a sequence of tokens is accessed as input to an attention operation. For a first token, an attention output is generated based on a first window of tokens relative to the first token, comprising generating a first positional embedding for an influential token, generating a second positional embedding for the first token, and generating the attention output based on the first and second positional embeddings. For a second token, an attention output is generated based on a second window of tokens relative to the second token, where the second window of tokens includes the first token, comprising generating a third positional embedding for the influential token, generating a fourth positional embedding for the second token, and generating the attention output based on the second, third, and fourth positional embeddings.
    Type: Application
    Filed: May 17, 2024
    Publication date: November 20, 2025
    Inventors: Junyoung PARK, Mukul GAGRANI, Raghavv GOEL, Wonseok JEON, Mingu LEE, Christopher LOTT
  • Patent number: 12450486
    Abstract: A method performed by a computing device includes determining a partition for depth-first processing by a multi-layer artificial neural network (ANN) of the computing device. The computing device comprises a processor, on-chip memory, and off-chip memory. The partition is determined based on an amount of on-chip memory used by the partition, an available amount of on-chip memory, and a size of a write back to the off-chip memory. The method also includes processing, at the computing device via the multi-layer ANN, an input using the depth-first processing in accordance with the partition.
    Type: Grant
    Filed: December 14, 2020
    Date of Patent: October 21, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Piero Zappi, Jin Won Lee, Christopher Lott, Rexford Alan Hill
  • Publication number: 20250245530
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for generating a response to a query input in a generative artificial intelligence model using variable draft length. An example method generally includes determining (e.g., measuring or accessing) one or more operational properties of a device on which inferencing operations using a machine learning model are performed. A first draft set of tokens is generated using the machine learning model. A number of tokens included in the first draft set of tokens is based on the one or more operational properties of the device and a defined scheduling function for the machine learning model. The first draft set of tokens is output for verification.
    Type: Application
    Filed: January 26, 2024
    Publication date: July 31, 2025
    Inventors: Raghavv GOEL, Mingu LEE, Mukul GAGRANI, Wonseok JEON, Christopher LOTT, Faisal Maen Tawfiq ZAGHLOUL, Maksim KRASNYANSKIY
  • Publication number: 20250245430
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for efficiently generating a response to a query input in a generative artificial intelligence model. An example method generally includes generating, based on an input prompt and using a first machine learning model, a set of tokens including one or more subsets of tokens. Each respective subset of the one or more subsets corresponds to a respective portion of a response to the input prompt and includes a fixed number of tokens corresponding to a beam width for a beam search through the set of tokens. The set of tokens is output to a second machine learning model for verification, and information identifying a selected sequence of tokens from the generated set of tokens is received from the second machine learning model. The selected sequence of tokens is output as the response to the input prompt.
    Type: Application
    Filed: January 26, 2024
    Publication date: July 31, 2025
    Inventors: Wonseok JEON, Mukul GAGRANI, Mingu LEE, Raghavv GOEL, Junyoung PARK, Christopher LOTT
  • Patent number: 12373494
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for generating a response to a query input in a generative artificial intelligence model. An example method generally includes receiving a plurality of sets of tokens generated based on an input prompt and a first generative artificial intelligence model, each set of tokens in the plurality of sets of tokens corresponding to a candidate response to the input prompt; selecting, using a second generative artificial intelligence model and recursive adjustment of a target distribution associated with the received plurality of sets of tokens, a set of tokens from the plurality of sets of tokens; and outputting the selected set of tokens as a response to the input prompt.
    Type: Grant
    Filed: December 13, 2023
    Date of Patent: July 29, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Christopher Lott, Mingu Lee, Wonseok Jeon, Roland Memisevic
  • Publication number: 20250231989
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for generating a response to a query input in a generative artificial intelligence model. An example method generally includes receiving a plurality of sets of tokens generated based on an input prompt and a first generative artificial intelligence model, each set of tokens in the plurality of sets of tokens corresponding to a candidate response to the input prompt; selecting, using a second generative artificial intelligence model and recursive adjustment of a target distribution associated with the received plurality of sets of tokens, a set of tokens from the plurality of sets of tokens; and outputting the selected set of tokens as a response to the input prompt.
    Type: Application
    Filed: April 4, 2025
    Publication date: July 17, 2025
    Inventors: Christopher LOTT, Mingu LEE, Wonseok JEON, Roland MEMISEVIC
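
Illustrative note: several abstracts in this listing (patent 12579063 and publications 20260044449 and 20260017192) describe storing a key tensor and a value tensor per token, scoring each stored entry, and evicting the entry with the lowest retention score. The Python sketch below is only a rough illustration of that general idea and is not taken from the patent documents. The class name `KVCache`, the function `retention_score`, and the scoring formula (a scaled dot product between the stored key and a query derived from a later token, multiplied by the value's norm) are all assumptions made for this example.

```python
import math


def retention_score(key, value, query):
    """Hypothetical retention score (an assumption, not the patented formula).

    Scaled dot product of the stored key with a query derived from a later
    token, multiplied by the L2 norm of the stored value.
    """
    logit = sum(k * q for k, q in zip(key, query)) / math.sqrt(len(key))
    value_norm = math.sqrt(sum(v * v for v in value))
    return logit * value_norm


class KVCache:
    """Toy key-value cache that evicts the entry with the lowest score."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # token position -> (key, value)

    def store(self, position, key, value):
        # Keep the key and value tensors (plain lists here) for one token.
        self.entries[position] = (key, value)

    def evict_lowest(self, query):
        """Score every entry against `query`; evict the lowest if over capacity.

        Returns the evicted token position, or None if nothing was evicted.
        """
        if len(self.entries) <= self.capacity:
            return None
        scores = {
            position: retention_score(key, value, query)
            for position, (key, value) in self.entries.items()
        }
        victim = min(scores, key=scores.get)
        del self.entries[victim]
        return victim


# Usage: a cache with room for two tokens receives a third one.
cache = KVCache(capacity=2)
cache.store(0, key=[1.0, 0.0], value=[1.0, 0.0])
cache.store(1, key=[0.0, 1.0], value=[2.0, 0.0])
cache.store(2, key=[-1.0, 0.0], value=[1.0, 1.0])
evicted = cache.evict_lowest(query=[1.0, 0.0])
remaining = sorted(cache.entries)
```

In this toy run the entry at position 2 has the lowest hypothetical score against the query, so it is the one removed. A real system would operate on per-layer, per-head tensors and would likely use a different scoring rule.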