Patents by Inventor Tijmen Pieter Frederik BLANKEVOORT

Tijmen Pieter Frederik BLANKEVOORT has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250245494
    Abstract: Systems and techniques are described herein for quantizing a codebook used in the context of quantizing post-training parameters (e.g., vectors of weights) of a pre-trained model. For example, a device can perform rank reduction on a tensor of a codebook associated with parameters of a layer of a pre-trained machine learning model to generate a first tensor factor having a first shape and a second tensor factor having a second shape. The device can perform an optimization technique on the first tensor factor and the second tensor factor to minimize an output reconstruction error of the layer. The device can quantize the first tensor factor to generate a reduced size codebook.
    Type: Application
    Filed: August 30, 2024
    Publication date: July 31, 2025
    Inventors: Marinus Willem VAN BAALEN, Eric Wayne MAHURIN, Paul Nicholas WHATMOUGH, Andrey KUZMIN, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT
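As a rough sketch of the factor-then-quantize idea described above (all shapes, the rank, and the int8 target are illustrative assumptions; the patent's output-reconstruction optimization step is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 64))   # illustrative codebook: 256 codewords of dim 64

# Rank reduction: factor the codebook tensor into two factors via truncated SVD.
rank = 8
U, S, Vt = np.linalg.svd(codebook, full_matrices=False)
first_factor = U[:, :rank] * S[:rank]       # first shape: (256, 8)
second_factor = Vt[:rank, :]                # second shape: (8, 64)

# Quantize the first factor to int8, yielding a reduced-size codebook.
scale = np.abs(first_factor).max() / 127.0
q_first = np.clip(np.round(first_factor / scale), -127, 127).astype(np.int8)

reconstructed = (q_first.astype(np.float64) * scale) @ second_factor
rel_err = np.linalg.norm(codebook - reconstructed) / np.linalg.norm(codebook)
```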
  • Publication number: 20250124255
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for generating a response to an input query using a generative artificial intelligence model. The method generally includes receiving an input query for processing. Using a first generative artificial intelligence model, an embedding representation of the received input query is generated. The embedding representation generally includes an embedding of the received input query in a first dimensionality. The embedding representation is projected into a projected representation of the received input query. Generally, the projected representation comprises a representation in a second dimensionality. A response to the received input query is generated using a second generative artificial intelligence model and the projected representation, and the generated response is output.
    Type: Application
    Filed: October 13, 2023
    Publication date: April 17, 2025
    Inventors: Benjamin BERGNER, Andrii SKLIAR, Babak EHTESHAMI BEJNORDI, Yuki Markus ASANO, Tijmen Pieter Frederik BLANKEVOORT, Joseph Binamira SORIAGA
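The projection between the two models' embedding spaces can be sketched as a single learned linear map (the dimensions and random initialization here are illustrative stand-ins for the trained projector):

```python
import numpy as np

rng = np.random.default_rng(0)
d_first, d_second = 128, 512   # illustrative first and second dimensionalities

# Learned projection between the two models' embedding spaces
# (trained in practice; a random stand-in here).
proj = rng.standard_normal((d_first, d_second)) * 0.02

def project_query_embedding(query_embedding):
    # Embedding from the first generative model (first dimensionality),
    # projected into the second model's input space (second dimensionality).
    return query_embedding @ proj
```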
  • Patent number: 12271800
    Abstract: Various aspects provide methods for learning, such as continual learning, that support task-incremental learning using a multi-head classification architecture. Various aspects may enable conditional computing to support multi-head classification. Various aspects provide methods for learning, such as continual learning, that support class-incremental learning using a single-head classification architecture. Various aspects may enable conditional computing to support single-head classification by predicting the task associated with a given test input and selecting an associated classification head based at least in part on the task prediction.
    Type: Grant
    Filed: November 13, 2020
    Date of Patent: April 8, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Davide Abati, Babak Ehteshami Bejnordi, Jakub Mikolaj Tomczak, Tijmen Pieter Frederik Blankevoort
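A minimal sketch of the conditional single-head selection idea, assuming a nearest-prototype task predictor (the prototypes and heads below are random stand-ins for trained components):

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, n_tasks, n_classes = 16, 3, 5

# One classification head per task (multi-head architecture).
heads = rng.standard_normal((n_tasks, feat_dim, n_classes))
# Per-task prototypes used to predict which task an input belongs to
# (e.g. a mean backbone feature per task; random stand-ins here).
task_prototypes = rng.standard_normal((n_tasks, feat_dim))

def classify(features):
    # Conditional computation: predict the task, then run only the
    # classification head selected by that prediction.
    task = int(np.argmax(task_prototypes @ features))
    logits = features @ heads[task]
    return task, int(np.argmax(logits))
```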
  • Publication number: 20250111232
    Abstract: An apparatus has one or more memories and one or more processor(s) coupled to the memories. The processor(s) is configured to estimate a local curvature of a loss landscape of a neural network. The processor(s) is also configured to dynamically allocate parameters to be removed from the neural network based on the local curvature. The processor(s) is further configured to update remaining weights of the neural network based on the parameters to be removed.
    Type: Application
    Filed: September 28, 2023
    Publication date: April 3, 2025
    Inventors: Tycho VAN DER OUDERAA, Markus NAGEL, Marinus Willem VAN BAALEN, Tijmen Pieter Frederik BLANKEVOORT
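The allocation step can be sketched with an OBD-style saliency, assuming a diagonal curvature estimate (the patent's update of the remaining weights is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(100)
curvature = np.abs(rng.standard_normal(100)) + 1e-3  # diagonal loss-curvature estimate

# OBD-style saliency: estimated loss increase from removing each weight,
# given the local curvature of the loss landscape.
saliency = 0.5 * curvature * weights ** 2

# Dynamically allocate the removals to the lowest-saliency weights.
n_remove = 30
remove_idx = np.argsort(saliency)[:n_remove]
pruned = weights.copy()
pruned[remove_idx] = 0.0
```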
  • Publication number: 20250103882
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for efficiently adapting a machine learning model from a base task to a downstream task based on frozen matrices. An example method generally includes receiving an input for processing through a layer of a neural network. An output of the layer of the neural network is generated based on a first product, the first product being based on a first trainable scaling vector, a first frozen matrix, a second trainable scaling vector, a second frozen matrix, and the received input.
    Type: Application
    Filed: February 21, 2024
    Publication date: March 27, 2025
    Inventors: Dawid Jan KOPICZKO, Yuki Markus ASANO, Tijmen Pieter Frederik BLANKEVOORT
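A minimal sketch of such a layer, assuming an adapter form resembling the inventors' VeRA work (two frozen random matrices, two trainable scaling vectors); all names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4

W0 = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))        # first frozen (random) matrix
B = rng.standard_normal((d_out, r))       # second frozen (random) matrix
d_vec = np.full(r, 0.1)                   # first trainable scaling vector
b_vec = np.zeros(d_out)                   # second trainable scaling vector (zero-init)

def layer(x):
    # Only the two scaling vectors are trained; the matrices stay frozen.
    adapter = b_vec * (B @ (d_vec * (A @ x)))
    return W0 @ x + adapter
```

With the second scaling vector initialized to zero, the layer starts out identical to the frozen base layer.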
  • Publication number: 20250094781
    Abstract: Systems and techniques are described herein for training and using multitask machine learning models. For example, a computing device can obtain training data for a first task in a layer in a neural network; perform, based on a determination from a first gating mechanism, the shared function on shared features of the training data using at least one shared channel to generate a shared feature map; perform, based on the determination from the first gating mechanism, the first task-specific function on first task-specific features of the training data using at least one first task-specific channel to generate a first task-specific feature map; generate an output for the first task-specific branch based on performing the shared function on the shared features and performing the first task-specific function on the first task-specific features; and update at least one parameter of the first gating mechanism based on the output.
    Type: Application
    Filed: September 18, 2023
    Publication date: March 20, 2025
    Inventors: Babak EHTESHAMI BEJNORDI, Gaurav KUMAR, Amelie Marie Estelle ROYER, Eduardo ESTEVES, Tijmen Pieter Frederik BLANKEVOORT, Mohsen GHAFOORIAN
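A toy sketch of the gating decision, assuming a simple linear gate over the input (random stand-ins for the learned gating mechanism and channels):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6
W_shared = rng.standard_normal((dim, dim))   # shared channels
W_task = rng.standard_normal((dim, dim))     # first task-specific channels
gate_W = rng.standard_normal((2, dim))       # first gating mechanism (learned in practice)

def task_branch_output(x):
    # The gating mechanism decides, per input, whether to run the shared
    # channels, the task-specific channels, or both; the branch output
    # combines whichever feature maps were computed.
    gates = gate_W @ x > 0
    out = np.zeros(dim)
    if gates[0]:
        out = out + W_shared @ x   # shared feature map
    if gates[1]:
        out = out + W_task @ x     # task-specific feature map
    return out
```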
  • Publication number: 20250094768
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for training and inferencing using a multi-domain machine learning model. An example method generally includes extracting, using a first neural network block, a plurality of features associated with inputs in a multi-domain input data set. A confusion matrix is generated based on the extracted plurality of features. A plurality of clusters is identified from the confusion matrix. Each cluster in the plurality of clusters generally corresponds to one or more data domains in the multi-domain input data set. A first gating neural network is trained based on the multi-domain input data set and the identified plurality of clusters. A plurality of second neural network blocks is trained based on a division of the multi-domain input data set into data associated with each cluster of the plurality of clusters.
    Type: Application
    Filed: September 15, 2023
    Publication date: March 20, 2025
    Inventors: Andrii SKLIAR, Babak EHTESHAMI BEJNORDI, Amelie Marie Estelle ROYER, Tijmen Pieter Frederik BLANKEVOORT, Dushyant MEHTA
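The cluster-identification step can be sketched as connected components over high-confusion domain pairs (the threshold and the component heuristic are illustrative; the patent does not specify this particular clustering):

```python
import numpy as np

def cluster_domains(confusion, tau=0.2):
    # Domains the feature extractor confuses with each other are grouped
    # into one cluster: connected components over pairs whose symmetrized
    # confusion exceeds the threshold.
    n = confusion.shape[0]
    sim = (confusion + confusion.T) / 2.0
    adj = sim > tau
    labels = -np.ones(n, dtype=int)
    c = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack = [i]
        while stack:
            j = stack.pop()
            if labels[j] >= 0:
                continue
            labels[j] = c
            stack.extend(np.nonzero(adj[j])[0].tolist())
        c += 1
    return labels
```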
  • Patent number: 12242956
    Abstract: Various embodiments include methods and neural network computing devices implementing the methods for performing quantization in neural networks. Various embodiments may include equalizing ranges of weight tensors or output channel weights within a first layer of the neural network by scaling each of the output channel weights of the first layer by a corresponding scaling factor, and scaling each of a second adjacent layer's corresponding input channel weights by applying an inverse of the corresponding scaling factor to the input channel weights. The corresponding scaling factor may be determined using a black-box optimizer on a quantization error metric or based on heuristics, equalization of dynamic ranges, equalization of range extrema (minima or maxima), differential learning using straight through estimator (STE) methods and a local or global loss, or using an error metric for the quantization error and a black-box optimizer that minimizes the error metric with respect to the scaling.
    Type: Grant
    Filed: March 23, 2020
    Date of Patent: March 4, 2025
    Assignee: QUALCOMM Incorporated
    Inventors: Markus Nagel, Marinus Willem van Baalen, Tijmen Pieter Frederik Blankevoort
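The range-equalization case can be sketched for two ReLU-separated layers: with the scale s_i = sqrt(r1_i / r2_i), both layers end up with per-channel range sqrt(r1_i * r2_i) while the network function is preserved (sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))    # layer 1: 8 output channels
W2 = rng.standard_normal((6, 8))    # layer 2: matching 8 input channels

r1 = np.abs(W1).max(axis=1)         # per-output-channel range extrema, layer 1
r2 = np.abs(W2).max(axis=0)         # per-input-channel range extrema, layer 2

# Equalize dynamic ranges: scale each layer-1 output channel and apply the
# inverse scale to the corresponding layer-2 input channel. The function is
# preserved for positively homogeneous activations such as ReLU.
s = np.sqrt(r1 / r2)
W1_eq = W1 / s[:, None]
W2_eq = W2 * s[None, :]
```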
  • Publication number: 20250005452
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for mitigating weight oscillation during quantization-aware training. In one example, a method includes identifying oscillation of a parameter of a machine learning model during quantization-aware training of the machine learning model, and applying an oscillation mitigation procedure during the quantization-aware training of the machine learning model in response to identifying the oscillation, the oscillation mitigation procedure comprising at least one of oscillation dampening or parameter freezing.
    Type: Application
    Filed: January 24, 2023
    Publication date: January 2, 2025
    Inventors: Markus NAGEL, Marios FOURNARAKIS, Tijmen Pieter Frederik BLANKEVOORT, Yelysei BONDARENKO
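Identifying oscillation can be sketched by counting how often a weight's rounded integer value flips across training iterations (the flip-frequency threshold is an illustrative assumption):

```python
import numpy as np

def oscillating(int_weight_history, freq_threshold=0.3):
    # A weight oscillates when its rounded integer value keeps flipping
    # between adjacent grid points during quantization-aware training.
    # Rows of int_weight_history are training iterations.
    flips = np.diff(int_weight_history, axis=0) != 0
    return flips.mean(axis=0) > freq_threshold   # candidates to dampen or freeze
```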
  • Publication number: 20240386239
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for processing data using a transformer neural network. The method generally includes receiving an input for processing using a transformer neural network. An attention output is generated in the transformer neural network. Generally, the attention output may be generated such that outlier values for the attention output are attenuated in the transformer neural network. An output of the transformer neural network is generated based on the generated attention output.
    Type: Application
    Filed: October 6, 2023
    Publication date: November 21, 2024
    Inventors: Yelysei BONDARENKO, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT
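One published way to attenuate such outliers, from work by these inventors, is a "clipped softmax" that can emit exact 0/1 attention probabilities without driving logits to outlier magnitudes (the gamma/zeta values are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def clipped_softmax(x, gamma=-0.03, zeta=1.03, axis=-1):
    # Stretch softmax outputs slightly past [0, 1], then clip back.
    # Exact zeros and ones become reachable at moderate logit values,
    # attenuating the outliers a plain softmax would need.
    return np.clip((zeta - gamma) * softmax(x, axis=axis) + gamma, 0.0, 1.0)
```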
  • Patent number: 12131258
    Abstract: A method for compressing a deep neural network includes determining a pruning ratio for a channel and a mixed-precision quantization bit-width based on an operational budget of a device implementing the deep neural network. The method further includes quantizing a weight parameter of the deep neural network and/or an activation parameter of the deep neural network based on the quantization bit-width. The method also includes pruning the channel of the deep neural network based on the pruning ratio.
    Type: Grant
    Filed: September 23, 2020
    Date of Patent: October 29, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Yadong Lu, Ying Wang, Tijmen Pieter Frederik Blankevoort, Christos Louizos, Matthias Reisser, Jilei Hou
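The budget check behind choosing a pruning ratio and bit-width can be sketched with a deliberately simplified cost model (the patent's actual search procedure is not shown):

```python
def fits_budget(n_params, pruning_ratio, bit_width, budget_bytes):
    # Parameter memory after pruning a fraction of the channels and
    # quantizing the remaining weights to the chosen bit-width.
    kept = n_params * (1.0 - pruning_ratio)
    return kept * bit_width / 8.0 <= budget_bytes
```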
  • Publication number: 20240161487
    Abstract: Systems and techniques are described for adaptive mixed-resolution processing. According to some aspects, a device can divide an input image into first tokens having a first resolution and second tokens having a second resolution. The device can generate first token representations for token(s) from the first tokens corresponding to a first region of the input image and generate second token representations for token(s) from the second tokens corresponding to the first region of the input image. The device can process, using a neural network model, the first token representations and the second token representations to determine the first resolution or the second resolution as a scale for the first region of the input image. The device can process, using a transformer neural network model, the first region of the input image according to the scale for the first region.
    Type: Application
    Filed: September 29, 2023
    Publication date: May 16, 2024
    Inventors: Jakob DRACHMANN HAVTORN, Amelie Marie Estelle ROYER, Tijmen Pieter Frederik BLANKEVOORT, Babak EHTESHAMI BEJNORDI
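A toy stand-in for the per-region scale decision, using patch variance in place of the patent's neural network model (the sizes and threshold are illustrative):

```python
import numpy as np

def region_scale(patch, fine=4, coarse=8, tau=0.5):
    # A flat region keeps the coarse-resolution token; a detailed region
    # is represented at the finer token resolution instead.
    return fine if patch.std() > tau else coarse
```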
  • Publication number: 20230376272
    Abstract: A processor-implemented method for fast floating point simulations with learnable parameters includes receiving a single precision input. An integer quantization process is performed on the input. Each element of the input is scaled based on a scaling parameter to generate an m-bit floating point output, where m is an integer.
    Type: Application
    Filed: January 27, 2023
    Publication date: November 23, 2023
    Inventors: Marinus Willem VAN BAALEN, Jorn Wilhelmus Timotheus PETERS, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN
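A minimal sketch of simulating an m-bit float via integer quantization plus a per-element power-of-two scaling parameter (here the scale is computed directly rather than learned as in the patent):

```python
import numpy as np

def simulate_mbit_float(x, mantissa_bits=3):
    # Each element's scaling parameter snaps it to a power-of-two exponent,
    # then the mantissa is integer-quantized, simulating an m-bit float.
    x = np.asarray(x, dtype=np.float64)
    scale = 2.0 ** (np.floor(np.log2(np.abs(x) + 1e-30)) - mantissa_bits)
    return np.round(x / scale) * scale
```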
  • Publication number: 20230334324
    Abstract: A computing device may be configured to intelligently activate gating within a current layer of a neural network that includes two or more filters. The computing device may receive a layer-specific input data that is specific to the current layer of the neural network, generate statistics based on the received layer-specific input data; and use the generated statistics to assign a relevance score to each of the two or more filters. Each assigned relevance score may indicate the relevance of the corresponding filter to the received layer-specific input data. The computing device may determine an activation status of each of the two or more filters in the current layer based on the identified relevance and apply the received layer-specific input data to the activated filters in the two or more filters to generate an output activation for the current layer of the neural network.
    Type: Application
    Filed: June 20, 2023
    Publication date: October 19, 2023
    Inventors: Babak EHTESHAMI BEJNORDI, Tijmen Pieter Frederik BLANKEVOORT, Max WELLING
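A minimal sketch of relevance-scored filter gating, assuming channel means as the layer-input statistics and a top-k activation rule (the linear scorer is a random stand-in for a learned one):

```python
import numpy as np

rng = np.random.default_rng(0)
n_filters, in_ch = 4, 3
score_proj = rng.standard_normal((n_filters, in_ch))  # learned scorer (random stand-in)

def active_filters(x, k=2):
    # Statistics of the layer-specific input (channel means) give each
    # filter a relevance score; only the top-k filters are activated,
    # and only those would be applied to the input.
    stats = x.mean(axis=(1, 2))
    relevance = score_proj @ stats
    return sorted(np.argsort(relevance)[-k:].tolist())
```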
  • Publication number: 20230306233
    Abstract: A processor-implemented method includes bit shifting a binary representation of a neural network parameter. The neural network parameter has fewer bits, b, than the number of hardware bits, B, supported by hardware that processes the neural network parameter. The bit shifting effectively multiplies the neural network parameter by 2^(B-b). The method also includes dividing a quantization scale by 2^(B-b) to obtain an updated quantization scale. The method further includes quantizing the bit-shifted binary representation with the updated quantization scale to obtain a value for the neural network parameter.
    Type: Application
    Filed: January 30, 2023
    Publication date: September 28, 2023
    Inventors: Marinus Willem VAN BAALEN, Brian KAHNE, Eric Wayne MAHURIN, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN, Andrii SKLIAR, Markus NAGEL
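The bit-shift trick is easy to verify numerically: the represented real value is unchanged because the scale is divided by exactly the factor the integer was multiplied by (the concrete bit counts and scale below are illustrative):

```python
B_bits, b_bits = 16, 8     # hardware bits B vs parameter bits b
param = 57                 # b-bit integer representation of the parameter
scale = 0.05               # original quantization scale

shifted = param << (B_bits - b_bits)         # multiply by 2**(B - b)
new_scale = scale / 2 ** (B_bits - b_bits)   # divide the quantization scale
# The represented real value param * scale is unchanged.
```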
  • Publication number: 20230281510
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for machine learning. In one aspect, base model output data is generated, the generating including processing input data with at least a portion of a base model of a machine learning model architecture, and the base model output data is processed with a routing model of the machine learning model architecture in order to determine a selected expert model, of a plurality of expert models, with which to process the base model output data. Expert model output data is generated, where generating the expert model output data includes processing the base model output data with the selected expert model, and final output data from the machine learning model architecture is generated, where generating the final output data includes processing the base model output data and the expert model output data with an ensemble model of the machine learning model architecture.
    Type: Application
    Filed: January 13, 2023
    Publication date: September 7, 2023
    Inventors: Amelie Marie Estelle ROYER, Ilia KARMANOV, Andrii SKLIAR, Babak EHTESHAMI BEJNORDI, Tijmen Pieter Frederik BLANKEVOORT
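A minimal sketch of the base-model / routing-model / expert / ensemble pipeline (all weights are random stand-ins; a real router and experts would be trained):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_experts = 8, 3

base_W = rng.standard_normal((dim, dim))               # base model
router_W = rng.standard_normal((n_experts, dim))       # routing model
experts = [rng.standard_normal((dim, dim)) for _ in range(n_experts)]
ensemble_W = rng.standard_normal((dim, 2 * dim))       # ensemble model

def forward(x):
    base_out = base_W @ x                              # base model output data
    expert_idx = int(np.argmax(router_W @ base_out))   # router selects one expert
    expert_out = experts[expert_idx] @ base_out        # expert model output data
    # Ensemble model combines base and expert outputs into the final output.
    return ensemble_W @ np.concatenate([base_out, expert_out])
```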
  • Patent number: 11704571
    Abstract: A method for pruning weights of an artificial neural network based on a learned threshold includes determining a pruning threshold for pruning a first set of pre-trained weights of multiple pre-trained weights based on a function of a classification loss and a regularization loss. Weights are pruned from the first set of pre-trained weights when a first value of the weight is less than the pruning threshold. A second set of pre-trained weights of the multiple pre-trained weights is fine-tuned or adjusted in response to a second value of each pre-trained weight in the second set of pre-trained weights being greater than the pruning threshold.
    Type: Grant
    Filed: October 9, 2020
    Date of Patent: July 18, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Kambiz Azarian Yazdi, Tijmen Pieter Frederik Blankevoort, Jin Won Lee, Yash Sanjay Bhalgat
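The pruning step can be sketched as masking against the learned threshold (here a fixed value stands in for the threshold learned from the classification and regularization losses):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1000)   # pre-trained weights (random stand-ins)
threshold = 0.5                       # stand-in for the learned pruning threshold

mask = np.abs(weights) >= threshold      # weights below the threshold are pruned
pruned = np.where(mask, weights, 0.0)    # the survivors would then be fine-tuned
sparsity = 1.0 - mask.mean()
```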
  • Publication number: 20230139347
    Abstract: A processor-implemented method for providing per-embedding-group activation quantization includes receiving sequential data at a first layer of a transformer neural network. The sequential data is processed via the first layer of the transformer neural network to generate an activation tensor. The activation tensor is split into multiple groups of embeddings. Each of the embeddings groups has a different set of quantization parameters. Each of the embedding groups is quantized separately based on the corresponding quantization parameters of the different set of quantization parameters. The quantized embedding groups are multiplied with a set of weights to generate an output.
    Type: Application
    Filed: October 28, 2022
    Publication date: May 4, 2023
    Inventors: Yelysei BONDARENKO, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT
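A minimal sketch of per-embedding-group activation quantization, assuming a simple absmax scale as each group's quantization parameter (group count and bit-width are illustrative):

```python
import numpy as np

def quantize_per_group(activations, n_groups=4, bits=8):
    # Split the embedding axis into groups; each group gets its own
    # quantization parameters and is quantized separately.
    qmax = 2 ** (bits - 1) - 1
    out = []
    for g in np.split(activations, n_groups, axis=-1):
        scale = max(np.abs(g).max() / qmax, 1e-12)
        q = np.clip(np.round(g / scale), -qmax, qmax)
        out.append(q * scale)   # dequantized, ready to multiply with the weights
    return np.concatenate(out, axis=-1)
```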
  • Publication number: 20230090941
    Abstract: Certain aspects of the present disclosure provide techniques and apparatus for processing a video stream using a machine learning model. An example method generally includes generating a first group of tokens from a first frame of the video stream and a second group of tokens from a second frame of the video stream. A first set of tokens associated with features to be reused from the first frame and a second set of tokens associated with features to be computed from the second frame are identified based on a comparison of tokens from the first group of tokens to corresponding tokens in the second group of tokens. A feature output is generated for portions of the second frame corresponding to the second set of tokens. Features associated with the first set of tokens are combined with the generated feature output into a representation of the second frame.
    Type: Application
    Filed: September 20, 2022
    Publication date: March 23, 2023
    Inventors: Yawei LI, Bert MOONS, Tijmen Pieter Frederik BLANKEVOORT, Amirhossein HABIBIAN, Babak EHTESHAMI BEJNORDI
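The reuse decision can be sketched as a per-token change test between frames (the L2 threshold is an illustrative stand-in for the patent's token comparison):

```python
import numpy as np

def reuse_mask(tokens_prev, tokens_cur, tau=0.1):
    # Tokens whose content barely changed between frames keep their cached
    # features; only sufficiently changed tokens are recomputed.
    diff = np.linalg.norm(tokens_cur - tokens_prev, axis=-1)
    return diff < tau   # True = reuse the feature from the previous frame

def process_frame(tokens_prev, feats_prev, tokens_cur, compute_fn, tau=0.1):
    reuse = reuse_mask(tokens_prev, tokens_cur, tau)
    feats = feats_prev.copy()
    feats[~reuse] = compute_fn(tokens_cur[~reuse])   # compute only changed tokens
    return feats   # combined representation of the second frame
```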
  • Patent number: 11604987
    Abstract: Various embodiments include methods and neural network computing devices implementing the methods, for generating an approximation neural network. Various embodiments may include performing approximation operations on a weights tensor associated with a layer of a neural network to generate an approximation weights tensor, determining an expected output error of the layer in the neural network due to the approximation weights tensor, subtracting the expected output error from a bias parameter of the layer to determine an adjusted bias parameter and substituting the adjusted bias parameter for the bias parameter in the layer. Such operations may be performed for one or more layers in a neural network to produce an approximation version of the neural network for execution on a resource limited processor.
    Type: Grant
    Filed: March 23, 2020
    Date of Patent: March 14, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Marinus Willem Van Baalen, Tijmen Pieter Frederik Blankevoort, Markus Nagel
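The bias-adjustment step can be sketched directly: the expected output error of the approximated weights, (W_q - W) E[x], is subtracted from the bias (int8 rounding stands in for the approximation operation; E[x] is a random stand-in for calibration or batch-norm statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))    # weights tensor of one layer
bias = rng.standard_normal(16)       # bias parameter of that layer

# Approximation of the weights (here: simple int8 round-trip).
scale = np.abs(W).max() / 127.0
W_q = np.round(W / scale).clip(-127, 127) * scale

# Expected output error E[(W_q - W) x] = (W_q - W) E[x].
x_mean = rng.standard_normal(32)     # stand-in for E[x]
expected_err = (W_q - W) @ x_mean

bias_adj = bias - expected_err       # adjusted bias absorbs the expected error
```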