Patents by Inventor Marinus Willem VAN BAALEN

Marinus Willem VAN BAALEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230376272
    Abstract: A processor-implemented method for fast floating point simulations with learnable parameters includes receiving a single precision input. An integer quantization process is performed on the input. Each element of the input is scaled based on a scaling parameter to generate an m-bit floating point output, where m is an integer.
    Type: Application
    Filed: January 27, 2023
    Publication date: November 23, 2023
    Inventors: Marinus Willem VAN BAALEN, Jorn Wilhelmus Timotheus PETERS, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN
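The quantization step this abstract describes can be sketched in a few lines. This is an illustrative reading only, not the claimed implementation: the function name, the signed-integer range, and the fixed scale are my assumptions (the abstract's scaling parameter is learnable).

```python
import numpy as np

def simulate_quant(x, scale, num_bits=8):
    """Illustrative sketch: integer-quantize x, then rescale.

    Rounds x / scale to the nearest integer, clips to the signed
    num_bits range, and multiplies back by scale, so the output is
    floating point but takes only quantized values.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

out = simulate_quant(np.array([0.24, -0.26]), scale=0.1)
```

In the described method the scale would be a learnable parameter updated by gradient descent rather than a constant.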
  • Publication number: 20230306233
    Abstract: A processor-implemented method includes bit shifting a binary representation of a neural network parameter. The neural network parameter has fewer bits, b, than a number of hardware bits, B, supported by hardware that processes the neural network parameter. The bit shifting effectively multiplies the neural network parameter by 2^(B-b). The method also includes dividing a quantization scale by 2^(B-b) to obtain an updated quantization scale. The method further includes quantizing the bit shifted binary representation with the updated quantization scale to obtain a value for the neural network parameter.
    Type: Application
    Filed: January 30, 2023
    Publication date: September 28, 2023
    Inventors: Marinus Willem VAN BAALEN, Brian KAHNE, Eric Wayne MAHURIN, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN, Andrii SKLIAR, Markus NAGEL
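The identity behind this abstract is easy to check numerically: shifting the integer code left by B-b bits and dividing the scale by 2^(B-b) leaves the dequantized value unchanged. The sketch below is an illustrative reading (names and values are mine), not the claimed hardware implementation.

```python
def widen_param(q, scale, b, B):
    """Illustrative sketch: fit a b-bit integer code q into B hardware
    bits. The left shift multiplies q by 2**(B-b); dividing the
    quantization scale by the same factor cancels it, so the
    dequantized value q * scale is unchanged."""
    shift = B - b
    return q << shift, scale / (1 << shift)

q8, s8 = widen_param(5, scale=0.25, b=4, B=8)  # 4-bit code on 8-bit hardware
```

Because both factors are powers of two, the cancellation is exact in floating point.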
  • Publication number: 20230108248
    Abstract: A processor-implemented method includes retrieving, for a layer of a set of layers of an artificial neural network (ANN), a dense quantized matrix representing a codebook and a sparse quantized matrix representing linear coefficients. The dense quantized matrix and the sparse quantized matrix may be associated with a weight tensor of the layer. The processor-implemented method also includes determining, for the layer of the set of layers, the weight tensor based on a product of the dense quantized matrix and the sparse quantized matrix. The processor-implemented method further includes processing, at the layer, an input based on the weight tensor.
    Type: Application
    Filed: October 4, 2022
    Publication date: April 6, 2023
    Inventors: Andrey KUZMIN, Marinus Willem VAN BAALEN, Markus NAGEL, Arash BEHBOODI
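The weight reconstruction this abstract describes reduces to one matrix product. The sketch below is a minimal reading of the abstract; the shapes, the one-nonzero-per-column sparsity pattern, and the function name are illustrative assumptions, and quantization of the two matrices is omitted.

```python
import numpy as np

def reconstruct_weight_tensor(codebook, coeffs):
    """Illustrative sketch: the layer's weight tensor is the product of
    a small dense codebook matrix and a sparse matrix of linear
    coefficients."""
    return codebook @ coeffs

rng = np.random.default_rng(0)
codebook = rng.standard_normal((64, 8))                      # dense, small
coeffs = np.zeros((8, 128))
coeffs[rng.integers(0, 8, size=128), np.arange(128)] = 1.0   # sparse: one nonzero per column
W = reconstruct_weight_tensor(codebook, coeffs)              # 64 x 128 weight matrix
```

The layer would then process its input with W as its weight tensor, as the abstract's final step describes.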
  • Patent number: 11604987
    Abstract: Various embodiments include methods and neural network computing devices implementing the methods, for generating an approximation neural network. Various embodiments may include performing approximation operations on a weights tensor associated with a layer of a neural network to generate an approximation weights tensor, determining an expected output error of the layer in the neural network due to the approximation weights tensor, subtracting the expected output error from a bias parameter of the layer to determine an adjusted bias parameter and substituting the adjusted bias parameter for the bias parameter in the layer. Such operations may be performed for one or more layers in a neural network to produce an approximation version of the neural network for execution on a resource limited processor.
    Type: Grant
    Filed: March 23, 2020
    Date of Patent: March 14, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Marinus Willem Van Baalen, Tijmen Pieter Frederik Blankevoort, Markus Nagel
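For a linear layer, the bias-correction step in this abstract can be sketched directly: the expected output error from approximating W is (W_approx - W) @ E[x], and subtracting it from the bias removes the mean shift. The function name is mine, and how E[x] is estimated (e.g., from a small calibration set) is an assumption not taken from the abstract.

```python
import numpy as np

def correct_bias(W, W_approx, bias, expected_input):
    """Illustrative sketch of the bias-correction step: compute the
    expected output error caused by approximating W, then subtract it
    from the layer's bias so the mean output is preserved."""
    expected_error = (W_approx - W) @ expected_input
    return bias - expected_error
```

With the adjusted bias, the approximated layer's output at the expected input exactly matches the original layer's.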
  • Publication number: 20230076290
    Abstract: A method for quantizing a pre-trained neural network includes computing a loss on a training set of candidate weights of the neural network. A rounding parameter is assigned to each candidate weight. The rounding parameter is a binary random value or a multinomial value. A quantized weight value is computed based on the loss and the rounding parameter.
    Type: Application
    Filed: February 4, 2021
    Publication date: March 9, 2023
    Inventors: Rana Ali AMJAD, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT, Marinus Willem VAN BAALEN, Christos LOUIZOS
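The rounding-parameter idea can be sketched for the binary case: each weight carries a 0/1 parameter that chooses floor or floor + 1 instead of round-to-nearest. The loss-based optimization that selects these parameters is the substance of the method and is omitted here; the function name and values are illustrative.

```python
import numpy as np

def quantize_with_learned_rounding(w, scale, round_up):
    """Illustrative sketch: quantize each weight to either floor(w/scale)
    or floor(w/scale) + 1 on the quantization grid, as selected by a
    per-weight binary rounding parameter. In the described method the
    parameters are chosen by minimizing a loss on a training set."""
    return (np.floor(w / scale) + round_up) * scale

w = np.array([0.23, 0.27])
q = quantize_with_learned_rounding(w, scale=0.1, round_up=np.array([0, 1]))
```

Note the second weight rounds up here even though nearest-rounding would also round it up; in general the learned choice can differ from round-to-nearest when that lowers the task loss.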
  • Publication number: 20230058159
    Abstract: Various embodiments include methods and devices for joint mixed-precision quantization and structured pruning. Embodiments may include determining whether a plurality of gates of quantization and pruning gates are selected for combination, and in response to determining that the plurality of gates are selected for combination, iteratively for each successive gate of the plurality of gates selected for combination quantizing a residual error of a quantized tensor to a scale of a next bit-width producing a residual error quantized tensor in which the next bit-width increases for each successive iteration, and adding the quantized tensor and the residual error quantized tensor producing a next quantized tensor in which the next quantized tensor has the next bit-width, and in which the next quantized tensor is the quantized tensor for a successive iteration.
    Type: Application
    Filed: April 29, 2021
    Publication date: February 23, 2023
    Inventors: Marinus Willem VAN BAALEN, Christos LOUIZOS, Markus NAGEL, Tijmen Pieter Frederik BLANKEVOORT, Rana Ali AMJAD
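The residual-refinement step this abstract describes can be checked numerically: quantize at a low bit-width, quantize the residual error at the next bit-width's finer scale, and add the two tensors. The sketch below is an illustrative reading (function names, ranges, and the scale relation are my assumptions), and the gate-selection logic is omitted.

```python
import numpy as np

def fake_quant(x, scale, bits):
    """Round-to-nearest quantization onto a signed `bits`-wide grid."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def refine_with_residual(x, scale, low_bits, high_bits):
    """Illustrative sketch of the combination step: quantize x at the
    low bit-width, quantize the residual error at the scale of the
    higher bit-width, and add the two quantized tensors to obtain a
    higher-precision quantized tensor."""
    q = fake_quant(x, scale, low_bits)
    residual = x - q
    q_res = fake_quant(residual, scale / 2 ** (high_bits - low_bits), high_bits)
    return q + q_res
```

The refined tensor is never farther from the original values than the coarse one, which is what lets the method step up through bit-widths iteratively.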
  • Publication number: 20220245457
    Abstract: Various embodiments include methods and devices for neural network pruning. Embodiments may include receiving as an input a weight tensor for a neural network, increasing a level of sparsity of the weight tensor generating a sparse weight tensor, updating the neural network using the sparse weight tensor generating an updated weight tensor, decreasing a level of sparsity of the updated weight tensor generating a dense weight tensor, increasing the level of sparsity of the dense weight tensor generating a final sparse weight tensor, and using the neural network with the final sparse weight tensor to generate inferences. Some embodiments may include increasing a level of sparsity of a first sparse weight tensor generating a second sparse weight tensor, updating the neural network using the second sparse weight tensor generating a second updated weight tensor, and decreasing the level of sparsity of the second updated weight tensor.
    Type: Application
    Filed: November 23, 2021
    Publication date: August 4, 2022
    Inventors: Suraj SRINIVAS, Tijmen Pieter Frederik BLANKEVOORT, Andrey KUZMIN, Markus NAGEL, Marinus Willem VAN BAALEN, Andrii SKLIAR
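One sparsification step from this abstract can be sketched as magnitude pruning; the training updates and the partial re-densification between pruning steps are omitted. Magnitude-based selection is my assumption, as the abstract does not say how weights are chosen for removal.

```python
import numpy as np

def prune_smallest(w, sparsity):
    """Illustrative sketch of a single sparsification step: zero the
    given fraction of weights with the smallest magnitudes. The
    described procedure alternates such pruning with network updates
    and with partially re-densifying the tensor before pruning again."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

w = np.array([[0.1, -0.5], [0.8, -0.05]])
sparse_w = prune_smallest(w, sparsity=0.5)  # the two smallest entries become 0
```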
  • Publication number: 20200302299
    Abstract: Various embodiments include methods and neural network computing devices implementing the methods for performing quantization in neural networks. Various embodiments may include equalizing ranges of weight tensors or output channel weights within a first layer of the neural network by scaling each of the output channel weights of the first layer by a corresponding scaling factor, and scaling each of a second adjacent layer's corresponding input channel weights by applying an inverse of the corresponding scaling factor to the input channel weights. The corresponding scaling factor may be determined using a black-box optimizer on a quantization error metric or based on heuristics, equalization of dynamic ranges, equalization of range extrema (minima or maxima), differential learning using straight through estimator (STE) methods and a local or global loss, or using an error metric for the quantization error and a black-box optimizer that minimizes the error metric with respect to the scaling.
    Type: Application
    Filed: March 23, 2020
    Publication date: September 24, 2020
    Inventors: Markus NAGEL, Marinus Willem VAN BAALEN, Tijmen Pieter Frederik BLANKEVOORT
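The scaling identity at the heart of this abstract can be sketched for a pair of fully connected layers with a ReLU between them. The per-channel factors below use the range-equalization heuristic, one of several options the abstract lists; the function name, shapes, and the choice of absolute-maximum as the range are my assumptions.

```python
import numpy as np

def equalize_pair(W1, b1, W2):
    """Illustrative sketch of cross-layer equalization: scale output
    channel i of W1 (and its bias) by 1/s_i and input channel i of W2
    by s_i. Because ReLU is positively homogeneous, the network
    function is unchanged. Choosing s_i = sqrt(r1_i / r2_i) makes both
    layers' per-channel ranges equal to sqrt(r1_i * r2_i)."""
    r1 = np.max(np.abs(W1), axis=1)   # range of output channel i of W1
    r2 = np.max(np.abs(W2), axis=0)   # range of input channel i of W2
    s = np.sqrt(r1 / r2)
    return W1 / s[:, None], b1 / s, W2 * s[None, :]
```

After equalization the matched channel ranges of the two layers coincide, which is what makes per-tensor quantization less lossy.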
  • Publication number: 20200302298
    Abstract: Various embodiments include methods and neural network computing devices implementing the methods for generating an approximation neural network that corrects for errors due to approximation operations. Various embodiments may include performing approximation operations on a weights tensor associated with a layer of a neural network to generate an approximation weights tensor, determining an expected output error of the layer in the neural network due to the approximation weights tensor, subtracting the expected output error from a bias parameter of the layer to determine an adjusted bias parameter and substituting the adjusted bias parameter for the bias parameter in the layer. Such operations may be performed for all layers in a neural network to produce an approximation version of the neural network for execution on a resource limited processor.
    Type: Application
    Filed: March 23, 2020
    Publication date: September 24, 2020
    Inventors: Marinus Willem VAN BAALEN, Tijmen Pieter Frederik BLANKEVOORT, Markus NAGEL