Patents by Inventor Eric A. Sather

Eric A. Sather has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Training natural language processing network based on existing network

Patent number: 12675635

Abstract: Some embodiments provide a method for training a first NLP network based on a previously-trained second NLP network. The method propagates text inputs through the first network to generate a first set of output vectors and the second network to generate a second set of output vectors. Each first-set vector generated based on a text input has a corresponding second-set vector generated based on the same text input and each vector of the first and second sets has a same number of vector components. The method computes a value for a loss function that emphasizes a maximum disparity between components of first-set vectors and corresponding components of second-set vectors. The method trains the first network using the computed loss function value to minimize the maximum disparity between the first-set vector components and corresponding second-set vector components so that the first network produces outputs similar to outputs of the second network.

Type: Grant

Filed: January 2, 2024

Date of Patent: July 7, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Steven L. Teig, Eric A. Sather, Evgeny Sorkin
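The abstract for patent 12675635 above describes a loss that emphasizes the maximum disparity between corresponding vector components of the two networks. A minimal sketch of that idea follows. The function names, the use of absolute difference as the disparity, and the log-sum-exp smoothing with a `sharpness` parameter are all assumptions made for illustration; the abstract does not state the form of the loss.

```python
import math


def max_disparity(first_vecs, second_vecs):
    """Largest absolute component difference over all paired vectors."""
    return max(
        abs(a - b)
        for va, vb in zip(first_vecs, second_vecs)
        for a, b in zip(va, vb)
    )


def smooth_max_disparity(first_vecs, second_vecs, sharpness=10.0):
    """Differentiable stand-in for the maximum (assumed form: log-sum-exp).

    Larger `sharpness` pushes the value closer to the true maximum.
    """
    diffs = [
        abs(a - b)
        for va, vb in zip(first_vecs, second_vecs)
        for a, b in zip(va, vb)
    ]
    m = max(diffs)
    # Subtract the maximum before exponentiating for numerical stability.
    total = sum(math.exp(sharpness * (d - m)) for d in diffs)
    return m + math.log(total) / sharpness
```

The smoothed value is never smaller than the hard maximum and exceeds it by at most `log(n) / sharpness` for `n` components, which is why it can stand in for the maximum during gradient-based training.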
Selection of inputs for training machine-trained network

Patent number: 12596931

Abstract: Some embodiments provide a method for training a machine-trained network that includes multiple parameters. The method propagates a batch of input training items through the network to generate output values and compute values of a loss function for each of the input training items. The method uses the computed values of the loss function for the input training items to adjust the parameters of the network. The method computes a gradient of the loss function for each of the input training items. The method selects input training items for subsequent batches of input training items based on a ratio of the value of the loss function to the gradient of the loss function for each of the input training items.

Type: Grant

Filed: December 26, 2022

Date of Patent: April 7, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Steven L. Teig, Eric A. Sather, Andrew F. Siegel, Evgeny Sorkin

Selection of inputs for training machine-trained network

Patent number: 12579430

Abstract: Some embodiments provide a method for improving structural sparsity of a machine-trained (MT) network. The method receives a network having multiple layers. Each layer of a set of the layers includes multiple filters of weight values. The method replaces the filters of a particular layer of the network with (i) a first set of filters of weight values, (ii) a set of scale values for the first set of filters, and (iii) a second set of filters of weight values. Each scale value corresponds to a different one of the filters of the first set of filters. The method trains the network by applying constraints to bias at least a subset of the scale values towards zero. When a particular scale value falls below a threshold value, the particular scale value is set to zero.

Type: Grant

Filed: March 16, 2022

Date of Patent: March 17, 2026

Inventors: Eric A. Sather, Steven L. Teig
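The abstract for patent 12579430 above describes biasing per-filter scale values toward zero and zeroing any that fall below a threshold. The sketch below illustrates only that final step plus one possible bias term. The L1 penalty, the comparison on absolute value, and all function names are assumptions; the abstract does not specify them.

```python
def l1_penalty(scales, strength):
    """Assumed regularizer: an L1 term that pulls scale values toward zero."""
    return strength * sum(abs(s) for s in scales)


def prune_scales(scales, threshold):
    """Set to zero every scale whose magnitude is below the threshold."""
    return [0.0 if abs(s) < threshold else s for s in scales]


def active_filters(scales):
    """Indices of first-set filters that survive (nonzero scale)."""
    return [i for i, s in enumerate(scales) if s != 0.0]
```

A filter whose scale is zero contributes nothing to the layer output, so it can be dropped entirely, which is what makes the resulting sparsity structural rather than scattered.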
Accounting for compute time in training of network

Patent number: 12572798

Abstract: Some embodiments provide a method for training a machine-trained (MT) network. The method receives a network having multiple layers. Each layer of a set of the layers includes multiple weight values. The method trains the network by alternately (1) propagating inputs through the network to generate outputs and adjusting the weight values based on differences between the generated outputs and expected outputs and (2) identifying sets of the weight values for removal according to a set of constraints that accounts for (i) a total number of weight values and (ii) an amount of time required to execute the network on a particular type of integrated circuit.

Type: Grant

Filed: March 16, 2022

Date of Patent: March 10, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Eric A. Sather, Steven L. Teig

Probabilistic projection of network parameters

Patent number: 12536413

Abstract: Some embodiments provide a method for training a machine-trained (MT) network. The method receives a network comprising a plurality of parameters. The method trains the network by iteratively (i) propagating inputs through the network to generate outputs and adjusting the parameters based on differences between the generated outputs and expected outputs to minimize a loss function with respect to the parameters, (ii) probabilistically projecting the parameters to minimize the loss function with respect to a set of constraints on the weight values, the probabilistic projection treating the parameters as probability distributions, and updating a set of variables of the loss function based on the probability distributions.

Type: Grant

Filed: March 16, 2022

Date of Patent: January 27, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Eric A. Sather, Steven L. Teig

Circuit for executing stateful neural network

Patent number: 12462350

Abstract: Some embodiments provide a neural network inference circuit for executing a neural network that includes multiple nodes that use state data from previous executions of the neural network. The neural network inference circuit includes (i) a set of computation circuits configured to execute the nodes of the neural network and (ii) a set of memories configured to implement a set of one or more registers to store, while executing the neural network for a particular input, state data generated during at least two executions of the network for previous inputs. The state data is for use by the set of computation circuits when executing a set of the nodes of the neural network for the particular input.

Type: Grant

Filed: January 5, 2024

Date of Patent: November 4, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Andrew C. Mihal, Steven L. Teig, Eric A. Sather

Weighted selection of inputs for training machine-trained network

Patent number: 12367661

Abstract: Some embodiments provide a method for training a machine-trained network that includes multiple parameters. The method propagates a batch of input training items through the network to generate output values and compute values of a loss function for each of the input training items. The method computes a weight for each input training item based on the computed loss function values for each of the input training items. The method selects input training items with larger weights more often than input training items with smaller weights for subsequent batches of input training items.

Type: Grant

Filed: December 26, 2022

Date of Patent: July 22, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Steven L. Teig, Eric A. Sather, Andrew F. Siegel, Evgeny Sorkin
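The abstract for patent 12367661 above describes selecting training items with larger weights more often than items with smaller weights. One way this could look is sketched below. Taking the weight to be the normalized per-item loss is an assumption, as are the function names; the abstract only says the weight is based on the computed loss values.

```python
import random


def loss_weights(losses):
    """Assumed weighting: each item's share of the total loss.

    Requires non-negative losses with at least one positive value.
    """
    total = sum(losses)
    return [loss / total for loss in losses]


def sample_next_batch(items, losses, batch_size, rng):
    """Draw a batch with replacement, favoring items with larger weights."""
    return rng.choices(items, weights=loss_weights(losses), k=batch_size)
```

Passing a seeded `random.Random` instance as `rng` keeps the selection reproducible across runs.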
Training network with discrete weight values

Patent number: 12299555

Abstract: Some embodiments provide an electronic device that includes a set of processing units and a set of machine-readable media. The set of machine-readable media stores sets of instructions for applying a network of computation nodes to an input received by the device. The set of machine-readable media stores at least two sets of machine-trained parameters for configuring the network for different types of inputs. A first of the sets of parameters is used for applying the network to a first type of input and a second of the sets of parameters is used for applying the network to a second type of input.

Type: Grant

Filed: August 24, 2022

Date of Patent: May 13, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Steven L. Teig, Eric A. Sather

Using batches of training items for training a network

Patent number: 12248880

Abstract: Some embodiments provide a method for training a machine-trained (MT) network that processes inputs using network parameters. The method propagates a set of input training items through the MT network to generate a set of output values. The set of input training items comprises multiple training items for each of multiple categories. The method identifies multiple training item groupings in the set of input training items. Each grouping includes at least two training items in a first category and at least one training item in a second category. The method calculates a value of a loss function as a summation of individual loss functions for each of the identified training item groupings. The individual loss function for each particular training item grouping is based on the output values for the training items of the grouping. The method trains the network parameters using the calculated loss function value.

Type: Grant

Filed: August 27, 2023

Date of Patent: March 11, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Eric A. Sather, Steven L. Teig, Andrew C. Mihal

Executing replicated neural network layers on inference circuit

Publication number: 20250028945

Abstract: Some embodiments provide a method for executing a layer of a neural network, for a circuit that restricts a number of weight values used per layer. The method applies a first set of weights to a set of inputs to generate a first set of results. The first set of weights is restricted to a first set of allowed values. For each of one or more additional sets of weights, the method applies the respective additional set of weights to the same set of inputs to generate a respective additional set of results. The respective additional set of weights is restricted to a respective additional set of allowed values that is related to the first set of allowed values and the other additional sets of allowed values. The method generates outputs for the particular layer by combining the first set of results with each respective additional set of results.

Type: Application

Filed: May 17, 2024

Publication date: January 23, 2025

Inventors: Eric A. Sather, Steven L. Teig

Training sparse networks with discrete weight values

Patent number: 12175368

Abstract: Some embodiments provide a method for training a machine-trained (MT) network. The method propagates multiple inputs through the MT network to generate an output for each of the inputs. Each of the inputs is associated with an expected output, the MT network uses multiple network parameters to process the inputs, and each network parameter of a set of the network parameters is defined during training as a probability distribution across a discrete set of possible values for the network parameter. The method calculates a value of a loss function for the MT network that includes (i) a first term that measures network error based on the expected outputs compared to the generated outputs and (ii) a second term that penalizes divergence of the probability distribution for each network parameter in the set of network parameters from a predefined probability distribution for the network parameter.

Type: Grant

Filed: November 7, 2022

Date of Patent: December 24, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Steven L. Teig, Eric A. Sather

Training network to maximize true positive rate at low false positive rate

Patent number: 12165066

Abstract: Some embodiments provide a method for training a machine-trained (MT) network that processes input data using network parameters. The method maps a set of input instances to a set of output values by propagating the set of input instances through the MT network. The set of input instances includes input instances for each of multiple categories. For a particular input instance selected as an anchor instance, the method calculates a true positive rate (TPR) for the MT network as a function of a distance between the output value for the anchor instance and the output value for each input instance not in a same category as the anchor instance. The method calculates a loss function for the anchor instance that maximizes the TPR for the MT network at low false positive rate. The method trains the network parameters using the calculated loss function.

Type: Grant

Filed: March 14, 2018

Date of Patent: December 10, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Eric A. Sather, Steven L. Teig, Andrew C. Mihal

Optimizing global sparsity for neural network

Patent number: 12136039

Abstract: Some embodiments provide a method for training multiple parameters of a machine-trained (MT) network subject to a sparsity constraint that requires a threshold portion of the parameters to be equal to zero. A first set of the parameters subject to the sparsity constraint are grouped into groups of parameters. For each parameter of a second set of the parameters subject to the sparsity constraint, the method determines an accuracy penalty associated with the parameter being set to zero. For each group of parameters in the first set of parameters, the method determines a minimum accuracy penalty for each possible number of parameters in the group being set to zero. The method uses the determined accuracy penalties to set to the value zero at least the threshold portion of the plurality of parameters.

Type: Grant

Filed: July 7, 2020

Date of Patent: November 5, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Eric A. Sather, Steven L. Teig
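The abstract for patent 12136039 above mentions determining, for each group of parameters, a minimum accuracy penalty for each possible number of parameters set to zero. The sketch below shows one simple way such a table could be built. It assumes that per-parameter penalties within a group are independent and additive, which the abstract does not state; under that assumption the cheapest way to zero `k` parameters is to pick the `k` smallest penalties. The function name is hypothetical.

```python
from itertools import accumulate


def min_penalty_per_count(group_penalties):
    """Return a table where entry k is the smallest total penalty for
    zeroing exactly k parameters of the group (additive-penalty assumption).
    """
    # Entry 0 is the cost of zeroing nothing; the rest are running sums
    # of the penalties taken in ascending order.
    return [0.0] + list(accumulate(sorted(group_penalties)))
```

Tables like this, one per group, could then be compared across groups to decide where zeros are cheapest while still reaching the required overall fraction of zero-valued parameters.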
Optimizing loss function during training of network

Patent number: 12112254

Abstract: Some embodiments provide a method for training a machine-trained (MT) network. The method uses a set of training inputs to train parameters of the MT network according to an initial loss function. The method uses a set of validation inputs to compute an error measure for the MT network as trained by the first set of training inputs. The method modifies the loss function for subsequent training of the MT network based on the computed error measure. The method uses the set of training inputs to train the parameters of the MT network according to the modified loss function.

Type: Grant

Filed: February 3, 2020

Date of Patent: October 8, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Steven L. Teig, Eric A. Sather

Decomposition of ternary weight tensors

Patent number: 12061988

Abstract: Some embodiments provide a method for training parameters of a network. The method receives a network with layers of nodes. Each node of a set of the layers computes an output value based on a set of input values and a set of trained weight values. A first layer of the network includes a first number of filters. The method replaces the first layer with a second layer having a second number of filters that is less than the first number and a third layer, following the second layer, having the first number of filters. Each weight value in the filters of the second and third layers is restricted to a set of allowed quantized weight values. A total number of weight values in the filters of the second and third layers is less than a total number of weight values in the filters of the first layer.

Type: Grant

Filed: November 4, 2020

Date of Patent: August 13, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Eric A. Sather, Steven L. Teig
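The abstract for patent 12061988 above states that the two replacement layers together hold fewer weight values than the original layer. The arithmetic can be sketched as below under assumed shapes: square convolution kernels, and a third layer built from 1x1 filters over the second layer's outputs. These shapes, the function names, and the example sizes are illustrative assumptions only.

```python
def conv_weight_count(num_filters, kernel_size, in_channels):
    """Weights in a layer of square convolution filters (biases ignored)."""
    return num_filters * kernel_size * kernel_size * in_channels


def decomposed_weight_count(first_filters, reduced_filters, kernel_size, in_channels):
    """Weights in the assumed two-layer replacement.

    Second layer: `reduced_filters` filters with the original kernel size.
    Third layer: `first_filters` 1x1 filters over the second layer's outputs.
    """
    second = conv_weight_count(reduced_filters, kernel_size, in_channels)
    third = conv_weight_count(first_filters, 1, reduced_filters)
    return second + third
```

For a hypothetical layer of 64 filters of size 3x3 over 32 input channels, the original holds 18432 weights, while a replacement with 16 intermediate filters holds 4608 + 1024 = 5632.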
Decomposition of weight tensors in network with value quantization

Patent number: 12061981

Abstract: Some embodiments provide a method for training parameters of a network. The method receives a machine-trained (MT) network with multiple layers of computation nodes. Each computation node of a set of the layers computes an output value based on a set of input values and a set of trained weight values. A first layer of the MT network includes a first number of filters. The method replaces the first layer with (i) a second layer having a second number of filters that is less than the first number of filters and (ii) a third layer having the first number of filters. Output values of computation nodes of the second layer are quantized and the third layer uses the quantized output values of the second layer as input values.

Type: Grant

Filed: November 4, 2020

Date of Patent: August 13, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Eric A. Sather, Steven L. Teig

Batch normalization for replicated layers of neural network

Patent number: 12045725

Abstract: Some embodiments provide a method for training a network including layers that each include multiple nodes. The method identifies a set of related layers of the network. Each node in one of the related layers has corresponding nodes in each of the other related layers. Each set of corresponding nodes receives a same set of inputs and applies different sets of weights to the inputs to generate an output. The method identifies an element-wise addition layer including nodes that each add outputs of a different set of corresponding nodes from the related layers to generate a sum. The method uses a set of outputs generated by the nodes of each related layer to determine batch normalization parameters specific to each layer of the set of related layers. The method uses data generated by the element-wise addition layer to determine batch normalization parameters for the set of related layers.

Type: Grant

Filed: July 7, 2020

Date of Patent: July 23, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Eric A. Sather, Steven L. Teig

Quantizing Neural Networks Using Shifting and Scaling

Publication number: 20240193426

Abstract: Some embodiments of the invention provide a novel method for training a quantized machine-trained network. Some embodiments provide a method of scaling a feature map of a pre-trained floating-point neural network in order to match the range of output values provided by quantized activations in a quantized neural network. A quantization function is modified, in some embodiments, to be differentiable to fix the mismatch between the loss function computed in forward propagation and the loss gradient used in backward propagation. Variational information bottleneck, in some embodiments, is incorporated to train the network to be insensitive to multiplicative noise applied to each channel. In some embodiments, channels that finish training with large noise, for example, exceeding 100%, are pruned.

Type: Application

Filed: December 15, 2023

Publication date: June 13, 2024

Inventors: Eric A. Sather, Steven L. Teig

Training network with batches of input instances

Patent number: 11995537

Abstract: Some embodiments provide a method for training a machine-trained (MT) network that processes input data using network parameters. The method maps a set of input instances to a set of output values by propagating the set of input instances through the MT network. The set of input instances includes input instances for each of multiple categories. The method selects multiple input instances as anchor instances. For each anchor instance, the method computes a loss function as a comparison between the output value for the anchor instance and each output value for an input instance in a different category than the anchor. The method computes a total loss function for the MT network as a sum of the loss function computed for each anchor instance. The method trains the network parameters using the computed total loss function.

Type: Grant

Filed: March 14, 2018

Date of Patent: May 28, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Eric A. Sather, Steven L. Teig, Andrew C. Mihal
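The abstract for patent 11995537 above describes a per-anchor loss that compares the anchor's output with the outputs of instances in other categories, summed over all anchors. A minimal sketch follows. Scalar outputs, absolute difference as the distance, and a hinge with a `margin` parameter are assumptions chosen for illustration; the abstract does not specify the comparison.

```python
def anchor_loss(anchor_idx, outputs, labels, margin=1.0):
    """Assumed hinge loss: penalize other-category outputs that lie
    within `margin` of the anchor's output."""
    anchor_out = outputs[anchor_idx]
    anchor_label = labels[anchor_idx]
    return sum(
        max(0.0, margin - abs(anchor_out - out))
        for out, label in zip(outputs, labels)
        if label != anchor_label
    )


def total_loss(anchor_indices, outputs, labels, margin=1.0):
    """Sum of the per-anchor losses over the selected anchors."""
    return sum(anchor_loss(i, outputs, labels, margin) for i in anchor_indices)
```

Because every instance in a batch can serve as an anchor, a single forward pass over the batch yields many comparisons at once.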
Executing replicated neural network layers on inference circuit

Patent number: 11995533

Abstract: Some embodiments provide a method for executing a layer of a neural network, for a circuit that restricts a number of weight values used per layer. The method applies a first set of weights to a set of inputs to generate a first set of results. The first set of weights is restricted to a first set of allowed values. For each of one or more additional sets of weights, the method applies the respective additional set of weights to the same set of inputs to generate a respective additional set of results. The respective additional set of weights is restricted to a respective additional set of allowed values that is related to the first set of allowed values and the other additional sets of allowed values. The method generates outputs for the particular layer by combining the first set of results with each respective additional set of results.

Type: Grant

Filed: November 14, 2019

Date of Patent: May 28, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Eric A. Sather, Steven L. Teig
