Patents by Inventor Juan Felipe Perez Vallejo

Juan Felipe Perez Vallejo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240020534
    Abstract: A non-autoregressive transformer model is improved to maintain output quality while reducing the number of iterative applications of the model by training the parameters of a student model based on a teacher model. The teacher model is applied for several iterations to a masked output and the student model is applied for one iteration, so that the respective output token predictions for the masked positions can be compared and a loss propagated to the student. The loss may be based on token distributions rather than the specific output tokens alone, and may additionally consider hidden state losses. The teacher model may also be updated for use in further training based on the updated student model, for example, by updating its parameters as a moving average. (An illustrative sketch of this distillation step follows this entry.)
    Type: Application
    Filed: June 6, 2023
    Publication date: January 18, 2024
    Inventors: Juan Felipe Perez Vallejo, Maksims Volkovs, Sajad Norouzi, Rasa Hosseinzadeh
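
A minimal sketch of the distillation step described above, assuming PyTorch. The module, the masking scheme, the number of teacher iterations, and the EMA rate are illustrative assumptions, not the patented implementation: the teacher refines a masked output over several iterations, the student makes a single pass on the same masked input, the student is trained with a KL loss over the teacher's token distributions, and the teacher tracks the student as a moving average.

```python
# Illustrative sketch (not the patented implementation): a single-pass student
# distilled from an iteratively applied teacher, with an EMA teacher update.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID, DIM = 1000, 0, 64

class TinyNARDecoder(nn.Module):
    """Stand-in for a non-autoregressive transformer decoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.mix = nn.Linear(DIM, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                      # (batch, seq) -> (batch, seq, vocab)
        return self.head(torch.relu(self.mix(self.embed(tokens))))

student = TinyNARDecoder()
teacher = copy.deepcopy(student)                    # teacher starts as a copy of the student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

def distill_step(tokens, mask, teacher_iters=4, ema=0.999):
    """tokens: (batch, seq) targets; mask: bool (batch, seq) positions to predict."""
    masked = tokens.masked_fill(mask, MASK_ID)

    # Teacher: several refinement iterations over the masked output.
    with torch.no_grad():
        current = masked.clone()
        for _ in range(teacher_iters):
            logits_t = teacher(current)
            # Fill masked positions with the teacher's current predictions.
            current = torch.where(mask, logits_t.argmax(-1), tokens)
        teacher_dist = F.softmax(logits_t, dim=-1)

    # Student: a single iteration on the same masked input.
    logits_s = student(masked)

    # Loss on token *distributions* at masked positions, not just argmax tokens.
    loss = F.kl_div(F.log_softmax(logits_s, dim=-1)[mask],
                    teacher_dist[mask], reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Teacher tracks the updated student as an exponential moving average.
    with torch.no_grad():
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema).add_(p_s, alpha=1 - ema)
    return loss.item()

tokens = torch.randint(1, VOCAB, (2, 8))
mask = torch.rand(2, 8) < 0.5
print(distill_step(tokens, mask))
```
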
  • Publication number: 20230267367
    Abstract: A recommendation system generates item recommendations for a user based on the distance between a user embedding and item embeddings. To train the item and user embeddings, the recommendation system weights user-item pairs in the training data to focus on difficult items, based on the positive and negative items with respect to individual users in the training set. In training, the weight of each user-item pair in affecting the user and item embeddings may be determined based on the distance between the user embedding and the item embedding for that pair, as well as on the comparative distances for other items of the same type for that user and for user-item pairs of other users, which may regulate the distances across types and across the training batch. (An illustrative sketch of such distance-based weighting follows this entry.)
    Type: Application
    Filed: October 19, 2022
    Publication date: August 24, 2023
    Inventors: Maksims Volkovs, Zhaoyue Cheng, Juan Felipe Perez Vallejo, Jianing Sun, Zhaolin Gao
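
A minimal sketch of the kind of distance-based pair weighting described above, assuming PyTorch. The specific weighting scheme (a softmax over distances within each user's negatives and across positives in the batch), the temperature, and the margin are illustrative assumptions, not the patented formula.

```python
# Illustrative sketch (assumed weighting scheme, not the patented formula):
# user-item pairs are weighted by how difficult they are, judged from the
# pair's distance relative to the user's other items and to the whole batch.
import torch
import torch.nn.functional as F

def weighted_pair_loss(user_emb, pos_emb, neg_emb, tau=1.0):
    """
    user_emb: (B, D) user embeddings
    pos_emb:  (B, D) embeddings of positive items for each user
    neg_emb:  (B, N, D) embeddings of sampled negative items per user
    """
    d_pos = (user_emb - pos_emb).norm(dim=-1)                     # (B,)
    d_neg = (user_emb.unsqueeze(1) - neg_emb).norm(dim=-1)        # (B, N)

    # Per-user weights: closer (harder) negatives get more weight.
    w_neg = F.softmax(-d_neg / tau, dim=1)                        # (B, N)

    # Per-pair weights across the batch: positives that remain far from the
    # user (harder positives) get more weight.
    w_pos = F.softmax(d_pos / tau, dim=0) * d_pos.numel()         # (B,)

    hinge = F.relu(d_pos.unsqueeze(1) - d_neg + 1.0)              # margin of 1.0
    return (w_pos.unsqueeze(1) * w_neg * hinge).sum(dim=1).mean()

# Toy usage with random embeddings.
B, N, D = 4, 8, 16
loss = weighted_pair_loss(torch.randn(B, D), torch.randn(B, D), torch.randn(B, N, D))
print(loss.item())
```
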
  • Publication number: 20230252301
    Abstract: An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to the number of encoders or the number of decoders. (An illustrative sketch of this scaled initialization follows this entry.)
    Type: Application
    Filed: April 19, 2023
    Publication date: August 10, 2023
    Inventors: Maksims Volkovs, Xiao Shi Huang, Juan Felipe Perez Vallejo
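
A minimal sketch of the initialization idea described above, assuming PyTorch's nn.MultiheadAttention. The exact scaling factor (num_layers ** -0.5 here) is an illustrative stand-in for the "factor inverse to the number of encoders or decoders", not the patented constant.

```python
# Illustrative sketch (scaling exponent chosen for illustration, not the
# patented constant): after standard Xavier initialization, the value and
# output projection weights of each attention block are scaled down by a
# factor that shrinks as the number of encoder/decoder layers grows, so the
# model can be trained without normalization layers or learning-rate warmup.
import torch
import torch.nn as nn

def init_attention_block(attn: nn.MultiheadAttention, num_layers: int) -> None:
    scale = num_layers ** -0.5          # assumed form of the "inverse to depth" factor
    nn.init.xavier_uniform_(attn.in_proj_weight)
    nn.init.xavier_uniform_(attn.out_proj.weight)
    with torch.no_grad():
        d = attn.embed_dim
        attn.in_proj_weight[2 * d:].mul_(scale)   # rows holding the value matrix V
        attn.out_proj.weight.mul_(scale)          # output matrix W_O

# Toy usage: initialize the attention blocks of a 6-layer encoder.
encoder_attn_blocks = [nn.MultiheadAttention(embed_dim=64, num_heads=4) for _ in range(6)]
for block in encoder_attn_blocks:
    init_attention_block(block, num_layers=len(encoder_attn_blocks))
```
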
  • Patent number: 11663488
    Abstract: An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to the number of encoders or the number of decoders.
    Type: Grant
    Filed: February 5, 2021
    Date of Patent: May 30, 2023
    Assignee: THE TORONTO-DOMINION BANK
    Inventors: Maksims Volkovs, Xiao Shi Huang, Juan Felipe Perez Vallejo
  • Publication number: 20230119108
    Abstract: An autoencoder model includes an encoder portion and a decoder portion. The encoder encodes an input token sequence to an input sequence representation that is decoded by the decoder to generate an output token sequence. The autoencoder model may decode multiple output tokens in parallel, and the decoder may be applied iteratively, receiving the output estimate from a prior iteration to predict output tokens. To improve the positional representation and reduce positional errors and repetitive tokens, the autoencoder may include a trained layer for combining token embeddings with positional encodings. In addition, the model may be trained with a corrective loss based on its output predictions when it receives a masked input as the output estimate. (An illustrative sketch of such a combining layer follows this entry.)
    Type: Application
    Filed: October 18, 2022
    Publication date: April 20, 2023
    Inventors: Maksims Volkovs, Juan Felipe Perez Vallejo, Xiao Shi Huang
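
A minimal sketch of one way to realize a trained layer that combines token embeddings with positional encodings, assuming PyTorch; the concrete form of the layer (a linear projection of the concatenated embeddings) and the use of learned positional embeddings are illustrative assumptions, not the patented design.

```python
# Illustrative sketch (the concrete form of the combining layer is assumed):
# rather than simply adding positional encodings to token embeddings, a
# trained layer mixes the two, which the abstract describes as improving the
# positional representation of parallel-decoded tokens.
import torch
import torch.nn as nn

class LearnedPositionalCombiner(nn.Module):
    def __init__(self, vocab_size: int, max_len: int, dim: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_len, dim)          # learned positional encodings
        self.combine = nn.Linear(2 * dim, dim)         # trained combining layer

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(tokens.size(1), device=tokens.device)
        pos_emb = self.pos(positions).expand(tokens.size(0), -1, -1)
        return self.combine(torch.cat([self.tok(tokens), pos_emb], dim=-1))

# Toy usage on a batch of 2 sequences of 16 token ids.
combiner = LearnedPositionalCombiner(vocab_size=1000, max_len=128, dim=64)
out = combiner(torch.randint(0, 1000, (2, 16)))
print(out.shape)                                       # torch.Size([2, 16, 64])
```
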
  • Publication number: 20220300903
    Abstract: A computing device is configured to communicate with a central server in order to predict the likelihood of fraud in current transactions for a target claim. The computing device extracts, from information stored in the central server (relating to the target claim and to past transactions for past claims, including those marked as fraud), a plurality of distinct sets of features: text-based features derived from descriptions of communications between the requesting device and the endpoint device, graph-based features derived from a network of claims and policies connected through shared information, and tabular features derived from claim information and exposure details. The features are input into a machine learning model that generates a likelihood of fraud for the current transactions and triggers an action based on that likelihood (e.g., stopping subsequent transactions related to the target claim). (An illustrative sketch of combining the three feature sets follows this entry.)
    Type: Application
    Filed: March 19, 2021
    Publication date: September 22, 2022
    Inventors: Xiao Shi Huang, Sandra Aziz, Juan Felipe Perez Vallejo, Jean-Christophe Bouëtté, Jennifer Bouchard, Mathieu Jean Rémi Ravaut, Maksims Volkovs, Tomi Johan Poutanen, Joseph Pun, Ghaith Kazma, Olivier Gandouet
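
A minimal sketch of combining the three feature sets into a single fraud-likelihood model, assuming NumPy and scikit-learn. The feature extraction is stubbed with random data, and the classifier choice and decision threshold are illustrative assumptions, not the patented pipeline.

```python
# Illustrative sketch (features are stubbed; model and threshold are assumed):
# text-, graph-, and tabular-derived features for each claim are concatenated
# and fed to a classifier that scores the likelihood of fraud, which can then
# trigger an action such as stopping transactions related to the target claim.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n_claims = 200

# Stand-ins for the three distinct feature sets described in the abstract.
text_features = rng.normal(size=(n_claims, 32))     # from communication descriptions
graph_features = rng.normal(size=(n_claims, 16))    # from the claim/policy network
tabular_features = rng.normal(size=(n_claims, 8))   # from claim and exposure details
is_fraud = rng.integers(0, 2, size=n_claims)        # past claims marked as fraud

X = np.hstack([text_features, graph_features, tabular_features])
model = GradientBoostingClassifier().fit(X, is_fraud)

def score_target_claim(text_f, graph_f, tab_f, threshold=0.8):
    """Return the fraud likelihood and whether to stop related transactions."""
    x = np.hstack([text_f, graph_f, tab_f]).reshape(1, -1)
    likelihood = model.predict_proba(x)[0, 1]
    return likelihood, likelihood >= threshold

print(score_target_claim(text_features[0], graph_features[0], tabular_features[0]))
```
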
  • Publication number: 20220270155
    Abstract: A recommendation system generates recommendations for user-item pairs based on embeddings in hyperbolic space. Each user and item may be associated with a local hyperbolic embedding representing the user or item in hyperbolic space. The hyperbolic embedding may be modified with neighborhood information. Because hyperbolic space may have no closed form for combining neighbor information, the local embedding may be converted to a tangent space for neighborhood aggregation and converted back to hyperbolic space, yielding a neighborhood-aware embedding used in the recommendation score. (An illustrative sketch of this tangent-space aggregation follows this entry.)
    Type: Application
    Filed: February 17, 2022
    Publication date: August 25, 2022
    Inventors: Maksims Volkovs, Zhaoyue Cheng, Juan Felipe Perez Vallejo, Jianing Sun, Saba Zuberi
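
A minimal sketch of tangent-space aggregation for hyperbolic embeddings, assuming PyTorch. The Poincaré-ball model with unit curvature, the mean as the aggregation function, and scoring by negative hyperbolic distance are concrete choices made for illustration, not the patented formulation.

```python
# Illustrative sketch (Poincaré ball with unit curvature is an assumed choice):
# a local hyperbolic embedding is mapped to the tangent space at the origin,
# where neighbor information has a closed-form average, then mapped back to
# obtain a neighborhood-aware embedding; the recommendation score is based on
# hyperbolic distance between user and item embeddings.
import torch

def log0(x, eps=1e-7):
    """Logarithmic map at the origin of the Poincaré ball."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.atanh(norm.clamp(max=1 - eps)) * x / norm

def exp0(v, eps=1e-7):
    """Exponential map at the origin of the Poincaré ball."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(norm) * v / norm

def poincare_dist(x, y, eps=1e-7):
    sq = ((x - y) ** 2).sum(-1)
    denom = (1 - (x ** 2).sum(-1)) * (1 - (y ** 2).sum(-1))
    return torch.acosh(1 + 2 * sq / denom.clamp_min(eps))

def neighborhood_aware(local_emb, neighbor_embs):
    """local_emb: (D,) hyperbolic point; neighbor_embs: (N, D) hyperbolic points."""
    tangent = torch.cat([log0(local_emb).unsqueeze(0), log0(neighbor_embs)], dim=0)
    return exp0(tangent.mean(dim=0))           # aggregate in tangent space, map back

# Toy usage: score a user-item pair as negative hyperbolic distance.
user, item = 0.1 * torch.randn(8), 0.1 * torch.randn(8)
user_aware = neighborhood_aware(user, 0.1 * torch.randn(3, 8))
item_aware = neighborhood_aware(item, 0.1 * torch.randn(5, 8))
print((-poincare_dist(user_aware, item_aware)).item())
```
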
  • Publication number: 20210255862
    Abstract: An online system trains a transformer architecture by an initialization method which allows the transformer architecture to be trained without normalization layers or learning rate warmup, resulting in significant improvements in computational efficiency for transformer architectures. Specifically, an attention block included in an encoder or a decoder of the transformer architecture generates a set of attention representations by applying a key matrix to the input key, a query matrix to the input query, and a value matrix to the input value to generate an output, and applying an output matrix to the output to generate the set of attention representations. The initialization method may be performed by scaling the parameters of the value matrix and the output matrix with a factor that is inverse to the number of encoders or the number of decoders.
    Type: Application
    Filed: February 5, 2021
    Publication date: August 19, 2021
    Inventors: Maksims Volkovs, Xiao Shi Huang, Juan Felipe Perez Vallejo