Patents by Inventor Tomas Feith

Tomas Feith has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Contextual re-ranking based on cursor position for documentation recommender systems

Patent number: 12650994

Abstract: Here is dynamic and contextual ranking of reference documentation based on an interactively selected position in new source logic. A computer receives a vocabulary of lexical tokens, a sequence of references that contains a first reference to a first reference document before a second reference to a second reference document, respective subsets of the vocabulary that occur in the first and second reference documents, a new source logic that contains a sequence of lexical tokens, respective measurements of semantic distance between the new source logic and the first and second reference documents, and a selected position in the sequence of lexical tokens. Based on the selected position, the measurements of semantic distance are selectively increased. Based on that increasing the measurements of the semantic distance, a relative ordering of the first and second references is reversed to generate and display a reordered sequence of references.

Type: Grant

Filed: September 28, 2023

Date of Patent: June 9, 2026

Assignee: Oracle International Corporation

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Kristopher Leland Rice, Felix Schmidt
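
The re-ranking described in the abstract above can be sketched in a few lines. This is only one possible reading of the abstract, not the claimed method: the `Reference` type, the `window` and `penalty` parameters, and the rule "increase the distance of a document that shares no token with the text around the cursor" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    name: str
    vocab_subset: frozenset  # vocabulary tokens that occur in the document
    distance: float          # semantic distance to the new source logic


def rerank(references, tokens, cursor, window=2, penalty=0.5):
    """Return the references reordered by cursor-adjusted distance.

    Hypothetical rule: a document that shares no token with the window
    around the cursor has its distance increased by `penalty`.
    """
    lo = max(0, cursor - window)
    nearby = set(tokens[lo:cursor + window + 1])
    adjusted = []
    for ref in references:
        distance = ref.distance
        if not (ref.vocab_subset & nearby):
            distance += penalty  # selective increase based on the position
        adjusted.append((distance, ref))
    adjusted.sort(key=lambda pair: pair[0])  # smallest distance first
    return [ref for _, ref in adjusted]


tokens = ["df", "=", "pd", ".", "read_csv", "(", "path", ")"]
references = [
    Reference("merge guide", frozenset({"df", "merge"}), 0.30),
    Reference("read_csv guide", frozenset({"read_csv", "path"}), 0.40),
]
reordered = rerank(references, tokens, cursor=4)
print([ref.name for ref in reordered])  # prints ['read_csv guide', 'merge guide']
```

With the cursor on `read_csv`, the first reference shares no nearby token, so its distance rises from 0.30 to 0.80 and the original order is reversed.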

Graph path prediction and masked language modelling joint training algorithm for language models

Patent number: 12566596

Abstract: In an embodiment providing natural language processing (NLP), a computer generates a histogram that correctly represents a graph that represents a lexical text, and generates a token sequence encoder that is trainable and untrained. During training such as pretraining, the token sequence encoder infers an encoded sequence that incorrectly represents the lexical text, and the encoded sequence is dense and saves space. To increase the accuracy of the sequence encoder by learning, the token sequence encoder is adjusted based on, as discussed herein, an indirectly measured numeric difference between the encoded sequence that incorrectly represents the lexical text and the histogram that correctly represents the graph.

Type: Grant

Filed: August 18, 2023

Date of Patent: March 3, 2026

Assignee: Oracle International Corporation

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt

Next AST branch prediction and next token prediction joint pre-training task for code generative models

Patent number: 12547832

Abstract: During pretraining, a computer generates three trainable and untrained machine learning models that are a token sequence encoder, a token predictor, and a path predictor. A sequence of lexical tokens is generated that represents a lexical text in a training corpus. A graph is generated that represents the lexical text. In the graph, a next traversal path is selected that corresponds to a next lexical token that is adjacent to a sliding subsequence of the sequence of lexical tokens. From the subsequence, the token sequence encoder infers an encoded sequence that represents the subsequence. The path predictor and token predictor accept the encoded sequence as input for respective inferencing for which respective training losses are measured. Both training losses are combined into a combined loss that is used to increase the accuracy of the three machine learning models by, for example, backpropagation of the combined loss.

Type: Grant

Filed: December 22, 2023

Date of Patent: February 10, 2026

Assignee: Oracle International Corporation

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt
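
The combined loss mentioned in the abstract above can be illustrated with plain Python. This is a minimal sketch under stated assumptions: cross-entropy as the per-head loss, a weighted sum as the way the two losses are combined, and the `path_weight` parameter are not taken from the abstract, and the backpropagation step itself is omitted.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy of one prediction against a target index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def combined_loss(token_logits, next_token, path_logits, next_path,
                  path_weight=1.0):
    """Combine the token predictor loss and the path predictor loss.

    Both heads are assumed to score candidates from the same encoded
    subsequence; the weighted sum is an illustrative choice.
    """
    token_loss = cross_entropy(token_logits, next_token)
    path_loss = cross_entropy(path_logits, next_path)
    return token_loss + path_weight * path_loss


# Uniform scores over 4 candidate tokens and 2 candidate traversal paths.
loss = combined_loss([0.0, 0.0, 0.0, 0.0], 1, [0.0, 0.0], 0)
print(round(loss, 4))  # prints 2.0794, which is log(4) + log(2)
```

In a real training loop, a single scalar like this would be backpropagated so that the encoder receives gradient from both heads.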

NEXT AST BRANCH PREDICTION AND NEXT TOKEN PREDICTION JOINT PRE-TRAINING TASK FOR CODE GENERATIVE MODELS

Publication number: 20250209270

Abstract: During pretraining, a computer generates three trainable and untrained machine learning models that are a token sequence encoder, a token predictor, and a path predictor. A sequence of lexical tokens is generated that represents a lexical text in a training corpus. A graph is generated that represents the lexical text. In the graph, a next traversal path is selected that corresponds to a next lexical token that is adjacent to a sliding subsequence of the sequence of lexical tokens. From the subsequence, the token sequence encoder infers an encoded sequence that represents the subsequence. The path predictor and token predictor accept the encoded sequence as input for respective inferencing for which respective training losses are measured. Both training losses are combined into a combined loss that is used to increase the accuracy of the three machine learning models by, for example, backpropagation of the combined loss.

Type: Application

Filed: December 22, 2023

Publication date: June 26, 2025

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt

MULTI-DECODER CLASSIFICATION ARCHITECTURE FOR COARSE-GRAINED CATEGORIZED DATA

Publication number: 20250173549

Abstract: A pretraining computer generates a neural encoder and multiple partition decoders (PDs) for respective partitions of training inputs (TIs) in a training corpus. A training batch is generated that contains a mix of TIs from multiple partitions. For each TI in the batch, the neural encoder infers an encoding and, based on the partition of the TI, exactly one PD is used to decode the encoding, for which an individual loss is measured. The individual loss is combined into a batch loss that is based on the entire batch, and combined into a partition loss that is based on TIs only in the partition of the exactly one PD. After measuring losses for the batch, the batch loss is backpropagated into the neural encoder without backpropagating the batch loss into any PD. Into each PD is backpropagated a respective partition loss that is based on TIs only in the decoder's partition.

Type: Application

Filed: November 27, 2023

Publication date: May 29, 2025

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt
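
The loss bookkeeping in the abstract above (one decoder per input, one batch loss, one loss per partition) can be sketched as follows. The function name `route_losses`, the toy encoder and decoders, and the use of a mean to combine individual losses are assumptions for illustration; the abstract only says the losses are combined, and the gradient updates are not shown.

```python
from collections import defaultdict


def route_losses(batch, encode, decoders, loss_fn):
    """Compute the batch loss and the per-partition losses for one batch.

    `batch` holds (features, partition, target) triples. Each input is
    decoded by exactly one decoder, chosen by its partition.
    """
    individual = defaultdict(list)
    for features, partition, target in batch:
        encoding = encode(features)
        prediction = decoders[partition](encoding)
        individual[partition].append(loss_fn(prediction, target))
    all_losses = [loss for losses in individual.values() for loss in losses]
    batch_loss = sum(all_losses) / len(all_losses)  # intended for the encoder
    partition_losses = {                            # one per partition decoder
        partition: sum(losses) / len(losses)
        for partition, losses in individual.items()
    }
    return batch_loss, partition_losses


# Toy stand-ins for the neural encoder and the partition decoders.
encode = lambda x: 2.0 * x
decoders = {"a": lambda z: z + 1.0, "b": lambda z: z - 1.0}
squared_error = lambda prediction, target: (prediction - target) ** 2

batch = [(1.0, "a", 3.0), (2.0, "b", 1.0), (3.0, "a", 6.0)]
batch_loss, partition_losses = route_losses(batch, encode, decoders, squared_error)
print(round(batch_loss, 4), partition_losses)  # prints 1.6667 {'a': 0.5, 'b': 4.0}
```

Following the abstract, `batch_loss` would update only the encoder, while each entry of `partition_losses` would update only the decoder for that partition.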

PARTIAL GRAPH PATH PREDICTION AND NEXT TOKEN PREDICTION JOINT TRAINING ALGORITHM FOR GENERATIVE LANGUAGE MODELS

Publication number: 20250165852

Abstract: During pretraining, a computer generates three untrained machine learning models that are a token sequence encoder, a token predictor, and a decoder that infers a frequency distribution of graph traversal paths. A sequence of lexical tokens is generated that represents a lexical text in a training corpus. A graph is generated that represents the lexical text. In the graph, multiple traversal paths are selected that collectively represent a sliding subsequence of the sequence of lexical tokens. From the subsequence, the token sequence encoder infers an encoded sequence that represents the subsequence of the sequence of lexical tokens. The decoder and token predictor accept the encoded sequence as input for respective inferencing for which respective training losses are measured. Both training losses are combined into a combined loss that is used to increase the accuracy of the three machine learning models by, for example, backpropagation of the combined loss.

Type: Application

Filed: November 20, 2023

Publication date: May 22, 2025

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt

CONTEXTUAL RE-RANKING BASED ON CURSOR POSITION FOR DOCUMENTATION RECOMMENDER SYSTEMS

Publication number: 20250110961

Abstract: Here is dynamic and contextual ranking of reference documentation based on an interactively selected position in new source logic. A computer receives a vocabulary of lexical tokens, a sequence of references that contains a first reference to a first reference document before a second reference to a second reference document, respective subsets of the vocabulary that occur in the first and second reference documents, a new source logic that contains a sequence of lexical tokens, respective measurements of semantic distance between the new source logic and the first and second reference documents, and a selected position in the sequence of lexical tokens. Based on the selected position, the measurements of semantic distance are selectively increased. Based on that increasing the measurements of the semantic distance, a relative ordering of the first and second references is reversed to generate and display a reordered sequence of references.

Type: Application

Filed: September 28, 2023

Publication date: April 3, 2025

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Kristopher Leland Rice, Felix Schmidt

GRAPH PATH PREDICTION AND MASKED LANGUAGE MODELLING JOINT TRAINING ALGORITHM FOR LANGUAGE MODELS

Publication number: 20250060951

Abstract: In an embodiment providing natural language processing (NLP), a computer generates a histogram that correctly represents a graph that represents a lexical text, and generates a token sequence encoder that is trainable and untrained. During training such as pretraining, the token sequence encoder infers an encoded sequence that incorrectly represents the lexical text, and the encoded sequence is dense and saves space. To increase the accuracy of the sequence encoder by learning, the token sequence encoder is adjusted based on, as discussed herein, an indirectly measured numeric difference between the encoded sequence that incorrectly represents the lexical text and the histogram that correctly represents the graph.

Type: Application

Filed: August 18, 2023

Publication date: February 20, 2025

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt

APPROXIMATE CONFUSION MATRIX FOR MULTI-LABEL CLASSIFICATION

Publication number: 20250036934

Abstract: Herein is validation of a trained classifier based on novel and accelerated estimation of a confusion matrix. In an embodiment, a computer hosts a trained classifier that infers, from many objects, an inferred frequency of each class. An upscaled magnitude of each class is generated from the inferred frequency of the class. An integer of each class is generated from the upscaled magnitude of the class. Based on those integers of the classes and a target integer for each class, counts are generated of the objects that are true positives, false positives, and false negatives of the class. Based on those counts, an estimated total of true positives, false positives, false negatives are generated that characterizes fitness of the trained classifier. In an embodiment, those counts and totals are downscaled to be fractions from zero to one.

Type: Application

Filed: July 28, 2023

Publication date: January 30, 2025

Inventors: Tomas Feith, Arno Schneuwly, Saeid Allahdadian, Matteo Casserini, Felix Schmidt
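
The estimation steps in the abstract above can be sketched as follows. This is a speculative reading, not the claimed method: it assumes the inferred frequency is a fraction of the objects, that upscaling multiplies by the number of objects, that the integer is obtained by rounding, and that true positives, false positives, and false negatives are estimated by comparing that integer with the target count. The function name and dictionary layout are hypothetical.

```python
def approximate_confusion(inferred_freq, target_counts, num_objects):
    """Estimate per-class and total TP/FP/FN from class frequencies.

    Assumption: the overlap between predicted and target counts is
    treated as true positives, any surplus as false positives, and any
    shortfall as false negatives.
    """
    counts = {}
    for cls, freq in inferred_freq.items():
        upscaled = freq * num_objects   # upscaled magnitude of the class
        predicted = round(upscaled)     # integer of the class
        target = target_counts[cls]     # target integer of the class
        counts[cls] = {
            "tp": min(predicted, target),
            "fp": max(predicted - target, 0),
            "fn": max(target - predicted, 0),
        }
    totals = {
        key: sum(per_class[key] for per_class in counts.values())
        for key in ("tp", "fp", "fn")
    }
    return counts, totals


counts, totals = approximate_confusion(
    inferred_freq={"cat": 0.5, "dog": 0.3},
    target_counts={"cat": 4, "dog": 5},
    num_objects=10,
)
print(totals)  # prints {'tp': 7, 'fp': 1, 'fn': 2}

# Downscale the totals to fractions from zero to one.
fractions = {key: value / 10 for key, value in totals.items()}
```

Because only per-class counts are compared, no per-object comparison of predicted and target labels is needed, which is one way such an estimate could be faster than an exact confusion matrix.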