Patents by Inventor Felix Schmidt

Felix Schmidt has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Extraction from trees at scale

Patent number: 11620118

Abstract: Herein are machine learning (ML) feature processing and analytic techniques to detect anomalies in parse trees of logic statements, database queries, logic scripts, compilation units of general-purpose programming language, extensible markup language (XML), JavaScript object notation (JSON), and document object models (DOM). In an embodiment, a computer identifies an operational trace that contains multiple parse trees. Values of explicit features are generated from a single respective parse tree of the multiple parse trees of the operational trace. Values of implicit features are generated from more than one respective parse tree of the multiple parse trees of the operational trace. The explicit and implicit features are stored into a same feature vector. With the feature vector as input, an ML model detects whether or not the operational trace is anomalous, based on the explicit features of each parse tree of the operational trace and the implicit features of multiple parse trees of the operational trace.

Type: Grant

Filed: February 12, 2021

Date of Patent: April 4, 2023

Assignee: Oracle International Corporation

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal

NOVEL COMPOUNDS FOR THE DIAGNOSIS, TREATMENT AND PREVENTION OF DISEASES ASSOCIATED WITH THE AGGREGATION OF ALPHA-SYNUCLEIN

Publication number: 20230067910

Abstract: The present invention relates to compounds represented by general formula (Ia), (Ib), (IIa) or (IIb). These compounds are suitable for imaging alpha-synuclein and for diagnosing diseases which are associated with the aggregation of alpha-synuclein. The compounds are also useful for treating and preventing diseases which are associated with the aggregation of alpha-synuclein.

Type: Application

Filed: November 19, 2020

Publication date: March 2, 2023

Applicants: MODAG GMBH, MAX-PLANCK-GESELLSCHAFT ZUR FÖRDERUNG DER WISSENSCHAFTEN E.V.

Inventors: Armin GIESE, Felix SCHMIDT, Daniel WECKBECKER, Andrei LEONOV, Sergey RYAZANOV, Christian GRIESINGER, Bernd PICHLER, Kristina HERFERT, Andreas MAURER, Laura KÜBLER, Sabrina BUSS

Disk drive failure prediction with neural networks

Patent number: 11579951

Abstract: Techniques are described herein for predicting disk drive failure using a machine learning model. The framework involves receiving disk drive sensor attributes as training data, preprocessing the training data to select a set of enhanced feature sequences, and using the enhanced feature sequences to train a machine learning model to predict disk drive failures from disk drive sensor monitoring data. Prior to the training phase, the RNN LSTM model is tuned using a set of predefined hyper-parameters. The preprocessing, which is performed during the training and evaluation phase as well as later during the prediction phase, involves using predefined values for a set of parameters to generate the set of enhanced sequences from raw sensor readings. The enhanced feature sequences are generated to maintain a desired healthy/failed disk ratio, and only use samples leading up to a last-valid-time sample in order to honor a pre-specified heads-up-period alert requirement.

Type: Grant

Filed: September 27, 2018

Date of Patent: February 14, 2023

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Felix Schmidt, Arun Raghavan, Nipun Agarwal, Sam Idicula, Guang-Tong Zhou, Nitin Kunal

ANOMALY DETECTION PERFORMANCE ENHANCEMENT USING GRADIENT-BASED FEATURE IMPORTANCE

Publication number: 20230043993

Abstract: Herein are machine learning techniques that adjust reconstruction loss of a reconstructive model, such as a principal component analysis (PCA), based on importances of features. In an embodiment having a reconstructive model that more or less accurately reconstructs its input, a computer measures, for each feature, a respective importance that is based on the reconstructive model. For example, importance may be based on grading samples that the reconstructive model correctly or incorrectly inferenced. For each feature during production inferencing, a respective original loss from the reconstructive model measures a difference between a value of the feature in an input and a reconstructed value of the feature generated by the reconstructive model. For each feature, the respective importance of the feature is applied to the respective original loss to generate a respective weighted loss, which compensates for concept drift.

Type: Application

Filed: August 4, 2021

Publication date: February 9, 2023

Inventors: SAEID ALLAHDADIAN, YUTING SUN, NAVANEETH JAMADAGNI, FELIX SCHMIDT, MARIA VLACHOPOULOU

BALANCING FEATURE DISTRIBUTIONS USING AN IMPORTANCE FACTOR

Publication number: 20230024884

Abstract: Herein are machine learning techniques that adjust reconstruction loss of a reconstructive model such as an autoencoder based on importances of values of features. In an embodiment and before, during, or after training the reconstructive model that more or less accurately reconstructs its input, a computer measures, for each distinct value of each feature, a respective importance that is not based on the reconstructive model. For example, importance may be based solely on a training corpus. For each feature during or after training, a respective original loss from the reconstructive model measures a difference between a value of the feature in an input and a reconstructed value of the feature generated by the reconstructive model. For each feature, the respective importance of the input value of the feature is applied to the respective original loss to generate a respective weighted loss. The weighted losses of the features of the input are collectively detected as anomalous or non-anomalous.

Type: Application

Filed: July 20, 2021

Publication date: January 26, 2023

Inventors: MATTEO CASSERINI, SAEID ALLAHDADIAN, FELIX SCHMIDT, ANDREW BROWNSWORD

DATACENTER LEVEL UTILIZATION PREDICTION WITHOUT OPERATING SYSTEM INVOLVEMENT

Publication number: 20220351023

Abstract: Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter.

Type: Application

Filed: July 18, 2022

Publication date: November 3, 2022

Inventors: Pravin Shinde, Felix Schmidt, Onur Kocberber

SPARSE ENSEMBLING OF UNSUPERVISED MODELS

Publication number: 20220318684

Abstract: Techniques are provided for sparse ensembling of unsupervised machine learning models. In an embodiment, the proposed architecture is composed of multiple unsupervised machine learning models that each produce a score as output and a gating network that analyzes the inputs and outputs of the unsupervised machine learning models to select an optimal ensemble of unsupervised machine learning models. The gating network is trained to choose a minimal number of the multiple unsupervised machine learning models whose scores are combined to create a final score that matches or closely resembles a final score that is computed using all the scores of the multiple unsupervised machine learning models.

Type: Application

Filed: April 2, 2021

Publication date: October 6, 2022

Inventors: SAEID ALLAHDADIAN, AMIN SUZANI, MILOS VASIC, MATTEO CASSERINI, ANDREW BROWNSWORD, FELIX SCHMIDT, NIPUN AGARWAL

Malicious activity detection by cross-trace analysis and deep learning

Patent number: 11451565

Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.

Type: Grant

Filed: September 5, 2018

Date of Patent: September 20, 2022

Assignee: Oracle International Corporation

Inventors: Guang-Tong Zhou, Hossein Hajimirsadeghi, Andrew Brownsword, Stuart Wray, Craig Schelp, Rod Reddekopp, Felix Schmidt

Kernel subsampling for an accelerated tree similarity computation

Patent number: 11449517

Abstract: Approaches herein relate to machine learning for detection of anomalous logic syntax. Herein is acceleration for comparison of parse trees such as suspicious database queries. In an embodiment, a computer identifies subtrees in each of many trees. A respective subset of participating subtrees is selected in each tree. A respective root node of each participating subtree should directly have a child node that is a leaf and/or should have a degree that exceeds a branching threshold such as one. For each pairing of a respective first tree with a respective second tree, based on a count of subtree matches between the participating subset of subtrees in the first tree and the participating subset of subtrees in the second tree, a respective tree similarity score is calculated. A machine learning model inferences based on the tree similarity scores of the many trees. In an embodiment, each tree similarity score is a convolution kernel.

Type: Grant

Filed: December 22, 2020

Date of Patent: September 20, 2022

Assignee: Oracle International Corporation

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal
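The subtree selection and match counting described in the abstract of patent 11449517 above can be illustrated with a minimal sketch. It is not the patented implementation: the tuple-based tree representation `(label, children)`, the function names, and the choice to count matches as a product of multiplicities are all assumptions made for illustration.

```python
from collections import Counter

def canonical(node):
    """Serialize a subtree so that equal subtrees yield equal strings."""
    label, children = node
    return label + "(" + ",".join(canonical(c) for c in children) + ")"

def participating_subtrees(node, branching_threshold=1):
    """Collect subtrees whose root has a leaf child or a degree above the threshold."""
    label, children = node
    bag = Counter()
    has_leaf_child = any(not grandchildren for _, grandchildren in children)
    if has_leaf_child or len(children) > branching_threshold:
        bag[canonical(node)] += 1
    for child in children:
        bag.update(participating_subtrees(child, branching_threshold))
    return bag

def tree_similarity(first, second, branching_threshold=1):
    """Count subtree matches between the two participating subsets (assumed scoring)."""
    bag_first = participating_subtrees(first, branching_threshold)
    bag_second = participating_subtrees(second, branching_threshold)
    return sum(count * bag_second[s] for s, count in bag_first.items())

# Hypothetical parse trees of two database queries.
tree_a = ("select", [("col", []), ("from", [("table", [])])])
tree_b = ("select", [("col", []), ("from", [("table", [])]),
                     ("where", [("pred", [("col", [])])])])
print(tree_similarity(tree_a, tree_b))
```

Leaves and unary chains without a leaf child never participate, which is what shrinks the number of subtree comparisons per pair of trees in this sketch.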

MACHINE LEARNING-BASED DNS REQUEST STRING REPRESENTATION WITH HASH REPLACEMENT

Publication number: 20220294757

Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.

Type: Application

Filed: March 10, 2021

Publication date: September 15, 2022

Inventors: Renata Khasanova, Felix Schmidt, Stuart Wray, Craig Schelp, Nipun Agarwal, Matteo Casserini
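The hash-replacement preprocessing described in the abstract of publication 20220294757 above might look roughly like the following sketch. The abstract only says that hashes are long and relatively random strings; the detection rule used here (a minimum length plus a Shannon entropy cutoff), the thresholds, the `<HASH>` placeholder, and the function names are assumptions, not details from the application.

```python
import math
from collections import Counter

def shannon_entropy(token):
    """Entropy in bits per character of the token's character distribution."""
    counts = Counter(token)
    total = len(token)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def replace_hashes(request, min_len=16, min_entropy=3.0, placeholder="<HASH>"):
    """Tokenize a DNS request string on dots and mask tokens that look like hashes."""
    tokens = request.lower().split(".")
    return [
        placeholder
        if len(t) >= min_len and shannon_entropy(t) >= min_entropy
        else t
        for t in tokens
    ]

print(replace_hashes("d41d8cd98f00b204e9800998ecf8427e.cdn.example.com"))
```

Masking every hash with one shared token lets a downstream vectorizing model learn a single embedding for hashes as a group instead of one rare token per hash.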

MULTI-STAGE FEATURE EXTRACTION FOR EFFECTIVE ML-BASED ANOMALY DETECTION ON STRUCTURED LOG DATA

Publication number: 20220292304

Abstract: Herein are feature extraction mechanisms that receive parsed log messages as inputs and transform them into numerical feature vectors for machine learning models (MLMs). In an embodiment, a computer extracts fields from a log message. Each field specifies a name, a text value, and a type. For each field, a field transformer for the field is dynamically selected based on the field's name and/or the field's type. The field transformer converts the field's text value into a value of the field's type. A feature encoder for the value of the field's type is dynamically selected based on the field's type and/or a range of the field's values that occur in a training corpus of an MLM. From the feature encoder, an encoding of the value of the field's type is stored into a feature vector. Based on the MLM and the feature vector, the log message is detected as anomalous or not.

Type: Application

Filed: March 12, 2021

Publication date: September 15, 2022

Inventors: AMIN SUZANI, SAEID ALLAHDADIAN, MILOS VASIC, MATTEO CASSERINI, HAMED AHMADI, FELIX SCHMIDT, ANDREW BROWNSWORD, NIPUN AGARWAL
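The two-stage pipeline in the abstract of publication 20220292304 above (a field transformer chosen by type, then a feature encoder chosen by type and training-corpus range) can be sketched as below. Everything concrete here is assumed for illustration: the type names, min-max scaling for numeric fields, one-hot encoding for strings, and the `corpus_stats` structure are not taken from the application.

```python
# Stage 1: hypothetical field transformers, selected by the field's declared type.
TRANSFORMERS = {"int": int, "float": float, "string": str}

def encode_field(field, corpus_stats):
    """Convert one (name, text, type) field into a list of numeric features."""
    name, text, ftype = field
    value = TRANSFORMERS[ftype](text)
    stats = corpus_stats[name]
    if ftype in ("int", "float"):
        # Stage 2a: scale by the (min, max) range seen in the training corpus.
        lo, hi = stats
        return [(value - lo) / (hi - lo) if hi > lo else 0.0]
    # Stage 2b: one-hot encode against the vocabulary seen in the training corpus.
    return [1.0 if value == known else 0.0 for known in stats]

def encode_message(fields, corpus_stats):
    """Concatenate the encodings of all fields into one feature vector."""
    vector = []
    for field in fields:
        vector.extend(encode_field(field, corpus_stats))
    return vector

# Hypothetical parsed log message and training-corpus statistics.
fields = [("status", "404", "int"), ("method", "GET", "string")]
corpus_stats = {"status": (200, 500), "method": ["GET", "POST"]}
print(encode_message(fields, corpus_stats))
```

The resulting vector would then be passed to whichever anomaly detection model is in use; that step is omitted here.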

Datacenter level utilization prediction without operating system involvement

Patent number: 11443166

Abstract: Embodiments use a hierarchy of machine learning models to predict datacenter behavior at multiple hardware levels of a datacenter without accessing operating system generated hardware utilization information. The accuracy of higher-level models in the hierarchy of models is increased by including, as input to the higher-level models, hardware utilization predictions from lower-level models. The hierarchy of models includes: server utilization models and workload/OS prediction models that produce predictions at a server device-level of a datacenter; and also top-of-rack switch models and backbone switch models that produce predictions at higher levels of the datacenter. These models receive, as input, hardware utilization information from non-OS sources. Based on datacenter-level network utilization predictions from the hierarchy of models, the datacenter automatically configures its hardware to avoid any predicted over-utilization of hardware in the datacenter.

Type: Grant

Filed: October 29, 2018

Date of Patent: September 13, 2022

Assignee: Oracle International Corporation

Inventors: Pravin Shinde, Felix Schmidt, Onur Kocberber

RELATIONAL METHOD FOR TRANSFORMING UNSORTED SPARSE DICTIONARY ENCODINGS INTO UNSORTED-DENSE OR SORTED-DENSE DICTIONARY ENCODINGS

Publication number: 20220284005

Abstract: Unsorted sparse dictionary encodings are transformed into unsorted-dense or sorted-dense dictionary encodings. Sparse domain codes have large gaps between codes that are adjacent in order. Unlike sparse codes, dense codes have smaller gaps between adjacent codes; consecutive codes are dense codes that have no gaps between adjacent codes. The techniques described herein are relational approaches that may be used to generate sparse composite codes and sorted codes.

Type: Application

Filed: May 24, 2022

Publication date: September 8, 2022

Inventors: Pit Fender, Felix Schmidt, Benjamin Schlegel
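To make the sparse versus dense distinction in the abstract above concrete, here is a small in-memory sketch. The application describes relational approaches; this Python version only mimics the idea of ranking dictionary entries and is not the claimed method. The function names, the example dictionary, and the choice to derive unsorted-dense codes by ranking the old sparse codes are assumptions.

```python
def to_dense(dictionary, sorted_by_value):
    """Map each value to a consecutive code starting at 0.

    dictionary maps value -> sparse code. With sorted_by_value=True the new
    codes follow the order of the values (sorted-dense). Otherwise they follow
    the order of the old sparse codes, so they stay unsorted with respect to
    the values (unsorted-dense).
    """
    if sorted_by_value:
        ordered = sorted(dictionary)
    else:
        ordered = sorted(dictionary, key=lambda value: dictionary[value])
    return {value: dense for dense, value in enumerate(ordered)}

def recode(column, dictionary, dense):
    """Rewrite a column of sparse codes using the dense dictionary."""
    sparse_to_dense = {dictionary[v]: dense[v] for v in dictionary}
    return [sparse_to_dense[code] for code in column]

# Hypothetical sparse dictionary with large gaps between adjacent codes.
sparse = {"pear": 900, "apple": 17, "fig": 4000}
print(to_dense(sparse, sorted_by_value=False))
print(to_dense(sparse, sorted_by_value=True))
```

In the sorted-dense result, comparing two codes gives the same answer as comparing the underlying values, which is the property that makes sorted encodings useful for range predicates.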

Out of band server utilization estimation and server workload characterization for datacenter resource optimization and forecasting

Patent number: 11423327

Abstract: Techniques are described herein for estimating CPU, memory, and I/O utilization for a workload via out-of-band sensor readings using a machine learning model. The framework involves receiving sensor data associated with executing benchmark applications, obtaining ground truth utilization values for the benchmarks, preprocessing the training data to select a set of enhanced sequences, and using the enhanced sequences to train a random forest model to estimate CPU, memory, and I/O utilization given sensor monitoring data. Prior to the training phase, a machine learning model is trained using a set of predefined hyper-parameters. The trained models are used to generate estimations for CPU, memory, and I/O utilization values. The utilization values are used with workload context information to assess the deployment and generate one or more recommendations for machine types that will best serve the workload in terms of system utilization.

Type: Grant

Filed: October 10, 2018

Date of Patent: August 23, 2022

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Andrew Brownsword, Nipun Agarwal

EXTRACTION FROM TREES AT SCALE

Publication number: 20220261228

Abstract: Herein are machine learning (ML) feature processing and analytic techniques to detect anomalies in parse trees of logic statements, database queries, logic scripts, compilation units of general-purpose programming language, extensible markup language (XML), JavaScript object notation (JSON), and document object models (DOM). In an embodiment, a computer identifies an operational trace that contains multiple parse trees. Values of explicit features are generated from a single respective parse tree of the multiple parse trees of the operational trace. Values of implicit features are generated from more than one respective parse tree of the multiple parse trees of the operational trace. The explicit and implicit features are stored into a same feature vector. With the feature vector as input, an ML model detects whether or not the operational trace is anomalous, based on the explicit features of each parse tree of the operational trace and the implicit features of multiple parse trees of the operational trace.

Type: Application

Filed: February 12, 2021

Publication date: August 18, 2022

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal

Relational method for transforming unsorted sparse dictionary encodings into unsorted-dense or sorted-dense dictionary encodings

Patent number: 11379450

Abstract: Unsorted sparse dictionary encodings are transformed into unsorted-dense or sorted-dense dictionary encodings. Sparse domain codes have large gaps between codes that are adjacent in order. Unlike sparse codes, dense codes have smaller gaps between adjacent codes; consecutive codes are dense codes that have no gaps between adjacent codes. The techniques described herein are relational approaches that may be used to generate sparse composite codes and sorted codes.

Type: Grant

Filed: October 9, 2018

Date of Patent: July 5, 2022

Assignee: Oracle International Corporation

Inventors: Pit Fender, Felix Schmidt, Benjamin Schlegel

GENERALIZED PRODUCTION RULES - N-GRAM FEATURE EXTRACTION FROM ABSTRACT SYNTAX TREES (AST) FOR CODE VECTORIZATION

Publication number: 20220198294

Abstract: Herein is resource-constrained feature enrichment for analysis of parse trees such as suspicious database queries. In an embodiment, a computer receives a parse tree that contains many tree nodes. Each tree node is associated with a respective production rule that was used to generate the tree node. Extracted from the parse tree are many sequences of production rules having respective sequence lengths that satisfy a length constraint that accepts at least one fixed length that is greater than two. Each extracted sequence of production rules consists of respective production rules of a sequence of tree nodes in a respective directed tree path of the parse tree having a path length that satisfies that same length constraint. Based on the extracted sequences of production rules, a machine learning model generates an inference. In a bag of rules data structure, the extracted sequences of production rules are aggregated by distinct sequence and duplicates are counted.

Type: Application

Filed: December 23, 2020

Publication date: June 23, 2022

Inventors: ARNO SCHNEUWLY, NIKOLA MILOJKOVIC, FELIX SCHMIDT, NIPUN AGARWAL

KERNEL SUBSAMPLING FOR AN ACCELERATED TREE SIMILARITY COMPUTATION

Publication number: 20220197917

Abstract: Approaches herein relate to machine learning for detection of anomalous logic syntax. Herein is acceleration for comparison of parse trees such as suspicious database queries. In an embodiment, a computer identifies subtrees in each of many trees. A respective subset of participating subtrees is selected in each tree. A respective root node of each participating subtree should directly have a child node that is a leaf and/or should have a degree that exceeds a branching threshold such as one. For each pairing of a respective first tree with a respective second tree, based on a count of subtree matches between the participating subset of subtrees in the first tree and the participating subset of subtrees in the second tree, a respective tree similarity score is calculated. A machine learning model inferences based on the tree similarity scores of the many trees. In an embodiment, each tree similarity score is a convolution kernel.

Type: Application

Filed: December 22, 2020

Publication date: June 23, 2022

Inventors: ARNO SCHNEUWLY, NIKOLA MILOJKOVIC, FELIX SCHMIDT, NIPUN AGARWAL

AUTOMATICALLY CHANGE ANOMALY DETECTION THRESHOLD BASED ON PROBABILISTIC DISTRIBUTION OF ANOMALY SCORES

Publication number: 20220188694

Abstract: Approaches herein relate to model decay of an anomaly detector due to concept drift. Herein are machine learning techniques for dynamically self-tuning an anomaly score threshold. In an embodiment in a production environment, a computer receives an item in a stream of items. A machine learning (ML) model hosted by the computer infers by calculation an anomaly score for the item. Whether the item is anomalous or not is decided based on the anomaly score and an adaptive anomaly threshold that dynamically fluctuates. A moving standard deviation of anomaly scores is adjusted based on a moving average of anomaly scores. The moving average of anomaly scores is then adjusted based on the anomaly score. The adaptive anomaly threshold is then adjusted based on the moving average of anomaly scores and the moving standard deviation of anomaly scores.

Type: Application

Filed: December 15, 2020

Publication date: June 16, 2022

Inventors: Amin Suzani, Matteo Casserini, Milos Vasic, Saeid Allahdadian, Andrew Brownsword, Hamed Ahmadi, Felix Schmidt, Nipun Agarwal
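The update order in the abstract of publication 20220188694 above (standard deviation first, using the current moving average; then the moving average; then the threshold) can be sketched with exponentially weighted statistics. The exponential weighting, the smoothing factor `alpha`, the multiplier `k`, and the rule threshold = mean + k * standard deviation are assumptions chosen for illustration; the abstract does not specify them.

```python
import math

class AdaptiveThreshold:
    """Self-tuning anomaly threshold from a moving mean and moving deviation."""

    def __init__(self, alpha=0.1, k=3.0, mean=0.0, var=1.0):
        self.alpha = alpha  # assumed smoothing factor
        self.k = k          # assumed number of standard deviations
        self.mean = mean
        self.var = var

    @property
    def threshold(self):
        return self.mean + self.k * math.sqrt(self.var)

    def observe(self, score):
        """Classify the score against the current threshold, then adapt."""
        is_anomalous = score > self.threshold
        # 1. Adjust the moving variance using the moving average as it stands.
        deviation = score - self.mean
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        # 2. Then adjust the moving average with the new score.
        self.mean += self.alpha * deviation
        # 3. The threshold property reflects both updates on the next call.
        return is_anomalous

detector = AdaptiveThreshold()
print(detector.observe(1.0), detector.observe(10.0))
```

Because the threshold follows the score distribution, a gradual shift in scores caused by concept drift raises or lowers the cutoff instead of producing a burst of false alarms.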

COPING WITH FEATURE ERROR SUPPRESSION: A MECHANISM TO HANDLE THE CONCEPT DRIFT

Publication number: 20220188410

Abstract: Approaches herein relate to reconstructive models such as an autoencoder for anomaly detection. Herein are machine learning techniques that detect and suppress any feature that causes model decay by concept drift. In an embodiment in a production environment, a computer initializes an unsuppressed subset of features with a plurality of features that an already-trained reconstructive model can process. A respective reconstruction error of each feature of the unsuppressed subset of features is calculated. The computer detects that a respective moving average based on the reconstruction error of a particular feature of the unsuppressed subset of features exceeds a respective feature suppression threshold of the particular feature, which causes removal of the particular feature from the unsuppressed subset of features.

Type: Application

Filed: December 15, 2020

Publication date: June 16, 2022

Inventors: SAEID ALLAHDADIAN, ANDREW BROWNSWORD, MILOS VASIC, MATTEO CASSERINI, AMIN SUZANI, HAMED AHMADI, FELIX SCHMIDT, NIPUN AGARWAL
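The suppression loop described in the abstract of publication 20220188410 above can be sketched as follows. The reconstructive model itself is left out: the sketch takes an input and its reconstruction as plain dictionaries. The squared-error measure, the exponential moving average, the smoothing factor, and all names are assumptions for illustration.

```python
class FeatureSuppressor:
    """Drop features whose moving-average reconstruction error grows too large."""

    def __init__(self, thresholds, alpha=0.5):
        self.thresholds = dict(thresholds)   # per-feature suppression threshold
        self.alpha = alpha                   # assumed smoothing factor
        self.unsuppressed = set(thresholds)  # starts with every feature
        self.moving_avg = {name: 0.0 for name in thresholds}

    def update(self, original, reconstructed):
        """Update the error averages and return the still-unsuppressed features."""
        for name in list(self.unsuppressed):
            error = (original[name] - reconstructed[name]) ** 2
            avg = (1 - self.alpha) * self.moving_avg[name] + self.alpha * error
            self.moving_avg[name] = avg
            if avg > self.thresholds[name]:
                self.unsuppressed.discard(name)
        return set(self.unsuppressed)

# Hypothetical features; "bytes" drifts and is reconstructed poorly.
suppressor = FeatureSuppressor({"latency": 1.0, "bytes": 1.0})
print(suppressor.update({"latency": 1.0, "bytes": 5.0},
                        {"latency": 1.0, "bytes": 2.0}))
```

Once a feature is suppressed, its reconstruction error would no longer contribute to the anomaly score, so a drifting feature stops dominating the detector's output.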
