Patents by Inventor NIKOLA MILOJKOVIC

NIKOLA MILOJKOVIC has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

TRACE REPRESENTATION LEARNING

Publication number: 20230376743

Abstract: The present invention avoids overfitting in deep neural network (DNN) training by using multitask learning (MTL) and self-supervised learning (SSL) techniques when training a multi-branch DNN to encode a sequence. In an embodiment, a computer first trains the DNN to perform a first task. The DNN contains: a first encoder in a first branch, a second encoder in a second branch, and an interpreter layer that combines data from the first branch and the second branch. The DNN is then trained to perform a second task. After the first and second trainings, production encoding and inferencing occur. The first encoder encodes a sparse feature vector into a dense feature vector from which an inference is generated. In an embodiment, a sequence of log messages is encoded into an encoded trace. An anomaly detector infers whether the sequence is anomalous. In an embodiment, the log messages are database commands.

Type: Application

Filed: May 19, 2022

Publication date: November 23, 2023

Inventors: Marija Nikolic, Nikola Milojkovic, Arno Schneuwly, Matteo Casserini, Milos Vasic, Renata Khasanova, Felix Schmidt
ANOMALY SCORE NORMALISATION BASED ON EXTREME VALUE THEORY

Publication number: 20230368054

Abstract: The present invention relates to threshold estimation and calibration for anomaly detection. Herein are machine learning (ML) and extreme value theory (EVT) techniques for normalizing and thresholding anomaly scores without presuming a distribution of values. In an embodiment, a computer receives many unnormalized anomaly scores and, according to peak over threshold (POT), selects a highest subset of the unnormalized anomaly scores that exceed a tail threshold. Based on the highest subset of the unnormalized anomaly scores, parameters of a probability density function are trained according to EVT. After training and in a production environment, a normalized anomaly score is generated based on an unnormalized anomaly score and the trained parameters of the probability density function. Anomaly detection compares the normalized anomaly score to an optimized anomaly threshold.

Type: Application

Filed: May 16, 2022

Publication date: November 16, 2023

Inventors: Marija Nikolic, Matteo Casserini, Arno Schneuwly, Nikola Milojkovic, Milos Vasic, Renata Khasanova, Felix Schmidt
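
The peak-over-threshold flow in the abstract above can be illustrated with a minimal sketch. It is not the patented method. It assumes a generalized Pareto tail fitted by the method of moments, and it assumes that the normalized score is the estimated probability of observing a lower score. The names fit_tail, normalize, and tail_fraction are hypothetical.

```python
import bisect
import math


def fit_tail(scores, tail_fraction=0.1):
    """Fit generalized Pareto parameters to the peaks over a tail threshold.

    Assumption: method-of-moments estimation, chosen only for brevity.
    """
    ordered = sorted(scores)
    n = len(ordered)
    k = max(2, int(n * tail_fraction))      # size of the highest subset
    if n <= k:
        raise ValueError("need more scores than tail samples")
    threshold = ordered[n - k - 1]          # tail threshold u
    excesses = [s - threshold for s in ordered[n - k:]]
    mean = sum(excesses) / k
    var = sum((e - mean) ** 2 for e in excesses) / (k - 1)
    if mean <= 0 or var <= 0:
        raise ValueError("tail is degenerate; cannot estimate parameters")
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)             # xi
    scale = 0.5 * mean * (ratio + 1.0)      # sigma
    return {"ordered": ordered, "threshold": threshold,
            "tail_weight": k / n, "shape": shape, "scale": scale}


def normalize(score, model):
    """Map a raw score to [0, 1] as an estimated probability of a lower score."""
    if score <= model["threshold"]:
        # Below the tail: fall back to the empirical distribution.
        rank = bisect.bisect_right(model["ordered"], score)
        return rank / len(model["ordered"])
    excess = score - model["threshold"]
    shape, scale = model["shape"], model["scale"]
    if abs(shape) < 1e-9:
        survival = math.exp(-excess / scale)    # exponential limit of the tail
    else:
        base = 1.0 + shape * excess / scale
        # A non-positive base means the score lies beyond the fitted upper bound.
        survival = base ** (-1.0 / shape) if base > 0 else 0.0
    return 1.0 - model["tail_weight"] * survival
```

A detector could then flag a score when normalize(score, model) exceeds a chosen cutoff such as 0.99; that cutoff is illustrative and is not taken from the filing.
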
SUPER-FEATURES FOR EXPLAINABILITY WITH PERTURBATION-BASED APPROACHES

Publication number: 20230334343

Abstract: In an embodiment, a computer hosts a machine learning (ML) model that infers a particular inference for a particular tuple that is based on many features. The features are grouped into predefined super-features that each contain a disjoint (i.e. nonintersecting, mutually exclusive) subset of features. For each super-feature, the computer: a) randomly selects many permuted values from original values of the super-feature in original tuples, b) generates permuted tuples that are based on the particular tuple and a respective permuted value, and c) causes the ML model to infer a respective permuted inference for each permuted tuple. A surrogate model is trained based on the permuted inferences. For each super-feature, a respective importance of the super-feature is calculated based on the surrogate model. Super-feature importances may be used to rank super-features by influence and/or generate a local ML explainability (MLX) explanation.

Type: Application

Filed: April 13, 2022

Publication date: October 19, 2023

Inventors: Renata Khasanova, Nikola Milojkovic, Matteo Casserini, Felix Schmidt
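
Steps a) to c) of the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the filed method: tuples are plain Python tuples, super-features are lists of column indices, and in place of a trained surrogate model the sketch uses the absolute difference between the original inference and the mean permuted inference. All function and parameter names are hypothetical.

```python
import random


def super_feature_importances(model, particular, originals, super_features,
                              samples=20, seed=0):
    """Score each super-feature by how much permuting it moves the inference."""
    rng = random.Random(seed)
    baseline = model(particular)
    importances = {}
    for name, columns in super_features.items():
        permuted_inferences = []
        for _ in range(samples):
            # a) draw a permuted value from the original tuples
            donor = rng.choice(originals)
            # b) build a permuted tuple: swap the whole super-feature at once
            permuted = list(particular)
            for column in columns:
                permuted[column] = donor[column]
            # c) ask the model for the permuted inference
            permuted_inferences.append(model(tuple(permuted)))
        mean_permuted = sum(permuted_inferences) / samples
        # Simplified stand-in for the surrogate-based importance (assumption).
        importances[name] = abs(baseline - mean_permuted)
    return importances
```

Sorting the returned dictionary by value, for example with sorted(importances, key=importances.get, reverse=True), ranks super-features by influence on the particular inference.
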
Extraction from trees at scale

Patent number: 11620118

Abstract: Herein are machine learning (ML) feature processing and analytic techniques to detect anomalies in parse trees of logic statements, database queries, logic scripts, compilation units of general-purpose programming languages, extensible markup language (XML), JavaScript object notation (JSON), and document object models (DOM). In an embodiment, a computer identifies an operational trace that contains multiple parse trees. Values of explicit features are generated from a single respective parse tree of the multiple parse trees of the operational trace. Values of implicit features are generated from more than one respective parse tree of the multiple parse trees of the operational trace. The explicit and implicit features are stored into the same feature vector. With the feature vector as input, an ML model detects whether or not the operational trace is anomalous, based on the explicit features of each parse tree of the operational trace and the implicit features of multiple parse trees of the operational trace.

Type: Grant

Filed: February 12, 2021

Date of Patent: April 4, 2023

Assignee: Oracle International Corporation

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal
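
The split between explicit and implicit features in the abstract above can be made concrete with a small sketch. The abstract does not name the features, so every feature here is a hypothetical example: parse trees are (label, children) tuples, explicit features are per-tree size and depth aggregated over the trace, and implicit features compare root labels across trees.

```python
def node_count(tree):
    """Number of nodes in a (label, children) parse tree."""
    _, children = tree
    return 1 + sum(node_count(child) for child in children)


def depth(tree):
    """Number of nodes on the longest root-to-leaf path."""
    _, children = tree
    return 1 + max((depth(child) for child in children), default=0)


def trace_features(trace):
    """Build one feature vector for an operational trace (a list of parse trees)."""
    # Explicit features: each value is computed from a single parse tree,
    # then aggregated so the vector has a fixed length (assumption).
    sizes = [node_count(tree) for tree in trace]
    depths = [depth(tree) for tree in trace]
    explicit = [max(sizes), sum(sizes) / len(sizes), max(depths)]

    # Implicit features: each value needs more than one parse tree.
    roots = [label for label, _ in trace]
    distinct_roots = len(set(roots))
    root_switches = sum(1 for a, b in zip(roots, roots[1:]) if a != b)
    implicit = [distinct_roots, root_switches]

    # Both kinds of feature are stored in the same vector.
    return explicit + implicit
```

The resulting vector would be the input to whichever anomaly detection model is in use; the sketch stops at feature extraction.
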
Kernel subsampling for an accelerated tree similarity computation

Patent number: 11449517

Abstract: Approaches herein relate to machine learning for detection of anomalous logic syntax. Herein is acceleration for comparison of parse trees such as suspicious database queries. In an embodiment, a computer identifies subtrees in each of many trees. A respective subset of participating subtrees is selected in each tree. A respective root node of each participating subtree should directly have a child node that is a leaf and/or should have a degree that exceeds a branching threshold such as one. For each pairing of a respective first tree with a respective second tree, based on a count of subtree matches between the participating subset of subtrees in the first tree and the participating subset of subtrees in the second tree, a respective tree similarity score is calculated. A machine learning model generates an inference based on the tree similarity scores of the many trees. In an embodiment, each tree similarity score is a convolution kernel.

Type: Grant

Filed: December 22, 2020

Date of Patent: September 20, 2022

Assignee: Oracle International Corporation

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal
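
The subsampling rule in the abstract above can be sketched in a few lines. The sketch assumes trees are (label, children) tuples, treats "and/or" as a logical or, and counts a match for every pair of identical participating subtrees; those choices, and all function names, are assumptions rather than details from the patent.

```python
from collections import Counter


def subtrees(tree):
    """Yield every subtree (a node with all its descendants)."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)


def participates(tree, branching_threshold=1):
    """Keep a subtree if its root has a leaf child or branches enough."""
    _, children = tree
    has_leaf_child = any(not grandchildren for _, grandchildren in children)
    return has_leaf_child or len(children) > branching_threshold


def signature(tree):
    """Canonical string for a subtree, so equal subtrees compare equal."""
    label, children = tree
    return label + "(" + ",".join(signature(child) for child in children) + ")"


def similarity(tree_a, tree_b, branching_threshold=1):
    """Count matching pairs among the participating subtrees of two trees."""
    bag_a = Counter(signature(s) for s in subtrees(tree_a)
                    if participates(s, branching_threshold))
    bag_b = Counter(signature(s) for s in subtrees(tree_b)
                    if participates(s, branching_threshold))
    return sum(count * bag_b[sig] for sig, count in bag_a.items())
```

Leaves and single-child chains that end in no leaf are skipped, so fewer subtree pairs are compared than in a full subtree kernel. The pairwise scores could then feed a kernel-based model.
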
EXTRACTION FROM TREES AT SCALE

Publication number: 20220261228

Abstract: Herein are machine learning (ML) feature processing and analytic techniques to detect anomalies in parse trees of logic statements, database queries, logic scripts, compilation units of general-purpose programming languages, extensible markup language (XML), JavaScript object notation (JSON), and document object models (DOM). In an embodiment, a computer identifies an operational trace that contains multiple parse trees. Values of explicit features are generated from a single respective parse tree of the multiple parse trees of the operational trace. Values of implicit features are generated from more than one respective parse tree of the multiple parse trees of the operational trace. The explicit and implicit features are stored into the same feature vector. With the feature vector as input, an ML model detects whether or not the operational trace is anomalous, based on the explicit features of each parse tree of the operational trace and the implicit features of multiple parse trees of the operational trace.

Type: Application

Filed: February 12, 2021

Publication date: August 18, 2022

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal
KERNEL SUBSAMPLING FOR AN ACCELERATED TREE SIMILARITY COMPUTATION

Publication number: 20220197917

Abstract: Approaches herein relate to machine learning for detection of anomalous logic syntax. Herein is acceleration for comparison of parse trees such as suspicious database queries. In an embodiment, a computer identifies subtrees in each of many trees. A respective subset of participating subtrees is selected in each tree. A respective root node of each participating subtree should directly have a child node that is a leaf and/or should have a degree that exceeds a branching threshold such as one. For each pairing of a respective first tree with a respective second tree, based on a count of subtree matches between the participating subset of subtrees in the first tree and the participating subset of subtrees in the second tree, a respective tree similarity score is calculated. A machine learning model generates an inference based on the tree similarity scores of the many trees. In an embodiment, each tree similarity score is a convolution kernel.

Type: Application

Filed: December 22, 2020

Publication date: June 23, 2022

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal
GENERALIZED PRODUCTION RULES - N-GRAM FEATURE EXTRACTION FROM ABSTRACT SYNTAX TREES (AST) FOR CODE VECTORIZATION

Publication number: 20220198294

Abstract: Herein is resource-constrained feature enrichment for analysis of parse trees such as suspicious database queries. In an embodiment, a computer receives a parse tree that contains many tree nodes. Each tree node is associated with a respective production rule that was used to generate the tree node. Extracted from the parse tree are many sequences of production rules having respective sequence lengths that satisfy a length constraint that accepts at least one fixed length that is greater than two. Each extracted sequence of production rules consists of respective production rules of a sequence of tree nodes in a respective directed tree path of the parse tree having a path length that satisfies that same length constraint. Based on the extracted sequences of production rules, a machine learning model generates an inference. In a bag of rules data structure, the extracted sequences of production rules are aggregated by distinct sequence and duplicates are counted.

Type: Application

Filed: December 23, 2020

Publication date: June 23, 2022

Inventors: Arno Schneuwly, Nikola Milojkovic, Felix Schmidt, Nipun Agarwal
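
The n-gram extraction in the abstract above can be sketched as follows. The sketch assumes each tree node is a (production_rule, children) tuple, that the length constraint accepts the single fixed length n = 3, and that a directed path runs from an ancestor down to a descendant. The function name and the rule strings in any example input are hypothetical.

```python
from collections import Counter


def rule_ngrams(tree, n=3):
    """Count production-rule sequences along downward paths of exactly n nodes."""
    bag = Counter()  # the "bag of rules": distinct sequence -> duplicate count

    def walk(node, ancestors):
        rule, children = node
        # Keep only the last n rules on the path that ends at this node.
        path = (ancestors + (rule,))[-n:]
        if len(path) == n:
            bag[path] += 1
        for child in children:
            walk(child, path)

    walk(tree, ())
    return bag
```

Every node that sits at least n - 1 levels below the root ends exactly one extracted sequence, so the work grows linearly with the number of tree nodes. The counts in the bag can be turned into a sparse feature vector for a downstream model.
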