Patents by Inventor Andrey Finkelshtein

Andrey Finkelshtein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

OPTIMIZING CASCADE OF CLASSIFIERS SCHEMA USING GENETIC SEARCH

Publication number: 20230297848

Abstract: A system and a method for training and classification using an optimized classification schema using an ensemble of cascaded classifiers is disclosed. Each of the cascaded classifiers is characterized by a set of classifier parameters and the classifiers which are not the first in a cascade are associated with one or more thresholds used to determine when to execute them according to a confidence measure computed by a preceding cascaded classifier. The optimization comprises a genetic algorithm applied to a set of ensembles of classification and parameters and the set of scores, into a pool of ensembles and associated scores. The scores may be based on associated classification quality and cost.

Type: Application

Filed: March 21, 2022

Publication date: September 21, 2023

Inventors: Andrey Finkelshtein, Eitan Menahem, Yuval Margalit, Sarit Hollander
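
The abstract of publication 20230297848 above describes cascaded classifiers in which a later stage runs depending on thresholds applied to the confidence of the preceding stage. A minimal sketch of that control flow follows. The function name `run_cascade`, the stage signature, and the rule "run the next stage only when confidence is below the threshold" are assumptions made for illustration and are not taken from the application.

```python
from typing import Callable, List, Tuple

# Hypothetical stage type: maps a sample to (label, confidence in [0, 1]).
Stage = Callable[[float], Tuple[str, float]]

def run_cascade(sample: float, stages: List[Stage], thresholds: List[float]) -> Tuple[str, int]:
    """Run stages in order; return the final label and the number of stages executed.

    thresholds[i] gates stage i + 1: it runs only if the confidence reported
    by the previously executed stage is below thresholds[i] (assumed rule).
    """
    if len(thresholds) != len(stages) - 1:
        raise ValueError("need exactly one threshold per non-first stage")
    label, confidence = stages[0](sample)
    executed = 1
    for stage, threshold in zip(stages[1:], thresholds):
        if confidence >= threshold:
            break  # confident enough, skip the remaining (costlier) stages
        label, confidence = stage(sample)
        executed += 1
    return label, executed

# Toy stages: a cheap one that is unsure near 0.5, and a costly one that is always sure.
def cheap(x: float) -> Tuple[str, float]:
    return ("positive" if x > 0.5 else "negative", abs(x - 0.5) * 2)

def costly(x: float) -> Tuple[str, float]:
    return ("positive" if x > 0.4 else "negative", 1.0)
```

Calling `run_cascade(0.9, [cheap, costly], [0.6])` stops after the cheap stage, while a sample near 0.5 falls through to the costly stage.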

CLASSIFICATION OF MOUSE DYNAMICS DATA USING UNIFORM RESOURCE LOCATOR CATEGORY MAPPING

Publication number: 20230024397

Abstract: An example system includes a processor to receive mouse dynamics data of a session to be analyzed and a uniform resource locator (URL) category mapping. The processor can group the mouse dynamics data into a plurality of groups using the URL category mapping. The processor can separately extract features from each of the plurality of groups to generate a plurality of groups of features for the session. The processor can input the groups of features into a trained classification model. The processor can receive an output score from the trained classification model.

Type: Application

Filed: July 20, 2021

Publication date: January 26, 2023

Inventors: Anton Puzanov, Andrey Finkelshtein, Eitan Menahem
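
The grouping and per-group feature extraction described in the abstract of publication 20230024397 above might look roughly like the sketch below. The event layout (`url`, `t`, `x`, `y`), the category names, and the two toy features are hypothetical; the application does not specify them here, and the trained classification model is left out.

```python
from collections import defaultdict
from statistics import mean

def group_by_url_category(events, url_categories, default="other"):
    """Split a session's mouse events into groups keyed by the category of their URL."""
    groups = defaultdict(list)
    for event in events:
        groups[url_categories.get(event["url"], default)].append(event)
    return dict(groups)

def extract_features(group):
    """Toy features for one group: event count and mean cursor speed."""
    speeds = []
    for prev, cur in zip(group, group[1:]):
        dt = cur["t"] - prev["t"]
        if dt > 0:
            dist = ((cur["x"] - prev["x"]) ** 2 + (cur["y"] - prev["y"]) ** 2) ** 0.5
            speeds.append(dist / dt)
    return {"n_events": len(group), "mean_speed": mean(speeds) if speeds else 0.0}

def session_features(events, url_categories):
    """Extract features separately for each URL-category group of the session."""
    groups = group_by_url_category(events, url_categories)
    return {category: extract_features(group) for category, group in groups.items()}

# Hypothetical session and mapping.
events = [
    {"url": "/login", "t": 0, "x": 0, "y": 0},
    {"url": "/login", "t": 1, "x": 3, "y": 4},
    {"url": "/transfer", "t": 2, "x": 0, "y": 0},
]
url_categories = {"/login": "auth", "/transfer": "payment"}
features = session_features(events, url_categories)
```

The resulting dictionary of per-category feature groups is what would then be passed to a trained model.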

User flow graph analytics for cyber security

Patent number: 11563762

Abstract: A cyber security method including: obtaining user flow data associated with a browsing session at a website; constructing a directed graph representative of the browsing session; computing a set of features for the directed graph; and applying a machine learning classifier to the set of features, to classify the browsing session as legitimate or fraudulent.

Type: Grant

Filed: June 23, 2020

Date of Patent: January 24, 2023

Assignee: International Business Machines Corporation

Inventors: Yehonatan Bitton, Andrey Finkelshtein, Eitan Menahem
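
To make the steps in the abstract of patent 11563762 above concrete, here is a minimal sketch that builds a directed graph from an ordered list of visited pages and computes a few graph features. The page names and the chosen features are assumptions for illustration; the patent's actual feature set and classifier are not reproduced.

```python
from collections import Counter

def build_flow_graph(pages):
    """Return (nodes, edges) for a browsing session.

    nodes is the set of visited pages; edges maps (source, destination)
    to the number of times that transition occurred.
    """
    return set(pages), Counter(zip(pages, pages[1:]))

def graph_features(nodes, edges):
    """Compute a small, illustrative feature vector for the directed graph."""
    out_degree = Counter(src for src, _ in edges)
    return {
        "n_nodes": len(nodes),
        "n_edges": len(edges),
        "n_self_loops": sum(1 for src, dst in edges if src == dst),
        "max_out_degree": max(out_degree.values(), default=0),
        "n_transitions": sum(edges.values()),
    }

# Hypothetical session: the user bounces between login and account pages.
pages = ["home", "login", "account", "login", "account", "transfer"]
nodes, edges = build_flow_graph(pages)
flow_features = graph_features(nodes, edges)
```

A feature dictionary like `flow_features` is the kind of input a machine learning classifier could use to label the session as legitimate or fraudulent.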

Clustering web page addresses for website analysis

Patent number: 11455364

Abstract: A machine learning clustering process is trained. Web pages of a website are clustered. User flow data associated with a first browsing session at the website is obtained. The user flow data includes a plurality of web page identifiers (e.g., URLs). A web page record for each of the web page identifiers is generated. Each web page record includes words of the corresponding web page identifier. Clusters of web page identifiers previously output from the trained machine learning clustering process are received. For each of the web page records, a cluster of web page identifiers is identified by mapping the web page record to one of the clusters of web page identifiers using the machine learning clustering process. A directed graph representative of the first browsing session is constructed. One or more nodes of the directed graph are the identified clusters of web page identifiers.

Type: Grant

Filed: June 23, 2020

Date of Patent: September 27, 2022

Assignee: International Business Machines Corporation

Inventors: Andrey Finkelshtein, Noga Agmon, Eitan Menahem, Yehonatan Bitton
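
The abstract of patent 11455364 above generates a web page record containing the words of each web page identifier and maps it to a previously learned cluster. The sketch below shows one way to do the first step, and substitutes a simple Jaccard-overlap lookup for the trained machine learning clustering process, which is not described in enough detail here to reproduce. The cluster names, vocabularies, and example URL are hypothetical.

```python
import re
from urllib.parse import urlparse

def url_to_record(url):
    """Return the lowercase alphabetic words found in the path of a URL."""
    path = urlparse(url).path
    return [word.lower() for word in re.split(r"[^A-Za-z]+", path) if word]

def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def assign_cluster(record, cluster_vocab):
    """Pick the cluster whose vocabulary overlaps most with the record.

    Stand-in for mapping with a trained clustering model (assumption).
    """
    return max(cluster_vocab, key=lambda name: jaccard(record, cluster_vocab[name]))

# Hypothetical clusters, as if output earlier by a trained clustering process.
cluster_vocab = {
    "payments": {"transfer", "funds", "pay"},
    "profile": {"accounts", "settings"},
}
record = url_to_record("https://bank.example/accounts/1234/transfer-funds?x=1")
cluster = assign_cluster(record, cluster_vocab)
```

The identified clusters, rather than raw URLs, would then serve as nodes of the directed graph for the session.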

Automated detection of personal information in free text

Patent number: 11429790

Abstract: Automated detection of personal information in free text, which includes: automatically applying a named-entity recognition (NER) algorithm to a digital text document, to detect named entities appearing in the digital text document, wherein the named entities are selected from the group consisting of: at least one person-type entity, and at least one non-person-type entity; automatically detecting at least one relation between the named entities, by applying a parts-of-speech (POS) tagging algorithm and a dependency parsing algorithm to sentences of the digital text document which contain the detected named entities; automatically estimating whether the at least one relation between the named entities is indicative of personal information; and automatically issuing a notification of a result of the estimation.

Type: Grant

Filed: September 25, 2019

Date of Patent: August 30, 2022

Assignee: International Business Machines Corporation

Inventors: Andrey Finkelshtein, Bar Haim, Eitan Menahem

MONTE-CARLO ADVERSARIAL AUTOENCODER FOR MULTI-SOURCE DOMAIN ADAPTATION

Publication number: 20220261657

Abstract: Embodiments may include novel techniques for training and using an adversarial autoencoder for multi-source domain functions. For example, a method may comprise training an adversarial encoder comprising an encoder and a decoder by simultaneously training the encoder and the decoder, using data comprising a plurality of datasets, the data having labels based on an origin class and a dataset number, training the encoder to act as a generator to generate codewords based on the data for a generative adversarial network including the generator and a discriminator by training the generator to cause the discriminator to predict random labels for a plurality of data samples of each class and training the generator using the predicted random labels to generate codewords that relate to the origin class, and classifying new data samples using the trained adversarial encoder and generator, and the discriminator.

Type: Application

Filed: February 17, 2021

Publication date: August 18, 2022

Inventors: Anton Puzanov, Eitan Menahem, Andrey Finkelshtein, Noga Agmon

System and method for staged ensemble classification

Patent number: 11373063

Abstract: A method for training thresholds controlling data flow in a plurality of cascaded classifiers for classifying malicious software, comprising: in each of a plurality of iterations: computing a set of scores, each for one of a set of threshold sequences, each threshold sequence is a sequence of sets of classifier output thresholds, each set of classifier output thresholds used to control a flow of data from a first cascaded classifier of the plurality of cascaded classifiers to a second cascaded classifier of the plurality of cascaded classifiers, each score computed when classifying, using the respective threshold sequence, each of a plurality of software objects as one of a set of maliciousness classes; computing a set of new threshold sequences by applying a genetic algorithm to the set of threshold sequences and the set of scores; and using the set of new threshold sequences in a consecutive iteration.

Type: Grant

Filed: December 10, 2018

Date of Patent: June 28, 2022

Assignee: International Business Machines Corporation

Inventors: Andrey Finkelshtein, Oded Margalit, Eitan Menahem
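
The iterative loop in the abstract of patent 11373063 above (score each threshold sequence, apply a genetic algorithm, reuse the new sequences in the next iteration) can be sketched as follows. This is a generic genetic algorithm under stated assumptions: each threshold sequence is flattened to a list of floats in [0, 1], and selection, uniform crossover, and Gaussian mutation are common textbook choices rather than the operators claimed in the patent. The scoring function is a toy placeholder; in the patent, scores come from classifying software objects.

```python
import random

def evolve_thresholds(score_fn, n_thresholds, pop_size=20, generations=30,
                      mutation_scale=0.1, seed=0):
    """Search for a high-scoring threshold sequence with a simple genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_thresholds)] for _ in range(pop_size)]
    for _ in range(generations):
        # Score every threshold sequence and keep the better half as parents.
        ranked = sorted(population, key=score_fn, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each threshold from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clipped back into [0, 1].
            child = [min(1.0, max(0.0, t + rng.gauss(0, mutation_scale))) for t in child]
            children.append(child)
        # The new set of sequences is used in the next iteration.
        population = parents + children
    return max(population, key=score_fn)

# Toy score (assumption): best when every threshold equals 0.7.
def toy_score(thresholds):
    return -sum((t - 0.7) ** 2 for t in thresholds)

best = evolve_thresholds(toy_score, n_thresholds=2)
```

Because the parents survive each generation unchanged, the best score in the population never decreases from one iteration to the next.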

MACHINE LEARNING MODEL TRAINED USING FEATURES EXTRACTED FROM N-GRAMS OF MOUSE EVENT DATA

Publication number: 20220172102

Abstract: An example system includes a processor to receive mouse event data of a session. The processor is to split the mouse event data of the session into mouse event n-grams. The processor is to extract features from the mouse event n-grams. The processor is to send the extracted features to a trained machine learning model. The processor is to receive an output decision from the trained machine learning model.

Type: Application

Filed: November 30, 2020

Publication date: June 2, 2022

Inventors: Andrey Finkelshtein, Anton Puzanov, Noga Agmon, Eitan Menahem
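
The splitting and feature-extraction steps in the abstract of publication 20220172102 above could be sketched as below. The event layout `(t, x, y)`, the use of overlapping sliding windows as n-grams, and the three features are assumptions for illustration only; the trained model is omitted.

```python
def ngrams(events, n):
    """Return all runs of n consecutive mouse events (overlapping sliding windows)."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]

def ngram_features(gram):
    """Toy features for one n-gram of (t, x, y) events."""
    duration = gram[-1][0] - gram[0][0]
    # Total distance travelled by the cursor across the n-gram.
    path = sum(
        ((b[1] - a[1]) ** 2 + (b[2] - a[2]) ** 2) ** 0.5
        for a, b in zip(gram, gram[1:])
    )
    return {
        "duration": duration,
        "path_length": path,
        "mean_speed": path / duration if duration > 0 else 0.0,
    }

# Hypothetical session of (timestamp, x, y) mouse events.
events = [(0, 0, 0), (1, 3, 4), (2, 6, 8), (3, 6, 8)]
grams = ngrams(events, 3)
gram_features = [ngram_features(g) for g in grams]
```

Each entry of `gram_features` corresponds to one mouse event n-gram and would be sent on to the trained machine learning model.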

Identifying source datasets that fit a transfer learning process for a target domain

Patent number: 11308077

Abstract: A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.

Type: Grant

Filed: July 21, 2020

Date of Patent: April 19, 2022

Assignee: International Business Machines Corporation

Inventors: Bar Haim, Andrey Finkelshtein, Eitan Menahem, Noga Agmon
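
The pipeline in the abstract of patent 11308077 above (drop non-intersecting columns, compute at least two similarity scores per source-target pair, summarize them) is sketched below. The abstract does not define the scores, so the column and row similarity formulas here are placeholders chosen for simplicity, and the summary is a plain average. Tables are represented as dictionaries mapping column names to lists of numbers, which is also an assumption.

```python
from statistics import mean

def shared_columns(source, target):
    """Pre-process a source-target pair: keep only columns present in both tables."""
    common = sorted(set(source) & set(target))
    return {c: source[c] for c in common}, {c: target[c] for c in common}

def column_similarity(source, target):
    """Placeholder score: closeness of per-column means, averaged over columns."""
    return mean(1 / (1 + abs(mean(source[c]) - mean(target[c]))) for c in source)

def row_similarity(source, target):
    """Placeholder score: closeness of each target row to its nearest source row."""
    src_rows = list(zip(*source.values()))
    tgt_rows = list(zip(*target.values()))

    def nearest(row):
        return min(sum((a - b) ** 2 for a, b in zip(row, s)) ** 0.5 for s in src_rows)

    return mean(1 / (1 + nearest(r)) for r in tgt_rows)

def rank_sources(sources, target):
    """Summarize the scores per source and return source names, most similar first."""
    scores = {}
    for name, source in sources.items():
        s, t = shared_columns(source, target)
        scores[name] = mean([column_similarity(s, t), row_similarity(s, t)])
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical target and source datasets.
target = {"age": [30, 40], "income": [50, 60]}
sources = {
    "far": {"age": [70, 80], "income": [10, 20]},
    "near": {"age": [31, 41], "income": [50, 60], "extra": [1, 2]},
}
ranking = rank_sources(sources, target)
```

Note that this sketch raises `statistics.StatisticsError` when a pair shares no columns; a fuller version would handle that case explicitly.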

Detecting replay attacks using action windows

Patent number: 11303672

Abstract: An example system includes a processor to receive a current session and previous sessions associated with an account. The processor can split the current session and the previous sessions into action windows. The processor can calculate a window similarity score for each action window of the current session using a pair-wise comparison with action windows of each of the previous sessions. The processor can aggregate the window similarity scores to generate a replay likelihood score for the current session with respect to each of the previous sessions. The processor can classify the current session as a replay attack in response to detecting that a replay likelihood score of the current session exceeds a threshold.

Type: Grant

Filed: April 2, 2020

Date of Patent: April 12, 2022

Assignee: International Business Machines Corporation

Inventors: Andrey Finkelshtein, Itay Hazan
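
A minimal sketch of the flow in the abstract of patent 11303672 above: split sessions into action windows, score each current window pair-wise against the windows of a previous session, aggregate into a replay likelihood, and compare against a threshold. The non-overlapping windows, the position-match similarity, the max-then-mean aggregation, and the default threshold of 0.9 are all assumptions made for illustration.

```python
def action_windows(actions, size):
    """Split a session's ordered actions into non-overlapping windows of equal size."""
    return [tuple(actions[i:i + size]) for i in range(0, len(actions) - size + 1, size)]

def window_similarity(a, b):
    """Fraction of positions at which two equal-length windows hold the same action."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def replay_likelihood(current, previous, size=3):
    """Aggregate the best pair-wise window similarity of each current window."""
    cur_windows = action_windows(current, size)
    prev_windows = action_windows(previous, size)
    if not cur_windows or not prev_windows:
        return 0.0
    best = [max(window_similarity(c, p) for p in prev_windows) for c in cur_windows]
    return sum(best) / len(best)

def is_replay(current, previous_sessions, threshold=0.9, size=3):
    """Flag the session if its likelihood against any previous session exceeds the threshold."""
    return any(replay_likelihood(current, p, size) > threshold for p in previous_sessions)

# Hypothetical sessions for one account.
previous = ["login", "view", "transfer", "confirm", "logout", "close"]
replayed = list(previous)
fresh = ["login", "search", "view", "edit", "save", "logout"]
```

With these inputs, `is_replay(replayed, [previous])` is true, while `fresh` shares only one action position with `previous` and is not flagged.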

IDENTIFYING SOURCE DATASETS THAT FIT A TRANSFER LEARNING PROCESS FOR A TARGET DOMAIN

Publication number: 20220027339

Abstract: A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.

Type: Application

Filed: July 21, 2020

Publication date: January 27, 2022

Inventors: Bar Haim, Andrey Finkelshtein, Eitan Menahem, Noga Agmon

USER FLOW GRAPH ANALYTICS FOR CYBER SECURITY

Publication number: 20210400064

Abstract: A cyber security method including: obtaining user flow data associated with a browsing session at a website; constructing a directed graph representative of the browsing session; computing a set of features for the directed graph; and applying a machine learning classifier to the set of features, to classify the browsing session as legitimate or fraudulent.

Type: Application

Filed: June 23, 2020

Publication date: December 23, 2021

Inventors: Yehonatan Bitton, Andrey Finkelshtein, Eitan Menahem

CLUSTERING WEB PAGE ADDRESSES FOR WEBSITE ANALYSIS

Publication number: 20210397669

Abstract: A machine learning clustering process is trained. Web pages of a website are clustered. User flow data associated with a first browsing session at the website is obtained. The user flow data includes a plurality of web page identifiers (e.g., URLs). A web page record for each of the web page identifiers is generated. Each web page record includes words of the corresponding web page identifier. Clusters of web page identifiers previously output from the trained machine learning clustering process are received. For each of the web page records, a cluster of web page identifiers is identified by mapping the web page record to one of the clusters of web page identifiers using the machine learning clustering process. A directed graph representative of the first browsing session is constructed. One or more nodes of the directed graph are the identified clusters of web page identifiers.

Type: Application

Filed: June 23, 2020

Publication date: December 23, 2021

Inventors: Andrey Finkelshtein, Noga Agmon, Eitan Menahem, Yehonatan Bitton

DETECTING REPLAY ATTACKS USING ACTION WINDOWS

Publication number: 20210314350

Abstract: An example system includes a processor to receive a current session and previous sessions associated with an account. The processor can split the current session and the previous sessions into action windows. The processor can calculate a window similarity score for each action window of the current session using a pair-wise comparison with action windows of each of the previous sessions. The processor can aggregate the window similarity scores to generate a replay likelihood score for the current session with respect to each of the previous sessions. The processor can classify the current session as a replay attack in response to detecting that a replay likelihood score of the current session exceeds a threshold.

Type: Application

Filed: April 2, 2020

Publication date: October 7, 2021

Inventors: Andrey Finkelshtein, Itay Hazan

AUTOMATED DETECTION OF PERSONAL INFORMATION IN FREE TEXT

Publication number: 20210089620

Abstract: Automated detection of personal information in free text, which includes: automatically applying a named-entity recognition (NER) algorithm to a digital text document, to detect named entities appearing in the digital text document, wherein the named entities are selected from the group consisting of: at least one person-type entity, and at least one non-person-type entity; automatically detecting at least one relation between the named entities, by applying a parts-of-speech (POS) tagging algorithm and a dependency parsing algorithm to sentences of the digital text document which contain the detected named entities; automatically estimating whether the at least one relation between the named entities is indicative of personal information; and automatically issuing a notification of a result of the estimation.

Type: Application

Filed: September 25, 2019

Publication date: March 25, 2021

Inventors: Andrey Finkelshtein, Bar Haim, Eitan Menahem

Detecting malicious executable files by performing static analysis on executable files' overlay

Patent number: 10846403

Abstract: Embodiments of the present systems and methods may decide if a software file is malicious or benign, using properties of the file's overlay, if existing. For example, in an embodiment, a computer-implemented method for identifying malware in computer systems may comprise receiving a plurality of executable files labeled as being malicious or benign, training a machine learning model using properties extracted from overlays associated with each of the plurality of received labeled executable files, receiving an executable file that is not labeled, determining whether the received unlabeled executable file is malicious or benign using the trained machine learning model based on properties extracted from an overlay associated with the received unlabeled executable file, and transmitting information identifying the received unlabeled executable file as malicious when the received unlabeled executable file is determined to be malicious.

Type: Grant

Filed: May 15, 2018

Date of Patent: November 24, 2020

Assignee: International Business Machines Corporation

Inventors: Andrey Finkelshtein, Eitan Menahem

SYSTEM AND METHOD FOR STAGED ENSEMBLE CLASSIFICATION

Publication number: 20200184254

Abstract: A method for training thresholds controlling data flow in a plurality of cascaded classifiers for classifying malicious software, comprising: in each of a plurality of iterations: computing a set of scores, each for one of a set of threshold sequences, each threshold sequence is a sequence of sets of classifier output thresholds, each set of classifier output thresholds used to control a flow of data from a first cascaded classifier of the plurality of cascaded classifiers to a second cascaded classifier of the plurality of cascaded classifiers, each score computed when classifying, using the respective threshold sequence, each of a plurality of software objects as one of a set of maliciousness classes; computing a set of new threshold sequences by applying a genetic algorithm to the set of threshold sequences and the set of scores; and using the set of new threshold sequences in a consecutive iteration.

Type: Application

Filed: December 10, 2018

Publication date: June 11, 2020

Inventors: Andrey Finkelshtein, Oded Margalit, Eitan Menahem

DETECTING MALICIOUS EXECUTABLE FILES BY PERFORMING STATIC ANALYSIS ON EXECUTABLE FILES' OVERLAY

Publication number: 20190354682

Abstract: Embodiments of the present systems and methods may decide if a software file is malicious or benign, using properties of the file's overlay, if existing. For example, in an embodiment, a computer-implemented method for identifying malware in computer systems may comprise receiving a plurality of executable files labeled as being malicious or benign, training a machine learning model using properties extracted from overlays associated with each of the plurality of received labeled executable files, receiving an executable file that is not labeled, determining whether the received unlabeled executable file is malicious or benign using the trained machine learning model based on properties extracted from an overlay associated with the received unlabeled executable file, and transmitting information identifying the received unlabeled executable file as malicious when the received unlabeled executable file is determined to be malicious.

Type: Application

Filed: May 15, 2018

Publication date: November 21, 2019

Inventors: Andrey Finkelshtein, Eitan Menahem