Patents by Inventor Andrey Finkelshtein

Andrey Finkelshtein has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230297848
    Abstract: A system and a method for training and classification using an optimized classification schema using an ensemble of cascaded classifiers is disclosed. Each of the cascaded classifiers is characterized by a set of classifier parameters and the classifiers which are not the first in a cascade are associated with one or more thresholds used to determine when to execute them according to a confidence measure computed by a preceding cascaded classifier. The optimization comprises a genetic algorithm applied to a set of ensembles of classification and parameters and the set of scores, into a pool of ensembles and associated scores. The scores may be based on associated classification quality and cost.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 21, 2023
    Inventors: Andrey Finkelshtein, Eitan Menahem, Yuval Margalit, Sarit Hollander
  • Publication number: 20230024397
    Abstract: An example system includes a processor to receive mouse dynamics data of a session to be analyzed and a uniform resource locator (URL) category mapping. The processor can group the mouse dynamics data into a plurality of groups using the URL category mapping. The processor can separately extract features from each of the plurality of groups to generate a plurality of groups of features for the session. The processor can input the groups of features into a trained classification model. The processor can receive an output score from the trained classification model.
    Type: Application
    Filed: July 20, 2021
    Publication date: January 26, 2023
    Inventors: Anton Puzanov, Andrey Finkelshtein, Eitan Menahem
  • Patent number: 11563762
    Abstract: A cyber security method including: obtaining user flow data associated with a browsing session at a website; constructing a directed graph representative of the browsing session; computing a set of features for the directed graph; and applying a machine learning classifier to the set of features, to classify the browsing session as legitimate or fraudulent.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: January 24, 2023
    Assignee: International Business Machines Corporation
    Inventors: Yehonatan Bitton, Andrey Finkelshtein, Eitan Menahem
  • Patent number: 11455364
    Abstract: A machine learning clustering process is trained. Web pages of a website are clustered. User flow data associated with a first browsing session at the website is obtained. The user flow data includes a plurality of web page identifiers (e.g., URLs). A web page record for each of the web page identifiers is generated. Each web page record includes words of the corresponding web page identifier. Clusters of web page identifiers previously output from the trained machine learning clustering process are received. For each of the web page records, a cluster of web page identifiers is identified by mapping the web page record to one of the clusters of web page identifiers using the machine learning clustering process. A directed graph representative of the first browsing session is constructed. One or more nodes of the directed graph are the identified clusters of web page identifiers.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andrey Finkelshtein, Noga Agmon, Eitan Menahem, Yehonatan Bitton
  • Patent number: 11429790
    Abstract: Automated detection of personal information in free text, which includes: automatically applying a named-entity recognition (NER) algorithm to a digital text document, to detect named entities appearing in the digital text document, wherein the named entities are selected from the group consisting of: at least one person-type entity, and at least one non-person-type entity; automatically detecting at least one relation between the named entities, by applying a parts-of-speech (POS) tagging algorithm and a dependency parsing algorithm to sentences of the digital text document which contain the detected named entities; automatically estimating whether the at least one relation between the named entities is indicative of personal information; and automatically issuing a notification of a result of the estimation.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 30, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andrey Finkelshtein, Bar Haim, Eitan Menahem
  • Publication number: 20220261657
    Abstract: Embodiments may include novel techniques for training and using an adversarial autoencoder for multi-source domain functions. For example, a method may comprise training an adversarial encoder comprising an encoder and a decoder by simultaneously training the encoder and the decoder, using data comprising a plurality of datasets, the data having labels based on an origin class and a dataset number, training the encoder to act as a generator to generate codewords based on the data for a generative adversarial network including the generator and a discriminator by training the generator to cause the discriminator to predict random labels for a plurality of data samples of each class and training the generator using the predicted random labels to generate codewords that relate to the origin class, and classifying new data samples using the trained adversarial encoder and generator, and the discriminator.
    Type: Application
    Filed: February 17, 2021
    Publication date: August 18, 2022
    Inventors: Anton Puzanov, Eitan Menahem, Andrey Finkelshtein, Noga Agmon
  • Patent number: 11373063
    Abstract: A method for training thresholds controlling data flow in a plurality of cascaded classifiers for classifying malicious software, comprising: in each of a plurality of iterations: computing a set of scores, each for one of a set of threshold sequences, each threshold sequence is a sequence of sets of classifier output thresholds, each set of classifier output thresholds used to control a flow of data from a first cascaded classifier of the plurality of cascaded classifiers to a second cascaded classifier of the plurality of cascaded classifiers, each score computed when classifying, using the respective threshold sequence, each of a plurality of software objects as one of a set of maliciousness classes; computing a set of new threshold sequences by applying a genetic algorithm to the set of threshold sequences and the set of scores; and using the set of new threshold sequences in a consecutive iteration.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: June 28, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andrey Finkelshtein, Oded Margalit, Eitan Menahem
  • Publication number: 20220172102
    Abstract: An example system includes a processor to receive mouse event data of a session. The processor is to split the mouse event data of the session into mouse event n-grams. The processor is to extract features from the mouse event n-grams. The processor is to send the extracted features to a trained machine learning model. The processor is to receive an output decision from the trained machine learning model.
    Type: Application
    Filed: November 30, 2020
    Publication date: June 2, 2022
    Inventors: Andrey Finkelshtein, Anton Puzanov, Noga Agmon, Eitan Menahem
  • Patent number: 11308077
    Abstract: A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: April 19, 2022
    Assignee: International Business Machines Corporation
    Inventors: Bar Haim, Andrey Finkelshtein, Eitan Menahem, Noga Agmon
  • Patent number: 11303672
    Abstract: An example system includes a processor to receive a current session and previous sessions associated with an account. The processor can split the current session and the previous sessions into action windows. The processor can calculate a window similarity score for each action window of the current session using a pair-wise comparison with action windows of each of the previous sessions. The processor can aggregate the window similarity scores to generate a replay likelihood score for the current session with respect to each of the previous sessions. The processor can classify the current session as a replay attack in response to detecting that a replay likelihood score of the current session exceeds a threshold.
    Type: Grant
    Filed: April 2, 2020
    Date of Patent: April 12, 2022
    Assignee: International Business Machines Corporation
    Inventors: Andrey Finkelshtein, Itay Hazan
  • Publication number: 20220027339
    Abstract: A method for quantifying a similarity between a target dataset and multiple source datasets and identifying one or more source datasets that are most similar to the target dataset is provided. The method includes receiving, at a computing system, source datasets relating to a source domain and a target dataset relating to a target domain of interest. Each dataset is arranged in a tabular format including columns and rows, and the source datasets and the target dataset include a same feature space. The method also includes pre-processing, via a processor of the computing system, each source-target dataset pair to remove non-intersecting columns. The method further includes calculating at least two of a dataset similarity score, a row similarity score, and a column similarity score for each source-target dataset pair, and summarizing the calculated similarity scores to identify one or more source datasets that are most similar to the target dataset.
    Type: Application
    Filed: July 21, 2020
    Publication date: January 27, 2022
    Inventors: Bar Haim, Andrey Finkelshtein, Eitan Menahem, Noga Agmon
  • Publication number: 20210400064
    Abstract: A cyber security method including: obtaining user flow data associated with a browsing session at a website; constructing a directed graph representative of the browsing session; computing a set of features for the directed graph; and applying a machine learning classifier to the set of features, to classify the browsing session as legitimate or fraudulent.
    Type: Application
    Filed: June 23, 2020
    Publication date: December 23, 2021
    Inventors: Yehonatan Bitton, Andrey Finkelshtein, Eitan Menahem
  • Publication number: 20210397669
    Abstract: A machine learning clustering process is trained. Web pages of a website are clustered. User flow data associated with a first browsing session at the website is obtained. The user flow data includes a plurality of web page identifiers (e.g., URLs). A web page record for each of the web page identifiers is generated. Each web page record includes words of the corresponding web page identifier. Clusters of web page identifiers previously output from the trained machine learning clustering process are received. For each of the web page records, a cluster of web page identifiers is identified by mapping the web page record to one of the clusters of web page identifiers using the machine learning clustering process. A directed graph representative of the first browsing session is constructed. One or more nodes of the directed graph are the identified clusters of web page identifiers.
    Type: Application
    Filed: June 23, 2020
    Publication date: December 23, 2021
    Inventors: Andrey Finkelshtein, Noga Agmon, Eitan Menahem, Yehonatan Bitton
  • Publication number: 20210314350
    Abstract: An example system includes a processor to receive a current session and previous sessions associated with an account. The processor can split the current session and the previous sessions into action windows. The processor can calculate a window similarity score for each action window of the current session using a pair-wise comparison with action windows of each of the previous sessions. The processor can aggregate the window similarity scores to generate a replay likelihood score for the current session with respect to each of the previous sessions. The processor can classify the current session as a replay attack in response to detecting that a replay likelihood score of the current session exceeds a threshold.
    Type: Application
    Filed: April 2, 2020
    Publication date: October 7, 2021
    Inventors: Andrey Finkelshtein, Itay Hazan
  • Publication number: 20210089620
    Abstract: Automated detection of personal information in free text, which includes: automatically applying a named-entity recognition (NER) algorithm to a digital text document, to detect named entities appearing in the digital text document, wherein the named entities are selected from the group consisting of: at least one person-type entity, and at least one non-person-type entity; automatically detecting at least one relation between the named entities, by applying a parts-of-speech (POS) tagging algorithm and a dependency parsing algorithm to sentences of the digital text document which contain the detected named entities; automatically estimating whether the at least one relation between the named entities is indicative of personal information; and automatically issuing a notification of a result of the estimation.
    Type: Application
    Filed: September 25, 2019
    Publication date: March 25, 2021
    Inventors: Andrey Finkelshtein, Bar Haim, Eitan Menahem
  • Patent number: 10846403
    Abstract: Embodiments of the present systems and methods may decide if a software file is malicious or benign, using properties of the file's overlay, if existing. For example, in an embodiment, a computer-implemented method for identifying malware in computer systems may comprise receiving a plurality of executable files labeled as being malicious or benign, training a machine learning model using properties extracted from overlays associated with each of the plurality of received labeled executable files, receiving an executable file that is not labeled, determining whether the received unlabeled executable file is malicious or benign using the trained machine learning model based on properties extracted from an overlay associated with the received unlabeled executable file, and transmitting information identifying the received unlabeled executable file as malicious when the received unlabeled executable file is determined to be malicious.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: November 24, 2020
    Assignee: International Business Machines Corporation
    Inventors: Andrey Finkelshtein, Eitan Menahem
  • Publication number: 20200184254
    Abstract: A method for training thresholds controlling data flow in a plurality of cascaded classifiers for classifying malicious software, comprising: in each of a plurality of iterations: computing a set of scores, each for one of a set of threshold sequences, each threshold sequence is a sequence of sets of classifier output thresholds, each set of classifier output thresholds used to control a flow of data from a first cascaded classifier of the plurality of cascaded classifiers to a second cascaded classifier of the plurality of cascaded classifiers, each score computed when classifying, using the respective threshold sequence, each of a plurality of software objects as one of a set of maliciousness classes; computing a set of new threshold sequences by applying a genetic algorithm to the set of threshold sequences and the set of scores; and using the set of new threshold sequences in a consecutive iteration.
    Type: Application
    Filed: December 10, 2018
    Publication date: June 11, 2020
    Inventors: Andrey Finkelshtein, Oded Margalit, Eitan Menahem
  • Publication number: 20190354682
    Abstract: Embodiments of the present systems and methods may decide if a software file is malicious or benign, using properties of the file's overlay, if existing. For example, in an embodiment, a computer-implemented method for identifying malware in computer systems may comprise receiving a plurality of executable files labeled as being malicious or benign, training a machine learning model using properties extracted from overlays associated with each of the plurality of received labeled executable files, receiving an executable file that is not labeled, determining whether the received unlabeled executable file is malicious or benign using the trained machine learning model based on properties extracted from an overlay associated with the received unlabeled executable file, and transmitting information identifying the received unlabeled executable file as malicious when the received unlabeled executable file is determined to be malicious.
    Type: Application
    Filed: May 15, 2018
    Publication date: November 21, 2019
    Inventors: Andrey Finkelshtein, Eitan Menahem
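
As an illustration of the replay-attack abstract listed above (Patent number 11303672), the following is a minimal Python sketch of the general flow it describes: split sessions into action windows, score each window of the current session against the windows of a previous session, aggregate the scores into a replay likelihood, and compare it with a threshold. It is not the patented implementation. The function names, the fixed window size, the position-wise match metric, the mean-of-best aggregation, and the threshold value are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
from typing import List, Sequence, Tuple


def split_into_windows(actions: Sequence[str], size: int) -> List[Tuple[str, ...]]:
    """Split a session's action sequence into consecutive windows of `size` actions."""
    return [tuple(actions[i:i + size]) for i in range(0, len(actions), size)]


def window_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed metric: fraction of aligned positions holding the same action."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def replay_likelihood(current: Sequence[str], previous: Sequence[str], size: int = 3) -> float:
    """Aggregate window scores: mean, over current windows, of the best pair-wise match."""
    cur_windows = split_into_windows(current, size)
    prev_windows = split_into_windows(previous, size)
    if not cur_windows or not prev_windows:
        return 0.0
    best = [max(window_similarity(c, p) for p in prev_windows) for c in cur_windows]
    return sum(best) / len(best)


def is_replay(current: Sequence[str],
              previous_sessions: Sequence[Sequence[str]],
              threshold: float = 0.9,
              size: int = 3) -> bool:
    """Flag the current session if its likelihood against any previous session exceeds the threshold."""
    return any(replay_likelihood(current, p, size) > threshold for p in previous_sessions)


if __name__ == "__main__":
    # Hypothetical action names, for demonstration only.
    earlier = ["login", "balance", "transfer", "confirm", "logout"]
    unrelated = ["login", "settings", "logout"]
    current = ["login", "balance", "transfer", "confirm", "logout"]
    print(is_replay(current, [unrelated, earlier]))
    print(is_replay(current, [unrelated]))
```

A real system would likely compare richer per-action data (timings, coordinates, request parameters) than action names, and would tune the window size and threshold on labeled sessions.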