Patents by Inventor Tomas Komarek

Tomas Komarek has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MULTIPLE INSTANCE LEARNING MODELS FOR CYBERSECURITY USING JAVASCRIPT OBJECT NOTATION (JSON) TRAINING DATA

Publication number: 20230376836

Abstract: Techniques and architecture are described for converting tree structured data such as, for example, JavaScript Object Notation (JSON) data, into multiple feature vectors to train multiple instance learning (MIL) models for providing cybersecurity in networks. In particular, a data set is provided, wherein the data set comprises a sample configured as a hierarchical tree. The sample is converted into a set of path and value pairs, e.g., flattened into a set of path and value pairs, where the path is a sequence of field names and array indices encoding a position of a value. Each path and value pair of the set of path and value pairs is converted into a respective feature vector to form a set of feature vectors. The set of feature vectors is used to train a multiple instance learning (MIL) model, wherein each feature vector has a same, fixed length.
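The flattening step described in this abstract can be illustrated with a minimal Python sketch. The traversal follows the abstract (paths are sequences of field names and array indices). The hashing-based encoding in `encode_pair`, the vector length `dim`, and all function names are assumptions made for illustration and are not taken from the application.

```python
import hashlib
import json


def flatten(node, path=()):
    """Yield (path, value) pairs for every leaf of a JSON-like tree."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, path + (index,))
    else:
        yield path, node


def _bucket(text, size):
    """Deterministically hash a string into one of `size` buckets."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def encode_pair(path, value, dim=16):
    """Encode one pair as a fixed-length vector (assumed hashing scheme):
    the first half of the vector marks the path, the second half the value."""
    vec = [0.0] * dim
    half = dim // 2
    path_text = "/".join(str(step) for step in path)
    vec[_bucket(path_text, half)] += 1.0
    vec[half + _bucket(json.dumps(value), dim - half)] += 1.0
    return vec


def sample_to_bag(sample, dim=16):
    """Turn one tree-structured sample into a bag of equal-length vectors."""
    return [encode_pair(path, value, dim) for path, value in flatten(sample)]


# Hypothetical sample, invented for this example.
sample = {"host": "example.test", "ports": [80, 443], "tls": {"version": "1.3"}}
pairs = list(flatten(sample))
bag = sample_to_bag(sample)
```

Each sample yields a bag whose size depends on the number of leaves, while every vector in the bag has the same length, which is the property a multiple instance learning model needs.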

Type: Application

Filed: May 20, 2022

Publication date: November 23, 2023

Inventors: Tomas Komarek, Stepan Dvorak, Jan Brabec

Malware detection using inverse imbalance subspace searching

Patent number: 11799904

Abstract: Inverse imbalance subspace searching techniques are used to detect potential malware among samples of network communication data. A large number of samples of network communication data, such as proxy log data and/or network flows, are received and analyzed by a malware detection system. A number of the samples are associated with known malware, while other unlabeled samples are either benign or may be associated with unknown malware. An inverse imbalance subspace search may be performed, in which the sample sets are divided into subsets based on random feature thresholds, and each subset is evaluated based on the ratio of known malware samples to unlabeled samples. Unlabeled samples within subsets having high malware sample ratios may be identified, aggregated, and processed as potential malware.
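A rough Python sketch of the search loop described in this abstract follows. It splits the samples on random feature thresholds, scores each resulting subset by the ratio of known malware samples to unlabeled samples, and collects the unlabeled samples from high-ratio subsets. The parameters `rounds`, `depth`, and `min_ratio`, and the choice to keep one random side of each split, are assumptions for illustration only.

```python
import random


def subspace_search(samples, labels, rounds=200, depth=2, min_ratio=1.0, seed=0):
    """Return indices of unlabeled samples found in malware-heavy subsets.

    samples: list of equal-length numeric feature vectors.
    labels:  labels[i] is True for known malware, False for unlabeled.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    flagged = set()
    for _ in range(rounds):
        subset = list(range(len(samples)))
        for _ in range(depth):
            feature = rng.randrange(n_features)
            values = [samples[i][feature] for i in subset]
            threshold = rng.uniform(min(values), max(values))
            keep_high = rng.random() < 0.5
            # Keep only one randomly chosen side of the threshold.
            subset = [i for i in subset
                      if (samples[i][feature] > threshold) == keep_high]
            if not subset:
                break
        known = [i for i in subset if labels[i]]
        unlabeled = [i for i in subset if not labels[i]]
        # Evaluate the subset by its ratio of known malware to unlabeled samples.
        if known and unlabeled and len(known) / len(unlabeled) >= min_ratio:
            flagged.update(unlabeled)
    return sorted(flagged)


# Hypothetical data: 20 unlabeled samples near the origin, 5 known malware
# samples far away, and 1 unlabeled sample sitting among the malware.
demo_samples = [[0.0, 0.0]] * 20 + [[10.0, 10.0]] * 5 + [[10.0, 10.0]]
demo_labels = [False] * 20 + [True] * 5 + [False]
candidates = subspace_search(demo_samples, demo_labels)
```

The returned indices are candidates for further processing as potential malware, not verdicts.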

Type: Grant

Filed: December 10, 2020

Date of Patent: October 24, 2023

Assignee: Cisco Technology, Inc.

Inventors: Tomas Komarek, Jan Brabec, Cenek Skarda

Instant network threat detection system

Patent number: 11374944

Abstract: In one embodiment, a network security service forms, for each of a plurality of malware classes, a feature vector descriptor for the malware class. The service uses the feature vector descriptors for the malware classes and a symmetric mapping function to generate a training dataset having both positively and negatively labeled feature vectors. The service trains, using the training dataset, an instant threat detector to determine whether telemetry data for a particular traffic flow is within a threshold of similarity to a feature vector descriptor for a new malware class that was not part of the plurality of malware classes.
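One way to picture the training-set construction in this abstract is the Python sketch below. The abstract does not specify the descriptor or the symmetric mapping function, so this sketch assumes the descriptor is the mean vector of a class and the mapping is the element-wise absolute difference, which is symmetric because swapping its arguments gives the same result. All names are hypothetical.

```python
def mean_vector(vectors):
    """Assumed class descriptor: the per-feature mean of the class's vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def symmetric_map(a, b):
    """Assumed symmetric mapping: element-wise absolute difference."""
    return [abs(x - y) for x, y in zip(a, b)]


def build_pairs(classes):
    """Pair every feature vector with every class descriptor.

    A pair is labeled 1 when the vector belongs to the descriptor's class
    and 0 otherwise, giving both positive and negative training examples.
    """
    descriptors = {name: mean_vector(vs) for name, vs in classes.items()}
    dataset = []
    for name, vectors in classes.items():
        for vec in vectors:
            for other, descriptor in descriptors.items():
                label = 1 if other == name else 0
                dataset.append((symmetric_map(vec, descriptor), label))
    return descriptors, dataset


# Hypothetical malware classes with two feature vectors each.
classes = {
    "family_a": [[0.0, 0.0], [2.0, 2.0]],
    "family_b": [[10.0, 10.0], [12.0, 12.0]],
}
descriptors, dataset = build_pairs(classes)
```

A detector trained on such mapped pairs learns a notion of "close enough to a descriptor" rather than the classes themselves, which is what would let it be applied to the descriptor of a class it has not seen.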

Type: Grant

Filed: December 19, 2018

Date of Patent: June 28, 2022

Assignee: Cisco Technology, Inc.

Inventors: Tomas Komarek, Petr Somol

MALWARE DETECTION USING INVERSE IMBALANCE SUBSPACE SEARCHING

Publication number: 20220191244

Abstract: Inverse imbalance subspace searching techniques are used to detect potential malware among samples of network communication data. A large number of samples of network communication data, such as proxy log data and/or network flows, are received and analyzed by a malware detection system. A number of the samples are associated with known malware, while other unlabeled samples are either benign or may be associated with unknown malware. An inverse imbalance subspace search may be performed, in which the sample sets are divided into subsets based on random feature thresholds, and each subset is evaluated based on the ratio of known malware samples to unlabeled samples. Unlabeled samples within subsets having high malware sample ratios may be identified, aggregated, and processed as potential malware.

Type: Application

Filed: December 10, 2020

Publication date: June 16, 2022

Inventors: Tomas Komarek, Jan Brabec, Cenek Skarda

Training a network traffic classifier using training data enriched with contextual bag information

Patent number: 11271833

Abstract: In one embodiment, a device groups feature vectors representing network traffic flows into bags. The device forms a bag representation of a particular one of the bags by aggregating the feature vectors in the particular bag. The device extends one or more feature vectors in the particular bag with the bag representation. The extended one or more feature vectors are positive examples of a classification label for the network traffic. The device trains a network traffic classifier using training data that comprises the one or more feature vectors extended with the bag representation.
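The grouping, aggregation, and extension steps in this abstract can be sketched in a few lines of Python. The grouping key (for example, a user or host identifier) and the use of the per-feature mean as the bag representation are assumptions; the abstract only states that the feature vectors in a bag are aggregated.

```python
from collections import defaultdict


def extend_with_bag_context(flows):
    """Append each flow's bag representation to its own feature vector.

    flows: list of (bag_key, feature_vector) tuples, where feature_vector
    is a list of floats. Returns the extended vectors in input order.
    """
    bags = defaultdict(list)
    for key, vec in flows:
        bags[key].append(vec)
    # Assumed aggregation: per-feature mean over the vectors in each bag.
    bag_repr = {
        key: [sum(column) / len(vecs) for column in zip(*vecs)]
        for key, vecs in bags.items()
    }
    return [vec + bag_repr[key] for key, vec in flows]


# Hypothetical flows keyed by an invented user identifier.
flows = [
    ("user-1", [1.0, 2.0]),
    ("user-1", [3.0, 4.0]),
    ("user-2", [5.0, 6.0]),
]
extended = extend_with_bag_context(flows)
```

Every extended vector has twice the original length: the flow's own features followed by the context of the bag it came from.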

Type: Grant

Filed: October 23, 2017

Date of Patent: March 8, 2022

Assignee: Cisco Technology, Inc.

Inventors: Tomas Komarek, Martin Vejman, Petr Somol

Multiple pairwise feature histograms for representing network traffic

Patent number: 10867036

Abstract: In one embodiment, a device divides groups of tuples of traffic characteristics of encrypted network traffic into different pairs of the characteristics. Each of the pairs has a corresponding two dimensional (2-D) feature subspace. The device discretizes the 2-D feature subspaces, to form a plurality of bins in each feature subspace. The device assigns the pairs of the traffic characteristics in a particular group of tuples to the bins in the discretized 2-D feature subspaces. The device forms, for each group of tuples, a vector representation of the group of tuples based on the bins in the discretized 2-D feature subspaces to which the pairs of the traffic characteristics from the group are assigned. The vector representations of the groups of tuples are of a fixed dimension. The device uses the vector representations of the groups of tuples to train a machine learning-based traffic classifier.
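As an illustration of the representation in this abstract, the Python sketch below builds one 2-D histogram per pair of traffic characteristics and concatenates them. The equal-width binning, the `ranges` argument giving an assumed (low, high) range per characteristic, and the bin count are choices made for this example, not details from the patent.

```python
from itertools import combinations


def bin_index(value, low, high, bins):
    """Map a value to an equal-width bin, clamping out-of-range values."""
    if high <= low:
        return 0
    index = int((value - low) / (high - low) * bins)
    return min(max(index, 0), bins - 1)


def pairwise_histograms(tuples, ranges, bins=4):
    """Represent a group of tuples as concatenated 2-D histograms.

    tuples: the group, each tuple holding k traffic characteristics.
    ranges: k (low, high) pairs used to discretize each characteristic.
    The result has length C(k, 2) * bins * bins regardless of group size.
    """
    vector = []
    for i, j in combinations(range(len(ranges)), 2):
        grid = [0] * (bins * bins)
        for t in tuples:
            row = bin_index(t[i], *ranges[i], bins)
            col = bin_index(t[j], *ranges[j], bins)
            grid[row * bins + col] += 1
        vector.extend(grid)
    return vector


# Hypothetical group of two tuples with three characteristics each.
ranges = [(0, 10), (0, 10), (0, 10)]
group = [(1, 1, 9), (9, 9, 9)]
representation = pairwise_histograms(group, ranges, bins=2)
```

Because the vector length depends only on the number of characteristics and bins, groups of any size map to the same fixed dimension, which is what lets them feed a standard classifier.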

Type: Grant

Filed: October 12, 2017

Date of Patent: December 15, 2020

Assignee: Cisco Technology, Inc.

Inventors: Tomas Komarek, Petr Somol

INSTANT NETWORK THREAT DETECTION SYSTEM

Publication number: 20200204569

Abstract: In one embodiment, a network security service forms, for each of a plurality of malware classes, a feature vector descriptor for the malware class. The service uses the feature vector descriptors for the malware classes and a symmetric mapping function to generate a training dataset having both positively and negatively labeled feature vectors. The service trains, using the training dataset, an instant threat detector to determine whether telemetry data for a particular traffic flow is within a threshold of similarity to a feature vector descriptor for a new malware class that was not part of the plurality of malware classes.

Type: Application

Filed: December 19, 2018

Publication date: June 25, 2020

Inventors: Tomas Komarek, Petr Somol

TRAINING A NETWORK TRAFFIC CLASSIFIER USING TRAINING DATA ENRICHED WITH CONTEXTUAL BAG INFORMATION

Publication number: 20190123982

Abstract: In one embodiment, a device groups feature vectors representing network traffic flows into bags. The device forms a bag representation of a particular one of the bags by aggregating the feature vectors in the particular bag. The device extends one or more feature vectors in the particular bag with the bag representation. The extended one or more feature vectors are positive examples of a classification label for the network traffic. The device trains a network traffic classifier using training data that comprises the one or more feature vectors extended with the bag representation.

Type: Application

Filed: October 23, 2017

Publication date: April 25, 2019

Inventors: Tomas Komarek, Martin Vejman, Petr Somol

MULTIPLE PAIRWISE FEATURE HISTOGRAMS FOR REPRESENTING NETWORK TRAFFIC

Publication number: 20190114416

Abstract: In one embodiment, a device divides groups of tuples of traffic characteristics of encrypted network traffic into different pairs of the characteristics. Each of the pairs has a corresponding two dimensional (2-D) feature subspace. The device discretizes the 2-D feature subspaces, to form a plurality of bins in each feature subspace. The device assigns the pairs of the traffic characteristics in a particular group of tuples to the bins in the discretized 2-D feature subspaces. The device forms, for each group of tuples, a vector representation of the group of tuples based on the bins in the discretized 2-D feature subspaces to which the pairs of the traffic characteristics from the group are assigned. The vector representations of the groups of tuples are of a fixed dimension. The device uses the vector representations of the groups of tuples to train a machine learning-based traffic classifier.

Type: Application

Filed: October 12, 2017

Publication date: April 18, 2019

Inventors: Tomas Komarek, Petr Somol

DETECTING NETWORK ADDRESS TRANSLATION DEVICES IN A NETWORK BASED ON NETWORK TRAFFIC LOGS

Publication number: 20170230395

Abstract: Actual traffic logs of network traffic to and from host devices in a network are collected over time. Artificial traffic logs for each of multiple artificial network address translation (NAT) devices are generated from the actual traffic logs. The actual traffic logs and the artificial traffic logs are labeled as being indicative of non-NAT devices and NAT devices, respectively, to produce labeled traffic logs. From the labeled traffic logs for each artificial NAT device and each non-NAT device, respective, correspondingly labeled, network traffic features indicative of whether the device behaves like a NAT device or a non-NAT device are extracted. A classifier device is trained using the network traffic features extracted for each artificial NAT device and each non-NAT device to classify between an actual NAT device and an actual non-NAT device based on further actual traffic logs.
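The idea of synthesizing labeled NAT examples from real host logs can be sketched in Python as follows. This sketch assumes an artificial NAT device is formed by merging the logs of several randomly chosen real hosts, and it uses three invented features (distinct user agents, distinct destinations, and entry count). The log fields `time`, `user_agent`, and `dst` are hypothetical.

```python
import random


def make_artificial_nats(logs_by_host, hosts_per_nat, count, seed=0):
    """Merge the logs of randomly chosen hosts into artificial NAT devices."""
    rng = random.Random(seed)
    hosts = sorted(logs_by_host)
    nats = {}
    for n in range(count):
        members = rng.sample(hosts, hosts_per_nat)
        merged = [entry for host in members for entry in logs_by_host[host]]
        merged.sort(key=lambda entry: entry["time"])
        nats[f"nat-{n}"] = merged
    return nats


def features(entries):
    """Invented features: distinct user agents, distinct destinations, volume."""
    return [
        len({entry["user_agent"] for entry in entries}),
        len({entry["dst"] for entry in entries}),
        len(entries),
    ]


def labeled_training_set(logs_by_host, hosts_per_nat=2, count=3, seed=0):
    """Label real hosts 0 (non-NAT) and artificial devices 1 (NAT)."""
    rows = [(features(entries), 0) for entries in logs_by_host.values()]
    nats = make_artificial_nats(logs_by_host, hosts_per_nat, count, seed)
    rows += [(features(entries), 1) for entries in nats.values()]
    return rows


# Hypothetical logs for three hosts, one entry each.
logs_by_host = {
    "10.0.0.1": [{"time": 1, "user_agent": "agent-a", "dst": "site-a.test"}],
    "10.0.0.2": [{"time": 2, "user_agent": "agent-b", "dst": "site-b.test"}],
    "10.0.0.3": [{"time": 3, "user_agent": "agent-c", "dst": "site-c.test"}],
}
rows = labeled_training_set(logs_by_host)
```

The resulting rows can be passed to any standard classifier. The point of the construction is that positive examples come for free from the real logs, so no manually labeled NAT devices are needed.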

Type: Application

Filed: April 25, 2017

Publication date: August 10, 2017

Inventors: Tomás Komárek, Martin Grill, Tomás Pevny

Detecting Network Address Translation Devices In A Network Based On Network Traffic Logs

Publication number: 20160315952

Abstract: Network traffic logs of network traffic to and from host devices connected to a network that were collected over time are accessed. For each host device identified in the logs, a set of network traffic features indicative of whether the host device behaves like a Network Address Translation (NAT) device or an end host device is extracted from the logs for the host device. Each feature has values that vary over time based on the logs. A trained host device behavior classifier classifies the host device as either a NAT device or an end host device based on one or more of the feature values.
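A feature whose values vary over time, as described in this abstract, might look like the following Python sketch. It counts the distinct user agents seen for one host in each time window. The one-hour window, the log fields, and the threshold rule standing in for the trained classifier are all assumptions for illustration.

```python
from collections import defaultdict


def windowed_user_agents(entries, window=3600):
    """Count distinct user agents per time window for a single host.

    entries: log records with integer "time" (seconds) and "user_agent".
    Returns one count per window that has traffic, in time order.
    """
    agents = defaultdict(set)
    for entry in entries:
        agents[entry["time"] // window].add(entry["user_agent"])
    return [len(agents[w]) for w in sorted(agents)]


def looks_like_nat(series, threshold=2.0):
    """Stand-in for the trained classifier: flag a host whose average
    number of distinct user agents per window exceeds the threshold."""
    return bool(series) and sum(series) / len(series) > threshold


# Hypothetical log entries for one host.
host_log = [
    {"time": 0, "user_agent": "agent-a"},
    {"time": 10, "user_agent": "agent-b"},
    {"time": 3700, "user_agent": "agent-a"},
]
series = windowed_user_agents(host_log)
is_nat = looks_like_nat(series)
```

A real classifier would combine several such time-varying features rather than thresholding a single one.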

Type: Application

Filed: April 27, 2015

Publication date: October 27, 2016

Inventors: Tomás Komárek, Martin Grill, Tomás Pevný