Patents by Inventor CRAIG SCHELP

CRAIG SCHELP has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MACHINE LEARNING-BASED DNS REQUEST STRING REPRESENTATION WITH HASH REPLACEMENT

Publication number: 20230421528

Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.

Type: Application

Filed: August 24, 2023

Publication date: December 28, 2023

Inventors: Renata Khasanova, Felix Schmidt, Stuart Wray, Craig Schelp, Nipun Agarwal, Matteo Casserini
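
The pre-processing and summation steps in the abstract above can be sketched roughly as follows. This is an illustration only, not the patented method: the `<HASH>` placeholder, the length-and-entropy heuristic in `looks_like_hash`, and the `embeddings` and `weights` dictionaries (which stand in for vectors taken from a trained model and for whatever weighting scheme is used) are all assumptions made for this example.

```python
import math
from collections import Counter

HASH_PLACEHOLDER = "<HASH>"  # hypothetical placeholder token


def shannon_entropy(token):
    """Character-level Shannon entropy of a token, in bits."""
    counts = Counter(token)
    n = len(token)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_hash(token, min_len=16, min_entropy=3.0):
    """Assumed heuristic: long, alphanumeric, and relatively random."""
    return (
        len(token) >= min_len
        and token.isalnum()
        and shannon_entropy(token) >= min_entropy
    )


def preprocess(dns_request):
    """Tokenize on dots and replace each detected hash token."""
    tokens = dns_request.lower().split(".")
    return [HASH_PLACEHOLDER if looks_like_hash(t) else t for t in tokens]


def embed_request(tokens, embeddings, weights):
    """Weighted sum of per-token embeddings; unknown tokens are skipped."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = embeddings.get(token)
        if vec is None:
            continue
        w = weights.get(token, 1.0)  # default weight is an assumption
        for i, value in enumerate(vec):
            total[i] += w * value
    return total
```

Because every detected hash maps to the same placeholder, all hashes share one embedding, which is one way to read the abstract's statement that the embeddings reflect the hashes "as a group".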

Machine learning-based DNS request string representation with hash replacement

Patent number: 11784964

Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.

Type: Grant

Filed: March 10, 2021

Date of Patent: October 10, 2023

Assignee: Oracle International Corporation

Inventors: Renata Khasanova, Felix Schmidt, Stuart Wray, Craig Schelp, Nipun Agarwal, Matteo Casserini

Malicious activity detection by cross-trace analysis and deep learning

Patent number: 11451565

Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.

Type: Grant

Filed: September 5, 2018

Date of Patent: September 20, 2022

Assignee: Oracle International Corporation

Inventors: Guang-Tong Zhou, Hossein Hajimirsadeghi, Andrew Brownsword, Stuart Wray, Craig Schelp, Rod Reddekopp, Felix Schmidt

MACHINE LEARNING-BASED DNS REQUEST STRING REPRESENTATION WITH HASH REPLACEMENT

Publication number: 20220294757

Abstract: Techniques are described herein for using machine learning to learn vector representations of DNS requests such that the resulting embeddings represent the semantics of the DNS requests as a whole. Techniques described herein perform pre-processing of tokenized DNS request strings in which hashes, which are long and relatively random strings of characters, are detected in DNS request strings and each detected hash token is replaced with a placeholder token. A vectorizing ML model is trained using the pre-processed training dataset in which hash tokens have been replaced. Embeddings for the DNS tokens are derived from an intermediate layer of the vectorizing ML model. The encoding application creates final vector representations for each DNS request string by generating a weighted summation of the embeddings of all of the tokens in the DNS request string. Because of hash replacement, the resulting DNS request embeddings reflect semantics of the hashes as a group.

Type: Application

Filed: March 10, 2021

Publication date: September 15, 2022

Inventors: Renata Khasanova, Felix Schmidt, Stuart Wray, Craig Schelp, Nipun Agarwal, Matteo Casserini

Out of band server utilization estimation and server workload characterization for datacenter resource optimization and forecasting

Patent number: 11423327

Abstract: Techniques are described herein for estimating CPU, memory, and I/O utilization for a workload via out-of-band sensor readings using a machine learning model. The framework involves receiving sensor data associated with executing benchmark applications, obtaining ground truth utilization values for the benchmarks, preprocessing the training data to select a set of enhanced sequences, and using the enhanced sequences to train a random forest model to estimate CPU, memory, and I/O utilization given sensor monitoring data. Prior to the training phase, a machine learning model is trained using a set of predefined hyper-parameters. The trained models are used to generate estimations for CPU, memory, and I/O utilization values. The utilization values are used with workload context information to assess the deployment and generate one or more recommendations for machine types that will best serve the workload in terms of system utilization.

Type: Grant

Filed: October 10, 2018

Date of Patent: August 23, 2022

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Andrew Brownsword, Nipun Agarwal

Techniques for accurately estimating the reliability of storage systems

Patent number: 11416324

Abstract: Techniques are described herein for accurately measuring the reliability of storage systems. Rather than relying on a series of approximations, which may produce highly optimistic estimates, the techniques described herein use a failure distribution derived from a disk failure data set to derive reliability metrics such as mean time to data loss (MTTDL) and annual durability. A new framework for modeling storage system dynamics is described herein. The framework facilitates theoretical analysis of the reliability. The model described herein captures the complex structure of storage systems considering their configuration, dynamics, and operation. Given this model, a simulation-free analytical solution to the commonly used reliability metrics is derived. The model may also be used to analyze the long-term reliability behavior of storage systems.

Type: Grant

Filed: May 13, 2020

Date of Patent: August 16, 2022

Assignee: Oracle International Corporation

Inventors: Paria Rashidinejad, Navaneeth Jamadagni, Arun Raghavan, Craig Schelp, Charles Gordon

Malicious activity detection by cross-trace analysis and deep learning

Patent number: 11082438

Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.

Type: Grant

Filed: September 5, 2018

Date of Patent: August 3, 2021

Assignee: Oracle International Corporation

Inventors: Juan Fernandez Peinador, Manel Fernandez Gomez, Guang-Tong Zhou, Hossein Hajimirsadeghi, Andrew Brownsword, Onur Kocberber, Felix Schmidt, Craig Schelp

Estimate bit error rates of network cables

Patent number: 10917203

Abstract: Embodiments use Bayesian techniques to efficiently estimate the bit error rates (BERs) of cables in a computer network at a customizable level of confidence. Specifically, a plurality of probability records are maintained for a given cable in a computer system, where each probability record is associated with a hypothetical BER for the cable, and reflects a probability that the cable has the associated hypothetical BER. At configurable time intervals, the probability records are updated using statistics gathered from a switch port connected to the cable. In order to estimate the BER of the cable at a given confidence level, embodiments determine which probability record is associated with a probability mass that indicates the confidence level. The estimate for the cable BER is the hypothetical BER that is associated with the indicated probability mass. Embodiments store the estimate in memory and utilize the estimate to aid in maintaining the computer system.

Type: Grant

Filed: May 17, 2019

Date of Patent: February 9, 2021

Assignee: Oracle International Corporation

Inventors: Stuart Wray, Felix Schmidt, Craig Schelp, Pravin Shinde, Akhilesh Singhania, Nipun Agarwal
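
A minimal sketch of the record-update idea in the abstract above, under stated assumptions: a uniform prior over the hypothetical BERs, a Poisson likelihood for the observed error count, and reading "a probability mass that indicates the confidence level" as cumulative mass over records sorted by BER. The function names and the likelihood model are illustrative choices and are not taken from the patent.

```python
import math


def make_records(hypothetical_bers):
    """One (ber, probability) record per hypothetical BER, uniform prior."""
    p = 1.0 / len(hypothetical_bers)
    return [(ber, p) for ber in sorted(hypothetical_bers)]


def update(records, bits, errors):
    """Bayesian update from switch-port counters (bits seen, errors seen).

    Assumes errors ~ Poisson(ber * bits); the factorial term is the same
    for every record, so it is dropped before normalizing.
    """
    log_post = []
    for ber, p in records:
        log_prior = math.log(p) if p > 0 else float("-inf")
        log_like = errors * math.log(ber * bits) - ber * bits
        log_post.append(log_prior + log_like)
    peak = max(log_post)
    unnorm = [math.exp(lp - peak) for lp in log_post]  # avoids underflow
    z = sum(unnorm)
    return [(ber, u / z) for (ber, _), u in zip(records, unnorm)]


def estimate(records, confidence):
    """Smallest hypothetical BER whose cumulative mass reaches `confidence`."""
    cumulative = 0.0
    for ber, p in records:
        cumulative += p
        if cumulative >= confidence:
            return ber
    return records[-1][0]
```

For example, after observing 1e12 bits with no errors, `estimate(update(make_records([1e-15, 1e-12, 1e-9]), 1e12, 0), 0.95)` selects `1e-12`: the 1e-9 hypothesis is effectively ruled out, while the two lower hypotheses remain plausible.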

Application- and infrastructure-aware orchestration for cloud monitoring applications

Patent number: 10892961

Abstract: Herein are computerized techniques for autonomous and artificially intelligent administration of a computer cloud health monitoring system. In an embodiment, an orchestration computer automatically detects a current state of network elements of a computer network by processing: a) a network plan that defines a topology of the computer network, and b) performance statistics of the network elements. The network elements include computers that each hosts virtual execution environment(s). Each virtual execution environment hosts analysis logic that transforms raw performance data of a network element into a portion of the performance statistics. For each computer, a configuration specification for each virtual execution environment of the computer is automatically generated based on the network plan and the current state of the computer network. At least one virtual execution environment is automatically tuned and/or re-provisioned based on a generated configuration specification.

Type: Grant

Filed: February 8, 2019

Date of Patent: January 12, 2021

Assignee: Oracle International Corporation

Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Pravin Shinde

TECHNIQUES FOR ACCURATELY ESTIMATING THE RELIABILITY OF STORAGE SYSTEMS

Publication number: 20200371855

Abstract: Techniques are described herein for accurately measuring the reliability of storage systems. Rather than relying on a series of approximations, which may produce highly optimistic estimates, the techniques described herein use a failure distribution derived from a disk failure data set to derive reliability metrics such as mean time to data loss (MTTDL) and annual durability. A new framework for modeling storage system dynamics is described herein. The framework facilitates theoretical analysis of the reliability. The model described herein captures the complex structure of storage systems considering their configuration, dynamics, and operation. Given this model, a simulation-free analytical solution to the commonly used reliability metrics is derived. The model may also be used to analyze the long-term reliability behavior of storage systems.

Type: Application

Filed: May 13, 2020

Publication date: November 26, 2020

Inventors: Paria Rashidinejad, Navaneeth Jamadagni, Arun Raghavan, Craig Schelp, Charles Gordon

ESTIMATE BIT ERROR RATES OF NETWORK CABLES

Publication number: 20200366428

Abstract: Embodiments use Bayesian techniques to efficiently estimate the bit error rates (BERs) of cables in a computer network at a customizable level of confidence. Specifically, a plurality of probability records are maintained for a given cable in a computer system, where each probability record is associated with a hypothetical BER for the cable, and reflects a probability that the cable has the associated hypothetical BER. At configurable time intervals, the probability records are updated using statistics gathered from a switch port connected to the cable. In order to estimate the BER of the cable at a given confidence level, embodiments determine which probability record is associated with a probability mass that indicates the confidence level. The estimate for the cable BER is the hypothetical BER that is associated with the indicated probability mass. Embodiments store the estimate in memory and utilize the estimate to aid in maintaining the computer system.

Type: Application

Filed: May 17, 2019

Publication date: November 19, 2020

Inventors: Stuart Wray, Felix Schmidt, Craig Schelp, Pravin Shinde, Akhilesh Singhania, Nipun Agarwal

Automated mechanisms for ensuring correctness of evolving datacenter configurations

Patent number: 10795690

Abstract: Herein are computerized techniques for generation, costing/scoring, optimal selection, and reporting of intermediate configurations for a datacenter change plan. In an embodiment, a computer receives a current configuration of a datacenter and a target configuration. New configurations are generated based on the current configuration. A cost function is applied to calculate a cost of each new configuration based on measuring a logical difference between the new configuration and the target configuration. A particular new configuration is selected that has a least cost. When the particular configuration satisfies the target configuration, the datacenter is reconfigured based on the particular configuration. Otherwise, this process is (e.g. iteratively) repeated with the particular configuration instead used as the current configuration. In embodiments, new configurations are randomly, greedily, and/or manually generated.

Type: Grant

Filed: October 30, 2018

Date of Patent: October 6, 2020

Assignee: Oracle International Corporation

Inventors: Pravin Shinde, Felix Schmidt, Craig Schelp
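
The generate, cost, select, and repeat loop in the abstract above can be illustrated with a small greedy search. This is a sketch under assumptions, not the patented mechanism: a configuration is modeled as a set of item names, candidate configurations differ from the current one by toggling a single item from a hypothetical `universe`, and the "logical difference" cost is taken to be the size of the symmetric difference with the target.

```python
def cost(config, target):
    """Assumed cost: number of items on which config and target disagree."""
    return len(config ^ target)


def neighbors(config, universe):
    """Candidate configurations: toggle exactly one item of the universe."""
    for item in sorted(universe):
        yield config ^ {item}


def plan(current, target, universe, max_steps=100):
    """Greedily pick the least-cost candidate until the target is met.

    Returns the list of intermediate configurations, ending at the target.
    """
    current = frozenset(current)
    target = frozenset(target)
    steps = []
    for _ in range(max_steps):
        if current == target:
            return steps
        # Select the new configuration with the least cost, then repeat
        # with it as the current configuration.
        current = min(neighbors(current, universe),
                      key=lambda c: cost(c, target))
        steps.append(current)
    if current == target:
        return steps
    raise RuntimeError("target not reached within max_steps")
```

With this toy cost function every greedy step reduces the cost by one, so the loop always terminates; a real cost function over datacenter configurations would not necessarily have that property, which is presumably why the abstract also mentions random and manual generation.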

Engine for reactive execution of massively concurrent heterogeneous accelerated scripted streaming analyses

Patent number: 10768982

Abstract: Herein are techniques for analysis of data streams. In an embodiment, a computer associates each software actor with data streams. Each software actor has its own backlog queue of data to analyze. In response to receiving some stream content and based on the received stream content, data is distributed to some software actors. In response to determining that the data satisfies completeness criteria of a particular software actor, an indication of the data is appended onto the backlog queue of the particular software actor. The particular software actor is reset to an initial state by loading an execution snapshot of a previous initial execution of an embedded virtual machine. Based on the particular software actor, execution of the execution snapshot of the previous initial execution is resumed to dequeue and process the indication of the data from the backlog queue of the particular software actor to generate a result.

Type: Grant

Filed: September 19, 2018

Date of Patent: September 8, 2020

Assignee: Oracle International Corporation

Inventors: Andrew Brownsword, Tayler Hetherington, Pavan Chandrashekar, Akhilesh Singhania, Stuart Wray, Pravin Shinde, Felix Schmidt, Craig Schelp, Onur Kocberber, Juan Fernandez Peinador, Rod Reddekopp, Manel Fernandez Gomez, Nipun Agarwal

APPLICATION- AND INFRASTRUCTURE-AWARE ORCHESTRATION FOR CLOUD MONITORING APPLICATIONS

Publication number: 20200259722

Abstract: Herein are computerized techniques for autonomous and artificially intelligent administration of a computer cloud health monitoring system. In an embodiment, an orchestration computer automatically detects a current state of network elements of a computer network by processing: a) a network plan that defines a topology of the computer network, and b) performance statistics of the network elements. The network elements include computers that each hosts virtual execution environment(s). Each virtual execution environment hosts analysis logic that transforms raw performance data of a network element into a portion of the performance statistics. For each computer, a configuration specification for each virtual execution environment of the computer is automatically generated based on the network plan and the current state of the computer network. At least one virtual execution environment is automatically tuned and/or re-provisioned based on a generated configuration specification.

Type: Application

Filed: February 8, 2019

Publication date: August 13, 2020

Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Pravin Shinde

AUTOMATED MECHANISMS FOR ENSURING CORRECTNESS OF EVOLVING DATACENTER CONFIGURATIONS

Publication number: 20200133688

Abstract: Herein are computerized techniques for generation, costing/scoring, optimal selection, and reporting of intermediate configurations for a datacenter change plan. In an embodiment, a computer receives a current configuration of a datacenter and a target configuration. New configurations are generated based on the current configuration. A cost function is applied to calculate a cost of each new configuration based on measuring a logical difference between the new configuration and the target configuration. A particular new configuration is selected that has a least cost. When the particular configuration satisfies the target configuration, the datacenter is reconfigured based on the particular configuration. Otherwise, this process is (e.g. iteratively) repeated with the particular configuration instead used as the current configuration. In embodiments, new configurations are randomly, greedily, and/or manually generated.

Type: Application

Filed: October 30, 2018

Publication date: April 30, 2020

Inventors: Pravin Shinde, Felix Schmidt, Craig Schelp

OUT OF BAND SERVER UTILIZATION ESTIMATION AND SERVER WORKLOAD CHARACTERIZATION FOR DATACENTER RESOURCE OPTIMIZATION AND FORECASTING

Publication number: 20200118039

Abstract: Techniques are described herein for estimating CPU, memory, and I/O utilization for a workload via out-of-band sensor readings using a machine learning model. The framework involves receiving sensor data associated with executing benchmark applications, obtaining ground truth utilization values for the benchmarks, preprocessing the training data to select a set of enhanced sequences, and using the enhanced sequences to train a random forest model to estimate CPU, memory, and I/O utilization given sensor monitoring data. Prior to the training phase, a machine learning model is trained using a set of predefined hyper-parameters. The trained models are used to generate estimations for CPU, memory, and I/O utilization values. The utilization values are used with workload context information to assess the deployment and generate one or more recommendations for machine types that will best serve the workload in terms of system utilization.

Type: Application

Filed: October 10, 2018

Publication date: April 16, 2020

Inventors: Onur Kocberber, Felix Schmidt, Craig Schelp, Andrew Brownsword, Nipun Agarwal

ENGINE FOR REACTIVE EXECUTION OF MASSIVELY CONCURRENT HETEROGENEOUS ACCELERATED SCRIPTED STREAMING ANALYSES

Publication number: 20200089529

Abstract: Herein are techniques for analysis of data streams. In an embodiment, a computer associates each software actor with data streams. Each software actor has its own backlog queue of data to analyze. In response to receiving some stream content and based on the received stream content, data is distributed to some software actors. In response to determining that the data satisfies completeness criteria of a particular software actor, an indication of the data is appended onto the backlog queue of the particular software actor. The particular software actor is reset to an initial state by loading an execution snapshot of a previous initial execution of an embedded virtual machine. Based on the particular software actor, execution of the execution snapshot of the previous initial execution is resumed to dequeue and process the indication of the data from the backlog queue of the particular software actor to generate a result.

Type: Application

Filed: September 19, 2018

Publication date: March 19, 2020

Inventors: Andrew Brownsword, Tayler Hetherington, Pavan Chandrashekar, Akhilesh Singhania, Stuart Wray, Pravin Shinde, Felix Schmidt, Craig Schelp, Onur Kocberber, Juan Fernandez Peinador, Rod Reddekopp, Manel Fernandez Gomez, Nipun Agarwal

MALICIOUS ACTIVITY DETECTION BY CROSS-TRACE ANALYSIS AND DEEP LEARNING

Publication number: 20200076840

Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.

Type: Application

Filed: September 5, 2018

Publication date: March 5, 2020

Inventors: Juan Fernandez Peinador, Manel Fernandez Gomez, Guang-Tong Zhou, Hossein Hajimirsadeghi, Andrew Brownsword, Onur Kocberber, Felix Schmidt, Craig Schelp

MALICIOUS ACTIVITY DETECTION BY CROSS-TRACE ANALYSIS AND DEEP LEARNING

Publication number: 20200076842

Abstract: Techniques are provided herein for contextual embedding of features of operational logs or network traffic for anomaly detection based on sequence prediction. In an embodiment, a computer has a predictive recurrent neural network (RNN) that detects an anomalous network flow. In an embodiment, an RNN contextually transcodes sparse feature vectors that represent log messages into dense feature vectors that may be predictive or used to generate predictive vectors. In an embodiment, graph embedding improves feature embedding of log traces. In an embodiment, a computer detects and feature-encodes independent traces from related log messages. These techniques may detect malicious activity by anomaly analysis of context-aware feature embeddings of network packet flows, log messages, and/or log traces.

Type: Application

Filed: September 5, 2018

Publication date: March 5, 2020

Inventors: Guang-Tong Zhou, Hossein Hajimirsadeghi, Andrew Brownsword, Stuart Wray, Craig Schelp, Rod Reddekopp, Felix Schmidt