Patents by Inventor Joshua Daniel Saxe

Joshua Daniel Saxe has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and apparatus for detection of malicious documents using machine learning

Patent number: 12339962

Abstract: An apparatus for detecting malicious files includes a memory and a processor communicatively coupled to the memory. The processor receives multiple potentially malicious files. A first potentially malicious file has a first file format, and a second potentially malicious file has a second file format different than the first file format. The processor extracts a first set of strings from the first potentially malicious file, and extracts a second set of strings from the second potentially malicious file. First and second feature vectors are defined based on lengths of each string from the associated set of strings. The processor provides the first feature vector as an input to a machine learning model to produce a maliciousness classification of the first potentially malicious file, and provides the second feature vector as an input to the machine learning model to produce a maliciousness classification of the second potentially malicious file.
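As a rough illustration of the abstract above, the sketch below turns the lengths of strings found in a file into a fixed-size feature vector that has the same shape for any file format. The printable-ASCII string definition, the minimum length of 4, the eight log2-spaced bins, and all function names are assumptions made for this example, not details taken from the patent.

```python
import math
import re

# Assumption for this sketch: a "string" is a run of 4 or more printable ASCII bytes.
STRING_RE = re.compile(rb"[\x20-\x7e]{4,}")
NUM_BINS = 8  # hypothetical number of log2-spaced length bins


def extract_strings(data: bytes) -> list[bytes]:
    """Return printable-ASCII runs found in raw file bytes, whatever the format."""
    return STRING_RE.findall(data)


def length_histogram(strings: list[bytes]) -> list[int]:
    """Build a fixed-size feature vector that depends only on string lengths."""
    vector = [0] * NUM_BINS
    for s in strings:
        bin_index = min(int(math.log2(len(s))), NUM_BINS - 1)
        vector[bin_index] += 1
    return vector


def featurize(data: bytes) -> list[int]:
    """Feature vector for one file; the same model could consume any format."""
    return length_histogram(extract_strings(data))
```

Because the vector depends only on string lengths, two files in different formats map to vectors of the same shape and could be passed to a single classifier.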

Type: Grant

Filed: October 10, 2023

Date of Patent: June 24, 2025

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Ethan M. Rudd, Richard Harang

Methods and apparatus for natural language interface for constructing complex database queries

Patent number: 12265526

Abstract: In some embodiments, a processor receives, via an interface, natural language data associated with a user request for performing an identified computational task associated with a cybersecurity management system. The processor is configured to provide the natural language data as input to a machine learning (ML) model. The ML model is configured to automatically infer a template query based on the natural language data. The processor is further configured to cause the template query to be displayed, via the interface. The processor is further configured to receive, via the interface, user input indicating a finalized query associated with the identified computational task, and to provide the finalized query as input to a system configured to perform the identified computational task. The processor is further configured to modify a security setting in the cybersecurity management system based on the performance of the identified computational task.

Type: Grant

Filed: March 31, 2022

Date of Patent: April 1, 2025

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Younghoo Lee

Methods and apparatus for using machine learning on multiple file fragments to identify malware

Patent number: 12248572

Abstract: In some embodiments, a method includes processing at least a portion of a received file into a first set of fragments and analyzing each fragment from the first set of fragments using a machine learning model to identify within each fragment first information potentially relevant to whether the file is malicious. The method includes forming a second set of fragments by combining adjacent fragments from the first set of fragments and analyzing each fragment from the second set of fragments using the machine learning model to identify second information potentially relevant to whether the file is malicious. The method includes identifying the file as malicious based on the first information within at least one fragment from the first set of fragments and the second information within at least one fragment from the second set of fragments. The method includes performing a remedial action based on identifying the file as malicious.
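The two-level fragment analysis described above can be sketched as follows. The fixed fragment size, the pairwise merge, the threshold, and the `score_fragment` callable (a stand-in for the machine learning model) are all hypothetical choices for illustration.

```python
from typing import Callable, Sequence


def split_fragments(data: bytes, size: int) -> list[bytes]:
    """First set: fixed-size fragments of the file."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merge_adjacent(fragments: Sequence[bytes]) -> list[bytes]:
    """Second set: each pair of neighbouring fragments joined into one."""
    return [b"".join(fragments[i:i + 2]) for i in range(0, len(fragments), 2)]


def is_malicious(
    data: bytes,
    score_fragment: Callable[[bytes], float],  # hypothetical stand-in for the model
    size: int = 4,
    threshold: float = 0.5,
) -> bool:
    """Flag the file when both fragment scales contain a suspicious fragment."""
    first = split_fragments(data, size)
    second = merge_adjacent(first)
    first_hit = any(score_fragment(f) >= threshold for f in first)
    second_hit = any(score_fragment(f) >= threshold for f in second)
    return first_hit and second_hit
```

Requiring a hit at both scales is one possible reading of the abstract; a real system might combine the two kinds of information differently.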

Type: Grant

Filed: March 20, 2023

Date of Patent: March 11, 2025

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Richard Harang

Natural language analysis of a command line using a machine learning model to generate a natural language description of the command line

Patent number: 12204870

Abstract: In one or more embodiments, a command is repeatedly input a predetermined number of times into a machine learning model to generate a plurality of different natural language (NL) descriptions. The plurality of different NL descriptions are input into the machine learning model to generate a plurality of different check commands. A plurality of similarity metrics are determined by comparing each check command from the plurality of different check commands to the command. A check command from the plurality of different check commands that is most similar to the command is identified based on the plurality of similarity metrics. An NL description from the plurality of different NL descriptions is caused to be displayed, the NL description previously input into the machine learning model to generate the check command.
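The round-trip check in this abstract can be sketched with two placeholder callables standing in for the machine learning model. `describe` and `regenerate` are hypothetical names, and `difflib.SequenceMatcher` is used here only as one convenient similarity metric; the abstract does not say which metric is used.

```python
from difflib import SequenceMatcher
from typing import Callable


def best_description(
    command: str,
    describe: Callable[[str], str],    # hypothetical: command -> NL description
    regenerate: Callable[[str], str],  # hypothetical: NL description -> check command
    attempts: int = 3,
) -> str:
    """Pick the description whose regenerated command best matches the original."""
    descriptions = [describe(command) for _ in range(attempts)]
    check_commands = [regenerate(d) for d in descriptions]
    scores = [SequenceMatcher(None, command, c).ratio() for c in check_commands]
    best = max(range(attempts), key=scores.__getitem__)
    return descriptions[best]
```

The description that is returned is the one that was fed back into the model to produce the most similar check command.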

Type: Grant

Filed: March 31, 2022

Date of Patent: January 21, 2025

Assignee: Sophos Limited

Inventor: Joshua Daniel Saxe

Methods and apparatus for detecting whether a string of characters represents malicious activity using machine learning

Patent number: 12189773

Abstract: In some embodiments, a processor can receive an input string associated with a potentially malicious artifact and convert each character in the input string into a vector of values to define a character matrix. The processor can apply a convolution matrix to a first window of the character matrix to define a first subscore, apply the convolution matrix to a second window of the character matrix to define a second subscore and combine the first subscore and the second subscore to define a score for the convolution matrix. The processor can provide the score for the convolution matrix as an input to a machine learning threat model, identify the potentially malicious artifact as malicious based on an output of the machine learning threat model, and perform a remedial action on the potentially malicious artifact based on identifying the potentially malicious artifact as malicious.
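A minimal sketch of the character-matrix convolution described above, in plain Python. The three-value character embedding, the use of `max` to combine subscores, and the function names are assumptions for illustration; the downstream threat model that would consume the score is not shown.

```python
def embed(ch: str) -> list[float]:
    """Hypothetical 3-value embedding: [is letter, is digit, is other]."""
    return [float(ch.isalpha()), float(ch.isdigit()), float(not ch.isalnum())]


def char_matrix(text: str) -> list[list[float]]:
    """One embedding row per character of the input string."""
    return [embed(ch) for ch in text]


def window_subscore(window: list[list[float]], kernel: list[list[float]]) -> float:
    """Element-wise product of one window with the convolution matrix, summed."""
    return sum(x * k for row, krow in zip(window, kernel) for x, k in zip(row, krow))


def kernel_score(text: str, kernel: list[list[float]]) -> float:
    """Slide the kernel over every window and combine subscores (here: max)."""
    matrix = char_matrix(text)
    width = len(kernel)
    subscores = [
        window_subscore(matrix[i:i + width], kernel)
        for i in range(len(matrix) - width + 1)
    ]
    return max(subscores, default=0.0)
```

For example, the kernel `[[0, 1, 0], [0, 0, 1]]` responds most strongly to a digit followed by a non-alphanumeric character.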

Type: Grant

Filed: November 10, 2023

Date of Patent: January 7, 2025

Assignee: Invincea, Inc.

Inventor: Joshua Daniel Saxe

Methods and apparatus for augmenting training data using large language models

Patent number: 12130923

Abstract: In some embodiments, a processor receives natural language data for performing an identified cybersecurity task. The processor can provide the natural language data to a first machine learning (ML) model. The first ML model can automatically infer a template query based on the natural language data. The processor can receive user input indicating a finalized query and provide the finalized query as input to a system configured to perform the identified computational task. The processor can provide the finalized query as a reference phrase to a second ML model, the second ML model configured to generate a set of natural language phrases similar to the reference phrase. The processor can generate supplemental training data using the set of natural language phrases similar to the reference phrase to augment training data used to improve performance of the first ML model and/or the second ML model.

Type: Grant

Filed: March 31, 2022

Date of Patent: October 29, 2024

Assignee: Sophos Limited

Inventors: Younghoo Lee, Miklós Sándor Béky, Joshua Daniel Saxe

Classifier generator

Patent number: 12067120

Abstract: A rule generator can automatically generate a machine-learning-powered detection system capable of recognizing a new malicious object or family of malicious objects and deployable as a text-based, pastable detection rule. The text may be quickly distributed and integrated into existing cybersecurity infrastructure, for example, if the cybersecurity infrastructure supports a rules engine. After initial distribution, the identity may be refined, updated, and replaced. This allows for rapid development and distribution of an initial level of protection, and for updating and improvement over time.

Type: Grant

Filed: November 19, 2021

Date of Patent: August 20, 2024

Assignee: Sophos Limited

Inventor: Joshua Daniel Saxe

Methods and apparatus for using machine learning to classify malicious infrastructure

Patent number: 12010129

Abstract: Embodiments disclosed include methods and apparatus for detecting a reputation of infrastructure associated with potentially malicious content. In some embodiments, an apparatus includes a memory and a processor. The processor is configured to identify an Internet Protocol (IP) address associated with potentially malicious content and define each row of a matrix by applying a different subnet mask from a plurality of subnet masks to a binary representation of the IP address to define that row of the matrix. The processor is further configured to provide the matrix as an input to a machine learning model, and receive, from the machine learning model, a score associated with a maliciousness of the IP address.
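The matrix construction in this abstract can be sketched for IPv4 as below. The choice of prefix lengths (/8, /16, /24, /32) and the function name are assumptions; the abstract only says that each row comes from a different subnet mask applied to the binary address.

```python
import ipaddress


def ip_mask_matrix(ip: str, prefix_lengths=(8, 16, 24, 32)) -> list[list[int]]:
    """Return one 32-bit row per subnet mask applied to the IPv4 address."""
    address = int(ipaddress.IPv4Address(ip))
    rows = []
    for prefix in prefix_lengths:
        # Build the mask for this prefix length, e.g. /8 -> 0xFF000000.
        mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
        masked = address & mask
        # Expand the masked address into bits, most significant bit first.
        rows.append([(masked >> (31 - bit)) & 1 for bit in range(32)])
    return rows
```

Each row keeps progressively more of the address, so a model that receives the matrix can see the address at several network granularities at once.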

Type: Grant

Filed: April 23, 2021

Date of Patent: June 11, 2024

Assignee: Sophos Limited

Inventors: Tamás Vörös, Richard Harang, Joshua Daniel Saxe

Programmable feature extractor with anonymization

Patent number: 11989326

Abstract: A compute instance may be configured to extract a feature of a data instance accessed by the compute instance, generate an anonymized feature value for the feature of the data instance, include the anonymized feature value in a feature vector corresponding to the data instance, and transmit the feature vector to a server-based computing system.
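One way to picture the anonymized feature value is a keyed hash computed on the endpoint before anything is transmitted. The abstract does not name an anonymization method, so HMAC-SHA256, the 16-character truncation, and the field names below are assumptions for illustration only.

```python
import hashlib
import hmac


def anonymize(value: str, key: bytes) -> str:
    """Replace a raw feature value with a keyed, truncated SHA-256 digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def build_feature_vector(data_instance: dict, key: bytes) -> dict:
    """Assemble the vector to transmit; the raw path never leaves the endpoint."""
    return {
        "size": data_instance["size"],  # hypothetical non-sensitive feature
        "path_token": anonymize(data_instance["path"], key),
    }
```

Equal inputs produce equal tokens under the same key, so the server can still group or compare values without seeing the original strings.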

Type: Grant

Filed: March 30, 2021

Date of Patent: May 21, 2024

Assignee: Sophos Limited

Inventors: Joseph H. Levy, Kenneth D. Ray, Joshua Daniel Saxe

Methods and apparatus for identifying an impact of a portion of a file on machine learning classification of malicious content

Patent number: 11941491

Abstract: In some embodiments, a non-transitory processor-readable medium stores code representing instructions to be executed by a processor. The code includes code to cause the processor to receive a structured file for which a machine learning model has made a malicious content classification. The code further includes code to remove a portion of the structured file to define a modified structured file that follows a format associated with a type of the structured file. The code further includes code to extract a set of features from the modified structured file. The code further includes code to provide the set of features as an input to the machine learning model to produce an output. The code further includes code to identify an impact of the portion of the structured file on the malicious content classification of the structured file based on the output.
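The remove-and-rescore idea above can be sketched with a dictionary standing in for a structured file, where dropping a key yields a document that is still well formed. The `extract` and `model` callables and the use of a score difference as the "impact" are hypothetical choices, not the patented method.

```python
from typing import Callable


def impact_of_portion(
    document: dict,
    portion: str,
    extract: Callable[[dict], list],  # hypothetical feature extractor
    model: Callable[[list], float],   # hypothetical maliciousness scorer
) -> float:
    """Score drop observed when one portion of the structured file is removed."""
    baseline = model(extract(document))
    # Removing a key keeps the stand-in "file" valid in its format.
    modified = {name: body for name, body in document.items() if name != portion}
    return baseline - model(extract(modified))
```

A large positive value suggests the removed portion contributed strongly to the original malicious classification; a value near zero suggests it did not.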

Type: Grant

Filed: January 31, 2018

Date of Patent: March 26, 2024

Assignee: Sophos Limited

Inventors: Richard Harang, Joshua Daniel Saxe

Computer augmented threat evaluation

Publication number: 20240062133

Abstract: An automated system attempts to characterize code as safe or unsafe. For intermediate code samples not placed with sufficient confidence in either category, human-readable analysis is automatically generated to assist a human reviewer in reaching a final disposition. For example, a random forest over human-interpretable features may be created and used to identify suspicious features in a manner that is understandable to, and actionable by, a human reviewer. Similarly, a k-nearest neighbor algorithm may be used to identify similar samples of known safe and unsafe code based on a model for, e.g., a file path, a URL, an executable, and so forth. Similar code may then be displayed (with other information) to a user for evaluation in a user interface. This comparative information can improve the speed and accuracy of human interventions by providing richer context for human review of potential threats.
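The nearest-neighbor lookup mentioned in this abstract might look like the following for file paths. Character trigrams and Jaccard similarity are assumptions chosen to keep the example small; the abstract does not specify the underlying model or distance.

```python
def trigrams(text: str) -> set[str]:
    """Set of overlapping three-character substrings."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two trigram sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_labeled(query: str, labeled: list[tuple[str, str]], k: int = 2):
    """Return the k known (path, label) samples most similar to the query path."""
    q = trigrams(query.lower())
    ranked = sorted(
        labeled,
        key=lambda item: jaccard(q, trigrams(item[0].lower())),
        reverse=True,
    )
    return ranked[:k]
```

The returned neighbors, with their known safe or unsafe labels, are the kind of comparative context a reviewer could be shown next to the sample under review.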

Type: Application

Filed: September 7, 2023

Publication date: February 22, 2024

Inventors: Joshua Daniel Saxe, Andrew J. Thomas, Russell Humphries, Simon Neil Reed, Kenneth D. Ray, Joseph H. Levy

Methods and apparatus for detecting whether a string of characters represents malicious activity using machine learning

Patent number: 11853427

Abstract: In some embodiments, a processor can receive an input string associated with a potentially malicious artifact and convert each character in the input string into a vector of values to define a character matrix. The processor can apply a convolution matrix to a first window of the character matrix to define a first subscore, apply the convolution matrix to a second window of the character matrix to define a second subscore and combine the first subscore and the second subscore to define a score for the convolution matrix. The processor can provide the score for the convolution matrix as an input to a machine learning threat model, identify the potentially malicious artifact as malicious based on an output of the machine learning threat model, and perform a remedial action on the potentially malicious artifact based on identifying the potentially malicious artifact as malicious.

Type: Grant

Filed: December 19, 2022

Date of Patent: December 26, 2023

Assignee: Invincea, Inc.

Inventor: Joshua Daniel Saxe

Methods and apparatus for machine learning based malware detection

Patent number: 11841947

Abstract: Apparatus and methods describe herein, for example, a process that can include receiving a potentially malicious file, and dividing the potentially malicious file into a set of byte windows. The process can include calculating at least one attribute associated with each byte window from the set of byte windows for the potentially malicious file. In such an instance, the at least one attribute is not dependent on an order of bytes in the potentially malicious file. The process can further include identifying a probability that the potentially malicious file is malicious, based at least in part on the at least one attribute and a trained threat model.
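A small sketch of byte-window attributes that do not depend on byte order. Shannon entropy per window, collected into a histogram, is used here as one example of such an attribute; the window size, the nine integer bins, and the function names are assumptions, and the trained threat model that would consume the histogram is not shown.

```python
import math
from collections import Counter


def byte_windows(data: bytes, size: int) -> list[bytes]:
    """Divide the file into consecutive windows of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def entropy(window: bytes) -> float:
    """Shannon entropy in bits per byte; unaffected by byte order in the window."""
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in Counter(window).values())


def entropy_histogram(data: bytes, size: int = 256) -> list[int]:
    """Count windows per integer entropy bin (0 through 8 bits)."""
    histogram = [0] * 9
    for window in byte_windows(data, size):
        histogram[int(entropy(window))] += 1
    return histogram
```

Because the histogram only counts windows per bin, rearranging the windows within the file leaves the feature unchanged.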

Type: Grant

Filed: December 8, 2020

Date of Patent: December 12, 2023

Assignee: Invincea, Inc.

Inventors: Joshua Daniel Saxe, Konstantin Berlin

Methods and apparatus for detection of malicious documents using machine learning

Patent number: 11822374

Abstract: An apparatus for detecting malicious files includes a memory and a processor communicatively coupled to the memory. The processor receives multiple potentially malicious files. A first potentially malicious file has a first file format, and a second potentially malicious file has a second file format different than the first file format. The processor extracts a first set of strings from the first potentially malicious file, and extracts a second set of strings from the second potentially malicious file. First and second feature vectors are defined based on lengths of each string from the associated set of strings. The processor provides the first feature vector as an input to a machine learning model to produce a maliciousness classification of the first potentially malicious file, and provides the second feature vector as an input to the machine learning model to produce a maliciousness classification of the second potentially malicious file.

Type: Grant

Filed: May 7, 2021

Date of Patent: November 21, 2023

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Ethan M. Rudd, Richard Harang

Computer augmented threat evaluation

Patent number: 11755974

Abstract: An automated system attempts to characterize code as safe or unsafe. For intermediate code samples not placed with sufficient confidence in either category, human-readable analysis is automatically generated to assist a human reviewer in reaching a final disposition. For example, a random forest over human-interpretable features may be created and used to identify suspicious features in a manner that is understandable to, and actionable by, a human reviewer. Similarly, a k-nearest neighbor algorithm may be used to identify similar samples of known safe and unsafe code based on a model for, e.g., a file path, a URL, an executable, and so forth. Similar code may then be displayed (with other information) to a user for evaluation in a user interface. This comparative information can improve the speed and accuracy of human interventions by providing richer context for human review of potential threats.

Type: Grant

Filed: March 1, 2021

Date of Patent: September 12, 2023

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Andrew J. Thomas, Russell Humphries, Simon Neil Reed, Kenneth D. Ray, Joseph H. Levy

Methods and apparatus for using machine learning on multiple file fragments to identify malware

Patent number: 11609991

Abstract: In some embodiments, a method includes processing at least a portion of a received file into a first set of fragments and analyzing each fragment from the first set of fragments using a machine learning model to identify within each fragment first information potentially relevant to whether the file is malicious. The method includes forming a second set of fragments by combining adjacent fragments from the first set of fragments and analyzing each fragment from the second set of fragments using the machine learning model to identify second information potentially relevant to whether the file is malicious. The method includes identifying the file as malicious based on the first information within at least one fragment from the first set of fragments and the second information within at least one fragment from the second set of fragments. The method includes performing a remedial action based on identifying the file as malicious.

Type: Grant

Filed: April 21, 2020

Date of Patent: March 21, 2023

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Richard Harang

Computer assisted identification of intermediate level threats

Patent number: 11552962

Abstract: An ensemble of detection techniques are used to identify code that presents intermediate levels of threat. For example, an ensemble of machine learning techniques may be used to evaluate suspiciousness based on binaries, file paths, behaviors, reputations, and so forth, and code may be sorted into safe, unsafe, intermediate, or any similar categories. By filtering and prioritizing intermediate threats with these tools, human threat intervention can advantageously be directed toward code samples and associated contexts most appropriate for non-automated responses.
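The sorting of code into safe, unsafe, and intermediate categories could be sketched as a simple thresholding of averaged detector scores. The averaging, the 0.2 and 0.8 cut-offs, and the detector names used in the test data are illustrative assumptions rather than anything stated in the patent.

```python
from statistics import mean


def triage(scores: dict[str, float], low: float = 0.2, high: float = 0.8) -> str:
    """Combine per-detector suspiciousness scores into one of three categories."""
    combined = mean(scores.values())
    if combined <= low:
        return "safe"
    if combined >= high:
        return "unsafe"
    return "intermediate"


def review_queue(samples: dict[str, dict[str, float]]) -> list[str]:
    """Names of intermediate samples, most suspicious first, for human review."""
    intermediate = [name for name, s in samples.items() if triage(s) == "intermediate"]
    return sorted(intermediate, key=lambda name: mean(samples[name].values()), reverse=True)
```

Only the samples in the queue would be routed to a human reviewer; clear-cut cases at either end are handled automatically.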

Type: Grant

Filed: September 12, 2018

Date of Patent: January 10, 2023

Assignee: Sophos Limited

Inventors: Joshua Daniel Saxe, Andrew J. Thomas, Russell Humphries, Simon Neil Reed, Kenneth D. Ray, Joseph H. Levy

Methods and apparatus for detecting whether a string of characters represents malicious activity using machine learning

Patent number: 11544380

Abstract: In some embodiments, a processor can receive an input string associated with a potentially malicious artifact and convert each character in the input string into a vector of values to define a character matrix. The processor can apply a convolution matrix to a first window of the character matrix to define a first subscore, apply the convolution matrix to a second window of the character matrix to define a second subscore and combine the first subscore and the second subscore to define a score for the convolution matrix. The processor can provide the score for the convolution matrix as an input to a machine learning threat model, identify the potentially malicious artifact as malicious based on an output of the machine learning threat model, and perform a remedial action on the potentially malicious artifact based on identifying the potentially malicious artifact as malicious.

Type: Grant

Filed: December 17, 2020

Date of Patent: January 3, 2023

Assignee: Invincea, Inc.

Inventor: Joshua Daniel Saxe

Programmable feature extractor

Publication number: 20220318665

Abstract: A compute instance stores a programmable feature extractor associated with a machine learning model maintained by a server-based computing system configured to communicate with the compute instance by way of a network. The machine learning model is based on a feature set that includes a plurality of features. The compute instance executes the programmable feature extractor to generate a feature vector corresponding to a data instance accessed by the compute instance, where the feature vector includes a feature value specific to the data instance for each feature included in the feature set. The compute instance transmits the feature vector corresponding to the data instance to the server-based computing system for use as a training input to the machine learning model.

Type: Application

Filed: March 30, 2021

Publication date: October 6, 2022

Inventors: Joseph H. Levy, Kenneth D. Ray, Joshua Daniel Saxe

Programmable feature extractor with anonymization

Publication number: 20220318429

Abstract: A compute instance may be configured to extract a feature of a data instance accessed by the compute instance, generate an anonymized feature value for the feature of the data instance, include the anonymized feature value in a feature vector corresponding to the data instance, and transmit the feature vector to a server-based computing system.

Type: Application

Filed: March 30, 2021

Publication date: October 6, 2022

Inventors: Joseph H. Levy, Kenneth D. Ray, Joshua Daniel Saxe
