Patents by Inventor Asankhaya Sharma

Asankhaya Sharma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Open source vulnerability prediction with machine learning ensemble

Patent number: 11899800

Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.

Type: Grant

Filed: June 28, 2022

Date of Patent: February 13, 2024

Assignee: Veracode, Inc.

Inventors: Asankhaya Sharma, Yaqin Zhou
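
The stacking procedure in the abstract above can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the toy base classifier `MeanOffsetClassifier`, the gradient-descent regressor `fit_logistic`, the synthetic data, and the choice to refit the stored classifiers on the full dataset are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def k_fold_indices(n_samples, k):
    """Partition range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


class MeanOffsetClassifier:
    """Hypothetical toy base classifier: scores one feature against its training mean."""

    def __init__(self, feature):
        self.feature = feature
        self.mean = 0.0

    def fit(self, X, y):
        # The toy model ignores y; a real base classifier would learn from it.
        self.mean = sum(row[self.feature] for row in X) / len(X)
        return self

    def predict_proba(self, row):
        return sigmoid(row[self.feature] - self.mean)


def fit_logistic(P, y, lr=0.5, epochs=500):
    """Train logistic regression on prediction vectors P by batch gradient descent."""
    n = len(P[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for p, target in zip(P, y):
            err = sigmoid(sum(wi * pi for wi, pi in zip(w, p)) + b) - target
            for j in range(n):
                grad_w[j] += err * p[j]
            grad_b += err
        w = [wi - lr * g / len(P) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(P)
    return w, b


def train_ensemble(X, y, make_classifiers, k=4):
    S = len(X)
    meta = [None] * S  # one list of N out-of-fold probabilities per sample
    for test_idx in k_fold_indices(S, k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(S) if i not in held_out]
        train_y = [y[i] for i in range(S) if i not in held_out]
        models = [clf.fit(train_X, train_y) for clf in make_classifiers()]
        for i in test_idx:
            meta[i] = [m.predict_proba(X[i]) for m in models]
    # The S sets of N predictions plus ground truth train the logistic regressor.
    weights = fit_logistic(meta, y)
    # Assumption: the stored classifiers are refit on the full dataset.
    final_models = [clf.fit(X, y) for clf in make_classifiers()]
    return final_models, weights, meta


def predict(models, weights, row):
    w, b = weights
    probs = [m.predict_proba(row) for m in models]
    return sigmoid(sum(wi * pi for wi, pi in zip(w, probs)) + b)


# Synthetic data: feature 0 carries the signal, feature 1 is noise.
X = [[float(i % 10), float((2 * i) % 5)] for i in range(40)]
y = [1 if row[0] >= 5 else 0 for row in X]
models, weights, meta = train_ensemble(
    X, y, lambda: [MeanOffsetClassifier(0), MeanOffsetClassifier(1)]
)
print(predict(models, weights, [9.0, 0.0]) > predict(models, weights, [0.0, 0.0]))
```

Each sample is scored only by models that never saw it during training, so the logistic regressor learns how much to trust each base classifier on unseen data.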

DEVELOPMENT PIPELINE INTEGRATED ONGOING LEARNING FOR ASSISTED CODE REMEDIATION

Publication number: 20230409464

Abstract: With invocations of a software development pipeline, organization specific remediations/fixes for a software project can be learned from scanning results of code submissions (e.g., commits or merges) across an organization for a software project(s). Fixes of detected program code flaws can be detected and/or specified across scans and associated with flaw identifiers and used for training machine learning models to identify candidate fixes for detected flaws. This ongoing learning during development propagates fixes created or chosen by experts (e.g., software engineers working on the software project) relevant to the software project. The experts can choose from suggestions mined from the learned fixes of the organization and suggestions generated from a pipeline created with the trained machine learning models. The selections are then used for further training of the machine learning models that form the pipeline.

Type: Application

Filed: October 29, 2020

Publication date: December 21, 2023

Inventors: Asankhaya Sharma, Hao Xiao, Hendy Heng Lee Chua, Darius Tsien Wei Foo

DEIDENTIFYING CODE FOR CROSS-ORGANIZATION REMEDIATION KNOWLEDGE

Publication number: 20230153459

Abstract: To preserve privacy when leveraging organization-specific remediation knowledge for flaw remediation across organizations, program code is deidentified to remove code which potentially identifies its source/origin. Deidentification operates based on structure of flaws and fixes at the level of source code constructs based on an abstract syntax tree (AST) or other structural context representation of a fix and corresponding flaw. Potentially identifying portions of a fix indicated in its AST are determined and modified (e.g., removed or obfuscated) without impacting AST structure. Deidentified remediation knowledge originating from different organizations is used to train a fix suggestion model(s) which learns structural context of fixes and corresponding flaws and, once trained, generates predictions indicating suggested fixes to flaws based on structural contexts of the flaws.

Type: Application

Filed: November 10, 2020

Publication date: May 18, 2023

Inventors: Asankhaya Sharma, Hao Xiao, Hendy Heng Lee Chua, Darius Tsien Wei Foo
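
The idea of modifying identifying portions of a fix without changing AST structure can be illustrated with Python's standard `ast` module. This is a rough sketch under assumptions: the `Deidentifier` class, the `id_N` aliases, and the `"STR"` placeholder are hypothetical choices, and the sketch leaves attribute names and class names untouched.

```python
import ast
import builtins


class Deidentifier(ast.NodeTransformer):
    """Rename identifiers and blank string literals; node types and tree shape stay the same."""

    def __init__(self):
        self.aliases = {}

    def _alias(self, original):
        # Keep builtins such as print or len readable; they identify nobody.
        if hasattr(builtins, original):
            return original
        if original not in self.aliases:
            self.aliases[original] = f"id_{len(self.aliases)}"
        return self.aliases[original]

    def visit_FunctionDef(self, node):
        node.name = self._alias(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._alias(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._alias(node.id)
        return node

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            node.value = "STR"  # placeholder literal, same node type
        return node


def shape(tree):
    """Sequence of node type names, used to check the structure is unchanged."""
    return [type(n).__name__ for n in ast.walk(tree)]


def deidentify(source):
    tree = ast.parse(source)
    before = shape(tree)
    tree = Deidentifier().visit(tree)
    if shape(tree) != before:
        raise ValueError("deidentification changed the AST structure")
    return ast.unparse(tree)  # requires Python 3.9+


fix = (
    "def load_acme_user(acme_id):\n"
    "    query = 'SELECT * FROM acme_users WHERE id = ' + acme_id\n"
    "    return run(query)\n"
)
print(deidentify(fix))
```

The organization-specific names and the table name in the string literal disappear, while the sequence of node types that a structural model would learn from is preserved.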

OPEN SOURCE VULNERABILITY PREDICTION WITH MACHINE LEARNING ENSEMBLE

Publication number: 20220327220

Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.

Type: Application

Filed: June 28, 2022

Publication date: October 13, 2022

Inventors: Asankhaya Sharma, Yaqin Zhou

Open source vulnerability prediction with machine learning ensemble

Patent number: 11416622

Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.

Type: Grant

Filed: August 20, 2018

Date of Patent: August 16, 2022

Assignee: Veracode, Inc.

Inventors: Asankhaya Sharma, Yaqin Zhou

Software vulnerability graph database

Patent number: 10803061

Abstract: To analyze open-source code at a large scale, a security domain graph language (“SGL”) has been created that functions as a vulnerability description language and facilitates program analysis queries. The SGL facilitates building and maintaining a graph database to catalogue vulnerabilities found in open-source components. This graphical database can be accessed via a database interface directly or accessed by an agent that interacts with the database interface. To build the graph database, a database interface processes an open-source component and creates graph structures which represent relationships present in the open-source component. The database interface transforms a vulnerability description into a canonical form based on a schema for the graph database and updates the database based on a determination of whether the vulnerability is a duplicate. This ensures quality and consistency of the vulnerability dataset maintained in the graph database.

Type: Grant

Filed: July 31, 2018

Date of Patent: October 13, 2020

Assignee: Veracode, Inc.

Inventors: Darius Tsien Wei Foo, Ming Yi Ang, Jie Shun Yeo, Asankhaya Sharma
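
The canonicalize-then-deduplicate step described in the abstract above might look roughly like the following in-memory sketch. The schema (node keys, the `affects` and `depends_on` edge labels, and the fields of a vulnerability description) is hypothetical; the abstract does not specify it, and this is not SGL.

```python
class VulnerabilityGraph:
    """Toy in-memory graph: nodes keyed by canonical tuples, edges as labeled triples."""

    def __init__(self):
        self.nodes = {}     # canonical key -> properties
        self.edges = set()  # (source key, label, destination key)

    def add_component(self, name, version, depends_on=()):
        """Record a component and its dependency relationships as graph structures."""
        key = ("library", name.strip().lower(), version)
        self.nodes.setdefault(key, {})
        for dep_name, dep_version in depends_on:
            dep_key = ("library", dep_name.strip().lower(), dep_version)
            self.nodes.setdefault(dep_key, {})
            self.edges.add((key, "depends_on", dep_key))
        return key

    @staticmethod
    def canonicalize(description):
        """Normalize a vulnerability description into one canonical key."""
        return (
            "vulnerability",
            description["id"].strip().upper(),
            description["component"].strip().lower(),
            tuple(sorted(set(description["versions"]))),
        )

    def add_vulnerability(self, description):
        """Insert the vulnerability unless its canonical form is already stored."""
        key = self.canonicalize(description)
        if key in self.nodes:
            return False  # duplicate, database left unchanged
        self.nodes[key] = {"summary": description.get("summary", "")}
        for version in key[3]:
            library = ("library", key[2], version)
            self.nodes.setdefault(library, {})
            self.edges.add((key, "affects", library))
        return True


graph = VulnerabilityGraph()
graph.add_component("WebLib", "1.0", depends_on=[("ParseLib", "2.1")])
first = graph.add_vulnerability(
    {"id": "vuln-001", "component": "ParseLib", "versions": ["2.1", "2.0"]}
)
second = graph.add_vulnerability(
    {"id": " VULN-001 ", "component": "parselib", "versions": ["2.0", "2.1", "2.1"]}
)
print(first, second)
```

Because both descriptions normalize to the same key, the second submission is recognized as a duplicate even though its casing, whitespace, and version order differ.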

OPEN SOURCE VULNERABILITY PREDICTION WITH MACHINE LEARNING ENSEMBLE

Publication number: 20200057858

Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.

Type: Application

Filed: August 20, 2018

Publication date: February 20, 2020

Inventors: Asankhaya Sharma, Yaqin Zhou

OPEN-SOURCE SOFTWARE VULNERABILITY ANALYSIS

Publication number: 20200042712

Abstract: To analyze open-source code at a large scale, a security domain graph language (“SGL”) has been created that functions as a vulnerability description language and facilitates program analysis queries. The SGL facilitates building and maintaining a graph database to catalogue vulnerabilities found in open-source components. This vulnerability database generated with SGL is used for analysis of software projects which use open source components. An agent which interacts with the vulnerability database can perform a scan of a software project to identify open-source components used in the project and submit queries to the vulnerability database to identify vulnerabilities which may affect the open-source components in the project. Results of the scan are presented to a user in the form of a vulnerability report which indicates vulnerabilities that have been discovered and which open-source components the vulnerabilities affect.

Type: Application

Filed: July 31, 2018

Publication date: February 6, 2020

Inventors: Darius Tsien Wei Foo, Ming Yi Ang, Jie Shun Yeo, Asankhaya Sharma

SOFTWARE VULNERABILITY GRAPH DATABASE

Publication number: 20200042628

Abstract: To analyze open-source code at a large scale, a security domain graph language (“SGL”) has been created that functions as a vulnerability description language and facilitates program analysis queries. The SGL facilitates building and maintaining a graph database to catalogue vulnerabilities found in open-source components. This graphical database can be accessed via a database interface directly or accessed by an agent that interacts with the database interface. To build the graph database, a database interface processes an open-source component and creates graph structures which represent relationships present in the open-source component. The database interface transforms a vulnerability description into a canonical form based on a schema for the graph database and updates the database based on a determination of whether the vulnerability is a duplicate. This ensures quality and consistency of the vulnerability dataset maintained in the graph database.

Type: Application

Filed: July 31, 2018

Publication date: February 6, 2020

Inventors: Darius Tsien Wei Foo, Ming Yi Ang, Jie Shun Yeo, Asankhaya Sharma

SIGNATURES FOR SOFTWARE COMPONENTS

Publication number: 20160098563

Abstract: A facility for analyzing a pair of code files is described. From each of the code files, the facility extracts a hierarchy of textual names. The facility then determines the score reflecting a level of similarity between the extracted hierarchies of textual names for attribution to the pair of code files.

Type: Application

Filed: October 3, 2014

Publication date: April 7, 2016

Inventor: Asankhaya Sharma
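
As an illustration of comparing hierarchies of textual names, the sketch below extracts nested class and function names from two Python files and scores their overlap. The abstract does not say how the hierarchy is extracted or which score is used; parsing with `ast` and using Jaccard similarity over dotted paths are assumptions made for this example.

```python
import ast


def name_hierarchy(source):
    """Return the set of dotted paths for nested class and function names."""
    paths = set()

    def walk(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                path = prefix + (child.name,)
                paths.add(".".join(path))
                walk(child, path)
            else:
                walk(child, prefix)

    walk(ast.parse(source), ())
    return paths


def similarity(source_a, source_b):
    """Jaccard similarity of the two name hierarchies, in the range 0.0 to 1.0."""
    names_a, names_b = name_hierarchy(source_a), name_hierarchy(source_b)
    if not names_a and not names_b:
        return 1.0
    return len(names_a & names_b) / len(names_a | names_b)


file_a = (
    "class Parser:\n"
    "    def parse(self): pass\n"
    "    def reset(self): pass\n"
)
file_b = (
    "class Parser:\n"
    "    def parse(self): return 1\n"
    "    def dump(self): return 2\n"
)
print(similarity(file_a, file_b))  # 2 shared paths out of 4 distinct -> 0.5
```

Only names and their nesting contribute to the score, so two files with different method bodies but the same declared structure still compare as similar.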

DISPLAYING CONTENT ON MULTIPLE WEB PAGES

Publication number: 20110126113

Abstract: Aspects of the subject matter described herein relate to displaying content on multiple pages. In aspects, a request for content is received from a browsing component. The content is divided into pages suitable for displaying on a display associated with the browsing component. Navigation elements may be embedded in the pages to allow a user using the browsing component to navigate between pages corresponding to the content. The actions of dividing the content into multiple pages may occur on a content server, an entity intermediate to the content server and a client hosting the browsing component, or a component of the client.

Type: Application

Filed: November 23, 2009

Publication date: May 26, 2011

Applicant: c/o Microsoft Corporation

Inventors: Asankhaya Sharma, Prashant Kumar Dhingra
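
The division of content into pages with embedded navigation elements can be sketched as follows. The character budget per page, the `?page=` URL scheme, and the function names are hypothetical; the abstract does not specify how page size is derived from the display.

```python
import html


def paginate(paragraphs, max_chars):
    """Group paragraphs into pages whose text fits a per-page character budget."""
    pages, current, used = [], [], 0
    for paragraph in paragraphs:
        # Start a new page when adding this paragraph would exceed the budget.
        if current and used + len(paragraph) > max_chars:
            pages.append(current)
            current, used = [], 0
        current.append(paragraph)
        used += len(paragraph)
    if current:
        pages.append(current)
    return pages


def render(pages, base_url):
    """Render each page as HTML with Previous/Next links embedded."""
    rendered = []
    for number, page in enumerate(pages, start=1):
        links = []
        if number > 1:
            links.append(f'<a href="{base_url}?page={number - 1}">Previous</a>')
        if number < len(pages):
            links.append(f'<a href="{base_url}?page={number + 1}">Next</a>')
        body = "".join(f"<p>{html.escape(p)}</p>" for p in page)
        rendered.append(body + "<nav>" + " ".join(links) + "</nav>")
    return rendered


pages = paginate(["aaaa", "bbbb", "cc"], max_chars=6)
for page_html in render(pages, "/article"):
    print(page_html)
```

The same two functions could run on a content server, on an intermediary, or in the client, which matches the abstract's point that the split may happen at any of those places.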