Patents by Inventor Asankhaya Sharma
Asankhaya Sharma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11899800
Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.
Type: Grant
Filed: June 28, 2022
Date of Patent: February 13, 2024
Assignee: Veracode, Inc.
Inventors: Asankhaya Sharma, Yaqin Zhou
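The stacking procedure in the abstract above can be illustrated with a small, dependency-free sketch. It is not the patented implementation: the toy `MidpointClassifier`, the helper names, the sample data, and the choice to refit the base classifiers on the full dataset before storing them are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def k_folds(n_samples, k):
    """Partition indices 0..n_samples-1 into k interleaved folds."""
    return [list(range(n_samples))[i::k] for i in range(k)]

class MidpointClassifier:
    """Hypothetical base classifier: scores one feature against the
    midpoint between the two class means."""
    def __init__(self, feature):
        self.feature, self.mid, self.sign = feature, 0.0, 1.0

    def fit(self, X, y):
        pos = [x[self.feature] for x, label in zip(X, y) if label == 1]
        neg = [x[self.feature] for x, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        self.mid = (mean_pos + mean_neg) / 2
        self.sign = 1.0 if mean_pos >= mean_neg else -1.0
        return self

    def predict_proba(self, x):
        return sigmoid(self.sign * (x[self.feature] - self.mid))

def train_logistic(rows, labels, steps=2000, lr=1.0):
    """Fit a logistic regression model with full-batch gradient descent."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(steps):
        errors = [sigmoid(bias + sum(w * v for w, v in zip(weights, row))) - label
                  for row, label in zip(rows, labels)]
        weights = [w - lr * sum(e * row[j] for e, row in zip(errors, rows)) / n
                   for j, w in enumerate(weights)]
        bias -= lr * sum(errors) / n
    return weights, bias

def build_ensemble(X, y, make_classifiers, k=3):
    # For each fold, train the N classifiers on the other folds and record
    # their N probabilities for every held-out sample (S rows in total).
    meta_rows = [None] * len(X)
    for test_idx in k_folds(len(X), k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        models = [c.fit(train_X, train_y) for c in make_classifiers()]
        for i in test_idx:
            meta_rows[i] = [m.predict_proba(X[i]) for m in models]
    # Train the logistic regressor on the S sets of N predictions plus ground truth.
    meta_model = train_logistic(meta_rows, y)
    # Assumption: refit the base classifiers on all data before storing them.
    return [c.fit(X, y) for c in make_classifiers()], meta_model

def ensemble_predict(ensemble, x):
    models, (weights, bias) = ensemble
    probs = [m.predict_proba(x) for m in models]
    return sigmoid(bias + sum(w * p for w, p in zip(weights, probs)))

# Toy data: label 1 means "has a flaw"; both features are made up.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.0, 0.4], [0.2, 0.0], [0.4, 0.1],
     [0.9, 0.8], [0.8, 1.0], [1.0, 0.9], [0.7, 0.9], [0.9, 0.7], [1.0, 1.0]]
y = [0] * 6 + [1] * 6
make_classifiers = lambda: [MidpointClassifier(0), MidpointClassifier(1)]
ensemble = build_ensemble(X, y, make_classifiers, k=3)
```

Here N is 2 and S is 12. Calling `ensemble_predict(ensemble, sample)` returns the stacked probability that `sample` has a flaw.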
-
Publication number: 20230409464
Abstract: With invocations of a software development pipeline, organization specific remediations/fixes for a software project can be learned from scanning results of code submissions (e.g., commits or merges) across an organization for a software project(s). Fixes of detected program code flaws can be detected and/or specified across scans and associated with flaw identifiers and used for training machine learning models to identify candidate fixes for detected flaws. This ongoing learning during development propagates fixes created or chosen by experts (e.g., software engineers working on the software project) relevant to the software project. The experts can choose from suggestions mined from the learned fixes of the organization and suggestions generated from a pipeline created with the trained machine learning models. The selections are then used for further training of the machine learning models that form the pipeline.
Type: Application
Filed: October 29, 2020
Publication date: December 21, 2023
Inventors: Asankhaya Sharma, Hao Xiao, Hendy Heng Lee Chua, Darius Tsien Wei Foo
-
Publication number: 20230153459
Abstract: To preserve privacy when leveraging organization-specific remediation knowledge for flaw remediation across organizations, program code is deidentified to remove code which potentially identifies its source/origin. Deidentification operates based on structure of flaws and fixes at the level of source code constructs based on an abstract syntax tree (AST) or other structural context representation of a fix and corresponding flaw. Potentially identifying portions of a fix indicated in its AST are determined and modified (e.g., removed or obfuscated) without impacting AST structure. Deidentified remediation knowledge originating from different organizations is used to train a fix suggestion model(s) which learns structural context of fixes and corresponding flaws and, once trained, generates predictions indicating suggested fixes to flaws based on structural contexts of the flaws.
Type: Application
Filed: November 10, 2020
Publication date: May 18, 2023
Inventors: Asankhaya Sharma, Hao Xiao, Hendy Heng Lee Chua, Darius Tsien Wei Foo
-
Publication number: 20220327220
Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.
Type: Application
Filed: June 28, 2022
Publication date: October 13, 2022
Inventors: Asankhaya Sharma, Yaqin Zhou
-
Patent number: 11416622
Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.
Type: Grant
Filed: August 20, 2018
Date of Patent: August 16, 2022
Assignee: VERACODE, INC.
Inventors: Asankhaya Sharma, Yaqin Zhou
-
Patent number: 10803061
Abstract: To analyze open-source code at a large scale, a security domain graph language (“SGL”) has been created that functions as a vulnerability description language and facilitates program analysis queries. The SGL facilitates building and maintaining a graph database to catalogue vulnerabilities found in open-source components. This graphical database can be accessed via a database interface directly or accessed by an agent that interacts with the database interface. To build the graph database, a database interface processes an open-source component and creates graph structures which represent relationships present in the open-source component. The database interface transforms a vulnerability description into a canonical form based on a schema for the graph database and updates the database based on a determination of whether the vulnerability is a duplicate. This ensures quality and consistency of the vulnerability dataset maintained in the graph database.
Type: Grant
Filed: July 31, 2018
Date of Patent: October 13, 2020
Assignee: Veracode, Inc.
Inventors: Darius Tsien Wei Foo, Ming Yi Ang, Jie Shun Yeo, Asankhaya Sharma
-
Publication number: 20200057858
Abstract: A system to create a stacked classifier model combination or classifier ensemble has been designed for identification of undisclosed flaws in software components on a large-scale. This classifier ensemble is capable of at least a 54.55% improvement in precision. The system uses a K-folding cross validation algorithm to partition a sample dataset and then train and test a set of N classifiers with the dataset folds. At each test iteration, trained models of the set of classifiers generate probabilities that a sample has a flaw, resulting in a set of N probabilities or predictions for each sample in the test data. With a sample size of S, the system passes the S sets of N predictions to a logistic regressor along with “ground truth” for the sample dataset to train a logistic regression model. The trained classifiers and the logistic regression model are stored as the classifier ensemble.
Type: Application
Filed: August 20, 2018
Publication date: February 20, 2020
Inventors: Asankhaya Sharma, Yaqin Zhou
-
Publication number: 20200042712
Abstract: To analyze open-source code at a large scale, a security domain graph language (“SGL”) has been created that functions as a vulnerability description language and facilitates program analysis queries. The SGL facilitates building and maintaining a graph database to catalogue vulnerabilities found in open-source components. This vulnerability database generated with SGL is used for analysis of software projects which use open source components. An agent which interacts with the vulnerability database can perform a scan of a software project to identify open-source components used in the project and submit queries to the vulnerability database to identify vulnerabilities which may affect the open-source components in the project. Results of the scan are presented to a user in the form of a vulnerability report which indicates vulnerabilities that have been discovered and which open-source components the vulnerabilities affect.
Type: Application
Filed: July 31, 2018
Publication date: February 6, 2020
Inventors: Darius Tsien Wei Foo, Ming Yi Ang, Jie Shun Yeo, Asankhaya Sharma
-
Publication number: 20200042628
Abstract: To analyze open-source code at a large scale, a security domain graph language (“SGL”) has been created that functions as a vulnerability description language and facilitates program analysis queries. The SGL facilitates building and maintaining a graph database to catalogue vulnerabilities found in open-source components. This graphical database can be accessed via a database interface directly or accessed by an agent that interacts with the database interface. To build the graph database, a database interface processes an open-source component and creates graph structures which represent relationships present in the open-source component. The database interface transforms a vulnerability description into a canonical form based on a schema for the graph database and updates the database based on a determination of whether the vulnerability is a duplicate. This ensures quality and consistency of the vulnerability dataset maintained in the graph database.
Type: Application
Filed: July 31, 2018
Publication date: February 6, 2020
Inventors: Darius Tsien Wei Foo, Ming Yi Ang, Jie Shun Yeo, Asankhaya Sharma
-
Publication number: 20160098563
Abstract: A facility for analyzing a pair of code files is described. From each of the code files, the facility extracts a hierarchy of textual names. The facility then determines the score reflecting a level of similarity between the extracted hierarchies of textual names for attribution to the pair of code files.
Type: Application
Filed: October 3, 2014
Publication date: April 7, 2016
Inventor: Asankhaya Sharma
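One way to picture the idea in the abstract above is the following sketch. The abstract does not say how names are extracted or how the score is computed, so both choices here are assumptions: the hierarchy is taken from nested class and function definitions in Python source via the standard `ast` module, and the score is the Jaccard index of the resulting dotted paths. The function names `name_paths` and `similarity` are hypothetical.

```python
import ast

def name_paths(source):
    """Collect dotted paths (e.g. "Shape.area") for nested class and
    function definitions found in Python source text."""
    paths = set()

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                path = prefix + (child.name,)
                paths.add(".".join(path))
                visit(child, path)      # descend with the extended prefix
            else:
                visit(child, prefix)    # keep looking inside other nodes

    visit(ast.parse(source), ())
    return paths

def similarity(source_a, source_b):
    """Jaccard index of the two name hierarchies (an assumed scoring rule)."""
    a, b = name_paths(source_a), name_paths(source_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

FILE_A = """
class Shape:
    def area(self):
        return 0
    def name(self):
        return "shape"
"""

FILE_B = """
class Shape:
    def area(self):
        return 1
    def perimeter(self):
        return 2
"""

score = similarity(FILE_A, FILE_B)
```

The two sample files share `Shape` and `Shape.area` out of four distinct paths, so `score` is 0.5.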
-
Publication number: 20110126113
Abstract: Aspects of the subject matter described herein relate to displaying content on multiple pages. In aspects, a request for content is received from a browsing component. The content is divided into pages suitable for displaying on a display associated with the browsing component. Navigation elements may be embedded in the pages to allow a user using the browsing component to navigate between pages corresponding to the content. The actions of dividing the content into multiple pages may occur on a content server, an entity intermediate to the content server and a client hosting the browsing component, or a component of the client.
Type: Application
Filed: November 23, 2009
Publication date: May 26, 2011
Applicant: c/o Microsoft Corporation
Inventors: Asankhaya Sharma, Prashant Kumar Dhingra
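The page-splitting idea in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the method claimed in the application: the display capacity is modeled as a character budget (`max_chars`), pages break at word boundaries, and the `?page=N` link scheme and the function names are hypothetical.

```python
import html

def paginate(text, max_chars):
    """Split text into pages of at most max_chars characters, breaking at
    word boundaries (a single overlong word gets a page of its own)."""
    pages, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) > max_chars:
            pages.append(current)
            current = word
        else:
            current = candidate
    if current:
        pages.append(current)
    return pages

def render_page(pages, index):
    """Wrap one page in markup and embed previous/next navigation links.
    Page numbers in the links are 1-based."""
    nav = []
    if index > 0:
        nav.append(f'<a href="?page={index}">Previous</a>')
    if index < len(pages) - 1:
        nav.append(f'<a href="?page={index + 2}">Next</a>')
    body = html.escape(pages[index])
    return f"<p>{body}</p><nav>{' | '.join(nav)}</nav>"

pages = paginate("one two three four five", max_chars=9)
first = render_page(pages, 0)
```

The same two functions could run on a content server, on an intermediary, or in the client, which mirrors the three placements the abstract lists.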