Patents by Inventor Yunhui Zheng

Yunhui Zheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Training data augmentation via program simplification

Patent number: 11947940

Abstract: Techniques regarding augmenting one or more training datasets for training one or more AI models are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training augmentation component that can generate an augmented training dataset for training an artificial intelligence model by extracting a simplified source code sample from a source code sample comprised within a training dataset.

Type: Grant

Filed: October 11, 2021

Date of Patent: April 2, 2024

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
Contextual embeddings for improving static analyzer output

Patent number: 11765193

Abstract: In a computer-implemented method for improving a static analyzer output, a processor receives a labeled data set with labeled true vulnerabilities and labeled false vulnerabilities. A processor receives pretrained contextual embeddings from a contextual embeddings model. A processor maps the true vulnerabilities and the false vulnerabilities to the pretrained contextual embeddings model. A processor generates a fine-tuned model with classifications for true vulnerabilities.

Type: Grant

Filed: December 30, 2020

Date of Patent: September 19, 2023

Assignee: International Business Machines Corporation

Inventors: Saurabh Pujar, Luca Buratti, Alessandro Morari, Jim Alain Laredo, Mihaela Ancuta Bornea, Jeffrey Scott McCarley, Yunhui Zheng
Source code fault detection

Patent number: 11762758

Abstract: Approaches presented herein enable fault detection. More specifically, implementation code of one or more functions is identified from source code. The implementation code of the one or more functions is converted to corresponding Abstract Syntax Trees (ASTs). The implementation code of the one or more functions is represented as a first plurality of sets of AST paths over the ASTs. Classification results for the one or more functions are generated with a classifier based on the first plurality of sets of AST paths for the implementation code of the one or more functions. Each of the classification results indicates a probability of having at least one fault in a corresponding function of the one or more functions. Fault detection results of the source code are generated based on the classification results.

Type: Grant

Filed: March 29, 2021

Date of Patent: September 19, 2023

Assignee: International Business Machines Corporation

Inventors: Shiwan Zhao, Bo Yang, HongLei Guo, Zhong Su, Yunhui Zheng, Jim Alain Laredo, Alessandro Morari, Marco Pistoia
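
The path-extraction step described in the abstract above can be illustrated with a minimal sketch. It assumes Python source and the standard `ast` module, and it uses root-to-leaf node-type paths as a simplified stand-in for whatever path definition the patent actually claims. The function names are hypothetical, and the classifier that turns path sets into fault probabilities is not shown.

```python
import ast
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]


def root_to_leaf_paths(node: ast.AST, prefix: Path = ()) -> Set[Path]:
    """Collect node-type paths from `node` down to every leaf below it."""
    path = prefix + (type(node).__name__,)
    children = list(ast.iter_child_nodes(node))
    if not children:
        return {path}
    paths: Set[Path] = set()
    for child in children:
        paths |= root_to_leaf_paths(child, path)
    return paths


def function_path_sets(source: str) -> Dict[str, Set[Path]]:
    """Map each function name in `source` to its set of AST paths."""
    tree = ast.parse(source)
    return {
        n.name: root_to_leaf_paths(n)
        for n in ast.walk(tree)
        if isinstance(n, ast.FunctionDef)
    }


# Illustrative input only.
example_source = "def div(a, b):\n    return a / b\n"
path_sets = function_path_sets(example_source)
```

Each per-function path set would then be handed to a trained classifier, which is outside the scope of this sketch.
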
Analysis to check web API code usage and specification

Patent number: 11663110

Abstract: A debugging tool and method for statically verifying programs that invoke web-based services through API calls is provided. The tool receives source code that comprises one or more invocations of web APIs for requesting web-based services. The tool also receives a set of web API specifications. The tool extracts a set of request information for each web API invocation in the source code, the set of request information including a usage string of a URL endpoint. The tool verifies whether the set of request information complies with the received web API specifications and reports a result of the verification.

Type: Grant

Filed: October 31, 2016

Date of Patent: May 30, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Julian T. Dolby, Jim A. Laredo, John E. Wittern, Annie T. T. Ying, Yunhui Zheng
ARTIFICIAL INTELLIGENCE MODEL LEARNING INTROSPECTION

Publication number: 20230130781

Abstract: Techniques regarding AI model introspection are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a model introspection component that can analyze artificial intelligence model learning behavior for a code understanding task by comparing an output of an artificial intelligence model with respect to a plurality of testing data subsets that have varying code complexity distributions.

Type: Application

Filed: October 21, 2021

Publication date: April 27, 2023

Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
TRAINING DATA AUGMENTATION VIA PROGRAM SIMPLIFICATION

Publication number: 20230113733

Abstract: Techniques regarding augmenting one or more training datasets for training one or more AI models are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training augmentation component that can generate an augmented training dataset for training an artificial intelligence model by extracting a simplified source code sample from a source code sample comprised within a training dataset.

Type: Application

Filed: October 11, 2021

Publication date: April 13, 2023

Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
COMPLEXITY BASED ARTIFICIAL INTELLIGENCE MODEL TRAINING

Publication number: 20230115723

Abstract: Techniques regarding training one or more AI models for a source code understanding task are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training component that can train an artificial intelligence model on source code samples for a source code understanding task. The source code samples can be ranked based on code complexity.

Type: Application

Filed: September 30, 2021

Publication date: April 13, 2023

Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
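
Ranking source code samples by complexity, as the abstract above describes, might look like the following sketch. The complexity measure here is an assumption: a rough count of branching constructs, not the metric used in the application. The names `complexity` and `rank_by_complexity` are hypothetical.

```python
import re
from typing import List

# Rough proxy for cyclomatic complexity: 1 + number of branching constructs.
# This metric is an assumption made for illustration.
_BRANCH = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")


def complexity(code: str) -> int:
    """Return a crude complexity score for a code snippet."""
    return 1 + len(_BRANCH.findall(code))


def rank_by_complexity(samples: List[str]) -> List[str]:
    """Order samples from simplest to most complex."""
    return sorted(samples, key=complexity)


# Illustrative samples only.
samples = [
    "int g(int x){if(x>0&&x<9){for(;;){}}return x;}",
    "int f(){return 1;}",
    "int h(int x){if(x)return 1;return 0;}",
]
ranked = rank_by_complexity(samples)
```

A training loop could then consume `ranked` in order, presenting simpler samples first; whether the application orders samples this way is not stated in the abstract.
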
Probing Model Signal Awareness

Publication number: 20220358400

Abstract: A system, computer program product, and method are provided for probing model signal awareness. An iterative process is employed to systematically isolate one or more relevant tokens of an input sequence to generate a reduced input sequence. The reduced input sequence is validated and presented to a trained artificial intelligence (AI) model and prediction output is generated. The reduction process is continued while the prediction output stays the same as that of the input sequence, and until a minimal sub-sequence is identified. A signal existence in the minimal sub-sequence is verified and signal awareness of the trained AI model is evaluated. The evaluation includes measuring the verified signal existence against an original signal from the input sentence.

Type: Application

Filed: May 10, 2021

Publication date: November 10, 2022

Applicant: International Business Machines Corporation

Inventors: Yunhui Zheng, Sahil Suneja, Yufan Zhuang, Alessandro Morari, Jim Alain Laredo
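
The iterative reduction loop in the abstract above can be sketched as follows. This is a greedy, one-token-at-a-time simplification in the spirit of delta debugging, not necessarily the procedure the application claims. The model (`toy_predict`), the validity check (`toy_is_valid`), and the signal test are all hypothetical stand-ins.

```python
from typing import Callable, List, Sequence


def reduce_input(
    tokens: Sequence[str],
    predict: Callable[[Sequence[str]], int],
    is_valid: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Drop tokens one at a time while the model's prediction is unchanged.

    Stops when no single token can be removed without changing the
    prediction or making the sequence invalid (a 1-minimal sub-sequence).
    """
    current = list(tokens)
    target = predict(current)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if candidate and is_valid(candidate) and predict(candidate) == target:
                current = candidate
                changed = True
                break
    return current


# Hypothetical stand-ins: a "model" that flags any sequence containing
# strcpy, and a validity check that accepts every non-empty sequence.
def toy_predict(seq: Sequence[str]) -> int:
    return int("strcpy" in seq)


def toy_is_valid(seq: Sequence[str]) -> bool:
    return len(seq) > 0


tokens = ["char", "buf", "[", "8", "]", ";", "strcpy", "(", "buf", ",", "src", ")", ";"]
minimal = reduce_input(tokens, toy_predict, toy_is_valid)

# Signal check: does the minimal sub-sequence still contain the known signal?
signal_aware = "strcpy" in minimal
```

In practice the validity check would be something like a parser or compiler run, and the signal test would compare the minimal sub-sequence against the known location of the original signal.
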
SOURCE CODE FAULT DETECTION

Publication number: 20220308984

Abstract: Approaches presented herein enable fault detection. More specifically, implementation code of one or more functions is identified from source code. The implementation code of the one or more functions is converted to corresponding Abstract Syntax Trees (ASTs). The implementation code of the one or more functions is represented as a first plurality of sets of AST paths over the ASTs. Classification results for the one or more functions are generated with a classifier based on the first plurality of sets of AST paths for the implementation code of the one or more functions. Each of the classification results indicates a probability of having at least one fault in a corresponding function of the one or more functions. Fault detection results of the source code are generated based on the classification results.

Type: Application

Filed: March 29, 2021

Publication date: September 29, 2022

Inventors: Shiwan Zhao, Bo Yang, HongLei Guo, Zhong Su, Yunhui Zheng, Jim Alain Laredo, Alessandro Morari, Marco Pistoia
CONTEXTUAL EMBEDDINGS FOR IMPROVING STATIC ANALYZER OUTPUT

Publication number: 20220210178

Abstract: In a computer-implemented method for improving a static analyzer output, a processor receives a labeled data set with labeled true vulnerabilities and labeled false vulnerabilities. A processor receives pretrained contextual embeddings from a contextual embeddings model. A processor maps the true vulnerabilities and the false vulnerabilities to the pretrained contextual embeddings model. A processor generates a fine-tuned model with classifications for true vulnerabilities.

Type: Application

Filed: December 30, 2020

Publication date: June 30, 2022

Inventors: Saurabh Pujar, Luca Buratti, Alessandro Morari, Jim Alain Laredo, Mihaela Ancuta Bornea, Jeffrey Scott McCarley, Yunhui Zheng
Synthesizing sanitization code for applications based upon probabilistic prediction model

Patent number: 11194908

Abstract: Synthesizing sanitization code for applications based upon a probabilistic prediction model includes receiving a set of applications. The set of applications is partitioned into a first subset of applications and a second subset of applications. The first subset has one or more malicious payloads associated therewith, and the second subset has one or more non-malicious payloads associated therewith. A probabilistic prediction model is computed based upon the malicious payloads associated with the first subset of applications. One or more predicted malicious payloads are predicted from the probabilistic prediction model.

Type: Grant

Filed: January 8, 2019

Date of Patent: December 7, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Peng Liu, Yunhui Zheng, Marco Pistoia, Omer Tripp
Probabilistic matching of web application program interface code usage to specifications

Patent number: 11113029

Abstract: A method and system of matching an application program interface (API) code usage with an API specification are provided. A program having an API code usage is received and its features are extracted therefrom. Features from meta data of a plurality of API specifications are extracted. For each API specification of the plurality of API specifications, a match probability with the API code usage is determined. An API specification having a highest probability is determined. The API code usage is matched with the API specification having the highest probability.

Type: Grant

Filed: April 10, 2019

Date of Patent: September 7, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Annie T. Ying, Christopher Charles Young, John Erik Wittern, Yunhui Zheng, Jim Laredo, Aleksander Slominski
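
The matching flow in the abstract above (extract features, score each specification, pick the most probable) can be sketched as follows. The feature extraction and the scoring are assumptions: word-token sets and normalized Jaccard similarity stand in for whatever probabilistic model the patent uses. All names and example URLs are hypothetical.

```python
import re
from typing import Dict, Set, Tuple


def features(text: str) -> Set[str]:
    """Extract a bag of lowercase alphanumeric tokens as features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_spec(usage: str, specs: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching spec name and a probability per spec."""
    if not specs:
        raise ValueError("at least one API specification is required")
    usage_feats = features(usage)
    scores: Dict[str, float] = {}
    for name, metadata in specs.items():
        spec_feats = features(metadata)
        union = usage_feats | spec_feats
        # Jaccard similarity, used here as an assumed scoring function.
        scores[name] = len(usage_feats & spec_feats) / len(union) if union else 0.0
    total = sum(scores.values())
    # Normalize scores so they sum to 1; fall back to uniform if all are 0.
    probs = {
        name: (score / total if total else 1.0 / len(scores))
        for name, score in scores.items()
    }
    best = max(probs, key=probs.get)
    return best, probs


# Illustrative inputs only.
usage = 'requests.get("https://api.example.com/v1/weather?city=Paris")'
specs = {
    "weather": "api.example.com /v1/weather city forecast",
    "payments": "pay.example.com /v2/charges amount currency",
}
best, probs = match_spec(usage, specs)
```
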
Gray-box testing based on concrete usages

Patent number: 11074158

Abstract: A computer implemented method for testing an application according to usage data includes receiving an application to be tested and a set of usage data corresponding to the application to be tested, wherein the set of usage data corresponds to previously executed code sequences, identifying one or more code sequences of interest corresponding to the received application, wherein the code sequences of interest correspond to code sequences that are configured to exercise the received application, extracting concrete usages of the code sequences of interest from the received set of usage data, generating one or more test cases for the application according to the extracted usages, and providing the one or more generated test cases. The method may additionally include testing the application according to the one or more generated test cases.

Type: Grant

Filed: December 1, 2017

Date of Patent: July 27, 2021

Assignee: International Business Machines Corporation

Inventors: Jim A. Laredo, Aleksander Slominski, John E. Wittern, Annie T. Ying, Christopher C. Young, Yunhui Zheng
Approach to summarize code usage

Patent number: 11048505

Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.

Type: Grant

Filed: December 18, 2019

Date of Patent: June 29, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
Probabilistic Matching of Web Application Program Interface Code Usage to Specifications

Publication number: 20200326913

Abstract: A method and system of matching an application program interface (API) code usage with an API specification are provided. A program having an API code usage is received and its features are extracted therefrom. Features from meta data of a plurality of API specifications are extracted. For each API specification of the plurality of API specifications, a match probability with the API code usage is determined. An API specification having a highest probability is determined. The API code usage is matched with the API specification having the highest probability.

Type: Application

Filed: April 10, 2019

Publication date: October 15, 2020

Inventors: Annie T. Ying, Christopher Charles Young, John Erik Wittern, Yunhui Zheng, Jim Laredo, Aleksander Slominski
SYNTHESIZING SANITIZATION CODE FOR APPLICATIONS BASED UPON PROBABILISTIC PREDICTION MODEL

Publication number: 20200218805

Abstract: Synthesizing sanitization code for applications based upon a probabilistic prediction model includes receiving a set of applications. The set of applications is partitioned into a first subset of applications and a second subset of applications. The first subset has one or more malicious payloads associated therewith, and the second subset has one or more non-malicious payloads associated therewith. A probabilistic prediction model is computed based upon the malicious payloads associated with the first subset of applications. One or more predicted malicious payloads are predicted from the probabilistic prediction model.

Type: Application

Filed: January 8, 2019

Publication date: July 9, 2020

Applicant: International Business Machines Corporation

Inventors: Peng Liu, Yunhui Zheng, Marco Pistoia, Omer Tripp
APPROACH TO SUMMARIZE CODE USAGE

Publication number: 20200125362

Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.

Type: Application

Filed: December 18, 2019

Publication date: April 23, 2020

Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
Approach to summarize code usage

Patent number: 10620949

Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.

Type: Grant

Filed: December 14, 2017

Date of Patent: April 14, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
Approach to summarize code usage

Patent number: 10558459

Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.

Type: Grant

Filed: May 12, 2017

Date of Patent: February 11, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
Automatically running tests against WEB APIs based on specifications

Patent number: 10409711

Abstract: A method and system of determining whether a specification is an accurate representation of an application program interface (API) is provided. The specification is received electronically over a network. Service calls to be tested are identified based on the specification. A test case is created for each of the identified service calls. A sequence is created for the test cases. A test plan is generated based on the created sequence. The generated test plan is executed. Upon identifying an error in response to the executed test plan, a notification is generated, indicating that the specification is not an accurate representation of the API.

Type: Grant

Filed: June 12, 2017

Date of Patent: September 10, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Julian Timothy Dolby, Jim Alain Laredo, Aleksander Slominski, John Erik Wittern, Annie T. Ying, Christopher Young, Yunhui Zheng
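
The steps in the abstract above (identify service calls, create a test case for each, sequence them, execute the plan, and report mismatches) can be sketched as follows. The specification format, the ordering rule, and the fake API are all assumptions made for illustration; they are not taken from the patent or from any real specification standard.

```python
from typing import Callable, Dict, List, Tuple

TestCase = Tuple[str, str, int]  # (HTTP method, path, expected status)

# Assumed sequencing rule: create resources before reading or deleting them.
_ORDER = {"POST": 0, "GET": 1, "PUT": 2, "DELETE": 3}


def build_test_plan(spec: Dict) -> List[TestCase]:
    """Create one test case per operation and sort them into a sequence."""
    cases = [
        (op["method"], op["path"], op["expected_status"])
        for op in spec["operations"]
    ]
    return sorted(cases, key=lambda case: _ORDER.get(case[0], len(_ORDER)))


def run_plan(plan: List[TestCase], call: Callable[[str, str], int]) -> List[str]:
    """Execute each test case and collect spec-versus-API mismatches."""
    errors = []
    for method, path, expected in plan:
        actual = call(method, path)
        if actual != expected:
            errors.append(f"{method} {path}: expected {expected}, got {actual}")
    return errors


# Hypothetical spec and a fake API that does not implement DELETE.
spec = {
    "operations": [
        {"method": "GET", "path": "/items", "expected_status": 200},
        {"method": "DELETE", "path": "/items/1", "expected_status": 204},
        {"method": "POST", "path": "/items", "expected_status": 201},
    ]
}


def fake_api(method: str, path: str) -> int:
    implemented = {("POST", "/items"): 201, ("GET", "/items"): 200}
    return implemented.get((method, path), 404)


plan = build_test_plan(spec)
errors = run_plan(plan, fake_api)
spec_is_accurate = not errors
```

A non-empty `errors` list corresponds to the notification in the abstract that the specification does not accurately represent the API.
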
