Patents by Inventor Yunhui Zheng
Yunhui Zheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11947940
Abstract: Techniques regarding augmenting one or more training datasets for training one or more AI models are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise training augmentation component that can generate an augmented training dataset for training an artificial intelligence model by extracting a simplified source code sample from a source code sample comprised within a training dataset.
Type: Grant
Filed: October 11, 2021
Date of Patent: April 2, 2024
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
-
Patent number: 11765193
Abstract: In a computer-implemented method for improving a static analyzer output, a processor receives a labeled data set with labeled true vulnerabilities and labeled false vulnerabilities. A processor receives pretrained contextual embeddings from a contextual embeddings model. A processor maps the true vulnerabilities and the false vulnerabilities to the pretrained contextual embeddings model. A processor generates a fine-tuned model with classifications for true vulnerabilities.
Type: Grant
Filed: December 30, 2020
Date of Patent: September 19, 2023
Assignee: International Business Machines Corporation
Inventors: Saurabh Pujar, Luca Buratti, Alessandro Morari, Jim Alain Laredo, Mihaela Ancuta Bornea, Jeffrey Scott McCarley, Yunhui Zheng
-
Patent number: 11762758
Abstract: Approaches presented herein enable fault detection. More specifically, implementation code of one or more functions is identified from source code. The implementation code of the one or more functions is converted to corresponding Abstract Syntax Trees (ASTs). The implementation code of the one or more functions is represented as a first plurality of sets of AST paths over the ASTs. Classification results for the one or more functions are generated with a classifier based on the first plurality of sets of AST paths for the implementation code of the one or more functions. Each of the classification results indicates a probability of having at least one fault in a corresponding function of the one or more functions. Fault detection results of the source code are generated based on the classification results.
Type: Grant
Filed: March 29, 2021
Date of Patent: September 19, 2023
Assignee: International Business Machines Corporation
Inventors: Shiwan Zhao, Bo Yang, HongLei Guo, Zhong Su, Yunhui Zheng, Jim Alain Laredo, Alessandro Morari, Marco Pistoia
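As a rough illustration of the AST-path representation this abstract describes, the sketch below parses Python source with the standard `ast` module and collects, per function, a set of root-to-leaf paths of node type names. Everything here is an assumption made for illustration: the abstract does not say which language is analyzed or how paths are defined, the names `ast_paths` and `function_path_sets` are hypothetical, and the classifier that would consume these sets is omitted entirely.

```python
import ast


def ast_paths(node, prefix=()):
    """Yield root-to-leaf paths of node type names below `node`.

    Root-to-leaf paths are an assumed simplification; the abstract does
    not define what an "AST path" is.
    """
    path = prefix + (type(node).__name__,)
    children = list(ast.iter_child_nodes(node))
    if not children:
        yield path
        return
    for child in children:
        yield from ast_paths(child, path)


def function_path_sets(source):
    """Map each function name found in `source` to its set of AST paths."""
    tree = ast.parse(source)
    return {
        fn.name: set(ast_paths(fn))
        for fn in ast.walk(tree)
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# Hypothetical input used only to exercise the sketch.
SAMPLE = """
def add(a, b):
    return a + b

def first(xs):
    return xs[0]
"""

path_sets = function_path_sets(SAMPLE)
for name, paths in sorted(path_sets.items()):
    print(name, len(paths))
```

In a full pipeline, each function's path set would be turned into features and passed to a trained classifier that outputs a per-function fault probability; that step is left out here because the abstract gives no detail on it.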
-
Patent number: 11663110
Abstract: A debugging tool and method for statically verifying programs that invoke web-based services through API calls is provided. The tool receives source code that comprises one or more invocation of web APIs for requesting web-based services. The tool also receives a set of web API specifications. The tool extracts a set of request information for each web API invocation in the source code, the set of request information including a usage string of an URL endpoint. The tool verifies whether the set of request information complies with the received web API specifications and reports a result of the verification.
Type: Grant
Filed: October 31, 2016
Date of Patent: May 30, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Julian T. Dolby, Jim A. Laredo, John E. Wittern, Annie T. T. Ying, Yunhui Zheng
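The verification step in this abstract can be pictured with a small sketch that checks an already-extracted request (HTTP method plus URL string) against a specification. It covers only the compliance check, not the static extraction of request information from source code. The specification layout (`host`, `paths` with `{param}` templates), the host `api.example.com`, and the function names are all hypothetical and are not taken from the patent.

```python
import re
from urllib.parse import urlparse


def template_to_regex(template):
    """Turn a path template such as /users/{id} into a compiled regex."""
    parts = re.split(r"\{[^/{}]+\}", template)
    return re.compile("^" + "[^/]+".join(re.escape(p) for p in parts) + "$")


def verify_request(method, url, spec):
    """Return "ok" or a short message describing the mismatch."""
    parsed = urlparse(url)
    if parsed.netloc != spec["host"]:
        return f"unknown host {parsed.netloc}"
    for template, methods in spec["paths"].items():
        if template_to_regex(template).match(parsed.path):
            if method.upper() in methods:
                return "ok"
            return f"method {method.upper()} not allowed for {template}"
    return f"no path in the specification matches {parsed.path}"


# Hypothetical specification: host and allowed methods per path template.
SPEC = {
    "host": "api.example.com",
    "paths": {
        "/users/{id}": {"GET", "DELETE"},
        "/users": {"GET", "POST"},
    },
}

print(verify_request("GET", "https://api.example.com/users/42", SPEC))
print(verify_request("PUT", "https://api.example.com/users/42", SPEC))
print(verify_request("GET", "https://api.example.com/orders", SPEC))
```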
-
Publication number: 20230130781
Abstract: Techniques regarding AI model introspection are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise model introspection component that can analyze artificial intelligence model learning behavior for a code understanding task by comparing an output of an artificial intelligence model with respect to a plurality of testing data subsets that have varying code complexity distributions.
Type: Application
Filed: October 21, 2021
Publication date: April 27, 2023
Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
-
Publication number: 20230115723
Abstract: Techniques regarding training one or more AI models for a source code understanding task are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training component that can train an artificial intelligence model on source code samples for a source code understanding task. The source code samples can be ranked based on code complexity.
Type: Application
Filed: September 30, 2021
Publication date: April 13, 2023
Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
-
Publication number: 20230113733
Abstract: Techniques regarding augmenting one or more training datasets for training one or more AI models are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise training augmentation component that can generate an augmented training dataset for training an artificial intelligence model by extracting a simplified source code sample from a source code sample comprised within a training dataset.
Type: Application
Filed: October 11, 2021
Publication date: April 13, 2023
Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
-
Publication number: 20220358400
Abstract: A system, computer program product, and method are provided for probing model signal awareness. An iterative process is employed to systematically isolate one or more relevant tokens of an input sequence to generate a reduced input sequence. The reduced input sequence is validated and presented to a trained artificial intelligence (AI) model and prediction output is generated. The reduction process is continued while the prediction output stays the same as that of the input sequence, and until a minimal sub-sequence is identified. A signal existence in the minimal sub-sequence is verified and signal awareness of the trained AI model is evaluated. The evaluation includes measuring the verified signal existence against an original signal from the input sentence.
Type: Application
Filed: May 10, 2021
Publication date: November 10, 2022
Applicant: International Business Machines Corporation
Inventors: Yunhui Zheng, Sahil Suneja, Yufan Zhuang, Alessandro Morari, Jim Alain Laredo
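The iterative reduction described in this abstract can be sketched as a greedy loop that drops one token at a time for as long as the reduced sequence remains valid and the model's prediction is unchanged. This is a minimal sketch under stated assumptions, not the claimed method: single-token removal is an assumed strategy, `reduce_input` and `toy_model` are hypothetical names, and the toy model is a stand-in for a trained AI model.

```python
def reduce_input(tokens, predict, is_valid=lambda toks: True):
    """Greedily drop tokens while the prediction stays the same.

    `predict` maps a token list to a label; `is_valid` stands in for the
    validation step (for example, a check that the reduced code parses).
    """
    target = predict(tokens)
    current = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            # Keep the removal only if the sequence is non-empty, valid,
            # and still yields the original prediction.
            if candidate and is_valid(candidate) and predict(candidate) == target:
                current = candidate
                changed = True
                break
    return current


def toy_model(tokens):
    """Hypothetical stand-in model: flags any sequence containing strcpy."""
    return "vulnerable" if "strcpy" in tokens else "safe"


tokens = ["char", "buf", "[", "8", "]", ";",
          "strcpy", "(", "buf", ",", "src", ")", ";"]
minimal = reduce_input(tokens, toy_model)
print(minimal)  # → ['strcpy']
```

Signal awareness would then be judged by checking whether the minimal sub-sequence still contains the original signal (here, the `strcpy` call). A model whose minimal sub-sequence lacks the signal is predicting from something other than the real cause.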
-
Publication number: 20220308984
Abstract: Approaches presented herein enable fault detection. More specifically, implementation code of one or more functions is identified from source code. The implementation code of the one or more functions is converted to corresponding Abstract Syntax Trees (ASTs). The implementation code of the one or more functions is represented as a first plurality of sets of AST paths over the ASTs. Classification results for the one or more functions are generated with a classifier based on the first plurality of sets of AST paths for the implementation code of the one or more functions. Each of the classification results indicates a probability of having at least one fault in a corresponding function of the one or more functions. Fault detection results of the source code are generated based on the classification results.
Type: Application
Filed: March 29, 2021
Publication date: September 29, 2022
Inventors: Shiwan Zhao, Bo Yang, HongLei Guo, Zhong Su, Yunhui Zheng, Jim Alain Laredo, Alessandro Morari, Marco Pistoia
-
Publication number: 20220210178
Abstract: In a computer-implemented method for improving a static analyzer output, a processor receives a labeled data set with labeled true vulnerabilities and labeled false vulnerabilities. A processor receives pretrained contextual embeddings from a contextual embeddings model. A processor maps the true vulnerabilities and the false vulnerabilities to the pretrained contextual embeddings model. A processor generates a fine-tuned model with classifications for true vulnerabilities.
Type: Application
Filed: December 30, 2020
Publication date: June 30, 2022
Inventors: Saurabh Pujar, Luca Buratti, Alessandro Morari, Jim Alain Laredo, Mihaela Ancuta Bornea, Jeffrey Scott McCarley, Yunhui Zheng
-
Patent number: 11194908
Abstract: Synthesizing sanitization code for applications based upon a probabilistic prediction model includes receiving a set of applications. The set of applications is partitioned into a first subset of applications and a second subset of applications. The first subset has one or more malicious payloads associated therewith, and the second subset has one or more non-malicious payloads associated therewith. A probabilistic prediction model is computed based upon the malicious payloads associated with the first subset of applications. One or more predicted malicious payloads are predicted from the probabilistic prediction model.
Type: Grant
Filed: January 8, 2019
Date of Patent: December 7, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Peng Liu, Yunhui Zheng, Marco Pistoia, Omer Tripp
-
Patent number: 11113029
Abstract: A method and system of matching an application program interface (API) code usage with an API specification are provided. A program having an API code usage is received and its features are extracted therefrom. Features from meta data of a plurality of API specifications are extracted. For each API specification of the plurality of API specifications, a match probability with the API code usage is determined. An API specification having a highest probability is determined. The API code usage is matched with the API specification having the highest probability.
Type: Grant
Filed: April 10, 2019
Date of Patent: September 7, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Annie T. Ying, Christopher Charles Young, John Erik Wittern, Yunhui Zheng, Jim Laredo, Aleksander Slominski
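The matching flow in this abstract (extract features, score each specification, pick the most probable) can be sketched as follows. The abstract does not say how features are extracted or how the match probability is computed, so this sketch assumes lowercase alphanumeric tokens as features and a softmax over Jaccard similarity as the probability. The specification names, metadata strings, and usage snippet are hypothetical, and the sketch assumes at least one specification is supplied.

```python
import math
import re


def features(text):
    """Extract a bag of lowercase alphanumeric tokens (assumed feature set)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_probabilities(usage, specs):
    """Score every specification against the usage and normalize to sum to 1."""
    usage_feats = features(usage)
    scores = {}
    for name, metadata in specs.items():
        spec_feats = features(metadata)
        union = usage_feats | spec_feats
        # Jaccard similarity between usage features and spec metadata features.
        scores[name] = len(usage_feats & spec_feats) / len(union) if union else 0.0
    total = sum(math.exp(s) for s in scores.values())
    return {name: math.exp(s) / total for name, s in scores.items()}


def best_match(usage, specs):
    """Return the name of the specification with the highest probability."""
    probs = match_probabilities(usage, specs)
    return max(probs, key=probs.get)


# Hypothetical specification metadata and API usage.
SPECS = {
    "weather-api": "api.weather.example.com forecast city temperature",
    "payments-api": "api.pay.example.com charge card amount currency",
}
USAGE = 'requests.get("https://api.weather.example.com/forecast", params={"city": "Oslo"})'

print(best_match(USAGE, SPECS))  # → weather-api
```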
-
Patent number: 11074158
Abstract: A computer implemented method for testing an application according to usage data includes receiving an application to be tested and a set of usage data corresponding to the application to be tested, wherein the set of usage data corresponds to previously executed code sequences, identifying one or more code sequences of interest corresponding to the received application, wherein the code sequences of interest correspond to codes sequences that are configured to exercise the received application, extracting concrete usages of the code sequence of interest from the received set of usage data, generating one or more test cases for the application according to the extracted usages, and providing the one or more generated test cases. The method may additionally include testing the application according to the one or more generated test cases.
Type: Grant
Filed: December 1, 2017
Date of Patent: July 27, 2021
Assignee: International Business Machines Corporation
Inventors: Jim A. Laredo, Aleksander Slominski, John E. Wittern, Annie T. Ying, Christopher C. Young, Yunhui Zheng
-
Patent number: 11048505
Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
Type: Grant
Filed: December 18, 2019
Date of Patent: June 29, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
-
Publication number: 20200326913
Abstract: A method and system of matching an application program interface (API) code usage with an API specification are provided. A program having an API code usage is received and its features are extracted therefrom. Features from meta data of a plurality of API specifications are extracted. For each API specification of the plurality of API specifications, a match probability with the API code usage is determined. An API specification having a highest probability is determined. The API code usage is matched with the API specification having the highest probability.
Type: Application
Filed: April 10, 2019
Publication date: October 15, 2020
Inventors: Annie T. Ying, Christopher Charles Young, John Erik Wittern, Yunhui Zheng, Jim Laredo, Aleksander Slominski
-
Publication number: 20200218805
Abstract: Synthesizing sanitization code for applications based upon a probabilistic prediction model includes receiving a set of applications. The set of applications is partitioned into a first subset of applications and a second subset of applications. The first subset has one or more malicious payloads associated therewith, and the second subset has one or more non-malicious payloads associated therewith. A probabilistic prediction model is computed based upon the malicious payloads associated with the first subset of applications. One or more predicted malicious payloads are predicted from the probabilistic prediction model.
Type: Application
Filed: January 8, 2019
Publication date: July 9, 2020
Applicant: International Business Machines Corporation
Inventors: Peng Liu, Yunhui Zheng, Marco Pistoia, Omer Tripp
-
Publication number: 20200125362
Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
Type: Application
Filed: December 18, 2019
Publication date: April 23, 2020
Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
-
Patent number: 10620949
Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
Type: Grant
Filed: December 14, 2017
Date of Patent: April 14, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
-
Patent number: 10558459
Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
Type: Grant
Filed: May 12, 2017
Date of Patent: February 11, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
-
Patent number: 10409711
Abstract: A method and system of determining whether a specification is an accurate representation of an application program interface (API) is provided. The specification is received electronically over a network. Service calls to be tested are identified based on the specification. A test case is created for each of the identified service calls. A sequence is created for the test cases. A test plan is generated based on the created sequence. The generated test plan is executed. Upon identifying an error in response to the executed test plan, a notification is generated, indicating that the specification is not an accurate representation of the API.
Type: Grant
Filed: June 12, 2017
Date of Patent: September 10, 2019
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Julian Timothy Dolby, Jim Alain Laredo, Aleksander Slominski, John Erik Wittern, Annie T. Ying, Christopher Young, Yunhui Zheng