Patents by Inventor Yunhui Zheng

Yunhui Zheng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11947940
    Abstract: Techniques regarding augmenting one or more training datasets for training one or more AI models are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training augmentation component that can generate an augmented training dataset for training an artificial intelligence model by extracting a simplified source code sample from a source code sample comprised within a training dataset.
    Type: Grant
    Filed: October 11, 2021
    Date of Patent: April 2, 2024
    Assignee: International Business Machines Corporation
    Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
  • Patent number: 11765193
    Abstract: In a computer-implemented method for improving a static analyzer output, a processor receives a labeled data set with labeled true vulnerabilities and labeled false vulnerabilities. A processor receives pretrained contextual embeddings from a contextual embeddings model. A processor maps the true vulnerabilities and the false vulnerabilities to the pretrained contextual embeddings model. A processor generates a fine-tuned model with classifications for true vulnerabilities.
    Type: Grant
    Filed: December 30, 2020
    Date of Patent: September 19, 2023
    Assignee: International Business Machines Corporation
    Inventors: Saurabh Pujar, Luca Buratti, Alessandro Morari, Jim Alain Laredo, Mihaela Ancuta Bornea, Jeffrey Scott McCarley, Yunhui Zheng
  • Patent number: 11762758
    Abstract: Approaches presented herein enable fault detection. More specifically, implementation code of one or more functions is identified from source code. The implementation code of the one or more functions is converted to corresponding Abstract Syntax Trees (ASTs). The implementation code of the one or more functions is represented as a first plurality of sets of AST paths over the ASTs. Classification results for the one or more functions are generated with a classifier based on the first plurality of sets of AST paths for the implementation code of the one or more functions. Each of the classification results indicates a probability of having at least one fault in a corresponding function of the one or more functions. Fault detection results of the source code are generated based on the classification results.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: September 19, 2023
    Assignee: International Business Machines Corporation
    Inventors: Shiwan Zhao, Bo Yang, HongLei Guo, Zhong Su, Yunhui Zheng, Jim Alain Laredo, Alessandro Morari, Marco Pistoia
  • Patent number: 11663110
    Abstract: A debugging tool and method for statically verifying programs that invoke web-based services through API calls is provided. The tool receives source code that comprises one or more invocations of web APIs for requesting web-based services. The tool also receives a set of web API specifications. The tool extracts a set of request information for each web API invocation in the source code, the set of request information including a usage string of a URL endpoint. The tool verifies whether the set of request information complies with the received web API specifications and reports a result of the verification.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: May 30, 2023
    Assignee: International Business Machines Corporation
    Inventors: Julian T. Dolby, Jim A. Laredo, John E. Wittern, Annie T. T. Ying, Yunhui Zheng
  • Publication number: 20230130781
    Abstract: Techniques regarding AI model introspection are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a model introspection component that can analyze artificial intelligence model learning behavior for a code understanding task by comparing an output of an artificial intelligence model with respect to a plurality of testing data subsets that have varying code complexity distributions.
    Type: Application
    Filed: October 21, 2021
    Publication date: April 27, 2023
    Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
  • Publication number: 20230115723
    Abstract: Techniques regarding training one or more AI models for a source code understanding task are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training component that can train an artificial intelligence model on source code samples for a source code understanding task. The source code samples can be ranked based on code complexity.
    Type: Application
    Filed: September 30, 2021
    Publication date: April 13, 2023
    Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
  • Publication number: 20230113733
    Abstract: Techniques regarding augmenting one or more training datasets for training one or more AI models are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a training augmentation component that can generate an augmented training dataset for training an artificial intelligence model by extracting a simplified source code sample from a source code sample comprised within a training dataset.
    Type: Application
    Filed: October 11, 2021
    Publication date: April 13, 2023
    Inventors: Sahil Suneja, Yufan Zhuang, Yunhui Zheng, Alessandro Morari, Jim Alain Laredo
  • Publication number: 20220358400
    Abstract: A system, computer program product, and method are provided for probing model signal awareness. An iterative process is employed to systematically isolate one or more relevant tokens of an input sequence to generate a reduced input sequence. The reduced input sequence is validated and presented to a trained artificial intelligence (AI) model and prediction output is generated. The reduction process is continued while the prediction output stays the same as that of the input sequence, and until a minimal sub-sequence is identified. A signal existence in the minimal sub-sequence is verified and signal awareness of the trained AI model is evaluated. The evaluation includes measuring the verified signal existence against an original signal from the input sentence.
    Type: Application
    Filed: May 10, 2021
    Publication date: November 10, 2022
    Applicant: International Business Machines Corporation
    Inventors: Yunhui Zheng, Sahil Suneja, Yufan Zhuang, Alessandro Morari, Jim Alain Laredo
  • Publication number: 20220308984
    Abstract: Approaches presented herein enable fault detection. More specifically, implementation code of one or more functions is identified from source code. The implementation code of the one or more functions is converted to corresponding Abstract Syntax Trees (ASTs). The implementation code of the one or more functions is represented as a first plurality of sets of AST paths over the ASTs. Classification results for the one or more functions are generated with a classifier based on the first plurality of sets of AST paths for the implementation code of the one or more functions. Each of the classification results indicates a probability of having at least one fault in a corresponding function of the one or more functions. Fault detection results of the source code are generated based on the classification results.
    Type: Application
    Filed: March 29, 2021
    Publication date: September 29, 2022
    Inventors: Shiwan Zhao, Bo Yang, HongLei Guo, Zhong Su, Yunhui Zheng, Jim Alain Laredo, Alessandro Morari, Marco Pistoia
  • Publication number: 20220210178
    Abstract: In a computer-implemented method for improving a static analyzer output, a processor receives a labeled data set with labeled true vulnerabilities and labeled false vulnerabilities. A processor receives pretrained contextual embeddings from a contextual embeddings model. A processor maps the true vulnerabilities and the false vulnerabilities to the pretrained contextual embeddings model. A processor generates a fine-tuned model with classifications for true vulnerabilities.
    Type: Application
    Filed: December 30, 2020
    Publication date: June 30, 2022
    Inventors: Saurabh Pujar, Luca Buratti, Alessandro Morari, Jim Alain Laredo, Mihaela Ancuta Bornea, Jeffrey Scott McCarley, Yunhui Zheng
  • Patent number: 11194908
    Abstract: Synthesizing sanitization code for applications based upon a probabilistic prediction model includes receiving a set of applications. The set of applications is partitioned into a first subset of applications and a second subset of applications. The first subset has one or more malicious payloads associated therewith, and the second subset has one or more non-malicious payloads associated therewith. A probabilistic prediction model is computed based upon the malicious payloads associated with the first subset of applications. One or more malicious payloads are predicted from the probabilistic prediction model.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: December 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Peng Liu, Yunhui Zheng, Marco Pistoia, Omer Tripp
  • Patent number: 11113029
    Abstract: A method and system of matching an application program interface (API) code usage with an API specification are provided. A program having an API code usage is received and features are extracted therefrom. Features from metadata of a plurality of API specifications are extracted. For each API specification of the plurality of API specifications, a match probability with the API code usage is determined. An API specification having a highest probability is determined. The API code usage is matched with the API specification having the highest probability.
    Type: Grant
    Filed: April 10, 2019
    Date of Patent: September 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Annie T. Ying, Christopher Charles Young, John Erik Wittern, Yunhui Zheng, Jim Laredo, Aleksander Slominski
  • Patent number: 11074158
    Abstract: A computer implemented method for testing an application according to usage data includes receiving an application to be tested and a set of usage data corresponding to the application to be tested, wherein the set of usage data corresponds to previously executed code sequences, identifying one or more code sequences of interest corresponding to the received application, wherein the code sequences of interest correspond to code sequences that are configured to exercise the received application, extracting concrete usages of the code sequences of interest from the received set of usage data, generating one or more test cases for the application according to the extracted usages, and providing the one or more generated test cases. The method may additionally include testing the application according to the one or more generated test cases.
    Type: Grant
    Filed: December 1, 2017
    Date of Patent: July 27, 2021
    Assignee: International Business Machines Corporation
    Inventors: Jim A. Laredo, Aleksander Slominski, John E. Wittern, Annie T. Ying, Christopher C. Young, Yunhui Zheng
  • Patent number: 11048505
    Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: June 29, 2021
    Assignee: International Business Machines Corporation
    Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
  • Publication number: 20200326913
    Abstract: A method and system of matching an application program interface (API) code usage with an API specification are provided. A program having an API code usage is received and features are extracted therefrom. Features from metadata of a plurality of API specifications are extracted. For each API specification of the plurality of API specifications, a match probability with the API code usage is determined. An API specification having a highest probability is determined. The API code usage is matched with the API specification having the highest probability.
    Type: Application
    Filed: April 10, 2019
    Publication date: October 15, 2020
    Inventors: Annie T. Ying, Christopher Charles Young, John Erik Wittern, Yunhui Zheng, Jim Laredo, Aleksander Slominski
  • Publication number: 20200218805
    Abstract: Synthesizing sanitization code for applications based upon a probabilistic prediction model includes receiving a set of applications. The set of applications is partitioned into a first subset of applications and a second subset of applications. The first subset has one or more malicious payloads associated therewith, and the second subset has one or more non-malicious payloads associated therewith. A probabilistic prediction model is computed based upon the malicious payloads associated with the first subset of applications. One or more malicious payloads are predicted from the probabilistic prediction model.
    Type: Application
    Filed: January 8, 2019
    Publication date: July 9, 2020
    Applicant: International Business Machines Corporation
    Inventors: Peng Liu, Yunhui Zheng, Marco Pistoia, Omer Tripp
  • Publication number: 20200125362
    Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
    Type: Application
    Filed: December 18, 2019
    Publication date: April 23, 2020
    Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
  • Patent number: 10620949
    Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
    Type: Grant
    Filed: December 14, 2017
    Date of Patent: April 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
  • Patent number: 10558459
    Abstract: Techniques for autonomously generating a code usage summary associated with a web application programming interface request are provided. In one example, a computer-implemented method can comprise evaluating, by a system operatively coupled to a processor, data from a data repository, wherein the evaluating is based on a defined machine learning process. Also, the computer-implemented method can comprise generating, by the system, a usage summary of the data, wherein the usage summary is based on a statistic derived from a web application programming interface request, and the web application programming interface request is associated with the data.
    Type: Grant
    Filed: May 12, 2017
    Date of Patent: February 11, 2020
    Assignee: International Business Machines Corporation
    Inventors: Jim Laredo, Aleksander Slominski, John Erik Wittern, Annie Tsui Tsui Ying, Christopher Charles Young, Yunhui Zheng
  • Patent number: 10409711
    Abstract: A method and system of determining whether a specification is an accurate representation of an application program interface (API) are provided. The specification is received electronically over a network. Service calls to be tested are identified based on the specification. A test case is created for each of the identified service calls. A sequence is created for the test cases. A test plan is generated based on the created sequence. The generated test plan is executed. Upon identifying an error in response to the executed test plan, a notification is generated, indicating that the specification is not an accurate representation of the API.
    Type: Grant
    Filed: June 12, 2017
    Date of Patent: September 10, 2019
    Assignee: International Business Machines Corporation
    Inventors: Julian Timothy Dolby, Jim Alain Laredo, Aleksander Slominski, John Erik Wittern, Annie T. Ying, Christopher Young, Yunhui Zheng
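
The steps named in the abstract of patent 10409711 (identify service calls from a specification, create a test case per call, sequence the cases, execute the plan, and raise a notification on error) can be illustrated with a minimal Python sketch. This is not the patented method. The specification layout, the `PlannedCall` type, the method-based ordering heuristic, and the `fake_api` stand-in for a real service are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PlannedCall:
    """One test case: a service call and the status the specification promises."""
    method: str
    path: str
    expected_status: int


# Assumed sequencing heuristic: create resources before reading or deleting them.
METHOD_ORDER: Dict[str, int] = {"POST": 0, "GET": 1, "PUT": 2, "DELETE": 3}


def build_test_plan(spec: dict) -> List[PlannedCall]:
    """Create one test case per operation in the (hypothetical) spec, then order them."""
    cases = [
        PlannedCall(op["method"], op["path"], op["expected_status"])
        for op in spec["operations"]
    ]
    return sorted(cases, key=lambda c: METHOD_ORDER.get(c.method, len(METHOD_ORDER)))


def execute_plan(plan: List[PlannedCall],
                 call_api: Callable[[str, str], int]) -> List[str]:
    """Run each planned call and collect a notification for every mismatch."""
    notifications = []
    for case in plan:
        status = call_api(case.method, case.path)
        if status != case.expected_status:
            notifications.append(
                f"Specification mismatch: {case.method} {case.path} "
                f"returned {status}, expected {case.expected_status}"
            )
    return notifications


def fake_api(method: str, path: str) -> int:
    """Stand-in for a real service; DELETE is deliberately not implemented."""
    responses = {("POST", "/items"): 201, ("GET", "/items"): 200}
    return responses.get((method, path), 404)


# Hypothetical specification claiming three operations on /items.
spec = {
    "operations": [
        {"method": "DELETE", "path": "/items", "expected_status": 204},
        {"method": "GET", "path": "/items", "expected_status": 200},
        {"method": "POST", "path": "/items", "expected_status": 201},
    ]
}

plan = build_test_plan(spec)
notifications = execute_plan(plan, fake_api)
for note in notifications:
    print(note)
```

In this sketch the stand-in service does not implement DELETE, so the plan reports that the specification does not accurately represent the API for that call. A real tool would replace `fake_api` with actual HTTP requests and would likely compare response bodies as well as status codes.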