Patents by Inventor Liujia Shao

Liujia Shao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11604640
    Abstract: An approach to code refactor renaming may be provided. Source code with a naming convention for functions and classes can be presented to a machine learning model. The model may identify the names for functions and classes. The identified names may be tokenized. Docstrings associated with functions and classes may be identified. Code for the identified functions and classes and the associated docstrings may be input into a feature vector generation mechanism. A model may be trained mapping the generated feature vectors to tokenized identified names, via regression. The model can be utilized to analyze input code with the same naming convention to predict names for functions and classes, allowing for the recommendation of function and class names in accordance with the programming code naming convention.
    Type: Grant
    Filed: December 11, 2020
    Date of Patent: March 14, 2023
    Assignee: International Business Machines Corporation
    Inventors: Liujia Shao, Yan Luo, Yan Xu
  • Patent number: 11422798
    Abstract: Techniques for context-based word embedding for programming artifacts are described herein. An aspect includes determining a plurality of keywords based on a corpus of programming artifacts, the corpus of programming artifacts including source code corresponding to a software project. Another aspect includes determining a plurality of context/keyword pair sets based on the plurality of keywords and the corpus of programming artifacts, wherein each context/keyword pair set of the plurality of context/keyword pair sets includes a first keyword, a second keyword, and a context type corresponding to a co-occurrence of the first keyword and the second keyword in the corpus of programming artifacts. Another aspect includes constructing a word embedding matrix based on the plurality of context/keyword pair sets.
    Type: Grant
    Filed: February 26, 2020
    Date of Patent: August 23, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yan Luo, Liujia Shao, Yan Xu, Sibin Fan
  • Publication number: 20220188102
    Abstract: An approach to code refactor renaming may be provided. Source code with a naming convention for functions and classes can be presented to a machine learning model. The model may identify the names for functions and classes. The identified names may be tokenized. Docstrings associated with functions and classes may be identified. Code for the identified functions and classes and the associated docstrings may be input into a feature vector generation mechanism. A model may be trained mapping the generated feature vectors to tokenized identified names, via regression. The model can be utilized to analyze input code with the same naming convention to predict names for functions and classes, allowing for the recommendation of function and class names in accordance with the programming code naming convention.
    Type: Application
    Filed: December 11, 2020
    Publication date: June 16, 2022
    Inventors: Liujia Shao, Yan Luo, Yan Xu
  • Publication number: 20220180240
    Abstract: A computer-implemented process for transaction composition graph node embedding comprising traversing a data flow of transactions to convert a full graph to multiple directed acyclic subgraphs/paths in spanning trees, taking one-by-one nodes as input to a predetermined neural network, generating a set of one-hot vectors for all nodes, computing an embedding vector from a corresponding one-hot vector, computing a probability that an output node is nearby, and embedding the node to a latent feature vector.
    Type: Application
    Filed: December 3, 2020
    Publication date: June 9, 2022
    Inventors: Yan Luo, Liujia Shao, Yan Xu
  • Patent number: 11262985
    Abstract: In an approach to creating code snippet auto-commenting models utilizing a pre-training model leveraging dependency data, one or more computer processors create a generalized pre-training model trained with one or more dependencies and one or more associated dependency embeddings, wherein dependencies include frameworks, imported libraries, header files, and application programming interfaces associated with a software project. The one or more computer processors create a subsequent model with a model architecture identical to the created pre-training model. The one or more computer processors computationally reduce a training of the created subsequent model utilizing one or more trained parameters, activations, memory cells, and context vectors contained in the created pre-training model. The one or more computer processors deploy the subsequent model to one or more production environments.
    Type: Grant
    Filed: March 10, 2020
    Date of Patent: March 1, 2022
    Assignee: International Business Machines Corporation
    Inventors: Yan Luo, Liujia Shao, Yan Xu, Sibin Fan
  • Patent number: 11176019
    Abstract: Methods, systems, and computer program products for automated breakpoint creation using machine learning are provided. Aspects include obtaining a bug report for a software and source code for the software and analyzing the bug report to determine a bug type for the bug report, where analyzing the bug report includes using a bug type labeling model. Aspects also include analyzing the source code to identify a code snippet in the source code based on the bug type, where analyzing the source code includes using a source code detection model. Aspects further include inserting a breakpoint in the source code at the code snippet.
    Type: Grant
    Filed: April 1, 2020
    Date of Patent: November 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Liujia Shao, Yan Luo, Yan Xu, Sibin Fan
  • Publication number: 20210311853
    Abstract: Methods, systems, and computer program products for automated breakpoint creation using machine learning are provided. Aspects include obtaining a bug report for a software and source code for the software and analyzing the bug report to determine a bug type for the bug report, where analyzing the bug report includes using a bug type labeling model. Aspects also include analyzing the source code to identify a code snippet in the source code based on the bug type, where analyzing the source code includes using a source code detection model. Aspects further include inserting a breakpoint in the source code at the code snippet.
    Type: Application
    Filed: April 1, 2020
    Publication date: October 7, 2021
    Inventors: Liujia Shao, Yan Luo, Yan Xu, Sibin Fan
  • Publication number: 20210286598
    Abstract: In an approach to creating code snippet auto-commenting models utilizing a pre-training model leveraging dependency data, one or more computer processors create a generalized pre-training model trained with one or more dependencies and one or more associated dependency embeddings, wherein dependencies include frameworks, imported libraries, header files, and application programming interfaces associated with a software project. The one or more computer processors create a subsequent model with a model architecture identical to the created pre-training model. The one or more computer processors computationally reduce a training of the created subsequent model utilizing one or more trained parameters, activations, memory cells, and context vectors contained in the created pre-training model. The one or more computer processors deploy the subsequent model to one or more production environments.
    Type: Application
    Filed: March 10, 2020
    Publication date: September 16, 2021
    Inventors: Yan Luo, Liujia Shao, Yan Xu, Sibin Fan
  • Patent number: 11119898
    Abstract: Techniques for automatic code coverage file recommendation are described herein. An aspect includes receiving historical code coverage data. Another aspect includes clustering the historical code coverage data. Another aspect includes performing content filtering based on the clustered historical code coverage data to determine a content filtering preferred file list. Another aspect includes performing collaborative filtering based on the clustered historical code coverage data to determine a collaborative filtering preferred file list. Another aspect includes combining the content filtering preferred file list and the collaborative filtering preferred file list to determine a code coverage file recommendation list. Another aspect includes providing the code coverage file recommendation list to a user.
    Type: Grant
    Filed: May 7, 2020
    Date of Patent: September 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Liujia Shao, Yan Luo, Yan Xu, Sibin Fan
  • Publication number: 20210263732
    Abstract: Techniques for context-based word embedding for programming artifacts are described herein. An aspect includes determining a plurality of keywords based on a corpus of programming artifacts, the corpus of programming artifacts including source code corresponding to a software project. Another aspect includes determining a plurality of context/keyword pair sets based on the plurality of keywords and the corpus of programming artifacts, wherein each context/keyword pair set of the plurality of context/keyword pair sets includes a first keyword, a second keyword, and a context type corresponding to a co-occurrence of the first keyword and the second keyword in the corpus of programming artifacts. Another aspect includes constructing a word embedding matrix based on the plurality of context/keyword pair sets.
    Type: Application
    Filed: February 26, 2020
    Publication date: August 26, 2021
    Inventors: Yan Luo, Liujia Shao, Yan Xu, Sibin Fan
  • Publication number: 20210149793
    Abstract: Provided is a method, a system, and a computer program product for determining a cognitive code coverage weight for code snippets located in a portion of code. The method includes generating a set of samples from code snippets included in a portion of code. Each sample in the set of samples includes features derived from the code snippets. The method further includes generating corresponding labels for the set of samples and creating a training dataset by applying the labels to the set of samples. The training dataset includes the set of samples with each sample including a label from the labels generated. The method further includes training a machine learning model using the labeled dataset to output a code coverage weight for code snippets.
    Type: Application
    Filed: November 19, 2019
    Publication date: May 20, 2021
    Inventors: Liujia Shao, Yan Luo, Yan Xu, Anton Karputkin