Patents by Inventor Colin Bruce Clement

Colin Bruce Clement has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11972232
Abstract: A code completion tool uses a neural transformer model with attention to generate candidate sequences to complete a method body for a method signature. The neural transformer model is trained with source code programs and natural language text. The neural transformer model learns the meaning of a method name and its corresponding method parameters and types from a large unsupervised dataset of source code methods and a supervised dataset of source code constructs paired with natural language docstrings, and uses this to infer a candidate sequence of subtokens that represents a method body for a particular method signature.
    Type: Grant
    Filed: June 10, 2020
    Date of Patent: April 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy
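
    The abstract describes a sequence-to-sequence style generation loop. As a minimal, illustrative sketch (not the patented system), the toy decoder below completes a method body token by token from a method signature; the vocabulary, the untrained TinyDecoder module, and the complete() helper are assumptions made for the example.

      # Hypothetical sketch: greedy decoding of a method body from a method
      # signature. The model is untrained and uses no causal mask; it only
      # demonstrates the shape of the decoding loop.
      import torch
      import torch.nn as nn

      VOCAB = ["<s>", "</s>", "def", "add", "(", "a", ",", "b", ")", ":", "return", "a+b"]
      TOK = {t: i for i, t in enumerate(VOCAB)}

      class TinyDecoder(nn.Module):
          def __init__(self, vocab_size: int, d_model: int = 32):
              super().__init__()
              self.emb = nn.Embedding(vocab_size, d_model)
              layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
              self.body = nn.TransformerEncoder(layer, num_layers=1)
              self.out = nn.Linear(d_model, vocab_size)

          def forward(self, ids):
              return self.out(self.body(self.emb(ids)))

      def complete(signature, model, max_len=8):
          ids = torch.tensor([[TOK[t] for t in signature]])
          for _ in range(max_len):
              next_id = model(ids)[:, -1, :].argmax(dim=-1, keepdim=True)
              ids = torch.cat([ids, next_id], dim=1)
              if next_id.item() == TOK["</s>"]:
                  break
          return [VOCAB[i] for i in ids[0].tolist()]

      model = TinyDecoder(len(VOCAB))  # a real model is trained on code and docstrings
      print(complete(["def", "add", "(", "a", ",", "b", ")", ":"], model))
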
  • Patent number: 11954429
Abstract: Generally discussed herein are devices, systems, and methods for generating an automatic interactive digital notebook completion model. A method can include receiving notebook content of an interactive digital notebook, the notebook content including a markdown cell followed by a code cell. The method can include generating input/output examples by, for each input/output example, masking either (i) the content of the markdown cell or (ii) the content of the code cell, resulting in a masked cell; identifying the masked cell and the content of the cell that is not masked as the input for the input/output example; and identifying the content of the masked cell as the output for the input/output example. The method can include training, based on the input/output examples, a natural language processing model that generates a prediction of the content of a second masked cell as an output.
    Type: Grant
    Filed: December 8, 2021
    Date of Patent: April 9, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Shubham Chandel, Guillermo Serrato Castilla, Neelakantan Sundaresan
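
    As a rough sketch of the training-pair construction the abstract describes, the helper below masks either side of a markdown/code cell pair; the (kind, text) cell encoding and the <MASK> token are assumptions made for the example.

      # Build input/output examples from consecutive (markdown, code) cells
      # by masking one cell and using its content as the training target.
      from typing import List, Tuple

      def make_examples(cells: List[Tuple[str, str]]):
          examples = []
          for (kind_a, text_a), (kind_b, text_b) in zip(cells, cells[1:]):
              if kind_a != "markdown" or kind_b != "code":
                  continue
              # Mask the code cell: input is markdown + <MASK>, output is the code.
              examples.append((f"{text_a}\n<MASK>", text_b))
              # Mask the markdown cell: input is <MASK> + code, output is the markdown.
              examples.append((f"<MASK>\n{text_b}", text_a))
          return examples

      nb = [("markdown", "# Load the data"), ("code", "df = pd.read_csv('data.csv')")]
      for inp, out in make_examples(nb):
          print(repr(inp), "->", repr(out))
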
  • Patent number: 11947935
    Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model by freezing the model parameters and optimizing a prefix. The tuning process is distributed across a user space and a model space where the embedding and output layers are performed in the user space and the execution of the model is performed in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in the tuning process.
    Type: Grant
    Filed: November 24, 2021
    Date of Patent: April 2, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano, Andrei Zlotchevski
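
    A minimal sketch of the prefix-tuning idea in the abstract, assuming a toy stand-in for the pretrained model: every pretrained weight is frozen and only a short prefix of "virtual token" embeddings receives gradient updates. The user-space/model-space split is not reproduced here.

      import torch
      import torch.nn as nn

      d_model, vocab, prefix_len = 32, 100, 5
      model = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, vocab))
      for p in model.parameters():
          p.requires_grad = False                  # pretrained weights stay frozen

      prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)  # only trainable tensor
      emb = nn.Embedding(vocab, d_model)           # user-space embedding layer
      optim = torch.optim.Adam([prefix], lr=1e-3)

      ids = torch.randint(0, vocab, (8,))          # toy token ids
      target = torch.randint(0, vocab, (prefix_len + 8,))
      for _ in range(3):
          x = torch.cat([prefix, emb(ids).detach()], dim=0)  # prepend the prefix
          loss = nn.functional.cross_entropy(model(x), target)
          optim.zero_grad()
          loss.backward()
          optim.step()
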
  • Publication number: 20240104001
Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationale, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions and, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error or to tailor a new training dataset for retraining the model to improve its performance.
    Type: Application
    Filed: December 15, 2022
    Publication date: March 28, 2024
    Inventors: Colin Bruce Clement, David Alberto Nader Palacio, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
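
    The search the abstract describes can be approximated with a simple greedy grow-then-shrink pass, sketched below with a stub predict() in place of the neural language model; the stub and tokens are assumptions for the example.

      # Find a small subset of input tokens (a rationale) that is enough
      # to reproduce the model's original output.
      def predict(tokens):
          # Toy stand-in "model": echoes 'buggy' if the word appears.
          return "buggy" if "buggy" in tokens else "ok"

      def find_rationale(tokens):
          full_output = predict(tokens)
          rationale = []
          for tok in tokens:                       # grow until output matches
              if predict(rationale) == full_output:
                  break
              rationale.append(tok)
          for tok in list(rationale):              # shrink redundant tokens
              trial = [t for t in rationale if t != tok]
              if predict(trial) == full_output:
                  rationale = trial
          return rationale

      print(find_rationale(["fix", "the", "buggy", "loop"]))  # -> ['buggy']
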
  • Patent number: 11900261
    Abstract: A transfer learning system is used for the development of neural transformer models pertaining to software engineering tasks. The transfer learning system trains source code domain neural transformer models with attention in various configurations on a large corpus of unsupervised training dataset of source code programs and/or source code-related natural language text. A web service provides the trained models for use in developing a model that may be fine-tuned on a supervised training dataset associated with a software engineering task thereby generating a tool to perform the software engineering task.
    Type: Grant
    Filed: November 6, 2022
    Date of Patent: February 13, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy
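
    The transfer-learning pattern the abstract outlines reduces, in the simplest case, to reusing a pretrained encoder and fine-tuning it together with a small task head on supervised data. The tiny modules and random batch below are placeholders, not the patented service.

      import torch
      import torch.nn as nn

      pretrained_encoder = nn.Linear(16, 16)   # stand-in for a code-domain transformer
      task_head = nn.Linear(16, 2)             # e.g. a bug / no-bug classifier

      # Fine-tune the pretrained weights and the new head together.
      optim = torch.optim.Adam(
          list(pretrained_encoder.parameters()) + list(task_head.parameters()), lr=1e-4
      )
      x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))  # toy supervised batch
      for _ in range(3):
          loss = nn.functional.cross_entropy(task_head(pretrained_encoder(x)), y)
          optim.zero_grad()
          loss.backward()
          optim.step()
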
  • Publication number: 20240028740
Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a proposed repair is predicted by a neural decoder transformer model that was trained on non-vulnerable source code. The neural decoder transformer model predicts source code that repairs the cybersecurity vulnerability, given the source code classified as vulnerable.
    Type: Application
    Filed: September 21, 2022
    Publication date: January 25, 2024
    Inventors: Aaron Yue-Chiu Chan, Colin Bruce Clement, Yevhen Mohylevskyy, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
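
    The two-model pipeline the abstract describes can be sketched as classify-then-repair; the stub functions below stand in for the neural classifier and the neural decoder transformer, and the strcpy example is an assumption for illustration.

      def classify_vulnerable(snippet: str) -> bool:
          # Stand-in for the neural classifier; flags an unsafe C call.
          return "strcpy(" in snippet

      def propose_repair(snippet: str) -> str:
          # Stand-in for the decoder trained on non-vulnerable code.
          return snippet.replace("strcpy(", "strncpy(").replace(")", ", sizeof(dst))", 1)

      generated = "strcpy(dst, src)"           # output of a code generation model
      if classify_vulnerable(generated):
          print("repair candidate:", propose_repair(generated))
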
  • Publication number: 20230359443
Abstract: A neural transformer model with attention is trained to predict candidates to complete a line of source code with a zero-inference capability. The model is trained on an unsupervised training dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context, where the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function and/or method of the source code programs used in the training dataset. The local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model learns to predict an ordered sequence of code elements that completes a line of source code in programming languages both seen and unseen during training.
    Type: Application
    Filed: May 24, 2023
    Publication date: November 9, 2023
    Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang
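
    A rough sketch of the feature split the abstract describes, using Python's ast module on a toy file: signatures and imports form the file-level context, and bodies form the local context. The exact feature definitions of the patent are not reproduced.

      # Requires Python 3.9+ for ast.unparse.
      import ast
      import textwrap

      src = textwrap.dedent("""
          import math
          class Circle:
              def area(self, r):
                  return math.pi * r * r
      """)

      file_ctx, local_ctx = [], []
      for node in ast.walk(ast.parse(src)):
          if isinstance(node, ast.Import):
              file_ctx.append(f"import {node.names[0].name}")
          elif isinstance(node, ast.ClassDef):
              file_ctx.append(f"class {node.name}:")
          elif isinstance(node, ast.FunctionDef):
              file_ctx.append(f"def {node.name}(...):")        # signature only
              local_ctx.append(ast.unparse(node.body[0]))      # the method body

      print("file-level context:", file_ctx)
      print("local context:", local_ctx)
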
  • Patent number: 11809302
Abstract: An automated program repair system uses a neural transformer model with attention to predict a bug-free version of a method having a source code bug identified in an associated stack trace. The neural transformer model is pre-trained with English language text and the source code of a target programming language. The pre-trained neural transformer model is then trained to create synthetic bugs in bug-free methods. The bug-free methods with the synthetic bugs are executed with a test case to obtain a stack trace of the source code bug. The method with the synthetic bug, the same method without the bug, and the stack trace are then used to train the neural transformer model to predict repairs for buggy methods.
    Type: Grant
    Filed: February 16, 2023
    Date of Patent: November 7, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Dawn Drain, Guillermo Serrato Castilla, Neelakantan Sundaresan
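
    The data construction in the abstract pairs a synthetic bug with the stack trace a test produces. A minimal sketch, with a hand-made mutation in place of the learned bug-creation model:

      import traceback

      fixed = "def mean(xs):\n    return sum(xs) / len(xs)\n"
      buggy = fixed.replace("len(xs)", "len(xs) - len(xs)")  # synthetic divide-by-zero

      ns = {}
      exec(buggy, ns)
      try:
          ns["mean"]([1, 2, 3])                # run the "test case"
      except Exception:
          trace = traceback.format_exc()       # the captured stack trace

      example = {"buggy": buggy, "stack_trace": trace, "target": fixed}
      print(example["stack_trace"].splitlines()[-1])  # ZeroDivisionError: ...
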
  • Publication number: 20230342287
    Abstract: A test-driven development system utilizes a neural transformer model with attention to generate method bodies for a focal method given its associated test cases, and optionally a method signature and a docstring of the focal method. The candidate method bodies are validated for syntactic correctness, tested using the given test cases, and tested with a donor class in a target system. Those candidate method bodies passing the validation and testing are then ranked based on a PLUM score that analyzes the candidate method bodies against various quality and performance metrics.
    Type: Application
    Filed: June 19, 2023
    Publication date: October 26, 2023
    Inventors: Colin Bruce Clement, Shao Kun Deng, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
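
    The generate / validate / test / rank loop in the abstract can be sketched as below; the candidate list, the test case, and the plum_score() heuristic are toy assumptions, and the real PLUM scoring is not reproduced.

      candidates = [
          "def double(x):\n    return x + x\n",
          "def double(x):\n    return x * 3\n",   # wrong: fails the test
          "def double(x)\n    return 2 * x\n",    # syntactically invalid
      ]

      def passes_test(src: str) -> bool:
          ns = {}
          exec(src, ns)
          return ns["double"](4) == 8             # the given test case

      def plum_score(src: str) -> float:
          return -len(src)                        # placeholder quality metric

      valid = []
      for src in candidates:
          try:
              compile(src, "<candidate>", "exec") # syntactic validation
          except SyntaxError:
              continue
          if passes_test(src):
              valid.append(src)

      print(max(valid, key=plum_score))
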
  • Patent number: 11797426
    Abstract: A test-driven development system utilizes a neural transformer model with attention to generate method bodies for a focal method given its associated test cases, and optionally a method signature and a docstring of the focal method. The candidate method bodies are validated for syntactic correctness, tested using the given test cases, and tested with a donor class in a target system. Those candidate method bodies passing the validation and testing are then ranked based on a PLUM score that analyzes the candidate method bodies against various quality and performance metrics.
    Type: Grant
    Filed: October 22, 2021
    Date of Patent: October 24, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Shao Kun Deng, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
  • Publication number: 20230281318
    Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of a decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The token constraints are generated from checking whether a token predicted at each decoding step is feasible for a partial solution based on the production rules of the grammar of the programming language, the syntactic correctness of a partial sequence, and/or static type correctness.
    Type: Application
    Filed: March 7, 2022
    Publication date: September 7, 2023
    Inventors: Colin Bruce Clement, Shao Kun Deng, Xiaoyu Liu, Neelakantan Sundaresan, Alexey Svyatkovskiy
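
    As an illustration of constrained decoding, the toy beam search below prunes any token that would make the partial sequence infeasible, here using balanced parentheses as the stand-in constraint and a fixed score table in place of model log-probabilities.

      import heapq

      SCORES = {"(": -0.1, ")": -0.2, "x": -0.3}  # fake per-token log-probs

      def feasible(seq):
          depth = 0
          for t in seq:
              depth += {"(": 1, ")": -1}.get(t, 0)
              if depth < 0:
                  return False                    # ')' with nothing open
          return True

      def beam_search(width=2, length=4):
          beams = [(0.0, [])]
          for _ in range(length):
              extended = []
              for score, seq in beams:
                  for tok, lp in SCORES.items():
                      cand = seq + [tok]
                      if feasible(cand):          # apply the token constraint
                          extended.append((score + lp, cand))
              beams = heapq.nlargest(width, extended, key=lambda b: b[0])
          return beams

      for score, seq in beam_search():
          print(round(score, 2), "".join(seq))
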
  • Publication number: 20230281317
    Abstract: A false positive vulnerability system detects whether a software vulnerability identified by a static code vulnerability analyzer is a true vulnerability or a false positive. The system utilizes deep learning models to predict whether an identified vulnerability is accurate given the source code context of the identified vulnerability. A neural encoder transformer model is trained to classify a false positive given the method body including the identified vulnerability. A neural decoder transformer model is trained to predict a candidate line-of-code to complete a prompt inserted into the context of the identified vulnerability. The candidate line-of-code that successfully completes the prompt is used as a signal to identify that the identified vulnerability is a false positive.
    Type: Application
    Filed: March 4, 2022
    Publication date: September 7, 2023
    Inventors: Colin Bruce Clement, Matthew Glenn Jin, Anant Girish Kharkar, Xiaoyu Liu, Xin Shi, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
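
    The second signal in the abstract, completion at the flagged location, might look like the sketch below; complete_line() is a stub for the neural decoder transformer, and the flagged method and marker are assumptions for the example.

      def complete_line(prompt: str) -> str:
          # Stand-in for the decoder; a real model predicts the next line.
          return "if buf is None: return None"

      flagged = "def read(buf):\n    <FLAGGED: possible None dereference>\n    return buf[0]\n"
      prompt = flagged.split("<FLAGGED")[0]       # code up to the flagged location
      completion = complete_line(prompt)

      # If the model's natural completion already guards the flagged issue,
      # treat that as a signal that the warning is a false positive.
      print("false positive signal:", "is None" in completion)
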
  • Publication number: 20230251831
    Abstract: The syntax elements of a source code program used to represent the context of a focal method are selected based on a priority order. The selected syntax elements are input into a fixed-size context window that is used to train a neural transformer with attention model to learn to generate source code and used by the neural transformer model to generate source code. The context window contains prioritized sequences of tokens that extend beyond the target focus in order to provide a longer visibility back into the source code program for the model to learn predictive patterns. This gives the model a file-level context of the source code program without increasing the size of the context window.
    Type: Application
    Filed: April 17, 2023
    Publication date: August 10, 2023
    Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
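
    The priority-ordered packing the abstract describes amounts to filling a fixed token budget from a ranked list of syntax elements; the priorities and token counts below are assumptions made for the example.

      elements = [  # (priority, element, token cost) -- lower = higher priority
          (0, "focal method body", 60),
          (1, "class signature and fields", 20),
          (2, "sibling method signatures", 30),
          (3, "file imports", 10),
          (4, "unrelated method bodies", 80),
      ]

      BUDGET = 120                       # fixed-size context window
      window, used = [], 0
      for _, name, cost in sorted(elements):
          if used + cost <= BUDGET:
              window.append(name)
              used += cost

      print(window, f"({used}/{BUDGET} tokens)")
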
  • Publication number: 20230222334
Abstract: A deep learning model is quantized during its training to perform a target software engineering task. During training, a portion of the full-precision floating-point weights is quantized into INT4 or INT8 data types through scalar quantization or product quantization to make the model more resilient to quantization and to reduce the noise between the quantized and full-precision model outputs. In scalar quantization, each sub-block consists of a single weight that is mapped to a codeword of a codebook. In product quantization, an identity matrix and a codebook of centroids are used to map a quantized weight back to its original value.
    Type: Application
    Filed: January 10, 2022
    Publication date: July 13, 2023
    Inventors: Colin Bruce Clement, Shao Kun Deng, Neelakantan Sundaresan, Alexey Svyatkovskiy
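
    A minimal sketch of quantization-aware training in the INT8 scalar case: weights are rounded to 8-bit codes and dequantized in the forward pass so the model learns to tolerate the rounding noise. The codebook construction of the patented method is not reproduced.

      import torch

      def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
          scale = w.abs().max() / 127.0
          q = torch.clamp((w / scale).round(), -128, 127)  # INT8 codes
          return q * scale                                 # dequantized weights

      w = torch.randn(4, 4)
      w_q = fake_quant_int8(w)
      print("max quantization error:", (w - w_q).abs().max().item())
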
  • Patent number: 11693630
Abstract: A neural transformer model with attention is trained to predict candidates to complete a line of source code with a zero-inference capability. The model is trained on an unsupervised training dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context, where the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function and/or method of the source code programs used in the training dataset. The local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model learns to predict an ordered sequence of code elements that completes a line of source code in programming languages both seen and unseen during training.
    Type: Grant
    Filed: November 1, 2022
    Date of Patent: July 4, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang
  • Publication number: 20230195600
Abstract: An automated program repair system uses a neural transformer model with attention to predict a bug-free version of a method having a source code bug identified in an associated stack trace. The neural transformer model is pre-trained with English language text and the source code of a target programming language. The pre-trained neural transformer model is then trained to create synthetic bugs in bug-free methods. The bug-free methods with the synthetic bugs are executed with a test case to obtain a stack trace of the source code bug. The method with the synthetic bug, the same method without the bug, and the stack trace are then used to train the neural transformer model to predict repairs for buggy methods.
    Type: Application
    Filed: February 16, 2023
    Publication date: June 22, 2023
    Inventors: Colin Bruce Clement, Dawn Drain, Guillermo Serrato Castilla, Neelakantan Sundaresan
  • Publication number: 20230177261
Abstract: Generally discussed herein are devices, systems, and methods for generating an automatic interactive digital notebook completion model. A method can include receiving notebook content of an interactive digital notebook, the notebook content including a markdown cell followed by a code cell. The method can include generating input/output examples by, for each input/output example, masking either (i) the content of the markdown cell or (ii) the content of the code cell, resulting in a masked cell; identifying the masked cell and the content of the cell that is not masked as the input for the input/output example; and identifying the content of the masked cell as the output for the input/output example. The method can include training, based on the input/output examples, a natural language processing model that generates a prediction of the content of a second masked cell as an output.
    Type: Application
    Filed: December 8, 2021
    Publication date: June 8, 2023
    Inventors: Colin Bruce Clement, Shubham Chandel, Guillermo Serrato Castilla, Neelakantan Sundaresan
  • Patent number: 11656851
    Abstract: The syntax elements of a source code program used to represent the context of a focal method are selected based on a priority order. The selected syntax elements are input into a fixed-size context window that is used to train a neural transformer with attention model to learn to generate source code and used by the neural transformer model to generate source code. The context window contains prioritized sequences of tokens that extend beyond the target focus in order to provide a longer visibility back into the source code program for the model to learn predictive patterns. This gives the model a file-level context of the source code program without increasing the size of the context window.
    Type: Grant
    Filed: October 22, 2021
    Date of Patent: May 23, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
  • Publication number: 20230128008
    Abstract: A test-driven development system utilizes a neural transformer model with attention to generate method bodies for a focal method given its associated test cases, and optionally a method signature and a docstring of the focal method. The candidate method bodies are validated for syntactic correctness, tested using the given test cases, and tested with a donor class in a target system. Those candidate method bodies passing the validation and testing are then ranked based on a PLUM score that analyzes the candidate method bodies against various quality and performance metrics.
    Type: Application
    Filed: October 22, 2021
    Publication date: April 27, 2023
    Inventors: Colin Bruce Clement, Shao Kun Deng, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
  • Publication number: 20230128200
    Abstract: The syntax elements of a source code program used to represent the context of a focal method are selected based on a priority order. The selected syntax elements are input into a fixed-size context window that is used to train a neural transformer with attention model to learn to generate source code and used by the neural transformer model to generate source code. The context window contains prioritized sequences of tokens that extend beyond the target focus in order to provide a longer visibility back into the source code program for the model to learn predictive patterns. This gives the model a file-level context of the source code program without increasing the size of the context window.
    Type: Application
    Filed: October 22, 2021
    Publication date: April 27, 2023
    Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano