Patents by Inventor Colin Bruce Clement
Colin Bruce Clement has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12242822
Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model: the model parameters are frozen and a prefix is optimized. The tuning process is distributed across a user space and a model space, where the embedding and output layers run in the user space and the model itself executes in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in tuning.
Type: Grant
Filed: March 13, 2024
Date of Patent: March 4, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano, Andrei Zlotchevski
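As a rough illustration of the split, here is a minimal numpy sketch in which a linear map stands in for the frozen model. The function names, and the convention that the model space returns gradients with respect to its input embeddings, are assumptions made for illustration, not the patented protocol.

```python
# Hypothetical sketch of split prefix tuning: the "model space" exposes only a
# forward pass and an input-embedding gradient; the "user space" keeps the
# private data and updates only the prefix.
import numpy as np

rng = np.random.default_rng(0)
DIM, PREFIX_LEN = 8, 4

# --- model space (isolated): frozen weights, never updated ---
W = rng.normal(size=(DIM, DIM))

def model_forward(emb):
    return emb @ W                       # frozen model execution

def model_input_grad(grad_out):
    return grad_out @ W.T                # gradient w.r.t. the input embeddings

# --- user space: private data, embedding/output layers, trainable prefix ---
prefix = 0.01 * rng.normal(size=(PREFIX_LEN, DIM))   # the only trained params
x = rng.normal(size=(10, DIM))                       # embedded private input
target = rng.normal(size=(PREFIX_LEN + 10, DIM))

for step in range(200):
    out = model_forward(np.concatenate([prefix, x]))  # crosses the boundary
    grad_out = 2.0 * (out - target) / out.size        # d(MSE)/d(out), user side
    grad_in = model_input_grad(grad_out)              # returned by model space
    prefix -= 0.1 * grad_in[:PREFIX_LEN]              # update the prefix only
```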
-
Publication number: 20250068665
Abstract: A user query for information about the data of a codebase is answered by a large language model given a prompt that includes examples of code segments from the codebase that are similar to the user query. The code segments are associated with metadata that includes both natural language text and source code. The search for the example code segments is based on embeddings of code segments and associated metadata that are most similar to an embedding of the user query and its context.
Type: Application
Filed: August 24, 2023
Publication date: February 27, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shubham Chandel, Colin Bruce Clement, Shengyu Fu, Neelakantan Sundaresan
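A minimal sketch of the retrieval step, assuming a stand-in bag-of-words embedding in place of the learned embedding model; the corpus strings and prompt format are hypothetical.

```python
# Embed code segments (with their metadata) and the query, rank by cosine
# similarity, and splice the best match into the prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "def retry(call, n=3): ...  # metadata: retries a failed request n times",
    "def parse_config(path): ...  # metadata: loads the YAML config file",
]
query = "How do we retry failed requests?"
q = embed(query)
best = max(corpus, key=lambda seg: cosine(q, embed(seg)))
prompt = f"Using this example from the codebase:\n{best}\n\nAnswer: {query}"
print(prompt)
```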
-
Publication number: 20250036778
Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a neural decoder transformer model trained on non-vulnerable source code predicts a proposed repair, taking the flagged source code as input.
Type: Application
Filed: October 16, 2024
Publication date: January 30, 2025
Inventors: Aaron Yue-Chiu Chan, Colin Bruce Clement, Yevhen Mohylevskyy, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
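A toy sketch of the classify-then-repair pipeline; both functions are stand-ins for the neural models, and the `eval()`-based heuristic is purely illustrative.

```python
# The classifier gates the repair step: only snippets flagged as likely
# vulnerable are sent to the repair model.
def classify_vulnerable(snippet: str) -> float:
    """Stand-in classifier: treats eval() on input as likely vulnerable."""
    return 0.95 if "eval(" in snippet else 0.05

def propose_repair(snippet: str) -> str:
    """Stand-in repair model trained on non-vulnerable code."""
    return snippet.replace("eval(", "ast.literal_eval(")

generated = "value = eval(user_input)"
if classify_vulnerable(generated) > 0.5:
    print("flagged:", generated)
    print("proposed repair:", propose_repair(generated))
```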
-
Publication number: 20250004918
Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationales, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions and, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error, or to tailor a new training dataset for retraining the model to improve its performance.
Type: Application
Filed: September 9, 2024
Publication date: January 2, 2025
Inventors: Colin Bruce Clement, David Alberto Nader Palacio, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
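A greedy token-ablation sketch of the idea, assuming a toy scoring function in place of the neural language model; the actual tool's search procedure may differ.

```python
# Remove one token at a time and measure how much the model's score for the
# observed output drops; the tokens with the largest drops are the rationales.
def score(tokens):
    """Stand-in model score for producing the observed output."""
    return sum(1.0 for t in tokens if t in {"sort", "list"})

tokens = "please sort this list of numbers".split()
base = score(tokens)
impact = []
for i, tok in enumerate(tokens):
    ablated = tokens[:i] + tokens[i + 1:]
    impact.append((base - score(ablated), tok))   # score drop = influence

rationales = [t for drop, t in sorted(impact, reverse=True) if drop > 0]
print("rationales:", rationales)   # tokens that most influenced the output
```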
-
Publication number: 20240419917
Abstract: A customized prompt generation service automates prompts to a large language model to perform a specified software engineering task. The service stores a client's custom data, which includes code diff hunks, source code segments, code reviews, repaired code, and unit tests from the client's code base or repository. Prompt templates are associated with each software engineering task and include the information the large language model needs to perform the target task. A prompt to the large language model includes examples of the software engineering task drawn from the client's custom data.
Type: Application
Filed: June 14, 2023
Publication date: December 19, 2024
Inventors: Colin Bruce Clement, Shengyu Fu, Spandan Garg, Neelakantan Sundaresan, Dongjiang You, Roshanak Zilouchian Moghaddam
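A minimal sketch of template-driven prompt assembly; the task names and template text are hypothetical.

```python
# One template per software engineering task; each is filled with examples
# drawn from the client's stored custom data plus the target item.
TEMPLATES = {
    "code_review": (
        "You review code. Here are past reviews from this client:\n"
        "{examples}\n\nReview this diff:\n{target}"
    ),
    "unit_test": (
        "You write unit tests. Example tests from this repository:\n"
        "{examples}\n\nWrite a test for:\n{target}"
    ),
}

def build_prompt(task: str, client_examples: list[str], target: str) -> str:
    return TEMPLATES[task].format(examples="\n---\n".join(client_examples),
                                  target=target)

print(build_prompt("unit_test", ["def test_add(): assert add(1, 2) == 3"],
                   "def mul(a, b): return a * b"))
```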
-
Publication number: 20240403198
Abstract: A code coverage prediction system uses a neural transformer model with attention to generate a sequence of coverage symbols given a focal method and a test case. The sequence of coverage symbols indicates whether each line of source code is covered by the test case, missed by the test case, or unreachable. The sequence is aligned with the focal method to produce a coverage-annotated focal method that associates a predicted coverage symbol with each line of source code.
Type: Application
Filed: June 1, 2023
Publication date: December 5, 2024
Inventors: Shubham Chandel, Colin Bruce Clement, Neelakantan Sundaresan, Michele Tufano
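A small sketch of the alignment step; the symbol choices here (">" covered, "!" missed, "-" unreachable) are illustrative, not necessarily the patent's encoding.

```python
# Pair one predicted coverage symbol with each line of the focal method to
# produce the coverage-annotated method.
focal_method = [
    "def clamp(x, lo, hi):",
    "    if x < lo:",
    "        return lo",
    "    return min(x, hi)",
]
predicted_symbols = [">", ">", "!", ">"]   # one symbol per source line

annotated = [f"{sym} {line}" for sym, line in zip(predicted_symbols, focal_method)]
print("\n".join(annotated))
```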
-
Publication number: 20240394384
Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of the decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The constraints are generated by checking whether a token predicted at each decoding step is feasible for a partial solution, based on the production rules of the programming language's grammar, the syntactic correctness of the partial sequence, and/or static type correctness.
Type: Application
Filed: August 7, 2024
Publication date: November 28, 2024
Inventors: Colin Bruce Clement, Shao Kun Deng, Xiaoyu Liu, Neelakantan Sundaresan, Alexey Svyatkovskiy
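A minimal beam-search sketch in which a feasibility check prunes candidate tokens at each step. The toy vocabulary with fixed log-probabilities and the parenthesis-balance rule stand in for the grammar and type constraints the abstract describes.

```python
# Beam search where infeasible tokens are filtered out before scoring.
import heapq

VOCAB = {"print": -0.1, "(": -0.2, ")": -0.3, "x": -0.4}  # token -> log-prob

def feasible(seq, tok):
    """Stand-in constraint: never close more parentheses than were opened."""
    opened = seq.count("(") - seq.count(")")
    return tok != ")" or opened > 0

def beam_search(steps=4, width=2):
    beams = [(0.0, [])]                       # (cumulative log-prob, tokens)
    for _ in range(steps):
        candidates = [
            (score + logp, seq + [tok])
            for score, seq in beams
            for tok, logp in VOCAB.items()
            if feasible(seq, tok)             # constraints prune the expansion
        ]
        beams = heapq.nlargest(width, candidates, key=lambda c: c[0])
    return beams

for score, seq in beam_search():
    print(round(score, 2), " ".join(seq))
```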
-
Patent number: 12153684
Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a neural decoder transformer model trained on non-vulnerable source code predicts a proposed repair, taking the flagged source code as input.
Type: Grant
Filed: September 21, 2022
Date of Patent: November 26, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Aaron Yue-Chiu Chan, Colin Bruce Clement, Yevhen Mohylevskyy, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
-
Publication number: 20240386103
Abstract: A technique to prevent prompt injection attacks uses a security agent to sign a large language model prompt with a secret that is isolated from the user application or device that generates the user prompt. The secret is tailored to a specific user identifier and session identifier, and the large language model is instructed to repeat the secret in each response. The security agent retrieves the response from the large language model and checks for the secret; when the secret is not part of the response, an error message is forwarded to the user application instead of the response.
Type: Application
Filed: August 19, 2023
Publication date: November 21, 2024
Inventors: Colin Bruce Clement, Shengyu Fu, Neelakantan Sundaresan, Dongjiang You
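A sketch of the secret-echo check, assuming a stubbed model call; the function names and message format are hypothetical.

```python
# The security agent signs the prompt with a per-user, per-session secret and
# refuses to pass along any response that does not echo it.
import secrets

def call_llm(prompt: str) -> str:
    """Stub standing in for the model API; echoes the secret as instructed."""
    return "ANSWER: 4 [secret=" + prompt.split("secret=")[1].split("]")[0] + "]"

def guarded_query(user_prompt: str, user_id: str, session_id: str) -> str:
    secret = secrets.token_hex(8)       # isolated from the user application
    prompt = (f"Repeat [secret={secret}] verbatim in every response.\n"
              f"user={user_id} session={session_id}\n{user_prompt}")
    response = call_llm(prompt)
    if f"[secret={secret}]" not in response:
        return "Error: response failed the integrity check."
    return response.replace(f"[secret={secret}]", "").strip()

print(guarded_query("What is 2 + 2?", "alice", "s1"))
```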
-
Publication number: 20240370244
Abstract: An automated system for translating source code written in one programming language into a different programming language uses a neural transformer with attention trained on semi-supervised data. The model is jointly pre-trained with a masked language model objective and an autoregressive objective on a large unsupervised source code corpus to learn the syntactic structure and semantics of source code. The pre-trained model is then fine-tuned with a token-type prediction objective and an autoregressive objective on supervised translation tasks and data-augmented tasks to learn to translate source code from one programming language into another.
Type: Application
Filed: July 19, 2024
Publication date: November 7, 2024
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy, Chen Wu
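A sketch of the masked-language-model side of the pre-training recipe: random token spans are corrupted so the model learns to reconstruct them. The span lengths, mask ratio, and mask token are illustrative choices.

```python
# Span corruption for MLM pre-training: replace random spans with <MASK> and
# record the hidden tokens as prediction targets.
import random

def mask_spans(tokens, mask_ratio=0.15, max_span=3, seed=0):
    rng = random.Random(seed)
    out, targets, i = [], [], 0
    while i < len(tokens):
        if rng.random() < mask_ratio:
            span = min(rng.randint(1, max_span), len(tokens) - i)
            out.append("<MASK>")
            targets.append(tokens[i:i + span])   # what the model must predict
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out, targets

code = "def add ( a , b ) : return a + b".split()
corrupted, targets = mask_spans(code)
print(corrupted, targets)
```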
-
Publication number: 20240361992
Abstract: A neural transformer model with attention is trained to predict candidates that complete a line of source code, with a zero-shot capability. The model is trained on an unsupervised dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context: the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function, and/or method of the source code programs in the training dataset, while the local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model learns to predict an ordered sequence of code elements that complete a line of source code in programming languages both seen and not seen during training.
Type: Application
Filed: April 8, 2024
Publication date: October 31, 2024
Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang
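A sketch of how the two context levels might be assembled into one model input; the separator tokens and layout are assumptions for illustration.

```python
# Concatenate file-level context (class/signature summaries) with the local
# context up to the cursor, forming the completion model's input.
def build_input(file_level: list[str], local: list[str]) -> str:
    return "\n".join(["<file>"] + file_level + ["<local>"] + local)

file_level_context = [
    "class Cache:",
    "    def get(self, key): ...",
    "    def put(self, key, value): ...",
]
local_context = [
    "def warm(cache, items):",
    "    for key, value in items:",
    "        cache.",              # cursor: the model completes this line
]
print(build_input(file_level_context, local_context))
```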
-
Patent number: 12111751
Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationales, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions and, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error, or to tailor a new training dataset for retraining the model to improve its performance.
Type: Grant
Filed: December 15, 2022
Date of Patent: October 8, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, David Alberto Nader Palacio, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
-
Patent number: 12086268
Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of the decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The constraints are generated by checking whether a token predicted at each decoding step is feasible for a partial solution, based on the production rules of the programming language's grammar, the syntactic correctness of the partial sequence, and/or static type correctness.
Type: Grant
Filed: March 7, 2022
Date of Patent: September 10, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Shao Kun Deng, Xiaoyu Liu, Neelakantan Sundaresan, Alexey Svyatkovskiy
-
Publication number: 20240256230
Abstract: A code completion tool uses a neural transformer model with attention to generate candidate sequences that complete the method body for a given method signature. The neural transformer model is trained on source code programs and natural language text: a large unsupervised corpus of source code methods and a supervised dataset of tasks pairing source code constructs with natural language docstrings. From this training, the model learns the meaning of a method name and its corresponding parameters and types, and infers a candidate sequence of subtokens that represents a method body for a particular method signature.
Type: Application
Filed: April 11, 2024
Publication date: August 1, 2024
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy
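A minimal sketch of the inference-time input, with a stub standing in for the model's decoded candidate body.

```python
# The signature and docstring are the model's input; the body is decoded.
def generate_body(signature: str, docstring: str) -> str:
    """Stand-in for the neural transformer's decoded candidate body."""
    return "    return sorted(items)[::-1]"

signature = "def sort_descending(items: list) -> list:"
docstring = '    """Return items sorted from largest to smallest."""'
print("\n".join([signature, docstring, generate_body(signature, docstring)]))
```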
-
Patent number: 12045592
Abstract: An automated system for translating source code written in one programming language into a different programming language uses a neural transformer with attention trained on semi-supervised data. The model is jointly pre-trained with a masked language model objective and an autoregressive objective on a large unsupervised source code corpus to learn the syntactic structure and semantics of source code. The pre-trained model is then fine-tuned with a token-type prediction objective and an autoregressive objective on supervised translation tasks and data-augmented tasks to learn to translate source code from one programming language into another.
Type: Grant
Filed: March 25, 2021
Date of Patent: July 23, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy, Chen Wu
-
Publication number: 20240232519
Abstract: Generally discussed herein are devices, systems, and methods for generating an automatic interactive digital notebook completion model. A method can include receiving notebook content of an interactive digital notebook, the notebook content including a markdown cell followed by a code cell. The method can include generating input/output examples by, for each example, masking either (i) the content of the markdown cell or (ii) the content of the code cell, identifying the masked cell together with the content of the unmasked cell as the input of the example, and identifying the content of the masked cell as its output. The method can include training, based on the input/output examples, a natural language processing model that predicts the content of a masked cell.
Type: Application
Filed: March 20, 2024
Publication date: July 11, 2024
Inventors: Colin Bruce Clement, Shubham Chandel, Guillermo Serrato Castilla, Neelakantan Sundaresan
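A sketch of the example-generation step: each (markdown, code) cell pair yields two input/output examples, one per masked cell. The mask tokens are illustrative.

```python
# Build training examples by masking one cell of each markdown/code pair.
def make_examples(markdown: str, code: str):
    return [
        {"input": ("<MASKED_MARKDOWN>", code), "output": markdown},
        {"input": (markdown, "<MASKED_CODE>"), "output": code},
    ]

pairs = [("## Load the data", "df = pd.read_csv('data.csv')")]
examples = [ex for md, c in pairs for ex in make_examples(md, c)]
for ex in examples:
    print(ex)
```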
-
Publication number: 20240220215
Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model: the model parameters are frozen and a prefix is optimized. The tuning process is distributed across a user space and a model space, where the embedding and output layers run in the user space and the model itself executes in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in tuning.
Type: Application
Filed: March 13, 2024
Publication date: July 4, 2024
Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano, Andrei Zlotchevski
-
Publication number: 20240211224
Abstract: A neural transcompilation model is tested with a set of syntax unit tests to determine which syntax elements of a source code program written in a source programming language fail to translate properly into a target programming language. The syntax elements having translation defects are identified and ranked according to their translation failure rates. The model is then fine-tuned with training samples of the syntax elements having the highest failure rates, paired with their correct translations, in order to teach the model the association between the defect-causing syntax elements of the source programming language and their correct translations in the target programming language.
Type: Application
Filed: December 23, 2022
Publication date: June 27, 2024
Inventors: Colin Bruce Clement, Yufan Huang, Neelakantan Sundaresan, Yiding Tian, Maoquan Wang
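A sketch of the evaluation loop, assuming a stub translator and two hypothetical syntax-element test suites; elements are ranked by failure rate to select fine-tuning data.

```python
# Run per-syntax-element unit tests against the translator and rank elements
# by failure rate; the worst elements are selected for fine-tuning.
from collections import defaultdict

def translate(src: str) -> str:
    """Stub standing in for the neural transcompilation model."""
    return src.replace("print(", "System.out.println(")

SYNTAX_TESTS = {
    "print_stmt": [("print(1)", "System.out.println(1)")],
    "for_loop": [("for i in range(3): print(i)",
                  "for (int i = 0; i < 3; i++) System.out.println(i);")],
}

failures = defaultdict(lambda: [0, 0])   # element -> [failed, total]
for element, cases in SYNTAX_TESTS.items():
    for src, expected in cases:
        failures[element][1] += 1
        if translate(src) != expected:
            failures[element][0] += 1

ranked = sorted(failures.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for element, (failed, total) in ranked:
    print(f"{element}: {failed}/{total} failed")  # worst elements fine-tuned first
```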
-
Publication number: 20240160940
Abstract: A transfer learning system is used for the development of neural transformer models pertaining to software engineering tasks. The transfer learning system trains source code domain neural transformer models with attention, in various configurations, on a large unsupervised training dataset of source code programs and/or source-code-related natural language text. A web service provides the trained models for use in developing a model that may be fine-tuned on a supervised training dataset associated with a software engineering task, thereby generating a tool to perform that task.
Type: Application
Filed: January 17, 2024
Publication date: May 16, 2024
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy
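A sketch of the consumer side, using the Hugging Face hub as a stand-in for the web service in the abstract; microsoft/codebert-base is a real pretrained code model but only an illustrative choice here, and the two-label task head is hypothetical.

```python
# Fetch a pretrained source-code model and take one supervised fine-tuning
# step on a downstream software engineering task (e.g., bug detection).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)   # hypothetical task head

batch = tok(["def f(): return 1"], return_tensors="pt", padding=True)
labels = torch.tensor([0])
optim = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
loss = model(**batch, labels=labels).loss     # supervised fine-tuning step
loss.backward()
optim.step()
print(float(loss))
```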
-
Patent number: 11983513
Abstract: A neural transformer model with attention is trained to predict candidates that complete a line of source code, with a zero-shot capability. The model is trained on an unsupervised dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context: the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function, and/or method of the source code programs in the training dataset, while the local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model learns to predict an ordered sequence of code elements that complete a line of source code in programming languages both seen and not seen during training.
Type: Grant
Filed: May 24, 2023
Date of Patent: May 14, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang