Patents by Inventor COLIN BRUCE CLEMENT

COLIN BRUCE CLEMENT has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12242822
    Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model by freezing the model parameters and optimizing a prefix. The tuning process is distributed across a user space and a model space where the embedding and output layers are performed in the user space and the execution of the model is performed in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in the tuning process.
    Type: Grant
    Filed: March 13, 2024
    Date of Patent: March 4, 2025
    Assignee: Microsoft Technology Licensing, LLC.
    Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano, Andrei Zlotchevski
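
As a rough illustration of the prefix-tuning scheme this abstract describes, the PyTorch sketch below freezes a stand-in model's parameters and optimizes only a short prefix of virtual-token embeddings; the model, sizes, and objective are placeholders, and the user-space/model-space split is indicated only by comments:

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained deep learning model; its parameters stay frozen.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=2)
for p in model.parameters():
    p.requires_grad_(False)

# The only trainable state: a short prefix of virtual-token embeddings,
# held and optimized in the user space.
prefix = nn.Parameter(torch.randn(8, 64))
optimizer = torch.optim.Adam([prefix], lr=1e-3)

tokens = torch.randn(32, 64)                              # embedded user data (user space)
inputs = torch.cat([prefix, tokens], dim=0).unsqueeze(0)  # prepend prefix: (1, 40, 64)

out = model(inputs)        # the forward pass would run in the isolated model space
loss = out.pow(2).mean()   # placeholder objective
loss.backward()            # gradients flow only into the prefix embeddings
optimizer.step()
```
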
  • Publication number: 20250068665
    Abstract: A user query for information regarding data of a codebase is answered by a large language model given a prompt that includes examples of code segments from the codebase that are similar to the user query. The code segments from the codebase are associated with metadata that includes both natural language text and source code. The search for the examples of code segments is based on finding embeddings of code segments and associated metadata that are most similar to an embedding of the user query and its context.
    Type: Application
    Filed: August 24, 2023
    Publication date: February 27, 2025
    Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: SHUBHAM CHANDEL, COLIN BRUCE CLEMENT, SHENGYU FU, NEELAKANTAN SUNDARESAN
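
A minimal sketch of the retrieval step this abstract describes, assuming a hypothetical embed() function in place of a real embedding model; the codebase entries and prompt format are illustrative:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding function standing in for a real embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

# Code segments paired with metadata (natural language text plus source code).
codebase = [
    {"code": "def read_config(path): ...", "metadata": "loads YAML configuration"},
    {"code": "def connect_db(url): ...",   "metadata": "opens a database connection"},
]
index = [(seg, embed(seg["metadata"] + "\n" + seg["code"])) for seg in codebase]

query = "How does this codebase load its configuration?"
q = embed(query)
# Retrieve the segments whose embeddings are most similar to the query embedding.
best = sorted(index, key=lambda item: float(q @ item[1]), reverse=True)[:1]

prompt = "Answer using these examples from the codebase:\n"
prompt += "\n".join(seg["code"] for seg, _ in best) + f"\n\nQuestion: {query}"
```
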
  • Publication number: 20250036778
    Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a proposed repair is predicted by a neural decoder transformer model that was trained on non-vulnerable source code. The neural decoder transformer model predicts source code that repairs the cybersecurity vulnerability, given the source code classified as vulnerable.
    Type: Application
    Filed: October 16, 2024
    Publication date: January 30, 2025
    Inventors: AARON YUE-CHIU CHAN, COLIN BRUCE CLEMENT, YEVHEN MOHYLEVSKYY, NEELAKANTAN SUNDARESAN, ROSHANAK ZILOUCHIAN MOGHADDAM
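
A toy sketch of the two-stage pipeline in this abstract, with trivial stand-ins for both neural models; the SQL-injection pattern is only an illustrative vulnerability:

```python
def classify_vulnerable(snippet: str) -> bool:
    """Stand-in for the neural classifier; flags an obvious SQL-injection pattern."""
    return "execute(" in snippet and "%" in snippet

def propose_repair(snippet: str) -> str:
    """Stand-in for the neural decoder trained on non-vulnerable code."""
    return snippet.replace('"SELECT * FROM users WHERE id = %s" % uid',
                           '"SELECT * FROM users WHERE id = ?", (uid,)')

generated = 'cur.execute("SELECT * FROM users WHERE id = %s" % uid)'
if classify_vulnerable(generated):
    repaired = propose_repair(generated)  # a repair is predicted only for flagged code
    print(repaired)
```
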
  • Publication number: 20250004918
    Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationales, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error or to tailor a new training dataset to retrain the model to improve its performance.
    Type: Application
    Filed: September 9, 2024
    Publication date: January 2, 2025
    Inventors: COLIN BRUCE CLEMENT, DAVID ALBERTO NADER PALACIO, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY, MICHELE TUFANO
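
The rationale search could look roughly like the brute-force sketch below, with a trivial stand-in for the language model; a real system would use a far more efficient search:

```python
from itertools import combinations

def model(tokens: tuple) -> str:
    """Stand-in for the neural language model being debugged."""
    return "sorted_list" if "sort" in tokens else "unknown"

inputs = ("please", "sort", "this", "list")
target = model(inputs)

# Smallest subset of input tokens that still yields the same output:
# subsets are enumerated in order of increasing size, so the first hit is minimal.
rationale = next(
    subset
    for size in range(1, len(inputs) + 1)
    for subset in combinations(inputs, size)
    if model(subset) == target
)
print(rationale)  # ('sort',) — the token that drove the prediction
```
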
  • Publication number: 20240419917
    Abstract: A customized prompt generation service automates prompts to a large language model to perform a specified software engineering task. The service stores the custom data of a client, including code diff hunks, source code segments, code reviews, repaired code, and unit tests from the client's code base or repository. A prompt template is associated with each software engineering task and includes the requisite information the large language model needs to perform the target task. A prompt to a large language model includes examples of the software engineering task drawn from the custom data of the client of the service.
    Type: Application
    Filed: June 14, 2023
    Publication date: December 19, 2024
    Inventors: COLIN BRUCE CLEMENT, SHENGYU FU, SPANDAN GARG, NEELAKANTAN SUNDARESAN, DONGJIANG YOU, ROSHANAK ZILOUCHIAN MOGHADDAM
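
A minimal sketch of the template idea in this abstract; the task names, template text, and client data are all illustrative:

```python
# One template per software engineering task, filled with examples
# drawn from the client's own repository.
TEMPLATES = {
    "code_review": (
        "You review code. Here are past reviews from this repository:\n"
        "{examples}\n\nReview this diff:\n{diff}"
    ),
}

client_data = {
    "code_review": ["- prefer f-strings over % formatting",
                    "- missing null check on user input"],
}

def build_prompt(task: str, target: str) -> str:
    examples = "\n".join(client_data[task])
    return TEMPLATES[task].format(examples=examples, diff=target)

prompt = build_prompt("code_review", "+ print('hello %s' % name)")
```
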
  • Publication number: 20240403198
    Abstract: A code coverage prediction system utilizes a neural transformer model with attention to generate a sequence of coverage symbols given a focal method and a test case. The sequence of coverage symbols indicates whether a line of source code is covered by the test case, missed by the test case or unreachable. The sequence of coverage symbols is aligned with the focal method to produce a coverage-annotated focal method that associates a predicted coverage symbol with each line of source code in the focal method.
    Type: Application
    Filed: June 1, 2023
    Publication date: December 5, 2024
    Inventors: SHUBHAM CHANDEL, COLIN BRUCE CLEMENT, NEELAKANTAN SUNDARESAN, MICHELE TUFANO
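
The alignment step in this abstract might look like the sketch below; the coverage symbols ('>' covered, '!' missed, '-' unreachable) are assumptions, not taken from the patent:

```python
focal_method = [
    "def abs_val(x):",
    "    if x < 0:",
    "        return -x",
    "    return x",
]
predicted_symbols = [">", ">", "!", ">"]  # model output for the test case abs_val(3)

# Pair one predicted coverage symbol with each line of the focal method.
annotated = [f"{sym} {line}" for sym, line in zip(predicted_symbols, focal_method)]
print("\n".join(annotated))
```
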
  • Publication number: 20240394384
    Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of a decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The token constraints are generated by checking whether a token predicted at each decoding step is feasible for a partial solution, based on the production rules of the grammar of the programming language, the syntactic correctness of a partial sequence, and/or static type correctness.
    Type: Application
    Filed: August 7, 2024
    Publication date: November 28, 2024
    Inventors: COLIN BRUCE CLEMENT, SHAO KUN DENG, XIAOYU LIU, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY
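
A small, self-contained sketch of constrained beam search in the spirit of this abstract; the token distribution is a stub, and a parenthesis-balance rule stands in for the grammar, syntax, and type checks:

```python
import heapq, math

def next_token_probs(prefix: tuple) -> dict:
    """Stand-in for the model's token distribution at one decoding step."""
    return {"(": 0.5, ")": 0.3, "x": 0.2}

def feasible(prefix: tuple, token: str) -> bool:
    """Toy token constraint: never close more parentheses than were opened."""
    depth = prefix.count("(") - prefix.count(")")
    return token != ")" or depth > 0

def constrained_beam_search(steps: int, width: int = 2) -> list:
    beams = [(0.0, ())]  # (cumulative negative log-probability, partial sequence)
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            for tok, p in next_token_probs(seq).items():
                if feasible(seq, tok):  # prune infeasible tokens at each step
                    candidates.append((score - math.log(p), seq + (tok,)))
        beams = heapq.nsmallest(width, candidates)
    return [seq for _, seq in beams]

print(constrained_beam_search(3))
```
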
  • Patent number: 12153684
    Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a proposed repair is predicted by a neural decoder transformer model that was trained on non-vulnerable source code. The neural decoder transformer model predicts source code that repairs the cybersecurity vulnerability, given the source code classified as vulnerable.
    Type: Grant
    Filed: September 21, 2022
    Date of Patent: November 26, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: Aaron Yue-Chiu Chan, Colin Bruce Clement, Yevhen Mohylevskyy, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
  • Publication number: 20240386103
    Abstract: A technique to prevent a prompt injection attack utilizes a security agent to sign a large language model prompt with a secret that is isolated from the user application or device that generates a user prompt. The secret is tailored for a specific user identifier and session identifier. The large language model is instructed to repeat the secret in each response. The security agent retrieves the response from the large language model and checks for the secret. When the secret is not part of the response, an error message is forwarded to the user application instead of the response.
    Type: Application
    Filed: August 19, 2023
    Publication date: November 21, 2024
    Inventors: COLIN BRUCE CLEMENT, SHENGYU FU, NEELAKANTAN SUNDARESAN, DONGJIANG YOU
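
A rough sketch of the signing scheme this abstract describes, using HMAC as one plausible way to derive a per-user, per-session secret; the function names and prompt wording are illustrative:

```python
import hashlib
import hmac
import secrets

# The security agent holds this key; it is never exposed to the user application.
AGENT_KEY = secrets.token_bytes(32)

def session_secret(user_id: str, session_id: str) -> str:
    """Derive a secret tailored to one user identifier and session identifier."""
    msg = f"{user_id}:{session_id}".encode()
    return hmac.new(AGENT_KEY, msg, hashlib.sha256).hexdigest()[:16]

def sign_prompt(user_prompt: str, user_id: str, session_id: str) -> str:
    token = session_secret(user_id, session_id)
    return f"Begin every response with the token {token}.\n\n{user_prompt}"

def check_response(llm_response: str, user_id: str, session_id: str) -> str:
    token = session_secret(user_id, session_id)
    if token not in llm_response:
        # A missing secret suggests an injected instruction overrode the prompt.
        return "Error: response failed the integrity check."
    return llm_response.replace(token, "", 1).strip()
```
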
  • Publication number: 20240370244
    Abstract: An automated system for translating source code written in one programming language into a different programming language utilizes a neural transformer with attention trained on semi-supervised data. The model is jointly pre-trained with a masked language model objective and an autoregressive objective on a large unsupervised source code corpus to learn to comprehend the syntactic structure and semantics of source code. The pre-trained model is then fine-tuned with a token-type prediction objective and an autoregressive objective on supervised translation tasks and data augmented tasks to learn to translate source code from one programming language into a different programming language.
    Type: Application
    Filed: July 19, 2024
    Publication date: November 7, 2024
    Inventors: COLIN BRUCE CLEMENT, DAWN DRAIN, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY, CHEN WU
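
As one concrete piece of the joint pre-training recipe in this abstract, the sketch below prepares a masked-language-model example over code tokens; the mask rate and mask token are conventional choices, not taken from the patent (the autoregressive objective, by contrast, predicts each token from the tokens before it):

```python
import random

random.seed(0)  # deterministic for illustration

def mask_tokens(tokens: list, mask_rate: float = 0.15, mask: str = "<MASK>"):
    """Prepare one masked-language-model training example over code tokens."""
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok          # the model must recover the original token
            corrupted.append(mask)
        else:
            corrupted.append(tok)
    return corrupted, targets

code_tokens = "def add ( a , b ) : return a + b".split()
masked, targets = mask_tokens(code_tokens)
print(masked, targets)
```
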
  • Publication number: 20240361992
    Abstract: A neural transformer model with attention is trained to predict candidates to complete a line of source code with a zero-inference capability. The model is trained on an unsupervised training dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context, where the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function and/or method of the source code programs used in the training dataset. The local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model is able to learn to predict an ordered sequence of code elements that complete a line of source code in a programming language seen and not seen during training.
    Type: Application
    Filed: April 8, 2024
    Publication date: October 31, 2024
    Inventors: COLIN BRUCE CLEMENT, SHUAI LU, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY, DUYU TANG
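
A simplified sketch of the feature extraction this abstract describes, using Python's ast module to collect a file-level context and a local context; a real pipeline would extract far richer features:

```python
import ast

source = '''
import math
class Circle:
    def area(self, r):
        return math.pi * r
'''

tree = ast.parse(source)
# File-level context: the class and function/method names defined in the file.
file_context = [n.name for n in ast.walk(tree)
                if isinstance(n, (ast.ClassDef, ast.FunctionDef))]
# Local context: the body of the enclosing method (here, its last line).
local_context = source.splitlines()[-1].strip()

print(file_context)   # ['Circle', 'area']
print(local_context)  # 'return math.pi * r'
```
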
  • Patent number: 12111751
    Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationales, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error or to tailor a new training dataset to retrain the model to improve its performance.
    Type: Grant
    Filed: December 15, 2022
    Date of Patent: October 8, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: Colin Bruce Clement, David Alberto Nader Palacio, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
  • Patent number: 12086268
    Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of a decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The token constraints are generated by checking whether a token predicted at each decoding step is feasible for a partial solution, based on the production rules of the grammar of the programming language, the syntactic correctness of a partial sequence, and/or static type correctness.
    Type: Grant
    Filed: March 7, 2022
    Date of Patent: September 10, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: Colin Bruce Clement, Shao Kun Deng, Xiaoyu Liu, Neelakantan Sundaresan, Alexey Svyatkovskiy
  • Publication number: 20240256230
    Abstract: A code completion tool uses a neural transformer model with attention to generate candidate sequences to complete a method body given a method signature. The neural transformer model is trained with source code programs and natural language text. The neural transformer model learns the meaning of a method name and its corresponding method parameters and types from a large unsupervised dataset of source code methods and a supervised dataset of tasks that combine source code constructs with natural language docstrings, in order to infer a candidate sequence of subtokens that represents a method body for a particular method signature.
    Type: Application
    Filed: April 11, 2024
    Publication date: August 1, 2024
    Inventors: COLIN BRUCE CLEMENT, DAWN DRAIN, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY
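
A minimal sketch of the completion setup in this abstract; generate_candidates is a hypothetical stand-in for the neural transformer model:

```python
def generate_candidates(signature: str, docstring: str, k: int = 3) -> list:
    """Hypothetical model call returning k candidate method bodies."""
    return ["    return sum(xs) / len(xs)"] * k

signature = "def mean(xs):"
docstring = '"""Return the arithmetic mean of xs."""'

# The model infers the body from the method name, parameters, and docstring.
candidates = generate_candidates(signature, docstring)
completed = "\n".join([signature, "    " + docstring, candidates[0]])
print(completed)
```
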
  • Patent number: 12045592
    Abstract: An automated system for translating source code written in one programming language into a different programming language utilizes a neural transformer with attention trained on semi-supervised data. The model is jointly pre-trained with a masked language model objective and an autoregressive objective on a large unsupervised source code corpus to learn to comprehend the syntactic structure and semantics of source code. The pre-trained model is then fine-tuned with a token-type prediction objective and an autoregressive objective on supervised translation tasks and data augmented tasks to learn to translate source code from one programming language into a different programming language.
    Type: Grant
    Filed: March 25, 2021
    Date of Patent: July 23, 2024
    Assignee: Microsoft Technology Licensing, LLC.
    Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy, Chen Wu
  • Publication number: 20240232519
    Abstract: Generally discussed herein are devices, systems, and methods for generating an automatic interactive digital notebook completion model. A method can include receiving notebook content of an interactive digital notebook, the notebook content including a markdown cell followed by a code cell. The method can include generating input/output examples by, for each input/output example, masking one of (i) the content of the markdown cell or (ii) the content of the code cell, resulting in a masked cell; identifying the content of whichever cell (markdown or code) is not masked as the input for the input/output example; and identifying the content of the masked cell as the output for the input/output example. The method can include training, based on the input/output examples, a natural language processing model that generates a prediction of the content of a second masked cell as an output.
    Type: Application
    Filed: March 20, 2024
    Publication date: July 11, 2024
    Inventors: Colin Bruce Clement, Shubham Chandel, Guillermo Serrato Castilla, Neelakantan Sundaresan
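
The example-generation step in this abstract could look like the sketch below; the mask markers and notebook content are illustrative:

```python
# A (markdown, code) cell pair from an interactive digital notebook.
markdown = "Load the CSV into a dataframe."
code = "df = pd.read_csv('data.csv')"

def make_examples(md: str, code: str) -> list:
    """Mask one cell of the pair; the unmasked cell is the input, the masked
    cell's content is the target output."""
    return [
        {"input": f"<mask_markdown>\n{code}", "output": md},    # predict markdown from code
        {"input": f"{md}\n<mask_code>",       "output": code},  # predict code from markdown
    ]

examples = make_examples(markdown, code)
print(examples)
```
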
  • Publication number: 20240220215
    Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model by freezing the model parameters and optimizing a prefix. The tuning process is distributed across a user space and a model space where the embedding and output layers are performed in the user space and the execution of the model is performed in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in the tuning process.
    Type: Application
    Filed: March 13, 2024
    Publication date: July 4, 2024
    Inventors: COLIN BRUCE CLEMENT, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY, MICHELE TUFANO, ANDREI ZLOTCHEVSKI
  • Publication number: 20240211224
    Abstract: A neural transcompilation model is tested with a set of syntax unit tests to determine the syntax elements of a source code program written in a source programming language that fail to translate properly into a target programming language. The syntax elements having a translation defect are identified and ranked according to their translation failure rate. The neural transcompilation model is then fine-tuned with training samples of the syntax elements having the highest translation failure rates, paired with their correct translations, in order to teach the model the association between the syntax elements of a source programming language that cause translation defects and their correct translations in a target programming language.
    Type: Application
    Filed: December 23, 2022
    Publication date: June 27, 2024
    Inventors: COLIN BRUCE CLEMENT, YUFAN HUANG, NEELAKANTAN SUNDARESAN, YIDING TIAN, MAOQUAN WANG
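
A small sketch of the ranking step in this abstract; the syntax elements and test results are made up for illustration:

```python
from collections import Counter

# Each syntax unit test records the element it exercises and whether the
# translation passed.
results = [("list_comprehension", False), ("list_comprehension", False),
           ("with_statement", True), ("with_statement", False), ("lambda", True)]

totals, failures = Counter(), Counter()
for element, passed in results:
    totals[element] += 1
    if not passed:
        failures[element] += 1

failure_rate = {e: failures[e] / totals[e] for e in totals}
ranked = sorted(failure_rate, key=failure_rate.get, reverse=True)
print(ranked)  # fine-tuning samples would target the top-ranked elements first
```
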
  • Publication number: 20240160940
    Abstract: A transfer learning system is used for the development of neural transformer models pertaining to software engineering tasks. The transfer learning system trains source code domain neural transformer models with attention, in various configurations, on a large unsupervised training dataset of source code programs and/or source code-related natural language text. A web service provides the trained models for use in developing a model that may be fine-tuned on a supervised training dataset associated with a software engineering task, thereby generating a tool to perform the software engineering task.
    Type: Application
    Filed: January 17, 2024
    Publication date: May 16, 2024
    Inventors: COLIN BRUCE CLEMENT, DAWN DRAIN, NEELAKANTAN SUNDARESAN, ALEXEY SVYATKOVSKIY
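
A rough sketch of the workflow this abstract describes; load_pretrained is a hypothetical stand-in for fetching a model from the web service, and the task head is illustrative:

```python
import torch.nn as nn

def load_pretrained(name: str) -> nn.Module:
    """Hypothetical stand-in for downloading a pre-trained model from the service."""
    return nn.Sequential(nn.Linear(64, 64), nn.ReLU())

body = load_pretrained("code-model-base")  # pre-trained on a source code corpus
head = nn.Linear(64, 2)                    # new head for the downstream task,
                                           # e.g. classifying a change as buggy or not
model = nn.Sequential(body, head)

# Fine-tuning would then train `model` on the supervised software-engineering dataset.
print(sum(p.numel() for p in model.parameters()), "trainable parameters")
```
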
  • Patent number: 11983513
    Abstract: A neural transformer model with attention is trained to predict candidates to complete a line of source code with a zero-inference capability. The model is trained on an unsupervised training dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context, where the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function and/or method of the source code programs used in the training dataset. The local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model is able to learn to predict an ordered sequence of code elements that complete a line of source code in a programming language seen and not seen during training.
    Type: Grant
    Filed: May 24, 2023
    Date of Patent: May 14, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC.
    Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang