Patents by Inventor Colin Bruce Clement
Colin Bruce Clement has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12242822
Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model: the model parameters are frozen and a prefix is optimized. The tuning process is distributed across a user space and a model space, where the embedding and output layers run in the user space and the model itself executes in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in tuning.
Type: Grant
Filed: March 13, 2024
Date of Patent: March 4, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano, Andrei Zlotchevski
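As a rough illustration of the split, here is a minimal numpy sketch in which a linear map stands in for the frozen model. The function names, and the convention that the model space returns gradients with respect to its input embeddings, are assumptions made for illustration, not the patented protocol.

```python
# Hypothetical sketch of split prefix tuning: the "model space" exposes only a
# forward pass and an input-embedding gradient; the "user space" keeps the
# private data and updates only the prefix.
import numpy as np

rng = np.random.default_rng(0)
DIM, PREFIX_LEN = 8, 4

# --- model space (isolated): frozen weights, never updated ---
W = rng.normal(size=(DIM, DIM))

def model_forward(emb):
    return emb @ W                       # frozen model execution

def model_input_grad(grad_out):
    return grad_out @ W.T                # gradient w.r.t. the input embeddings

# --- user space: private data, embedding/output layers, trainable prefix ---
prefix = 0.01 * rng.normal(size=(PREFIX_LEN, DIM))   # the only trained params
x = rng.normal(size=(10, DIM))                       # embedded private input
target = rng.normal(size=(PREFIX_LEN + 10, DIM))

for step in range(200):
    out = model_forward(np.concatenate([prefix, x]))  # crosses the boundary
    grad_out = 2.0 * (out - target) / out.size        # d(MSE)/d(out), user side
    grad_in = model_input_grad(grad_out)              # returned by model space
    prefix -= 0.1 * grad_in[:PREFIX_LEN]              # update the prefix only
```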
-
Publication number: 20250068665
Abstract: A user query for information about the data of a codebase is answered by a large language model given a prompt that includes examples of code segments from the codebase that are similar to the user query. The code segments are associated with metadata that includes both natural language text and source code. The search for the example code segments is based on embeddings of code segments and associated metadata that are most similar to an embedding of the user query and its context.
Type: Application
Filed: August 24, 2023
Publication date: February 27, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shubham Chandel, Colin Bruce Clement, Shengyu Fu, Neelakantan Sundaresan
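A minimal sketch of the retrieval step, assuming a stand-in bag-of-words embedding in place of the learned embedding model; the corpus strings and prompt format are hypothetical.

```python
# Embed code segments (with their metadata) and the query, rank by cosine
# similarity, and splice the best match into the prompt.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "def retry(call, n=3): ...  # metadata: retries a failed request n times",
    "def parse_config(path): ...  # metadata: loads the YAML config file",
]
query = "How do we retry failed requests?"
q = embed(query)
best = max(corpus, key=lambda seg: cosine(q, embed(seg)))
prompt = f"Using this example from the codebase:\n{best}\n\nAnswer: {query}"
print(prompt)
```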
-
Publication number: 20250036778
Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a neural decoder transformer model trained on non-vulnerable source code predicts a proposed repair, taking the flagged source code as input.
Type: Application
Filed: October 16, 2024
Publication date: January 30, 2025
Inventors: Aaron Yue-Chiu Chan, Colin Bruce Clement, Yevhen Mohylevskyy, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
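A toy sketch of the classify-then-repair pipeline; both functions are stand-ins for the neural models, and the `eval()`-based heuristic is purely illustrative.

```python
# The classifier gates the repair step: only snippets flagged as likely
# vulnerable are sent to the repair model.
def classify_vulnerable(snippet: str) -> float:
    """Stand-in classifier: treats eval() on input as likely vulnerable."""
    return 0.95 if "eval(" in snippet else 0.05

def propose_repair(snippet: str) -> str:
    """Stand-in repair model trained on non-vulnerable code."""
    return snippet.replace("eval(", "ast.literal_eval(")

generated = "value = eval(user_input)"
if classify_vulnerable(generated) > 0.5:
    print("flagged:", generated)
    print("proposed repair:", propose_repair(generated))
```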
-
Publication number: 20250004918
Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationales, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions and, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error, or to tailor a new training dataset for retraining the model to improve its performance.
Type: Application
Filed: September 9, 2024
Publication date: January 2, 2025
Inventors: Colin Bruce Clement, David Alberto Nader Palacio, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
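A greedy token-ablation sketch of the idea, assuming a toy scoring function in place of the neural language model; the actual tool's search procedure may differ.

```python
# Remove one token at a time and measure how much the model's score for the
# observed output drops; the tokens with the largest drops are the rationales.
def score(tokens):
    """Stand-in model score for producing the observed output."""
    return sum(1.0 for t in tokens if t in {"sort", "list"})

tokens = "please sort this list of numbers".split()
base = score(tokens)
impact = []
for i, tok in enumerate(tokens):
    ablated = tokens[:i] + tokens[i + 1:]
    impact.append((base - score(ablated), tok))   # score drop = influence

rationales = [t for drop, t in sorted(impact, reverse=True) if drop > 0]
print("rationales:", rationales)   # tokens that most influenced the output
```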
-
Publication number: 20240419917
Abstract: A customized prompt generation service automates prompts to a large language model to perform a specified software engineering task. The service stores a client's custom data, which includes code diff hunks, source code segments, code reviews, repaired code, and unit tests from the client's code base or repository. Prompt templates are associated with each software engineering task and include the information the large language model needs to perform the target task. A prompt to the large language model includes examples of the software engineering task drawn from the client's custom data.
Type: Application
Filed: June 14, 2023
Publication date: December 19, 2024
Inventors: Colin Bruce Clement, Shengyu Fu, Spandan Garg, Neelakantan Sundaresan, Dongjiang You, Roshanak Zilouchian Moghaddam
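A minimal sketch of template-driven prompt assembly; the task names and template text are hypothetical.

```python
# One template per software engineering task; each is filled with examples
# drawn from the client's stored custom data plus the target item.
TEMPLATES = {
    "code_review": (
        "You review code. Here are past reviews from this client:\n"
        "{examples}\n\nReview this diff:\n{target}"
    ),
    "unit_test": (
        "You write unit tests. Example tests from this repository:\n"
        "{examples}\n\nWrite a test for:\n{target}"
    ),
}

def build_prompt(task: str, client_examples: list[str], target: str) -> str:
    return TEMPLATES[task].format(examples="\n---\n".join(client_examples),
                                  target=target)

print(build_prompt("unit_test", ["def test_add(): assert add(1, 2) == 3"],
                   "def mul(a, b): return a * b"))
```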
-
Publication number: 20240403198
Abstract: A code coverage prediction system uses a neural transformer model with attention to generate a sequence of coverage symbols given a focal method and a test case. The sequence of coverage symbols indicates whether each line of source code is covered by the test case, missed by the test case, or unreachable. The sequence is aligned with the focal method to produce a coverage-annotated focal method that associates a predicted coverage symbol with each line of source code.
Type: Application
Filed: June 1, 2023
Publication date: December 5, 2024
Inventors: Shubham Chandel, Colin Bruce Clement, Neelakantan Sundaresan, Michele Tufano
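A small sketch of the alignment step; the symbol choices here (">" covered, "!" missed, "-" unreachable) are illustrative, not necessarily the patent's encoding.

```python
# Pair one predicted coverage symbol with each line of the focal method to
# produce the coverage-annotated method.
focal_method = [
    "def clamp(x, lo, hi):",
    "    if x < lo:",
    "        return lo",
    "    return min(x, hi)",
]
predicted_symbols = [">", ">", "!", ">"]   # one symbol per source line

annotated = [f"{sym} {line}" for sym, line in zip(predicted_symbols, focal_method)]
print("\n".join(annotated))
```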
-
Publication number: 20240394384
Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of the decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The constraints are generated by checking whether a token predicted at each decoding step is feasible for a partial solution, based on the production rules of the programming language's grammar, the syntactic correctness of the partial sequence, and/or static type correctness.
Type: Application
Filed: August 7, 2024
Publication date: November 28, 2024
Inventors: Colin Bruce Clement, Shao Kun Deng, Xiaoyu Liu, Neelakantan Sundaresan, Alexey Svyatkovskiy
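A minimal beam-search sketch in which a feasibility check prunes candidate tokens at each step. The toy vocabulary with fixed log-probabilities and the parenthesis-balance rule stand in for the grammar and type constraints the abstract describes.

```python
# Beam search where infeasible tokens are filtered out before scoring.
import heapq

VOCAB = {"print": -0.1, "(": -0.2, ")": -0.3, "x": -0.4}  # token -> log-prob

def feasible(seq, tok):
    """Stand-in constraint: never close more parentheses than were opened."""
    opened = seq.count("(") - seq.count(")")
    return tok != ")" or opened > 0

def beam_search(steps=4, width=2):
    beams = [(0.0, [])]                       # (cumulative log-prob, tokens)
    for _ in range(steps):
        candidates = [
            (score + logp, seq + [tok])
            for score, seq in beams
            for tok, logp in VOCAB.items()
            if feasible(seq, tok)             # constraints prune the expansion
        ]
        beams = heapq.nlargest(width, candidates, key=lambda c: c[0])
    return beams

for score, seq in beam_search():
    print(round(score, 2), " ".join(seq))
```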
-
Patent number: 12153684
Abstract: A neural classifier model is used to detect cybersecurity vulnerabilities in source code predicted by a deep learning code generation model that was trained on source code possibly containing security bugs. When the classifier model classifies a given source code snippet as likely containing a cybersecurity vulnerability, a neural decoder transformer model trained on non-vulnerable source code predicts a proposed repair, taking the flagged source code as input.
Type: Grant
Filed: September 21, 2022
Date of Patent: November 26, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Aaron Yue-Chiu Chan, Colin Bruce Clement, Yevhen Mohylevskyy, Neelakantan Sundaresan, Roshanak Zilouchian Moghaddam
-
Publication number: 20240386103
Abstract: A technique to prevent prompt injection attacks uses a security agent to sign a large language model prompt with a secret that is isolated from the user application or device that generates the user prompt. The secret is tailored to a specific user identifier and session identifier, and the large language model is instructed to repeat the secret in each response. The security agent retrieves the response from the large language model and checks for the secret; when the secret is not part of the response, an error message is forwarded to the user application instead of the response.
Type: Application
Filed: August 19, 2023
Publication date: November 21, 2024
Inventors: Colin Bruce Clement, Shengyu Fu, Neelakantan Sundaresan, Dongjiang You
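A sketch of the secret-echo check, assuming a stubbed model call; the function names and message format are hypothetical.

```python
# The security agent signs the prompt with a per-user, per-session secret and
# refuses to pass along any response that does not echo it.
import secrets

def call_llm(prompt: str) -> str:
    """Stub standing in for the model API; echoes the secret as instructed."""
    return "ANSWER: 4 [secret=" + prompt.split("secret=")[1].split("]")[0] + "]"

def guarded_query(user_prompt: str, user_id: str, session_id: str) -> str:
    secret = secrets.token_hex(8)       # isolated from the user application
    prompt = (f"Repeat [secret={secret}] verbatim in every response.\n"
              f"user={user_id} session={session_id}\n{user_prompt}")
    response = call_llm(prompt)
    if f"[secret={secret}]" not in response:
        return "Error: response failed the integrity check."
    return response.replace(f"[secret={secret}]", "").strip()

print(guarded_query("What is 2 + 2?", "alice", "s1"))
```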
-
Publication number: 20240370244
Abstract: An automated system for translating source code written in one programming language into a different programming language uses a neural transformer with attention trained on semi-supervised data. The model is jointly pre-trained with a masked language model objective and an autoregressive objective on a large unsupervised source code corpus to learn the syntactic structure and semantics of source code. The pre-trained model is then fine-tuned with a token-type prediction objective and an autoregressive objective on supervised translation tasks and data-augmented tasks to learn to translate source code from one programming language into another.
Type: Application
Filed: July 19, 2024
Publication date: November 7, 2024
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy, Chen Wu
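A sketch of the masked-language-model side of the pre-training recipe: random token spans are corrupted so the model learns to reconstruct them. The span lengths, mask ratio, and mask token are illustrative choices.

```python
# Span corruption for MLM pre-training: replace random spans with <MASK> and
# record the hidden tokens as prediction targets.
import random

def mask_spans(tokens, mask_ratio=0.15, max_span=3, seed=0):
    rng = random.Random(seed)
    out, targets, i = [], [], 0
    while i < len(tokens):
        if rng.random() < mask_ratio:
            span = min(rng.randint(1, max_span), len(tokens) - i)
            out.append("<MASK>")
            targets.append(tokens[i:i + span])   # what the model must predict
            i += span
        else:
            out.append(tokens[i])
            i += 1
    return out, targets

code = "def add ( a , b ) : return a + b".split()
corrupted, targets = mask_spans(code)
print(corrupted, targets)
```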
-
Publication number: 20240361992
Abstract: A neural transformer model with attention is trained to predict candidates that complete a line of source code, with a zero-shot capability. The model is trained on an unsupervised dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context: the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function, and/or method of the source code programs in the training dataset, while the local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model learns to predict an ordered sequence of code elements that complete a line of source code in programming languages both seen and not seen during training.
Type: Application
Filed: April 8, 2024
Publication date: October 31, 2024
Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang
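A sketch of how the two context levels might be assembled into one model input; the separator tokens and layout are assumptions for illustration.

```python
# Concatenate file-level context (class/signature summaries) with the local
# context up to the cursor, forming the completion model's input.
def build_input(file_level: list[str], local: list[str]) -> str:
    return "\n".join(["<file>"] + file_level + ["<local>"] + local)

file_level_context = [
    "class Cache:",
    "    def get(self, key): ...",
    "    def put(self, key, value): ...",
]
local_context = [
    "def warm(cache, items):",
    "    for key, value in items:",
    "        cache.",              # cursor: the model completes this line
]
print(build_input(file_level_context, local_context))
```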
-
Patent number: 12111751
Abstract: A debugging tool identifies the smallest subset of an input sequence, or rationales, that influenced a neural language model to generate an output sequence. The debugging tool uses the rationales to understand why the model made its predictions and, in particular, which input tokens had the most impact on the output sequence. In the case of erroneous output, the rationales are used to alter the input sequence to avoid the error, or to tailor a new training dataset for retraining the model to improve its performance.
Type: Grant
Filed: December 15, 2022
Date of Patent: October 8, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, David Alberto Nader Palacio, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano
-
Patent number: 12086268
Abstract: A constrained decoding technique incorporates token constraints into a beam search at each time step of the decoding process in order to generate viable candidate sequences that are syntactically and semantically correct. The token constraints identify source code tokens or sequences of tokens that should appear in a candidate sequence. The constraints are generated by checking whether a token predicted at each decoding step is feasible for a partial solution, based on the production rules of the programming language's grammar, the syntactic correctness of the partial sequence, and/or static type correctness.
Type: Grant
Filed: March 7, 2022
Date of Patent: September 10, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Shao Kun Deng, Xiaoyu Liu, Neelakantan Sundaresan, Alexey Svyatkovskiy
-
Publication number: 20240256230
Abstract: A code completion tool uses a neural transformer model with attention to generate candidate sequences that complete the method body for a given method signature. The neural transformer model is trained on source code programs and natural language text: a large unsupervised corpus of source code methods and a supervised dataset of tasks pairing source code constructs with natural language docstrings. From this training, the model learns the meaning of a method name and its corresponding parameters and types, and infers a candidate sequence of subtokens that represents a method body for a particular method signature.
Type: Application
Filed: April 11, 2024
Publication date: August 1, 2024
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy
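A minimal sketch of the inference-time input, with a stub standing in for the model's decoded candidate body.

```python
# The signature and docstring are the model's input; the body is decoded.
def generate_body(signature: str, docstring: str) -> str:
    """Stand-in for the neural transformer's decoded candidate body."""
    return "    return sorted(items)[::-1]"

signature = "def sort_descending(items: list) -> list:"
docstring = '    """Return items sorted from largest to smallest."""'
print("\n".join([signature, docstring, generate_body(signature, docstring)]))
```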
-
Patent number: 12045592
Abstract: An automated system for translating source code written in one programming language into a different programming language uses a neural transformer with attention trained on semi-supervised data. The model is jointly pre-trained with a masked language model objective and an autoregressive objective on a large unsupervised source code corpus to learn the syntactic structure and semantics of source code. The pre-trained model is then fine-tuned with a token-type prediction objective and an autoregressive objective on supervised translation tasks and data-augmented tasks to learn to translate source code from one programming language into another.
Type: Grant
Filed: March 25, 2021
Date of Patent: July 23, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy, Chen Wu
-
Publication number: 20240232519
Abstract: Generally discussed herein are devices, systems, and methods for generating an automatic interactive digital notebook completion model. A method can include receiving notebook content of an interactive digital notebook, the notebook content including a markdown cell followed by a code cell. The method can include generating input/output examples by, for each example, masking either (i) the content of the markdown cell or (ii) the content of the code cell, identifying the masked cell together with the content of the unmasked cell as the input of the example, and identifying the content of the masked cell as its output. The method can include training, based on the input/output examples, a natural language processing model that predicts the content of a masked cell.
Type: Application
Filed: March 20, 2024
Publication date: July 11, 2024
Inventors: Colin Bruce Clement, Shubham Chandel, Guillermo Serrato Castilla, Neelakantan Sundaresan
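A sketch of the example-generation step: each (markdown, code) cell pair yields two input/output examples, one per masked cell. The mask tokens are illustrative.

```python
# Build training examples by masking one cell of each markdown/code pair.
def make_examples(markdown: str, code: str):
    return [
        {"input": ("<MASKED_MARKDOWN>", code), "output": markdown},
        {"input": (markdown, "<MASKED_CODE>"), "output": code},
    ]

pairs = [("## Load the data", "df = pd.read_csv('data.csv')")]
examples = [ex for md, c in pairs for ex in make_examples(md, c)]
for ex in examples:
    print(ex)
```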
-
Publication number: 20240220215
Abstract: Custom source code generation models are generated by tuning a pre-trained deep learning model: the model parameters are frozen and a prefix is optimized. The tuning process is distributed across a user space and a model space, where the embedding and output layers run in the user space and the model itself executes in a model space that is isolated from the user space. The tuning process updates the embeddings of the prefix across the separate execution spaces in a manner that preserves the privacy of the data used in tuning.
Type: Application
Filed: March 13, 2024
Publication date: July 4, 2024
Inventors: Colin Bruce Clement, Neelakantan Sundaresan, Alexey Svyatkovskiy, Michele Tufano, Andrei Zlotchevski
-
Publication number: 20240211224
Abstract: A neural transcompilation model is tested with a set of syntax unit tests to determine which syntax elements of a source code program written in a source programming language fail to translate properly into a target programming language. The syntax elements having translation defects are identified and ranked according to their translation failure rates. The model is then fine-tuned with training samples of the syntax elements having the highest failure rates, paired with their correct translations, in order to teach the model the association between the defect-causing syntax elements of the source programming language and their correct translations in the target programming language.
Type: Application
Filed: December 23, 2022
Publication date: June 27, 2024
Inventors: Colin Bruce Clement, Yufan Huang, Neelakantan Sundaresan, Yiding Tian, Maoquan Wang
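A sketch of the evaluation loop, assuming a stub translator and two hypothetical syntax-element test suites; elements are ranked by failure rate to select fine-tuning data.

```python
# Run per-syntax-element unit tests against the translator and rank elements
# by failure rate; the worst elements are selected for fine-tuning.
from collections import defaultdict

def translate(src: str) -> str:
    """Stub standing in for the neural transcompilation model."""
    return src.replace("print(", "System.out.println(")

SYNTAX_TESTS = {
    "print_stmt": [("print(1)", "System.out.println(1)")],
    "for_loop": [("for i in range(3): print(i)",
                  "for (int i = 0; i < 3; i++) System.out.println(i);")],
}

failures = defaultdict(lambda: [0, 0])   # element -> [failed, total]
for element, cases in SYNTAX_TESTS.items():
    for src, expected in cases:
        failures[element][1] += 1
        if translate(src) != expected:
            failures[element][0] += 1

ranked = sorted(failures.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
for element, (failed, total) in ranked:
    print(f"{element}: {failed}/{total} failed")  # worst elements fine-tuned first
```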
-
Publication number: 20240160940
Abstract: A transfer learning system is used for the development of neural transformer models pertaining to software engineering tasks. The transfer learning system trains source code domain neural transformer models with attention, in various configurations, on a large unsupervised training dataset of source code programs and/or source-code-related natural language text. A web service provides the trained models for use in developing a model that may be fine-tuned on a supervised training dataset associated with a software engineering task, thereby generating a tool to perform that task.
Type: Application
Filed: January 17, 2024
Publication date: May 16, 2024
Inventors: Colin Bruce Clement, Dawn Drain, Neelakantan Sundaresan, Alexey Svyatkovskiy
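A sketch of the consumer side, using the Hugging Face hub as a stand-in for the web service in the abstract; microsoft/codebert-base is a real pretrained code model but only an illustrative choice here, and the two-label task head is hypothetical.

```python
# Fetch a pretrained source-code model and take one supervised fine-tuning
# step on a downstream software engineering task (e.g., bug detection).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2)   # hypothetical task head

batch = tok(["def f(): return 1"], return_tensors="pt", padding=True)
labels = torch.tensor([0])
optim = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
loss = model(**batch, labels=labels).loss     # supervised fine-tuning step
loss.backward()
optim.step()
print(float(loss))
```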
-
Patent number: 11983513
Abstract: A neural transformer model with attention is trained to predict candidates that complete a line of source code, with a zero-shot capability. The model is trained on an unsupervised dataset that includes features from source code written in multiple programming languages. The features include a file-level context and a local context: the file-level context includes a global context, a class context, a function context, and/or a method context for each class, function, and/or method of the source code programs in the training dataset, while the local context includes method bodies, function bodies, and/or stand-alone code of main method routines. From these features, the model learns to predict an ordered sequence of code elements that complete a line of source code in programming languages both seen and not seen during training.
Type: Grant
Filed: May 24, 2023
Date of Patent: May 14, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Colin Bruce Clement, Shuai Lu, Neelakantan Sundaresan, Alexey Svyatkovskiy, Duyu Tang