Patents by Inventor Heewoo JUN
Heewoo JUN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11886826
Abstract: Disclosed herein are methods, systems, and computer-readable media for automatically generating and inserting text. In an embodiment, a method may include receiving an input text prompt comprising a prefix portion and a suffix portion. The method may also include accessing a language model based on the input text prompt, and determining a set of context parameters based on the input text prompt and the language model. The method may also include generating an output text prompt based on the set of context parameters and the language model, and inserting the output text prompt into the input text prompt.
Type: Grant
Filed: March 14, 2023
Date of Patent: January 30, 2024
Assignee: OpenAI Opco LLC
Inventors: Mohammad Bavarian, Heewoo Jun
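The insertion flow the abstract describes can be sketched as fill-in-the-middle prompting: combine the prefix and suffix into one prompt, ask a language model for the middle span, and splice the result back between them. The sentinel tokens and the `generate_middle` stand-in below are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: build a fill-in-the-middle prompt from a prefix and
# suffix, ask a language model for the middle, and splice the result back in.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    # One common sentinel layout for fill-in-the-middle prompting.
    return f"<PRE>{prefix}<SUF>{suffix}<MID>"

def insert_completion(prefix: str, suffix: str, generate_middle) -> str:
    prompt = build_fim_prompt(prefix, suffix)
    middle = generate_middle(prompt)    # context-aware middle completion
    return prefix + middle + suffix     # insert the output into the input text

# Usage with a trivial stand-in "model":
result = insert_completion(
    "def add(a, b):\n",
    "    return total\n",
    lambda prompt: "    total = a + b\n",
)
```

Because the model sees both sides of the gap, the generated middle can respect the suffix (here, the `return total` line) rather than only continuing the prefix.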
-
Publication number: 20240020116
Abstract: Disclosed herein are methods, systems, and computer-readable media for generating natural language based on computer code input. In an embodiment, a method may comprise one or more of: accessing a docstring generation model configured to generate docstrings from computer code; receiving one or more computer code samples; generating, using the docstring generation model and based on the received one or more computer code samples, one or more candidate docstrings representing natural language text, each of the one or more candidate docstrings being associated with at least a portion of the one or more computer code samples; identifying at least one of the one or more candidate docstrings that provides an intent of the at least a portion of the one or more computer code samples; and/or outputting, via a user interface, the at least one identified docstring with the at least a portion of the one or more computer code samples.
Type: Application
Filed: May 23, 2023
Publication date: January 18, 2024
Applicant: OpenAI Opco, LLC
Inventors: Mark CHEN, Jerry TWOREK, Ilya SUTSKEVER, Wojciech ZAREMBA, Heewoo JUN, Henrique PONDE DE OLIVEIRA PINTO
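A minimal sketch of the candidate-identification step: given several candidate docstrings, pick the one that best reflects the code. The token-overlap scoring rule below is purely an assumed stand-in for however the claimed system identifies the docstring that conveys the code's intent.

```python
import re

def pick_docstring(code: str, candidates: list) -> str:
    # Illustrative heuristic only: prefer the candidate docstring sharing
    # the most word tokens with identifiers appearing in the code sample.
    code_tokens = set(re.findall(r"[a-zA-Z_]\w*", code.lower()))
    def overlap(doc: str) -> int:
        return len(code_tokens & set(re.findall(r"[a-zA-Z_]\w*", doc.lower())))
    return max(candidates, key=overlap)

# Usage: the second candidate mentions the code's identifiers and wins.
best = pick_docstring(
    "def add(a, b): return a + b",
    ["Multiply two numbers.", "Return the sum of a and b."],
)
```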
-
Publication number: 20240020096
Abstract: Disclosed herein are methods, systems, and computer-readable media for generating computer code based on natural language input. In an embodiment, a method may comprise one or more of: receiving a docstring representing natural language text specifying a digital programming result; generating, using a trained machine learning model, and based on the docstring, a computer code sample configured to produce respective candidate results; causing the computer code sample to be executed; identifying, based on the executing, a computer code sample configured to produce a particular candidate result associated with the digital programming result; performing at least one of outputting, via a user interface, the identified computer code sample, compiling the identified computer code sample, transmitting the identified computer code sample to a recipient device, storing the identified computer code sample, and/or re-executing the identified computer code sample.
Type: Application
Filed: May 23, 2023
Publication date: January 18, 2024
Applicant: OpenAI Opco, LLC
Inventors: Mark CHEN, Jerry TWOREK, Ilya SUTSKEVER, Wojciech ZAREMBA, Heewoo JUN, Henrique PONDE DE OLIVEIRA PINTO
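The generate-then-execute filtering the abstract describes can be sketched as: run each candidate code sample and keep one whose result matches the specified outcome. The `solve()` entry-point convention is an assumption for illustration, and a real system would sandbox the execution rather than use a bare `exec()`.

```python
def select_code_sample(candidates, expected):
    # Execute each candidate in an isolated namespace and return the first
    # sample whose solve() output matches the expected result.
    # Caution: illustrative only -- never exec() untrusted model output
    # outside a sandbox.
    for src in candidates:
        namespace = {}
        try:
            exec(src, namespace)                 # run the candidate sample
            if namespace["solve"]() == expected:
                return src
        except Exception:
            continue                             # discard samples that fail
    return None

# Usage: the first candidate has a syntax error, the second produces 6.
candidates = [
    "def solve():\n    return 1 +\n",
    "def solve():\n    return 2 * 3\n",
]
chosen = select_code_sample(candidates, expected=6)
```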
-
Patent number: 11620986
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Grant
Filed: October 1, 2020
Date of Patent: April 4, 2023
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
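The Cold Fusion idea can be roughly sketched as: project a frozen, pre-trained language model's hidden state, compute a fine-grained sigmoid gate from the decoder state and that projection, and concatenate the gated LM features onto the decoder state before the output layer. The shapes and the exact gating form below are assumptions for illustration, not the patented architecture verbatim.

```python
import numpy as np

def cold_fusion_step(s_dec, h_lm, W_lm, W_gate, b_gate):
    # Project the frozen LM's hidden state, gate it using the decoder
    # state, and concatenate the gated LM features onto the decoder state.
    h = np.tanh(W_lm @ h_lm)                                  # LM projection
    g = 1.0 / (1.0 + np.exp(-(W_gate @ np.concatenate([s_dec, h]) + b_gate)))
    return np.concatenate([s_dec, g * h])                     # fused features

# Usage with random weights (decoder dim 8, LM dim 16, projection dim 8):
rng = np.random.default_rng(0)
W_lm = rng.normal(size=(8, 16))      # projects h_lm (16) down to 8
W_gate = rng.normal(size=(8, 16))    # gate over [s_dec (8); h (8)]
b_gate = np.zeros(8)
s_dec = rng.normal(size=8)
h_lm = rng.normal(size=16)
fused = cold_fusion_step(s_dec, h_lm, W_lm, W_gate, b_gate)   # shape (16,)
```

Because the gate is computed per dimension, the decoder can learn to suppress the language model's contribution wherever the acoustic or source evidence disagrees with it.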
-
Publication number: 20210027767
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Application
Filed: October 1, 2020
Publication date: January 28, 2021
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Heewoo JUN, Sanjeev SATHEESH, Adam COATES
-
Patent number: 10867595
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Grant
Filed: March 6, 2018
Date of Patent: December 15, 2020
Assignee: Baidu USA LLC
Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
-
Patent number: 10657955
Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.
Type: Grant
Filed: January 30, 2018
Date of Patent: May 19, 2020
Assignee: Baidu USA LLC
Inventors: Eric Battenberg, Rewon Child, Adam Coates, Christopher Fougner, Yashesh Gaur, Jiaji Huang, Heewoo Jun, Ajay Kannan, Markus Kliegl, Atul Kumar, Hairong Liu, Vinay Rao, Sanjeev Satheesh, David Seetapun, Anuroop Sriram, Zhenyao Zhu
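One concrete piece of the CTC pipeline the abstract mentions can be shown directly: greedy CTC decoding, which turns per-frame character predictions into an output sequence by collapsing consecutive repeats and dropping blank symbols. This is standard CTC post-processing, not code from the patent.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    # Standard CTC collapse: drop consecutive repeats, then remove blanks,
    # turning per-frame argmax character predictions into a label sequence.
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Usage: frames [0, 3, 3, 0, 4, 4, 4, 0] (0 = blank) decode to [3, 4];
# a blank between two identical labels is what lets CTC emit true repeats.
```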
-
Publication number: 20180336884
Abstract: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Type: Application
Filed: March 6, 2018
Publication date: November 22, 2018
Applicant: Baidu USA LLC
Inventors: Anuroop SRIRAM, Heewoo JUN, Sanjeev SATHEESH, Adam COATES
-
Publication number: 20180247643
Abstract: Described herein are systems and methods to identify and address sources of bias in an end-to-end speech model. In one or more embodiments, the end-to-end model may be a recurrent neural network with two 2D-convolutional input layers, followed by multiple bidirectional recurrent layers and one fully connected layer before a softmax layer. In one or more embodiments, the network is trained end-to-end using the CTC loss function to directly predict sequences of characters from log spectrograms of audio. With optimized recurrent layers and training together with alignment information, some unwanted bias induced by using purely forward-only recurrences may be removed in a deployed model.
Type: Application
Filed: January 30, 2018
Publication date: August 30, 2018
Applicant: Baidu USA LLC
Inventors: Eric BATTENBERG, Rewon CHILD, Adam COATES, Christopher FOUGNER, Yashesh GAUR, Jiaji HUANG, Heewoo JUN, Ajay KANNAN, Markus KLIEGL, Atul KUMAR, Hairong LIU, Vinay RAO, Sanjeev SATHEESH, David SEETAPUN, Anuroop SRIRAM, Zhenyao ZHU