Patents by Inventor Chu Hong Hoi

Chu Hong Hoi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11989941
    Abstract: Embodiments described herein provide a method of video-text pre-training that effectively learns cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
    Type: Grant
    Filed: December 30, 2021
    Date of Patent: May 21, 2024
    Assignee: Salesforce, Inc.
    Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi
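
As a rough illustration of the encoder layout this abstract describes, the PyTorch sketch below encodes frames and text independently and then fuses them with a cross-modal encoder. It is a minimal sketch, not the patented implementation; all module names, layer counts, and feature sizes are assumptions.

```python
# Minimal sketch (not the patented implementation): independent video/text
# encoders plus a cross-modal encoder, per the abstract above.
import torch
import torch.nn as nn

class VideoTextModel(nn.Module):
    def __init__(self, dim=256, vocab=30522, n_heads=4):
        super().__init__()
        # transformer-based video encoder over per-frame features
        self.frame_proj = nn.Linear(768, dim)  # assumed raw frame feature size
        self.video_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True), num_layers=2)
        # independent text encoder
        self.tok_emb = nn.Embedding(vocab, dim)
        self.text_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, n_heads, batch_first=True), num_layers=2)
        # multimodal encoder: text cross-attends to video
        self.cross = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, n_heads, batch_first=True), num_layers=2)

    def forward(self, frame_feats, token_ids):
        v = self.video_enc(self.frame_proj(frame_feats))  # (B, frames, dim)
        t = self.text_enc(self.tok_emb(token_ids))        # (B, tokens, dim)
        return self.cross(tgt=t, memory=v)                # cross-modal states

model = VideoTextModel()
out = model(torch.randn(2, 8, 768), torch.randint(0, 30522, (2, 16)))
print(out.shape)  # torch.Size([2, 16, 256])
```
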
  • Publication number: 20240161369
    Abstract: Embodiments described herein provide systems and methods of subject-driven image generation. In at least one embodiment, a system receives, via a data interface, an image containing a subject, a text description of the subject in the image, and a text prompt relating to a different rendition of the subject. The system encodes, via an image encoder, the image into an image feature vector. The system encodes, via a text encoder, the text description into a text feature vector. The system generates, by a multimodal encoder, a vector representation of the subject based on the image feature vector and the text feature vector. The system generates, by a neural network based image generation model, an output image based on an input combining the text prompt and the vector representation.
    Type: Application
    Filed: October 31, 2023
    Publication date: May 16, 2024
    Inventors: Junnan Li, Chu Hong Hoi, Dongxu Li
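
The toy PyTorch sketch below shows the fusion step this abstract describes: an image feature vector and a text-description feature vector are combined into a single subject representation that could condition an image generator. The fusion module and all dimensions are illustrative assumptions, not the patented design.

```python
import torch
import torch.nn as nn

class SubjectConditioner(nn.Module):
    """Toy stand-in for the multimodal encoder in the abstract: fuses an
    image feature vector and a text-description feature vector into one
    subject representation. Dimensions are illustrative."""
    def __init__(self, dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, img_feat, txt_feat):
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))

cond = SubjectConditioner()
subject_vec = cond(torch.randn(1, 512), torch.randn(1, 512))
# subject_vec would be combined with the text prompt and fed to an
# image-generation model (e.g., a diffusion model) as conditioning.
print(subject_vec.shape)  # torch.Size([1, 512])
```
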
  • Publication number: 20240160853
    Abstract: Embodiments described herein provide a multimodal vision-language model. The multimodal vision-language model contains a Generalist Multimodal Transformer capable of completing multiple tasks using the same set of parameters learned during pre-training. The Generalist Multimodal Transformer allows alignment between frozen, unimodal encoders, such as image encoders and large language models. The Generalist Multimodal Transformer eliminates the need for fine-tuning the image encoders and large language models.
    Type: Application
    Filed: January 27, 2023
    Publication date: May 16, 2024
    Inventors: Junnan Li, Chu Hong Hoi
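
A minimal sketch of the frozen-encoder alignment idea in this abstract: a small trainable bridging transformer with learned query tokens cross-attends to features from a frozen image encoder, producing a soft prompt for a frozen language model. Module names, query count, and sizes are assumptions.

```python
import torch
import torch.nn as nn

class Bridge(nn.Module):
    """Toy bridging transformer between a frozen image encoder and a frozen
    language model. Learned query tokens cross-attend to frozen image
    features; only the bridge itself would be trained."""
    def __init__(self, n_queries=32, dim=256):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(1, n_queries, dim))
        self.xattn = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, 4, batch_first=True), num_layers=2)

    def forward(self, frozen_img_feats):
        q = self.queries.expand(frozen_img_feats.size(0), -1, -1)
        return self.xattn(tgt=q, memory=frozen_img_feats)

img_encoder = nn.Linear(768, 256)   # stand-in for a frozen image encoder
for p in img_encoder.parameters():
    p.requires_grad = False         # frozen: no fine-tuning needed
bridge = Bridge()
feats = img_encoder(torch.randn(2, 50, 768))
soft_prompt = bridge(feats)         # (2, 32, 256): would feed a frozen LLM
print(soft_prompt.shape)
```
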
  • Publication number: 20240160858
    Abstract: Embodiments described herein provide a method of generating a vision-language task output to a text instruction relating to an input image, the method comprising receiving, via a data interface, the input image and the text instruction relating to the image. The method further includes encoding, via an image encoder, the image into a first image representation. The method further includes generating, by a multimodal encoder, a second image representation based on cross-attending the first image representation to the text instruction. The method further includes generating, by a neural network based language model, a vision-language task output in response to the text instruction based on an input combining the second image representation and the text instruction.
    Type: Application
    Filed: November 9, 2023
    Publication date: May 16, 2024
    Inventors: Wenliang Dai, Junnan Li, Chu Hong Hoi, Dongxu Li
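
The PyTorch fragment below sketches only the cross-attention step named in this abstract: the first image representation attends to the embedded instruction to produce an instruction-aware second representation. The shapes and the use of a single attention layer are assumptions.

```python
import torch
import torch.nn as nn

# Toy sketch of the instruction-aware step in the abstract: the image
# representation cross-attends to the text instruction so the features
# handed to the language model depend on what the instruction asks.
dim = 256
xattn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

img_repr = torch.randn(1, 32, dim)    # first image representation
instr_emb = torch.randn(1, 12, dim)   # embedded text instruction (assumed)

second_repr, _ = xattn(query=img_repr, key=instr_emb, value=instr_emb)
# `second_repr` plus the instruction tokens would form the LM input.
print(second_repr.shape)  # torch.Size([1, 32, 256])
```
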
  • Publication number: 20240161520
    Abstract: Embodiments described herein provide a multimodal vision-language model. The multimodal vision-language model contains a Generalist Multimodal Transformer capable of completing multiple tasks using the same set of parameters learned during pre-training. The Generalist Multimodal Transformer allows alignment between frozen, unimodal encoders, such as image encoders and large language models. The Generalist Multimodal Transformer eliminates the need for fine-tuning the image encoders and large language models.
    Type: Application
    Filed: January 27, 2023
    Publication date: May 16, 2024
    Inventors: Junnan Li, Chu Hong Hoi
  • Publication number: 20240134773
    Abstract: Embodiments described herein provide a unified debugging framework that adapts a pretrained programming language model for line-level debugging and repair. Specifically, the debugging framework follows the logic programmers use to debug their code. For example, the debugging framework first determines whether or not a function is buggy. If it is, the framework localizes the problematic line and provides a patch (repair).
    Type: Application
    Filed: October 17, 2022
    Publication date: April 25, 2024
    Inventors: Nghi Bui, Yue Wang, Chu Hong Hoi
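
A schematic of the detect-localize-repair flow this abstract describes, with trivial stand-in callables in place of the pretrained model. The function name and the lambda "models" are hypothetical.

```python
from typing import Optional

# Illustrative three-stage flow matching the abstract: detect whether a
# function is buggy, localize the faulty line, then propose a patch.
def debug_function(source: str, detector, localizer, repairer) -> Optional[str]:
    if not detector(source):                    # stage 1: is it buggy?
        return None
    line_no = localizer(source)                 # stage 2: which line?
    lines = source.splitlines()
    lines[line_no] = repairer(source, line_no)  # stage 3: generate a patch
    return "\n".join(lines)

# Example with trivial stand-in "models":
buggy = "def add(a, b):\n    return a - b"
fixed = debug_function(
    buggy,
    detector=lambda src: "- b" in src,
    localizer=lambda src: 1,
    repairer=lambda src, i: "    return a + b",
)
print(fixed)
```
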
  • Publication number: 20240119257
    Abstract: Embodiments described herein provide systems and methods for providing zero-shot visual question answering. A first image and a first question relating to a visual content of the first image are received. One or more image captions relevant to the first question are determined using a visual-language neural model by determining portions of the first image relevant to the first question. Answer candidates are generated using the one or more image captions. Synthetic questions are generated from the answer candidates, and each synthetic question is paired with its answer candidate to form synthetic question-answer pairs. A prompt is generated by concatenating the synthetic question-answer pairs. A first answer to the first question is generated by a language network model from an input of the first question prepended with the prompt.
    Type: Application
    Filed: January 4, 2023
    Publication date: April 11, 2024
    Inventors: Jiaxian GUO, Junnan LI, Chu Hong HOI
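
The snippet below sketches the prompt-assembly step from this abstract: synthetic question-answer pairs are concatenated into a prompt that is prepended to the real question for a language model. The exact prompt format is an assumption.

```python
# Sketch of prompt assembly per the abstract; the "Question:/Answer:"
# formatting is an illustrative assumption, not the patented template.
def build_prompt(qa_pairs, question):
    demo = "\n".join(f"Question: {q} Answer: {a}" for q, a in qa_pairs)
    return f"{demo}\nQuestion: {question} Answer:"

synthetic_pairs = [
    ("What animal is shown?", "a dog"),
    ("What is the dog doing?", "running"),
]
print(build_prompt(synthetic_pairs, "Where is the dog?"))
```
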
  • Patent number: 11915500
    Abstract: A system uses a neural network based model to perform scene text recognition. The system achieves high accuracy of prediction of text from scenes based on a neural network architecture that uses a double attention mechanism. The neural network based model includes a convolutional neural network component that outputs a set of visual features and an attention extractor neural network component that determines attention scores based on the visual features. The visual features and the attention scores are combined to generate mixed features that are provided as input to a character recognizer component that determines a second attention score and recognizes the characters based on the second attention score. The system trains the neural network based model by adjusting the neural network parameters to minimize a multi-class gradient harmonizing mechanism (GHM) loss. The multi-class GHM loss varies based on the level of difficulty of each sample.
    Type: Grant
    Filed: January 28, 2021
    Date of Patent: February 27, 2024
    Assignee: Salesforce, Inc.
    Inventors: Pan Zhou, Peng Tang, Ran Xu, Chu Hong Hoi
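
A simplified reading of the multi-class GHM loss mentioned in this abstract, in PyTorch: each sample's contribution is re-weighted by the inverse density of its gradient-norm bin, so the loss varies with sample difficulty. The bin count and the gradient-norm proxy are assumptions; this is not the patented formulation.

```python
import torch
import torch.nn.functional as F

def ghm_weights(probs, targets, bins=10):
    """Toy multi-class GHM weighting: samples whose gradient norms are
    very common (easy, or extreme outliers) are down-weighted."""
    # gradient-norm proxy: 1 - p(true class), in [0, 1]
    g = (1.0 - probs.gather(1, targets.unsqueeze(1)).squeeze(1)).abs()
    edges = torch.linspace(0, 1, bins + 1)
    weights = torch.zeros_like(g)
    n = g.numel()
    for i in range(bins):
        hi = edges[i + 1] + (1e-6 if i == bins - 1 else 0)  # include g == 1
        in_bin = (g >= edges[i]) & (g < hi)
        count = in_bin.sum()
        if count > 0:
            weights[in_bin] = n / count  # inverse gradient density
    return weights / weights.mean()

logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
probs = logits.softmax(dim=1)
per_sample = F.cross_entropy(logits, targets, reduction="none")
loss = (ghm_weights(probs, targets) * per_sample).mean()
print(loss.item())
```
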
  • Publication number: 20240020102
    Abstract: Embodiments described herein provide a code generation and understanding model that builds on a Transformer-based encoder-decoder framework. The code generation and understanding model is configured to derive generic representations for programming language (PL) and natural language (NL) in the code domain via pre-training on an unlabeled code corpus, and then to benefit many code-related downstream tasks with fine-tuning. Apart from the denoising sequence-to-sequence objectives widely adopted for pre-training on natural language, an identifier tagging and prediction pre-training objective is adopted to enable the model to better leverage the crucial token type information from PL, specifically the identifiers assigned by developers.
    Type: Application
    Filed: September 26, 2023
    Publication date: January 18, 2024
    Inventors: Yue Wang, Weishi Wang, Shafiq Rayhan Joty, Chu Hong Hoi
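
To make the identifier tagging objective concrete, the sketch below labels each code token as a developer-assigned identifier (1) or not (0), using Python's own tokenizer as a stand-in for the model's tokenizer.

```python
# Sketch of identifier-tagging labels like those described above; Python's
# tokenize module stands in for the model's own tokenizer.
import io
import keyword
import tokenize

def tag_identifiers(code: str):
    tags = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME:
            tags.append((tok.string, 0 if keyword.iskeyword(tok.string) else 1))
    return tags

print(tag_identifiers("def area(r):\n    return 3.14 * r * r\n"))
# [('def', 0), ('area', 1), ('r', 1), ('return', 0), ('r', 1), ('r', 1)]
```
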
  • Publication number: 20240020486
    Abstract: Embodiments described herein provide a parameter-efficient finetuning mechanism, referred to as “factor-tuning,” which first learns a compact representation of parameter changes with existing datasets on multiple domains, and then fine-tunes a small number of parameters (automatically extracted from the learned representation) on a new downstream task. In this way, the representation learned in the first step is shared across domains and transferred to new downstream tasks.
    Type: Application
    Filed: December 9, 2022
    Publication date: January 18, 2024
    Inventors: Wenzhuo Yang, Chu Hong Hoi, Kun Zhang
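
A toy reading of factor-tuning for a single linear layer: parameter changes live in a compact basis assumed to have been learned earlier across domains, and only a small coefficient vector is tuned on the new task. The class name, basis shape, and rank are assumptions.

```python
import torch
import torch.nn as nn

class FactorTunedLinear(nn.Module):
    """Toy reading of "factor-tuning": the weight update is expressed in a
    compact basis learned beforehand; only `coef` is tuned on a new task."""
    def __init__(self, base: nn.Linear, basis: torch.Tensor):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False            # pretrained weights stay frozen
        self.register_buffer("basis", basis)   # (k, out*in), learned earlier
        self.coef = nn.Parameter(torch.zeros(basis.size(0)))  # k parameters

    def forward(self, x):
        delta = (self.coef @ self.basis).view_as(self.base.weight)
        return x @ (self.base.weight + delta).t() + self.base.bias

layer = FactorTunedLinear(nn.Linear(16, 8), basis=torch.randn(4, 8 * 16))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 4
```
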
  • Publication number: 20230419037
    Abstract: Embodiments described herein provide label modular prompts for a text classification task. A label modular prompt generator may determine a set of class labels of interest from a set of possible class labels associated with an input text sequence. The label modular prompt generator may generate a plurality of label prompts based on the set of class labels of interest. A first class label and a sequence of soft tokens that are generated based on representations associated with the first class label are concatenated into a first label prompt. The soft tokens are tunable using a plurality of parameters of the label modular prompt generator. The label modular prompt generator may provide an input of the input text sequence prepended with the plurality of label prompts to a pretrained language model. The pretrained language model may generate a task output in response to the input text sequence.
    Type: Application
    Filed: November 28, 2022
    Publication date: December 28, 2023
    Inventors: Hailin CHEN, Amrita SAHA, Shafiq Rayhan JOTY, Chu Hong HOI
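
The sketch below mimics the label prompt construction in this abstract: for each class label of interest, the generator emits the label's embedding followed by tunable soft tokens, and the concatenated result would be prepended to the input text. All sizes are assumptions.

```python
import torch
import torch.nn as nn

class LabelPromptGenerator(nn.Module):
    """Toy label-modular generator: for each label of interest, emit the
    label embedding followed by that label's tunable soft tokens."""
    def __init__(self, n_labels=20, dim=64, soft_len=4):
        super().__init__()
        self.label_emb = nn.Embedding(n_labels, dim)
        self.soft = nn.Parameter(torch.randn(n_labels, soft_len, dim))

    def forward(self, labels_of_interest):
        parts = [torch.cat([self.label_emb(l).unsqueeze(0), self.soft[l]], dim=0)
                 for l in labels_of_interest]
        return torch.cat(parts, dim=0)  # prepended to input text embeddings

gen = LabelPromptGenerator()
prompt = gen(torch.tensor([3, 7]))
print(prompt.shape)  # torch.Size([10, 64]): 2 labels x (1 label + 4 soft)
```
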
  • Publication number: 20230419049
    Abstract: Embodiments described herein provide training a prompt generator for text classification. A first training dataset associated with a first plurality of class labels is received for a first training process. For a first instance of the first training dataset, a set of labels of interest is generated by sampling from a set of possible class labels including the first plurality of class labels. The prompt generator generates a first prompt based on the set of labels of interest. A pretrained language model generates a task output in response to an input of the first instance prepended with the first prompt. A loss objective is generated based on the task output and the set of labels of interest. Parameters of the prompt generator are updated based on the computed loss objective via backpropagation while the pretrained language model is frozen.
    Type: Application
    Filed: November 28, 2022
    Publication date: December 28, 2023
    Inventors: Hailin CHEN, Amrita SAHA, Shafiq Rayhan JOTY, Chu Hong HOI
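
A minimal sketch of the training recipe in this abstract: the pretrained language model (a stand-in nn.Linear here) is frozen, and backpropagation updates only the prompt parameters.

```python
import torch
import torch.nn as nn

# Only the prompt receives gradients; the "pretrained LM" stays frozen.
plm = nn.Linear(64, 20)                 # stand-in for a pretrained LM
for p in plm.parameters():
    p.requires_grad = False

prompt_params = nn.Parameter(torch.randn(5, 64))   # tunable prompt
opt = torch.optim.Adam([prompt_params], lr=1e-3)

x = torch.randn(8, 64)                  # instance embeddings (assumed)
y = torch.randint(0, 20, (8,))
for _ in range(3):
    logits = plm(x + prompt_params.mean(0))  # prompt-conditioned input
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()                     # gradients flow only to the prompt
    opt.step()
print(loss.item())
```
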
  • Publication number: 20230419652
    Abstract: Embodiments described herein provide a zero-shot visual question answering (VQA) framework, which conjoins foundation network models with zero additional training. A first image and a question relating to the first image are received. The first image is divided into a plurality of image patches. A plurality of relevant image patches that are relevant to the question are determined, using a first neural network model, from the plurality of image patches. A plurality of image captions are generated, using a second neural network model, based on the plurality of relevant image patches. An answer to the question is generated based on the plurality of image captions.
    Type: Application
    Filed: September 23, 2022
    Publication date: December 28, 2023
    Inventors: Anthony Meng Huat Tiong, Junnan Li, Chu Hong Hoi
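
The fragment below sketches the patch-selection step: the image is split into patches and a relevance scorer (a random stand-in for the first neural network model) keeps the top-k patches for captioning. The patch size and k are assumptions.

```python
import torch

# Toy patch selection: score image patches for relevance to the question
# and keep the top-k. The scorer is a placeholder model.
def top_patches(image, scorer, question_emb, patch=16, k=4):
    c, h, w = image.shape
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
    patches = patches.reshape(c, -1, patch, patch).permute(1, 0, 2, 3)
    scores = scorer(patches, question_emb)   # one relevance score per patch
    return patches[scores.topk(k).indices]

img = torch.randn(3, 64, 64)
rand_scorer = lambda ps, q: torch.randn(ps.size(0))
print(top_patches(img, rand_scorer, question_emb=None).shape)  # (4, 3, 16, 16)
```
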
  • Publication number: 20230409901
    Abstract: Systems and methods for providing a neural network system for time series forecasting are described. A time series dataset that includes datapoints at a plurality of timestamps in an observed space is received. The neural network system is trained using the time series dataset. Training the neural network system includes: generating, using an encoder of the neural network system, one or more estimated latent variables of a latent space for the time series dataset; generating, using an auxiliary predictor of the neural network system, a first latent-space prediction result based on the one or more estimated latent variables; transforming, using a decoder of the neural network system, the first latent-space prediction result to a first observed-space prediction result; and updating parameters of the neural network system based on a loss based on the first observed-space prediction result.
    Type: Application
    Filed: September 16, 2022
    Publication date: December 21, 2023
    Inventors: Chenghao Liu, Chu Hong Hoi, Kun Zhang
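
A shape-level sketch of the training path in this abstract, with tiny stand-in modules: encode to the latent space, predict one step with the auxiliary predictor, decode back to the observed space, and update on the observed-space loss.

```python
import torch
import torch.nn as nn

# All three modules are small stand-ins chosen only to show the data flow.
enc = nn.Linear(1, 8)          # encoder: observed -> latent
aux = nn.Linear(8, 8)          # auxiliary predictor in latent space
dec = nn.Linear(8, 1)          # decoder: latent -> observed

series = torch.randn(32, 1)    # datapoints at 32 timestamps
z = enc(series)                # estimated latent variables
z_next = aux(z[:-1])           # latent-space one-step prediction
pred = dec(z_next)             # observed-space prediction result
loss = nn.functional.mse_loss(pred, series[1:])
loss.backward()                # parameters updated from this loss
print(loss.item())
```
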
  • Patent number: 11836037
    Abstract: Some embodiments of the current disclosure provide methods and systems for analyzing root causes of an incident disrupting information technology services such as cloud services. In some embodiments, a set of problem review board (PRB) documents including information about said incidents may be parsed using a natural language processing (NLP) neural model to extract structured PRB data from the unstructured investigative information contained in the PRB documents. The structured PRB data may include symptoms of the incident, root causes of the incident, resolutions of the incident, etc., and a causal knowledge graph causally relating the symptoms, root causes, and resolutions of the incidents may be generated.
    Type: Grant
    Filed: September 16, 2021
    Date of Patent: December 5, 2023
    Assignee: salesforce.com, inc.
    Inventors: Amrita Saha, Chu Hong Hoi
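
As a toy illustration of the final step in this abstract, the snippet below assembles (head, relation, tail) triples for a causal knowledge graph from already-extracted PRB fields; the example incidents and edge labels are invented for illustration.

```python
# The NLP extraction itself is represented by already-parsed dictionaries;
# only the graph-assembly step is sketched here.
incidents = [
    {"symptom": "high latency", "root_cause": "connection pool exhausted",
     "resolution": "increase pool size"},
    {"symptom": "high latency", "root_cause": "hot shard",
     "resolution": "rebalance shard"},
]

graph = []  # (head, relation, tail) triples
for inc in incidents:
    graph.append((inc["root_cause"], "causes", inc["symptom"]))
    graph.append((inc["resolution"], "resolves", inc["root_cause"]))

for triple in graph:
    print(triple)
```
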
  • Publication number: 20230376734
    Abstract: Systems and methods for providing a neural network system for time series forecasting are described. A time series dataset that includes datapoints at a plurality of timestamps in an observed space is received. A first state-space model of a dynamical system underlying the time series dataset is provided. The first state-space model includes a non-parametric latent transition model. One or more latent variables of a latent space for the time series dataset are determined using the neural network system based on the first state-space model. A first prediction result for the time series dataset is provided by the neural network system based on the estimated latent variables.
    Type: Application
    Filed: September 16, 2022
    Publication date: November 23, 2023
    Inventors: Chenghao Liu, Chu Hong Hoi, Kun Zhang
  • Publication number: 20230376840
    Abstract: Embodiments described herein provide a reinforcement learning based framework that engages pretrained language models (LMs) for program synthesis tasks. Specifically, the framework adopts a training strategy that optimizes pretrained LMs for program synthesis tasks in an actor-critic approach.
    Type: Application
    Filed: August 26, 2022
    Publication date: November 23, 2023
    Inventors: Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Chu Hong Hoi
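
A stripped-down actor-critic step in the spirit of this abstract: the "actor" LM head samples an output, a functional-correctness reward (e.g., from unit tests) is observed, and a critic baseline reduces gradient variance. Real systems score sampled token sequences; the single categorical step here is a stand-in.

```python
import torch
import torch.nn as nn

actor = nn.Linear(16, 100)     # stand-in LM head over a 100-token vocab
critic = nn.Linear(16, 1)      # value baseline
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()),
                       lr=1e-3)

state = torch.randn(1, 16)
dist = torch.distributions.Categorical(logits=actor(state))
token = dist.sample()
reward = torch.tensor([1.0])   # e.g., 1.0 if the program passes its tests
value = critic(state).squeeze(1)
advantage = reward - value.detach()

# policy-gradient term for the actor plus a regression loss for the critic
loss = (-(dist.log_prob(token) * advantage).mean()
        + nn.functional.mse_loss(value, reward))
opt.zero_grad()
loss.backward()
opt.step()
print(loss.item())
```
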
  • Publication number: 20230376746
    Abstract: Embodiments described herein provide a time-index model for forecasting time-series data. The architecture of the model takes a normalized time index as an input, uses a model, g_θ, to produce a vector representation of the time-index, and uses a "ridge regressor" which takes the vector representation and provides an estimated value. The model may be trained on a time-series dataset. The ridge regressor is trained for a given g_θ to reproduce a given lookback window. g_θ is trained over time-indexes in a horizon window, such that g_θ and the corresponding ridge regressor will accurately predict the data in the horizon window. Once g_θ is sufficiently trained, the ridge regressor can be updated based on that final g_θ over a lookback window comprising the time-indexes with the last known values. The final g_θ together with the updated ridge regressor can be given time-indexes past the known values, thereby providing forecasted values.
    Type: Application
    Filed: September 7, 2022
    Publication date: November 23, 2023
    Inventors: Gerald Woo, Chenghao Liu, Doyen Sahoo, Chu Hong Hoi
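
The NumPy sketch below shows the closed-form ridge-regressor step this abstract describes: fit ridge weights to the representations of lookback time-indexes, then apply them to time-indexes past the known values. The random feature map stands in for the learned model g_θ.

```python
import numpy as np

# The feature map `g` below is a random stand-in for the trained model g_θ;
# only the ridge-fit-then-extrapolate mechanics are illustrated.
rng = np.random.default_rng(0)
W_feat = rng.normal(size=(1, 16))

def g(t):                        # stand-in representation of a time-index
    return np.sin(t[:, None] * W_feat)

t_lookback = np.linspace(0, 1, 24)
y_lookback = np.sin(6 * t_lookback)     # last known values
lam = 1e-2                              # ridge regularization strength

Phi = g(t_lookback)                     # (24, 16)
ridge_w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(16), Phi.T @ y_lookback)

t_horizon = np.linspace(1, 1.25, 6)     # time-indexes past the known values
forecast = g(t_horizon) @ ridge_w
print(forecast.shape)  # (6,)
```
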
  • Publication number: 20230376841
    Abstract: Embodiments described herein provide a reinforcement learning based framework that engages pretrained language models (LMs) for program synthesis tasks. Specifically, the framework adopts a training strategy that optimizes pretrained LMs for program synthesis tasks in an actor-critic approach.
    Type: Application
    Filed: August 26, 2022
    Publication date: November 23, 2023
    Inventors: Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Chu Hong Hoi
  • Publication number: 20230376401
    Abstract: Systems and methods for automatic program repair using neural network models are described. After a first buggy code patch is received, a first representation of the first buggy code patch is generated using a retriever encoder of a patch retriever. The patch retriever retrieves, based on the first representation, a first bug-fix code pair from a first plurality of bug-fix code pairs. A first augmented buggy code patch is generated based on the first buggy code patch and the first bug-fix code pair. A patch generator generates a fixed code patch based on the first augmented buggy code patch.
    Type: Application
    Filed: August 26, 2022
    Publication date: November 23, 2023
    Inventors: Yue Wang, Weishi Wang, Shafiq Rayhan Joty, Chu Hong Hoi
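
A toy version of the retrieve-then-generate flow in this abstract: embed the buggy patch, retrieve the most similar bug-fix pair, and splice it into an augmented input for the patch generator. The hash-based encoder is a stand-in for the trained retriever encoder.

```python
import numpy as np

# Hash-seeded random vectors stand in for the retriever encoder; real
# systems use a learned encoder so that similar code lands nearby.
def embed(text, dim=64):
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

bugfix_pairs = [
    ("if x = 0:", "if x == 0:"),
    ("return a - b  # add", "return a + b"),
]
index = np.stack([embed(bug) for bug, _ in bugfix_pairs])

buggy = "return a - b  # should add"
scores = index @ embed(buggy)            # cosine similarity to the index
bug, fix = bugfix_pairs[int(scores.argmax())]
augmented = f"{buggy}\n# retrieved bug: {bug}\n# retrieved fix: {fix}"
print(augmented)   # input that would be handed to the patch generator
```
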