Patents by Inventor Junnan LI

Junnan LI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240301719
    Abstract: This disclosure provides a lock having a rotating component whose rotation drives the retraction and extension of a latch bolt, the rotating component being equipped with a trigger component; a first sensing component fixedly arranged at a first position of the lock, the first position being the position indicated by a knob fixedly connected to the rotating component when the lock is in the unlocked state, the first sensing component generating a first trigger signal when the trigger component is at the first position; and a second sensing component fixedly arranged at a second position of the lock, the second position being the position indicated by the knob when the lock is in the locked state, the second sensing component generating a second trigger signal when the trigger component is at the second position.
    Type: Application
    Filed: March 8, 2024
    Publication date: September 12, 2024
    Inventors: Junnan Li, Da Liang, Liying Chen, Yixi Peng, Dezhou Chang, Ping Liu, Dejun Liu
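A minimal sketch of how a controller might map the two trigger signals to a lock state; the function name and the boolean-per-sensor interface are illustrative assumptions, not details from the patent.

```python
def lock_state(first_trigger: bool, second_trigger: bool) -> str:
    """first_trigger: trigger component at the unlocked (first) position;
    second_trigger: trigger component at the locked (second) position."""
    if first_trigger and not second_trigger:
        return "unlocked"
    if second_trigger and not first_trigger:
        return "locked"
    return "in transition"  # neither (or both) sensors active: knob between positions
```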
  • Patent number: 12081873
    Abstract: The present invention provides an image processing apparatus (10) including a detection unit (12) that detects a plurality of predetermined points of a body of each of a plurality of persons from an image in an image circle of a fisheye image, a gravity direction determination unit (13) that determines a gravity direction in a position of each of the plurality of persons from the plurality of predetermined points, a reference point decision unit (14) that decides a reference point, based on the gravity direction in the position of each of the plurality of persons, a complementary circular image generation unit (16) that generates a complementary circular image that is a circular image acquired by adding a complementary image to the image in the image circle of the fisheye image, and that has, as a center, the reference point different from a center of the image in the image circle, and an expansion unit (17) that panoramically expands the complementary circular image, based on the reference point, and generates a panoramic image.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: September 3, 2024
    Assignee: NEC CORPORATION
    Inventors: Jianquan Liu, Junnan Li
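The final step of the abstract above, panoramic expansion of the complementary circular image about the reference point, can be illustrated with a small NumPy sketch; the output size, nearest-neighbour sampling, and function name are assumptions rather than the patented implementation.

```python
import numpy as np

def panoramic_expand(circ_img: np.ndarray, ref_x: float, ref_y: float,
                     out_h: int = 256, out_w: int = 1024) -> np.ndarray:
    """Unwrap a circular image around the reference point (ref_x, ref_y) into a panorama."""
    h, w = circ_img.shape[:2]
    max_r = min(ref_x, ref_y, w - 1 - ref_x, h - 1 - ref_y)   # stay inside the image
    thetas = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    radii = np.linspace(0, max_r, out_h)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")         # (out_h, out_w) grids
    xs = np.clip(np.round(ref_x + rr * np.cos(tt)).astype(int), 0, w - 1)
    ys = np.clip(np.round(ref_y + rr * np.sin(tt)).astype(int), 0, h - 1)
    return circ_img[ys, xs]                                    # rows: radius, columns: angle
```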
  • Publication number: 20240289606
    Abstract: Embodiments described herein provide a mixture of encoder-decoder Transformer framework for multi-task pretraining and flexible finetuning for both code understanding and generation tasks. Specifically, the framework is built on multimodal encoder and decoder modules. During pre-training, the encoder-decoder framework is trained with multiple learning objectives, including a diverse set of self-supervised tasks over two major stages of pretraining on unimodal and bimodal data.
    Type: Application
    Filed: February 24, 2023
    Publication date: August 29, 2024
    Inventors: Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Junnan Li, Chu Hong Hoi
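The abstract above mentions a diverse set of self-supervised objectives. As one illustrative example (an assumption, not a detail taken from the patent), a span-denoising objective replaces random token spans with sentinels and asks the decoder to reconstruct them:

```python
import random

def span_corrupt(tokens, mask_ratio=0.15, span_len=3):
    """Return (encoder_input, decoder_target) for one span-denoising training example."""
    encoder_input, decoder_target = [], []
    i, sentinel = 0, 0
    while i < len(tokens):
        if random.random() < mask_ratio / span_len:      # start a masked span here
            span = tokens[i:i + span_len]
            encoder_input.append(f"<extra_id_{sentinel}>")
            decoder_target.append(f"<extra_id_{sentinel}>")
            decoder_target.extend(span)                   # decoder must regenerate the span
            sentinel += 1
            i += span_len
        else:
            encoder_input.append(tokens[i])
            i += 1
    return encoder_input, decoder_target

enc_in, dec_out = span_corrupt("def add ( a , b ) : return a + b".split())
```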
  • Patent number: 12056610
    Abstract: A learning mechanism with partially-labeled web images is provided that corrects noisy labels during learning. Specifically, the mechanism employs a momentum prototype that represents common characteristics of a specific class. One training objective is to minimize the difference between the normalized embedding of a training image sample and the momentum prototype of the corresponding class. Meanwhile, during the training process, the momentum prototype is used to generate a pseudo label for the training image sample, which can then be used to identify and remove out-of-distribution (OOD) samples and to correct the noisy labels from the original partially-labeled training images. The momentum prototype for each class is in turn constantly updated based on the embeddings of new training samples and their pseudo labels.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: August 6, 2024
    Assignee: Salesforce, Inc.
    Inventors: Junnan Li, Chu Hong Hoi
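A minimal sketch of a momentum-prototype update with pseudo-labelling and OOD filtering along the lines described above; the shapes, threshold, and temperature are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn.functional as F

num_classes, dim = 10, 128
prototypes = F.normalize(torch.randn(num_classes, dim), dim=1)   # one momentum prototype per class
momentum, ood_threshold, temperature = 0.999, 0.3, 0.1

def train_step(embeddings: torch.Tensor, noisy_labels: torch.Tensor) -> torch.Tensor:
    """embeddings: (B, dim) image embeddings; noisy_labels: (B,) web-sourced labels."""
    emb = F.normalize(embeddings, dim=1)
    sim = emb @ prototypes.t()                          # similarity to every class prototype
    pseudo = sim.argmax(dim=1)                          # pseudo label = nearest prototype
    confidence = sim.max(dim=1).values
    target = torch.where(pseudo == noisy_labels, noisy_labels, pseudo)   # corrected labels
    keep = (pseudo == noisy_labels) | (confidence > ood_threshold)       # else treat as OOD
    if not keep.any():
        return sim.sum() * 0.0
    # Pull kept embeddings toward the prototype of their (corrected) class.
    loss = F.cross_entropy(sim[keep] / temperature, target[keep])
    # Momentum update of the prototype of each kept sample's corrected class.
    with torch.no_grad():
        for e, c in zip(emb[keep], target[keep]):
            prototypes[c] = F.normalize(momentum * prototypes[c] + (1 - momentum) * e, dim=0)
    return loss
```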
  • Publication number: 20240239848
    Abstract: A holothurian-derived active peptide with immune activity, a preparation method, and an application thereof, belonging to the field of biotechnology, are provided in the present disclosure. The amino acid sequence of the holothurian-derived active peptide is shown in SEQ ID NO.1. The sequence of the holothurian-derived immunocompetent peptide is obtained using separation, purification, and mass spectrometry identification techniques with the help of the holothurian protein database; the interaction of the immunocompetent peptide with the membrane recognition receptor (toll-like receptor 2) is explored by molecular docking and multiple rounds of screening; and the in vitro immunological activity of the peptide is verified by solid-phase synthesis and a RAW264.7 cell model.
    Type: Application
    Filed: December 26, 2023
    Publication date: July 18, 2024
    Inventors: Xuepeng LI, Junnan YANG, Jianrong LI, Jinxiang WANG, Fangchao CUI, Tingting LI, Yingmei LI, Qing YANG, Zhengpeng WEI
  • Patent number: 12039772
    Abstract: The present invention provides a processing system (10) including: a sample image generation unit (11) that generates a plurality of sample images being each associated with a partial region of a first image generated using a first lens; an estimation unit (12) that generates an image content estimation result indicating a content for each of the sample images using an estimation model generated by machine learning using a second image generated using a second lens differing from the first lens; a task execution unit (14) that estimates a relative positional relationship of a plurality of the sample images in the first image; a determination unit (15) that determines whether an estimation result of the relative positional relationship is correct; and a correction unit (16) that corrects a value of a parameter of the estimation model when the estimation result of the relative positional relationship is determined to be incorrect.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: July 16, 2024
    Assignee: NEC CORPORATION
    Inventors: Jianquan Liu, Junnan Li
  • Patent number: 11989941
    Abstract: Embodiments described herein provide a method of video-text pre-training to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
    Type: Grant
    Filed: December 30, 2021
    Date of Patent: May 21, 2024
    Assignee: Salesforce, Inc.
    Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi
  • Publication number: 20240161369
    Abstract: Embodiments described herein provide systems and methods of subject-driven image generation. In at least one embodiment, a system receives, via a data interface, an image containing a subject, a text description of the subject in the image, and a text prompt relating to a different rendition of the subject. The system encodes, via an image encoder, the image into an image feature vector. The system encodes, via a text encoder, the text description into a text feature vector. The system generates, by a multimodal encoder, a vector representation of the subject based on the image feature vector and the text feature vector. The system generates, by a neural network based image generation model, an output image based on an input combining the text prompt and the vector representation.
    Type: Application
    Filed: October 31, 2023
    Publication date: May 16, 2024
    Inventors: Junnan Li, Chu Hong Hoi, Dongxu Li
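A structural sketch of the generation flow described above, with each module passed in as an opaque callable; all names and signatures are illustrative assumptions.

```python
from typing import Callable

def subject_driven_generate(image, subject_text: str, prompt: str,
                            image_encoder: Callable, text_encoder: Callable,
                            multimodal_encoder: Callable, generator: Callable):
    image_feat = image_encoder(image)                         # image -> image feature vector
    text_feat = text_encoder(subject_text)                    # subject description -> text feature vector
    subject_repr = multimodal_encoder(image_feat, text_feat)  # fused subject representation
    # Condition the image-generation model on the prompt combined with the subject representation.
    return generator(prompt=prompt, subject_embedding=subject_repr)
```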
  • Publication number: 20240160858
    Abstract: Embodiments described herein provide a method of generating a vision-language task output to a text instruction relating to an input image, the method comprising receiving, via a data interface, the input image and the text instruction comprising an instruction relating to the image. The method further includes encoding, via an image encoder, the image into a first image representation. The method further includes generating, by a multimodal encoder, a second image representation based on cross-attending the first image representation to the text instruction. The method further includes generating, by a neural network based language model, a vision-language task output in response to the text instruction based on an input combining the second image representation and the text instruction.
    Type: Application
    Filed: November 9, 2023
    Publication date: May 16, 2024
    Inventors: Wenliang Dai, Junnan Li, Chu Hong Hoi, Dongxu Li
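A minimal sketch of the cross-attention step described above, in which image tokens attend to the text instruction before being combined with it as input to the language model; dimensions and module choices are assumptions.

```python
import torch
import torch.nn as nn

d = 768
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

def instruction_aware_image_repr(image_tokens: torch.Tensor,
                                 instruction_tokens: torch.Tensor) -> torch.Tensor:
    """image_tokens: (B, N, d) first image representation from the image encoder;
    instruction_tokens: (B, T, d) embedded text instruction."""
    # Queries come from the image, keys/values from the instruction, so the second
    # image representation is conditioned on what the instruction asks about.
    second_repr, _ = cross_attn(query=image_tokens,
                                key=instruction_tokens,
                                value=instruction_tokens)
    return second_repr

# The language-model input then combines the second image representation with the
# instruction, e.g. torch.cat([second_repr, instruction_tokens], dim=1).
```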
  • Publication number: 20240160853
    Abstract: Embodiments described herein provide a multimodal vision-language model. The multimodal vision-language model contains a Generalist Multimodal Transformer capable of completing multiple tasks using the same set of parameters learned during pre-training. The Generalist Multimodal Transformer allows alignment between frozen, unimodal models, such as image encoders and large language models. The Generalist Multimodal Transformer eliminates the need for fine-tuning the image encoders and large language models.
    Type: Application
    Filed: January 27, 2023
    Publication date: May 16, 2024
    Inventors: Junnan Li, Chu Hong Hoi
  • Publication number: 20240161520
    Abstract: Embodiments described herein provide a multimodal vision-language model. The multimodal vision-language model contains a Generalist Multimodal Transformer capable of completing multiple tasks using the same set of parameters learned during pre-training. The Generalist Multimodal Transformer allows alignment between frozen, unimodal models, such as image encoders and large language models. The Generalist Multimodal Transformer eliminates the need for fine-tuning the image encoders and large language models.
    Type: Application
    Filed: January 27, 2023
    Publication date: May 16, 2024
    Inventors: Junnan Li, Chu Hong Hoi
  • Publication number: 20240119257
    Abstract: Embodiments described herein provide systems and methods for providing zero-shot visual question answering. A first image and a first question relating to a visual content of the first image are received. One or more image captions relevant to the first question are determined using a visual-language neural model by determining portions of the first image relevant to the first question. Answer candidates are generated using the one or more image captions. Synthetic question-answer pairs are generated by pairing synthetic questions, generated from the answer candidates, with those answer candidates. A prompt is generated by concatenating the synthetic question-answer pairs. A first answer to the first question is generated using a language network model with an input of the first question prepended with the prompt.
    Type: Application
    Filed: January 4, 2023
    Publication date: April 11, 2024
    Inventors: Jiaxian GUO, Junnan LI, Chu Hong HOI
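A minimal sketch of assembling the prompt from synthetic question-answer pairs and prepending it to the real question, as described above; the example pairs are placeholders, and in the patent they are produced from captions and answer candidates by separate models.

```python
def build_prompt(synthetic_qa_pairs, question: str) -> str:
    parts = [f"Question: {q} Answer: {a}" for q, a in synthetic_qa_pairs]
    parts.append(f"Question: {question} Answer:")   # the real question goes last
    return "\n".join(parts)

prompt = build_prompt(
    [("What color is the bus?", "red"), ("How many people are waiting?", "three")],
    "Where is the bus going?",
)
# `prompt` is then fed to a frozen language model, which completes the final answer.
```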
  • Publication number: 20240054350
    Abstract: Embodiments described herein provide systems and methods for federated learning. A central system may store a neural network model which has a body of a number of layers, and a classification layer comprising class prototypes which classifies the latent representations output by the body of the model. The central system may initialize the class prototypes so that they are uniformly distributed in the representation space. The model and class prototypes may be broadcast to a number of client systems, which update the body of the model locally while keeping the class prototypes fixed. The clients may return information to the central system including updated local model parameters, and a local representation of the classes based on the latent representation of items in the local training data. Based on the information from the clients, the neural network model may be updated. This process may be repeated iteratively.
    Type: Application
    Filed: December 9, 2022
    Publication date: February 15, 2024
    Inventors: Yutong Dai, Zeyuan Chen, Junnan Li
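A minimal sketch of the server side: class prototypes are initialized to be approximately uniformly spread on the unit hypersphere and then held fixed while clients update the model body, with a FedAvg-style aggregation standing in for the update step; all names, sizes, and the spreading procedure are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def init_uniform_prototypes(num_classes: int, dim: int, steps: int = 500) -> torch.Tensor:
    """Spread class prototypes on the hypersphere by minimizing pairwise similarity."""
    protos = torch.randn(num_classes, dim, requires_grad=True)
    opt = torch.optim.SGD([protos], lr=0.1)
    for _ in range(steps):
        p = F.normalize(protos, dim=1)
        sim = p @ p.t() - torch.eye(num_classes)   # zero out self-similarity
        loss = sim.max(dim=1).values.mean()        # push the closest pair of prototypes apart
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.normalize(protos.detach(), dim=1)

# Broadcast the model body and the fixed prototypes to clients; clients train the body
# only (classification uses similarity to the fixed prototypes) and send parameters back.
prototypes = init_uniform_prototypes(num_classes=10, dim=128)

def aggregate(global_body: torch.nn.Module, client_states: list) -> None:
    """FedAvg-style averaging of client body parameters; prototypes stay fixed."""
    with torch.no_grad():
        for name, param in global_body.state_dict().items():
            param.copy_(torch.stack([s[name].float() for s in client_states]).mean(dim=0))
```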
  • Publication number: 20230419652
    Abstract: Embodiments described herein provide a zero-shot visual question answering (VQA) framework, which conjoins foundation network models with zero additional training. A first image and a question relating to the first image are received. The first image is divided into a plurality of image patches. A plurality of relevant image patches that are relevant to the question are determined, using a first neural network model, from the plurality of image patches. A plurality of image captions are generated, using a second neural network model, based on the plurality of relevant image patches. An answer to the question is generated based on the plurality of image captions.
    Type: Application
    Filed: September 23, 2022
    Publication date: December 28, 2023
    Inventors: Anthony Meng Huat Tiong, Junnan Li, Chu Hong Hoi
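A high-level sketch of the pipeline described above, with the three foundation models treated as opaque callables; their names and signatures are hypothetical.

```python
from typing import Callable, List, Sequence

def zero_shot_vqa(patches: Sequence,               # image already divided into patches
                  question: str,
                  patch_relevance: Callable[[Sequence, str], List[float]],
                  captioner: Callable[[Sequence], str],
                  reader: Callable[[str, str], str],
                  top_k: int = 20,
                  num_captions: int = 5) -> str:
    # 1. Score every patch against the question and keep the most relevant ones.
    scores = patch_relevance(patches, question)
    ranked = sorted(range(len(patches)), key=lambda i: -scores[i])
    relevant = [patches[i] for i in ranked[:top_k]]
    # 2. Generate several captions grounded in the relevant patches (captioner samples stochastically).
    captions = [captioner(relevant) for _ in range(num_captions)]
    # 3. Read the answer to the question off the generated captions.
    return reader(question, " ".join(captions))
```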
  • Publication number: 20230359900
    Abstract: Embodiments described herein provide masked self-training (MaST), an unsupervised learning approach leveraging two complementary sources of supervision: pseudo-labels and raw image pixels. Specifically, MaST jointly optimizes three objectives to finetune a pre-trained classification model on unlabeled images: (1) a self-training objective to learn global task-specific class prediction; (2) a masked image modeling objective to learn local pixel-level information; and (3) a global-local feature alignment objective to bridge the knowledge learned from the two sources of supervision.
    Type: Application
    Filed: May 27, 2022
    Publication date: November 9, 2023
    Inventors: Junnan Li, Chu Hong Hoi
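A minimal sketch of combining the three objectives named above into one fine-tuning loss; the model interface, mask ratio, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mast_loss(model, images, mask_ratio=0.6, w_mim=1.0, w_align=0.5, tau=0.95):
    """model is assumed to expose .classify(x), .reconstruct(x) and .features(x)."""
    # (1) Self-training: pseudo-label from the model's own confident predictions.
    with torch.no_grad():
        probs = model.classify(images).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > tau
    logits = model.classify(images)
    loss_st = F.cross_entropy(logits[keep], pseudo[keep]) if keep.any() else logits.sum() * 0.0
    # (2) Masked image modeling: reconstruct masked pixels (pixel masking stands in for patch masking).
    mask = torch.rand_like(images) < mask_ratio
    recon = model.reconstruct(images * (~mask))
    loss_mim = F.mse_loss(recon[mask], images[mask])
    # (3) Global-local alignment: match averaged local tokens to the pooled global feature.
    feats = model.features(images)                         # (B, L, D) patch-level features
    global_feat = F.normalize(feats.mean(dim=1), dim=1)
    local_feat = F.normalize(F.normalize(feats, dim=2).mean(dim=1), dim=1)
    loss_align = (1 - (global_feat * local_feat).sum(dim=1)).mean()   # 1 - cosine similarity
    return loss_st + w_mim * loss_mim + w_align * loss_align
```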
  • Patent number: 11776236
    Abstract: The system and method are directed to prototypical contrastive learning (PCL). PCL explicitly encodes the hierarchical semantic structure of the dataset into the learned embedding space and prevents the network from exploiting low-level cues to solve the unsupervised learning task. PCL uses prototypes as latent variables to help find the maximum-likelihood estimate of the network parameters in an expectation-maximization framework. PCL iteratively performs an E-step, which finds prototypes by clustering, and an M-step, which optimizes the network on a contrastive loss.
    Type: Grant
    Filed: February 2, 2022
    Date of Patent: October 3, 2023
    Assignee: Salesforce.com, Inc.
    Inventors: Junnan Li, Chu Hong Hoi
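A minimal sketch of one EM round along the lines described above, using k-means for the E-step and a prototype-level InfoNCE loss for the M-step; the cluster count, temperature, and encoder interface are assumptions.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def e_step(embeddings: torch.Tensor, num_prototypes: int = 100):
    """Cluster all embeddings; the cluster centers act as prototypes (latent variables)."""
    km = KMeans(n_clusters=num_prototypes, n_init=10).fit(embeddings.detach().cpu().numpy())
    prototypes = F.normalize(torch.tensor(km.cluster_centers_, dtype=torch.float32), dim=1)
    assignments = torch.tensor(km.labels_, dtype=torch.long)
    return prototypes, assignments

def m_step_loss(batch_emb: torch.Tensor, batch_assign: torch.Tensor,
                prototypes: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Prototype-level InfoNCE: pull each embedding toward its assigned prototype, away from the rest."""
    logits = F.normalize(batch_emb, dim=1) @ prototypes.t() / temperature
    return F.cross_entropy(logits, batch_assign)
```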
  • Publication number: 20230237773
    Abstract: Embodiments described herein provide bootstrapping language-image pre-training for unified vision-language understanding and generation (BLIP), a unified VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP enables a wider range of downstream tasks, addressing the shortcomings of existing models in both understanding and generation.
    Type: Application
    Filed: May 16, 2022
    Publication date: July 27, 2023
    Inventors: Junnan Li, Chu Hong Hoi
  • Publication number: 20230237772
    Abstract: Embodiments described herein provide bootstrapping language-image pre-training for unified vision-language understanding and generation (BLIP), a unified VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP enables a wider range of downstream tasks, addressing the shortcomings of existing models in both understanding and generation.
    Type: Application
    Filed: May 16, 2022
    Publication date: July 27, 2023
    Inventors: Junnan Li, Chu Hong Hoi
  • Publication number: 20230154146
    Abstract: Embodiments described herein provide a method of video-text pre-training to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
    Type: Application
    Filed: December 30, 2021
    Publication date: May 18, 2023
    Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi
  • Publication number: 20230154188
    Abstract: Embodiments described herein provide a method of video-text pre-training to effectively learn cross-modal representations from sparse video frames and text. Specifically, an align-and-prompt framework encodes the frames and text independently using a transformer-based video encoder and a text encoder. A multi-modal encoder is then employed to capture cross-modal interaction between a plurality of video frames and a plurality of texts. The pre-training includes prompting entity modeling, which enables the model to capture fine-grained region-entity alignment.
    Type: Application
    Filed: December 30, 2021
    Publication date: May 18, 2023
    Inventors: Dongxu Li, Junnan Li, Chu Hong Hoi