Patents by Inventor Xiaohui Shen

Xiaohui Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250218161
Abstract: Methods and systems for generating a feature map from an image are disclosed. The vision system includes a vision model for processing the image to generate the feature map according to a neural network. The vision model includes a first convolutional block for downsampling a set of image data to obtain first stage convoluted data; a second convolutional block for downsampling the first stage convoluted data to obtain second stage convoluted data, wherein one or both of the first convolutional block and the second convolutional block is a mobile convolution block (MBConv) that includes: a first Gaussian Error Linear Unit (GELU) layer, a depth-wise convolution (DWConv) layer, and a resizing convolution layer; and a transformer block (TFB) for generating the feature map from the second stage convoluted data.
    Type: Application
    Filed: January 2, 2024
    Publication date: July 3, 2025
    Inventors: Qihang Yu, Jieneng Chen, Xiaohui Shen, Liang-Chieh Chen
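The abstract above pairs a GELU activation with a depth-wise convolution inside the MBConv block. A minimal NumPy sketch of those two layers (the layer ordering, shapes, and valid padding here are illustrative assumptions, not the claimed design; the resizing convolution is omitted):

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def depthwise_conv(x, kernels):
    # x: (C, H, W); kernels: (C, k, k). Each channel is convolved with
    # its own kernel (valid padding), as in a DWConv layer.
    c, h, w = x.shape
    k = kernels.shape[-1]
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * kernels[ch])
    return out

def mbconv_block(x, kernels):
    # Toy MBConv body: GELU activation followed by a depthwise convolution.
    return depthwise_conv(gelu(x), kernels)

x = np.random.randn(4, 8, 8)          # 4 channels, 8x8 spatial grid
kernels = np.random.randn(4, 3, 3)    # one 3x3 kernel per channel
y = mbconv_block(x, kernels)
print(y.shape)  # (4, 6, 6)
```

Depth-wise convolution keeps the channel count fixed and uses one kernel per channel, which is what makes MBConv-style blocks cheap on mobile hardware.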
  • Patent number: 12327328
Abstract: Methods and systems are provided for generating enhanced images. A neural network system is trained where the training includes training a first neural network that generates enhanced images conditioned on content of an image undergoing enhancement and training a second neural network that designates realism of the enhanced images generated by the first neural network. The neural network system is trained by determining loss and accordingly adjusting the appropriate neural network(s). The trained neural network system is used to generate an enhanced aesthetic image from a selected image, where the output enhanced aesthetic image has increased aesthetics when compared to the selected image.
    Type: Grant
    Filed: July 19, 2021
    Date of Patent: June 10, 2025
    Assignee: Adobe Inc.
    Inventors: Xiaohui Shen, Zhe Lin, Xin Lu, Sarah Aye Kong, I-Ming Pao, Yingcong Chen
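The two-network setup above is an adversarial one: the second network scores realism, and the loss from its scores adjusts both networks. A minimal sketch of the standard adversarial loss computation (the specific loss used by the patent is not stated; binary cross-entropy is an assumption):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Binary cross-entropy over realism scores in [0, 1].
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

# Scores from the second ("realism") network.
real_scores = np.array([0.9, 0.8])       # scores on real reference images
enhanced_scores = np.array([0.3, 0.4])   # scores on the first network's output

# Realism-network loss: label real images 1, enhanced images 0.
d_loss = bce(real_scores, np.ones(2)) + bce(enhanced_scores, np.zeros(2))

# Enhancement-network loss: make its outputs score as real (labels of 1).
g_loss = bce(enhanced_scores, np.ones(2))
```

Each network is then updated on its own loss, so the generator improves until its enhanced images are hard to tell from real ones.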
  • Publication number: 20250157235
    Abstract: A computing system including one or more processing devices configured to receive an image. The processing devices are further configured to compute a segmentation mask that identifies a region of interest included in the image. At a feature extractor, the processing devices are further configured to compute encoded image features based on the image. The processing devices are further configured to receive a text instruction. At a visual resampler, the processing devices are further configured to compute a mask query based on the segmentation mask, the encoded image features, and the text instruction. At a generative language model, the processing devices are further configured to receive a natural language query that includes the mask query and the text instruction. Based on the natural language query, at the generative language model, the processing devices are further configured to generate and output a semantic label associated with the region of interest.
    Type: Application
    Filed: November 14, 2023
    Publication date: May 15, 2025
    Inventors: Qihang Yu, Xiaohui Shen, Liang-Chieh Chen
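The visual resampler above condenses the encoded image features inside the segmentation mask into a query for the language model. One simple way to sketch that step is masked average pooling (the patent's actual resampler is likely learned; this pooling rule and the shapes are assumptions):

```python
import numpy as np

def mask_query(features, mask):
    # features: (H*W, D) encoded image features; mask: (H*W,) binary
    # segmentation mask. Pool only the features inside the region of
    # interest into a single query vector.
    m = mask.astype(float)
    return (features * m[:, None]).sum(axis=0) / max(m.sum(), 1.0)

features = np.arange(12, dtype=float).reshape(4, 3)  # 4 locations, 3-dim features
mask = np.array([1, 1, 0, 0])                        # region covers the first two
q = mask_query(features, mask)
print(q)  # average of rows 0 and 1 -> [1.5, 2.5, 3.5]
```

The resulting vector, concatenated with the text instruction, is what the generative language model consumes to produce the semantic label.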
  • Publication number: 20250113087
    Abstract: The present disclosure describes techniques for implementing video segmentation. A video is divided into a plurality of clips. Each of the plurality of clips comprises several frames. Axial-trajectory attention is applied to each of the plurality of clips by a first sub-model. Clip features corresponding to each of the plurality of clips are generated by the first sub-model. A set of object queries corresponding to each of the plurality of clips is generated based on the clip features by a transformer decoder. Trajectory attention is applied to refine sets of object queries corresponding to the plurality of clips by a second sub-model. Video-level segmentation results are generated based on the refined object queries.
    Type: Application
    Filed: December 22, 2023
    Publication date: April 3, 2025
    Inventors: Ju He, Qihang Yu, Inkyu Shin, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
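Axial attention restricts attention to one axis at a time instead of attending over every space-time token at once. A NumPy sketch of the temporal case (shapes and the single-head, unprojected formulation are simplifying assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_axial_attention(x):
    # x: (T, H, W, D) clip features. Attention is computed only along the
    # time axis for each spatial position, which is far cheaper than full
    # space-time attention over T*H*W tokens.
    t, h, w, d = x.shape
    tokens = x.transpose(1, 2, 0, 3).reshape(h * w, t, d)      # one sequence per pixel
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(d)    # (H*W, T, T)
    out = softmax(scores) @ tokens
    return out.reshape(h, w, t, d).transpose(2, 0, 1, 3)

x = np.random.randn(5, 4, 4, 8)  # 5 frames, 4x4 grid, 8-dim features
y = temporal_axial_attention(x)
print(y.shape)  # (5, 4, 4, 8)
```

A companion pass along each spatial axis would complete the axial scheme; the trajectory attention in the abstract further follows moving points across frames rather than fixed pixel positions.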
  • Publication number: 20250104423
    Abstract: Provided in the embodiments of the present disclosure are a video processing method and device. The video processing method includes: determining a target image to be processed in a video; performing semantic segmentation on the target image through a convolutional neural network to obtain a first feature map, wherein the first feature map comprises a feature map corresponding to at least one semantic class; determining a target image region corresponding to the at least one semantic class in the target image according to the first feature map; wherein the at least one semantic class comprises an object-in-hand, and a training image adopted by the convolutional neural network in a training process is marked with an image region corresponding to the at least one semantic class.
    Type: Application
    Filed: December 27, 2022
    Publication date: March 27, 2025
    Inventors: Longyin WEN, Kai XU, Xiaohui SHEN
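Determining the image region for a semantic class from a per-class feature map typically reduces to a per-pixel argmax. A minimal sketch (the argmax decision rule is an assumption; the patent only says the region is determined "according to the first feature map"):

```python
import numpy as np

def class_region(feature_map, class_index):
    # feature_map: (C, H, W) per-class scores from semantic segmentation.
    # The region for a class is the set of pixels where that class wins.
    return np.argmax(feature_map, axis=0) == class_index

scores = np.zeros((2, 2, 2))
scores[1, 0, :] = 1.0  # the "object-in-hand" class wins on the top row
region = class_region(scores, class_index=1)
print(region)  # [[ True  True] [False False]]
```

Here class 1 stands in for the "object-in-hand" class the abstract singles out.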
  • Patent number: 12260881
    Abstract: Provided are a transition type determination method, an electronic device and a storage medium. The method includes: acquiring a picture matching degree between a candidate transition type and a transition position of two adjacent video clips, and acquiring a music matching degree of the candidate transition type and background music of a video to which the two adjacent video clips belong; and determining a target transition type for the transition position according to the picture matching degree and the music matching degree, where the target transition type is used for a transition effect between the two adjacent video clips.
    Type: Grant
    Filed: August 15, 2023
    Date of Patent: March 25, 2025
    Assignees: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD., BYTEDANCE INC.
    Inventors: Xiaojie Jin, Xuchen Song, Gen Li, Yan Wang, Xiaohui Shen
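Combining a picture matching degree and a music matching degree into one decision can be sketched as a weighted score over candidate transitions (the weighted-sum rule and the `alpha` weight are assumptions; the patent does not specify how the two degrees are combined):

```python
def pick_transition(candidates, alpha=0.5):
    # candidates: (name, picture_match, music_match) tuples, with both
    # matching degrees in [0, 1]. alpha weights the picture term.
    def score(c):
        _, picture, music = c
        return alpha * picture + (1 - alpha) * music
    return max(candidates, key=score)[0]

candidates = [
    ("crossfade", 0.9, 0.4),
    ("whip-pan", 0.5, 0.9),
    ("hard-cut", 0.6, 0.6),
]
print(pick_transition(candidates))             # whip-pan (0.70 beats 0.65)
print(pick_transition(candidates, alpha=0.8))  # crossfade once pictures dominate
```

Shifting `alpha` trades off visual continuity against synchronization with the background music.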
  • Publication number: 20250097545
Abstract: The embodiments of the present disclosure provide a video generation method, an apparatus, a device, and a storage medium, the video generation method including: obtaining a plurality of images and music matched to the plurality of images; determining first feature information for the plurality of images and second feature information for the music; according to the first feature information, the second feature information and a plurality of pre-stored rendering effects, determining a target rendering effect combination, the rendering effects being animations, special effects, or transitions; and generating a video according to the plurality of images, the music and the target rendering effect combination.
    Type: Application
    Filed: November 18, 2022
    Publication date: March 20, 2025
    Inventors: Weibo GONG, Xiaojie JIN, Ding LIU, Xiaohui SHEN
  • Publication number: 20250086758
    Abstract: The present disclosure provides an image processing method and device. The image processing method includes: performing, by an encoder and a first model, multiple iterations on an initial image to obtain a target image feature corresponding to the initial image; and performing, by a second model, image reconstruction based on the target image feature to obtain a reconstructed image of the initial image, both of the first model and the second model being neural networks for image reconstruction, wherein in the multiple iterations, an image feature extracted by the first model in the image reconstruction and an output image of the first model are feedback information for the encoder to assist the encoder in encoding the initial image.
    Type: Application
    Filed: January 13, 2023
    Publication date: March 13, 2025
    Inventors: Yichun SHI, Xiao YANG, Xiaohui SHEN
  • Publication number: 20250078570
    Abstract: The present disclosure provides an expression driving method and apparatus, and a training method and apparatus of an expression driving model. The expression driving method includes acquiring a first video; and inputting the first video into a pre-trained expression driving model to obtain a second video. The expression driving model is trained based on a target sample image and a plurality of first sample images. A facial image in the second video is generated based on the target sample image. A gesture expression feature of the facial image in the second video is the same as a gesture expression feature of a facial image in the first video.
    Type: Application
    Filed: January 4, 2023
    Publication date: March 6, 2025
    Inventors: Yizhe ZHU, Xiao YANG, Jianwei LI, Xiaohui SHEN
  • Publication number: 20250078392
Abstract: An image generation system is described. The system comprises a neural network model configured to perform a diffusion process to generate a set of multi-view images from the same input prompt. The set of multi-view images depicts the same subject from different view orientations. The neural network model comprises a self-attention layer configured to relate pixels across the set of multi-view images.
    Type: Application
    Filed: August 28, 2023
    Publication date: March 6, 2025
    Inventors: Yichun SHI, Peng WANG, Jianglong YE, Long MAI, Xiao YANG, Xiaohui SHEN
  • Publication number: 20250071390
    Abstract: The present disclosure provides a video generation method based on music beats, a video generation apparatus based on music beats, an electronic device and a computer-readable storage medium. The method includes: acquiring a plurality of video objects and audio information respectively; determining a plurality of initial music beats in the audio information and characteristic information of each initial music beat, in which the characteristic information at least includes a sound intensity of each initial music beat and time of each initial music beat in the audio information; according to the characteristic information, selecting a target music beat from the plurality of initial music beats; and generating a target video according to the target music beat and the plurality of video objects.
    Type: Application
    Filed: December 15, 2022
    Publication date: February 27, 2025
    Inventors: Weibo GONG, Xiaojie JIN, Xiaohui SHEN
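Selecting target beats from the initial beats "according to the characteristic information" (intensity plus timing) can be sketched as a greedy filter; the particular thresholds and the minimum-gap rule below are assumptions for illustration:

```python
def select_target_beats(beats, min_intensity=0.6, min_gap=1.0):
    # beats: list of (time_seconds, intensity) for the initial music beats.
    # Keep only sufficiently strong beats, and enforce a minimum spacing
    # so cut points are not placed too close together.
    selected, last_time = [], -min_gap
    for time, intensity in sorted(beats):
        if intensity >= min_intensity and time - last_time >= min_gap:
            selected.append(time)
            last_time = time
    return selected

beats = [(0.5, 0.9), (1.0, 0.4), (1.2, 0.8), (2.6, 0.7), (3.0, 0.95)]
print(select_target_beats(beats))  # [0.5, 2.6]
```

The surviving beat times then serve as the anchor points at which the video objects are cut together.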
  • Publication number: 20250061641
    Abstract: The present disclosure provides a method of generating an image with metallic texture, and a method of training a metallic texture image generation model. The method of generating an image with metallic texture includes: acquiring a first video; and inputting the first video into a pre-trained metallic texture image generation model to obtain a second video. Each frame of images in the second video is an image with metallic texture. The metallic texture image generation model is trained based on a plurality of first sample images and second sample images with metallic texture corresponding to each first sample image.
    Type: Application
    Filed: December 14, 2022
    Publication date: February 20, 2025
    Inventors: Yizhe ZHU, Bingchen LIU, Chunpong LAI, Xiao YANG, Xiaohui SHEN
  • Publication number: 20250054271
    Abstract: The present disclosure provides a video generation method and device. The video generation method includes: extracting a first image feature from a first image; obtaining a plurality of intermediate image features by means of nonlinear interpolation according to the first image feature and a second image feature, wherein the second image feature is an image feature of a second image; and performing image reconstruction by means of an image generation model based on the first image feature, the second image feature, and the plurality of intermediate image features, so as to generate a target video, wherein the target video is used for presenting a process of a gradual change from the first image to the second image.
    Type: Application
    Filed: December 22, 2022
    Publication date: February 13, 2025
    Inventors: Yichun SHI, Xiao YANG, Xiaohui SHEN
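One common nonlinear interpolation for latent image features is spherical linear interpolation (slerp); the patent does not name its scheme, so treat this as one plausible instantiation:

```python
import numpy as np

def slerp(a, b, t):
    # Spherical linear interpolation between two feature vectors:
    # interpolates along the arc between them rather than the chord.
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < 1e-8:
        return (1 - t) * a + t * b  # nearly parallel: fall back to lerp
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

first = np.array([1.0, 0.0])   # stands in for the first image feature
second = np.array([0.0, 1.0])  # stands in for the second image feature
# Intermediate features tracing a gradual change from `first` to `second`.
frames = [slerp(first, second, t) for t in np.linspace(0.0, 1.0, 5)]
print(np.round(frames[2], 3))  # midpoint -> [0.707 0.707]
```

Decoding each intermediate feature with the image generation model yields the morphing target video the abstract describes.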
  • Publication number: 20250056084
Abstract: The embodiments of the present disclosure provide a video generation method, an apparatus, an electronic device, a storage medium, a computer program product and a computer program, the method including: obtaining a plurality of video segments; determining feature information corresponding to the plurality of video segments; according to the feature information and a plurality of pre-stored rendering effects, determining an effect combination to be added, the rendering effects being animations, special effects, or transitions; and generating a target video according to the plurality of video segments and the effect combination to be added.
    Type: Application
    Filed: November 18, 2022
    Publication date: February 13, 2025
    Inventors: Xiaojie JIN, Weibo GONG, Quanwei HUANG, Xiaohui SHEN
  • Publication number: 20250045929
    Abstract: Single-stage frameworks for open-vocabulary panoptic segmentation are provided.
    Type: Application
    Filed: August 3, 2023
    Publication date: February 6, 2025
    Inventors: Qihang Yu, Ju He, Xueqing Deng, Xiaohui Shen, Liang-Chieh Chen
  • Publication number: 20250014606
Abstract: A computing system for video content creation executes a chat application to cause the processor to, in a real-time chat conversation with a user, receive a communication including a command from the user for interacting with video content, use a large language model to analyze the command and generate a natural language response and a recommended action to implement on the video content based at least on the analyzed command, and implement the recommended action on the video content based at least on the analyzed command.
    Type: Application
    Filed: July 3, 2023
    Publication date: January 9, 2025
    Inventors: Kin Chung Wong, Fan Chen, Xiu Pei, Yujie Li, Cheng Li, Chenman Zhou, Siqi Tan, Longyin Wen, Xiaohui Shen
  • Publication number: 20240338848
Abstract: A unified place recognition framework handles both retrieval and re-ranking with a unified transformer model. The re-ranking module takes feature correlation, attention values, and x/y coordinates into account, and learns to determine whether an image pair is from the same location.
    Type: Application
    Filed: April 6, 2023
    Publication date: October 10, 2024
    Inventors: Sijie Zhu, Linjie Yang, Xiaohui Shen, Heng Wang
  • Patent number: 12008464
    Abstract: Approaches are described for determining facial landmarks in images. An input image is provided to at least one trained neural network that determines a face region (e.g., bounding box of a face) of the input image and initial facial landmark locations corresponding to the face region. The initial facial landmark locations are provided to a 3D face mapper that maps the initial facial landmark locations to a 3D face model. A set of facial landmark locations are determined from the 3D face model. The set of facial landmark locations are provided to a landmark location adjuster that adjusts positions of the set of facial landmark locations based on the input image. The input image is presented on a user device using the adjusted set of facial landmark locations.
    Type: Grant
    Filed: November 16, 2017
    Date of Patent: June 11, 2024
    Assignee: ADOBE INC.
    Inventors: Haoxiang Li, Zhe Lin, Jonathan Brandt, Xiaohui Shen
  • Publication number: 20240104810
    Abstract: Embodiments of the disclosure provide a method and a device for processing a portrait image, the method includes: acquiring a to-be-processed portrait image; inputting the to-be-processed portrait image into an image processing model, and acquiring a head smear image output by the image processing model, where the image processing model is configured to smear a hair area of a portrait located above a preset boundary in the portrait image, and the image processing model is generated by training a sample data set of a sample portrait image and a sample head smear image corresponding to the sample portrait image; rendering the head smear image with a head effect material to obtain a portrait image added with an effect; and displaying the portrait image added with the effect.
    Type: Application
    Filed: November 22, 2021
    Publication date: March 28, 2024
    Inventors: Xiao YANG, Jianwei LI, Ding LIU, Yangyue WAN, Xiaohui SHEN, Jianchao YANG
  • Patent number: 11868889
    Abstract: In implementations of object detection in images, object detectors are trained using heterogeneous training datasets. A first training dataset is used to train an image tagging network to determine an attention map of an input image for a target concept. A second training dataset is used to train a conditional detection network that accepts as conditional inputs the attention map and a word embedding of the target concept. Despite the conditional detection network being trained with a training dataset having a small number of seen classes (e.g., classes in a training dataset), it generalizes to novel, unseen classes by concept conditioning, since the target concept propagates through the conditional detection network via the conditional inputs, thus influencing classification and region proposal. Hence, classes of objects that can be detected are expanded, without the need to scale training databases to include additional classes.
    Type: Grant
    Filed: January 31, 2022
    Date of Patent: January 9, 2024
    Assignee: Adobe Inc.
    Inventors: Zhe Lin, Xiaohui Shen, Mingyang Ling, Jianming Zhang, Jason Wen Yong Kuen