Patents by Inventor Xuehan Xiong

Xuehan Xiong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250054306
    Abstract: Aspects of the disclosure are directed to methods and systems for short form previews of long form media items. A server can provide, to an artificial intelligence (AI) model, a long form media item to be shared with users. The server can receive, from the AI model, one or more frames that are predicted to contain content that is of interest to the users. The server can extract a segment of the long form media item that corresponds to the one or more frames, where the extracted segment corresponds to a short form media item preview. The short form media item preview can be provided for presentation to the users.
    Type: Application
    Filed: August 7, 2024
    Publication date: February 13, 2025
    Inventors: Daniel S. Cohen, Christopher R. Conover, Emily Rose Smith, Anoop Menon, Benjamin Lehn, Sudheendra Vijayanarasimhan, Bo Hu, Shen Yan, Xuehan Xiong, David Alexander Ross
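    A minimal sketch of the preview-extraction idea described in this entry, assuming the AI model can be abstracted as a per-frame interest scorer; extract_preview, score_fn, and all other names here are hypothetical stand-ins, not the claimed system.
      def extract_preview(frames, score_fn, preview_len=150):
          """Cut a contiguous window of preview_len frames centred on the
          frame that the model scores as most interesting."""
          scores = [score_fn(f) for f in frames]      # stand-in for the AI model's predictions
          peak = max(range(len(scores)), key=scores.__getitem__)
          start = max(0, min(peak - preview_len // 2, len(frames) - preview_len))
          return frames[start:start + preview_len]    # the short form media item preview

      # Toy usage: 1000 placeholder frames, with "interest" peaking at frame 700.
      frames = list(range(1000))
      preview = extract_preview(frames, score_fn=lambda f: -abs(f - 700))
      print(preview[0], preview[-1])                  # a 150-frame window centred on frame 700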
  • Patent number: 12169765
    Abstract: A machine learning scheme can be trained on a set of labeled training images of a subject in different poses, with different textures, and with different background environments. The label or marker data of the subject may be stored as metadata to a 3D model of the subject or rendered images of the subject. The machine learning scheme may be implemented as a supervised learning scheme that can automatically identify the labeled data to create a classification model. The classification model can classify a depicted subject in many different environments and arrangements (e.g., poses).
    Type: Grant
    Filed: September 8, 2023
    Date of Patent: December 17, 2024
    Assignee: Snap Inc.
    Inventors: Xuehan Xiong, Zehao Xue
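    A minimal sketch of the training-data side of this entry, assuming the renderer can be reduced to a function that returns image features together with the label metadata stored alongside each render; render_subject, the label vocabularies, and the feature size are hypothetical stand-ins, not the claimed scheme.
      import random

      POSES = ["standing", "sitting", "crouching"]       # hypothetical label vocabulary
      TEXTURES = ["denim", "leather", "cotton"]
      BACKGROUNDS = ["street", "beach", "indoor"]

      def render_subject(pose, texture, background):
          """Stand-in renderer: returns fake image features plus the marker
          metadata that travels with each rendered image of the 3D model."""
          features = [random.random() for _ in range(16)]
          metadata = {"pose": pose, "texture": texture, "background": background}
          return features, metadata

      def build_training_set(n=100):
          X, y = [], []
          for _ in range(n):
              feats, meta = render_subject(random.choice(POSES),
                                           random.choice(TEXTURES),
                                           random.choice(BACKGROUNDS))
              X.append(feats)
              y.append(meta["pose"])       # the stored label becomes the supervision target
          return X, y

      X, y = build_training_set()
      # Any off-the-shelf supervised classifier can consume (X, y) from here,
      # e.g. sklearn.linear_model.LogisticRegression().fit(X, y).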
  • Patent number: 12159215
    Abstract: A modulated segmentation system can use a modulator network to emphasize spatial prior data of an object to track the object across multiple images. The modulated segmentation system can use a segmentation network that receives spatial prior data as intermediate data that improves segmentation accuracy. The segmentation network can further receive visual guide information from a visual guide network to increase tracking accuracy via segmentation.
    Type: Grant
    Filed: October 18, 2023
    Date of Patent: December 3, 2024
    Assignee: Snap Inc.
    Inventors: Linjie Yang, Jianchao Yang, Xuehan Xiong, Yanran Wang
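    An illustrative PyTorch sketch of the modulation idea in this entry: a modulator turns a spatial prior (for example, the mask tracked from the previous frame) into gates that emphasise prior-consistent regions of an intermediate feature map in a small segmentation network. Layer sizes and names are assumptions; this is one plausible reading, not the patented architecture.
      import torch
      import torch.nn as nn

      class ModulatedSegmenter(nn.Module):
          def __init__(self, channels=16):
              super().__init__()
              self.encode = nn.Conv2d(3, channels, 3, padding=1)
              # Modulator: maps the 1-channel spatial prior to per-channel, per-location gates.
              self.modulator = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1), nn.Sigmoid())
              self.decode = nn.Conv2d(channels, 1, 3, padding=1)

          def forward(self, image, spatial_prior):
              feats = torch.relu(self.encode(image))
              gates = self.modulator(spatial_prior)     # emphasise regions consistent with the prior
              feats = feats * gates                     # modulated intermediate features
              return torch.sigmoid(self.decode(feats))  # per-pixel object mask

      image = torch.rand(1, 3, 64, 64)
      prior = torch.rand(1, 1, 64, 64)                  # e.g. the previous frame's object mask
      mask = ModulatedSegmenter()(image, prior)
      print(mask.shape)                                 # torch.Size([1, 1, 64, 64])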
  • Publication number: 20240371164
    Abstract: Methods and systems for video localization using artificial intelligence are provided herein. A set of video embeddings representing features of one or more video frames of a media item and a set of textual embeddings corresponding to an event associated with the media item are obtained. Fused video-textual data is generated based on the set of video embeddings and the set of textual embeddings. The fused video-textual data indicates features of the video frames of the media item and textual data pertaining to the media item. The fused video-textual data is provided as an input to an artificial intelligence (AI) model trained to perform multiple video localization tasks with respect to media items of a platform. One or more outputs of the AI model are obtained. A segment of the media item that depicts the event is determined based on the one or more outputs of the AI model.
    Type: Application
    Filed: May 1, 2024
    Publication date: November 7, 2024
    Inventors: Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, David Alexander Ross, Cordelia Schmid
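    A minimal NumPy sketch of the localization step described in this entry, with the trained AI model replaced by cosine similarity between per-frame video embeddings and a single text embedding of the event; all shapes, thresholds, and names are illustrative assumptions, not the claimed model.
      import numpy as np

      def localize_event(video_emb, text_emb, threshold=0.4):
          """video_emb: (T, D) per-frame embeddings; text_emb: (D,) event embedding.
          Returns (start, end) frame indices of the segment that depicts the event."""
          v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
          t = text_emb / np.linalg.norm(text_emb)
          scores = v @ t                        # stand-in for the fused video-textual model
          hits = np.flatnonzero(scores > threshold)
          if hits.size == 0:
              return None
          return int(hits[0]), int(hits[-1])    # first and last matching frame

      # Toy usage: the "event" signature is injected into frames 120-179.
      rng = np.random.default_rng(0)
      text = rng.normal(size=64)
      video = rng.normal(size=(300, 64))
      video[120:180] += 0.8 * text
      print(localize_event(video, text))        # approximately (120, 179)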
  • Publication number: 20240372963
    Abstract: A machine learning system can generate an image mask (e.g., a pixel mask) comprising pixel assignments for pixels. The pixels can be assigned to classes, including, for example, face, clothes, body skin, or hair. The machine learning system can be implemented using a convolutional neural network that is configured to execute efficiently on computing devices having limited resources, such as mobile phones. The pixel mask can be used to more accurately display video effects interacting with a user or subject depicted in the image.
    Type: Application
    Filed: July 15, 2024
    Publication date: November 7, 2024
    Inventors: Lidiia Bogdanovych, William Brendel, Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang
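    An illustrative PyTorch sketch of the kind of lightweight per-pixel classifier this entry describes: per-pixel class logits are computed by a small network and an argmax yields the pixel mask. The depthwise-separable design, class list, layer sizes, and resolution are all assumptions, not the patented network.
      import torch
      import torch.nn as nn

      CLASSES = ["background", "face", "clothes", "body_skin", "hair"]

      def separable(cin, cout):
          return nn.Sequential(
              nn.Conv2d(cin, cin, 3, padding=1, groups=cin),  # depthwise
              nn.Conv2d(cin, cout, 1),                        # pointwise
              nn.ReLU())

      model = nn.Sequential(
          separable(3, 16), separable(16, 32),
          nn.Conv2d(32, len(CLASSES), 1))       # per-pixel class logits

      image = torch.rand(1, 3, 128, 128)
      logits = model(image)                     # (1, 5, 128, 128)
      pixel_mask = logits.argmax(dim=1)         # (1, 128, 128): one class index per pixel
      print(pixel_mask.shape, pixel_mask.unique())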
  • Publication number: 20240346824
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing action localization on an input video. In particular, a system maintains a set of query vectors and uses the input video and the set of query vectors to generate an action localization output for the input video. The action localization output includes, for each of one or more agents depicted in the video, data specifying, for each of one or more video frames in the video, a respective bounding box in the video frame that depicts the agent and a respective action from a set of actions that is being performed by the agent in the video frame.
    Type: Application
    Filed: April 12, 2024
    Publication date: October 17, 2024
    Inventors: Alexey Alexeevich Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lucic, Cordelia Luise Schmid, Anurag Arnab
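    An illustrative PyTorch sketch of the query-vector formulation in this entry, read in a DETR-like way: a maintained set of learned queries cross-attends to per-frame video features, and small heads map each query to a per-frame bounding box and action. All dimensions and head designs are assumptions, not the claimed model.
      import torch
      import torch.nn as nn

      class QueryActionLocalizer(nn.Module):
          def __init__(self, dim=64, num_queries=8, num_actions=10):
              super().__init__()
              self.queries = nn.Parameter(torch.randn(num_queries, dim))   # the maintained query vectors
              self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
              self.box_head = nn.Linear(dim, 4)              # (x, y, w, h) per frame
              self.action_head = nn.Linear(dim, num_actions)

          def forward(self, frame_feats):                    # (B, T, dim) per-frame video features
              B = frame_feats.shape[0]
              q = self.queries.unsqueeze(0).expand(B, -1, -1)
              # Each query attends over all frames to gather evidence for one agent.
              agent_feats, _ = self.attn(q, frame_feats, frame_feats)          # (B, Q, dim)
              per_frame = agent_feats.unsqueeze(2) + frame_feats.unsqueeze(1)  # (B, Q, T, dim)
              boxes = self.box_head(per_frame).sigmoid()     # (B, Q, T, 4) box per agent per frame
              actions = self.action_head(per_frame)          # (B, Q, T, num_actions) action logits
              return boxes, actions

      boxes, actions = QueryActionLocalizer()(torch.rand(2, 16, 64))
      print(boxes.shape, actions.shape)   # (2, 8, 16, 4) and (2, 8, 16, 10)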
  • Patent number: 12075190
    Abstract: A machine learning system can generate an image mask (e.g., a pixel mask) comprising pixel assignments for pixels. The pixels can be assigned to classes, including, for example, face, clothes, body skin, or hair. The machine learning system can be implemented using a convolutional neural network that is configured to execute efficiently on computing devices having limited resources, such as mobile phones. The pixel mask can be used to more accurately display video effects interacting with a user or subject depicted in the image.
    Type: Grant
    Filed: July 13, 2023
    Date of Patent: August 27, 2024
    Assignee: Snap Inc.
    Inventors: Lidiia Bogdanovych, William Brendel, Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang
  • Publication number: 20240249522
    Abstract: A mobile device can generate real-time complex visual image effects using an asynchronous processing pipeline. A first pipeline applies a complex image process, such as a neural network, to keyframes of a live image sequence. A second pipeline generates flow maps that describe feature transformations in the image sequence. The flow maps can be used to process non-keyframes on the fly. The processed keyframes and non-keyframes can be used to display a complex visual effect on the mobile device in real-time or near real-time.
    Type: Application
    Filed: April 2, 2024
    Publication date: July 25, 2024
    Inventors: Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang, Shah Tanmay Anilkumar
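    A minimal sketch of the two-pipeline idea in this entry: an expensive keyframe process runs asynchronously on a worker thread, while a cheap flow-style step propagates the latest keyframe result to every frame as it arrives. heavy_effect and flow_warp are hypothetical stand-ins for the neural network and the flow-map warp.
      from concurrent.futures import ThreadPoolExecutor
      import time

      def heavy_effect(frame):          # pipeline 1: slow, keyframes only
          time.sleep(0.05)              # pretend this is a neural network
          return f"effect({frame})"

      def flow_warp(result, frame):     # pipeline 2: fast, every frame
          return f"{result} warped onto {frame}"

      frames = [f"frame{i}" for i in range(12)]
      KEYFRAME_EVERY = 4

      with ThreadPoolExecutor(max_workers=1) as pool:
          latest = heavy_effect(frames[0])                     # seed with the first keyframe
          pending = None
          for i, frame in enumerate(frames):
              if i % KEYFRAME_EVERY == 0:
                  pending = pool.submit(heavy_effect, frame)   # kick off async keyframe work
              if pending is not None and pending.done():
                  latest = pending.result()                    # adopt the newest keyframe result
              print(flow_warp(latest, frame))                  # serve every frame without waiting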
  • Patent number: 11989938
    Abstract: A mobile device can generate real-time complex visual image effects using an asynchronous processing pipeline. A first pipeline applies a complex image process, such as a neural network, to keyframes of a live image sequence. A second pipeline generates flow maps that describe feature transformations in the image sequence. The flow maps can be used to process non-keyframes on the fly. The processed keyframes and non-keyframes can be used to display a complex visual effect on the mobile device in real-time or near real-time.
    Type: Grant
    Filed: May 4, 2023
    Date of Patent: May 21, 2024
    Assignee: Snap Inc.
    Inventors: Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang, Shah Tanmay Anilkumar
  • Publication number: 20240046072
    Abstract: A modulated segmentation system can use a modulator network to emphasize spatial prior data of an object to track the object across multiple images. The modulated segmentation system can use a segmentation network that receives spatial prior data as intermediate data that improves segmentation accuracy. The segmentation network can further receive visual guide information from a visual guide network to increase tracking accuracy via segmentation.
    Type: Application
    Filed: October 18, 2023
    Publication date: February 8, 2024
    Inventors: Linjie Yang, Jianchao Yang, Xuehan Xiong, Yanran Wang
  • Publication number: 20230419538
    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
    Type: Application
    Filed: September 11, 2023
    Publication date: December 28, 2023
    Applicant: Google LLC
    Inventors: Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang
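    An illustrative PyTorch sketch of the three-stream classification described in this entry: spatial (RGB), temporal (motion), and pose streams are encoded separately, fused, and classified into an activity. Channel counts, resolutions, and the activity vocabulary are arbitrary assumptions, not the claimed method.
      import torch
      import torch.nn as nn

      class ThreeStreamClassifier(nn.Module):
          def __init__(self, num_activities=5, dim=32):
              super().__init__()
              self.spatial = nn.Sequential(nn.Conv2d(3, dim, 3, padding=1),
                                           nn.AdaptiveAvgPool2d(1), nn.Flatten())
              self.temporal = nn.Sequential(nn.Conv2d(2, dim, 3, padding=1),   # e.g. optical flow
                                            nn.AdaptiveAvgPool2d(1), nn.Flatten())
              self.pose = nn.Sequential(nn.Conv2d(1, dim, 3, padding=1),       # rendered pose images
                                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
              self.classify = nn.Linear(3 * dim, num_activities)

          def forward(self, spatial, temporal, pose):
              fused = torch.cat([self.spatial(spatial),
                                 self.temporal(temporal),
                                 self.pose(pose)], dim=1)
              return self.classify(fused)          # activity logits

      model = ThreeStreamClassifier()
      logits = model(torch.rand(1, 3, 64, 64),     # spatial stream: RGB frame
                     torch.rand(1, 2, 64, 64),     # temporal stream: motion field
                     torch.rand(1, 1, 64, 64))     # pose stream: pose image
      print(logits.shape)                          # torch.Size([1, 5])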
  • Publication number: 20230419188
    Abstract: A machine learning scheme can be trained on a set of labeled training images of a subject in different poses, with different textures, and with different background environments. The label or marker data of the subject may be stored as metadata to a 3D model of the subject or rendered images of the subject. The machine learning scheme may be implemented as a supervised learning scheme that can automatically identify the labeled data to create a classification model. The classification model can classify a depicted subject in many different environments and arrangements (e.g., poses).
    Type: Application
    Filed: September 8, 2023
    Publication date: December 28, 2023
    Inventors: Xuehan Xiong, Zehao Xue
  • Patent number: 11847528
    Abstract: A modulated segmentation system can use a modulator network to emphasize spatial prior data of an object to track the object across multiple images. The modulated segmentation system can use a segmentation network that receives spatial prior data as intermediate data that improves segmentation accuracy. The segmentation network can further receive visual guide information from a visual guide network to increase tracking accuracy via segmentation.
    Type: Grant
    Filed: December 29, 2022
    Date of Patent: December 19, 2023
    Assignee: Snap Inc.
    Inventors: Linjie Yang, Jianchao Yang, Xuehan Xiong, Yanran Wang
  • Publication number: 20230362331
    Abstract: A machine learning system can generate an image mask (e.g., a pixel mask) comprising pixel assignments for pixels. The pixels can be assigned to classes, including, for example, face, clothes, body skin, or hair. The machine learning system can be implemented using a convolutional neural network that is configured to execute efficiently on computing devices having limited resources, such as mobile phones. The pixel mask can be used to more accurately display video effects interacting with a user or subject depicted in the image.
    Type: Application
    Filed: July 13, 2023
    Publication date: November 9, 2023
    Inventors: Lidiia Bogdanovych, William Brendel, Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang
  • Patent number: 11790276
    Abstract: A machine learning scheme can be trained on a set of labeled training images of a subject in different poses, with different textures, and with different background environments. The label or marker data of the subject may be stored as metadata to a 3D model of the subject or rendered images of the subject. The machine learning scheme may be implemented as a supervised learning scheme that can automatically identify the labeled data to create a classification model. The classification model can classify a depicted subject in many different environments and arrangements (e.g., poses).
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: October 17, 2023
    Assignee: Snap Inc.
    Inventors: Xuehan Xiong, Zehao Xue
  • Patent number: 11776156
    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: October 3, 2023
    Assignee: Google LLC
    Inventors: Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang
  • Publication number: 20230274543
    Abstract: A mobile device can generate real-time complex visual image effects using an asynchronous processing pipeline. A first pipeline applies a complex image process, such as a neural network, to keyframes of a live image sequence. A second pipeline generates flow maps that describe feature transformations in the image sequence. The flow maps can be used to process non-keyframes on the fly. The processed keyframes and non-keyframes can be used to display a complex visual effect on the mobile device in real-time or near real-time.
    Type: Application
    Filed: May 4, 2023
    Publication date: August 31, 2023
    Inventors: Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang, Shah Tanmay Anilkumar
  • Patent number: 11743426
    Abstract: A machine learning system can generate an image mask (e.g., a pixel mask) comprising pixel assignments for pixels. The pixels can be assigned to classes, including, for example, face, clothes, body skin, or hair. The machine learning system can be implemented using a convolutional neural network that is configured to execute efficiently on computing devices having limited resources, such as mobile phones. The pixel mask can be used to more accurately display video effects interacting with a user or subject depicted in the image.
    Type: Grant
    Filed: August 13, 2020
    Date of Patent: August 29, 2023
    Assignee: Snap Inc.
    Inventors: Lidiia Bogdanovych, William Brendel, Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang
  • Patent number: 11676381
    Abstract: A mobile device can generate real-time complex visual image effects using an asynchronous processing pipeline. A first pipeline applies a complex image process, such as a neural network, to keyframes of a live image sequence. A second pipeline generates flow maps that describe feature transformations in the image sequence. The flow maps can be used to process non-keyframes on the fly. The processed keyframes and non-keyframes can be used to display a complex visual effect on the mobile device in real-time or near real-time.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: June 13, 2023
    Assignee: Snap Inc.
    Inventors: Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang, Shah Tanmay Anilkumar
  • Patent number: 11645843
    Abstract: A mobile device can generate real-time complex visual image effects using an asynchronous processing pipeline. A first pipeline applies a complex image process, such as a neural network, to keyframes of a live image sequence. A second pipeline generates flow maps that describe feature transformations in the image sequence. The flow maps can be used to process non-keyframes on the fly. The processed keyframes and non-keyframes can be used to display a complex visual effect on the mobile device in real-time or near real-time.
    Type: Grant
    Filed: January 22, 2021
    Date of Patent: May 9, 2023
    Assignee: Snap Inc.
    Inventors: Samuel Edward Hare, Fedir Poliakov, Guohui Wang, Xuehan Xiong, Jianchao Yang, Linjie Yang, Shah Tanmay Anilkumar