Patents by Inventor Yinxiao Li

Yinxiao Li has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11961298
    Abstract: Systems and methods for detecting objects in a video are provided. A method can include inputting a video comprising a plurality of frames into an interleaved object detection model comprising a plurality of feature extractor networks and a shared memory layer. For each of one or more frames, the method can include selecting one of the plurality of feature extractor networks to analyze the one or more frames, analyzing the one or more frames by the selected feature extractor network to determine one or more features of the one or more frames, determining an updated set of features based at least in part on the one or more features and one or more previously extracted features extracted from a previous frame stored in the shared memory layer, and detecting an object in the one or more frames based at least in part on the updated set of features.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: April 16, 2024
    Assignee: Google LLC
    Inventors: Menglong Zhu, Mason Liu, Marie Charisse White, Dmitry Kalenichenko, Yinxiao Li
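The interleaved loop described in the abstract (per-frame extractor selection plus a shared memory that carries features across frames) can be sketched as follows. This is a minimal NumPy illustration of the control flow only; the extractor, memory update, and detector bodies are placeholders I have invented for the example, not the patented networks.

```python
import numpy as np

FEAT_DIM = 8

def heavy_extractor(frame):
    # Stand-in for an accurate but slow feature network.
    return frame.mean(axis=(0, 1))

def light_extractor(frame):
    # Stand-in for a fast, lightweight feature network.
    return frame.max(axis=(0, 1))

def update_memory(memory, features, alpha=0.5):
    # Shared memory layer: blend new features with features
    # extracted from previous frames.
    if memory is None:
        return features
    return alpha * features + (1 - alpha) * memory

def detect(features, threshold=0.5):
    # Placeholder detector: flag an object when feature energy is high.
    return float(np.linalg.norm(features)) > threshold

def run_interleaved(frames, heavy_every=4):
    memory = None
    detections = []
    for i, frame in enumerate(frames):
        # Select one of the feature extractor networks for this frame.
        extractor = heavy_extractor if i % heavy_every == 0 else light_extractor
        feats = extractor(frame)
        # Fuse with previously extracted features in the shared memory.
        memory = update_memory(memory, feats)
        detections.append(detect(memory))
    return detections

frames = [np.random.rand(16, 16, FEAT_DIM) for _ in range(8)]
print(run_interleaved(frames))
```

Interleaving a heavy and a light extractor this way is what lets the model amortize the cost of the accurate network across many cheap frames while the memory layer keeps the features coherent.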
  • Publication number: 20240022760
    Abstract: Example aspects of the present disclosure are directed to systems and methods which feature a machine-learned video super-resolution (VSR) model which has been trained using a bi-directional training approach. In particular, the present disclosure provides a compression-informed (e.g., compression-aware) super-resolution model that can perform well on real-world videos with different levels of compression. Specifically, example models described herein can include three modules to robustly restore the missing information caused by video compression. First, a bi-directional recurrent module can be used to reduce the accumulated warping error from the random locations of the intra-frame from compressed video frames. Second, a detail-aware flow estimation module can be added to enable recovery of high resolution (HR) flow from compressed low resolution (LR) frames. Finally, a Laplacian enhancement module can add high-frequency information to the warped HR frames washed out by video encoding.
    Type: Application
    Filed: August 5, 2021
    Publication date: January 18, 2024
    Inventors: Yinxiao Li, Peyman Milanfar, Feng Yang, Ce Liu, Ming-Hsuan Yang, Pengchong Jin
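Two of the three modules named in the abstract, the bi-directional recurrence and the Laplacian enhancement, lend themselves to a short structural sketch. The recurrent `step` and the fixed 4-neighbour Laplacian below are placeholder assumptions of mine; in the actual model these are learned components, and the detail-aware flow estimation module is omitted entirely.

```python
import numpy as np

def bidirectional_pass(frames, step):
    # Bi-directional recurrent module: propagate a hidden state forward
    # and backward through the clip, then average the two passes.
    fwd, h = [], None
    for f in frames:
        h = step(f, h)
        fwd.append(h)
    bwd, h = [], None
    for f in reversed(frames):
        h = step(f, h)
        bwd.append(h)
    bwd.reverse()
    return [0.5 * (a + b) for a, b in zip(fwd, bwd)]

def laplacian(img):
    # 4-neighbour Laplacian computed with padded shifts.
    p = np.pad(img, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img

def laplacian_enhance(hr_frame, gain=0.3):
    # Laplacian enhancement module: re-inject high-frequency content
    # that video encoding washed out of the warped HR frame.
    return hr_frame - gain * laplacian(hr_frame)

def step(frame, hidden):
    # Placeholder recurrent step: blend the frame into the hidden state.
    return frame if hidden is None else 0.5 * frame + 0.5 * hidden

clip = [np.random.rand(16, 16) for _ in range(4)]
restored = [laplacian_enhance(f) for f in bidirectional_pass(clip, step)]
print(len(restored), restored[0].shape)
```

Averaging the forward and backward passes is what limits the accumulated warping error: a frame far from the intra-frame in one direction is close to it in the other.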
  • Publication number: 20240020788
    Abstract: Systems and methods of the present disclosure are directed to a computing system. The computing system can obtain a message vector and video data comprising a plurality of video frames. The computing system can process the input video with a transformation portion of a machine-learned watermark encoding model to obtain a three-dimensional feature encoding of the input video. The computing system can process the three-dimensional feature encoding of the input video and the message vector with an embedding portion of the machine-learned watermark encoding model to obtain spatial-temporal watermark encoding data descriptive of the message vector. The computing system can generate encoded video data comprising a plurality of encoded video frames, wherein at least one of the plurality of encoded video frames includes the spatial-temporal watermark encoding data.
    Type: Application
    Filed: March 24, 2021
    Publication date: January 18, 2024
    Inventors: Xiyang Luo, Feng Yang, Ce Liu, Huiwen Chang, Peyman Milanfar, Yinxiao Li
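The encoder pipeline in the abstract (transformation portion producing a 3-D feature encoding, then an embedding portion mapping the message vector to a spatial-temporal pattern added to the frames) can be sketched structurally. Both portions below are stand-ins I have made up for shape bookkeeping; the patented model learns them.

```python
import numpy as np

rng = np.random.default_rng(0)

def transform(video):
    # Stand-in "transformation portion": a coarse 3-D feature
    # encoding of the T x H x W x C video (here, simple downsampling).
    return video[:, ::4, ::4, :]

def embed(features, message, strength=0.02):
    # Stand-in "embedding portion": map the message bits to a
    # spatial-temporal pattern shaped like the feature encoding.
    basis = rng.standard_normal((message.size,) + features.shape)
    return strength * np.tensordot(2.0 * message - 1.0, basis, axes=1)

def encode_video(video, message):
    feats = transform(video)
    wm = embed(feats, message)
    # Upsample the watermark back to frame size and add it to the frames,
    # producing encoded frames carrying the spatial-temporal encoding.
    wm_full = np.repeat(np.repeat(wm, 4, axis=1), 4, axis=2)
    return np.clip(video + wm_full, 0.0, 1.0)

video = rng.random((6, 32, 32, 3))      # T x H x W x C
message = rng.integers(0, 2, size=16)   # 16-bit message vector
encoded = encode_video(video, message)
print(encoded.shape)
```

Because the pattern varies across both space and time, a decoder can in principle recover the message even when individual frames are cropped or re-encoded.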
  • Publication number: 20230419538
    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
    Type: Application
    Filed: September 11, 2023
    Publication date: December 28, 2023
    Applicant: Google LLC
    Inventors: Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang
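The three-stream structure in the abstract (spatial, temporal, and pose streams fused and classified by a neural network) can be sketched as below. Every stream body and the linear classification head are placeholder assumptions for illustration; the invention uses trained networks for each.

```python
import numpy as np

rng = np.random.default_rng(1)

def spatial_stream(frames):
    # Spatial stream: per-frame appearance features (placeholder pooling).
    return frames.mean(axis=(1, 2))

def temporal_stream(frames):
    # Temporal stream: frame-to-frame motion, standing in for optical flow.
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))

def pose_stream(frames):
    # Pose stream: per-frame pose-image features (placeholder pooling).
    return frames.max(axis=(1, 2))

def classify(frames, n_classes=4):
    # Fuse the three streams and score with a random linear head,
    # standing in for the trained neural network classifier.
    feats = np.concatenate([
        spatial_stream(frames).mean(axis=0),
        temporal_stream(frames).mean(axis=0),
        pose_stream(frames).mean(axis=0),
    ])
    head = rng.standard_normal((n_classes, feats.size))
    return int(np.argmax(head @ feats))

video = rng.random((10, 24, 24, 3))  # T x H x W x C clip of the actor
print(classify(video))
```

Keeping pose as a separate input stream, rather than folding it into appearance, is what lets the classifier distinguish activities that look similar frame by frame but differ in body configuration.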
  • Patent number: 11776156
    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
    Type: Grant
    Filed: June 11, 2021
    Date of Patent: October 3, 2023
    Assignee: Google LLC
    Inventors: Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang
  • Publication number: 20220189170
    Abstract: Systems and methods for detecting objects in a video are provided. A method can include inputting a video comprising a plurality of frames into an interleaved object detection model comprising a plurality of feature extractor networks and a shared memory layer. For each of one or more frames, the method can include selecting one of the plurality of feature extractor networks to analyze the one or more frames, analyzing the one or more frames by the selected feature extractor network to determine one or more features of the one or more frames, determining an updated set of features based at least in part on the one or more features and one or more previously extracted features extracted from a previous frame stored in the shared memory layer, and detecting an object in the one or more frames based at least in part on the updated set of features.
    Type: Application
    Filed: February 22, 2019
    Publication date: June 16, 2022
    Inventors: Menglong Zhu, Mason Liu, Marie Charisse White, Dmitry Kalenichenko, Yinxiao Li
  • Publication number: 20210390733
    Abstract: A method includes receiving video data that includes a series of frames of image data. Here, the video data is representative of an actor performing an activity. The method also includes processing the video data to generate a spatial input stream including a series of spatial images representative of spatial features of the actor performing the activity, a temporal input stream representative of motion of the actor performing the activity, and a pose input stream including a series of images representative of a pose of the actor performing the activity. Using at least one neural network, the method also includes processing the temporal input stream, the spatial input stream, and the pose input stream. The method also includes classifying, by the at least one neural network, the activity based on the temporal input stream, the spatial input stream, and the pose input stream.
    Type: Application
    Filed: June 11, 2021
    Publication date: December 16, 2021
    Applicant: Google LLC
    Inventors: Yinxiao Li, Zhichao Lu, Xuehan Xiong, Jonathan Huang