Patents by Inventor Yinfei Yang

Yinfei Yang is a named inventor on the patent filings listed below. The listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11769011
    Abstract: The present disclosure provides a novel sentence-level representation learning method, Conditional Masked Language Modeling (CMLM), for training on large-scale unlabeled corpora. CMLM outperforms the previous state-of-the-art English sentence embedding models, including those trained with (semi-)supervised signals. For multilingual representation learning, it is shown that co-training CMLM with bitext retrieval and cross-lingual natural language inference (NLI) fine-tuning achieves state-of-the-art performance. It is also shown that multilingual representations exhibit language bias, and that principal component removal (PCR) can eliminate this bias by separating language-identity information from semantics.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: September 26, 2023
    Assignee: Google LLC
    Inventors: Yinfei Yang, Ziyi Yang, Daniel Matthew Cer
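The principal component removal (PCR) step mentioned in the abstract above can be sketched in a few lines. This is a hypothetical illustration, not code from the patent: it assumes the dominant principal directions of a batch of multilingual sentence embeddings encode language identity rather than semantics, and projects them out.

```python
import numpy as np

def remove_principal_components(embeddings, n_components=1):
    """Remove the top principal directions from a set of sentence embeddings.

    Illustrative PCR sketch: the leading components of multilingual
    embeddings often capture language identity rather than meaning, so
    subtracting them can reduce language bias.
    """
    # Center the embeddings so the SVD recovers principal directions.
    X = embeddings - embeddings.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    top = Vt[:n_components]                  # (n_components, dim)
    # Project every embedding onto the top directions and subtract.
    return X - X @ top.T @ top
```

After this step, the embeddings have zero variance along the removed directions, so nearest-neighbor retrieval is driven less by language identity.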
  • Publication number: 20230081171
    Abstract: A computer-implemented method includes receiving, by a computing device, a particular textual description of a scene. The method also includes applying a neural network for text-to-image generation to generate an output image rendition of the scene, the neural network having been trained to cause two image renditions associated with a same textual description to attract each other and two image renditions associated with different textual descriptions to repel each other based on mutual information between a plurality of corresponding pairs, wherein the plurality of corresponding pairs comprise an image-to-image pair and a text-to-image pair. The method further includes predicting the output image rendition of the scene.
    Type: Application
    Filed: September 7, 2021
    Publication date: March 16, 2023
    Inventors: Han Zhang, Jing Yu Koh, Jason Michael Baldridge, Yinfei Yang, Honglak Lee
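The attract/repel training objective described in the entry above resembles a standard contrastive (InfoNCE-style) loss. The sketch below is an assumption-laden illustration, not the patented method: matching rows of `anchors` and `positives` stand in for two renditions of the same textual description (attract), while off-diagonal pairs come from different descriptions (repel).

```python
import numpy as np

def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss sketch over embedding pairs.

    Row i of `anchors` and row i of `positives` are embeddings that
    should attract; all cross-row pairs should repel.
    """
    # Cosine similarities between all anchor/positive pairs.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Minimizing the loss maximizes the matching-pair probability.
    return -np.mean(np.diag(log_softmax))
```

Well-aligned pairs yield a lower loss than shuffled ones, which is the property the training objective exploits.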
  • Publication number: 20230072293
    Abstract: A computing system generates predicted images along a trajectory of unseen viewpoints. The system can obtain one or more spatial observations of an environment captured from one or more previous camera poses. The system can generate a three-dimensional point cloud for the environment from the one or more spatial observations and the one or more previous camera poses. The system can project the three-dimensional point cloud into two-dimensional space to form one or more guidance spatial observations. The system can process the one or more guidance spatial observations with a machine-learned spatial observation prediction model to generate one or more predicted spatial observations. The system can process the one or more predicted spatial observations and image data with a machine-learned image prediction model to generate one or more predicted images from a target camera pose. The system can output the one or more predicted images.
    Type: Application
    Filed: August 23, 2021
    Publication date: March 9, 2023
    Inventors: Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Michael Baldridge, Peter James Anderson
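The projection step in the entry above (three-dimensional point cloud to two-dimensional guidance observations) can be sketched with a standard pinhole-camera model. The names and the model choice are assumptions for illustration, not details taken from the patent: `R` and `t` map world coordinates into the camera frame, and `K` is the intrinsic matrix.

```python
import numpy as np

def project_points(points_world, R, t, K):
    """Project a 3-D point cloud into a camera's 2-D image plane.

    Pinhole-model sketch of the guidance projection: transform points
    into the camera frame, drop points behind the camera, apply the
    intrinsics, and perform the perspective divide.
    """
    cam = points_world @ R.T + t            # world -> camera coordinates
    in_front = cam[:, 2] > 0                # keep points in front of the camera
    cam = cam[in_front]
    pix = cam @ K.T                         # apply intrinsic matrix
    return pix[:, :2] / pix[:, 2:3]         # perspective divide -> (u, v)
```

The resulting (u, v) coordinates are what a downstream prediction model could consume as a guidance observation for an unseen viewpoint.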
  • Publication number: 20220198144
    Abstract: The present disclosure provides a novel sentence-level representation learning method, Conditional Masked Language Modeling (CMLM), for training on large-scale unlabeled corpora. CMLM outperforms the previous state-of-the-art English sentence embedding models, including those trained with (semi-)supervised signals. For multilingual representation learning, it is shown that co-training CMLM with bitext retrieval and cross-lingual NLI fine-tuning achieves state-of-the-art performance. It is also shown that multilingual representations exhibit language bias, and that principal component removal (PCR) can eliminate this bias by separating language-identity information from semantics.
    Type: Application
    Filed: December 18, 2020
    Publication date: June 23, 2022
    Inventors: Yinfei Yang, Ziyi Yang, Daniel Matthew Cer
  • Patent number: 11172122
    Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
    Type: Grant
    Filed: January 7, 2019
    Date of Patent: November 9, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: William Evan Welbourne, Ross David Roessler, Cheng-Hao Kuo, Jim Oommen Thomas, Paul Aksenti Savastinuk, Yinfei Yang
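One simple way to realize the cross-model assistance described above is late fusion of per-user confidence scores. The following is a hypothetical sketch (weights, score ranges, and function names are illustrative, not from the patent): each model's scores help disambiguate the other's, and a user known to only one model still receives a down-weighted score.

```python
def fuse_identity_scores(face_scores, speaker_scores, face_weight=0.6):
    """Combine per-user confidences from facial- and speaker-recognition models.

    Illustrative late-fusion sketch: a weighted sum of the two models'
    scores, treating a missing score as 0.0 so users enrolled in only
    one model can still be identified.
    """
    users = set(face_scores) | set(speaker_scores)
    fused = {}
    for user in users:
        f = face_scores.get(user, 0.0)
        s = speaker_scores.get(user, 0.0)
        fused[user] = face_weight * f + (1.0 - face_weight) * s
    # Return the best candidate along with all fused scores.
    return max(fused, key=fused.get), fused
```

A production system would likely calibrate the two score distributions before fusing and use the fused results to update both models offline, as the abstract suggests.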
  • Patent number: 10582125
    Abstract: A video capture device may include multiple cameras that simultaneously capture video data. The video capture device and/or one or more remote computing resources may stitch the video data captured by the multiple cameras to generate stitched video data that corresponds to 360° video. The remote computing resources may apply one or more algorithms to the stitched video data to identify one or more frames that depict content that is likely to be of interest to a user. The video capture device and/or the remote computing resources may generate one or more images from the one or more frames, and may send the one or more images to the user.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: March 3, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Ross David Roessler, Matthew Alan Townsend, Yinfei Yang, Jim Oommen Thomas, Deon Poncini, William Evan Welbourne, Geoff Hunter Donaldson, Paul Aksenti Savastinuk, Cheng-Hao Kuo
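The frame-selection step in the entry above (identifying frames likely to interest a user) can be sketched as a greedy pick over per-frame interest scores. This is an assumed formulation for illustration only: the patent does not specify the algorithm, and the scoring model is treated as given.

```python
import numpy as np

def select_interesting_frames(frame_scores, top_k=3, min_gap=2):
    """Pick up to `top_k` high-interest frames from per-frame scores.

    Hypothetical sketch: `frame_scores[i]` is an interest score for
    frame i (e.g. from a trained classifier). Frames are chosen greedily
    by score, skipping any frame within `min_gap` of one already chosen
    to avoid near-duplicate images.
    """
    order = np.argsort(frame_scores)[::-1]   # highest score first
    chosen = []
    for idx in order:
        if all(abs(idx - c) >= min_gap for c in chosen):
            chosen.append(int(idx))
        if len(chosen) == top_k:
            break
    return sorted(chosen)
```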
  • Publication number: 20190313014
    Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
    Type: Application
    Filed: January 7, 2019
    Publication date: October 10, 2019
    Inventors: William Evan Welbourne, Ross David Roessler, Cheng-Hao Kuo, Jim Oommen Thomas, Paul Aksenti Savastinuk, Yinfei Yang
  • Patent number: 10277813
    Abstract: A viewing device, such as a virtual reality headset, allows a user to view a panoramic scene captured by one or more video capture devices that may include multiple cameras that simultaneously capture 360° video data. The viewing device may display the panoramic scene in real time and change the display in response to moving the viewing device and/or changing perspectives by switching to video data being captured by a different video capture device within the environment. Moreover, multiple video capture devices located within an environment can be used to create a three-dimensional representation of the environment that allows a user to explore the three-dimensional space while viewing the environment in real time.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: April 30, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Jim Oommen Thomas, Paul Aksenti Savastinuk, Cheng-Hao Kuo, Tsz Ho Yu, Ross David Roessler, William Evan Welbourne, Yinfei Yang
  • Patent number: 10178301
    Abstract: Devices, systems and methods are disclosed for improving facial recognition and/or speaker recognition models by using results obtained from one model to assist in generating results from the other model. For example, a device may perform facial recognition for image data to identify users and may use the results of the facial recognition to assist in speaker recognition for corresponding audio data. Alternatively or additionally, the device may perform speaker recognition for audio data to identify users and may use the results of the speaker recognition to assist in facial recognition for corresponding image data. As a result, the device may identify users in video data that are not included in the facial recognition model and may identify users in audio data that are not included in the speaker recognition model. The facial recognition and/or speaker recognition models may be updated during run-time and/or offline using post-processed data.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: January 8, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: William Evan Welbourne, Ross David Roessler, Cheng-Hao Kuo, Jim Oommen Thomas, Paul Aksenti Savastinuk, Yinfei Yang
  • Patent number: 10104286
    Abstract: Systems and methods may be directed to de-blurring panoramic images and/or video. An image processor may receive a frame, where the frame comprises a plurality of pixel values arranged in a grid. The image processor may divide the frame into a first section and a second section. The image processor may determine a first motion kernel for the first section and apply the first motion kernel to the first section. The image processor may also determine a second motion kernel for the second section and apply the second motion kernel to the second section.
    Type: Grant
    Filed: August 27, 2015
    Date of Patent: October 16, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Tsz Ho Yu, Paul Aksenti Savastinuk, Yinfei Yang, Cheng-Hao Kuo, Ross David Roessler, William Evan Welbourne
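The per-section kernel application described above can be sketched as splitting the frame into vertical sections and filtering each with its own kernel. This is a simplified illustration under stated assumptions: it applies each kernel by plain 2-D correlation, whereas an actual deblurring step would typically deconvolve with the estimated motion kernel.

```python
import numpy as np

def deblur_sections(frame, kernels):
    """Apply one motion kernel per vertical section of a panoramic frame.

    Illustrative sketch only: each section is filtered with its own
    kernel via direct 2-D correlation with edge padding.
    """
    h, w = frame.shape
    out = np.zeros((h, w), dtype=float)
    bounds = np.linspace(0, w, len(kernels) + 1, dtype=int)
    for i, kernel in enumerate(kernels):
        section = frame[:, bounds[i]:bounds[i + 1]].astype(float)
        kh, kw = kernel.shape
        padded = np.pad(section, ((kh // 2,) * 2, (kw // 2,) * 2), mode="edge")
        filtered = np.zeros_like(section)
        for y in range(section.shape[0]):
            for x in range(section.shape[1]):
                filtered[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
        out[:, bounds[i]:bounds[i + 1]] = filtered
    return out
```

Splitting the panorama into sections matters because different regions of a 360° frame can exhibit different motion blur, so a single global kernel would over- or under-correct.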
  • Patent number: 10084959
    Abstract: A video capture device may include multiple cameras that simultaneously capture video data. The video capture device and/or one or more remote computing resources may stitch the video data captured by the multiple cameras to generate stitched video data that corresponds to 360° video. The remote computing resources may apply one or more algorithms to the stitched video data to adjust the color characteristics of the stitched video data, such as lighting, exposure, white balance, contrast, and saturation. The remote computing resources may further smooth the transition between the video data captured by the multiple cameras to reduce artifacts such as abrupt changes in color as a result of the individual cameras of the video capture device having different video capture settings. The video capture device and/or the remote computing resources may generate a panoramic video that may include up to a 360° field of view.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: September 25, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Tsz Ho Yu, Jim Oommen Thomas, Cheng-Hao Kuo, Yinfei Yang, Ross David Roessler, Paul Aksenti Savastinuk, William Evan Welbourne
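Smoothing color transitions between cameras, as described above, can be approximated by matching each channel's mean and contrast to a reference camera. The sketch below is a hypothetical gain/offset formulation, not the patented algorithm:

```python
import numpy as np

def match_colors(reference, target):
    """Adjust `target`'s per-channel mean and contrast to match `reference`.

    Illustrative sketch of reducing abrupt exposure/white-balance changes
    between adjacent cameras before stitching: per-channel gain and
    offset so the target's statistics match the reference's.
    """
    out = np.empty(target.shape, dtype=float)
    for c in range(target.shape[-1]):
        t = target[..., c].astype(float)
        r = reference[..., c].astype(float)
        gain = r.std() / max(t.std(), 1e-6)          # contrast match
        out[..., c] = (t - t.mean()) * gain + r.mean()  # mean match
    return out
```

In a real pipeline the statistics would be computed only over the overlap region between adjacent cameras, and the correction blended gradually across the seam.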
  • Patent number: 9973711
    Abstract: Devices, systems and methods are disclosed for identifying content in video data and creating content-based zooming and panning effects to emphasize the content. Content may be detected and analyzed in the video data using computer vision or machine learning algorithms, or specified through a user interface. Panning and zooming controls may be associated with the contents, panning or zooming based on a location and size of content within the video data. The device may determine a number of pixels associated with content and may frame the content to be a certain percentage of the edited video data, such as a close-up shot where a subject is displayed as 50% of the viewing frame. The device may identify an event of interest, may determine multiple frames associated with the event of interest and may pan and zoom between the multiple frames based on the size and location of the content within the multiple frames.
    Type: Grant
    Filed: June 29, 2015
    Date of Patent: May 15, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Yinfei Yang, William Evan Welbourne, Ross David Roessler, Paul Aksenti Savastinuk, Cheng-Hao Kuo, Jim Oommen Thomas, Tsz Ho Yu
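The framing step in the entry above (sizing a crop so a subject fills, say, 50% of the view) reduces to simple geometry. This is a hypothetical sketch with illustrative names, not code from the patent: `box` is the detected subject's bounding box, and the crop is centered on it and scaled so the box occupies the requested share of the crop's area.

```python
def framing_crop(frame_w, frame_h, box, target_fraction=0.5):
    """Compute a crop so the subject fills `target_fraction` of the view.

    Hypothetical content-based zoom sketch: `box` is (x, y, w, h) for the
    detected content; the crop is centered on it, sized from the area
    ratio, and clamped to the frame boundaries.
    """
    x, y, w, h = box
    # A linear scale factor derived from the desired area fraction.
    scale = (1.0 / target_fraction) ** 0.5
    crop_w = min(frame_w, w * scale)
    crop_h = min(frame_h, h * scale)
    cx, cy = x + w / 2.0, y + h / 2.0
    # Center the crop on the subject, clamped inside the frame.
    left = min(max(cx - crop_w / 2.0, 0.0), frame_w - crop_w)
    top = min(max(cy - crop_h / 2.0, 0.0), frame_h - crop_h)
    return left, top, crop_w, crop_h
```

Panning between events of interest would then interpolate between successive crops computed this way.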
  • Publication number: 20160381306
    Abstract: Devices, systems and methods are disclosed for identifying content in video data and creating content-based zooming and panning effects to emphasize the content. Content may be detected and analyzed in the video data using computer vision or machine learning algorithms, or specified through a user interface. Panning and zooming controls may be associated with the contents, panning or zooming based on a location and size of content within the video data. The device may determine a number of pixels associated with content and may frame the content to be a certain percentage of the edited video data, such as a close-up shot where a subject is displayed as 50% of the viewing frame. The device may identify an event of interest, may determine multiple frames associated with the event of interest and may pan and zoom between the multiple frames based on the size and location of the content within the multiple frames.
    Type: Application
    Filed: June 29, 2015
    Publication date: December 29, 2016
    Inventors: Yinfei Yang, William Evan Welbourne, Ross David Roessler, Paul Aksenti Savastinuk, Cheng-Hao Kuo, Jim Oommen Thomas, Tsz Ho Yu