Patents by Inventor Yonghui Wu

Yonghui Wu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250118291
    Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for training an audio-processing neural network that includes at least (1) a first encoder network having a first set of encoder network parameters and (2) a decoder network having a set of decoder network parameters. The system obtains a set of unlabeled audio data segments and generates, from the set of unlabeled audio data segments, a set of encoder training examples. The system performs training of a second encoder network, which includes at least the first encoder network, on the set of generated encoder training examples. The system also obtains one or more labeled training examples, and performs training of the audio-processing neural network on the labeled training examples.
    Type: Application
    Filed: January 30, 2023
    Publication date: April 10, 2025
    Inventors: Chung-Cheng CHIU, Weikeng QIN, Jiahui YU, Yonghui WU, Yu ZHANG
  • Publication number: 20250111671
    Abstract: Methods and systems for media item characterization based on multimodal embeddings are provided herein. A media item including a sequence of video frames is identified. A set of video embeddings representing visual features of the sequence of video frames is obtained. A set of audio embeddings representing audio features of the sequence of video frames is obtained. A set of audiovisual embeddings is generated based on the set of video embeddings and the set of audio embeddings. Each of the set of audiovisual embeddings represents a visual feature and an audio feature of a respective video frame of the sequence of video frames. One or more media characteristics associated with the media item are determined based on the set of audiovisual embeddings.
    Type: Application
    Filed: September 27, 2024
    Publication date: April 3, 2025
    Inventors: Tao Zhu, Jiahui Yu, Jingchen Feng, Kai Chen, Pooya Abolghasemi, Gagan Bansal, Jieren Xu, Hui Miao, Yaping Zhang, Shuchao Bi, Yonghui Wu, Claire Cui, Rohan Anil
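The per-frame fusion of video and audio embeddings described above can be sketched as a concatenation over aligned frames. This is a minimal illustration assuming simple concatenation as the fusion step; the function name and embedding dimensions are hypothetical, not taken from the patent.

```python
import numpy as np

def fuse_audiovisual(video_emb: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Fuse per-frame video and audio embeddings into audiovisual embeddings.

    video_emb: (num_frames, d_v), audio_emb: (num_frames, d_a).
    Returns (num_frames, d_v + d_a): each row represents both a visual
    feature and an audio feature of the corresponding video frame.
    """
    assert video_emb.shape[0] == audio_emb.shape[0], "one embedding per frame"
    return np.concatenate([video_emb, audio_emb], axis=1)

# Example: 4 frames, 8-dim video embeddings, 6-dim audio embeddings.
video = np.random.randn(4, 8)
audio = np.random.randn(4, 6)
av = fuse_audiovisual(video, audio)  # shape (4, 14)
```

Media characteristics would then be predicted from `av`, e.g. by a downstream classifier.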
  • Publication number: 20250111235
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training neural networks through contrastive learning. In particular, the contrastive learning is modified to use a relative margin to adjust a training pair's contribution to optimization.
    Type: Application
    Filed: September 27, 2024
    Publication date: April 3, 2025
    Inventors: Siyuan Qiao, Chenxi Liu, Jiahui Yu, Yonghui Wu
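One common way to realize a margin in contrastive learning is to subtract it from each positive pair's logit before the softmax, which forces positives to beat negatives by at least the margin and shrinks easy pairs' contribution to the loss. The sketch below assumes that additive-margin variant of InfoNCE; the patent's "relative margin" may be computed differently.

```python
import numpy as np

def margin_contrastive_loss(sim: np.ndarray, margin: float = 0.2, temp: float = 0.1) -> float:
    """InfoNCE-style contrastive loss over a similarity matrix sim[i, j],
    where the diagonal holds the positive pairs. Subtracting a margin from
    each positive logit adjusts a training pair's contribution: easy pairs
    contribute less, and positives must exceed negatives by the margin.
    """
    n = sim.shape[0]
    logits = sim.astype(float).copy()
    logits[np.arange(n), np.arange(n)] -= margin   # penalize the positives
    logits /= temp
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy with the diagonal (the positive pair) as the target class
    return float(-log_probs[np.arange(n), np.arange(n)].mean())
```

With a nonzero margin the loss on the same similarity matrix is strictly larger, so training pushes positive similarities further above the negatives.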
  • Publication number: 20250101866
    Abstract: The present disclosure provides a method for calculating the wellbore friction resistance of a foam drainage gas production well. Based on a conversion of the parameters of the Mukherjee & Brill gas-liquid two-phase flow model, the method defines the liquid-film reversal point as the zero-friction-resistance point, ignores the influence of negative friction-resistance values, and uses this point as the starting point for predicting changes in friction resistance. The influence of liquid-phase parameters and foaming-agent concentration is taken into account in the friction resistance coefficient, and an effective calculation method is obtained by combining fitting and optimization against experimental data. The present disclosure better characterizes the wellbore flow conditions of a foam drainage gas production well and provides important theoretical support for predicting its wellbore pressure drop.
    Type: Application
    Filed: December 7, 2024
    Publication date: March 27, 2025
    Applicant: Southwest Petroleum University
    Inventors: Chengcheng Luo, Pengbo Wu, Yonghui Liu, Yu Shi, Yang Liu, Haibin Cai, Jianying Yang, Ziyan Wang, Zhenghao Zhang
  • Publication number: 20250095630
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
    Type: Application
    Filed: December 2, 2024
    Publication date: March 20, 2025
    Applicant: Google LLC
    Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen
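The speaker-conditioning step described above, in which the fixed speaker vector from the speaker encoder is combined with the encoded input text before spectrogram generation, can be sketched as a broadcast-and-concatenate. The function name and dimensions below are hypothetical, not from the patent.

```python
import numpy as np

def condition_on_speaker(text_enc: np.ndarray, speaker_vec: np.ndarray) -> np.ndarray:
    """Broadcast a fixed speaker vector onto every text-encoder timestep, a
    common way to condition spectrogram generation on the target voice.

    text_enc: (T, d_t); speaker_vec: (d_s,). Returns (T, d_t + d_s).
    """
    tiled = np.tile(speaker_vec, (text_enc.shape[0], 1))
    return np.concatenate([text_enc, tiled], axis=1)

text = np.random.randn(5, 32)           # 5 encoded text steps
spk = np.random.randn(8)                # speaker-encoder output vector
cond = condition_on_speaker(text, spk)  # shape (5, 40)
```

The spectrogram generation engine would consume `cond` so that every decoding step sees the same target-voice information.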
  • Patent number: 12254865
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable media, for speech recognition using multi-dialect and multilingual models. In some implementations, audio data indicating audio characteristics of an utterance is received. Input features determined based on the audio data are provided to a speech recognition model that has been trained to output scores indicating the likelihood of linguistic units for each of multiple different languages or dialects. The speech recognition model can be one that has been trained using cluster adaptive training. Output that the speech recognition model generated in response to receiving the input features determined based on the audio data is received. A transcription of the utterance generated based on the output of the speech recognition model is provided.
    Type: Grant
    Filed: January 20, 2024
    Date of Patent: March 18, 2025
    Assignee: Google LLC
    Inventors: Zhifeng Chen, Bo Li, Eugene Weinstein, Yonghui Wu, Pedro J. Moreno Mengibar, Ron J. Weiss, Khe Chai Sim, Tara N. Sainath, Patrick An Phu Nguyen
  • Patent number: 12249315
    Abstract: A method for training a non-autoregressive TTS model includes obtaining a sequence representation of an encoded text sequence concatenated with a variational embedding. The method also includes using a duration model network to predict a phoneme duration for each phoneme represented by the encoded text sequence. Based on the predicted phoneme durations, the method also includes learning an interval representation and an auxiliary attention context representation. The method also includes upsampling, using the interval representation and the auxiliary attention context representation, the sequence representation into an upsampled output specifying a number of frames. The method also includes generating, based on the upsampled output, one or more predicted mel-frequency spectrogram sequences for the encoded text sequence.
    Type: Grant
    Filed: October 31, 2023
    Date of Patent: March 11, 2025
    Assignee: Google LLC
    Inventors: Isaac Elias, Byungha Chun, Jonathan Shen, Ye Jia, Yu Zhang, Yonghui Wu
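The upsampling step described above, expanding the phoneme-level sequence representation to frame level according to the predicted phoneme durations, can be sketched with a simple repeat. This assumes integer frame counts and omits the interval and auxiliary attention context representations that the patented method also uses.

```python
import numpy as np

def upsample_by_duration(seq: np.ndarray, durations: list[int]) -> np.ndarray:
    """Upsample a phoneme-level sequence representation to frame level by
    repeating each phoneme's vector for its predicted number of frames.

    seq: (num_phonemes, d); durations[i] = predicted frames for phoneme i.
    Returns (sum(durations), d).
    """
    return np.repeat(seq, durations, axis=0)

# 3 phonemes with 4-dim representations, predicted to span 2, 3, and 1 frames.
phonemes = np.arange(12.0).reshape(3, 4)
frames = upsample_by_duration(phonemes, [2, 3, 1])  # shape (6, 4)
```

The mel-spectrogram decoder would then operate on `frames`, one row per output frame.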
  • Publication number: 20250078809
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
    Type: Application
    Filed: November 18, 2024
    Publication date: March 6, 2025
    Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
  • Publication number: 20250053444
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators.
    Type: Application
    Filed: August 23, 2024
    Publication date: February 13, 2025
    Inventors: Jeffrey Adgate Dean, Sudip Roy, Michael Acheson Isard, Aakanksha Chowdhery, Brennan Saeta, Chandramohan Amyangot Thekkath, Daniel William Hurt, Hyeontaek Lim, Laurent El Shafey, Parker Edward Schuh, Paul Ronald Barham, Ruoming Pang, Ryan Sepassi, Sanjay Ghemawat, Yonghui Wu
  • Patent number: 12222994
    Abstract: A quick application startup method and a related apparatus are provided. The method includes: An electronic device requests an acceleration script of one or more quick applications from an application server. A first operation for a target quick application is detected. In response to the first operation, the electronic device requests an application package of the target quick application from the application server. An acceleration script of the target quick application is included in the acceleration script of the one or more quick applications. In response to the first operation, the electronic device runs the acceleration script of the target quick application to obtain a first URL, and obtains first data based on the first URL. The electronic device may generate and display a first screen of the target quick application based on the first data.
    Type: Grant
    Filed: August 29, 2020
    Date of Patent: February 11, 2025
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Litao Yu, Yonghui Wu, Fei Sun, Guoqiang Li
  • Publication number: 20250021889
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for performing machine learning tasks. One method includes receiving (i) a model input, and (ii) data identifying a first machine learning task to be performed on the model input to generate a first type of model output for the model input; augmenting the model input with an identifier for the first machine learning task to generate an augmented model input; and processing the augmented model input using a machine learning model, wherein the machine learning model has been trained on training data to perform a plurality of machine learning tasks including the first machine learning task, and wherein the machine learning model has been configured through training to process the augmented model input to generate a machine learning model output of the first type for the model input.
    Type: Application
    Filed: September 26, 2024
    Publication date: January 16, 2025
    Inventors: Zhifeng Chen, Michael Schuster, Melvin Jose Johnson Premkumar, Yonghui Wu, Quoc V. Le, Maxim Krikun, Thorsten Brants
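Augmenting the model input with a task identifier can be sketched as prepending a reserved token, in the style of the target-language tokens (e.g. `<2es>`) used in Google's multilingual translation work by these inventors; the exact token format below is an assumption for illustration.

```python
def augment_with_task_id(tokens: list[str], task: str) -> list[str]:
    """Prepend an identifier token for the requested task, so a single
    multitask model can produce the right type of output for the input."""
    return [f"<2{task}>"] + tokens

# The same sentence, routed to two different tasks by the identifier token.
src = ["Hello", "world"]
to_es = augment_with_task_id(src, "es")        # translate to Spanish
summarize = augment_with_task_id(src, "summ")  # hypothetical summarization task
```

The model, trained on examples from all tasks, learns to condition its output type on the leading token.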
  • Patent number: 12190860
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.
    Type: Grant
    Filed: November 21, 2023
    Date of Patent: January 7, 2025
    Assignee: Google LLC
    Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
  • Patent number: 12175963
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
    Type: Grant
    Filed: November 30, 2023
    Date of Patent: December 24, 2024
    Assignee: Google LLC
    Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen
  • Publication number: 20240420686
    Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the automatic speech recognition (ASR) system.
    Type: Application
    Filed: August 26, 2024
    Publication date: December 19, 2024
    Applicant: Google LLC
    Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
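The attender's job of turning encoder outputs into a context vector can be sketched as plain dot-product attention; real systems use learned projections and often location-aware scoring, which are omitted here.

```python
import numpy as np

def attend(encoder_out: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Dot-product attention: score each encoder timestep against the decoder
    query, softmax the scores, and return the weighted sum (context vector).

    encoder_out: (T, d); query: (d,). Returns (d,).
    """
    scores = encoder_out @ query                    # (T,) one score per timestep
    scores -= scores.max()                          # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum() # softmax over timesteps
    return weights @ encoder_out                    # weighted sum of encodings

enc = np.random.randn(10, 16)  # 10 encoded audio frames, 16-dim
q = np.random.randn(16)        # current decoder state as the query
context = attend(enc, q)       # shape (16,)
```

The decoder combines `context` with its own state to produce the speech recognition scores over word elements.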
  • Patent number: 12170667
    Abstract: A network device for providing a LAN GUI to a client device. The network device receives a request for access by the client device to the LAN GUI. The network device analyzes a LAN GUI access whitelist and determines whether the client device is in the LAN GUI access whitelist. The client device is granted access to the LAN GUI without receiving a password from the client device when the client device is determined to be in the LAN GUI access whitelist. An address entry page may be presented to add the MAC address of the client device to the LAN GUI access whitelist, and a password page may be presented to display the LAN GUI password. When the client device is not in the LAN GUI access whitelist, a login page is presented for entering the password to obtain access to the LAN GUI.
    Type: Grant
    Filed: July 21, 2020
    Date of Patent: December 17, 2024
    Assignee: ARRIS ENTERPRISES LLC
    Inventor: Yonghui Wu
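The whitelist check described above can be sketched as a MAC-address lookup before falling back to password login. The function and parameter names are hypothetical, and a real implementation would validate MAC formats and never compare plaintext passwords.

```python
def grant_access(mac, whitelist, password, expected):
    """Grant LAN GUI access without a password when the client's MAC address
    is in the whitelist; otherwise require the LAN GUI password.

    mac: client MAC string; whitelist: set of lowercase MAC strings;
    password: password supplied by the client (or None); expected: LAN GUI password.
    """
    if mac.lower() in whitelist:
        return True            # whitelisted devices skip the login page
    return password == expected  # everyone else must present the password

wl = {"aa:bb:cc:dd:ee:ff"}
grant_access("AA:BB:CC:DD:EE:FF", wl, None, "secret")      # whitelisted
grant_access("11:22:33:44:55:66", wl, "secret", "secret")  # via login page
```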
  • Publication number: 20240404506
    Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
    Type: Application
    Filed: August 8, 2024
    Publication date: December 5, 2024
    Applicant: Google LLC
    Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
  • Publication number: 20240404238
    Abstract: Systems and methods are provided for vector-quantized image modeling using vision transformers and improved codebook handling. In particular, the present disclosure provides a Vector-quantized Image Modeling (VIM) approach that involves pre-training a machine learning model (e.g., Transformer model) to predict rasterized image tokens autoregressively. The discrete image tokens can be encoded from a learned Vision-Transformer-based VQGAN (example implementations of which can be referred to as ViT-VQGAN). The present disclosure proposes multiple improvements over vanilla VQGAN from architecture to codebook learning, yielding better efficiency and reconstruction fidelity. The improved ViT-VQGAN further improves vector-quantized image modeling tasks, including unconditional image generation, conditioned image generation (e.g., class-conditioned image generation), and unsupervised representation learning.
    Type: Application
    Filed: October 5, 2022
    Publication date: December 5, 2024
    Inventors: Jiahui Yu, Vijay Vasudevan, Alexander Yeong-Shiuh Ku, Yonghui Wu, Jason Michael Baldridge, Yuanzhong Xu, Jing Yu Koh, Thang Minh Luong, Gunjan Baid, Zirui Wang, Han Zhang, Xin Li
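The codebook step that turns continuous latents into the discrete image tokens the autoregressive model predicts can be sketched as a nearest-neighbor lookup under L2 distance; ViT-VQGAN's learned codebook and its factorized, normalized lookup improvements are omitted here.

```python
import numpy as np

def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each continuous latent vector to the index of its nearest codebook
    entry (L2 distance), producing discrete image tokens.

    latents: (n, d); codebook: (k, d). Returns (n,) integer token indices.
    """
    # Squared distances via ||x||^2 - 2 x·c + ||c||^2, computed in one shot.
    d2 = (latents**2).sum(1, keepdims=True) - 2 * latents @ codebook.T + (codebook**2).sum(1)
    return d2.argmin(axis=1)

codebook = np.eye(4)                  # 4 toy one-hot codebook entries
latents = np.array([[0.9, 0.1, 0.0, 0.0],
                    [0.0, 0.0, 0.1, 0.8]])
tokens = quantize(latents, codebook)  # nearest entries: indices 0 and 3
```

A Transformer is then pre-trained to predict these token indices autoregressively in raster order.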
  • Patent number: 12148444
    Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: November 19, 2024
    Assignee: Google LLC
    Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
  • Publication number: 20240378441
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
    Type: Application
    Filed: May 10, 2024
    Publication date: November 14, 2024
    Inventors: Slav Petrov, Yonghui Wu, Andrew M. Dai, David Richard So, Dmitry Lepikhin, Erica Ann Moreira, Gaurav Mishra, Jonathan Hudson Clark, Maxim Krikun, Melvin Jose Johnson Premkumar, Nan Du, Orhan Firat, Rohan Anil, Siamak Shakeri, Xavier Garcia, Yanping Huang, Yong Cheng, Yuanzhong Xu, Yujing Zhang, Zachary Alexander Nado, Eric Jun Jie Ni, Kefan Xiao, Vladimir Feinberg, Jin Young Sohn, Aurko Roy
  • Publication number: 20240378427
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network to perform any one or more of a variety of machine learning tasks. For example, the neural network can be configured as a generative neural network, e.g., an autoregressive generative neural network.
    Type: Application
    Filed: May 10, 2024
    Publication date: November 14, 2024
    Inventors: Slav Petrov, Yonghui Wu, Andrew M. Dai, David Richard So, Dmitry Lepikhin, Erica Ann Moreira, Gaurav Mishra, Jonathan Hudson Clark, Maxim Krikun, Melvin Jose Johnson Premkumar, Nan Du, Orhan Firat, Rohan Anil, Siamak Shakeri, Xavier Garcia, Yanping Huang, Yong Cheng, Yuanzhong Xu, Yujing Zhang, Zachary Alexander Nado, Eric Jun Jie Ni, Kefan Xiao, Vladimir Feinberg, Jin Young Sohn, Aurko Roy