Patents by Inventor Weiran Wang

Weiran Wang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250118292
    Abstract: A method includes obtaining labeled training data including a plurality of spoken terms spoken during a conversation. For each respective spoken term, the method includes generating a corresponding sequence of intermediate audio encodings from a corresponding sequence of acoustic frames, generating a corresponding sequence of final audio encodings from the corresponding sequence of intermediate audio encodings, generating a corresponding speech recognition result, and generating a respective speaker token representing a predicted identity of a speaker for each corresponding speech recognition result. The method also includes training the joint speech recognition and speaker diarization model jointly based on a first loss derived from the generated speech recognition results and the corresponding transcriptions and a second loss derived from the generated speaker tokens and the corresponding speaker labels.
    Type: Application
    Filed: September 20, 2024
    Publication date: April 10, 2025
    Applicant: Google LLC
    Inventors: Yiling Huang, Weiran Wang, Quan Wang, Guanlong Zhao, Hank Liao, Han Lu
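The two-loss objective this abstract describes can be sketched in a few lines. This is a minimal illustration, not the patented method: the cross-entropy form, the per-step alignment, and the mixing weight `alpha` are all assumptions the abstract leaves unspecified.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a predicted distribution."""
    return -math.log(probs[target])

def joint_loss(asr_probs, transcript, spk_probs, speakers, alpha=1.0):
    """Sum a first loss (speech recognition results vs. transcriptions) and a
    second loss (speaker tokens vs. speaker labels); alpha is a hypothetical
    mixing weight for joint training."""
    first = sum(cross_entropy(p, t) for p, t in zip(asr_probs, transcript))
    second = sum(cross_entropy(p, s) for p, s in zip(spk_probs, speakers))
    return first + alpha * second
```

Training "jointly" here simply means both losses contribute gradients to the same model in one backward pass.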
  • Publication number: 20250078815
    Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.
    Type: Application
    Filed: September 5, 2024
    Publication date: March 6, 2025
    Applicant: Google LLC
    Inventors: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov
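The two weight transformations named in this abstract, pruning with a sparsity mask and fixed-bit-width integer quantization, can be sketched as follows. This is a toy per-tensor illustration; the actual training procedure (quantization- and sparsity-aware fine-tuning with native integer operations) is more involved.

```python
def prune(weights, mask):
    """Zero out weights wherever the sparsity mask is 0."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]

def quantize(weights, bits=8):
    """Symmetric linear quantization to integers of a fixed bit width.
    Returns the integer codes and the scale used to dequantize (code * scale)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit
    peak = max(abs(w) for w in weights)
    scale = peak / qmax if peak else 1.0
    return [round(w / scale) for w in weights], scale
```

A fine-tuned model would store only the integer codes and the scale, then run inference with integer arithmetic.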
  • Patent number: 12198060
    Abstract: Embodiments described herein combine both masked reconstruction and predictive coding. Specifically, unlike contrastive learning, the mutual information between past states and future states is directly estimated. The context information can also be directly captured via shifted masked reconstruction—unlike standard masked reconstruction, the target reconstructed observations are shifted slightly towards the future to incorporate more predictability. The estimated mutual information and the shifted masked reconstruction loss can then be combined as the loss function to update the neural model.
    Type: Grant
    Filed: August 28, 2020
    Date of Patent: January 14, 2025
    Assignee: Salesforce, Inc.
    Inventors: Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong
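The "shifted" part of shifted masked reconstruction is easy to state concretely: instead of reconstructing the masked frame itself, the target is a frame slightly in the future. A minimal sketch (the frame representation and shift size are illustrative):

```python
def shifted_targets(frames, masked_positions, shift=1):
    """Standard masked reconstruction reconstructs frame t at masked position t;
    here the target is shifted toward the future, to frame t + shift.
    Positions whose shifted target falls past the end are dropped."""
    return {t: frames[t + shift] for t in masked_positions if t + shift < len(frames)}
```

With `shift=0` this reduces to ordinary masked reconstruction, which makes the added predictability requirement explicit.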
  • Patent number: 12190869
    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
    Type: Grant
    Filed: September 29, 2022
    Date of Patent: January 7, 2025
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu
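The linear attention mentioned in this abstract replaces the quadratic softmax-attention computation with running sums over a positive feature map, which is what makes it causal and streaming-friendly. A toy sketch, assuming an element-wise `exp` feature map as a stand-in for the Performer's random-feature map:

```python
import math

def feature_map(x):
    """Element-wise exp as a simple positive feature map (a stand-in for the
    Performer's random-feature map)."""
    return [math.exp(v) for v in x]

def causal_linear_attention(Q, K, V):
    """Linear attention with causal running sums: each step updates O(d*d)
    state instead of attending over all previous steps explicitly."""
    d = len(Q[0])
    S = [[0.0] * d for _ in range(d)]   # running sum of outer(phi(k), v)
    z = [0.0] * d                       # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        pk = feature_map(k)
        for i in range(d):
            z[i] += pk[i]
            for j in range(d):
                S[i][j] += pk[i] * v[j]
        pq = feature_map(q)
        denom = sum(pq[i] * z[i] for i in range(d))
        out.append([sum(pq[i] * S[i][j] for i in range(d)) / denom for j in range(d)])
    return out
```

Because the state `(S, z)` summarizes the whole past, the per-frame cost is constant in sequence length, which suits a causal streaming encoder.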
  • Patent number: 12106022
    Abstract: The present disclosure provides a method and device for simulating a dynamic digital twin model of dominant operation of a wind turbine generator assembly, by which conventional operational parameters of the wind turbine generator assembly that are acquired in real time are preprocessed to obtain steady-state operational parameters of the wind turbine generator assembly. A pneumatic subsystem-related data black box model, a transmission subsystem model, a tower subsystem model, and an electrical subsystem model are simulated individually using the steady-state operational parameters, and then combined to form a dynamic dominant-operation simulation model for simulating an operation process of the wind turbine generator assembly. Meanwhile, a dynamic deviation compensation model is constructed on the basis of the dynamic dominant-operation simulation model.
    Type: Grant
    Filed: December 28, 2023
    Date of Patent: October 1, 2024
    Assignee: NORTH CHINA ELECTRIC POWER UNIVERSITY
    Inventors: Yang Hu, Fang Fang, Weiran Wang, Jizhen Liu
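The abstract's first step, preprocessing real-time operational parameters into steady-state values, could be as simple as a smoothing filter. A moving average is shown purely as an illustrative stand-in; the patent does not disclose the preprocessing method here.

```python
def steady_state(samples, window=3):
    """Moving-average smoothing of raw real-time operational parameters — a
    simple stand-in for the steady-state preprocessing the patent describes."""
    return [sum(samples[i:i + window]) / window
            for i in range(len(samples) - window + 1)]
```

The smoothed parameters would then feed the four subsystem models (pneumatic, transmission, tower, electrical) that are combined into the dominant-operation simulation model.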
  • Publication number: 20240304181
    Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains.
    Type: Application
    Filed: March 7, 2024
    Publication date: September 12, 2024
    Applicant: Google LLC
    Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
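The re-labeling step amounts to interleaving speaker-type tags into the transcription. A minimal sketch; the `<spk:...>` tag format and segment representation are assumptions, since the publication does not specify a surface form:

```python
def relabel(segments):
    """segments: (speaker_type, text) pairs covering one transcription.
    Returns the transcription annotated with hypothetical '<spk:TYPE>' tags."""
    return " ".join(f"<spk:{spk}> {text}" for spk, text in segments)
```

Training on such tagged transcriptions lets one model learn both the words and the speaker-type boundaries across domains.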
  • Publication number: 20240296840
    Abstract: A joint auxiliary task and ASR model includes an encoder to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a higher-order feature representation for a corresponding acoustic frame. The model also includes a multi-output HAT decoder to generate at each of the plurality of output steps a probability distribution over possible speech recognition hypotheses, and an indication of whether the output step corresponds to an auxiliary token associated with a particular auxiliary task. The model is trained by a JEIT training process based on: a paired training data set including paired audio data and transcriptions, the transcriptions annotated with ground-truth auxiliary tokens associated with the particular auxiliary task; and an unpaired training data set including textual utterances not paired with any corresponding audio data, the textual utterances annotated with the ground-truth auxiliary tokens associated with the particular auxiliary task.
    Type: Application
    Filed: March 1, 2024
    Publication date: September 5, 2024
    Applicant: Google LLC
    Inventors: Shaan Jagdeep Patrick Bijwadia, Shuo-yiin Chang, Tara N. Sainath, Weiran Wang, Zhong Meng
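The JEIT training process described here mixes a supervised loss on paired (audio, transcription) data with a text-only loss on unpaired textual utterances. A structural sketch with stub loss functions; the mixing coefficient is hypothetical:

```python
def jeit_loss(paired_batch, unpaired_batch, speech_loss, text_loss, text_weight=0.3):
    """One JEIT objective evaluation: a supervised loss over (audio, text)
    pairs plus a weighted text-only loss over unpaired textual utterances.
    `speech_loss` and `text_loss` are stand-ins for the model's real losses."""
    loss = sum(speech_loss(audio, text) for audio, text in paired_batch)
    loss += text_weight * sum(text_loss(text) for text in unpaired_batch)
    return loss
```

Both data sets carry the same ground-truth auxiliary tokens, so the auxiliary task is learned from text even when no audio exists for it.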
  • Publication number: 20240265168
    Abstract: The present disclosure provides a method and device for simulating a dynamic digital twin model of dominant operation of a wind turbine generator assembly, by which conventional operational parameters of the wind turbine generator assembly that are acquired in real time are preprocessed to obtain steady-state operational parameters of the wind turbine generator assembly. A pneumatic subsystem-related data black box model, a transmission subsystem model, a tower subsystem model, and an electrical subsystem model are simulated individually using the steady-state operational parameters, and then combined to form a dynamic dominant-operation simulation model for simulating an operation process of the wind turbine generator assembly. Meanwhile, a dynamic deviation compensation model is constructed on the basis of the dynamic dominant-operation simulation model.
    Type: Application
    Filed: December 28, 2023
    Publication date: August 8, 2024
    Inventors: Yang Hu, Fang Fang, Weiran Wang, Jizhen Liu
  • Publication number: 20240153495
    Abstract: A method includes receiving a training dataset that includes one or more spoken training utterances for training an automatic speech recognition (ASR) model. Each spoken training utterance in the training dataset is paired with a corresponding transcription and a corresponding target sequence of auxiliary tokens. For each spoken training utterance, the method includes generating a speech recognition hypothesis for a corresponding spoken training utterance, determining a speech recognition loss based on the speech recognition hypothesis and the corresponding transcription, generating a predicted auxiliary token for the corresponding spoken training utterance, and determining an auxiliary task loss based on the predicted auxiliary token and the corresponding target sequence of auxiliary tokens. The method also includes training the ASR model jointly on the speech recognition loss and the auxiliary task loss determined for each spoken training utterance.
    Type: Application
    Filed: October 26, 2023
    Publication date: May 9, 2024
    Applicant: Google LLC
    Inventors: Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang, Shuo-yiin Chang, David Johannes Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar
  • Publication number: 20240028829
    Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes, tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.
    Type: Application
    Filed: July 1, 2023
    Publication date: January 25, 2024
    Applicant: Google LLC
    Inventors: Tara N. Sainath, Zhouyuan Huo, Zhehuai Chen, Yu Zhang, Weiran Wang, Trevor Strohman, Rohit Prakash Prabhavalkar, Bo Li, Ankur Bapna
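The first step in this abstract, tokenizing a textual utterance into a sequence of sub-word units, can be illustrated with a greedy longest-match tokenizer. This is a toy stand-in; the publication does not specify which sub-word scheme is used.

```python
def tokenize(word, vocab):
    """Greedy longest-match sub-word tokenization; spans not covered by the
    vocabulary back off to single characters."""
    units, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                units.append(word[i:j])
                i = j
                break
        else:
            units.append(word[i])
            i += 1
    return units
```

The text encoder then consumes these unit sequences in place of acoustic frames, which is what lets unspoken text train the encoder.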
  • Publication number: 20230326461
    Abstract: An automated speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The first decoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a first probability distribution over possible speech recognition hypotheses. The second encoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a second higher order feature representation for a corresponding first higher order feature frame. The second decoder receives, as input, the second higher order feature representation generated by the second encoder, and generates a second probability distribution over possible speech recognition hypotheses.
    Type: Application
    Filed: March 13, 2023
    Publication date: October 12, 2023
    Applicant: Google LLC
    Inventors: Shaojin Ding, Yangzhang He, Xin Wang, Weiran Wang, Trevor Strohman, Tara N. Sainath, Rohit Parkash Prabhavalkar, Robert David, Rina Panigrahy, Rami Botros, Qiao Liang, Ian Mcgraw, Ding Zhao, Dongseong Hwang
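The dataflow of this cascaded architecture is worth making explicit: the second encoder consumes the first encoder's features, not the raw audio. A structural sketch with all four components as stubs (the real models are neural networks):

```python
def cascaded_asr(frames, enc1, dec1, enc2, dec2):
    """Two-pass cascade: the first decoder reads the first encoder's features;
    the second encoder re-encodes those same features for the second decoder."""
    h1 = [enc1(f) for f in frames]       # first higher order feature representations
    first = [dec1(h) for h in h1]        # first probability distributions / hypotheses
    h2 = [enc2(h) for h in h1]           # second higher order feature representations
    second = [dec2(h) for h in h2]
    return first, second
```

This arrangement lets a fast first pass stream results while a second pass refines them from the shared representations.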
  • Publication number: 20230298570
    Abstract: A method includes generating, using an audio encoder, a higher-order feature representation for each acoustic frame in a sequence of acoustic frames; generating, using a decoder, based on the higher-order feature representation, a plurality of speech recognition hypotheses, each hypothesis corresponding to a candidate transcription of an utterance and having an associated first likelihood score; generating, using an external language model, for each speech recognition hypothesis, a second likelihood score; determining, using a learnable fusion module, for each speech recognition hypothesis, a set of fusion weights based on the higher-order feature representation and the speech recognition hypothesis; and generating, using the learnable fusion module, for each speech recognition hypothesis, a third likelihood score based on the first likelihood score, the second likelihood score, and the set of fusion weights, the audio encoder and decoder trained using minimum additive error rate training in the presence of t
    Type: Application
    Filed: March 21, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Weiran Wang, Tongzhou Chen, Tara N. Sainath, Ehsan Variani, Rohit Prakash Prabhavalkar, Ronny Huang, Bhuvana Ramabhadran, Neeraj Gaur, Sepand Mavandadi, Charles Caleb Peyser, Trevor Strohman, Yangzhang He, David Rybach
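The fusion step can be sketched as a per-hypothesis weighted sum of the two likelihood scores, with the weight pair standing in for the learnable fusion module's output. The scores, weights, and hypothesis texts below are illustrative:

```python
def rescore(hypotheses):
    """hypotheses: (text, first_score, second_score, (w1, w2)) tuples, where
    (w1, w2) stands in for the fusion weights a learned module would predict.
    Returns the hypothesis with the best fused (third) likelihood score."""
    best_score, best_text = max(
        (w1 * s1 + w2 * s2, text) for text, s1, s2, (w1, w2) in hypotheses
    )
    return best_text
```

Because the weights are predicted per hypothesis from the audio features, the model can lean on the external LM more for some utterances than others, unlike a fixed shallow-fusion weight.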
  • Publication number: 20230298563
    Abstract: A method of text-only and semi-supervised training for deliberation includes receiving training data including unspoken textual utterances that are each not paired with any corresponding spoken utterance of non-synthetic speech, and training a deliberation model that includes a text encoder and a deliberation decoder on the unspoken textual utterances. The method also includes receiving, at the trained deliberation model, first-pass hypotheses and non-causal acoustic embeddings. The first-pass hypotheses are generated by a recurrent neural network-transducer (RNN-T) decoder for the non-causal acoustic embeddings encoded by a non-causal encoder. The method also includes encoding, using the text encoder, the first-pass hypotheses generated by the RNN-T decoder, and generating, using the deliberation decoder attending to both the first-pass hypotheses and the non-causal acoustic embeddings, second-pass hypotheses.
    Type: Application
    Filed: March 18, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Sepand Mavandadi, Weiran Wang, Trevor Strohman
  • Publication number: 20230130634
    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
    Type: Application
    Filed: September 29, 2022
    Publication date: April 27, 2023
    Applicant: Google LLC
    Inventors: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu
  • Publication number: 20230107248
    Abstract: A method includes receiving an initial alignment for a candidate hypothesis generated by a transducer decoder model during a first pass. Here, the candidate hypothesis corresponds to a candidate transcription for an utterance and the initial alignment for the candidate hypothesis includes a sequence of output labels. Each output label corresponds to a blank symbol or a hypothesized sub-word unit. The method also include receiving a subsequent sequence of audio encodings characterizing the utterance. During an initial refinement step, the method also includes generating a new alignment for a rescored sequence of output labels using a non-autoregressive decoder. The non-autoregressive decoder is configured to receive the initial alignment for the candidate hypothesis and the subsequent sequence of audio encodings.
    Type: Application
    Filed: September 16, 2022
    Publication date: April 6, 2023
    Applicant: Google LLC
    Inventors: Weiran Wang, Ke Hu, Tara N. Sainath
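What makes the refinement non-autoregressive is that every output label in the alignment is re-predicted in parallel, conditioned on the whole current alignment, rather than left to right. A structural sketch with the decoder's per-position choice stubbed out:

```python
BLANK = "_"   # the blank symbol in the alignment's output labels

def refine(alignment, relabel, steps=2):
    """Non-autoregressive refinement: each step rescores all positions
    independently given the full current alignment. `relabel(i, alignment)`
    is a stand-in for the decoder's per-position prediction."""
    for _ in range(steps):
        alignment = [relabel(i, alignment) for i in range(len(alignment))]
    return alignment
```

Each refinement step costs one parallel decoder pass, so a few steps can improve on the first-pass transducer alignment at low latency.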
  • Publication number: 20230099386
    Abstract: A system of recommending an item, a method of recommending an item by a system of recommending an item, a computer system, and a computer-readable storage medium are provided. The system includes: an item expansion module configured to expand a content input by a user in response to the content input by the user, to generate an item set of interest to the user, where the item set includes one or more items; a price radar module configured to monitor discount information of the item in the item set; and a price monitoring module configured to calculate an actual price of the item based on the discount information of the item, maintain a price change record of the item in the item set, and determine whether to push prompt information to the user according to the calculated actual price of the item and the price change record.
    Type: Application
    Filed: February 24, 2021
    Publication date: March 30, 2023
    Inventors: Wei ZHANG, Xin SHANG, Guangming ZHU, Fan YANG, Xiaoting SI, Hongguang LIU, Weiran WANG, Jiang LAN, Yijun HUANG, Hongkai JIANG, Xuedi QIAN
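The price-monitoring logic reduces to two small computations: deriving an actual price from discount information, and comparing it against the price-change record. A sketch under the assumption that discounts stack multiplicatively and that a prompt is pushed on a new low (both assumptions, since the abstract leaves the rules unspecified):

```python
def actual_price(list_price, discounts):
    """Stack fractional discounts (e.g. 0.1 == 10% off) onto the list price."""
    price = list_price
    for d in discounts:
        price *= 1.0 - d
    return price

def should_push(price, history):
    """Push prompt information only when the new actual price undercuts every
    price in the item's recorded price change record."""
    return bool(history) and price < min(history)
```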
  • Patent number: 11328731
    Abstract: System and methods for identifying a text word from a spoken utterance are provided. An ensemble BPE system that includes a phone BPE system and a character BPE system receives a spoken utterance. Both BPE systems include a multi-level language model (LM) and an acoustic model. The phone BPE system identifies first words from the spoken utterance and determines a first score for each first word. The first words are converted into character sequences. The character BPE system converts the character sequences into second words and determines a second score for each second word. For each word from the first words that matches a word in the second words, the first and second scores are combined. The text word is the word with the highest score.
    Type: Grant
    Filed: June 17, 2020
    Date of Patent: May 10, 2022
    Assignee: salesforce.com, inc.
    Inventors: Weiran Wang, Yingbo Zhou, Caiming Xiong
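The score-combination rule at the end of this abstract can be sketched directly: words proposed by both BPE systems get their scores summed, and the best combined score wins. The additive combination and the handling of unmatched words are illustrative simplifications:

```python
def pick_word(phone_scores, char_scores):
    """phone_scores / char_scores: word -> score from the phone BPE and
    character BPE systems. Matching words have their scores combined; the
    highest-scoring word is returned as the text word."""
    combined = dict(phone_scores)
    for word, score in char_scores.items():
        combined[word] = combined.get(word, 0.0) + score
    return max(combined, key=combined.get)
```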
  • Publication number: 20220067534
    Abstract: Embodiments described herein combine both masked reconstruction and predictive coding. Specifically, unlike contrastive learning, the mutual information between past states and future states is directly estimated. The context information can also be directly captured via shifted masked reconstruction—unlike standard masked reconstruction, the target reconstructed observations are shifted slightly towards the future to incorporate more predictability. The estimated mutual information and the shifted masked reconstruction loss can then be combined as the loss function to update the neural model.
    Type: Application
    Filed: August 28, 2020
    Publication date: March 3, 2022
    Inventors: Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong
  • Publication number: 20210319796
    Abstract: System and methods for identifying a text word from a spoken utterance are provided. An ensemble BPE system that includes a phone BPE system and a character BPE system receives a spoken utterance. Both BPE systems include a multi-level language model (LM) and an acoustic model. The phone BPE system identifies first words from the spoken utterance and determines a first score for each first word. The first words are converted into character sequences. The character BPE system converts the character sequences into second words and determines a second score for each second word. For each word from the first words that matches a word in the second words, the first and second scores are combined. The text word is the word with the highest score.
    Type: Application
    Filed: June 17, 2020
    Publication date: October 14, 2021
    Inventors: Weiran Wang, Yingbo Zhou, Caiming Xiong
  • Patent number: 10803885
    Abstract: An audio event detection system that processes audio data into audio feature data and processes the audio feature data using pre-configured candidate interval lengths to identify top candidate regions of the feature data that may include an audio event. The feature data from the top candidate regions are then scored by a classifier, where the score indicates a likelihood that the candidate region corresponds to a desired audio event. The scores are compared to a threshold, and if the threshold is satisfied, the top scoring candidate region is determined to include an audio event.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: October 13, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Chieh-Chi Kao, Chao Wang, Weiran Wang, Ming Sun
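The final decision stage of this detector, scoring candidate regions and applying a threshold, is simple to sketch. The region representation and threshold value are illustrative; the classifier itself is stubbed out as a score table:

```python
def detect(region_scores, threshold=0.5):
    """region_scores: candidate region -> classifier score indicating how
    likely the region contains the desired audio event. Returns the top-scoring
    region if it satisfies the threshold, else None (no event detected)."""
    region = max(region_scores, key=region_scores.get)
    return region if region_scores[region] >= threshold else None
```

Upstream, the pre-configured candidate interval lengths determine which regions of the feature data are scored at all.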