Patents by Inventor Arun Narayanan

Arun Narayanan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230298609
    Abstract: A method for training a generalized automatic speech recognition model for joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving a plurality of training utterances paired with corresponding training contextual signals. The training contextual signals include a training contextual noise signal including noise prior to the corresponding training utterance, a training reference audio signal, and a training speaker vector including voice characteristics of a target speaker that spoke the corresponding training utterance. The method also includes training, using a contextual signal dropout strategy, a contextual frontend processing model on the training utterances to learn how to predict enhanced speech features. Here, the contextual signal dropout strategy uses a predetermined probability to drop out each of the training contextual signals during training of the contextual frontend processing model.
    Type: Application
    Filed: February 19, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Tom O'Malley, Quan Wang, Arun Narayanan
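A minimal numpy sketch of the contextual-signal dropout strategy this abstract describes: each of the three training contextual signals is independently zeroed out with a predetermined probability, so the frontend learns to cope when any subset is missing. The function name, shapes, and the 0.5 probability are illustrative assumptions, not details from the filing.

```python
# Illustrative sketch of contextual-signal dropout; names, shapes, and the
# 0.5 drop probability are assumptions, not taken from the patent.
import numpy as np

rng = np.random.default_rng(0)

def dropout_contextual_signals(noise_ctx, reference, speaker_vec, p_drop=0.5):
    """Independently zero each contextual signal with probability p_drop."""
    signals = [noise_ctx, reference, speaker_vec]
    return [np.zeros_like(s) if rng.random() < p_drop else s for s in signals]

# Toy training example: 100-frame noise context, 200-frame reference signal,
# and a 256-dim speaker vector (d-vector).
noise_ctx = rng.standard_normal((100, 80))
reference = rng.standard_normal((200, 80))
speaker_vec = rng.standard_normal(256)
dropped = dropout_contextual_signals(noise_ctx, reference, speaker_vec)
print([bool(d.any()) for d in dropped])  # which signals survived this step
```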
  • Publication number: 20230298591
    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
    Type: Application
    Filed: March 17, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
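A toy numpy sketch of the FiLM-based scoring this abstract describes: the target speaker embedding yields a scaling vector (gamma) and a shifting vector (beta) that affinely transform the reference embedding before classification. The projection matrices and the sigmoid classifier here are assumptions for illustration only.

```python
# FiLM-style target-speaker verification sketch; all weights are random
# stand-ins for learned parameters.
import numpy as np

rng = np.random.default_rng(0)
D = 256
W_gamma, W_beta = rng.standard_normal((2, D, D)) * 0.01
w_cls = rng.standard_normal(D) * 0.01

def film_verify(reference_emb, target_emb):
    gamma = W_gamma @ target_emb               # scaling vector from target
    beta = W_beta @ target_emb                 # shifting vector from target
    modulated = gamma * reference_emb + beta   # affine (FiLM) transformation
    logit = w_cls @ modulated
    return 1.0 / (1.0 + np.exp(-logit))        # P(utterance is target speaker)

ref_emb = rng.standard_normal(D)
tgt_emb = rng.standard_normal(D)
print(f"target-speaker probability: {film_verify(ref_emb, tgt_emb):.3f}")
```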
  • Publication number: 20230298612
    Abstract: A multichannel neural frontend speech enhancement model for speech recognition includes a speech cleaner, a stack of self-attention blocks each having a multi-headed self-attention mechanism, and a masking layer. The speech cleaner receives, as input, a multichannel noisy input signal and a multichannel contextual noise signal, and generates, as output, a single channel cleaned input signal. The stack of self-attention blocks receives, as input, at an initial block of the stack of self-attention blocks, a stacked input including the single channel cleaned input signal and a single channel noisy input signal, and generates, as output, from a final block of the stack of self-attention blocks, an un-masked output. The masking layer receives, as input, the single channel noisy input signal and the un-masked output, and generates, as output, enhanced input speech features corresponding to a target utterance.
    Type: Application
    Filed: February 20, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Joseph Caroselli, Arun Narayanan, Tom O'Malley
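The masking stage in this abstract can be pictured with a short sketch: the un-masked output of the final self-attention block is squashed into a [0, 1] mask and applied elementwise to the single-channel noisy features. The shapes and the sigmoid choice are assumptions.

```python
# Masking-layer sketch: sigmoid mask applied to noisy features.
import numpy as np

def apply_mask(noisy_features, unmasked_output):
    mask = 1.0 / (1.0 + np.exp(-unmasked_output))  # sigmoid -> [0, 1] mask
    return mask * noisy_features                   # enhanced speech features

rng = np.random.default_rng(0)
noisy = np.abs(rng.standard_normal((50, 128)))     # frames x feature bins
net_out = rng.standard_normal((50, 128))           # un-masked network output
print(apply_mask(noisy, net_out).shape)
```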
  • Publication number: 20230119845
    Abstract: In accordance with an example embodiment, a clamping device includes a first clamping device frame element coupled with a portion of a work tool coupled with a utility vehicle, a second clamping device frame element pivotally coupled with the first clamping device frame element, where the second clamping device frame element includes a first section and a second section positioned at an angle of approximately 90 degrees relative to each other, and a first movement actuator that movably couples the first clamping device frame element and the second clamping device frame element.
    Type: Application
    Filed: October 20, 2021
    Publication date: April 20, 2023
    Inventors: David M. O'Brien, Arun Narayanan, Jason M. Simmons, Christopher P. Kelley, Devendra Thakur, Rajan Kadam, William M. Banish
  • Publication number: 20230109407
    Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.
    Type: Application
    Filed: September 19, 2022
    Publication date: April 6, 2023
    Applicant: Google LLC
    Inventors: Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman
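A structural sketch of the two-pass data flow this abstract describes, with stand-in numpy "encoders" so the wiring is runnable. Every function body is a placeholder; only the connectivity mirrors the abstract.

```python
# Deliberation-style two-pass wiring; all components are stand-ins.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((80, 256)) * 0.1
W2 = rng.standard_normal((256, 256)) * 0.1

def first_encoder(frames):
    return frames @ W1                         # first higher-order features

def first_pass_decoder(feats):
    return "first pass hypothesis"             # stand-in transducer decoder

def text_encoder(hypothesis):
    return rng.standard_normal((len(hypothesis.split()), 256))

def second_encoder(feats):
    return feats @ W2                          # second higher-order features

def recognize(acoustic_frames):
    h1 = first_encoder(acoustic_frames)
    hyp1 = first_pass_decoder(h1)              # streaming first-pass result
    text_enc = text_encoder(hyp1)              # encode the first-pass text
    h2 = second_encoder(h1)
    # The second-pass transducer decoder consumes both h2 and text_enc;
    # here we simply show that both are available to it.
    return hyp1, h2.shape, text_enc.shape

print(recognize(rng.standard_normal((120, 80))))
```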
  • Publication number: 20230079828
    Abstract: A method for Short-Time Fourier Transform-based echo muting includes receiving a microphone signal including acoustic echo captured by a microphone and corresponding to audio content from an acoustic speaker, and receiving a reference signal including a sequence of frames representing the audio content. For each frame in the sequence of frames, the method includes processing, using an acoustic echo canceler, a respective frame to generate a respective output signal frame that cancels the acoustic echo from the respective frame, and determining, using a Double-talk Detector (DTD), based on the respective frame and the respective output signal frame, whether the respective frame is a double-talk frame or an echo-only frame. For each respective frame determined to be an echo-only frame, the method includes muting the respective output signal frame; for each respective frame determined to be a double-talk frame, the method includes performing speech processing on the respective output signal frame.
    Type: Application
    Filed: December 11, 2021
    Publication date: March 16, 2023
    Applicant: Google LLC
    Inventors: Turaj Zakizadeh Shabestary, Arun Narayanan
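A heavily simplified sketch of the per-frame muting logic this abstract describes. The echo canceler and double-talk detector are stubbed with an energy heuristic; the threshold, the fixed echo coefficient, and the names are assumptions, not the patented method.

```python
# Echo-muting sketch: cancel, detect double-talk, mute echo-only frames.
import numpy as np

def cancel_echo(frame, reference_frame):
    return frame - 0.8 * reference_frame       # stand-in for a real STFT AEC

def is_double_talk(frame, output_frame, threshold=0.5):
    # If substantial energy survives cancellation, near-end speech is present.
    residual = np.mean(np.abs(output_frame)) / (np.mean(np.abs(frame)) + 1e-8)
    return residual > threshold

def process(mic_frames, ref_frames):
    outputs = []
    for frame, ref in zip(mic_frames, ref_frames):
        out = cancel_echo(frame, ref)
        if is_double_talk(frame, out):
            outputs.append(out)                  # keep for speech processing
        else:
            outputs.append(np.zeros_like(out))   # echo-only: mute the frame
    return outputs

rng = np.random.default_rng(0)
muted = process(rng.standard_normal((5, 160)), rng.standard_normal((5, 160)))
print([bool(f.any()) for f in muted])            # False marks a muted frame
```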
  • Publication number: 20230038982
    Abstract: A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding to generate enhanced speech features.
    Type: Application
    Filed: December 14, 2021
    Publication date: February 9, 2023
    Applicant: Google LLC
    Inventors: Arun Narayanan, Tom O'Malley, Quan Wang, Alex Park, James Walker, Nathan David Howard, Yanzhang He, Chung-Cheng Chiu
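The inference-time interface implied by this abstract, which accepts "at least one of" three context signals, might look like the sketch below. The zero-filled fallback for missing signals mirrors the dropout training recipe of publication 20230298609 above and is an assumption.

```python
# Contextual-frontend interface sketch; the fusion itself is elided.
import numpy as np

def contextual_frontend(features, reference=None, noise_ctx=None, speaker=None):
    dim = features.shape[1]
    reference = np.zeros((0, dim)) if reference is None else reference
    noise_ctx = np.zeros((0, dim)) if noise_ctx is None else noise_ctx
    speaker = np.zeros(256) if speaker is None else speaker
    # A real model would fuse these signals (e.g. via cross-attention);
    # this placeholder only shows graceful handling of missing context.
    return features                              # stand-in enhanced features

feats = np.ones((10, 80))
print(contextual_frontend(feats).shape)                        # no context
print(contextual_frontend(feats, speaker=np.ones(256)).shape)  # one signal
```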
  • Publication number: 20230038343
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Application
    Filed: October 12, 2022
    Publication date: February 9, 2023
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A.U. Bacchiani
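A toy, runnable sketch of the reply pipeline this abstract describes: the conversation context, user intent, and bot intent all feed generation of the synthesized reply. Every function is a trivial stub standing in for a learned model; none of these names come from the filing.

```python
# Automated-calling reply pipeline, fully stubbed for illustration.
def classify_context(turns):
    joined = " ".join(text for _, text in turns).lower()
    return "restaurant_booking" if "table" in joined else "general"

def infer_intent(text):
    return "request_booking" if "table" in text.lower() else "inform"

def generate_bot_reply(turns):
    context = classify_context(turns)
    user_intent = infer_intent(next(t for s, t in reversed(turns) if s == "user"))
    bot_intent = infer_intent(next(t for s, t in reversed(turns) if s == "bot"))
    if context == "restaurant_booking" and user_intent == "request_booking":
        reply = "Certainly, we have a table free at 7 pm. Shall I book it?"
    else:
        reply = "Could you tell me a bit more?"
    return f"<synthesized audio: {reply!r}>"     # stand-in for the TTS output

turns = [("bot", "Hello, how can I help you?"),
         ("user", "I'd like a table for two tonight.")]
print(generate_bot_reply(turns))
```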
  • Patent number: 11512771
    Abstract: A transfer case includes a primary shaft, a secondary shaft radially offset from the primary shaft, and a torque transfer mechanism. The primary shaft includes a central channel and an inclined fluid channel. The torque transfer mechanism is configured to selectively transfer torque from the primary shaft to the secondary shaft, and includes a primary sprocket coupled to the primary shaft for transferring torque to the secondary shaft. The inclined fluid channel is associated with the primary sprocket, and includes an outlet that axially overlaps the primary sprocket.
    Type: Grant
    Filed: May 17, 2018
    Date of Patent: November 29, 2022
    Assignee: BorgWarner Inc.
    Inventors: Susan Stroope, Arun Narayanan, Yogesh Mehta
  • Patent number: 11495233
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Grant
    Filed: October 20, 2021
    Date of Patent: November 8, 2022
    Assignee: GOOGLE LLC
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A.U. Bacchiani
  • Publication number: 20220343894
    Abstract: A method for training a streaming automatic speech recognition student model includes receiving a plurality of unlabeled student training utterances. The method also includes, for each unlabeled student training utterance, generating a transcription corresponding to the respective unlabeled student training utterance using a plurality of non-streaming automated speech recognition (ASR) teacher models. The method further includes distilling a streaming ASR student model from the plurality of non-streaming ASR teacher models by training the streaming ASR student model using the plurality of unlabeled student training utterances paired with the corresponding transcriptions generated by the plurality of non-streaming ASR teacher models.
    Type: Application
    Filed: June 15, 2021
    Publication date: October 27, 2022
    Applicant: Google LLC
    Inventors: Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao
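A compact sketch of the distillation recipe this abstract describes: non-streaming teacher models transcribe unlabeled audio, and the streaming student trains on those pseudo-labels. Combining teachers by majority vote is an assumption; the filing only says the transcriptions come from a plurality of teachers.

```python
# Teacher-student distillation loop with pseudo-labels from several teachers.
from collections import Counter

def pseudo_label(utterance, teachers):
    hyps = [teacher(utterance) for teacher in teachers]
    return Counter(hyps).most_common(1)[0][0]    # consensus hypothesis

def distill(unlabeled_utterances, teachers, train_step):
    for utt in unlabeled_utterances:
        transcript = pseudo_label(utt, teachers)
        train_step(utt, transcript)              # supervised step on pseudo-label

# Toy teachers that "transcribe" by normalizing case.
teachers = [lambda u: u.lower(), lambda u: u.lower(), lambda u: u.upper()]
distill(["Hello World"], teachers, lambda u, t: print(f"train student on: {t}"))
```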
  • Publication number: 20220319498
    Abstract: Implementations disclosed herein are directed to initializing and utilizing a beamformer in processing of audio data received at a computing device. The computing device can: receive audio data that captures a spoken utterance of a user; determine that a first audio data segment of the audio data includes one or more particular words or phrases; obtain a preceding audio data segment that precedes the first audio data segment; estimate a spatial correlation matrix based on the first audio data segment and based on the preceding audio data segment; initialize the beamformer based on the estimated spatial correlation matrix; and cause the initialized beamformer to be utilized in processing of at least a second audio data segment of the audio data. Additionally, or alternatively, the computing device can transmit the spatial correlation matrix to server(s), and the server(s) can transmit the initialized beamformer back to the computing device.
    Type: Application
    Filed: April 2, 2021
    Publication date: October 6, 2022
    Inventors: Joseph Caroselli, Jr., Yiteng Huang, Arun Narayanan
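A numpy sketch of the initialization step this abstract describes: a spatial correlation (covariance) matrix is estimated from the hotword segment together with the audio preceding it, and beamformer weights are derived from that matrix. Taking the principal eigenvector as the beamformer is a common choice but an assumption here.

```python
# Beamformer initialization from an estimated spatial correlation matrix.
import numpy as np

def spatial_correlation(frames):
    """frames: (time, channels) multichannel samples for one frequency bin."""
    return (frames.conj().T @ frames) / len(frames)   # (channels, channels)

def init_beamformer(hotword_frames, preceding_frames):
    R = spatial_correlation(np.vstack([preceding_frames, hotword_frames]))
    eigvals, eigvecs = np.linalg.eigh(R)
    w = eigvecs[:, -1]                # principal direction of the sound field
    return w / (w.conj() @ w).real    # normalized beamformer weights

rng = np.random.default_rng(0)
w = init_beamformer(rng.standard_normal((400, 4)), rng.standard_normal((200, 4)))
print(w.shape)                        # one weight per microphone channel
```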
  • Publication number: 20220310062
    Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.
    Type: Application
    Filed: May 10, 2021
    Publication date: September 29, 2022
    Applicant: Google LLC
    Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman
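The rescoring stage this abstract describes can be sketched as a log-linear combination of the ASR model's distribution with an external language model's scores; the combination rule and its weight are assumptions, not details from the filing.

```python
# Language-model rescoring of an ASR probability distribution.
import numpy as np

def rescore(asr_log_probs, lm_log_probs, lm_weight=0.3):
    combined = asr_log_probs + lm_weight * lm_log_probs
    combined -= np.logaddexp.reduce(combined)   # renormalize over hypotheses
    return np.exp(combined)

asr = np.log([0.6, 0.3, 0.1])       # ASR distribution over three hypotheses
lm = np.log([0.2, 0.7, 0.1])        # LM scores for the same hypotheses
print(rescore(asr, lm).round(3))    # rescored probability distribution
```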
  • Publication number: 20220238101
    Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen, attend and spell (LAS) decoder. Various implementations include an encoder shared between the RNN-T decoder and the LAS decoder.
    Type: Application
    Filed: December 3, 2020
    Publication date: July 28, 2022
    Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-yiin Chang, Wei Li
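A structural sketch of the shared-encoder two-pass design this abstract describes: one encoder output feeds both the streaming RNN-T first pass and the LAS second pass that revises it. All components below are stand-ins; only the sharing of the encoder is the point.

```python
# Two-pass ASR wiring: shared encoder, RNN-T first pass, LAS second pass.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((80, 256)) * 0.1

def shared_encoder(audio):
    return audio @ W_enc                        # computed once for both passes

def rnnt_first_pass(enc):
    return "streaming candidate"                # emitted as audio arrives

def las_second_pass(enc, candidate):
    # Attends over enc to rescore/revise the streaming candidate; a no-op here.
    return candidate

def two_pass_asr(audio):
    enc = shared_encoder(audio)
    streaming_hyp = rnnt_first_pass(enc)        # shown to the user immediately
    final_hyp = las_second_pass(enc, streaming_hyp)
    return streaming_hyp, final_hyp

print(two_pass_asr(rng.standard_normal((120, 80))))
```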
  • Publication number: 20220122622
    Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a first probability distribution over possible speech recognition hypotheses.
    Type: Application
    Filed: April 21, 2021
    Publication date: April 21, 2022
    Applicant: Google LLC
    Inventors: Arun Narayanan, Tara Sainath, Chung-Cheng Chiu, Ruoming Pang, Rohit Prabhavalkar, Jiahui Yu, Ehsan Variani, Trevor Strohman
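The cascaded-encoder layout this abstract describes, sketched with stand-in linear maps. In the published cascaded-encoder literature the first encoder is causal (streaming) and the second adds right context; that pairing is an assumption about this particular filing.

```python
# Cascaded encoders: frames -> encoder 1 -> encoder 2 -> decoder distribution.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((80, 256)) * 0.1     # first encoder (stand-in)
W2 = rng.standard_normal((256, 256)) * 0.1    # second encoder (stand-in)
W_dec = rng.standard_normal((256, 32)) * 0.1  # decoder projection, 32 tokens

def cascaded_asr(frames):
    h1 = frames @ W1                           # first higher-order features
    h2 = h1 @ W2                               # second higher-order features
    logits = h2 @ W_dec
    logits -= np.logaddexp.reduce(logits, axis=-1, keepdims=True)
    return np.exp(logits)                      # per-step token distribution

probs = cascaded_asr(rng.standard_normal((120, 80)))
print(probs.shape, probs.sum(axis=-1)[:3].round(3))  # rows sum to 1
```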
  • Publication number: 20220122586
    Abstract: A computer-implemented method of training a streaming speech recognition model that includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
    Type: Application
    Filed: September 9, 2021
    Publication date: April 21, 2022
    Applicant: Google LLC
    Inventors: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang
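A toy rendering of the tuning idea this abstract describes: the sequence-level alignment log-probability is augmented with a weighted term that favors emitting label tokens over the blank token, trading a little accuracy for lower emission latency. The exact loss form below is an assumption in the spirit of FastEmit-style regularization.

```python
# Latency-tuned transducer loss sketch (assumed form, not the filed claims).
import numpy as np

def tuned_loss(log_p_align, log_p_labels, lam=0.01):
    """log_p_align: sequence-level alignment log-probability.
    log_p_labels: log-probability mass of the label-emission branches.
    lam: tuning parameter pushing the model toward early label emission."""
    return -(log_p_align + lam * log_p_labels)

print(tuned_loss(np.log(0.2), np.log(0.15)))   # scalar training loss
```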
  • Patent number: 11299380
    Abstract: A convertible carriage is provided. The convertible carriage includes a carriage frame, a support arrangement, a shaft, and a retaining arrangement. The support arrangement includes an elongate slot and is positioned on the carriage frame. The shaft may carry a tine and is moveable along the elongate slot at least in a vertical dimension. The retaining arrangement may selectively engage with the elongate slot. When the retaining arrangement is engaged with the elongate slot, a movement of the shaft is limited by the retaining arrangement with a boundary defined at least by the retaining arrangement. When the retaining arrangement is removed from the elongate slot, the shaft is allowed to float in the elongate slot.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: April 12, 2022
    Assignee: Deere & Company
    Inventors: Arun Narayanan, David M. O'Brien, Jason M. Simmons
  • Publication number: 20220063971
    Abstract: A convertible carriage is provided. The convertible carriage includes a carriage frame, a support arrangement, a shaft, and a retaining arrangement. The support arrangement includes an elongate slot and is positioned on the carriage frame. The shaft may carry a tine and is moveable along the elongate slot at least in a vertical dimension. The retaining arrangement may selectively engage with the elongate slot. When the retaining arrangement is engaged with the elongate slot, a movement of the shaft is limited by the retaining arrangement with a boundary defined at least by the retaining arrangement. When the retaining arrangement is removed from the elongate slot, the shaft is allowed to float in the elongate slot.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Inventors: Arun Narayanan, David M. O'Brien, Jason M. Simmons
  • Publication number: 20220044684
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Application
    Filed: October 20, 2021
    Publication date: February 10, 2022
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A.U. Bacchiani
  • Patent number: 11158321
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include, providing, for output, the synthesized speech.
    Type: Grant
    Filed: September 24, 2019
    Date of Patent: October 26, 2021
    Assignee: GOOGLE LLC
    Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani