Patents by Inventor Rohit Prakash
Rohit Prakash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12106749
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
Type: Grant
Filed: September 20, 2021
Date of Patent: October 1, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
-
Publication number: 20240304181
Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains.
Type: Application
Filed: March 7, 2024
Publication date: September 12, 2024
Applicant: Google LLC
Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
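The re-labeling step described in this abstract can be pictured with a minimal sketch. The abstract does not give a tag syntax or say how segment boundaries are obtained, so the `<spk:...>` format, the `relabel_with_speaker_tags` helper, and the sample record below are all hypothetical.

```python
def relabel_with_speaker_tags(segments):
    """Join (speaker_type, text) segments into one tagged transcription.

    The "<spk:...>" tag syntax is an assumption made for illustration; the
    abstract only says each tag marks a segment spoken by a particular
    type of speaker.
    """
    return " ".join(f"<spk:{speaker}> {text}" for speaker, text in segments)


# Hypothetical training sample: audio paired with a segmented transcription.
sample = {
    "audio": "utt_001.wav",
    "segments": [
        ("doctor", "how are you feeling"),
        ("patient", "much better"),
    ],
}

# The re-labeled sample keeps the audio and replaces the plain transcription
# with the speaker-tagged one.
relabeled = {
    "audio": sample["audio"],
    "transcription": relabel_with_speaker_tags(sample["segments"]),
}
```

Samples from different domains could then be pooled into one training set, with the tags carrying the speaker-type information the model is meant to learn from.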
-
Patent number: 12066282
Abstract: A lighting stage includes a plurality of lights that project alternating spherical color gradient illumination patterns onto an object or human performer at a predetermined frequency. The lighting stage also includes a plurality of cameras that capture images of an object or human performer corresponding to the alternating spherical color gradient illumination patterns. The lighting stage also includes a plurality of depth sensors that capture depth maps of the object or human performer at the predetermined frequency. The lighting stage also includes (or is associated with) one or more processors that implement a machine learning algorithm to produce a three-dimensional (3D) model of the object or human performer. The 3D model includes relighting parameters used to relight the 3D model under different lighting conditions.
Type: Grant
Filed: November 11, 2020
Date of Patent: August 20, 2024
Assignee: GOOGLE LLC
Inventors: Sean Ryan Francesco Fanello, Kaiwen Guo, Peter Christopher Lincoln, Philip Lindsley Davidson, Jessica L. Busch, Xueming Yu, Geoffrey Harvey, Sergio Orts Escolano, Rohit Kumar Pandey, Jason Dourgarian, Danhang Tang, Adarsh Prakash Murthy Kowdle, Emily B. Cooper, Mingsong Dou, Graham Fyffe, Christoph Rhemann, Jonathan James Taylor, Shahram Izadi, Paul Ernest Debevec
-
Patent number: 12067145
Abstract: The disclosure herein describes processing consent data and using the processed consent data in workflows. Customer consent data is accessed, wherein the customer consent data includes subject consent instances including associated consent purpose-value pairs. The customer consent data is mapped to a raw consent data schema based on mapping selections made on a mapping UI, wherein the mapping includes mapping consent purpose-value pairs of the consent instances to data columns of the raw consent data schema. Metadata representing one or more consent rules related to the raw consent data schema is generated based on rule selections made on a rule configuration UI and the consent rules are applied to one or more workflows. The disclosure enables consent data in different formats and/or from different sources to be ingested and standardized in a single platform such that consent checking functionality can be provided for applications in a consistent and comprehensive manner.
Type: Grant
Filed: December 10, 2021
Date of Patent: August 20, 2024
Assignee: Microsoft Technology Licensing, LLC.
Inventors: Smith Codio, Anubhav Tandon, Patrick Meade Stirrat, Mukesh Pohuja, Gyan Prakash Trivedi, John Michael Bolinder, Rohit Sanka, Rong Zhou, Balasubramanian Shyamsundar, Harsha Bacharaju
-
Publication number: 20240265923
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Application
Filed: April 15, 2024
Publication date: August 8, 2024
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Patent number: 12051407
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Grant
Filed: July 26, 2022
Date of Patent: July 30, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Publication number: 20240221750
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
Type: Application
Filed: March 19, 2024
Publication date: July 4, 2024
Applicant: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
-
Patent number: 12027158
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
Type: Grant
Filed: February 6, 2023
Date of Patent: July 2, 2024
Assignee: Google LLC
Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
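As a rough illustration of the two context vectors this abstract mentions, the sketch below attends separately over acoustic encodings and over first-pass hypothesis encodings. The abstract does not specify the attention form or how the two vectors reach the decoder, so the dot-product scoring, the concatenation, and the names `attend` and `deliberation_inputs` are assumptions.

```python
import math


def attend(query, keys):
    """Dot-product attention: return the softmax-weighted average of `keys`."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [value / total for value in exps]
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def deliberation_inputs(decoder_state, acoustic_encodings, hypothesis_encodings):
    """Build the input a second-pass decoder might consume.

    Concatenating the two context vectors is an assumption; the abstract only
    says both are decoded by a context vector decoder.
    """
    acoustic_context = attend(decoder_state, acoustic_encodings)
    hypothesis_context = attend(decoder_state, hypothesis_encodings)
    return acoustic_context + hypothesis_context


# Toy values: a zero query gives uniform attention weights over each source.
combined = deliberation_inputs(
    decoder_state=[0.0, 0.0],
    acoustic_encodings=[[1.0, 0.0], [3.0, 2.0]],
    hypothesis_encodings=[[4.0, 4.0]],
)
```

In a real system the encodings and the decoder state would come from trained networks; the toy values here only exercise the arithmetic.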
-
Publication number: 20240185841
Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
Type: Application
Filed: October 20, 2023
Publication date: June 6, 2024
Applicant: Google LLC
Inventors: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman
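The idea of training added modules while the base model stays frozen can be shown on a deliberately tiny scalar model. This is a toy under stated assumptions, not the method of the application: the "model" is two fixed scalar weights, and the input and latent reprogramming modules are reduced to single additive offsets (`input_offset`, `latent_offset`), all hypothetical.

```python
# Frozen "model": y = W2 * (W1 * x). These weights are never updated.
W1, W2 = 1.0, 2.0


def forward(x, input_offset, latent_offset):
    """Run the frozen model with trainable offsets added at two points."""
    hidden = W1 * (x + input_offset)       # input reprogramming (assumed additive)
    return W2 * (hidden + latent_offset)   # latent reprogramming (assumed additive)


def train_offsets(data, steps=50, learning_rate=0.05):
    """Gradient descent on mean squared error, updating only the two offsets."""
    input_offset, latent_offset = 0.0, 0.0
    for _ in range(steps):
        grad_input, grad_latent = 0.0, 0.0
        for x, target in data:
            error = forward(x, input_offset, latent_offset) - target
            grad_input += 2.0 * error * W2 * W1 / len(data)
            grad_latent += 2.0 * error * W2 / len(data)
        input_offset -= learning_rate * grad_input
        latent_offset -= learning_rate * grad_latent
    return input_offset, latent_offset


# Toy "second language" task: targets follow 2x + 3, which the frozen model
# alone (2x) cannot produce without the offsets.
data = [(x, 2.0 * x + 3.0) for x in (0.0, 1.0, 2.0)]
input_offset, latent_offset = train_offsets(data)
```

Only the offsets change during training; `W1` and `W2` keep their original values, which is the property the abstract describes for the ASR model parameters.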
-
Patent number: 11990133
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Grant
Filed: July 7, 2023
Date of Patent: May 21, 2024
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Publication number: 20240153498
Abstract: A method includes receiving context biasing data that includes a set of unspoken textual utterances corresponding to a particular context. The method also includes obtaining a list of carrier phrases associated with the particular context. For each respective unspoken textual utterance, the method includes generating a corresponding training data pair that includes the respective unspoken textual utterance and a carrier phrase. For each respective training data pair, the method includes tokenizing the respective training data pair into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit, receiving the first higher order textual feature representation, and generating a first probability distribution over possible text units. The method also includes training a speech recognition model based on the first probability distribution over possible text units.
Type: Application
Filed: October 20, 2023
Publication date: May 9, 2024
Applicant: Google LLC
Inventors: Tara N. Sainath, Rohit Prakash Prabhavalkar, Diamantino Antonio Caseiro, Patrick Maxim Rondon, Cyril Allauzen
-
Publication number: 20240144917
Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription.
Type: Application
Filed: October 25, 2023
Publication date: May 2, 2024
Applicant: Google LLC
Inventors: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays
-
Patent number: 11954503
Abstract: The present invention provides for building a knowledgebase of dependencies between Configuration Items (CIs) associated with an IT computing environment. In operation, the present invention provides for mapping a plurality of Configuration Items (CIs) with respective one or more actions. The present invention further provides for tracking and capturing of one or more actions performed on one or more CIs in relation to resolving an activity related to a reported CI. Further, the present invention provides for identifying dependencies between one or more CIs and the reported CI based on the captured one or more actions. Furthermore, the present invention provides for building a knowledgebase of dependencies between CIs of the computing environment based on the identified dependencies between one or more CIs and the reported CI. Yet further, the present invention provides for generating visual representations of dependencies between CIs.
Type: Grant
Filed: May 12, 2022
Date of Patent: April 9, 2024
Assignee: COGNIZANT TECHNOLOGY SOLUTIONS INDIA PVT. LTD.
Inventors: Rohit Prakash, Rohan Prakash, Yogesh Sosale Gundurao, Ambarish Poojari, Ragini Suresh, Pooja Jagadish
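One simple way to picture the dependency-identification step is to treat every CI that was acted on while resolving an incident as a candidate dependency of the reported CI. That rule, the log format, and the `build_dependency_knowledgebase` helper are assumptions for illustration; the abstract does not state how dependencies are inferred from the captured actions.

```python
from collections import defaultdict


def build_dependency_knowledgebase(resolution_logs):
    """Map each reported CI to the other CIs acted on while resolving it.

    `resolution_logs` is a list of (reported_ci, actions) pairs, where each
    action is a (ci_name, action_name) tuple. This log shape is hypothetical.
    """
    knowledgebase = defaultdict(set)
    for reported_ci, actions in resolution_logs:
        for touched_ci, _action in actions:
            if touched_ci != reported_ci:
                knowledgebase[reported_ci].add(touched_ci)
    # Sort for stable output, e.g. when rendering a dependency graph.
    return {ci: sorted(deps) for ci, deps in knowledgebase.items()}


# Hypothetical captured actions from two resolved incidents.
logs = [
    ("web-server", [("web-server", "restart"), ("db-01", "clear-locks")]),
    ("web-server", [("load-balancer", "reload-config")]),
    ("db-01", [("storage-array", "extend-volume")]),
]
knowledgebase = build_dependency_knowledgebase(logs)
```

The resulting mapping is an adjacency list, so it could feed directly into a graph-drawing step for the visual representations the abstract mentions.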
-
Patent number: 11948570
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
Type: Grant
Filed: March 9, 2022
Date of Patent: April 2, 2024
Assignee: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
-
Patent number: 11948062
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
Type: Grant
Filed: December 4, 2020
Date of Patent: April 2, 2024
Assignee: Google LLC
Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
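The shared-projection structure in this abstract can be sketched as two low-rank products that reuse one projection matrix. The sketch assumes each full matrix is the product of its compressed matrix and the projection matrix; the abstract says only that each is "defined by" the pair. The shapes and the names `compressed_layer_outputs` and `parameter_counts` are likewise assumptions.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def compressed_layer_outputs(z_recurrent, z_inter, projection, hidden):
    """Apply both factored weight matrices to one hidden state.

    Assumed shapes: projection is rank x n, z_recurrent is n x rank,
    z_inter is m x rank. The projection is computed once and shared.
    """
    projected = matvec(projection, hidden)          # rank-sized vector
    recurrent_out = matvec(z_recurrent, projected)  # same layer, next time step
    inter_out = matvec(z_inter, projected)          # next layer, same time step
    return recurrent_out, inter_out


def parameter_counts(hidden_size, next_size, rank):
    """Return (uncompressed, compressed) weight counts for one layer."""
    uncompressed = hidden_size * hidden_size + next_size * hidden_size
    compressed = rank * hidden_size + hidden_size * rank + next_size * rank
    return uncompressed, compressed


# Toy example: n = 3 hidden units, m = 2 units in the next layer, rank 2.
recurrent_out, inter_out = compressed_layer_outputs(
    z_recurrent=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    z_inter=[[2.0, 0.0], [0.0, 2.0]],
    projection=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    hidden=[1.0, 2.0, 3.0],
)
```

Under these assumptions, a layer with 1024 hidden units feeding a 1024-unit layer at rank 256 would store 786,432 weights instead of 2,097,152, because the projection matrix is counted once rather than once per factored matrix.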
-
Patent number: 11942076
Abstract: A method includes receiving audio data encoding an utterance spoken by a native speaker of a first language, and receiving a biasing term list including one or more terms in a second language different than the first language. The method also includes processing, using a speech recognition model, acoustic features derived from the audio data to generate speech recognition scores for both wordpieces and corresponding phoneme sequences in the first language. The method also includes rescoring the speech recognition scores for the phoneme sequences based on the one or more terms in the biasing term list, and executing, using the speech recognition scores for the wordpieces and the rescored speech recognition scores for the phoneme sequences, a decoding graph to generate a transcription for the utterance.
Type: Grant
Filed: February 16, 2022
Date of Patent: March 26, 2024
Assignee: Google LLC
Inventors: Ke Hu, Golan Pundak, Rohit Prakash Prabhavalkar, Antoine Jean Bruguier, Tara N. Sainath
-
Patent number: 11922932
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses a set of speech recognition hypothesis samples, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
Type: Grant
Filed: March 31, 2023
Date of Patent: March 5, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Tara N. Sainath, Yonghui Wu, Patrick An Phu Nguyen, Zhifeng Chen, Chung-Cheng Chiu, Anjuli Patricia Kannan
-
Patent number: 11908461
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
Type: Grant
Filed: January 14, 2021
Date of Patent: February 20, 2024
Assignee: Google LLC
Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
-
Publication number: 20240028829
Abstract: A method includes receiving training data that includes a set of unspoken textual utterances. For each respective unspoken textual utterance, the method includes tokenizing the respective textual utterance into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit tokenized from the respective unspoken textual utterance, receiving the first higher order textual feature representation generated by a text encoder, and generating a first probability distribution over possible text units. The method also includes training an encoder based on the first probability distribution over possible text units generated by a first-pass decoder for each respective unspoken textual utterance in the set of unspoken textual utterances.
Type: Application
Filed: July 1, 2023
Publication date: January 25, 2024
Applicant: Google LLC
Inventors: Tara N. Sainath, Zhouyuan Huo, Zhehuai Chen, Yu Zhang, Weiran Wang, Trevor Strohman, Rohit Prakash Prabhavalkar, Bo Li, Ankur Bapna
-
Publication number: 20230352027
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Application
Filed: July 7, 2023
Publication date: November 2, 2023
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani