Patents by Inventor Rohit Prakash
Rohit Prakash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250118270
Abstract: A multi-mode display includes a mode selector to select one of a plurality of modes, each of the modes having a different light configuration, wherein one mode comprises a reduced color space mode, and one or more light sources controlled by the mode selector, the one or more light sources used to display content to a user with the multi-mode display.
Type: Application
Filed: November 19, 2024
Publication date: April 10, 2025
Applicant: Avegant Corp.
Inventors: Aaron Matthew Eash, Edward Chia Ning Tang, Warren Cornelius Welch, III, Christopher David Westra, Rohit Prakash, William Tze-Tse Chien
-
Patent number: 12254883
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Grant
Filed: April 15, 2024
Date of Patent: March 18, 2025
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Publication number: 20250078815
Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.
Type: Application
Filed: September 5, 2024
Publication date: March 6, 2025
Applicant: Google LLC
Inventors: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov
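The pruning and fixed-bit quantization described in the abstract above can be illustrated with a minimal plain-Python sketch. This is a generic symmetric per-tensor scheme chosen for illustration, not the patented method; the function names and the scale computation are assumptions.

```python
def prune(weights, sparsity_mask):
    """Zero out weights wherever the sparsity mask is 0."""
    return [w * m for w, m in zip(weights, sparsity_mask)]

def quantize_symmetric(weights, bit_width=8):
    """Map float weights onto integers with a fixed bit width using a
    symmetric per-tensor scale (an illustrative choice of scheme)."""
    qmax = 2 ** (bit_width - 1) - 1                 # 127 for 8 bits
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) for w in weights], scale

weights = [0.5, 1.0, 0.25, 0.9]
mask = [1, 0, 1, 1]                                 # one weight pruned
pruned = prune(weights, mask)                       # second weight becomes 0.0
ints, scale = quantize_symmetric(pruned)            # integers in [-127, 127]
recovered = [i * scale for i in ints]               # approximate float weights
```

At inference time only the integer weights and the single scale need to be stored, which is what enables the "native integer operations" the abstract mentions.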
-
Publication number: 20240420686
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
Type: Application
Filed: August 26, 2024
Publication date: December 19, 2024
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
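The attender step in the abstract above forms a context vector as an attention-weighted combination of encoder outputs. A minimal sketch follows, using generic dot-product scoring as an illustrative assumption (the patent does not commit to this scoring function):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(encoder_outputs, decoder_state):
    """Score each encoder frame against the decoder state (dot product),
    then return the context vector: the weighted sum of encoder frames."""
    scores = [sum(e * d for e, d in zip(enc, decoder_state))
              for enc in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    return [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
            for i in range(dim)]
```

The decoder would consume this context vector at each output step to produce speech recognition scores over word elements.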
-
Publication number: 20240411216
Abstract: A display system includes a segmented light source including a plurality of separately controlled segments to emit light, optics to direct the light, and a spatial light modulator (SLM) to modulate the light from the light source, wherein the light from one or more of the plurality of segments from the segmented light source is directed to the SLM through the optics, such that spatial mapping from the segments to the SLM is maintained.
Type: Application
Filed: June 6, 2024
Publication date: December 12, 2024
Applicant: Avegant Corp.
Inventors: Aaron Matthew Eash, Andrew John Gross, Christopher David Westra, Edward Chia Ning Tang, Kripa Bharat Patel, Rohit Prakash, William Tze-Tse Chien
-
Patent number: 12154519
Abstract: A multi-mode display includes a mode selector to select one of a plurality of modes, each of the modes having a different light configuration, wherein one mode comprises a reduced color space mode, and one or more light sources controlled by the mode selector, the one or more light sources used to display content to a user with the multi-mode display.
Type: Grant
Filed: April 4, 2023
Date of Patent: November 26, 2024
Assignee: Avegant Corp.
Inventors: Aaron Matthew Eash, Edward Chia Ning Tang, Warren Cornelius Welch, III, Christopher David Westra, Rohit Prakash, William Tze-Tse Chien
-
Publication number: 20240379095
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Application
Filed: July 23, 2024
Publication date: November 14, 2024
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Patent number: 12106749
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
Type: Grant
Filed: September 20, 2021
Date of Patent: October 1, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
-
Publication number: 20240304181
Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains.
Type: Application
Filed: March 7, 2024
Publication date: September 12, 2024
Applicant: Google LLC
Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
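The re-labeling step in the abstract above annotates each transcription with speaker tags. A toy sketch, in which the (speaker_type, text) segmentation and the <spk>…</spk> tag syntax are both illustrative assumptions rather than the patent's actual format:

```python
def relabel_with_speaker_tags(segments):
    """Annotate transcript segments with speaker tags.

    `segments` is a list of (speaker_type, text) pairs, e.g. produced by
    aligning a transcription with speaker turns. The tag syntax used here
    is a hypothetical convention for illustration only."""
    return " ".join(f"<{spk}> {text} </{spk}>" for spk, text in segments)

tagged = relabel_with_speaker_tags([("user", "turn it up"), ("agent", "okay")])
```

Training samples re-labeled this way let one model learn, from a single transcript stream, both what was said and which type of speaker said it.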
-
Publication number: 20240265923
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Application
Filed: April 15, 2024
Publication date: August 8, 2024
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Patent number: 12051407
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Grant
Filed: July 26, 2022
Date of Patent: July 30, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Publication number: 20240221750
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
Type: Application
Filed: March 19, 2024
Publication date: July 4, 2024
Applicant: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
-
Patent number: 12027158
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
Type: Grant
Filed: February 6, 2023
Date of Patent: July 2, 2024
Assignee: Google LLC
Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
-
Publication number: 20240185841
Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
Type: Application
Filed: October 20, 2023
Publication date: June 6, 2024
Applicant: Google LLC
Inventors: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman
-
Patent number: 11990133
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Grant
Filed: July 7, 2023
Date of Patent: May 21, 2024
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Publication number: 20240153498
Abstract: A method includes receiving context biasing data that includes a set of unspoken textual utterances corresponding to a particular context. The method also includes obtaining a list of carrier phrases associated with the particular context. For each respective unspoken textual utterance, the method includes generating a corresponding training data pair that includes the respective unspoken textual utterance and a carrier phrase. For each respective training data pair, the method includes tokenizing the respective training data pair into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit, receiving the first higher order textual feature representation, and generating a first probability distribution over possible text units. The method also includes training a speech recognition model based on the first probability distribution over possible text units.
Type: Application
Filed: October 20, 2023
Publication date: May 9, 2024
Applicant: Google LLC
Inventors: Tara N. Sainath, Rohit Prakash Prabhavalkar, Diamantino Antonio Caseiro, Patrick Maxim Rondon, Cyril Allauzen
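The pairing of unspoken textual utterances with carrier phrases described above can be sketched as a simple cross product. The "$PHRASE" placeholder convention used here is a hypothetical illustration, not the patent's notation:

```python
def make_training_pairs(unspoken_utterances, carrier_phrases):
    """Cross each unspoken textual utterance with each carrier phrase.

    Carrier phrases are assumed to contain a "$PHRASE" slot (an
    illustrative convention) marking where the utterance is inserted."""
    return [carrier.replace("$PHRASE", utt)
            for utt in unspoken_utterances
            for carrier in carrier_phrases]

pairs = make_training_pairs(["john smith"],
                            ["call $PHRASE", "text $PHRASE now"])
```

Each resulting string would then be tokenized into sub-word units and fed through the text-feature pipeline the abstract describes.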
-
Publication number: 20240144917
Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription.
Type: Application
Filed: October 25, 2023
Publication date: May 2, 2024
Applicant: Google LLC
Inventors: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays
-
Patent number: 11954503
Abstract: The present invention provides for building a knowledgebase of dependencies between Configuration Items (CIs) associated with an IT computing environment. In operation, the present invention provides for mapping a plurality of Configuration Items (CIs) with respective one or more actions. The present invention further provides for tracking and capturing of one or more actions performed on one or more CIs in relation to resolving an activity related to a reported CI. Further, the present invention provides for identifying dependencies between one or more CIs and the reported CI based on the captured one or more actions. Furthermore, the present invention provides for building a knowledgebase of dependencies between CIs of the computing environment based on the identified dependencies between one or more CIs and the reported CI. Yet further, the present invention provides for generating visual representations of dependencies between CIs.
Type: Grant
Filed: May 12, 2022
Date of Patent: April 9, 2024
Assignee: COGNIZANT TECHNOLOGY SOLUTIONS INDIA PVT. LTD.
Inventors: Rohit Prakash, Rohan Prakash, Yogesh Sosale Gundurao, Ambarish Poojari, Ragini Suresh, Pooja Jagadish
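The dependency-identification step described above can be sketched as follows, assuming each captured action reduces to a (reported CI, acted-on CI) pair; this data model is an illustrative assumption, not the patent's schema:

```python
from collections import defaultdict

def build_dependency_knowledgebase(action_log):
    """Derive CI dependencies from captured actions.

    `action_log` holds (reported_ci, acted_on_ci) pairs recorded while
    resolving activities. A dependency is inferred whenever resolving an
    activity on a reported CI required acting on a *different* CI."""
    deps = defaultdict(set)
    for reported, acted_on in action_log:
        if acted_on != reported:
            deps[reported].add(acted_on)
    return dict(deps)

kb = build_dependency_knowledgebase(
    [("app1", "db1"), ("app1", "vm3"), ("db1", "db1")]
)
```

The resulting mapping (reported CI to the set of CIs it was found to depend on) is exactly the kind of structure a visual dependency graph could be rendered from.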
-
Patent number: 11948062
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
Type: Grant
Filed: December 4, 2020
Date of Patent: April 2, 2024
Assignee: Google LLC
Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
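The compression described above replaces each full weight matrix with a compressed matrix times a shared projection matrix, i.e. a low-rank factorization. A minimal plain-Python sketch (the dimensions and values are illustrative):

```python
def matmul(a, b):
    """Plain-Python matrix multiply for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Rank-r factorization: each n-by-n weight matrix is defined as an
# n-by-r compressed matrix times a shared r-by-n projection (r << n).
compressed_recurrent = [[1.0], [2.0]]     # n=2, r=1
compressed_interlayer = [[0.5], [-1.0]]   # n=2, r=1
projection = [[3.0, 4.0]]                 # r-by-n projection, shared

recurrent_w = matmul(compressed_recurrent, projection)    # 2x2, rank 1
interlayer_w = matmul(compressed_interlayer, projection)  # 2x2, rank 1
# Storage drops from 2*n*n floats (two full matrices) to 3*n*r floats
# (two compressed matrices plus one shared projection) when r << n.
```

Sharing the projection between the recurrent and inter-layer matrices is what distinguishes this scheme from factoring each matrix independently.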
-
Patent number: 11948570
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
Type: Grant
Filed: March 9, 2022
Date of Patent: April 2, 2024
Assignee: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin