Patents by Inventor Rohit Prakash
Rohit Prakash has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250118270
Abstract: A multi-mode display includes a mode selector to select one of a plurality of modes, each of the modes having a different light configuration, wherein one mode comprises a reduced color space mode, and one or more light sources controlled by the mode selector, the one or more light sources used to display content to a user with the multi-mode display.
Type: Application
Filed: November 19, 2024
Publication date: April 10, 2025
Applicant: Avegant Corp.
Inventors: Aaron Matthew Eash, Edward Chia Ning Tang, Warren Cornelius Welch, III, Christopher David Westra, Rohit Prakash, William Tze-Tse Chien
-
Patent number: 12254883
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Grant
Filed: April 15, 2024
Date of Patent: March 18, 2025
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Publication number: 20250078815
Abstract: A method includes obtaining a plurality of training samples that each include a respective speech utterance and a respective textual utterance representing a transcription of the respective speech utterance. The method also includes fine-tuning, using quantization and sparsity aware training with native integer operations, a pre-trained automatic speech recognition (ASR) model on the plurality of training samples. Here, the pre-trained ASR model includes a plurality of weights and the fine-tuning includes pruning one or more weights of the plurality of weights using a sparsity mask and quantizing each weight of the plurality of weights based on an integer with a fixed-bit width. The method also includes providing the fine-tuned ASR model to a user device.
Type: Application
Filed: September 5, 2024
Publication date: March 6, 2025
Applicant: Google LLC
Inventors: Shaojin Ding, David Qiu, David Rim, Amir Yazdanbakhsh, Yanzhang He, Zhonglin Han, Rohit Prakash Prabhavalkar, Weiran Wang, Bo Li, Jian Li, Tara N. Sainath, Shivani Agrawal, Oleg Rybakov
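The pruning and fixed-bit quantization described in the abstract above can be illustrated with a minimal plain-Python sketch. This is a generic symmetric per-tensor scheme chosen for illustration, not the patented method; the function names and the scale computation are assumptions.

```python
def prune(weights, sparsity_mask):
    """Zero out weights wherever the sparsity mask is 0."""
    return [w * m for w, m in zip(weights, sparsity_mask)]

def quantize_symmetric(weights, bit_width=8):
    """Map float weights onto integers with a fixed bit width using a
    symmetric per-tensor scale (an illustrative choice of scheme)."""
    qmax = 2 ** (bit_width - 1) - 1                 # 127 for 8 bits
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) for w in weights], scale

weights = [0.5, 1.0, 0.25, 0.9]
mask = [1, 0, 1, 1]                                 # one weight pruned
pruned = prune(weights, mask)                       # second weight becomes 0.0
ints, scale = quantize_symmetric(pruned)            # integers in [-127, 127]
recovered = [i * scale for i in ints]               # approximate float weights
```

At inference time only the integer weights and the single scale need to be stored, which is what enables the "native integer operations" the abstract mentions.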
-
Publication number: 20240420686
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
Type: Application
Filed: August 26, 2024
Publication date: December 19, 2024
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
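The attender step in the abstract above forms a context vector as an attention-weighted combination of encoder outputs. A minimal sketch follows, using generic dot-product scoring as an illustrative assumption (the patent does not commit to this scoring function):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(encoder_outputs, decoder_state):
    """Score each encoder frame against the decoder state (dot product),
    then return the context vector: the weighted sum of encoder frames."""
    scores = [sum(e * d for e, d in zip(enc, decoder_state))
              for enc in encoder_outputs]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    return [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs))
            for i in range(dim)]
```

The decoder would consume this context vector at each output step to produce speech recognition scores over word elements.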
-
Publication number: 20240411216
Abstract: A display system includes a segmented light source including a plurality of separately controlled segments to emit light, optics to direct the light, and a spatial light modulator (SLM) to modulate the light from the light source, wherein the light from one or more of the plurality of segments from the segmented light source is directed to the SLM through the optics, such that spatial mapping from the segments to the SLM is maintained.
Type: Application
Filed: June 6, 2024
Publication date: December 12, 2024
Applicant: Avegant Corp.
Inventors: Aaron Matthew Eash, Andrew John Gross, Christopher David Westra, Edward Chia Ning Tang, Kripa Bharat Patel, Rohit Prakash, William Tze-Tse Chien
-
Patent number: 12154519
Abstract: A multi-mode display includes a mode selector to select one of a plurality of modes, each of the modes having a different light configuration, wherein one mode comprises a reduced color space mode, and one or more light sources controlled by the mode selector, the one or more light sources used to display content to a user with the multi-mode display.
Type: Grant
Filed: April 4, 2023
Date of Patent: November 26, 2024
Assignee: Avegant Corp.
Inventors: Aaron Matthew Eash, Edward Chia Ning Tang, Warren Cornelius Welch, III, Christopher David Westra, Rohit Prakash, William Tze-Tse Chien
-
Publication number: 20240379095
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Application
Filed: July 23, 2024
Publication date: November 14, 2024
Applicant: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Patent number: 12106749
Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.
Type: Grant
Filed: September 20, 2021
Date of Patent: October 1, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
-
Publication number: 20240304181
Abstract: A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the multiple different domains.
Type: Application
Filed: March 7, 2024
Publication date: September 12, 2024
Applicant: Google LLC
Inventors: Guru Prakash Arumugam, Shuo-yiin Chang, Shaan Jagdeep Patrick Bijwadia, Weiran Wang, Quan Wang, Rohit Prakash Prabhavalkar, Tara N. Sainath
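The re-labeling step in the abstract above annotates each transcription with speaker tags. A toy sketch, in which the (speaker_type, text) segmentation and the <spk>…</spk> tag syntax are both illustrative assumptions rather than the patent's actual format:

```python
def relabel_with_speaker_tags(segments):
    """Annotate transcript segments with speaker tags.

    `segments` is a list of (speaker_type, text) pairs, e.g. produced by
    aligning a transcription with speaker turns. The tag syntax used here
    is a hypothetical convention for illustration only."""
    return " ".join(f"<{spk}> {text} </{spk}>" for spk, text in segments)

tagged = relabel_with_speaker_tags([("user", "turn it up"), ("agent", "okay")])
```

Training samples re-labeled this way let one model learn, from a single transcript stream, both what was said and which type of speaker said it.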
-
Publication number: 20240265923
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Application
Filed: April 15, 2024
Publication date: August 8, 2024
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Patent number: 12051407
Abstract: A method includes receiving audio data encoding an utterance and obtaining a set of bias phrases corresponding to a context of the utterance. Each bias phrase includes one or more words. The method also includes processing, using a speech recognition model, acoustic features derived from the audio to generate an output from the speech recognition model. The speech recognition model includes a first encoder configured to receive the acoustic features, a first attention module, a bias encoder configured to receive data indicating the obtained set of bias phrases, a bias attention module, and a decoder configured to determine likelihoods of sequences of speech elements based on output of the first attention module and output of the bias attention module. The method also includes determining a transcript for the utterance based on the likelihoods of sequences of speech elements.
Type: Grant
Filed: July 26, 2022
Date of Patent: July 30, 2024
Assignee: Google LLC
Inventors: Rohit Prakash Prabhavalkar, Golan Pundak, Tara N. Sainath
-
Publication number: 20240221750
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
Type: Application
Filed: March 19, 2024
Publication date: July 4, 2024
Applicant: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin
-
Patent number: 12027158
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector. The method also includes decoding the first context vector and the second context vector at a context vector decoder to form a second-pass hypothesis.
Type: Grant
Filed: February 6, 2023
Date of Patent: July 2, 2024
Assignee: Google LLC
Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
-
Publication number: 20240185841
Abstract: A method includes obtaining an ASR model trained to recognize speech in a first language and receiving transcribed training utterances in a second language. The method also includes integrating the ASR model with an input reprogramming module and a latent reprogramming module. The method also includes adapting the ASR model to learn how to recognize speech in the second language by training the input reprogramming module and the latent reprogramming module while parameters of the ASR model are frozen.
Type: Application
Filed: October 20, 2023
Publication date: June 6, 2024
Applicant: Google LLC
Inventors: Bo Li, Yu Zhang, Nanxin Chen, Rohit Prakash Prabhavalkar, Chao-Han Huck Yang, Tara N. Sainath, Trevor Strohman
-
Patent number: 11990133
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for an automated calling system are disclosed. In one aspect, a method includes the actions of receiving audio data of an utterance spoken by a user who is having a telephone conversation with a bot. The actions further include determining a context of the telephone conversation. The actions further include determining a user intent of a first previous portion of the telephone conversation spoken by the user and a bot intent of a second previous portion of the telephone conversation outputted by a speech synthesizer of the bot. The actions further include, based on the audio data of the utterance, the context of the telephone conversation, the user intent, and the bot intent, generating synthesized speech of a reply by the bot to the utterance. The actions further include providing, for output, the synthesized speech.
Type: Grant
Filed: July 7, 2023
Date of Patent: May 21, 2024
Assignee: GOOGLE LLC
Inventors: Asaf Aharoni, Arun Narayanan, Nir Shabat, Parisa Haghani, Galen Tsai Chuang, Yaniv Leviathan, Neeraj Gaur, Pedro J. Moreno Mengibar, Rohit Prakash Prabhavalkar, Zhongdi Qu, Austin Severn Waters, Tomer Amiaz, Michiel A. U. Bacchiani
-
Publication number: 20240153498
Abstract: A method includes receiving context biasing data that includes a set of unspoken textual utterances corresponding to a particular context. The method also includes obtaining a list of carrier phrases associated with the particular context. For each respective unspoken textual utterance, the method includes generating a corresponding training data pair that includes the respective unspoken textual utterance and a carrier phrase. For each respective training data pair, the method includes tokenizing the respective training data pair into a sequence of sub-word units, generating a first higher order textual feature representation for a corresponding sub-word unit, receiving the first higher order textual feature representation, and generating a first probability distribution over possible text units. The method also includes training a speech recognition model based on the first probability distribution over possible text units.
Type: Application
Filed: October 20, 2023
Publication date: May 9, 2024
Applicant: Google LLC
Inventors: Tara N. Sainath, Rohit Prakash Prabhavalkar, Diamantino Antonio Caseiro, Patrick Maxim Rondon, Cyril Allauzen
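The pairing of unspoken textual utterances with carrier phrases described above can be sketched as a simple cross product. The "$PHRASE" placeholder convention used here is a hypothetical illustration, not the patent's notation:

```python
def make_training_pairs(unspoken_utterances, carrier_phrases):
    """Cross each unspoken textual utterance with each carrier phrase.

    Carrier phrases are assumed to contain a "$PHRASE" slot (an
    illustrative convention) marking where the utterance is inserted."""
    return [carrier.replace("$PHRASE", utt)
            for utt in unspoken_utterances
            for carrier in carrier_phrases]

pairs = make_training_pairs(["john smith"],
                            ["call $PHRASE", "text $PHRASE now"])
```

Each resulting string would then be tokenized into sub-word units and fed through the text-feature pipeline the abstract describes.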
-
Publication number: 20240144917
Abstract: A method includes obtaining a base encoder from a pre-trained model, and receiving training data comprising a sequence of acoustic frames characterizing an utterance paired with a ground-truth transcription of the utterance. At each of a plurality of output steps, the method includes: generating, by the base encoder, a first encoded representation for a corresponding acoustic frame; generating, by an exporter network configured to receive a continuous sequence of first encoded representations generated by the base encoder, a second encoded representation for a corresponding acoustic frame; generating, by an exporter decoder, a probability distribution over possible logits; and determining an exporter decoder loss based on the probability distribution over possible logits generated by the exporter decoder at the corresponding output step and the ground-truth transcription.
Type: Application
Filed: October 25, 2023
Publication date: May 2, 2024
Applicant: Google LLC
Inventors: Rami Magdi Fahmi Botros, Rohit Prakash Prabhavalkar, Johan Schalkwyk, Tara N. Sainath, Ciprian Ioan Chelba, Francoise Beaufays
-
Patent number: 11954503
Abstract: The present invention provides for building a knowledgebase of dependencies between Configuration Items (CIs) associated with an IT computing environment. In operation, the present invention provides for mapping a plurality of Configuration Items (CIs) with respective one or more actions. The present invention further provides for tracking and capturing of one or more actions performed on one or more CIs in relation to resolving an activity related to a reported CI. Further, the present invention provides for identifying dependencies between one or more CIs and the reported CI based on the captured one or more actions. Furthermore, the present invention provides for building a knowledgebase of dependencies between CIs of the computing environment based on the identified dependencies between one or more CIs and the reported CI. Yet further, the present invention provides for generating visual representations of dependencies between CIs.
Type: Grant
Filed: May 12, 2022
Date of Patent: April 9, 2024
Assignee: COGNIZANT TECHNOLOGY SOLUTIONS INDIA PVT. LTD.
Inventors: Rohit Prakash, Rohan Prakash, Yogesh Sosale Gundurao, Ambarish Poojari, Ragini Suresh, Pooja Jagadish
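The dependency-identification step described above can be sketched as follows, assuming each captured action reduces to a (reported CI, acted-on CI) pair; this data model is an illustrative assumption, not the patent's schema:

```python
from collections import defaultdict

def build_dependency_knowledgebase(action_log):
    """Derive CI dependencies from captured actions.

    `action_log` holds (reported_ci, acted_on_ci) pairs recorded while
    resolving activities. A dependency is inferred whenever resolving an
    activity on a reported CI required acting on a *different* CI."""
    deps = defaultdict(set)
    for reported, acted_on in action_log:
        if acted_on != reported:
            deps[reported].add(acted_on)
    return dict(deps)

kb = build_dependency_knowledgebase(
    [("app1", "db1"), ("app1", "vm3"), ("db1", "db1")]
)
```

The resulting mapping (reported CI to the set of CIs it was found to depend on) is exactly the kind of structure a visual dependency graph could be rendered from.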
-
Patent number: 11948062
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
Type: Grant
Filed: December 4, 2020
Date of Patent: April 2, 2024
Assignee: Google LLC
Inventors: Ouais Alsharif, Rohit Prakash Prabhavalkar, Ian C. McGraw, Antoine Jean Bruguier
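The compression described above replaces each full weight matrix with a compressed matrix times a shared projection matrix, i.e. a low-rank factorization. A minimal plain-Python sketch (the dimensions and values are illustrative):

```python
def matmul(a, b):
    """Plain-Python matrix multiply for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Rank-r factorization: each n-by-n weight matrix is defined as an
# n-by-r compressed matrix times a shared r-by-n projection (r << n).
compressed_recurrent = [[1.0], [2.0]]     # n=2, r=1
compressed_interlayer = [[0.5], [-1.0]]   # n=2, r=1
projection = [[3.0, 4.0]]                 # r-by-n projection, shared

recurrent_w = matmul(compressed_recurrent, projection)    # 2x2, rank 1
interlayer_w = matmul(compressed_interlayer, projection)  # 2x2, rank 1
# Storage drops from 2*n*n floats (two full matrices) to 3*n*r floats
# (two compressed matrices plus one shared projection) when r << n.
```

Sharing the projection between the recurrent and inter-layer matrices is what distinguishes this scheme from factoring each matrix independently.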
-
Patent number: 11948570
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting utterances of a key phrase in an audio signal. One of the methods includes receiving, by a key phrase spotting system, an audio signal encoding one or more utterances; while continuing to receive the audio signal, generating, by the key phrase spotting system, an attention output using an attention mechanism that is configured to compute the attention output based on a series of encodings generated by an encoder comprising one or more neural network layers; generating, by the key phrase spotting system and using the attention output, output that indicates whether the audio signal likely encodes the key phrase; and providing, by the key phrase spotting system, the output that indicates whether the audio signal likely encodes the key phrase.
Type: Grant
Filed: March 9, 2022
Date of Patent: April 2, 2024
Assignee: Google LLC
Inventors: Wei Li, Rohit Prakash Prabhavalkar, Kanury Kanishka Rao, Yanzhang He, Ian C. McGraw, Anton Bakhtin