Patents by Inventor Rami Botros

Rami Botros has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

TIED AND REDUCED RNN-T

Publication number: 20230352006

Abstract: An RNN-T model includes a prediction network configured to, at each of a plurality of time steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol, the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.
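
The prediction-network step described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the dot product as the similarity measure, the plain average used to form the single embedding vector, and the names prediction_network_step, shared_embeddings, and position_vectors are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product, used here as the (assumed) similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def prediction_network_step(
    symbols: Sequence[int],
    shared_embeddings: Sequence[Vector],
    position_vectors: Sequence[Vector],
) -> Vector:
    """Combine a non-empty history of non-blank symbols into one vector.

    symbols[p] is the symbol id at history position p. Each embedding is
    looked up in the shared matrix, weighted by its similarity to the
    position vector for p, and the weighted embeddings are averaged
    (the averaging is an assumption, not taken from the abstract).
    """
    dim = len(shared_embeddings[0])
    combined = [0.0] * dim
    for position, symbol in enumerate(symbols):
        embedding = shared_embeddings[symbol]
        weight = dot(embedding, position_vectors[position])
        for i in range(dim):
            combined[i] += weight * embedding[i]
    return [value / len(symbols) for value in combined]
```

For example, with a two-symbol vocabulary embedded as [1, 0] and [0, 1] and matching position vectors, the history [0, 1] yields the single vector [0.5, 0.5], which a joint network would then consume.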

Type: Application

Filed: July 6, 2023

Publication date: November 2, 2023

Applicant: Google LLC

Inventors: Rami Botros, Tara Sainath

EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL

Publication number: 20230343328

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.
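
The final rescoring stage in this abstract can be sketched as follows. The abstract does not say how the language model combines its scores with the decoder's, so the log-linear interpolation, the lm_weight parameter, and the function name rescore are assumptions chosen only to make the data flow concrete.

```python
import math
from typing import Dict


def rescore(
    decoder_probs: Dict[str, float],
    lm_probs: Dict[str, float],
    lm_weight: float = 0.5,
) -> Dict[str, float]:
    """Return a rescored distribution over the decoder's hypotheses.

    Assumes every probability is strictly positive. Scores are combined
    as log p_decoder + lm_weight * log p_lm (an assumed scheme), then
    renormalized so the result sums to 1.
    """
    scores = {
        hyp: math.log(p) + lm_weight * math.log(lm_probs[hyp])
        for hyp, p in decoder_probs.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores.values())
    unnormalized = {hyp: math.exp(s - top) for hyp, s in scores.items()}
    total = sum(unnormalized.values())
    return {hyp: value / total for hyp, value in unnormalized.items()}
```

With two hypotheses that the decoder rates equally, the language model's preference decides the ranking of the rescored distribution.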

Type: Application

Filed: June 16, 2023

Publication date: October 26, 2023

Applicant: Google LLC

Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman

Unified Cascaded Encoder ASR model for Dynamic Model Sizes

Publication number: 20230326461

Abstract: An automated speech recognition (ASR) model includes a first encoder, a first decoder, a second encoder, and a second decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The first decoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a first probability distribution over possible speech recognition hypotheses. The second encoder receives, as input, the first higher order feature representation generated by the first encoder, and generates a second higher order feature representation for a corresponding first higher order feature frame. The second decoder receives, as input, the second higher order feature representation generated by the second encoder, and generates a second probability distribution over possible speech recognition hypotheses.
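
The wiring of the four components in this abstract can be sketched as below. The components are passed in as plain callables, and the use_second_pass flag is a hypothetical way of choosing between the smaller and larger path; the abstract does not describe how a path is selected.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frames = List[List[float]]      # one feature vector per frame
Hypotheses = Dict[str, float]   # hypothesis text -> probability


@dataclass
class CascadedASR:
    first_encoder: Callable[[Frames], Frames]
    first_decoder: Callable[[Frames], Hypotheses]
    second_encoder: Callable[[Frames], Frames]
    second_decoder: Callable[[Frames], Hypotheses]

    def recognize(self, acoustic_frames: Frames, use_second_pass: bool) -> Hypotheses:
        # The first encoder always runs on the raw acoustic frames.
        first_features = self.first_encoder(acoustic_frames)
        if not use_second_pass:
            # Smaller path: decode directly from the first encoder output.
            return self.first_decoder(first_features)
        # Larger path: the second encoder consumes the first encoder's
        # output, and the second decoder produces the distribution.
        second_features = self.second_encoder(first_features)
        return self.second_decoder(second_features)
```

Because both paths share the first encoder, the same object can serve either a small or a large configuration without duplicating that stage.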

Type: Application

Filed: March 13, 2023

Publication date: October 12, 2023

Applicant: Google LLC

Inventors: Shaojin Ding, Yanzhang He, Xin Wang, Weiran Wang, Trevor Strohman, Tara N. Sainath, Rohit Parkash Prabhavalkar, Robert David, Rina Panigrahy, Rami Botros, Qiao Liang, Ian McGraw, Ding Zhao, Dongseong Hwang

Tied and reduced RNN-T

Patent number: 11727920

Abstract: An RNN-T model includes a prediction network configured to, at each of a plurality of time steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol, the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.

Type: Grant

Filed: May 26, 2021

Date of Patent: August 15, 2023

Assignee: Google LLC

Inventors: Rami Botros, Tara Sainath

Efficient streaming non-recurrent on-device end-to-end model

Patent number: 11715458

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.

Type: Grant

Filed: May 10, 2021

Date of Patent: August 1, 2023

Assignee: Google LLC

Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman

Optimizing Inference Performance for Conformer

Publication number: 20230130634

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.
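
The idea of causal linear attention computed as a recurrence, which this abstract refers to, can be sketched as follows. This is not the Performer module itself: Performer approximates softmax attention with random features, whereas this sketch swaps in the simpler elu(x) + 1 feature map so that it stays short. The function names are assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def feature_map(x: Sequence[float]) -> Vector:
    """elu(x) + 1: a positive feature map (assumed stand-in for Performer's)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(
    queries: Sequence[Vector], keys: Sequence[Vector], values: Sequence[Vector]
) -> List[Vector]:
    """Attend to past and current frames only, in time linear in sequence length.

    A running state accumulates phi(k) * v^T and a running normalizer
    accumulates phi(k), so each step costs the same regardless of how
    many frames came before it.
    """
    feat_dim, val_dim = len(keys[0]), len(values[0])
    state = [[0.0] * val_dim for _ in range(feat_dim)]
    normalizer = [0.0] * feat_dim
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(feat_dim):
            normalizer[i] += phi_k[i]
            for j in range(val_dim):
                state[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(a * b for a, b in zip(phi_q, normalizer))
        outputs.append(
            [sum(phi_q[i] * state[i][j] for i in range(feat_dim)) / denom
             for j in range(val_dim)]
        )
    return outputs
```

Because the state has a fixed size, a streaming encoder can process each new frame without revisiting earlier ones, which is the property that makes the recurrent form attractive for inference.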

Type: Application

Filed: September 29, 2022

Publication date: April 27, 2023

Applicant: Google LLC

Inventors: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu

Tied and Reduced RNN-T

Publication number: 20220310071

Abstract: An RNN-T model includes a prediction network configured to, at each of a plurality of time steps subsequent to an initial time step, receive a sequence of non-blank symbols. For each non-blank symbol, the prediction network is also configured to generate, using a shared embedding matrix, an embedding of the corresponding non-blank symbol, assign a respective position vector to the corresponding non-blank symbol, and weight the embedding proportional to a similarity between the embedding and the respective position vector. The prediction network is also configured to generate a single embedding vector at the corresponding time step. The RNN-T model also includes a joint network configured to, at each of the plurality of time steps subsequent to the initial time step, receive the single embedding vector generated as output from the prediction network at the corresponding time step and generate a probability distribution over possible speech recognition hypotheses.

Type: Application

Filed: May 26, 2021

Publication date: September 29, 2022

Applicant: Google LLC

Inventors: Rami Botros, Tara Sainath

Efficient Streaming Non-Recurrent On-Device End-to-End Model

Publication number: 20220310062

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.

Type: Application

Filed: May 10, 2021

Publication date: September 29, 2022

Applicant: Google LLC

Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman

System and method for ranking of hybrid speech recognition results with neural networks

Patent number: 10170110

Abstract: A method for ranking candidate speech recognition results includes generating, with a controller, a plurality of feature vectors for the candidate speech recognition results, each feature vector including one or more of trigger pair features, a confidence score feature, and word-level features. The method further includes providing the plurality of feature vectors as inputs to a neural network, generating a plurality of ranking scores corresponding to the plurality of feature vectors for the plurality of candidate speech recognition results based on an output layer of the neural network, and operating the automated system using the candidate speech recognition result in the plurality of candidate speech recognition results corresponding to a highest ranking score in the plurality of ranking scores as input.
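
The ranking flow in this abstract (feature vectors in, one ranking score per candidate out, highest score wins) can be sketched as below. The single hidden layer with ReLU activation, the weight layout, and the names ranking_score and pick_best are assumptions; the abstract does not specify the network's architecture or how its features are computed.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def ranking_score(
    features: Sequence[float],
    hidden_weights: Sequence[Vector],
    output_weights: Sequence[float],
) -> float:
    """Score one candidate with a tiny feedforward network (assumed shape)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)))  # ReLU units
        for row in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


def pick_best(
    candidates: Sequence[str],
    feature_vectors: Sequence[Vector],
    hidden_weights: Sequence[Vector],
    output_weights: Sequence[float],
) -> Tuple[str, List[float]]:
    """Return the candidate with the highest ranking score, plus all scores."""
    scores = [
        ranking_score(f, hidden_weights, output_weights) for f in feature_vectors
    ]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores
```

In the method described, the selected candidate would then be passed to the automated system as its input.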

Type: Grant

Filed: November 17, 2016

Date of Patent: January 1, 2019

Assignee: Robert Bosch GmbH

Inventors: Zhengyu Zhou, Rami Botros

System And Method For Ranking of Hybrid Speech Recognition Results With Neural Networks

Publication number: 20180137857

Abstract: A method for ranking candidate speech recognition results includes generating, with a controller, a plurality of feature vectors for the candidate speech recognition results, each feature vector including one or more of trigger pair features, a confidence score feature, and word-level features. The method further includes providing the plurality of feature vectors as inputs to a neural network, generating a plurality of ranking scores corresponding to the plurality of feature vectors for the plurality of candidate speech recognition results based on an output layer of the neural network, and operating the automated system using the candidate speech recognition result in the plurality of candidate speech recognition results corresponding to a highest ranking score in the plurality of ranking scores as input.

Type: Application

Filed: November 17, 2016

Publication date: May 17, 2018

Applicant: Robert Bosch GmbH

Inventors: Zhengyu Zhou, Rami Botros