Patents by Inventor Trevor Strohman

Trevor Strohman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Optimizing Inference Performance for Conformer

Publication number: 20230130634

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.

Type: Application

Filed: September 29, 2022

Publication date: April 27, 2023

Applicant: Google LLC

Inventors: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu
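The abstract says each causal encoder layer applies linear attention. The minimal sketch below shows why that suits streaming: only two running sums are kept, so the cost per frame does not grow with the number of frames already seen. It assumes the feature map φ(x) = elu(x) + 1 from the linear-transformer literature in place of the Performer's random-feature (FAVOR+) approximation. `causal_linear_attention` is a hypothetical name, and the sketch is not the patented module.

```python
import math


def feature_map(x):
    # elu(x) + 1 keeps every feature strictly positive
    # (assumed stand-in for Performer's random-feature map).
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def causal_linear_attention(queries, keys, values):
    """Causal linear attention over a sequence of frames.

    Keeps running sums S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j),
    so output t depends only on frames 0..t.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    s = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer products
    z = [0.0] * d_k                        # running sum of key features
    outputs = []
    for q, k, v in zip(queries, keys, values):
        phi_k = feature_map(k)
        for i in range(d_k):
            z[i] += phi_k[i]
            for j in range(d_v):
                s[i][j] += phi_k[i] * v[j]
        phi_q = feature_map(q)
        denom = sum(phi_q[i] * z[i] for i in range(d_k))
        outputs.append([sum(phi_q[i] * s[i][j] for i in range(d_k)) / denom
                        for j in range(d_v)])
    return outputs
```

For example, with `frames = [[0.1, -0.2], [0.3, 0.0]]`, `causal_linear_attention(frames, frames, frames)` returns one vector per frame. The first output matches the first value vector up to rounding, because the first frame can attend only to itself.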
Transducer-Based Streaming Deliberation for Cascaded Encoders

Publication number: 20230109407

Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.

Type: Application

Filed: September 19, 2022

Publication date: April 6, 2023

Applicant: Google LLC

Inventors: Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman
Disfluency Detection Models for Natural Conversational Voice Systems

Publication number: 20230107450

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances. At each of a plurality of output steps, the method also includes generating, by an encoder network of a speech recognition model, a higher order feature representation for a corresponding acoustic frame of the sequence of acoustic frames, generating, by a prediction network of the speech recognition model, a hidden representation for a corresponding sequence of non-blank symbols output by a final softmax layer of the speech recognition model, and generating, by a first joint network of the speech recognition model that receives the higher order feature representation generated by the encoder network and the hidden representation generated by the prediction network, a probability distribution that the corresponding time step corresponds to a pause and an end of speech.

Type: Application

Filed: August 26, 2022

Publication date: April 6, 2023

Applicant: Google LLC

Inventors: Shuo-yiin Chang, Bo Li, Tara N. Sainath, Trevor Strohman, Chao Zhang
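The joint network in this abstract merges the encoder's higher order feature representation with the prediction network's representation into a probability distribution. A common form in transducer models is an additive joint network; the sketch below assumes that form, a hypothetical three-way label set (`none`, `pause`, `end_of_speech`), and caller-supplied weight matrices. It illustrates the data flow only and is not the claimed architecture.

```python
import math

LABELS = ["none", "pause", "end_of_speech"]  # hypothetical output classes


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_network(enc, pred, w_enc, w_pred, w_out):
    """Additive joint network: softmax(W_out . tanh(W_enc . enc + W_pred . pred))."""
    hidden = [math.tanh(a + b)
              for a, b in zip(matvec(w_enc, enc), matvec(w_pred, pred))]
    return dict(zip(LABELS, softmax(matvec(w_out, hidden))))


# Usage with arbitrary 2-d representations; w_out has one row per label.
probs = joint_network(
    enc=[0.5, -1.0],
    pred=[0.2, 0.3],
    w_enc=[[1.0, 0.0], [0.0, 1.0]],
    w_pred=[[1.0, 0.0], [0.0, 1.0]],
    w_out=[[0.0, 0.0], [1.0, 0.0], [0.0, -1.0]],
)
```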
Language Agnostic Multilingual End-To-End Streaming On-Device ASR System

Publication number: 20230108275

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

Type: Application

Filed: September 22, 2022

Publication date: April 6, 2023

Applicant: Google LLC

Inventors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
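The abstract names four frame classes (speech, initial silence, intermediate silence, final silence) but not how training targets are produced. Assuming per-frame speech/non-speech flags are already available (for example from a forced alignment), the classes can be derived from each frame's position relative to the first and last speech frames. `label_frames` is a hypothetical helper.

```python
def label_frames(is_speech):
    """Map per-frame speech flags to the four classes named in the abstract."""
    speech_idx = [i for i, s in enumerate(is_speech) if s]
    if not speech_idx:
        # Assumption: with no speech at all, every frame counts as initial silence.
        return ["initial_silence"] * len(is_speech)
    first, last = speech_idx[0], speech_idx[-1]
    labels = []
    for i, s in enumerate(is_speech):
        if s:
            labels.append("speech")
        elif i < first:
            labels.append("initial_silence")        # before any speech
        elif i > last:
            labels.append("final_silence")          # after the last speech
        else:
            labels.append("intermediate_silence")   # a pause between speech
    return labels
```

For instance, `label_frames([False, True, False, True, False])` yields initial silence, speech, intermediate silence, speech, final silence.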
TRAINING FOR LONG-FORM SPEECH RECOGNITION

Publication number: 20230103382

Abstract: A method includes obtaining a set of training samples, wherein each training sample includes a corresponding sequence of speech segments corresponding to a training utterance and a corresponding sequence of ground-truth transcriptions for the sequence of speech segments, and wherein each ground-truth transcription includes a start time and an end time of a corresponding speech segment. For each training sample in the set of training samples, the method includes processing, using a speech recognition model, the corresponding sequence of speech segments to obtain one or more speech recognition hypotheses for the training utterance; and, for each speech recognition hypothesis obtained for the training utterance, identifying a respective number of word errors relative to the corresponding sequence of ground-truth transcriptions.

Type: Application

Filed: September 27, 2022

Publication date: April 6, 2023

Applicant: Google LLC

Inventors: Zhiyun Lu, Thibault Doutre, Yanwei Pan, Liangliang Cao, Rohit Prabhavalkar, Trevor Strohman, Chao Zhang
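The last step of this abstract counts word errors for each hypothesis relative to the sequence of ground-truth transcriptions. A minimal sketch, assuming the segment transcriptions are joined with spaces and that word errors are the word-level Levenshtein distance (substitutions, insertions, deletions); `word_errors` is a hypothetical name.

```python
def word_errors(hypothesis, reference_segments):
    """Count substitutions + insertions + deletions against the joined segments."""
    ref = " ".join(reference_segments).split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of a reference word
                          curr[j - 1] + 1,         # insertion of a hypothesis word
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[-1]
```

`word_errors("the cat sat", ["the cat", "sat on"])` returns 1 (one deleted word). The abstract does not say how these counts enter the training loss, so the sketch stops at counting.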
Attention-based joint acoustic and text on-device end-to-end model

Patent number: 11594212

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

Type: Grant

Filed: January 21, 2021

Date of Patent: February 28, 2023

Assignee: Google LLC

Inventors: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman
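For an unpaired text sequence there is no audio to attend to, so the abstract computes a cross-entropy loss using a context vector and updates that vector. The sketch below assumes a deliberately tiny stand-in decoder, p = softmax(W·c), in which the same context vector c replaces the attention context at every step, and takes one gradient step on c only; the abstract also updates the LAS decoder, which is omitted. All names are hypothetical.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def unpaired_text_step(target_ids, context, w, lr=0.1):
    """One gradient step on the context vector for an unpaired text sequence.

    Toy decoder (hypothetical): p(token) = softmax(W @ context) at every step.
    Returns the loss before the update and the updated context vector.
    """
    logits = [sum(wi * ci for wi, ci in zip(row, context)) for row in w]
    probs = softmax(logits)
    loss = -sum(math.log(probs[t]) for t in target_ids)  # cross entropy
    grad = [0.0] * len(context)
    for t in target_ids:
        for k, p in enumerate(probs):
            coeff = p - (1.0 if k == t else 0.0)  # dLoss/dlogit_k for target t
            for j in range(len(context)):
                grad[j] += coeff * w[k][j]        # chain rule through logit_k = w[k] . c
    new_context = [c - lr * g for c, g in zip(context, grad)]
    return loss, new_context
```

With a small learning rate, feeding the returned context back in should lower the loss on the target tokens; a real LAS decoder would also condition on previously emitted tokens.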
ENABLING NATURAL CONVERSATIONS WITH SOFT ENDPOINTING FOR AN AUTOMATED ASSISTANT

Publication number: 20230053341

Abstract: As part of a dialog session between a user and an automated assistant, implementations can process, using a streaming ASR model, a stream of audio data that captures a portion of a spoken utterance to generate ASR output, process, using an NLU model, the ASR output to generate NLU output, and cause, based on the NLU output, a stream of fulfillment data to be generated. Further, implementations can determine, based on processing the stream of audio data, audio-based characteristics associated with the portion of the spoken utterance captured in the stream of audio data. Based on the audio-based characteristics and/or the stream of NLU output, implementations can determine whether the user has paused in providing the spoken utterance or has completed providing the spoken utterance. If the user has paused, implementations can cause natural conversation output to be provided for presentation to the user.

Type: Application

Filed: November 22, 2021

Publication date: February 23, 2023

Inventors: Jaclyn Konzelmann, Trevor Strohman, Jonathan Bloom, Johan Schalkwyk, Joseph Smarr
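The abstract decides between "paused" and "completed" from audio-based characteristics and NLU output without giving the rule. A hypothetical rule-based sketch, assuming two audio features (trailing silence and falling intonation) and a flag saying whether the NLU output already forms a complete request. The feature names and thresholds are illustrative and not from the source.

```python
from dataclasses import dataclass


@dataclass
class AudioCharacteristics:
    trailing_silence_ms: int   # hypothetical feature
    falling_intonation: bool   # hypothetical feature


def classify_state(audio, nlu_complete, min_silence_ms=300, long_silence_ms=1500):
    """Return 'speaking', 'paused', or 'complete' (hypothetical rule)."""
    if audio.trailing_silence_ms < min_silence_ms:
        return "speaking"  # too little silence to call it a pause
    if nlu_complete and (audio.falling_intonation
                         or audio.trailing_silence_ms >= long_silence_ms):
        return "complete"
    # Silence without a complete request: a pause, which would trigger
    # natural conversation output rather than fulfillment.
    return "paused"
```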
Emitting word timings with end-to-end models

Patent number: 11580956

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Type: Grant

Filed: March 17, 2021

Date of Patent: February 14, 2023

Assignee: Google LLC

Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
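For each word, the method identifies a beginning and an ending word piece and ties them to the ground-truth start and end of that word. Assuming SentencePiece-style pieces in which a leading "▁" (U+2581) marks the start of a word, that bookkeeping can be sketched as below. The placeholder-symbol insertion and the attention-head constraint are not shown, and `word_boundaries` is a hypothetical name.

```python
def word_boundaries(wordpieces, word_times):
    """Pair each word's beginning and ending word piece with its time span.

    wordpieces: pieces where a leading "\u2581" starts a new word (assumed convention).
    word_times: ground-truth (start_ms, end_ms) alignments, one per word.
    Returns (begin_piece_index, end_piece_index, start_ms, end_ms) per word.
    """
    starts = [i for i, p in enumerate(wordpieces) if p.startswith("\u2581")]
    if len(starts) != len(word_times):
        raise ValueError("number of words and alignments differ")
    # A word ends right before the next word starts; the last word ends at the last piece.
    ends = [s - 1 for s in starts[1:]] + [len(wordpieces) - 1]
    return [(b, e, t0, t1) for b, e, (t0, t1) in zip(starts, ends, word_times)]
```

For pieces `["▁the", "▁cat", "s", "▁sat"]`, the word "cats" begins at piece 1 and ends at piece 2, so its constrained alignments would target its ground-truth start and end respectively.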
METHODS AND SYSTEMS FOR REDUCING LATENCY IN AUTOMATED ASSISTANT INTERACTIONS

Publication number: 20220351720

Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.

Type: Application

Filed: April 28, 2021

Publication date: November 3, 2022

Inventors: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann
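The decision in this abstract hinges on a predicted latency for the assistant command. A minimal sketch, assuming a hypothetical lookup table in place of the latency prediction model, a hypothetical table of pre-cached content, and an illustrative threshold:

```python
# Hypothetical stand-ins for the latency prediction model and pre-cached content.
PREDICTED_LATENCY_MS = {"weather": 900, "set_timer": 150}
PRE_CACHED = {"weather": "Let me check the forecast."}


def plan_playback(command, threshold_ms=500):
    """Return the audible content for an assistant command, in playback order."""
    predicted = PREDICTED_LATENCY_MS.get(command, 0)  # unknown commands: assume fast
    plan = []
    if predicted > threshold_ms and command in PRE_CACHED:
        plan.append(PRE_CACHED[command])  # rendered while the real content is fetched
    plan.append(f"<response to {command}>")
    return plan
```

Here `plan_playback("weather")` renders the pre-cached sentence first, while `plan_playback("set_timer")` goes straight to the response because its predicted latency is under the threshold.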
Efficient Streaming Non-Recurrent On-Device End-to-End Model

Publication number: 20220310062

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech recognition hypotheses and generate a rescored probability distribution.

Type: Application

Filed: May 10, 2021

Publication date: September 29, 2022

Applicant: Google LLC

Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyrill Allauzen, David Rybach, Ruoming Pang, Trevor Strohman
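The language model here turns the decoder's distribution into a rescored distribution, but the abstract does not say how the scores combine. A common choice, assumed here, is log-linear interpolation over an n-best list followed by renormalization. `rescore` and `lm_weight` are hypothetical, and the LM is any caller-supplied function returning a log-probability.

```python
import math


def rescore(nbest, lm_logprob, lm_weight=0.3):
    """Rescore an n-best list and renormalize it into a distribution.

    nbest: dict mapping hypothesis text -> first-pass probability.
    lm_logprob: callable returning a language-model log-probability for a text.
    """
    scores = {h: math.log(p) + lm_weight * lm_logprob(h) for h, p in nbest.items()}
    top = max(scores.values())  # subtract the max before exponentiating
    exps = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}


# Usage with a toy LM that strongly prefers one hypothesis.
toy_lm = {"recognize speech": math.log(0.9), "wreck a nice beach": math.log(0.001)}
rescored = rescore({"wreck a nice beach": 0.55, "recognize speech": 0.45}, toy_lm.get)
```

Although the first pass slightly prefers "wreck a nice beach", the LM term moves most of the mass to "recognize speech".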
TWO-PASS END TO END SPEECH RECOGNITION

Publication number: 20220310072

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen-attend-spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

Type: Application

Filed: June 3, 2020

Publication date: September 29, 2022

Inventors: Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian C. McGraw, Chung-Cheng Chiu
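The shared encoder means the audio is encoded once and both passes reuse the result. A data-flow sketch in which all three callables are hypothetical stand-ins, and the second pass is assumed to revise the first-pass candidates by rescoring them (one possible reading of "revise"):

```python
def two_pass_recognize(frames, shared_encoder, rnnt_first_pass, las_second_pass):
    """Two-pass decoding with a shared encoder (all callables hypothetical)."""
    encoded = shared_encoder(frames)       # computed once, used by both passes
    candidates = rnnt_first_pass(encoded)  # streaming candidate recognitions
    # Second pass: keep the candidate that the LAS stand-in scores highest.
    return max(candidates, key=lambda hyp: las_second_pass(encoded, hyp))


result = two_pass_recognize(
    frames=[0.1, 0.2],
    shared_encoder=lambda frames: [2 * f for f in frames],
    rnnt_first_pass=lambda enc: ["wreck a nice beach", "recognize speech"],
    las_second_pass=lambda enc, hyp: {"recognize speech": -0.2,
                                      "wreck a nice beach": -1.5}[hyp],
)
```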
Lookup-Table Recurrent Language Model

Publication number: 20220310067

Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.

Type: Application

Filed: February 10, 2022

Publication date: September 29, 2022

Applicant: Google LLC

Inventors: Ronny Huang, Tara N. Sainath, Trevor Strohman, Shankar Kumar
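For each token, the method concatenates a token embedding with an n-gram embedding looked up for the preceding tokens. A minimal sketch, assuming the n-gram table is indexed by hashing the previous n-1 tokens into a fixed number of buckets (CRC32, for run-to-run reproducibility) and using random embeddings. The class name and table sizes are hypothetical, and the rescoring model that consumes the concatenated outputs is omitted.

```python
import random
import zlib


class LookupTableEmbedder:
    """Token embedding concatenated with a hashed n-gram embedding (hypothetical)."""

    def __init__(self, vocab, dim=4, ngram_buckets=1024, n=3, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.buckets = ngram_buckets
        self.token_table = {t: [rng.uniform(-1, 1) for _ in range(dim)] for t in vocab}
        self.ngram_table = [[rng.uniform(-1, 1) for _ in range(dim)]
                            for _ in range(ngram_buckets)]

    def ngram_bucket(self, history):
        # Hash the previous n-1 tokens (fewer at the start of the sequence).
        context = history[max(0, len(history) - (self.n - 1)):]
        return zlib.crc32(" ".join(context).encode("utf-8")) % self.buckets

    def embed(self, tokens):
        outputs = []
        for i, tok in enumerate(tokens):
            token_emb = self.token_table[tok]
            ngram_emb = self.ngram_table[self.ngram_bucket(tokens[:i])]
            outputs.append(token_emb + ngram_emb)  # list concatenation, length 2 * dim
        return outputs
```

Because the n-gram part is a table lookup rather than a recurrent computation, its cost per token stays constant regardless of the table size.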
Cascaded Encoders for Simplified Streaming and Non-Streaming ASR

Publication number: 20220122622

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

Type: Application

Filed: April 21, 2021

Publication date: April 21, 2022

Applicant: Google LLC

Inventors: Arun Narayanan, Tara Sainath, Chung-Cheng Chiu, Ruoming Pang, Rohit Prabhavalkar, Jiahui Yu, Ehsan Variani, Trevor Strohman
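In the cascade, the second encoder consumes the first encoder's output rather than the raw frames. The toy sketch below uses scalar frames and moving averages as stand-ins for real encoder stacks, assuming the first encoder is causal and the second looks a few steps ahead (an assumption about where the extra context comes from). The decoder is omitted, and all function names are hypothetical.

```python
def causal_encoder(frames, left_context=2):
    """Toy first encoder: mean of the current frame and up to `left_context` past frames."""
    out = []
    for t in range(len(frames)):
        window = frames[max(0, t - left_context): t + 1]
        out.append(sum(window) / len(window))
    return out


def right_context_encoder(features, right_context=1):
    """Toy second encoder: mean that also looks `right_context` steps ahead."""
    out = []
    for t in range(len(features)):
        window = features[t: t + right_context + 1]
        out.append(sum(window) / len(window))
    return out


def cascaded_forward(frames):
    first = causal_encoder(frames)          # streaming-friendly features
    second = right_context_encoder(first)   # consumes the first encoder's output
    return first, second
```

`cascaded_forward([1.0, 2.0, 3.0, 4.0])` returns `([1.0, 1.5, 2.0, 3.0], [1.25, 1.75, 2.5, 3.0])`. Each first-encoder output depends only on earlier frames, while each second-encoder output must wait one step for its lookahead.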
Emitting Word Timings with End-to-End Models

Publication number: 20210350794

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Type: Application

Filed: March 17, 2021

Publication date: November 11, 2021

Applicant: Google LLC

Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
Attention-Based Joint Acoustic and Text On-Device End-to-End Model

Publication number: 20210225362

Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.

Type: Application

Filed: January 21, 2021

Publication date: July 22, 2021

Applicant: Google LLC

Inventors: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman
Selecting data collections to search based on the query

Patent number: 8892597

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a new query that is not in a query cache, the new query including one or more query terms; determining that the new query is a variant of a cached query in the query cache; in response to determining that the new query is a variant of the cached query, obtaining a first probability, the first probability indicating a likelihood that a collection of resources includes resources that satisfy the cached query; calculating a second probability, the second probability being a probability that the one or more query terms in the new query name an entity or are a phrase; calculating a third probability, the third probability being a probability that the cached query is a specific query; and determining, based on the first, second, and third probabilities, whether to search the collection of resources.

Type: Grant

Filed: December 21, 2012

Date of Patent: November 18, 2014

Assignee: Google Inc.

Inventor: Trevor Strohman
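The abstract combines three probabilities to decide whether to search a collection, but it does not give the formula. A hypothetical sketch, assuming a simple term-set test for "variant" and a product-versus-threshold decision rule; both rules and the threshold are illustrative only.

```python
def is_variant(new_query, cached_query):
    """Hypothetical variant test: same multiset of lowercased terms."""
    return sorted(new_query.lower().split()) == sorted(cached_query.lower().split())


def should_search(p_collection, p_entity_or_phrase, p_specific, threshold=0.1):
    """Hypothetical decision rule: search when the product clears a threshold.

    p_collection: likelihood the collection satisfies the cached query.
    p_entity_or_phrase: probability the new query's terms name an entity or phrase.
    p_specific: probability the cached query is a specific query.
    """
    for p in (p_collection, p_entity_or_phrase, p_specific):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_collection * p_entity_or_phrase * p_specific >= threshold
```

With these assumptions, `should_search(0.8, 0.9, 0.5)` is `True` (product 0.36), while `should_search(0.8, 0.1, 0.5)` is `False` (product 0.04).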
Automated resource selection process evaluation

Patent number: 8489604

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating resource selection processes. One method includes receiving test queries and generating a first group of resources corresponding to a first automated resource selection process and generating a second group of resources corresponding to a second automated resource selection process for each query. Another method includes generating a query results table for use in generating the groups of resources. The query results table maps queries to resources matched to the queries, and maps each resource to a respective score for the resource and the query, and one or more index selection signals for the resource.

Type: Grant

Filed: October 26, 2010

Date of Patent: July 16, 2013

Assignee: Google Inc.

Inventors: Adam Sadovsky, Paul Haahr, Trevor Strohman, Per Bjornsson, Jun Xu, Gabriel Schine, Jay Shrauner
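The query results table maps each query to matched resources, each with a score and index selection signals, and the two selection processes produce groups to compare. A small sketch, assuming a dict-based table, two hypothetical selection processes keyed on made-up signals, and overlap as the comparison metric:

```python
# Hypothetical query results table: query -> resource -> (score, index selection signals).
QUERY_RESULTS = {
    "jaguar speed": {
        "a.html": (0.9, {"pagerank": 0.7, "fresh": False}),
        "b.html": (0.6, {"pagerank": 0.2, "fresh": True}),
        "c.html": (0.4, {"pagerank": 0.5, "fresh": True}),
    },
}


def by_pagerank(signals):
    """Hypothetical process 1: keep resources with a high link-based signal."""
    return signals["pagerank"] >= 0.4


def by_freshness(signals):
    """Hypothetical process 2: keep only fresh resources."""
    return signals["fresh"]


def group_for(query, select, k=2):
    """Top-k resources for `query` among those the selection process keeps."""
    kept = [(score, res) for res, (score, signals) in QUERY_RESULTS[query].items()
            if select(signals)]
    return [res for score, res in sorted(kept, reverse=True)[:k]]


def overlap(query, process_a, process_b, k=2):
    """Number of resources the two processes' groups share for `query`."""
    return len(set(group_for(query, process_a, k)) & set(group_for(query, process_b, k)))
```

For "jaguar speed", `by_pagerank` yields `["a.html", "c.html"]` and `by_freshness` yields `["b.html", "c.html"]`, an overlap of 1.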
QUERY STEM ADVERTISING

Publication number: 20120047025

Abstract: Methods, systems, and apparatus, including computer program products, for serving advertisements responsive to partial queries. In an aspect, a method includes receiving stem bids for word stems, each stem bid being a bid for a corresponding word stem and corresponding to a price an advertiser pays for display of an advertisement targeted to the corresponding word stem, and wherein the targeting to the corresponding word stem is independent of keyword targeting; receiving a query stem from a client device; in response to receiving the query stem: identifying word stems that match the query stem, providing the corresponding stem bids of the matching word stems as bids to an advertisement auction for advertisement slots for displaying advertisements, and receiving selected advertisements that are determined to have won an advertisement slot in the auction; and providing the selected advertisements for display in the advertisement slots on the client device.

Type: Application

Filed: August 19, 2011

Publication date: February 23, 2012

Applicant: Google Inc.

Inventor: Trevor Strohman
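Matching a partial query against stem bids and filling the ad slots can be sketched as below. The matching rule (either string is a prefix of the other), the bid table, and ranking by bid alone are all simplifying assumptions, not details from the abstract.

```python
# Hypothetical stem bids: word stem -> (bid in dollars, advertiser).
STEM_BIDS = {
    "piz": (1.20, "Pizza Palace"),
    "pizza del": (0.80, "Pizza Delivery Co"),
    "pasta": (2.00, "Pasta Place"),
}


def matching_bids(query_stem, stem_bids):
    """Return (bid, advertiser) for every word stem matching the typed query stem.

    Matching rule (an assumption): either string is a prefix of the other.
    """
    q = query_stem.lower()
    return [(bid, ad) for stem, (bid, ad) in stem_bids.items()
            if q.startswith(stem) or stem.startswith(q)]


def run_auction(query_stem, stem_bids, slots=2):
    """Fill ad slots with the highest matching stem bids (bid-only ranking)."""
    ranked = sorted(matching_bids(query_stem, stem_bids), reverse=True)
    return [ad for bid, ad in ranked[:slots]]
```

With this table, the partial query "pizz" matches both pizza stems and returns `["Pizza Palace", "Pizza Delivery Co"]`, while "pa" matches only "pasta".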
PREDICTIVE SEARCHING AND ASSOCIATED CACHE MANAGEMENT

Publication number: 20100318538

Abstract: A computer system, including instructions stored on a computer-readable medium, may include a query manager configured to manage a query corpus including at least one predictive query, and a document manager configured to receive a plurality of documents from at least one document source, and configured to manage a document corpus including at least one document obtained from the at least one document source. The computer system also may include a predictive result manager configured to associate the at least one document with the at least one predictive query to obtain a predictive search result, and configured to update a predictive cache using the predictive search result, and may include a search engine configured to access the predictive cache to associate a received query with the predictive search result, and configured to provide the predictive search result as a search result of the received query, the search result including the at least one document.

Type: Application

Filed: June 12, 2009

Publication date: December 16, 2010

Applicant: Google Inc.

Inventors: Robert M. Wyman, Trevor Strohman, Paul Haahr, Laramie Leavitt, John Sarapata
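The managers in this abstract can be collapsed into one small class for illustration: incoming documents are matched against predictive queries, matches update a predictive cache, and search consults the cache. A sketch under assumed rules (a document matches when it contains every query term; a cache miss returns `None`); the class and method names are hypothetical.

```python
class PredictiveCache:
    """Precompute results for predicted queries as documents arrive (hypothetical)."""

    def __init__(self, predictive_queries):
        self.predictive_queries = [q.lower() for q in predictive_queries]
        self.cache = {q: [] for q in self.predictive_queries}

    def add_document(self, doc_id, text):
        # Assumed association rule: every query term appears in the document.
        words = set(text.lower().split())
        for q in self.predictive_queries:
            if set(q.split()) <= words and doc_id not in self.cache[q]:
                self.cache[q].append(doc_id)

    def search(self, query):
        # Serve a predicted query from the cache; None signals a cache miss.
        return self.cache.get(query.lower())
```

After `add_document("d1", "Live election results tonight")`, a cache built for "election results" answers `search("Election Results")` with `["d1"]` without touching the document corpus again.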
