Patents by Inventor Trevor Strohman
Trevor Strohman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230107450
Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances. At each of a plurality of output steps, the method also includes generating, by an encoder network of a speech recognition model, a higher order feature representation for a corresponding acoustic frame of the sequence of acoustic frames, generating, by a prediction network of the speech recognition model, a hidden representation for a corresponding sequence of non-blank symbols output by a final softmax layer of the speech recognition model, and generating, by a first joint network of the speech recognition model that receives the higher order feature representation generated by the encoder network and the dense representation generated by the prediction network, a probability distribution that the corresponding time step corresponds to a pause and an end of speech.
Type: Application
Filed: August 26, 2022
Publication date: April 6, 2023
Applicant: Google LLC
Inventors: Shuo-yiin Chang, Bo Li, Tara N. Sainath, Trevor Strohman, Chao Zhang
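The joint-network step can be pictured with the minimal pure-Python sketch below. It assumes the common RNN-T joint design: project both inputs, add them, apply tanh, then apply a softmax over output labels. The label set (including the assumed "continue" class), the weight shapes, and the names `joint_network` and `LABELS` are hypothetical. The application's actual architecture may differ.

```python
import math

# Hypothetical label set: the abstract only says the distribution concerns
# a pause and an end of speech; "continue" is an assumed third class.
LABELS = ("continue", "pause", "end_of_speech")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def joint_network(enc, pred, w_enc, w_pred, w_out):
    """RNN-T-style joint: project both inputs, add, tanh, project to labels."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(w_enc, enc), matvec(w_pred, pred))]
    return dict(zip(LABELS, softmax(matvec(w_out, hidden))))


probs = joint_network(
    enc=[0.5, -1.0],                     # toy encoder output for one frame
    pred=[0.2, 0.3],                     # toy prediction-network output
    w_enc=[[1.0, 0.0], [0.0, 1.0]],
    w_pred=[[0.5, 0.5], [-0.5, 0.5]],
    w_out=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
)
print(probs)
```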
-
Publication number: 20230103382
Abstract: A method includes obtaining a set of training samples, wherein each training sample includes a corresponding sequence of speech segments corresponding to a training utterance and a corresponding sequence of ground-truth transcriptions for the sequence of speech segments, and wherein each ground-truth transcription includes a start time and an end time of a corresponding speech segment. For each training sample in the set of training samples, the method includes processing, using a speech recognition model, the corresponding sequence of speech segments to obtain one or more speech recognition hypotheses for the training utterance; and, for each speech recognition hypothesis obtained for the training utterance, identifying a respective number of word errors relative to the corresponding sequence of ground-truth transcriptions.
Type: Application
Filed: September 27, 2022
Publication date: April 6, 2023
Applicant: Google LLC
Inventors: Zhiyun Lu, Thibault Doutre, Yanwei Pan, Liangliang Cao, Rohit Prabhavalkar, Trevor Strohman, Chao Zhang
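"Number of word errors" is conventionally computed as a word-level edit distance. The sketch below scores each hypothesis that way against segment transcripts joined in start-time order. The joining rule and the names `Segment`, `word_errors` and `errors_per_hypothesis` are assumptions for illustration, not the application's stated procedure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    text: str     # ground-truth transcription of one speech segment
    start: float  # start time in seconds
    end: float    # end time in seconds


def word_errors(hypothesis: str, reference: str) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # extra hypothesis word (insertion)
                curr[j - 1] + 1,         # missing reference word (deletion)
                prev[j - 1] + (h != r),  # match or substitution
            )
        prev = curr
    return prev[-1]


def errors_per_hypothesis(hypotheses: List[str], segments: List[Segment]) -> Dict[str, int]:
    """Join segment transcripts in time order and score every hypothesis."""
    reference = " ".join(s.text for s in sorted(segments, key=lambda s: s.start))
    return {h: word_errors(h, reference) for h in hypotheses}


segments = [Segment("turn on", 0.0, 0.6), Segment("the lights", 0.7, 1.3)]
print(errors_per_hypothesis(["turn on the lights", "turn of the light"], segments))
```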
-
Patent number: 11594212
Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
Type: Grant
Filed: January 21, 2021
Date of Patent: February 28, 2023
Assignee: Google LLC
Inventors: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman
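The branching step and the loss can be illustrated with the sketch below. It checks whether an example is unpaired text and computes a mean cross entropy from per-step log probabilities. It uses toy numbers and omits the context-vector update. `TrainingExample`, `is_unpaired_text` and `cross_entropy` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingExample:
    tokens: List[int]                    # target token ids
    audio: Optional[List[float]] = None  # None marks an unpaired text sequence


def is_unpaired_text(example: TrainingExample) -> bool:
    return example.audio is None


def cross_entropy(log_probs: List[List[float]], targets: List[int]) -> float:
    """Mean negative log probability the decoder assigns to each target token."""
    return -sum(step[t] for step, t in zip(log_probs, targets)) / len(targets)


example = TrainingExample(tokens=[0, 1])
decoder_log_probs = [[math.log(0.5), math.log(0.5)]] * 2  # toy uniform outputs
if is_unpaired_text(example):
    loss = cross_entropy(decoder_log_probs, example.tokens)
    print(loss)
```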
-
Publication number: 20230053341
Abstract: As part of a dialog session between a user and an automated assistant, implementations can process, using a streaming ASR model, a stream of audio data that captures a portion of a spoken utterance to generate ASR output, process, using an NLU model, the ASR output to generate NLU output, and cause, based on the NLU output, a stream of fulfillment data to be generated. Further, implementations can determine, based on processing the stream of audio data, audio-based characteristics associated with the portion of the spoken utterance captured in the stream of audio data. Based on the audio-based characteristics and/or the stream of NLU output, implementations can determine whether the user has paused in providing the spoken utterance or has completed providing the spoken utterance. If the user has paused, implementations can cause natural conversation output to be provided for presentation to the user.
Type: Application
Filed: November 22, 2021
Publication date: February 23, 2023
Inventors: Jaclyn Konzelmann, Trevor Strohman, Jonathan Bloom, Johan Schalkwyk, Joseph Smarr
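The pause-versus-completion decision can be sketched with hand-written rules over two assumed audio features and an NLU completeness flag. The application describes this determination only in general terms. The feature names, the 500 ms threshold and `classify_turn` are hypothetical stand-ins, not the claimed method.

```python
from dataclasses import dataclass


@dataclass
class AudioCharacteristics:
    trailing_silence_ms: int  # hypothetical: silence since the last speech frame
    falling_pitch: bool       # hypothetical prosodic cue for a finished sentence


def classify_turn(audio: AudioCharacteristics, nlu_complete: bool,
                  min_silence_ms: int = 500) -> str:
    """Return 'speaking', 'paused' or 'complete' using hand-written rules."""
    if audio.trailing_silence_ms < min_silence_ms:
        return "speaking"
    if nlu_complete and audio.falling_pitch:
        return "complete"
    return "paused"


state = classify_turn(AudioCharacteristics(800, falling_pitch=False), nlu_complete=True)
if state == "paused":
    print("Mm-hmm?")  # stand-in for natural conversation output
```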
-
Patent number: 11580956
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
Type: Grant
Filed: March 17, 2021
Date of Patent: February 14, 2023
Assignee: Google LLC
Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
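A "constrained alignment" can be pictured as a mask over encoder frames. The mask lets a word piece align only near its ground-truth boundary. The sketch below assumes frame-indexed word boundaries and a hypothetical tolerance `buffer`. `constrained_window` and `alignment_mask` are illustrative names, and the exact way the patent applies the constraint to the attention head is not reproduced here.

```python
from typing import List, Tuple


def constrained_window(gt_frame: int, buffer: int, num_frames: int) -> Tuple[int, int]:
    """Inclusive frame range around a ground-truth boundary, clipped to the utterance."""
    return max(0, gt_frame - buffer), min(num_frames - 1, gt_frame + buffer)


def alignment_mask(gt_frame: int, buffer: int, num_frames: int) -> List[bool]:
    """True for encoder frames the attention head may attend to for this word piece."""
    lo, hi = constrained_window(gt_frame, buffer, num_frames)
    return [lo <= t <= hi for t in range(num_frames)]


# Word spanning frames 4..6 of an 8-frame utterance, with a 1-frame buffer.
begin_mask = alignment_mask(gt_frame=4, buffer=1, num_frames=8)  # beginning word piece
end_mask = alignment_mask(gt_frame=6, buffer=1, num_frames=8)    # ending word piece
print(begin_mask, end_mask)
```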
-
Publication number: 20220351720
Abstract: Implementations described herein relate to reducing latency in automated assistant interactions. In some implementations, a client device can receive audio data that captures a spoken utterance of a user. The audio data can be processed to determine an assistant command to be performed by an automated assistant. The assistant command can be processed, using a latency prediction model, to generate a predicted latency to fulfill the assistant command. Further, the client device (or the automated assistant) can determine, based on the predicted latency, whether to audibly render pre-cached content for presentation to the user prior to audibly rendering content that is responsive to the spoken utterance. The pre-cached content can be tailored to the assistant command and audibly rendered for presentation to the user while the content is being obtained, and the content can be audibly rendered for presentation to the user subsequent to the pre-cached content.
Type: Application
Filed: April 28, 2021
Publication date: November 3, 2022
Inventors: Lior Alon, Rafael Goldfarb, Dekel Auster, Dan Rasin, Michael Andrew Goodman, Trevor Strohman, Nino Tasca, Valerie Nygaard, Jaclyn Konzelmann
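The latency-based decision reduces to a threshold check, shown here as a sequential sketch. In a real assistant the fetch would run concurrently with rendering the filler. The threshold value and the names `plan_response` and `fetch_response` are assumptions. The predicted latency would come from the latency prediction model.

```python
from typing import Callable, List


def plan_response(predicted_latency_ms: float, threshold_ms: float,
                  pre_cached: str, fetch_response: Callable[[], str]) -> List[str]:
    """Return utterances in the order they should be audibly rendered."""
    rendered = []
    if predicted_latency_ms > threshold_ms:
        # Fill the expected wait with content tailored to the command.
        rendered.append(pre_cached)
    # Sequential simplification: a real system would fetch while the filler plays.
    rendered.append(fetch_response())
    return rendered


print(plan_response(1200, 500, "Okay, checking your calendar.",
                    lambda: "You have two meetings today."))
```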
-
Publication number: 20220310062
Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypotheses. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypotheses and generate a rescored probability distribution.
Type: Application
Filed: May 10, 2021
Publication date: September 29, 2022
Applicant: Google LLC
Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyrill Allauzen, David Rybach, Ruoming Pang, Trevor Strohman
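Rescoring with a language model is often done by log-linear interpolation over an n-best list, as in the sketch below. The abstract does not say how the language model combines with the decoder's distribution. The interpolation rule, `lm_weight` and `rescore` are therefore assumptions, and the probabilities are toy values.

```python
import math
from typing import Dict


def rescore(asr_probs: Dict[str, float], lm_probs: Dict[str, float],
            lm_weight: float = 0.5) -> Dict[str, float]:
    """Log-linear ASR + LM interpolation over an n-best list, renormalized."""
    scores = {h: math.log(p) + lm_weight * math.log(lm_probs[h])
              for h, p in asr_probs.items()}
    top = max(scores.values())
    exps = {h: math.exp(s - top) for h, s in scores.items()}
    total = sum(exps.values())
    return {h: e / total for h, e in exps.items()}


rescored = rescore(
    {"recognize speech": 0.45, "wreck a nice beach": 0.55},  # first-pass ASR
    {"recognize speech": 0.9, "wreck a nice beach": 0.1},    # LM probabilities
    lm_weight=1.0,
)
print(max(rescored, key=rescored.get))
```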
-
Publication number: 20220310072
Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transducer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen-attend-spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.
Type: Application
Filed: June 3, 2020
Publication date: September 29, 2022
Inventors: Tara N. Sainath, Ruoming Pang, David Rybach, Yanzhang He, Rohit Prabhavalkar, Wei Li, Mirkó Visontai, Qiao Liang, Trevor Strohman, Yonghui Wu, Ian C. McGraw, Chung-Cheng Chiu
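The data flow of a two-pass model with a shared encoder can be sketched with toy callables. The encoder runs once, and its output feeds both the streaming first pass and the second pass. Here the second pass only rescores first-pass candidates, which is one possible mode of "revising" them. `two_pass_recognize` and the lambda stand-ins are hypothetical.

```python
from typing import Callable, List, Sequence

Features = List[float]


def two_pass_recognize(
    frames: Sequence[float],
    shared_encoder: Callable[[Sequence[float]], Features],
    first_pass: Callable[[Features], List[str]],
    second_pass_score: Callable[[Features, str], float],
) -> str:
    """Encode once, stream candidates, then let the second pass pick the final text."""
    encoded = shared_encoder(frames)          # shared by both decoders
    candidates = first_pass(encoded)          # e.g. RNN-T streaming n-best
    return max(candidates, key=lambda c: second_pass_score(encoded, c))


# Toy stand-ins for the real networks.
result = two_pass_recognize(
    frames=[0.1, 0.2, 0.3],
    shared_encoder=lambda fs: [2 * f for f in fs],
    first_pass=lambda enc: ["play jazz", "play jess"],
    second_pass_score=lambda enc, c: 1.0 if c == "play jazz" else 0.2,
)
print(result)
```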
-
Publication number: 20220310067
Abstract: A computer-implemented method includes receiving audio data that corresponds to an utterance spoken by a user and captured by a user device. The method also includes processing the audio data to determine a candidate transcription that includes a sequence of tokens for the spoken utterance. For each token in the sequence of tokens, the method includes determining a token embedding for the corresponding token, determining an n-gram token embedding for a previous sequence of n-gram tokens, and concatenating the token embedding and the n-gram token embedding to generate a concatenated output for the corresponding token. The method also includes rescoring the candidate transcription for the spoken utterance by processing the concatenated output generated for each corresponding token in the sequence of tokens.
Type: Application
Filed: February 10, 2022
Publication date: September 29, 2022
Applicant: Google LLC
Inventors: Ronny Huang, Tara N. Sainath, Trevor Strohman, Shankar Kumar
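The per-token concatenation step can be illustrated with the sketch below. It treats "previous sequence of n-gram tokens" as the preceding n-1 tokens and uses a zero vector for unseen n-grams. Both choices are assumptions, as are the lookup tables and the names `ngram_embedding` and `concatenated_outputs`. The rescoring network that consumes these outputs is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def ngram_embedding(history: Sequence[str],
                    table: Dict[Tuple[str, ...], List[float]],
                    n: int, ngram_dim: int) -> List[float]:
    """Look up an embedding for the previous n-1 tokens; zeros if unseen."""
    key = tuple(history[-(n - 1):]) if n > 1 else ()
    return table.get(key, [0.0] * ngram_dim)


def concatenated_outputs(tokens: Sequence[str],
                         token_table: Dict[str, List[float]],
                         ngram_table: Dict[Tuple[str, ...], List[float]],
                         n: int = 2, ngram_dim: int = 2) -> List[List[float]]:
    outputs = []
    for i, token in enumerate(tokens):
        token_emb = token_table[token]
        prev_emb = ngram_embedding(tokens[:i], ngram_table, n, ngram_dim)
        outputs.append(token_emb + prev_emb)   # list '+' concatenates
    return outputs


outs = concatenated_outputs(
    ["play", "music"],
    token_table={"play": [1.0, 0.0], "music": [0.0, 1.0]},
    ngram_table={("play",): [0.5, 0.5]},
)
print(outs)
```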
-
Publication number: 20220122622
Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.
Type: Application
Filed: April 21, 2021
Publication date: April 21, 2022
Applicant: Google LLC
Inventors: Arun Narayanan, Tara Sainath, Chung-Cheng Chiu, Ruoming Pang, Rohit Prabhavalkar, Jiahui Yu, Ehsan Variani, Trevor Strohman
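The cascade can be sketched per output step with toy callables: frame to first encoder to second encoder to decoder. This is a simplification. In practice the second encoder may use right context across several first-encoder outputs rather than a single frame. `cascaded_asr` and the lambdas are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def cascaded_asr(frames: Sequence[Vector],
                 first_encoder: Callable[[Vector], Vector],
                 second_encoder: Callable[[Vector], Vector],
                 decoder: Callable[[Vector], Vector]) -> List[Vector]:
    """At each output step: acoustic frame -> encoder 1 -> encoder 2 -> decoder."""
    outputs = []
    for frame in frames:
        first = first_encoder(frame)      # first higher order feature representation
        second = second_encoder(first)    # second higher order feature representation
        outputs.append(decoder(second))   # distribution over hypotheses (toy here)
    return outputs


demo = cascaded_asr(
    [[1.0], [2.0]],
    first_encoder=lambda f: [2 * x for x in f],
    second_encoder=lambda h: [x + 1 for x in h],
    decoder=lambda h: [sum(h)],
)
print(demo)
```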
-
Publication number: 20210350794
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
Type: Application
Filed: March 17, 2021
Publication date: November 11, 2021
Applicant: Google LLC
Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
-
Publication number: 20210225362
Abstract: A method includes receiving a training example for a listen-attend-spell (LAS) decoder of a two-pass streaming neural network model and determining whether the training example corresponds to a supervised audio-text pair or an unpaired text sequence. When the training example corresponds to an unpaired text sequence, the method also includes determining a cross entropy loss based on a log probability associated with a context vector of the training example. The method also includes updating the LAS decoder and the context vector based on the determined cross entropy loss.
Type: Application
Filed: January 21, 2021
Publication date: July 22, 2021
Applicant: Google LLC
Inventors: Tara N. Sainath, Ruoming Pang, Ron Weiss, Yanzhang He, Chung-Cheng Chiu, Trevor Strohman
-
Patent number: 8892597
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for receiving a new query that is not in a query cache, the new query including one or more query terms; determining that the new query is a variant of a cached query in the query cache; in response to determining that the new query is a variant of the cached query, obtaining a first probability, the first probability indicating a likelihood that a collection of resources includes resources that satisfy the cached query; calculating a second probability, the second probability being a probability that the one or more query terms in the new query name an entity or are a phrase; calculating a third probability, the third probability being a probability that the cached query is a specific query; and determining, based on the first, second, and third probabilities, whether to search the collection of resources.
Type: Grant
Filed: December 21, 2012
Date of Patent: November 18, 2014
Assignee: Google Inc.
Inventor: Trevor Strohman
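The control flow can be sketched as: find a cached variant, then combine three probabilities into a search decision. The patent states only that the decision is based on all three probabilities. The variant test (same terms, any order), the weighted sum, its weights and threshold, and the names `normalize`, `find_cached_variant` and `should_search` are all placeholder assumptions.

```python
from typing import Iterable, Optional, Tuple


def normalize(query: str) -> Tuple[str, ...]:
    """Hypothetical variant test: same lowercase terms, word order ignored."""
    return tuple(sorted(query.lower().split()))


def find_cached_variant(query: str, cached_queries: Iterable[str]) -> Optional[str]:
    key = normalize(query)
    for cached in cached_queries:
        if normalize(cached) == key:
            return cached
    return None


def should_search(p_has_results: float, p_entity_or_phrase: float, p_specific: float,
                  weights=(0.6, 0.2, 0.2), threshold: float = 0.5) -> bool:
    """Hypothetical weighted combination of the three probabilities."""
    score = (weights[0] * p_has_results
             + weights[1] * p_entity_or_phrase
             + weights[2] * p_specific)
    return score >= threshold


cached = find_cached_variant("New York pizza", ["pizza new york", "weather"])
print(cached, should_search(0.9, 0.1, 0.1))
```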
-
Patent number: 8489604
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for evaluating resource selection processes. One method includes receiving test queries and generating a first group of resources corresponding to a first automated resource selection process and generating a second group of resources corresponding to a second automated resource selection process for each query. Another method includes generating a query results table for use in generating the groups of resources. The query results table maps queries to resources matched to the queries, and maps each resource to a respective score for the resource and the query, and one or more index selection signals for the resource.
Type: Grant
Filed: October 26, 2010
Date of Patent: July 16, 2013
Assignee: Google Inc.
Inventors: Adam Sadovsky, Paul Haahr, Trevor Strohman, Per Bjornsson, Jun Xu, Gabriel Schine, Jay Shrauner
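The query results table can be modeled as a mapping from each query to (resource, score, signals) rows. The sketch below runs two selection processes over that table and compares the groups they produce. The two example processes and the Jaccard-overlap comparison are assumptions for illustration. The abstract does not name a comparison metric. `top_k_by_score`, `by_signal` and `compare` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Query results table: query -> [(resource, score, index_selection_signals)]
Row = Tuple[str, float, Dict[str, float]]
SelectionProcess = Callable[[List[Row]], Set[str]]


def top_k_by_score(k: int) -> SelectionProcess:
    """Hypothetical process: keep the k highest-scoring resources."""
    def process(rows: List[Row]) -> Set[str]:
        ranked = sorted(rows, key=lambda row: row[1], reverse=True)
        return {resource for resource, _, _ in ranked[:k]}
    return process


def by_signal(name: str, min_value: float) -> SelectionProcess:
    """Hypothetical process: keep resources whose named signal passes a threshold."""
    def process(rows: List[Row]) -> Set[str]:
        return {resource for resource, _, signals in rows if signals.get(name, 0.0) >= min_value}
    return process


def compare(table: Dict[str, List[Row]], queries: Iterable[str],
            first: SelectionProcess, second: SelectionProcess) -> Dict[str, float]:
    """Jaccard overlap between the two groups of resources for each test query."""
    overlap = {}
    for query in queries:
        a, b = first(table[query]), second(table[query])
        union = a | b
        overlap[query] = len(a & b) / len(union) if union else 1.0
    return overlap


table = {"solar panels": [("r1", 0.9, {"fresh": 1.0}),
                          ("r2", 0.5, {"fresh": 0.0}),
                          ("r3", 0.7, {"fresh": 1.0})]}
print(compare(table, ["solar panels"], top_k_by_score(1), by_signal("fresh", 0.5)))
```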
-
Publication number: 20120047025
Abstract: Methods, systems, and apparatus, including computer program products, for serving advertisements responsive to partial queries. In an aspect, a method includes receiving stem bids for word stems, each stem bid being a bid for a corresponding word stem and corresponding to a price an advertiser pays for display of an advertisement targeted to the corresponding word stem, and wherein the targeting to the corresponding word stem is independent of keyword targeting; receiving a query stem from a client device; in response to receiving the query stem: identifying word stems that match the query stem, providing the corresponding stem bids of the matching word stems as bids to an advertisement auction for advertisement slots for displaying advertisements, and receiving selected advertisements that are determined to have won an advertisement slot in the auction; and providing the selected advertisements for display in the advertisement slots on the client device.
Type: Application
Filed: August 19, 2011
Publication date: February 23, 2012
Applicant: Google Inc.
Inventor: Trevor Strohman
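The stem-bid flow can be sketched in two steps: collect bids from word stems matching the typed query stem, then fill slots by bid. The prefix-match rule and the simplified highest-bid-wins auction are assumptions. Real ad auctions typically also weigh ad quality and pricing rules. `matching_stem_bids` and `run_auction` are hypothetical names.

```python
from typing import Dict, List, Tuple

Bid = Tuple[str, float]  # (advertisement id, bid price)


def matching_stem_bids(query_stem: str, stem_bids: Dict[str, List[Bid]]) -> List[Bid]:
    """Hypothetical match rule: a word stem matches if it starts with the typed stem."""
    typed = query_stem.lower()
    return [bid for stem, bids in stem_bids.items() if stem.startswith(typed) for bid in bids]


def run_auction(bids: List[Bid], num_slots: int) -> List[str]:
    """Simplified auction: the highest bids win, one advertisement per slot."""
    ranked = sorted(bids, key=lambda b: b[1], reverse=True)
    return [ad for ad, _ in ranked[:num_slots]]


stem_bids = {
    "flow": [("florist_ad", 0.80)],
    "flight": [("airline_ad", 1.20)],
    "pizz": [("pizzeria_ad", 2.00)],
}
candidates = matching_stem_bids("fl", stem_bids)
print(run_auction(candidates, num_slots=1))
```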
-
Publication number: 20100318538
Abstract: A computer system including instructions stored on a computer-readable medium may include a query manager configured to manage a query corpus including at least one predictive query, and a document manager configured to receive a plurality of documents from at least one document source, and configured to manage a document corpus including at least one document obtained from the at least one document source. The computer system also may include a predictive result manager configured to associate the at least one document with the at least one predictive query to obtain a predictive search result, and configured to update a predictive cache using the predictive search result, and may include a search engine configured to access the predictive cache to associate a received query with the predictive search result, and configured to provide the predictive search result as a search result of the received query, the search result including the at least one document.
Type: Application
Filed: June 12, 2009
Publication date: December 16, 2010
Applicant: Google Inc.
Inventors: Robert M. Wyman, Trevor Strohman, Paul Haahr, Laramie Leavitt, John Sarapata
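The predictive cache can be pictured as follows. Documents are matched to predictive queries as they arrive, and a later query is answered straight from the cache. The "all query terms appear in the document" rule and the class name `PredictiveCache` are assumptions. The abstract leaves the association method open.

```python
from typing import Dict, List


class PredictiveCache:
    """Maps predictive queries to documents before anyone issues them."""

    def __init__(self, predictive_queries: List[str]):
        self.queries = [q.lower() for q in predictive_queries]
        self.results: Dict[str, List[str]] = {}

    def ingest(self, doc_id: str, text: str) -> None:
        """Hypothetical association rule: every query term appears in the document."""
        words = set(text.lower().split())
        for query in self.queries:
            if set(query.split()) <= words:
                self.results.setdefault(query, []).append(doc_id)

    def lookup(self, query: str) -> List[str]:
        """Serve a received query from the predictive cache (empty if no entry)."""
        return list(self.results.get(query.lower(), []))


cache = PredictiveCache(["solar eclipse", "election results"])
cache.ingest("doc-1", "Total solar eclipse visible tonight")
print(cache.lookup("Solar Eclipse"))
```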