Patents by Inventor Raziel Alvarez Guevara

Raziel Alvarez Guevara has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

End-to-End Streaming Keyword Spotting

Publication number: 20240242711

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Application

Filed: March 27, 2024

Publication date: July 18, 2024

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

End-to-End Streaming Keyword Spotting

Publication number: 20240177708

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Type: Application

Filed: February 5, 2024

Publication date: May 30, 2024

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park
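
The two-stage neuron described in the abstract above can be illustrated with a minimal sketch. It assumes a single neuron rather than stacked layers, plain weighted sums for both filtering stages, and a sigmoid to turn the output into a probability score. The names `SVDFNeuron`, `detect_hotword`, and all weights are hypothetical and are not taken from the patent.

```python
from collections import deque
import math


class SVDFNeuron:
    """One neuron: a per-frame feature filter followed by a filter over a short memory."""

    def __init__(self, feature_weights, time_weights):
        self.feature_weights = feature_weights  # stage 1: one weight per audio feature
        self.time_weights = time_weights        # stage 2: one weight per memory slot
        # Fixed-size memory of the most recent stage-1 outputs, oldest first.
        self.memory = deque([0.0] * len(time_weights), maxlen=len(time_weights))

    def step(self, frame):
        # Stage 1: filter the features of this frame alone and store the result.
        self.memory.append(sum(w * x for w, x in zip(self.feature_weights, frame)))
        # Stage 2: filter over every value currently held in memory.
        return sum(w * m for w, m in zip(self.time_weights, self.memory))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def detect_hotword(frames, neuron, threshold):
    """Return the index of the first frame whose score meets the threshold, else None."""
    for i, frame in enumerate(frames):
        score = sigmoid(neuron.step(frame))  # assumed mapping from output to probability
        if score >= threshold:
            return i  # a real device would start its wake-up process here
    return None
```

Because the memory has a fixed length, each streaming frame costs the same amount of work, which is the property that makes this kind of layer attractive for always-on detection.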
End-to-end streaming keyword spotting

Patent number: 11967310

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Grant

Filed: May 23, 2023

Date of Patent: April 23, 2024

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

Speaker verification using co-location information

Patent number: 11942095

Abstract: A computer-implemented method that includes receiving audio data corresponding to an utterance of a voice command captured by a user device. The user device has a plurality of different users. The method includes determining a particular user among the plurality of different users of the user device as a speaker of the utterance based on a comparison between the audio data and corresponding speaker verification data stored on memory hardware for each user of the plurality of different users of the user device. The method further includes, based on determining the particular user among the plurality of different users of the user device as the speaker of the utterance, providing, for output from the user device, a message comprising a speaker identifier associated with the particular user.

Type: Grant

Filed: May 1, 2023

Date of Patent: March 26, 2024

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Othar Hansson

End-to-end streaming keyword spotting

Patent number: 11929064

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Type: Grant

Filed: January 9, 2023

Date of Patent: March 12, 2024

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park

End-to-End Streaming Keyword Spotting

Publication number: 20230298576

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Application

Filed: May 23, 2023

Publication date: September 21, 2023

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

Speaker Verification Using Co-Location Information

Publication number: 20230267935

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.

Type: Application

Filed: May 1, 2023

Publication date: August 24, 2023

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Othar Hansson

End-to-end streaming keyword spotting

Patent number: 11682385

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Grant

Filed: June 15, 2021

Date of Patent: June 20, 2023

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

Speaker verification using co-location information

Patent number: 11676608

Abstract: A method includes generating an audio signal encoding an utterance captured by a microphone of a user device and transmitting the audio signal encoding the utterance to a server. The server is configured to determine a speaker of the utterance from one of a plurality of different users of the user device based on a comparison between the audio signal encoding the utterance and corresponding speaker verification data, and process the audio signal encoding the utterance using a speech recognition module to identify a particular action. The method also includes executing the particular action identified by the server to cause a particular application to launch on the user device based on user permissions associated with the speaker determined by the server to access the particular data.

Type: Grant

Filed: April 2, 2021

Date of Patent: June 13, 2023

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Othar Hansson

End-to-end streaming keyword spotting

Patent number: 11557282

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Type: Grant

Filed: January 21, 2021

Date of Patent: January 17, 2023

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park

Speaker Verification Using Co-Location Information

Publication number: 20220319522

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.

Type: Application

Filed: April 2, 2021

Publication date: October 6, 2022

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Othar Hansson

Quantizing Trained Long Short-Term Memory Neural Networks

Publication number: 20220036155

Abstract: Method for quantizing a trained long short-term memory (LSTM) neural network having a plurality of weights, the method comprising: obtaining data specifying trained floating-point values for each of the weights of the trained LSTM neural network, the trained LSTM neural network comprising one or more LSTM layers, each LSTM layer having a plurality of gates and each of the plurality of gates being associated with an input weight matrix and a recurrent weight matrix; quantizing the trained LSTM neural network, comprising: for each gate, quantizing the elements of the input weight matrix to a target fixed bit-width; for each gate, quantizing the elements of the recurrent weight matrix to the target fixed bit-width; and providing data specifying a quantized LSTM neural network for use in performing quantized inference.

Type: Application

Filed: October 30, 2019

Publication date: February 3, 2022

Applicant: Google LLC

Inventor: Raziel Alvarez Guevara
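
The per-gate quantization described in the abstract above can be sketched as follows. The abstract does not name a quantization scheme, so this example assumes symmetric linear quantization with one scale per matrix and an 8-bit target width. The function names and the dictionary layout for the gates are hypothetical.

```python
def quantize_matrix(matrix, bits=8):
    """Symmetric linear quantization of a float matrix: returns (integer matrix, scale)."""
    qmax = 2 ** (bits - 1) - 1  # largest representable magnitude, e.g. 127 for 8 bits
    max_abs = max((abs(v) for row in matrix for v in row), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [
        [max(-qmax, min(qmax, round(v / scale))) for v in row]
        for row in matrix
    ]
    return quantized, scale


def quantize_lstm_layer(gates, bits=8):
    """Quantize the input and recurrent weight matrices of every gate.

    `gates` maps a gate name to {"input": matrix, "recurrent": matrix}.
    """
    return {
        name: {
            kind: quantize_matrix(weights[kind], bits)
            for kind in ("input", "recurrent")
        }
        for name, weights in gates.items()
    }
```

Multiplying a quantized value by its matrix scale recovers an approximation of the trained floating-point weight, so the scales have to be kept alongside the integer matrices for inference.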
End-to-End Streaming Keyword Spotting

Publication number: 20210312913

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Application

Filed: June 15, 2021

Publication date: October 7, 2021

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

End-to-end streaming keyword spotting

Patent number: 11056101

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Grant

Filed: December 10, 2019

Date of Patent: July 6, 2021

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

End-to-End Streaming Keyword Spotting

Publication number: 20210142790

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Type: Application

Filed: January 21, 2021

Publication date: May 13, 2021

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park

Speaker verification using co-location information

Patent number: 10986498

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.

Type: Grant

Filed: September 17, 2019

Date of Patent: April 20, 2021

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Othar Hansson

End-to-end streaming keyword spotting

Patent number: 10930269

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Type: Grant

Filed: June 13, 2019

Date of Patent: February 23, 2021

Assignee: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park

End-to-End Streaming Keyword Spotting

Publication number: 20200126537

Abstract: A method for training hotword detection includes receiving a training input audio sequence including a sequence of input frames that define a hotword that initiates a wake-up process on a device. The method also includes feeding the training input audio sequence into an encoder and a decoder of a memorized neural network. Each of the encoder and the decoder of the memorized neural network include sequentially-stacked single value decomposition filter (SVDF) layers. The method further includes generating a logit at each of the encoder and the decoder based on the training input audio sequence. For each of the encoder and the decoder, the method includes smoothing each respective logit generated from the training input audio sequence, determining a max pooling loss from a probability distribution based on each respective logit, and optimizing the encoder and the decoder based on all max pooling losses associated with the training input audio sequence.

Type: Application

Filed: December 10, 2019

Publication date: April 23, 2020

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park, Patrick Violette

End-to-End Streaming Keyword Spotting

Publication number: 20200020322

Abstract: A method for detecting a hotword includes receiving a sequence of input frames that characterize streaming audio captured by a user device and generating a probability score indicating a presence of a hotword in the streaming audio using a memorized neural network. The network includes sequentially-stacked single value decomposition filter (SVDF) layers and each SVDF layer includes at least one neuron. Each neuron includes a respective memory component, a first stage configured to perform filtering on audio features of each input frame individually and output to the memory component, and a second stage configured to perform filtering on all the filtered audio features residing in the respective memory component. The method also includes determining whether the probability score satisfies a hotword detection threshold and initiating a wake-up process on the user device for processing additional terms.

Type: Application

Filed: June 13, 2019

Publication date: January 16, 2020

Applicant: Google LLC

Inventors: Raziel Alvarez Guevara, Hyun Jin Park

Individualized hotword detection models

Patent number: 10535354

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes actions of obtaining enrollment acoustic data representing an enrollment utterance spoken by a user, obtaining a set of candidate acoustic data representing utterances spoken by other users, determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, selecting a subset of candidate acoustic data from the set of candidate acoustic data based at least on the similarity scores, generating a detection model based on the subset of candidate acoustic data, and providing the detection model for use in detecting an utterance spoken by the user.

Type: Grant

Filed: June 29, 2016

Date of Patent: January 14, 2020

Assignee: Google LLC

Inventor: Raziel Alvarez Guevara
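
The candidate selection step in the abstract above can be sketched as follows. The abstract does not say how similarity is computed or how the subset is chosen, so this example assumes that each utterance is already represented as a fixed-length feature vector, that cosine similarity is the score, and that the top `k` candidates are kept. All names here are hypothetical, and the later step of generating a detection model from the subset is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidates(enrollment, candidates, k):
    """Return the indices of the k candidates most similar to the enrollment vector."""
    scored = [(cosine_similarity(enrollment, c), i) for i, c in enumerate(candidates)]
    # Highest similarity first; the sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]
```

Returning indices rather than the vectors themselves keeps the link back to the original candidate recordings, which a later training step would need.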
