Patents by Inventor Guanlong Zhao

Guanlong Zhao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250118292
    Abstract: A method includes obtaining labeled training data including a plurality of spoken terms spoken during a conversation. For each respective spoken term, the method includes generating a corresponding sequence of intermediate audio encodings from a corresponding sequence of acoustic frames, generating a corresponding sequence of final audio encodings from the corresponding sequence of intermediate audio encodings, generating a corresponding speech recognition result, and generating a respective speaker token representing a predicted identity of a speaker for each corresponding speech recognition result. The method also includes training a joint speech recognition and speaker diarization model jointly based on a first loss derived from the generated speech recognition results and the corresponding transcriptions and a second loss derived from the generated speaker tokens and the corresponding speaker labels.
    Type: Application
    Filed: September 20, 2024
    Publication date: April 10, 2025
    Applicant: Google LLC
    Inventors: Yiling Huang, Weiran Wang, Quan Wang, Guanlong Zhao, Hank Liao, Han Lu
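The abstract above describes training on two terms jointly: a first loss against reference transcriptions and a second loss against reference speaker labels. A minimal sketch of such a combined objective, assuming cross-entropy losses and a weighting factor `alpha` (both are illustrative assumptions; the filing does not specify these forms):

```python
import math

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the reference item under predicted probabilities."""
    return -math.log(probs[target_index])

def joint_loss(wordpiece_probs, ref_wordpieces, speaker_probs, ref_speakers, alpha=0.5):
    """First loss: speech recognition results vs. the reference transcription.
    Second loss: generated speaker tokens vs. the reference speaker labels.
    The two terms are combined with an assumed weight alpha."""
    asr_loss = sum(cross_entropy(p, t) for p, t in zip(wordpiece_probs, ref_wordpieces))
    speaker_loss = sum(cross_entropy(p, t) for p, t in zip(speaker_probs, ref_speakers))
    return asr_loss + alpha * speaker_loss

# Perfect predictions on both tasks give zero total loss.
print(joint_loss([[1.0]], [0], [[1.0]], [0]))  # 0.0
```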
  • Publication number: 20250078840
    Abstract: A method includes receiving audio data corresponding to an utterance spoken by a particular user and captured in streaming audio by a user device. The method also includes performing speaker identification on the audio data to identify an identity of the particular user that spoke the utterance. The method also includes obtaining a keyword detection model personalized for the particular user based on the identity of the particular user that spoke the utterance. The keyword detection model is conditioned on speaker characteristic information associated with the particular user to adapt the keyword detection model to detect a presence of a keyword in audio for the particular user. The method also includes determining that the utterance includes the keyword using the keyword detection model personalized for the particular user.
    Type: Application
    Filed: August 22, 2024
    Publication date: March 6, 2025
    Applicant: Google LLC
    Inventors: Pai Zhu, Beltrán Labrador Serrano, Guanlong Zhao, Angelo Alfredo Scorza Scarpati, Quan Wang, Alex Seungryong Park, Ignacio Lopez Moreno
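The flow in the abstract above has three steps: identify the speaker from the audio, obtain a keyword detection model conditioned on that speaker's characteristics, then run detection. A toy sketch of that control flow (the matching and scoring logic here are stand-ins for real acoustic models, and all names are illustrative assumptions):

```python
from dataclasses import dataclass

def identify_speaker(audio, speaker_profiles):
    """Placeholder speaker identification: nearest enrolled profile
    by mean feature value of the captured audio."""
    mean = sum(audio) / len(audio)
    return min(speaker_profiles, key=lambda name: abs(speaker_profiles[name] - mean))

@dataclass
class KeywordModel:
    """Keyword detector conditioned on one speaker's characteristic value."""
    speaker_bias: float
    threshold: float = 0.5

    def detect(self, audio):
        # Score frames relative to the speaker-specific bias; report a hit
        # when enough frames fall near it (a stand-in for real acoustics).
        hits = sum(1 for f in audio if abs(f - self.speaker_bias) < self.threshold)
        return hits / len(audio) >= 0.5

profiles = {"alice": 0.2, "bob": 0.9}
models = {"alice": KeywordModel(0.2), "bob": KeywordModel(0.9)}
audio = [0.15, 0.25, 0.2, 0.8]
speaker = identify_speaker(audio, profiles)          # 1. speaker identification
print(speaker, models[speaker].detect(audio))        # 2-3. personalized detection → alice True
```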
  • Publication number: 20240135934
    Abstract: A method includes obtaining a multi-utterance training sample that includes audio data characterizing utterances spoken by two or more different speakers and obtaining ground-truth speaker change intervals indicating time intervals in the audio data where speaker changes among the two or more different speakers occur. The method also includes processing the audio data to generate a sequence of predicted speaker change tokens using a sequence transduction model. For each corresponding predicted speaker change token, the method includes labeling the corresponding predicted speaker change token as correct when the predicted speaker change token overlaps with one of the ground-truth speaker change intervals. The method also includes determining a precision metric of the sequence transduction model based on a number of the predicted speaker change tokens labeled as correct and a total number of the predicted speaker change tokens in the sequence of predicted speaker change tokens.
    Type: Application
    Filed: October 9, 2023
    Publication date: April 25, 2024
    Applicant: Google LLC
    Inventors: Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Jason Pelecanos
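The precision metric in the abstract above is directly computable: a predicted speaker-change token counts as correct when it overlaps a ground-truth speaker-change interval, and precision is the correct count over the total predicted count. A minimal sketch, assuming tokens are represented by timestamps and intervals by (start, end) pairs (both representations are assumptions):

```python
def speaker_change_precision(predicted_token_times, ground_truth_intervals):
    """Label each predicted speaker-change token correct when its time falls
    inside one of the ground-truth speaker-change intervals, then return
    precision = correct tokens / total predicted tokens."""
    if not predicted_token_times:
        return 0.0
    correct = sum(
        any(start <= t <= end for (start, end) in ground_truth_intervals)
        for t in predicted_token_times
    )
    return correct / len(predicted_token_times)

# Three predicted change tokens (seconds), two ground-truth intervals:
# tokens at 2.1s and 9.7s fall inside intervals, 5.0s does not.
print(speaker_change_precision([2.1, 5.0, 9.7], [(2.0, 2.5), (9.5, 10.0)]))  # 0.6666666666666666
```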
  • Publication number: 20230335107
    Abstract: Provided herein is a reference-free foreign accent conversion (FAC) computer system and methods for training models, utilizing a library of algorithms, to directly transform utterances from a foreign, non-native second-language (L2) speaker to have the accent of a native (L1) speaker. The models in the reference-free FAC computer system are a speaker-independent acoustic model to extract speaker-independent speech embeddings from an L1 and/or L2 speaker utterance, a speech synthesizer to generate L1 speaker reference-based golden-speaker utterances, and a pronunciation correction model to generate L2 speaker reference-free golden-speaker utterances.
    Type: Application
    Filed: August 24, 2021
    Publication date: October 19, 2023
    Applicant: The Texas A&M University
    Inventors: Guanlong Zhao, Shaojin Ding, Ricardo Gutierrez-Osuna
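The abstract above names three components: an acoustic model that extracts speaker-independent embeddings, a synthesizer that renders them in the native (L1) voice, and a pronunciation correction model that maps an L2 utterance directly to a golden-speaker utterance. A toy sketch of how those stages compose (each stage here is a placeholder transform; the filing describes trained neural models, and all data shapes are assumptions):

```python
def extract_embeddings(utterance):
    """Speaker-independent acoustic model: discard speaker identity,
    keeping only the content of each frame (toy stand-in)."""
    return [frame["content"] for frame in utterance]

def synthesize_golden(embeddings, l1_voice):
    """Speech synthesizer: render the embeddings in the native (L1) voice,
    yielding a reference-based golden-speaker utterance."""
    return [{"content": c, "voice": l1_voice} for c in embeddings]

def correct_pronunciation(l2_utterance, l1_voice):
    """Pronunciation correction model: map an L2 utterance directly to a
    reference-free golden-speaker utterance (the two stages composed)."""
    return synthesize_golden(extract_embeddings(l2_utterance), l1_voice)

l2 = [{"content": "r", "voice": "l2"}, {"content": "uh", "voice": "l2"}]
print(correct_pronunciation(l2, "l1"))
```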