Use Of Particular Distance Or Distortion Metric Between Probe Pattern And Reference Templates (epo) Patents (Class 704/E17.008)
  • Patent number: 11893999
    Abstract: Techniques for enrolling a user in a system's user recognition functionality without requiring the user to speak particular speech are described. The system may determine characteristics unique to a user input. The system may generate an implicit voice profile from user inputs having similar characteristics. After an implicit voice profile is generated, the system may receive a user input having speech characteristics similar to those of the implicit voice profile. The system may ask the user if the user wants the system to associate the implicit voice profile with a particular user identifier. If the user responds affirmatively, the system may request an identifier of a user profile (e.g., a user name). In response to receiving the user's name, the system may identify a user profile associated with the name and associate the implicit voice profile with the user profile, thereby converting the implicit voice profile into an explicit voice profile.
    Type: Grant
    Filed: August 6, 2018
    Date of Patent: February 6, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Sai Sailesh Kopuri, John Moore, Sundararajan Srinivasan, Aparna Khare, Arindam Mandal, Spyridon Matsoukas, Rohit Prasad
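    The enrollment flow above can be sketched as grouping utterance embeddings into clusters that become implicit voice profiles, then "promoting" a cluster to an explicit profile once the user confirms their identity. This is a minimal illustrative sketch, not the patent's actual method; the class, thresholds, and cosine-similarity grouping are all assumptions.

    ```python
    # Hypothetical sketch: build implicit voice profiles by grouping similar
    # utterance embeddings, then convert to an explicit profile on user
    # confirmation. Thresholds and the clustering rule are illustrative.

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    class ImplicitProfileStore:
        def __init__(self, match_threshold=0.85, min_utterances=3):
            self.match_threshold = match_threshold
            self.min_utterances = min_utterances
            self.clusters = []  # each: {"centroid": [...], "count": int, "user_id": None}

        def add_utterance(self, embedding):
            """Assign an embedding to the closest cluster, or start a new one."""
            best = max(self.clusters,
                       key=lambda c: cosine(c["centroid"], embedding),
                       default=None)
            if best and cosine(best["centroid"], embedding) >= self.match_threshold:
                n = best["count"]  # running-mean centroid update
                best["centroid"] = [(c * n + e) / (n + 1)
                                    for c, e in zip(best["centroid"], embedding)]
                best["count"] = n + 1
                return best
            new = {"centroid": list(embedding), "count": 1, "user_id": None}
            self.clusters.append(new)
            return new

        def is_implicit_profile(self, cluster):
            # Enough similar utterances, and not yet tied to a user profile.
            return cluster["count"] >= self.min_utterances and cluster["user_id"] is None

        def promote(self, cluster, user_id):
            """Convert an implicit profile to an explicit one after confirmation."""
            cluster["user_id"] = user_id
    ```

    In this sketch, once `is_implicit_profile` returns true the system would prompt the user, then call `promote` with the identifier of the matched user profile.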
  • Patent number: 11894014
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
    Type: Grant
    Filed: September 22, 2022
    Date of Patent: February 6, 2024
    Assignee: Google LLC
    Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
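    The final step of the pipeline described in the abstract, applying each speaker's predicted spectrogram mask to the mixture spectrogram to obtain that speaker's isolated speech spectrogram, can be sketched directly. The mask-prediction network itself is elided here; the shapes and the elementwise-masking form are assumptions for illustration.

    ```python
    import numpy as np

    # Sketch of the masking step: a per-speaker mask (values in [0, 1]) is
    # applied elementwise to the mixture spectrogram to isolate each speaker.
    # How the masks are predicted (the audio-visual network) is not shown.

    def apply_masks(mixture_spec, masks):
        """mixture_spec: (freq, time) magnitude spectrogram of the soundtrack.
        masks: (num_speakers, freq, time), one mask per detected speaker.
        Returns (num_speakers, freq, time) isolated speech spectrograms."""
        return masks * mixture_spec[None, :, :]  # broadcast over speakers

    # Toy example: two speakers with complementary constant masks.
    mixture = np.ones((4, 5))
    masks = np.stack([np.full((4, 5), 0.7), np.full((4, 5), 0.3)])
    isolated = apply_masks(mixture, masks)
    ```

    With masks that sum to one at each time-frequency bin, the per-speaker spectrograms sum back to the mixture, which is a common (though not required) design choice for such masks.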
  • Patent number: 11823684
    Abstract: Implementations are directed to biasing speaker authentication on a per-user, per-device, and/or per-context basis. In some of those implementations, in performing speaker authentication based on a spoken utterance, different biasing parameters are determined for each of multiple different registered users of an assistant device at which the spoken utterance was detected. In those implementations, each of the biasing parameters can be used to make it more likely or less likely (depending on the biasing parameter) that a corresponding registered user will be verified using the speaker authentication. Through utilization of biasing parameter(s) in performing speaker authentication, accuracy and/or robustness of speaker authentication can be increased.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: November 21, 2023
    Assignee: GOOGLE LLC
    Inventors: Matthew Sharifi, Victor Carbune
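    The per-user biasing idea can be sketched as shifting each registered user's raw verification score before thresholding. The additive form, parameter names, and threshold below are illustrative assumptions, not the patent's formula.

    ```python
    # Hypothetical sketch: per-user biasing of speaker-verification scores.
    # A positive bias makes verification of that user more likely; a negative
    # bias makes it less likely. The additive form is an assumption.

    def verify_speakers(raw_scores, biases, threshold=0.5):
        """raw_scores: dict of registered user id -> raw verification score.
        biases: dict of user id -> biasing parameter (default 0.0).
        Returns dict of user id -> whether that user is verified."""
        verified = {}
        for user, score in raw_scores.items():
            biased = score + biases.get(user, 0.0)
            verified[user] = biased >= threshold
        return verified
    ```

    In a device- and context-dependent scheme like the one described, the bias values themselves would differ per device and per contextual features, while the decision rule stays the same.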
  • Publication number: 20080195389
    Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
    Type: Application
    Filed: February 12, 2007
    Publication date: August 14, 2008
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Amarnag Subramaya
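    The decision rule described in the abstract, a weighted sum of likelihood ratios over sub-units (words or phones) rather than a single utterance-level likelihood ratio test, can be sketched as follows. The weights and threshold are illustrative; how the per-sub-unit likelihoods come from the speaker model and the speaker-independent background model is elided.

    ```python
    # Sketch of a weighted sub-unit log-likelihood-ratio (LLR) decision rule.
    # Each sub-unit contributes log p(x | speaker) - log p(x | background),
    # and the contributions are combined with per-sub-unit weights.

    def weighted_llr_score(subunit_scores, weights):
        """subunit_scores: list of (speaker_loglik, background_loglik),
        one pair per sub-unit (word, tri-phone, or phone).
        weights: one weight per sub-unit."""
        llrs = [spk - bg for spk, bg in subunit_scores]
        return sum(w * l for w, l in zip(weights, llrs))

    def accept(subunit_scores, weights, threshold=0.0):
        """Accept the claimed speaker if the weighted LLR sum clears a threshold."""
        return weighted_llr_score(subunit_scores, weights) >= threshold
    ```

    Compared with a single utterance-level LRT, weighting sub-units lets the decision emphasize sub-units that discriminate speakers well and down-weight unreliable ones.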