Use Of Particular Distance Or Distortion Metric Between Probe Pattern And Reference Templates (epo) Patents (Class 704/E17.008)
  • Patent number: 11893999
    Abstract: Techniques for enrolling a user in a system's user recognition functionality without requiring the user to speak particular speech are described. The system may determine characteristics unique to a user input. The system may generate an implicit voice profile from user inputs having similar characteristics. After an implicit voice profile is generated, the system may receive a user input having speech characteristics similar to those of the implicit voice profile. The system may ask the user if the user wants the system to associate the implicit voice profile with a particular user identifier. If the user responds affirmatively, the system may request an identifier of a user profile (e.g., a user name). In response to receiving the user's name, the system may identify a user profile associated with the name and associate the implicit voice profile with the user profile, thereby converting the implicit voice profile into an explicit voice profile.
    Type: Grant
    Filed: August 6, 2018
    Date of Patent: February 6, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Sai Sailesh Kopuri, John Moore, Sundararajan Srinivasan, Aparna Khare, Arindam Mandal, Spyridon Matsoukas, Rohit Prasad
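    The enrollment flow above can be sketched as grouping utterance embeddings into clusters that become implicit voice profiles, then "promoting" a cluster to an explicit profile once the user confirms their identity. This is a minimal illustrative sketch, not the patent's actual method; the class, thresholds, and cosine-similarity grouping are all assumptions.

    ```python
    # Hypothetical sketch: build implicit voice profiles by grouping similar
    # utterance embeddings, then convert to an explicit profile on user
    # confirmation. Thresholds and the clustering rule are illustrative.

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    class ImplicitProfileStore:
        def __init__(self, match_threshold=0.85, min_utterances=3):
            self.match_threshold = match_threshold
            self.min_utterances = min_utterances
            self.clusters = []  # each: {"centroid": [...], "count": int, "user_id": None}

        def add_utterance(self, embedding):
            """Assign an embedding to the closest cluster, or start a new one."""
            best = max(self.clusters,
                       key=lambda c: cosine(c["centroid"], embedding),
                       default=None)
            if best and cosine(best["centroid"], embedding) >= self.match_threshold:
                n = best["count"]  # running-mean centroid update
                best["centroid"] = [(c * n + e) / (n + 1)
                                    for c, e in zip(best["centroid"], embedding)]
                best["count"] = n + 1
                return best
            new = {"centroid": list(embedding), "count": 1, "user_id": None}
            self.clusters.append(new)
            return new

        def is_implicit_profile(self, cluster):
            # Enough similar utterances, and not yet tied to a user profile.
            return cluster["count"] >= self.min_utterances and cluster["user_id"] is None

        def promote(self, cluster, user_id):
            """Convert an implicit profile to an explicit one after confirmation."""
            cluster["user_id"] = user_id
    ```

    In this sketch, once `is_implicit_profile` returns true the system would prompt the user, then call `promote` with the identifier of the matched user profile.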
  • Patent number: 11894014
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
    Type: Grant
    Filed: September 22, 2022
    Date of Patent: February 6, 2024
    Assignee: Google LLC
    Inventors: Inbar Mosseri, Michael Rubinstein, Ariel Ephrat, William Freeman, Oran Lang, Kevin William Wilson, Tali Dekel, Avinatan Hassidim
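    The final step of the pipeline described in the abstract, applying each speaker's predicted spectrogram mask to the mixture spectrogram to obtain that speaker's isolated speech spectrogram, can be sketched directly. The mask-prediction network itself is elided here; the shapes and the elementwise-masking form are assumptions for illustration.

    ```python
    import numpy as np

    # Sketch of the masking step: a per-speaker mask (values in [0, 1]) is
    # applied elementwise to the mixture spectrogram to isolate each speaker.
    # How the masks are predicted (the audio-visual network) is not shown.

    def apply_masks(mixture_spec, masks):
        """mixture_spec: (freq, time) magnitude spectrogram of the soundtrack.
        masks: (num_speakers, freq, time), one mask per detected speaker.
        Returns (num_speakers, freq, time) isolated speech spectrograms."""
        return masks * mixture_spec[None, :, :]  # broadcast over speakers

    # Toy example: two speakers with complementary constant masks.
    mixture = np.ones((4, 5))
    masks = np.stack([np.full((4, 5), 0.7), np.full((4, 5), 0.3)])
    isolated = apply_masks(mixture, masks)
    ```

    With masks that sum to one at each time-frequency bin, the per-speaker spectrograms sum back to the mixture, which is a common (though not required) design choice for such masks.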
  • Patent number: 11823684
    Abstract: Implementations are directed to biasing speaker authentication on a per-user, per-device, and/or per-context basis. In some of those implementations, in performing speaker authentication based on a spoken utterance, different biasing parameters are determined for each of multiple different registered users of an assistant device at which the spoken utterance was detected. In those implementations, each of the biasing parameters can be used to make it more likely or less likely (depending on the biasing parameter) that a corresponding registered user will be verified using the speaker authentication. Through utilization of biasing parameter(s) in performing speaker authentication, accuracy and/or robustness of speaker authentication can be increased.
    Type: Grant
    Filed: November 19, 2020
    Date of Patent: November 21, 2023
    Assignee: GOOGLE LLC
    Inventors: Matthew Sharifi, Victor Carbune
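    The per-user biasing idea can be sketched as shifting each registered user's raw verification score before thresholding. The additive form, parameter names, and threshold below are illustrative assumptions, not the patent's formula.

    ```python
    # Hypothetical sketch: per-user biasing of speaker-verification scores.
    # A positive bias makes verification of that user more likely; a negative
    # bias makes it less likely. The additive form is an assumption.

    def verify_speakers(raw_scores, biases, threshold=0.5):
        """raw_scores: dict of registered user id -> raw verification score.
        biases: dict of user id -> biasing parameter (default 0.0).
        Returns dict of user id -> whether that user is verified."""
        verified = {}
        for user, score in raw_scores.items():
            biased = score + biases.get(user, 0.0)
            verified[user] = biased >= threshold
        return verified
    ```

    In a device- and context-dependent scheme like the one described, the bias values themselves would differ per device and per contextual features, while the decision rule stays the same.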
  • Publication number: 20080195389
    Abstract: A text-dependent speaker verification technique that uses a generic speaker-independent speech recognizer for robust speaker verification, and uses the acoustical model of a speaker-independent speech recognizer as a background model. Instead of using a likelihood ratio test (LRT) at the utterance level (e.g., the sentence level), which is typical of most speaker verification systems, the present text-dependent speaker verification technique uses a weighted sum of likelihood ratios at the sub-unit level (word, tri-phone, or phone) as well as at the utterance level.
    Type: Application
    Filed: February 12, 2007
    Publication date: August 14, 2008
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Amarnag Subramaya
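    The decision rule described in the abstract, a weighted sum of likelihood ratios over sub-units (words or phones) rather than a single utterance-level likelihood ratio test, can be sketched as follows. The weights and threshold are illustrative; how the per-sub-unit likelihoods come from the speaker model and the speaker-independent background model is elided.

    ```python
    # Sketch of a weighted sub-unit log-likelihood-ratio (LLR) decision rule.
    # Each sub-unit contributes log p(x | speaker) - log p(x | background),
    # and the contributions are combined with per-sub-unit weights.

    def weighted_llr_score(subunit_scores, weights):
        """subunit_scores: list of (speaker_loglik, background_loglik),
        one pair per sub-unit (word, tri-phone, or phone).
        weights: one weight per sub-unit."""
        llrs = [spk - bg for spk, bg in subunit_scores]
        return sum(w * l for w, l in zip(weights, llrs))

    def accept(subunit_scores, weights, threshold=0.0):
        """Accept the claimed speaker if the weighted LLR sum clears a threshold."""
        return weighted_llr_score(subunit_scores, weights) >= threshold
    ```

    Compared with a single utterance-level LRT, weighting sub-units lets the decision emphasize sub-units that discriminate speakers well and down-weight unreliable ones.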