Patents by Inventor Donald R. McAllaster

Donald R. McAllaster has filed for patents to protect the following inventions. This listing includes published patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11475898
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.
    Type: Grant
    Filed: August 7, 2019
    Date of Patent: October 18, 2022
    Assignee: Apple Inc.
    Inventors: Masood Delfarah, Ossama A. Abdelhamid, Kyuyeon Hwang, Donald R. McAllaster, Sabato Marco Siniscalchi
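    Illustrative code sketch: a minimal Python/PyTorch sketch of the idea in this abstract, assuming a hypothetical architecture: a network takes features of the mixed speech together with a target-speaker representation and emits per-frame probability distributions over phonetic elements. The class name, layer choices, and dimensions are assumptions for illustration, not the patented implementation.

        # Hypothetical sketch: mixed-speech features plus a target-speaker
        # embedding in, per-frame phonetic-element posteriors out.
        import torch
        import torch.nn as nn

        class TargetSpeakerAcousticModel(nn.Module):
            def __init__(self, feat_dim=40, spk_dim=128, hidden=256, n_phones=42):
                super().__init__()
                # Recurrent encoder over acoustic frames, conditioned on the speaker.
                self.rnn = nn.GRU(feat_dim + spk_dim, hidden, batch_first=True)
                self.out = nn.Linear(hidden, n_phones)

            def forward(self, mixed_feats, speaker_vec):
                # mixed_feats: (batch, time, feat_dim) features of the mixed signal
                # speaker_vec: (batch, spk_dim) target-speaker representation
                t = mixed_feats.size(1)
                spk = speaker_vec.unsqueeze(1).expand(-1, t, -1)
                x = torch.cat([mixed_feats, spk], dim=-1)
                h, _ = self.rnn(x)
                # Probability distributions over phonetic elements, per frame.
                return torch.softmax(self.out(h), dim=-1)

        # Usage with random stand-in data.
        model = TargetSpeakerAcousticModel()
        feats = torch.randn(1, 100, 40)       # 100 frames of mixed speech
        spk = torch.randn(1, 128)             # target-speaker representation
        phone_posteriors = model(feats, spk)  # shape (1, 100, 42)

    Per the abstract, text for the target speaker would then be generated from these distributions and used to produce the assistant's response.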
  • Publication number: 20200135209
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example, a method includes receiving mixed speech data representing utterances of a target speaker and utterances of one or more interfering audio sources. The method further includes obtaining a target speaker representation, which represents speech characteristics of the target speaker; and determining, using a learning network, probability distributions of phonetic elements directly from the mixed speech data. The inputs of the learning network include the mixed speech data and the target speaker representation. An output of the learning network includes the probability distributions of phonetic elements. The method further includes generating text corresponding to the utterances of the target speaker based on the probability distributions of the phonetic elements; and providing a response to the target speaker based on the text corresponding to the utterances of the target speaker.
    Type: Application
    Filed: August 7, 2019
    Publication date: April 30, 2020
    Inventors: Masood DELFARAH, Ossama A. ABDELHAMID, Kyuyeon HWANG, Donald R. MCALLASTER, Sabato Marco SINISCALCHI
  • Patent number: 10592604
    Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: March 17, 2020
    Assignee: Apple Inc.
    Inventors: Ernest J. Pusateri, Bharat Ram Ambati, Elizabeth S. Brooks, Donald R. McAllaster, Venkatesh Nagesha, Ondrej Platek
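    Illustrative code sketch: a minimal Python sketch of the label-driven rewrite step described in the abstract: each spoken-form token receives a label that selects one of a small set of predetermined edit operations, and applying those operations yields written-form text. The label inventory and operations below (KEEP, DELETE, TO_DIGIT, TO_DIGIT_JOIN, CAPITALIZE) are assumptions for illustration; the feature representation and the model that predicts the labels are not shown.

        # Hypothetical edit-operation inventory for inverse text normalization.
        SPOKEN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3",
                         "four": "4", "five": "5", "six": "6", "seven": "7",
                         "eight": "8", "nine": "9"}

        def apply_edits(tokens, labels):
            """Apply per-token edit operations to a spoken-form token sequence."""
            out = []
            for tok, lab in zip(tokens, labels):
                if lab == "KEEP":
                    out.append(tok)
                elif lab == "DELETE":
                    continue
                elif lab == "CAPITALIZE":
                    out.append(tok.capitalize())
                elif lab == "TO_DIGIT":
                    out.append(SPOKEN_DIGITS.get(tok, tok))
                elif lab == "TO_DIGIT_JOIN":
                    digit = SPOKEN_DIGITS.get(tok, tok)
                    if out:
                        out[-1] += digit   # merge with the previous token
                    else:
                        out.append(digit)
            return " ".join(out)

        # Spoken form "meet me at five o'clock" -> written form "Meet me at 5".
        tokens = ["meet", "me", "at", "five", "o'clock"]
        labels = ["CAPITALIZE", "KEEP", "KEEP", "TO_DIGIT", "DELETE"]
        print(apply_edits(tokens, labels))  # Meet me at 5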
  • Publication number: 20190278841
    Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
    Type: Application
    Filed: June 29, 2018
    Publication date: September 12, 2019
    Inventors: Ernest J. PUSATERI, Bharat Ram AMBATI, Elizabeth S. BROOKS, Donald R. MCALLASTER, Venkatesh NAGESHA, Ondrej PLATEK
  • Publication number: 20170092278
    Abstract: A non-transitory computer-readable storage medium stores one or more programs including instructions which, when executed by an electronic device, cause the electronic device to receive natural-language speech input from one of a plurality of users, the natural-language speech input having a set of acoustic properties; and determine whether the natural-language speech input corresponds to both a user-customizable lexical trigger and a set of acoustic properties associated with the user; in accordance with a determination that the natural-language speech input corresponds to both a user-customizable lexical trigger and a set of acoustic properties associated with the user, invoke a virtual assistant; and in accordance with a determination that either the natural-language speech input fails to correspond to a user-customizable lexical trigger or the natural-language speech input fails to have a set of acoustic properties associated with the user, forego invocation of a virtual assistant.
    Type: Application
    Filed: May 24, 2016
    Publication date: March 30, 2017
    Inventors: Gunnar EVERMANN, Donald R. MCALLASTER
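    Illustrative code sketch: a minimal Python sketch of the two-part gate described in the abstract: the virtual assistant is invoked only when the speech contains the user-customizable lexical trigger and its acoustic properties match the user; otherwise invocation is forgone. The matching functions, cosine-similarity voice check, and threshold are assumptions for illustration.

        # Hypothetical gate combining a lexical trigger check with a speaker check.
        from dataclasses import dataclass

        @dataclass
        class UserProfile:
            lexical_trigger: str    # user-chosen trigger phrase
            voice_embedding: list   # enrolled acoustic representation

        def matches_trigger(transcript: str, profile: UserProfile) -> bool:
            return profile.lexical_trigger.lower() in transcript.lower()

        def matches_voice(embedding: list, profile: UserProfile,
                          threshold: float = 0.8) -> bool:
            # Cosine similarity between the utterance and the enrolled voiceprint.
            dot = sum(a * b for a, b in zip(embedding, profile.voice_embedding))
            na = sum(a * a for a in embedding) ** 0.5
            nb = sum(b * b for b in profile.voice_embedding) ** 0.5
            return na > 0 and nb > 0 and dot / (na * nb) >= threshold

        def maybe_invoke_assistant(transcript, embedding, profile):
            if matches_trigger(transcript, profile) and matches_voice(embedding, profile):
                return "invoke virtual assistant"
            return "forego invocation"

        profile = UserProfile("hey computer", [0.1, 0.9, 0.3])
        print(maybe_invoke_assistant("Hey computer, what's the weather?",
                                     [0.12, 0.88, 0.31], profile))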
  • Patent number: 7133827
    Abstract: A new word model is trained from synthetic word samples derived by Monte Carlo techniques from one or more prior word models. The prior word model can be a phonetic word model and the new word model can be a non-phonetic, whole-word, word model. The prior word model can be trained from data that has undergone a first channel normalization and the synthesized word samples from which the new word model is trained can undergo a different channel normalization similar to that to be used in a given speech recognition context. The prior word model can have a first model structure and the new word model can have a second, different, model structure. These differences in model structure can include, for example, differences of model topology; differences of model complexity; and differences in the type of basis function used to describe the models' probability distributions.
    Type: Grant
    Filed: February 6, 2003
    Date of Patent: November 7, 2006
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Laurence S. Gillick, Donald R. McAllaster, Daniel L. Roth
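    Illustrative code sketch: a minimal Python sketch of the Monte Carlo idea in the abstract: synthetic word samples are drawn from a prior word model's per-state Gaussians, passed through a different channel normalization, and used to fit a new word model with a simpler structure (here, a single Gaussian per word). The state count, feature dimension, and normalization are assumptions for illustration, not the patented method.

        # Hypothetical Monte Carlo resynthesis from a prior word model.
        import numpy as np

        rng = np.random.default_rng(0)

        # Prior phonetic word model: a sequence of states, each a diagonal Gaussian.
        prior_means = [rng.normal(size=13) for _ in range(6)]
        prior_stds = [np.abs(rng.normal(1.0, 0.1, size=13)) for _ in range(6)]

        def synthesize_word_sample():
            """Monte Carlo draw: one frame per prior state (toy topology)."""
            return np.stack([rng.normal(m, s) for m, s in zip(prior_means, prior_stds)])

        def channel_normalize(frames):
            """A different channel normalization (cepstral-mean-subtraction stand-in)."""
            return frames - frames.mean(axis=0, keepdims=True)

        # Generate synthetic training data and fit a new model of different
        # structure: one whole-word Gaussian instead of per-state Gaussians.
        samples = np.stack([channel_normalize(synthesize_word_sample())
                            for _ in range(500)])
        flat = samples.reshape(-1, samples.shape[-1])
        new_word_mean = flat.mean(axis=0)
        new_word_std = flat.std(axis=0)
        print(new_word_mean.shape, new_word_std.shape)  # (13,) (13,)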