Patents by Inventor Alistair D. Conkie

Alistair D. Conkie has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160180841
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score. (See the illustrative sketch below.)
    Type: Application
    Filed: March 2, 2016
    Publication date: June 23, 2016
    Inventors: Andrej Ljolje, Alistair D. Conkie
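    Sketch: the abstract above describes generating several hypotheses for a missing speech segment and choosing the best one from context (for example a language model), with every hypothesis carrying an identical acoustic score. The following minimal Python sketch illustrates only that selection step; the toy bigram model and word-level candidates are assumptions for illustration, not taken from the patent.

      import math

      # Toy bigram log-probabilities (hypothetical values, for illustration only).
      BIGRAM_LOGPROB = {
          ("call", "me"): math.log(0.20),
          ("call", "you"): math.log(0.05),
          ("me", "tomorrow"): math.log(0.30),
          ("you", "tomorrow"): math.log(0.10),
      }

      def lm_score(words):
          """Sum bigram log-probabilities, backing off to a small floor value."""
          floor = math.log(1e-4)
          return sum(BIGRAM_LOGPROB.get(pair, floor) for pair in zip(words, words[1:]))

      def best_hypothesis(left_context, right_context, candidates):
          """Rank candidate fills for the missing segment.  Every candidate gets
          the same (zero) acoustic score, so the ranking is driven entirely by
          the language-model context, mirroring the abstract's condition that
          each hypothesis has an identical acoustic score."""
          acoustic_score = 0.0
          scored = [(acoustic_score + lm_score(left_context + [c] + right_context), c)
                    for c in candidates]
          return max(scored)[1]

      print(best_hypothesis(["call"], ["tomorrow"], ["me", "you"]))  # -> "me"
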
  • Publication number: 20160171972
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages. (See the illustrative sketch below.)
    Type: Application
    Filed: February 22, 2016
    Publication date: June 16, 2016
    Inventors: Alistair D. Conkie, Ann K. Syrdal
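    Sketch: a minimal illustration of combining two voice-unit databases and selecting units per phonetic category according to a policy, as described in the abstract above. The record layout, voice names, and policy below are hypothetical.

      # Hypothetical unit records: (voice_id, phonetic_category, unit_id).
      VOICE_A = [("A", "vowel", "a_v1"), ("A", "consonant", "a_c1")]
      VOICE_B = [("B", "vowel", "b_v1"), ("B", "consonant", "b_c1")]

      # A policy names, per phonetic category, which voice units may come from.
      POLICY = {"vowel": "A", "consonant": "B"}

      def combine(*databases):
          """Merge unit databases from several text-to-speech voices."""
          combined = []
          for db in databases:
              combined.extend(db)
          return combined

      def select_units(combined, categories, policy):
          """For each requested category, keep only units coming from the voice
          the policy designates for that category."""
          selected = []
          for cat in categories:
              allowed_voice = policy[cat]
              selected.extend(u for u in combined
                              if u[1] == cat and u[0] == allowed_voice)
          return selected

      combined = combine(VOICE_A, VOICE_B)
      print(select_units(combined, ["vowel", "consonant"], POLICY))
      # -> [('A', 'vowel', 'a_v1'), ('B', 'consonant', 'b_c1')]
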
  • Publication number: 20160171984
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model that represents each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes; the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the new speaker using the custom speech model. (See the illustrative sketch below.)
    Type: Application
    Filed: February 23, 2016
    Publication date: June 16, 2016
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
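    Sketch: the core idea above is to leave the pronouncing dictionary alone and instead represent each dictionary phoneme by a weighted sum of native acoustic models of the plausible phonemes seen in the new speaker's transcription lattice. The toy single-Gaussian means and lattice weights below are assumptions for illustration only.

      # Toy single-Gaussian acoustic models: phoneme -> mean feature vector.
      NATIVE_MODELS = {
          "t": [1.0, 0.0],
          "d": [0.8, 0.2],
          "th": [0.5, 0.5],
      }

      # Hypothetical lattice statistics for the new speaker: when the dictionary
      # expects /t/, the transcription lattice actually matches these phonemes
      # with these relative frequencies.
      LATTICE_WEIGHTS_FOR_T = {"t": 0.6, "d": 0.3, "th": 0.1}

      def restructure(models, weights):
          """Build a custom model for one dictionary phoneme as a weighted sum
          of the native acoustic models of all plausible phonemes.  The
          pronouncing dictionary itself is untouched; only the acoustic space
          behind the dictionary symbol changes."""
          dim = len(next(iter(models.values())))
          custom = [0.0] * dim
          for phoneme, w in weights.items():
              for i, value in enumerate(models[phoneme]):
                  custom[i] += w * value
          return custom

      print(restructure(NATIVE_MODELS, LATTICE_WEIGHTS_FOR_T))
      # -> approximately [0.89, 0.11]: a /t/ model shifted toward the speaker's realisations
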
  • Publication number: 20160171970
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, a support vector machine, or maximum entropy. In this way a text-to-speech unit-selection speech synthesizer can produce more natural-sounding speech with suitable stress patterns, regardless of the stress of units in a unit-selection database. (See the illustrative sketch below.)
    Type: Application
    Filed: February 22, 2016
    Publication date: June 16, 2016
    Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
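    Sketch: a minimal illustration of flagging stress values in selected units that disagree with the expected lexical stress pattern and correcting them. The rule-based check below stands in for the machine-learning classifier (CART, boosting, SVM, or maximum entropy) named in the abstract; the toy lexicon is hypothetical.

      # Expected lexical stress per word (1 = stressed syllable, 0 = unstressed);
      # a real system would read this from a lexicon.
      EXPECTED_STRESS = {"hello": [0, 1], "world": [1]}

      def detect_incorrect(word, selected_stress):
          """Return syllable indices whose stress in the selected units
          disagrees with the expected lexical pattern (stand-in for the
          trained classifier)."""
          expected = EXPECTED_STRESS[word]
          return [i for i, (e, s) in enumerate(zip(expected, selected_stress)) if e != s]

      def correct(word, selected_stress):
          """Overwrite abnormal stress values with the expected pattern."""
          corrected = list(selected_stress)
          for i in detect_incorrect(word, selected_stress):
              corrected[i] = EXPECTED_STRESS[word][i]
          return corrected

      print(correct("hello", [1, 0]))  # -> [0, 1]
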
  • Publication number: 20160125872
    Abstract: A system, method, and computer-readable storage devices are disclosed for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens correlate strongly to their actual pronunciation, not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone. (See the illustrative sketch below.)
    Type: Application
    Filed: November 5, 2014
    Publication date: May 5, 2016
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Ladan Golipour, Alistair D. Conkie
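    Sketch: a minimal illustration of splitting Unicode text into pronunciation-oriented atomic tokens and labelling each token by how it should be read. The simple rules here are only a placeholder for the data-driven classification scheme the abstract describes.

      import re
      import unicodedata

      def atomic_tokens(text):
          """Split text into runs of letters, runs of digits, and single other
          characters -- chunks that map closely to how they are pronounced."""
          return re.findall(r"[^\W\d_]+|\d+|\S", text)

      def classify(token):
          """Assign a pronunciation-oriented label.  A real system would use a
          trained, data-driven classifier; this rule is only a placeholder."""
          if token.isdigit():
              return "READ_AS_NUMBER"
          if all(unicodedata.category(c).startswith("L") for c in token):
              return "READ_AS_WORD"
          return "READ_AS_SYMBOL"

      for tok in atomic_tokens("Call 911 now!"):
          print(tok, classify(tok))
      # Call READ_AS_WORD / 911 READ_AS_NUMBER / now READ_AS_WORD / ! READ_AS_SYMBOL
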
  • Publication number: 20160098985
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server. (See the illustrative sketch below.)
    Type: Application
    Filed: December 14, 2015
    Publication date: April 7, 2016
    Inventors: Alistair D. Conkie, Mark Charles Beutnagel, Taniya Mishra
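    Sketch: a minimal illustration of caching synthesized audio keyed by a unique identifier for each section of text, so that identical text is served from the cache instead of being re-synthesized. The synthesize_via_tts_server function is a hypothetical stand-in for the round trip to the TTS server.

      import hashlib

      AUDIO_CACHE = {}  # unique identifier -> audio bytes

      def synthesize_via_tts_server(phrase):
          """Hypothetical stand-in for the round trip to the TTS server."""
          return b"AUDIO(" + phrase.encode("utf-8") + b")"

      def audio_for(phrase):
          """Return audio for one prosodically meaningful section of text,
          reusing cached audio when the identical text has been seen before."""
          key = hashlib.sha256(phrase.encode("utf-8")).hexdigest()
          if key not in AUDIO_CACHE:
              AUDIO_CACHE[key] = synthesize_via_tts_server(phrase)
          return AUDIO_CACHE[key]

      audio_for("Welcome back.")
      audio_for("Welcome back.")   # second call is served from the cache
      print(len(AUDIO_CACHE))      # -> 1
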
  • Patent number: 9305547
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model that represents each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes; the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the new speaker using the custom speech model.
    Type: Grant
    Filed: April 28, 2015
    Date of Patent: April 5, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9305546
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score.
    Type: Grant
    Filed: June 9, 2014
    Date of Patent: April 5, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie
  • Publication number: 20160086598
    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache. (See the illustrative sketch below.)
    Type: Application
    Filed: November 30, 2015
    Publication date: March 24, 2016
    Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
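    Sketch: a minimal illustration of a client-side cache of text-to-speech units with a core set that is never pruned, fetching from a server only the units missing for the current synthesis context. The unit names and the server_fetch function are hypothetical.

      CORE_UNITS = {"sil", "ah", "t"}          # never pruned from the local cache
      local_cache = {u: b"..." for u in CORE_UNITS}

      def server_fetch(unit_ids):
          """Hypothetical request to the TTS server for unit waveforms."""
          return {u: b"wave:" + u.encode() for u in unit_ids}

      def ensure_units(needed):
          """Determine which units for the current synthesis context are missing
          locally, fetch them, and store them in the cache."""
          missing = [u for u in needed if u not in local_cache]
          local_cache.update(server_fetch(missing))
          return [local_cache[u] for u in needed]

      def prune(keep_for_context):
          """Drop units no longer needed, but never evict the core set."""
          for u in list(local_cache):
              if u not in CORE_UNITS and u not in keep_for_context:
                  del local_cache[u]

      ensure_units(["ah", "k", "s"])    # fetches "k" and "s" from the server
      prune(keep_for_context={"k"})     # "s" is evicted; core units remain
      print(sorted(local_cache))        # -> ['ah', 'k', 'sil', 't']
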
  • Patent number: 9269346
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Grant
    Filed: February 16, 2015
    Date of Patent: February 23, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9269348
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, a support vector machine, or maximum entropy. In this way a text-to-speech unit-selection speech synthesizer can produce more natural-sounding speech with suitable stress patterns, regardless of the stress of units in a unit-selection database.
    Type: Grant
    Filed: February 23, 2015
    Date of Patent: February 23, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20160049144
    Abstract: A system, method, and computer-readable storage devices are disclosed for using a single set of normalization protocols and a single language lexicon (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition. (See the illustrative sketch below.)
    Type: Application
    Filed: August 18, 2014
    Publication date: February 18, 2016
    Inventors: Alistair D. Conkie, Ladan Golipour
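    Sketch: a minimal illustration of one normalization step and one lexicon serving both directions: phonemes for synthesis, or normalized text for ASR training. The lexicon entries and normalization rules below are toy assumptions.

      import re

      # One lexicon shared by TTS and ASR training (toy entries).
      LEXICON = {"dr.": ["d", "aa", "k", "t", "er"],
                 "doctor": ["d", "aa", "k", "t", "er"],
                 "smith": ["s", "m", "ih", "th"]}

      def normalize(text):
          """Single normalization step used for both TTS input and ASR
          training text (toy rule set)."""
          text = text.lower()
          text = re.sub(r"\s+", " ", text).strip()
          return text.split(" ")

      def process(text, mode):
          """mode='tts' -> phonemes for synthesis; mode='asr' -> normalized
          text for acoustic-model and language-model training."""
          words = normalize(text)
          if mode == "tts":
              return [p for w in words for p in LEXICON.get(w, [])]
          return words

      print(process("Dr.  Smith", "tts"))  # -> ['d','aa','k','t','er','s','m','ih','th']
      print(process("Dr.  Smith", "asr"))  # -> ['dr.', 'smith']
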
  • Patent number: 9240180
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: January 19, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Mark Charles Beutnagel, Taniya Mishra
  • Patent number: 9218804
    Abstract: Systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.
    Type: Grant
    Filed: September 12, 2013
    Date of Patent: December 22, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
  • Patent number: 9218803
    Abstract: A system, method, and computer-readable medium that enhance a speech database for speech synthesis are disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis. (See the illustrative sketch below.)
    Type: Grant
    Filed: March 4, 2015
    Date of Patent: December 22, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
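    Sketch: a minimal illustration of enhancing a primary speech database by substituting, for segments whose pronunciation varies between language varieties, the corresponding segments from a secondary database. The segment labels and waveform placeholders are hypothetical.

      # Toy labeled segments: label -> waveform placeholder.
      primary = {"t_flap": b"US tap", "r_final": b"US r", "a_vowel": b"shared a"}
      secondary = {"t_flap": b"UK t", "r_final": b"UK r"}

      # Labels whose pronunciation differs between the two language varieties
      # (in a real system these are identified from the labeled audio).
      DIFFERING = {"t_flap", "r_final"}

      def enhance(primary_db, secondary_db, differing_labels):
          """Substitute secondary-database segments for the primary segments
          that vary in pronunciation, then return the enhanced database."""
          enhanced = dict(primary_db)
          for label in differing_labels:
              if label in secondary_db:
                  enhanced[label] = secondary_db[label]
          return enhanced

      print(enhance(primary, secondary, DIFFERING))
      # -> {'t_flap': b'UK t', 'r_final': b'UK r', 'a_vowel': b'shared a'}
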
  • Publication number: 20150332665
    Abstract: Disclosed herein are systems, methods, and computer-readable storage devices for generating speech using a presentation style specific to a user, and in particular to the user's social group. Systems configured according to this disclosure can then use the resulting personalized text and/or speech in a spoken dialogue or presentation system to communicate with the user. For example, a system practicing the disclosed method can receive speech from a user, identify the user, and respond to the received speech by applying a personalized natural language generation model. The personalized natural language generation model provides communications that can be specific to the identified user. (See the illustrative sketch below.)
    Type: Application
    Filed: May 13, 2014
    Publication date: November 19, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Taniya Mishra, Alistair D. Conkie, Svetlana Stoyanchev
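    Sketch: a minimal illustration of applying a personalized natural language generation model once the user has been identified: the same content is phrased in the style associated with the user's group. The user-to-group mapping and templates are hypothetical.

      # Hypothetical mapping from identified user to social group, and from
      # group to a group-specific generation style (toy templates).
      USER_GROUP = {"alice": "teen", "prof_jones": "formal"}
      GROUP_TEMPLATE = {"teen": "hey {name}, {answer}!",
                        "formal": "Certainly, {name}. {answer}."}

      def respond(user_id, answer):
          """Phrase the response in the style of the identified user's group,
          standing in for a personalized natural language generation model."""
          group = USER_GROUP.get(user_id, "formal")
          return GROUP_TEMPLATE[group].format(name=user_id, answer=answer)

      print(respond("alice", "your bus leaves at 5"))
      # -> "hey alice, your bus leaves at 5!"
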
  • Publication number: 20150325248
    Abstract: Disclosed herein are systems, methods, and computer-readable storage devices for improving the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve for the speech produced from the selected speech units, and also identifies the actual prosodic curve of those units. The selected speech units are then modified so that the new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units in the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving output quality. (See the illustrative sketch below.)
    Type: Application
    Filed: May 12, 2014
    Publication date: November 12, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal
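    Sketch: a minimal illustration of warping a selected unit's pitch contour onto a desired prosodic curve and storing the modified unit back into the database to extend its prosodic coverage. The per-frame modification factors and toy contours are assumptions for illustration.

      speech_unit_db = {
          "unit_42": {"f0": [100.0, 110.0, 105.0]},   # toy pitch contour in Hz
      }

      def modify_to_match(unit_id, desired_curve):
          """Warp the unit's pitch contour onto the desired prosodic curve and
          store the result as a new database unit, extending prosodic coverage."""
          actual = speech_unit_db[unit_id]["f0"]
          factors = [d / a for a, d in zip(actual, desired_curve)]   # per-frame modification
          new_f0 = [a * f for a, f in zip(actual, factors)]
          new_id = unit_id + "_mod"
          speech_unit_db[new_id] = {"f0": new_f0}
          return new_id

      new_unit = modify_to_match("unit_42", [120.0, 115.0, 110.0])
      print(speech_unit_db[new_unit]["f0"])   # contour now tracks the desired curve
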
  • Publication number: 20150324169
    Abstract: Disclosed herein are methods for audibly presenting text selected on a computing device. The method includes presenting text on a touch-sensitive display at a size that meets a threshold level, so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user's touch has been received, the computing device identifies and interprets the portion of text to be selected, and subsequently presents that text audibly to the user. (See the illustrative sketch below.)
    Type: Application
    Filed: July 21, 2015
    Publication date: November 12, 2015
    Inventors: Alistair D. Conkie, Horst Schroeter
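    Sketch: a minimal illustration of interpreting a touch as a text selection only when the touched text region meets a minimum size threshold, after which the selected text would be handed to TTS. The layout records and threshold value below are hypothetical.

      MIN_TOUCH_TARGET = 24   # minimum region height in points (hypothetical value)

      regions = [  # (x, y, width, height, text) for text laid out on the display
          (0, 0, 200, 30, "Read this paragraph aloud."),
          (0, 40, 200, 30, "Or this one."),
      ]

      def text_at_touch(x, y):
          """Interpret a touch as selecting the text region that contains it,
          provided the region meets the minimum size threshold."""
          for rx, ry, w, h, text in regions:
              if h >= MIN_TOUCH_TARGET and rx <= x < rx + w and ry <= y < ry + h:
                  return text          # would then be sent to TTS for playback
          return None

      print(text_at_touch(10, 50))     # -> "Or this one."
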
  • Publication number: 20150248884
    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker-independent model and selecting a quantity of speaker-dependent models, the quantity of speaker-dependent models being based on available computing resources, the selected models including the speaker-independent model and the quantity of speaker-dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy across the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method. (See the illustrative sketch below.)
    Type: Application
    Filed: April 30, 2015
    Publication date: September 3, 2015
    Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Alistair D. Conkie
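    Sketch: a minimal illustration of running a speaker-independent model plus a resource-bounded number of speaker-dependent models in parallel and keeping the result of the dominant model. The recognizers here are hypothetical stubs that return fixed results.

      from concurrent.futures import ThreadPoolExecutor

      def make_recognizer(name, text, confidence):
          """Hypothetical recognizer stub: returns a fixed result for illustration."""
          return lambda audio: (name, text, confidence)

      available_slots = 2   # pretend computing resources allow two speaker-dependent models
      models = [make_recognizer("speaker_independent", "call home", 0.71)]
      speaker_dependent = [make_recognizer("sd_alice", "call home", 0.93),
                           make_recognizer("sd_bob", "cold foam", 0.42),
                           make_recognizer("sd_carol", "call home", 0.80)]
      models += speaker_dependent[:available_slots]   # quantity bounded by resources

      def recognize(audio):
          """Run every selected model in parallel and keep the result of the
          dominant model, i.e. the one with the best recognition score."""
          with ThreadPoolExecutor(max_workers=len(models)) as pool:
              results = list(pool.map(lambda m: m(audio), models))
          return max(results, key=lambda r: r[2])

      print(recognize(b"fake-audio"))   # -> ('sd_alice', 'call home', 0.93)
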
  • Publication number: 20150243282
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker and transcribes the collected speech to generate a lattice of plausible phonemes. The method then creates a custom speech model that represents each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes; the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally, the method recognizes, via a processor, additional speech from the new speaker using the custom speech model.
    Type: Application
    Filed: April 28, 2015
    Publication date: August 27, 2015
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal