Patents by Inventor Alistair D. Conkie

Alistair D. Conkie has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

System and method for audibly presenting selected text

Patent number: 9117445

Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user.

Type: Grant

Filed: July 16, 2013

Date of Patent: August 25, 2015

Assignee: Interactions LLC

Inventors: Alistair D. Conkie, Horst Schroeter
System and Method for Cloud-Based Text-to-Speech Web Services

Publication number: 20150221298

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.

Type: Application

Filed: April 13, 2015

Publication date: August 6, 2015

Inventors: Mark Charles Beutnagel, Alistair D. Conkie, Yeon-Jun Kim, Horst Juergen Schroeter
System and Method for Speech Personalization by Need

Publication number: 20150213794

Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.

Type: Application

Filed: April 6, 2015

Publication date: July 30, 2015

Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
System and Method for Synthetic Voice Generation and Modification

Publication number: 20150179163

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.

Type: Application

Filed: February 16, 2015

Publication date: June 25, 2015

Inventors: Alistair D. Conkie, Ann K. Syrdal
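The policy-driven unit selection described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout (`voice`, `category`, `unit` keys), the function names, and the policy being a plain mapping from phonetic category to voice name are all hypothetical and do not come from the patent application.

```python
from typing import Dict, List

Unit = Dict[str, str]  # hypothetical record: {"voice": ..., "category": ..., "unit": ...}


def combine_databases(first: List[Unit], second: List[Unit]) -> List[Unit]:
    """Merge two voice databases into one combined list of units."""
    return list(first) + list(second)


def select_units(combined: List[Unit], policy: Dict[str, str], category: str) -> List[Unit]:
    """Pick units of one phonetic category from the voice the policy names for it."""
    voice = policy[category]
    return [u for u in combined if u["category"] == category and u["voice"] == voice]


# Toy data: vowels come from voice "A", fricatives from voice "B" (assumed policy).
voice_a = [{"voice": "A", "category": "vowel", "unit": "a1"},
           {"voice": "A", "category": "fricative", "unit": "s1"}]
voice_b = [{"voice": "B", "category": "vowel", "unit": "a2"},
           {"voice": "B", "category": "fricative", "unit": "s2"}]
policy = {"vowel": "A", "fricative": "B"}

combined = combine_databases(voice_a, voice_b)
vowels = select_units(combined, policy, "vowel")
fricatives = select_units(combined, policy, "fricative")
```

Because the selected units are used as stored, the sketch mirrors the abstract's point that no parameterization of either voice is required.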
Method and System for Enhancing a Speech Database

Publication number: 20150179162

Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.

Type: Application

Filed: March 4, 2015

Publication date: June 25, 2015

Inventors: Alistair D. Conkie, Ann K. Syrdal
System and Method for Automatic Detection of Abnormal Stress Patterns in Unit Selection Synthesis

Publication number: 20150170637

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.

Type: Application

Filed: February 23, 2015

Publication date: June 18, 2015

Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
System and Method for Data-Driven Intonation Generation

Publication number: 20150149178

Abstract: Systems, methods, and computer-readable storage media for text-to-speech processing having an improved intonation. The system first receives text to be converted to speech, the text having a first segment and a second segment. The system then compares the text to a database of stored utterances, identifying in the database a first utterance corresponding to the first segment and determining an intonation of the first utterance. When the database does not contain a second utterance corresponding to the second segment, the system generates the speech corresponding to the text by combining the first utterance with a generated second utterance corresponding to the second segment, the generated second utterance having the intonation matching, or based on, the first utterance. These actions lead to an improved, smoother, more human-like synthetic speech output from the system.

Type: Application

Filed: November 22, 2013

Publication date: May 28, 2015

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Taniya Mishra
System and method for personalization of acoustic models for automatic speech recognition

Patent number: 9026444

Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.

Type: Grant

Filed: September 16, 2009

Date of Patent: May 5, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Alistair D. Conkie
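The selection procedure in the abstract above (one speaker independent model plus a resource-limited number of speaker dependent models, run in parallel, with a dominant model chosen afterward) might look roughly like the sketch below. Several things here are assumptions for illustration only: models are plain callables returning a `(text, score)` pair, the score stands in for recognition accuracy, and "available computing resources" is reduced to a worker count.

```python
from concurrent.futures import ThreadPoolExecutor


def choose_dependent_count(available_workers, num_dependent):
    """Reserve one worker for the speaker independent model; use the rest for dependent ones."""
    return max(0, min(num_dependent, available_workers - 1))


def select_dominant(utterance, independent, dependents, available_workers):
    """Run the selected models in parallel and return (best_model, best_text)."""
    count = choose_dependent_count(available_workers, len(dependents))
    selected = [independent] + list(dependents[:count])
    with ThreadPoolExecutor(max_workers=len(selected)) as pool:
        results = list(pool.map(lambda model: model(utterance), selected))
    # Assumed criterion: the highest score marks the dominant model.
    best = max(range(len(selected)), key=lambda i: results[i][1])
    return selected[best], results[best][0]


# Toy stand-ins for real recognizers (hypothetical).
def si_model(utterance):
    return ("hello", 0.60)


def sd_model_1(utterance):
    return ("hallo", 0.90)


def sd_model_2(utterance):
    return ("hullo", 0.95)


dominant, text = select_dominant("audio", si_model, [sd_model_1, sd_model_2], available_workers=2)
```

With two workers only the first speaker dependent model is tried alongside the speaker independent one, so `sd_model_2` never runs even though its toy score is higher.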
System and method for adapting automatic speech recognition pronunciation by acoustic model restructuring

Patent number: 9026442

Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.

Type: Grant

Filed: August 14, 2014

Date of Patent: May 5, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
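One way to write down the "weighted sum of acoustic models" from the abstract above is given below. The notation is an assumption made for this listing, not taken from the patent: q is a phoneme in the pronouncing dictionary, R(q) is the set of plausible phonemes observed for q in the lattice, x is an acoustic observation, and the weights are assumed to be non-negative and to sum to one.

```latex
p_{\text{custom}}(x \mid q) \;=\; \sum_{r \in R(q)} w_{q,r}\, p_{\text{native}}(x \mid r),
\qquad w_{q,r} \ge 0, \qquad \sum_{r \in R(q)} w_{q,r} = 1
```

Under this reading the dictionary entries stay fixed, and only the acoustic model attached to each phoneme q changes.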
System and Method for Managing Models for Embedded Speech and Language Processing

Publication number: 20150120287

Abstract: Disclosed herein are systems, methods, and computer-readable storage devices for fetching speech processing models based on context changes in advance of speech requests using the speech processing models. An example local device configured to practice the method, having a local speech processor, and having access to remote speech models, detects a change in context. The change in context can be based on geographical location, language translation, speech in a different language, user language settings, installing or removing an app, and so forth. The local device can determine a speech processing model that is likely to be needed based on the change in context, and that is not stored on the local device. Independently of an explicit request to process speech, the local device can retrieve, from a remote server, the speech processing model for use on the mobile device.

Type: Application

Filed: October 28, 2013

Publication date: April 30, 2015

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Benjamin J. Stern, Enrico Luigi Bocchieri, Alistair D. Conkie, Danilo Giulianelli
System and method for cloud-based text-to-speech web services

Patent number: 9009050

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.

Type: Grant

Filed: November 30, 2010

Date of Patent: April 14, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Mark Charles Beutnagel, Alistair D. Conkie, Yeon-Jun Kim, Horst Juergen Schroeter
System and method for speech personalization by need

Patent number: 9002713

Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.

Type: Grant

Filed: June 9, 2009

Date of Patent: April 7, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
System and Method for Crowdsourcing of Word Pronunciation Verification

Publication number: 20150095031

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for crowdsourcing verification of word pronunciations. A system performing word pronunciation crowdsourcing identifies spoken words, or word pronunciations in a dictionary of words, for review by a turker. The identified words are assigned to one or more turkers for review. Assigned turkers listen to the word pronunciations, providing feedback on the correctness/incorrectness of the machine made pronunciation. The feedback can then be used to modify the lexicon, or can be stored for use in configuring future lexicons.

Type: Application

Filed: September 30, 2013

Publication date: April 2, 2015

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Alistair D. Conkie, Ladan Golipour, Taniya Mishra
System and Method for Distributed Voice Models Across Cloud and Device for Embedded Text-to-Speech

Publication number: 20150073805

Abstract: Disclosed herein are systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.

Type: Application

Filed: September 12, 2013

Publication date: March 12, 2015

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
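The caching behavior in the abstract above (a protected core set, extra units fetched on demand, and pruning of the extras) can be illustrated with a small sketch. The class name `UnitCache`, the oldest-first pruning rule, and the capacity limit counted in units are assumptions for this example; the abstract does not specify a pruning policy or data layout.

```python
from collections import OrderedDict


class UnitCache:
    """Local cache of text-to-speech units with a core set that is never pruned."""

    def __init__(self, core_units, capacity):
        self.core = dict(core_units)   # unit id -> audio; protected from pruning
        self.extra = OrderedDict()     # fetched units, oldest first
        self.capacity = capacity       # assumed limit on the number of extra units

    def missing(self, needed_ids):
        """Return the ids that would have to be requested from the server."""
        return [u for u in needed_ids if u not in self.core and u not in self.extra]

    def store(self, units):
        """Add fetched units, then prune the oldest extras if over capacity."""
        for unit_id, audio in units.items():
            if unit_id in self.core:
                continue
            self.extra[unit_id] = audio
            self.extra.move_to_end(unit_id)
        self.prune()

    def prune(self):
        """Drop the oldest extra units until the cache fits; core units are untouched."""
        while len(self.extra) > self.capacity:
            self.extra.popitem(last=False)


cache = UnitCache({"a": b"A"}, capacity=2)
to_fetch = cache.missing(["a", "b", "c"])   # units the current context needs
cache.store({"b": b"B", "c": b"C", "d": b"D"})
```

In a real system the `store` argument would come from a server response; here it is a literal dictionary so the example stays self-contained.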
System and Method for Increasing Recognition Rates of In-Vocabulary Words By Improving Pronunciation Modeling

Publication number: 20150073797

Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes overgenerating potential pronunciations based on symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.

Type: Application

Filed: November 12, 2014

Publication date: March 12, 2015

Inventors: Alistair D. Conkie, Mazin Gilbert, Andrej Ljolje
Method and system for enhancing a speech database

Patent number: 8977552

Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.

Type: Grant

Filed: May 28, 2014

Date of Patent: March 10, 2015

Assignee: AT&T Intellectual Property II, L.P.

Inventors: Alistair D. Conkie, Ann K. Syrdal
System and method for synthetic voice generation and modification

Patent number: 8965767

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.

Type: Grant

Filed: May 20, 2014

Date of Patent: February 24, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Alistair D. Conkie, Ann K. Syrdal
System and method for automatic detection of abnormal stress patterns in unit selection synthesis

Patent number: 8965768

Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.

Type: Grant

Filed: August 6, 2010

Date of Patent: February 24, 2015

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
Partitioning of Markup Language Documents

Publication number: 20150052423

Abstract: A hybrid markup language document (or “HMLD”) is scanned for a partition boundary. Content in the HMLD that precedes the partition boundary is discarded for simpler and faster processing.

Type: Application

Filed: October 30, 2014

Publication date: February 19, 2015

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Mark C. Beutnagel, Alistair D. Conkie
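As a rough illustration of the partitioning idea in the abstract above, the sketch below scans a document for a boundary marker and keeps only what follows it. The marker string `<!-- partition -->` and the function name `discard_before_boundary` are assumptions made for this example; the abstract does not say how a partition boundary is represented.

```python
def discard_before_boundary(document: str, boundary: str = "<!-- partition -->") -> str:
    """Return the content after the first boundary, or the whole document if none is found."""
    _head, separator, tail = document.partition(boundary)
    if not separator:
        # No boundary present: nothing can be discarded.
        return document
    return tail


page = "<speak>old prompt</speak><!-- partition --><speak>new prompt</speak>"
remaining = discard_before_boundary(page)
```

Only the text after the marker is passed on for further processing, which is what keeps the later stages simpler and faster.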
System and Method for Pronunciation Modeling

Publication number: 20150006179

Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.

Type: Application

Filed: September 17, 2014

Publication date: January 1, 2015

Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
