Patents by Inventor Alistair D. Conkie

Alistair D. Conkie has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9117445
    Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: August 25, 2015
    Assignee: Interactions LLC
    Inventors: Alistair D. Conkie, Horst Schroeter
  • Publication number: 20150221298
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
    Type: Application
    Filed: April 13, 2015
    Publication date: August 6, 2015
    Inventors: Mark Charles Beutnagel, Alistair D. Conkie, Yeon-Jun Kim, Horst Juergen Schroeter
  • Publication number: 20150213794
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.
    Type: Application
    Filed: April 6, 2015
    Publication date: July 30, 2015
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150179163
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Application
    Filed: February 16, 2015
    Publication date: June 25, 2015
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150179162
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Application
    Filed: March 4, 2015
    Publication date: June 25, 2015
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150170637
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
    Type: Application
    Filed: February 23, 2015
    Publication date: June 18, 2015
    Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150149178
    Abstract: Systems, methods, and computer-readable storage media for text-to-speech processing having an improved intonation. The system first receives text to be converted to speech, the text having a first segment and a second segment. The system then compares the text to a database of stored utterances, identifying in the database a first utterance corresponding to the first segment and determining an intonation of the first utterance. When the database does not contain a second utterance corresponding to the second segment, the system generates the speech corresponding to the text by combining the first utterance with a generated second utterance corresponding to the second segment, the generated second utterance having the intonation matching, or based on, the first utterance. These actions lead to an improved, smoother, more human-like synthetic speech output from the system.
    Type: Application
    Filed: November 22, 2013
    Publication date: May 28, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Taniya Mishra
  • Patent number: 9026444
    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
    Type: Grant
    Filed: September 16, 2009
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Alistair D. Conkie
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150120287
    Abstract: Disclosed herein are systems, methods, and computer-readable storage devices for fetching speech processing models based on context changes in advance of speech requests using the speech processing models. An example local device configured to practice the method, having a local speech processor, and having access to remote speech models, detects a change in context. The change in context can be based on geographical location, language translation, speech in a different language, user language settings, installing or removing an app, and so forth. The local device can determine a speech processing model that is likely to be needed based on the change in context, and that is not stored on the local device. Independently of an explicit request to process speech, the local device can retrieve, from a remote server, the speech processing model for use on the mobile device.
    Type: Application
    Filed: October 28, 2013
    Publication date: April 30, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. Stern, Enrico Luigi Bocchieri, Alistair D. Conkie, Danilo Giulianelli
  • Patent number: 9009050
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
    Type: Grant
    Filed: November 30, 2010
    Date of Patent: April 14, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Mark Charles Beutnagel, Alistair D. Conkie, Yeon-Jun Kim, Horst Juergen Schroeter
  • Patent number: 9002713
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: April 7, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150095031
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for crowdsourcing verification of word pronunciations. A system performing word pronunciation crowdsourcing identifies spoken words, or word pronunciations in a dictionary of words, for review by a turker. The identified words are assigned to one or more turkers for review. Assigned turkers listen to the word pronunciations, providing feedback on the correctness/incorrectness of the machine made pronunciation. The feedback can then be used to modify the lexicon, or can be stored for use in configuring future lexicons.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ladan Golipour, Taniya Mishra
  • Publication number: 20150073805
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for intelligent caching of concatenative speech units for use in speech synthesis. A system configured to practice the method can identify a speech synthesis context, and determine, based on a local cache of text-to-speech units for a text-to-speech voice and based on the speech synthesis context, additional text-to-speech units which are not in the local cache. The system can request from a server the additional text-to-speech units, and store the additional text-to-speech units in the local cache. The system can then synthesize speech using the text-to-speech units and the additional text-to-speech units in the local cache. The system can prune the cache as the context changes, based on availability of local storage, or after synthesizing the speech. The local cache can store a core set of text-to-speech units associated with the text-to-speech voice that cannot be pruned from the local cache.
    Type: Application
    Filed: September 12, 2013
    Publication date: March 12, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. Stern, Mark Charles Beutnagel, Alistair D. Conkie, Horst J. Schroeter, Amanda Joy Stent
  • Publication number: 20150073797
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes overgenerating potential pronunciations based on symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word. Speech data can be labeled explicitly or implicitly and can include words as text and recorded audio.
    Type: Application
    Filed: November 12, 2014
    Publication date: March 12, 2015
    Inventors: Alistair D. Conkie, Mazin Gilbert, Andrej Ljolje
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8965767
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: February 24, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8965768
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
    Type: Grant
    Filed: August 6, 2010
    Date of Patent: February 24, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150052423
    Abstract: A hybrid markup language document (or “HMLD”) is scanned for a partition boundary. Content in the HMLD that precedes the partition boundary is discarded for simpler and faster processing.
    Type: Application
    Filed: October 30, 2014
    Publication date: February 19, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Mark C. Beutnagel, Alistair D. Conkie
  • Publication number: 20150006179
    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
    Type: Application
    Filed: September 17, 2014
    Publication date: January 1, 2015
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
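
As an illustration of one abstract above, the unit caching described in publication 20150073805 can be pictured with a small sketch. This is a toy model rather than the method claimed in the application: the `UnitCache` class, the `fetch_from_server` function, and the unit names are all hypothetical, and a "speech synthesis context" is reduced here to a plain set of unit identifiers that the context is expected to need.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class UnitCache:
    """Toy local cache of text-to-speech units (hypothetical structure)."""
    core: Set[str]                                # core units, never pruned
    extra: Set[str] = field(default_factory=set)  # fetched units, prunable

    def missing(self, needed: Iterable[str]) -> Set[str]:
        """Return the needed units that are not in the local cache."""
        return set(needed) - self.core - self.extra

    def store(self, units: Iterable[str]) -> None:
        """Add fetched units to the prunable part of the cache."""
        self.extra |= set(units) - self.core

    def prune(self, still_needed: Iterable[str]) -> None:
        """Drop prunable units the current context no longer needs."""
        self.extra &= set(still_needed)


def fetch_from_server(unit_ids: Iterable[str]) -> Set[str]:
    # Hypothetical stand-in for a network request; it simply echoes the ids.
    return set(unit_ids)


cache = UnitCache(core={"sil", "ah"})
needed = {"ah", "k", "t"}                 # assumed output of context analysis
to_fetch = cache.missing(needed)          # units absent from the local cache
cache.store(fetch_from_server(to_fetch))
after_fetch = cache.missing(needed)       # nothing left to request
cache.prune({"t"})                        # context changed: "k" is dropped
```

Keeping the core set separate from the prunable set is one simple way to guarantee that pruning never removes the units the abstract says must stay in the local cache.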