Patents by Inventor Andrej Ljolje

Andrej Ljolje has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120227078
    Abstract: A method includes receiving a command to provide media content configured to be sent to a display device for display at a particular scan rate. The media content includes audio data and video data. The method includes identifying high priority segments of the media content based on the audio data. The high priority segments are to be displayed by the display device at a presentation rate such that the high priority segments displayed at the presentation rate correspond to the media content displayed at the particular scan rate. The method also includes sending the high priority segments to the display device to provide video content and audio content of the requested media content for display.
    Type: Application
    Filed: May 15, 2012
    Publication date: September 6, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8214213
    Abstract: A system and method for performing speech recognition is disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition improvement is found by moving a pronunciation model from a dictionary to the language model.
    Type: Grant
    Filed: April 27, 2006
    Date of Patent: July 3, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
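
Since the abstract above turns on where pronunciation probabilities live, a toy sketch may help: a language model whose entries for unique word identifiers carry their pronunciation variants and probabilities directly, rather than deferring to a separate dictionary. The word identifiers, probabilities, and unigram simplification below are illustrative assumptions, not the patented recognizer.

```python
import math

# A language model that carries pronunciation probabilities directly on its
# entries, keyed by unique word identifiers, instead of keeping them in a
# separate pronunciation dictionary.  Toy unigram values (assumptions).
LM = {
    "the#1":  {"lm_prob": 0.050, "pron": {"DH AH": 0.7, "DH IY": 0.3}},
    "data#1": {"lm_prob": 0.001, "pron": {"D EY T AH": 0.6, "D AE T AH": 0.4}},
}

def combined_score(word_id: str, pronunciation: str) -> float:
    """Log score mixing the word's language-model probability with the
    probability of this particular pronunciation for this word identifier."""
    entry = LM[word_id]
    return math.log(entry["lm_prob"]) + math.log(entry["pron"][pronunciation])

if __name__ == "__main__":
    print(combined_score("data#1", "D EY T AH"))
    print(combined_score("data#1", "D AE T AH"))
```
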
  • Patent number: 8204359
    Abstract: In an embodiment, a method of providing modified media content is disclosed and includes receiving media content that includes audio data and video data having a first number of video frames. The method also includes generating abstracted media content that includes portions of the video data and audio elements of the audio data, where the abstracted media content includes less than all of the video data and includes fewer video frames than the first number of video frames.
    Type: Grant
    Filed: March 20, 2007
    Date of Patent: June 19, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Publication number: 20120101820
    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Application
    Filed: October 24, 2011
    Publication date: April 26, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
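
The barge-in scheme in publication 20120101820 reduces, at decision time, to scoring accumulated audio against a small non-speech model set and a small speech model set and cutting the prompt once the speech hypothesis wins. Below is a minimal sketch of that decision loop, with the one-state and three-state HMMs collapsed into placeholder per-frame log-likelihood callables; the frame features, scoring functions, and threshold are assumptions, not the patented models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class BargeInDetector:
    """Simplified barge-in decision loop: accumulate per-frame evidence for a
    speech model set versus a non-speech model set and stop the prompt once the
    speech hypothesis dominates.  A real system would run Viterbi over the
    one-state and three-state HMMs; here each set is collapsed into a single
    frame-level log-likelihood callable (an assumption)."""
    score_speech: Callable[[List[float]], float]     # log p(frame | speech HMMs)
    score_nonspeech: Callable[[List[float]], float]  # log p(frame | non-speech HMMs)
    threshold: float = 3.0                           # hypothetical log-odds trigger

    def run(self, frames: Iterable[List[float]], stop_prompt: Callable[[], None]) -> bool:
        log_odds = 0.0
        for frame in frames:                  # audio accumulated during the prompt
            log_odds += self.score_speech(frame) - self.score_nonspeech(frame)
            if log_odds > self.threshold:     # classified as barge-in speech
                stop_prompt()                 # terminate the prompt presentation
                return True
        return False                          # prompt played to completion

if __name__ == "__main__":
    detector = BargeInDetector(
        score_speech=lambda f: f[0],          # toy scores: louder frames look like speech
        score_nonspeech=lambda f: 1.0 - f[0],
    )
    frames = [[0.1]] * 5 + [[0.9]] * 10       # silence, then the caller barges in
    print("barge-in:", detector.run(frames, stop_prompt=lambda: print("prompt stopped")))
```
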
  • Publication number: 20120078617
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word.
    Type: Application
    Filed: December 5, 2011
    Publication date: March 29, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Mazin Gilbert, Andrej Ljolje
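
To make the overgeneration step concrete, the sketch below expands a spelling into many candidate pronunciations by applying rules that map short letter sequences to alternative phoneme strings, candidates a recognizer would then confirm or discard. The rule table, the greedy longest-match segmentation, and the toy word are assumptions for illustration only.

```python
from itertools import product

# Hypothetical letter-sequence -> phoneme-alternative conversion rules.
RULES = {
    "c": ["K", "S"],
    "a": ["AE", "EY", "AH"],
    "t": ["T"],
    "ough": ["OW", "UW", "AH F"],
}

def segment(word: str) -> list[str]:
    """Greedy longest-match split of a spelling into rule keys."""
    chunks, i = [], 0
    while i < len(word):
        for size in range(min(4, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if piece in RULES:
                chunks.append(piece)
                i += size
                break
        else:
            raise KeyError(f"no conversion rule covers {word[i:]!r}")
    return chunks

def overgenerate(word: str) -> list[str]:
    """Cartesian product of the phoneme alternatives for each letter chunk."""
    return [" ".join(v) for v in product(*(RULES[c] for c in segment(word)))]

if __name__ == "__main__":
    candidates = overgenerate("cat")   # 6 variants, e.g. 'K AE T', 'S EY T', ...
    # In the patented flow the variants would be scored against labeled speech
    # and only recognizer-confirmed pronunciations stored in the lexicon.
    lexicon = {"cat": candidates}
    print(lexicon)
```
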
  • Publication number: 20120065975
    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
    Type: Application
    Filed: November 22, 2011
    Publication date: March 15, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
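
The substitution at the heart of this publication, replacing each phoneme in a lexicon entry with its labeled family of interchangeable alternatives, can be shown in a few lines. The family table (two dialect-style variants) and the example word below are invented for illustration.

```python
from itertools import product

# Hypothetical families of interchangeable phonemic alternatives, e.g.
# dialectal realizations that should be labeled as the same phoneme.
FAMILIES = {
    "AA": ["AA", "AO"],        # cot/caught merger
    "T":  ["T", "DX"],         # flapped intervocalic /t/
}

def expand_pronunciation(phonemes: list[str]) -> list[list[str]]:
    """Substitute each phoneme by its family (itself if no family is defined)
    and enumerate the alternative pronunciations the model would accept."""
    alternatives = [FAMILIES.get(p, [p]) for p in phonemes]
    return [list(variant) for variant in product(*alternatives)]

if __name__ == "__main__":
    for variant in expand_pronunciation(["W", "AA", "T", "ER"]):   # 'water'
        print(" ".join(variant))
```
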
  • Patent number: 8095365
    Abstract: The present disclosure relates to systems, methods, and computer-readable media for generating a lexicon for use with speech recognition. The method includes receiving symbolic input as labeled speech data, overgenerating potential pronunciations based on the symbolic input, identifying potential pronunciations in a speech recognition context, and storing the identified potential pronunciations in a lexicon. Overgenerating potential pronunciations can include establishing a set of conversion rules for short sequences of letters, converting portions of the symbolic input into a number of possible lexical pronunciation variants based on the set of conversion rules, modeling the possible lexical pronunciation variants in one of a weighted network and a list of phoneme lists, and iteratively retraining the set of conversion rules based on improved pronunciations. Symbolic input can include multiple examples of a same spoken word.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: January 10, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Mazin Gilbert, Andrej Ljolje
  • Patent number: 8073693
    Abstract: Systems, computer-implemented methods, and tangible computer-readable media for generating a pronunciation model. The method includes identifying a generic model of speech composed of phonemes, identifying a family of interchangeable phonemic alternatives for a phoneme in the generic model of speech, labeling the family of interchangeable phonemic alternatives as referring to the same phoneme, and generating a pronunciation model which substitutes each family for each respective phoneme. In one aspect, the generic model of speech is a vocal tract length normalized acoustic model. Interchangeable phonemic alternatives can represent a same phoneme for different dialectal classes. An interchangeable phonemic alternative can include a string of phonemes.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: December 6, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8046221
    Abstract: Disclosed are systems, methods and computer readable media for applying a multi-state barge-in acoustic model in a spoken dialogue system comprising the steps of (1) presenting a prompt to a user from the spoken dialogue system, (2) receiving an audio speech input from the user during the presentation of the prompt, (3) accumulating the audio speech input from the user, (4) applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, (5) applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, (6) determining whether the audio speech input is a barge-in-speech input from the user, and (7) if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: October 25, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Andrej Ljolje
  • Patent number: 8024191
    Abstract: Systems and methods are provided for recognizing speech in a spoken dialogue system. The method includes receiving input speech having a pre-vocalic consonant or a post-vocalic consonant, generating at least one output lattice that calculates a first score by comparing the input speech to a training model to provide a result, and distinguishing between the pre-vocalic consonant and the post-vocalic consonant in the input speech. A second score is calculated by measuring a similarity between the pre-vocalic consonant or the post-vocalic consonant in the input speech and the first score. At least one category is determined for the pre-vocalic match or mismatch or the post-vocalic match or mismatch by using the second score, and the results of the automated speech recognition (ASR) system are refined by using the at least one category for the pre-vocalic match or mismatch or the post-vocalic match or mismatch.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 20, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
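
Much of patent 8024191 hinges on whether a consonant sits before or after the vowel of its syllable. The sketch below tags consonants as pre-vocalic or post-vocalic with a crude nearest-vowel rule and derives per-consonant match/mismatch categories of the kind a rescoring pass could combine with the recognizer's first score; the vowel set, the positional rule, and the example words are assumptions rather than the patented scoring.

```python
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

def tag_consonants(phonemes: list[str]) -> list[tuple[str, str]]:
    """Crude positional tagging: a consonant immediately followed by a vowel is
    treated as pre-vocalic, anything else as post-vocalic.  (The patent relies
    on real syllable structure; this nearest-vowel rule is only an assumption.)"""
    tags = []
    for i, p in enumerate(phonemes):
        if p in VOWELS:
            tags.append((p, "vowel"))
        elif i + 1 < len(phonemes) and phonemes[i + 1] in VOWELS:
            tags.append((p, "pre-vocalic"))
        else:
            tags.append((p, "post-vocalic"))
    return tags

def positional_categories(reference: list[str], hypothesis: list[str]) -> list[str]:
    """Per-consonant match/mismatch categories that a rescoring pass could
    combine with the recognizer's own score."""
    cats = []
    for (rp, rpos), (hp, hpos) in zip(tag_consonants(reference), tag_consonants(hypothesis)):
        if rpos == "vowel" or hpos == "vowel":
            continue
        cats.append(f"{rp}/{hp}: {'match' if rpos == hpos else 'mismatch'}")
    return cats

if __name__ == "__main__":
    # 'sit' vs. 'city' (truncated): the final /T/ flips from post- to pre-vocalic.
    print(positional_categories(["S", "IH", "T"], ["S", "IH", "T", "IY"]))
```
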
  • Patent number: 8015008
    Abstract: Disclosed are systems, methods and computer readable media for training acoustic models for an automatic speech recognition (ASR) system. The method includes receiving a speech signal, defining at least one syllable boundary position in the received speech signal, based on the at least one syllable boundary position, generating for each consonant in a consonant phoneme inventory a pre-vocalic position label and a post-vocalic position label to expand the consonant phoneme inventory, reformulating a lexicon to reflect the expanded consonant phoneme inventory, and training a language model for the automated speech recognition (ASR) system based on the reformulated lexicon.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: September 6, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Alistair Conkie, Andrej Ljolje, Ann K. Syrdal
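
Once syllable boundaries are available, the lexicon reformulation in patent 8015008 is largely mechanical: every consonant is split into a pre-vocalic and a post-vocalic variant and each dictionary entry is rewritten with positional labels. A small sketch using an illustrative `_pre`/`_post` suffix convention and a toy entry (both assumptions):

```python
VOWELS = {"AA", "AE", "AH", "EH", "ER", "IH", "IY", "OW", "UW"}

def expand_inventory(consonants: set[str]) -> set[str]:
    """Double the consonant inventory with positional variants."""
    return {c + suffix for c in consonants for suffix in ("_pre", "_post")}

def relabel_entry(phonemes: list[str], syllable_starts: set[int]) -> list[str]:
    """Rewrite one lexicon entry: consonants before the vowel of their syllable
    become *_pre, consonants after it become *_post."""
    out, seen_vowel = [], False
    for i, p in enumerate(phonemes):
        if i in syllable_starts:      # a new syllable begins: onset consonants follow
            seen_vowel = False
        if p in VOWELS:
            seen_vowel = True
            out.append(p)
        else:
            out.append(p + ("_post" if seen_vowel else "_pre"))
    return out

if __name__ == "__main__":
    print(sorted(expand_inventory({"W", "N", "T"})))
    # 'winter' -> W IH N . T ER, with a syllable boundary before index 3 (an assumption)
    print(relabel_entry(["W", "IH", "N", "T", "ER"], syllable_starts={0, 3}))
    # -> ['W_pre', 'IH', 'N_post', 'T_pre', 'ER']
```
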
  • Patent number: 8000971
    Abstract: Disclosed are systems and methods for training a barge-in model for speech processing in a spoken dialogue system comprising the steps of (1) receiving an input having at least one speech segment and at least one non-speech segment, (2) establishing a restriction of recognizing only speech states during speech segments of the input and non-speech states during non-speech segments of the input, (3) generating a hypothesis lattice by allowing any sequence of speech Hidden Markov Models (HMMs) and non-speech HMMs, (4) generating a reference lattice by only allowing speech HMMs for at least one speech segment and non-speech HMMs for at least one non-speech segment, wherein different iterations of training generate at least one different reference lattice and at least one reference transcription, and (5) employing the generated reference lattice as the barge-in model for speech processing.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: August 16, 2011
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
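
The training recipe in patent 8000971 amounts to building two lattices over the same utterance: an unconstrained hypothesis lattice in which speech and non-speech HMMs may follow each other freely, and a reference lattice in which the segment labels force speech HMMs during speech and non-speech HMMs elsewhere. The sketch below only constructs those per-frame allowed-model sets; the segment format and model names are assumptions, and the discriminative update itself is omitted.

```python
SPEECH_MODELS = ["sp1", "sp2", "sp3", "sp4", "sp5"]   # hypothetical speech HMM names
NONSPEECH_MODELS = ["sil", "noise"]                    # hypothetical non-speech HMM names

def hypothesis_lattice(num_frames: int) -> list[list[str]]:
    """Unconstrained lattice: any speech or non-speech HMM is allowed at every frame."""
    return [SPEECH_MODELS + NONSPEECH_MODELS for _ in range(num_frames)]

def reference_lattice(segments: list[tuple[int, int, str]]) -> list[list[str]]:
    """Constrained lattice: only speech HMMs inside labeled speech segments and
    only non-speech HMMs inside non-speech segments."""
    lattice = []
    for start, end, label in segments:      # segments are (start, end, label) frame spans
        allowed = SPEECH_MODELS if label == "speech" else NONSPEECH_MODELS
        lattice.extend([allowed] * (end - start))
    return lattice

if __name__ == "__main__":
    segments = [(0, 40, "nonspeech"), (40, 120, "speech"), (120, 150, "nonspeech")]
    ref = reference_lattice(segments)
    hyp = hypothesis_lattice(len(ref))
    # Training would score paths through hyp against ref and update the HMMs;
    # later iterations regenerate ref (and its transcription) with the improved models.
    print(len(ref), ref[0], ref[50])
```
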
  • Publication number: 20110137648
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker with the main speech recognizer and the supplemental speech recognizer in parallel and combines results from the main and supplemental speech recognizers. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer.
    Type: Application
    Filed: December 4, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Mazin Gilbert
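
Operationally, publication 20110137648 describes a scheduling-plus-fusion loop: when overall demand leaves recognition resources idle, stand up a differently configured recognizer for the speaker, decode with both, and merge the outputs. The schematic sketch below uses placeholder recognizer objects, an idle-resource threshold, and a highest-confidence merge, all of which are assumptions rather than any real ASR API.

```python
from dataclasses import dataclass

@dataclass
class Result:
    text: str
    confidence: float

class Recognizer:
    """Placeholder recognizer; a real deployment would wrap an ASR engine."""
    def __init__(self, name: str, config: dict):
        self.name, self.config = name, config

    def recognize(self, audio: bytes) -> Result:
        # Dummy output so the sketch runs end to end.
        return Result(text=f"<{self.name} hypothesis>",
                      confidence=0.5 + 0.1 * len(self.config))

def recognize_with_supplement(audio: bytes, main: Recognizer, idle_fraction: float) -> Result:
    """Run the main recognizer; if enough resources are idle, also run a
    supplemental recognizer with a different configuration and keep the
    higher-confidence hypothesis (a simple stand-in for result combination)."""
    results = [main.recognize(audio)]
    if idle_fraction > 0.3:   # hypothetical demand threshold for spare capacity
        supplemental = Recognizer("supplemental",
                                  {**main.config, "beam": "wide", "model": "speaker-adapted"})
        results.append(supplemental.recognize(audio))
    return max(results, key=lambda r: r.confidence)

if __name__ == "__main__":
    main = Recognizer("main", {"beam": "narrow"})
    print(recognize_with_supplement(b"...", main, idle_fraction=0.6))
```
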
  • Publication number: 20110137653
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing speech recognition based on a masked language model. A system configured to practice the method receives a masked language model including a plurality of words, wherein a bit mask identifies whether each of the plurality of words is allowed or disallowed with regard to an adaptation subset, receives input speech, generates a speech recognition lattice based on the received input speech using the masked language model, removes from the generated lattice words identified as disallowed by the bit mask for the adaptation subset, and recognizes the received speech based on the lattice. Alternatively during the generation step, the system can only add words indicated as allowed by the bit mask. The bit mask can be separate from or incorporated as part of the masked language model. The system can dynamically update the adaptation subset and bit mask.
    Type: Application
    Filed: December 4, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Mazin Gilbert
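
The mechanism in publication 20110137653 is essentially vocabulary filtering: the adaptation subset contributes one bit per word, and lattice words whose bit marks them disallowed are removed (or never added) before final recognition. A compact sketch with a toy lattice of (word, score) arcs follows; the data layout and bit-mask contents are assumptions.

```python
# Masked language model: vocabulary plus one bit per word for a single
# adaptation subset (1 = allowed, 0 = disallowed).  Toy values, not from the patent.
BIT_MASK = {"play": 1, "pause": 1, "purchase": 0, "delete": 0, "movie": 1}

Lattice = list[tuple[str, float]]   # (word, combined acoustic + LM score) arcs

def prune_lattice(lattice: Lattice, bit_mask: dict[str, int]) -> Lattice:
    """Remove arcs whose word the bit mask marks as disallowed for the
    current adaptation subset."""
    return [(word, score) for word, score in lattice if bit_mask.get(word, 0) == 1]

def recognize(lattice: Lattice, bit_mask: dict[str, int]) -> str:
    """Pick the best-scoring allowed word; a real decoder would instead search
    the full pruned lattice for the best path."""
    pruned = prune_lattice(lattice, bit_mask)
    return max(pruned, key=lambda arc: arc[1])[0]

if __name__ == "__main__":
    lattice = [("purchase", -2.1), ("play", -2.4), ("movie", -3.0)]
    print(recognize(lattice, BIT_MASK))   # 'play': 'purchase' is masked out
```
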
  • Publication number: 20110137650
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for training adaptation-specific acoustic models. A system practicing the method receives speech and generates a full size model and a reduced size model, the reduced size model starting with a single distribution for each speech sound in the received speech. The system finds speech segment boundaries in the speech using the full size model and adapts features of the speech data using the reduced size model based on the speech segment boundaries and an overall centroid for each speech sound. The system then recognizes speech using the adapted features of the speech. The model can be a Hidden Markov Model (HMM). The reduced size model can also be of a reduced complexity, such as having fewer mixture components than a model of full complexity. Adapting features of speech can include moving the features closer to an overall feature distribution center.
    Type: Application
    Filed: December 8, 2009
    Publication date: June 9, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
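
Stripped of the HMM machinery, the adaptation in publication 20110137650 shifts each frame toward the overall feature-distribution center of its speech sound, using segment boundaries from the full-size model and per-sound centroids from the reduced single-distribution model. The sketch below shows that feature shift with plain lists of floats; the interpolation weight and data layout are assumptions.

```python
def centroid(frames: list[list[float]]) -> list[float]:
    """Mean feature vector of a set of frames."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]

def adapt_features(frames: list[list[float]],
                   segments: list[tuple[int, int, str]],
                   sound_centroids: dict[str, list[float]],
                   alpha: float = 0.5) -> list[list[float]]:
    """Move each frame part of the way toward the overall centroid of the speech
    sound its segment was aligned to (segment boundaries come from the full-size
    model, centroids from the reduced single-distribution-per-sound model)."""
    adapted = [list(f) for f in frames]
    for start, end, sound in segments:
        target = sound_centroids[sound]
        for i in range(start, end):
            adapted[i] = [(1 - alpha) * x + alpha * t
                          for x, t in zip(adapted[i], target)]
    return adapted

if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
    segments = [(0, 2, "AA"), (2, 3, "S")]          # toy alignment (an assumption)
    cents = {"AA": centroid(frames[0:2]), "S": centroid(frames[2:3])}
    print(adapt_features(frames, segments, cents))
```
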
  • Publication number: 20110119059
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Application
    Filed: November 13, 2009
    Publication date: May 19, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
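
The selection policy in publication 20110119059 is a straightforward fallback chain: the user's supervised model if one exists, otherwise an unsupervised model, otherwise the generic model. A minimal sketch of that chain; the model stores and names are placeholders, not part of the patent.

```python
from typing import Optional

# Hypothetical model stores keyed by user id (assumptions for illustration).
SUPERVISED: dict[str, str] = {"alice": "alice-supervised-v3"}
UNSUPERVISED: dict[str, str] = {"bob": "bob-unsupervised-v1"}
GENERIC_MODEL = "generic-acoustic-model"

def select_model(user_id: str) -> str:
    """Prefer the user's supervised model, then an unsupervised model built
    from the user's past audio, and fall back to the generic model."""
    supervised: Optional[str] = SUPERVISED.get(user_id)
    if supervised is not None:
        return supervised
    unsupervised: Optional[str] = UNSUPERVISED.get(user_id)
    if unsupervised is not None:
        return unsupervised
    return GENERIC_MODEL

if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", select_model(user))
```
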
  • Publication number: 20110077942
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for handling expected repeat speech queries or other inputs. The method causes a computing device to detect a misrecognized speech query from a user, determine a tendency of the user to repeat speech queries based on previous user interactions, and adapt a speech recognition model based on the determined tendency before an expected repeat speech query. The method can further include recognizing the expected repeat speech query from the user based on the adapted speech recognition model. Adapting the speech recognition model can include modifying an acoustic model, a language model, and/or a semantic model. Adapting the speech recognition model can also include preparing a personalized search speech recognition model for the expected repeat query based on usage history and entries in a recognition lattice. The method can include retaining unmodified speech recognition models with adapted speech recognition models.
    Type: Application
    Filed: September 30, 2009
    Publication date: March 31, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Diamantino Antonio Caseiro
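
A toy version of the trigger in publication 20110077942: watch for misrecognitions, estimate from history how often this user repeats a failed query, and when that tendency is high, prepare a personalized recognition bias before the expected retry. The history format, the tendency estimate, and the "personalized model" stand-in below are assumptions.

```python
from collections import deque

class RepeatQueryAdapter:
    """Tracks whether a user tends to repeat misrecognized queries and, when
    they do, prepares a personalized recognition bias before the retry."""

    def __init__(self, history_size: int = 50, tendency_threshold: float = 0.5):
        self.events = deque(maxlen=history_size)   # True if a misrecognition was repeated
        self.threshold = tendency_threshold
        self.personalized_bias: list[str] = []

    def record(self, misrecognized_query: str, was_repeated: bool) -> None:
        self.events.append(was_repeated)
        if was_repeated:
            self.personalized_bias.append(misrecognized_query)

    def repeat_tendency(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def model_for_next_query(self) -> str:
        """Choose which recognition model to load for the next utterance."""
        if self.repeat_tendency() >= self.threshold and self.personalized_bias:
            # A real system would adapt acoustic/language/semantic models; here
            # we just name a model biased toward the recent failed queries.
            return f"personalized({', '.join(self.personalized_bias[-3:])})"
        return "unmodified-model"

if __name__ == "__main__":
    adapter = RepeatQueryAdapter()
    adapter.record("pizza near me", was_repeated=True)
    adapter.record("call mom", was_repeated=False)
    adapter.record("pizza near me", was_repeated=True)
    print(adapter.model_for_next_query())
```
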
  • Publication number: 20110066433
    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
    Type: Application
    Filed: September 16, 2009
    Publication date: March 17, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Alistair D. Conkie
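
The selection logic in publication 20110066433 can be paraphrased as: budget how many speaker-dependent models the available resources allow, decode the utterance with the speaker-independent model plus that many speaker-dependent ones, and keep the dominant model's hypothesis. The resource budget, placeholder recognizers, and confidence proxy below are assumptions made so the sketch runs.

```python
import random

def available_sd_model_count(free_cpu_cores: int, cores_per_model: int = 2) -> int:
    """How many speaker-dependent models the idle resources can host (a toy budget)."""
    return max(0, free_cpu_cores // cores_per_model)

def recognize(model_name: str, utterance: str) -> tuple[str, float]:
    """Placeholder decode: returns a hypothesis and a confidence score."""
    random.seed(hash((model_name, utterance)) % (2 ** 32))
    return f"<{model_name} hypothesis>", random.random()

def recognize_parallel(utterance: str, sd_models: list[str], free_cpu_cores: int) -> str:
    """Run the speaker-independent model plus as many speaker-dependent models
    as resources allow, then pick the dominant model by recognition confidence."""
    budget = available_sd_model_count(free_cpu_cores)
    selected = ["speaker-independent"] + sd_models[:budget]
    results = {m: recognize(m, utterance) for m in selected}   # conceptually in parallel
    dominant = max(results, key=lambda m: results[m][1])
    return results[dominant][0]

if __name__ == "__main__":
    print(recognize_parallel("play the news",
                             ["sd-alice", "sd-bob", "sd-carol"],
                             free_cpu_cores=5))
```
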
  • Publication number: 20100332286
    Abstract: Predicting a score related to a communication sent by a sender over a communications network to a first agent servicing the communication includes obtaining a regression result for an objective function by encoding features extracted from the communication. The encoded features are applied to a regression model for the objective function. The regression result is output to a network component in the communications network. The regression model is determined prior to or concurrently with receiving the communication from the sender.
    Type: Application
    Filed: June 24, 2009
    Publication date: December 30, 2010
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: I. Dan Melamed, Yeon-Jun Kim, Andrej Ljolje, Bernard S. Renger, David J. Smith
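
At its core, publication 20100332286 describes encoding features of an incoming communication and applying a pre-trained regression model for some objective function, then handing the result to a network component. A bare-bones linear-regression sketch follows; the feature encoding, weights, and objective are invented for illustration.

```python
# Hypothetical pre-trained linear regression for one objective function
# (weights and feature encoding are illustrative, not from the patent).
WEIGHTS = {"n_words": 0.02, "contains_cancel": 1.5, "hour_of_day": -0.03}
BIAS = 2.0

def encode(communication: str, hour_of_day: int) -> dict[str, float]:
    """Turn a communication into the numeric features the model expects."""
    words = communication.lower().split()
    return {
        "n_words": float(len(words)),
        "contains_cancel": 1.0 if "cancel" in words else 0.0,
        "hour_of_day": float(hour_of_day),
    }

def predict_score(features: dict[str, float]) -> float:
    """Apply the regression model: dot product of features and weights plus bias."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

if __name__ == "__main__":
    feats = encode("I want to cancel my service today", hour_of_day=14)
    # The resulting regression result would be output to a network component,
    # e.g. a router deciding which agent should service the communication.
    print(round(predict_score(feats), 3))
```
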
  • Publication number: 20100332227
    Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
    Type: Application
    Filed: June 24, 2009
    Publication date: December 30, 2010
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: I. Dan Melamed, Yeon-Jun Kim, Andrej Ljolje, Bernard S. Renger, David J. Smith
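
The compliance check in publication 20100332227 reduces to scanning a communication for pre-determined event and precursor phrases and rating the servicing agent on their presence. A short sketch with made-up phrase lists and a simple coverage-style rating:

```python
# Hypothetical phrase lists for a compliance event and its precursor.
EVENT_PHRASES = ["this call may be recorded", "is there anything else"]
PRECURSOR_PHRASES = ["thank you for calling", "how may i help you"]

def detect(transcript: str, phrases: list[str]) -> list[str]:
    """Return the pre-determined phrases that occur in the communication."""
    text = transcript.lower()
    return [p for p in phrases if p in text]

def rate_recipient(transcript: str) -> dict[str, object]:
    """Rate the recipient servicing the communication by the presence of event
    and precursor phrases; the coverage ratio here is just one possible rating."""
    found_event = detect(transcript, EVENT_PHRASES)
    found_precursor = detect(transcript, PRECURSOR_PHRASES)
    total = len(EVENT_PHRASES) + len(PRECURSOR_PHRASES)
    return {
        "event_phrases": found_event,
        "precursor_phrases": found_precursor,
        "compliance_rating": (len(found_event) + len(found_precursor)) / total,
    }

if __name__ == "__main__":
    call = ("Thank you for calling, how may I help you? "
            "This call may be recorded for quality purposes.")
    print(rate_recipient(call))
```
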