Segmentation or Word Limit Detection (EPO) Patents (Class 704/E15.005)
  • Publication number: 20090192795
    Abstract: A steering wheel system for a vehicle. The steering wheel system includes a first microphone mounted in a steering wheel and a second microphone mounted in the vehicle. The first and second microphones are each configured to receive an audible input. The audible input includes an oral command component and a noise component. The steering wheel system also includes a controller configured to identify the noise component by determining that the noise component received at the first microphone is out of phase with the noise component received at the second microphone. The controller is configured to cancel the noise component from the audible input.
    Type: Application
    Filed: November 12, 2008
    Publication date: July 30, 2009
    Inventor: Leonard Cech
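The cancellation scheme in the abstract above can be sketched in a few lines: if the noise component arrives out of phase at the two microphones while the oral command is in phase, averaging the two signals cancels the noise and keeps the command. This is a minimal numeric sketch, not the patented controller; the signal names are illustrative.

```python
import numpy as np

def cancel_noise(mic1, mic2):
    """Average two microphone signals whose noise components are
    180 degrees out of phase: the noise cancels while the in-phase
    command component is preserved."""
    return (np.asarray(mic1) + np.asarray(mic2)) / 2.0

# Toy signals: the command is in phase at both mics, the noise is inverted.
t = np.linspace(0, 1, 100)
command = np.sin(2 * np.pi * 3 * t)
noise = 0.5 * np.sin(2 * np.pi * 40 * t)
mic1 = command + noise
mic2 = command - noise          # noise arrives out of phase at the second mic
clean = cancel_noise(mic1, mic2)
```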
  • Publication number: 20090164217
    Abstract: This invention relates to processing of audio files, and more specifically, to an improved technique of searching audio. More particularly, a method and system for processing audio using a multi-stage searching process is disclosed.
    Type: Application
    Filed: December 19, 2007
    Publication date: June 25, 2009
    Applicant: Nexidia, Inc.
    Inventors: Jon A. Arrowood, Robert W. Morris, Kenneth K. Griggs
  • Publication number: 20090150154
Abstract: A method of generating and detecting confusing phones/syllables is disclosed. The method includes a generating stage and a detecting stage. The generating stage includes: (a) inputting a Mandarin utterance; (b) partitioning the Mandarin utterance into segmented phones/syllables and generating the most likely route in a recognition net via forced alignment with Viterbi decoding; (c) comparing the segmented phones/syllables with a Mandarin acoustic model; (d) determining whether a confusing phone/syllable exists; (e) adding the confusing phone/syllable to the recognition net and repeating steps (b), (c), and (d) when a confusing phone/syllable exists; and (f) stopping and outputting all generated confusing phones/syllables to a confusing phone/syllable file when no confusing phone/syllable exists.
    Type: Application
    Filed: February 12, 2008
    Publication date: June 11, 2009
    Inventors: Jyh-Shing Jang, Pai-Pin Wang, Jiang-Chun Chen, Zheng-Hao Lin
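The iterate-until-fixed-point structure of steps (b) through (f) can be sketched generically. The `find_confusions` callback below is a hypothetical stand-in for the alignment-and-comparison steps (b)-(d); the loop adds newly found confusing units to the net and repeats until none remain, as steps (e)-(f) describe.

```python
def generate_confusing_units(utterance, acoustic_model, find_confusions):
    """Iteratively grow a recognition net: find confusing units for the
    current net (steps b-d), add them and repeat (step e), and stop when
    no new confusion appears (step f)."""
    net = set()
    while True:
        new = find_confusions(utterance, acoustic_model, net) - net
        if not new:
            return net          # step (f): output all confusions found
        net |= new              # step (e): add the confusions and repeat

# Toy stand-in: confusion "b" triggers a follow-on confusion "p".
chain = {"b": {"p"}, "p": set()}
def toy_find(utt, model, net):
    found = {"b"} if not net else set()
    for unit in net:
        found |= chain.get(unit, set())
    return found

result = generate_confusing_units("ba", None, toy_find)
```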
  • Publication number: 20090144050
Abstract: A method and system for automatic speech recognition are disclosed. The method comprises receiving speech from a user, the speech including at least one speech error, increasing the probabilities of words closely related to the at least one speech error, and processing the received speech using the increased probabilities. A corpus of data containing common words that are misstated is used to identify the related words and increase their probabilities. The method applies to at least the automatic speech recognition module and the spoken language understanding module.
    Type: Application
    Filed: February 5, 2009
    Publication date: June 4, 2009
    Applicant: AT&T Corp.
    Inventors: Steven H. Lewis, Kenneth H. Rosen
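The probability-boosting step described above can be illustrated with a small sketch. The `error_map` below plays the role of the corpus of commonly misstated words; the word names and the boost factor are illustrative assumptions, not details from the patent.

```python
def boost_related_words(probs, error_map, spoken_word, factor=2.0):
    """Raise the recognizer's probabilities for words commonly confused
    with `spoken_word`, then renormalize so the scores remain a
    distribution."""
    boosted = dict(probs)
    for related in error_map.get(spoken_word, []):
        if related in boosted:
            boosted[related] *= factor
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}

# "fights" is often a mis-statement of "flights" in travel dialogs.
probs = {"flights": 0.2, "fights": 0.5, "lights": 0.3}
error_map = {"fights": ["flights"]}
adjusted = boost_related_words(probs, error_map, "fights")
```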
  • Publication number: 20090135177
    Abstract: Systems and methods are disclosed for performing voice personalization of video content. The personalized media content may include a composition of a background scene having a character, head model data representing an individualized three-dimensional (3D) head model of a user, audio data simulating the user's voice, and a viseme track containing instructions for causing the individualized 3D head model to lip sync the words contained in the audio data. The audio data simulating the user's voice can be generated using a voice transformation process. In certain examples, the audio data is based on a text input or selected by the user (e.g., via a telephone or computer) or a textual dialogue of a background character.
    Type: Application
    Filed: November 19, 2008
    Publication date: May 28, 2009
    Applicant: BIG STAGE ENTERTAINMENT, INC.
    Inventors: Jonathan Isaac Strietzel, Jon Hayes Snoddy, Douglas Alexander Fidaleo
  • Publication number: 20090138260
Abstract: A voice judging system including: feature value extraction means that analyzes a sound signal input from a sound signal input device and extracts a time series of feature values; sub-word boundary score calculating means that calculates a time series of sub-word boundary scores by referring to voice sound models stored in a voice model storage unit; temporal regularity analyzing means that analyzes the temporal regularity of the sub-word boundary scores; and voice judgment means that judges whether the input sound signal is voice or non-voice using the temporal regularity of the sub-word boundary scores.
    Type: Application
    Filed: October 10, 2006
    Publication date: May 28, 2009
    Applicant: NEC CORPORATION
    Inventor: Makoto Terao
  • Publication number: 20090106026
    Abstract: A speech recognition method including for a spoken expression: a) providing a vocabulary of words including predetermined subsets of words, b) assigning to each word of at least one subset an individual score as a function of the value of a criterion of the acoustic resemblance of that word to a portion of the spoken expression, c) for a plurality of subsets, assigning to each subset of the plurality of subsets a composite score corresponding to a sum of the individual scores of the words of said subset, d) determining at least one preferred subset having the highest composite score.
    Type: Application
    Filed: May 24, 2006
    Publication date: April 23, 2009
    Applicant: France Telecom
    Inventor: Alexandre Ferrieux
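Steps (c) and (d) of the abstract above reduce to summing per-word scores and taking the maximum. This is a minimal sketch of that selection step; the words, scores, and subsets are illustrative, and the acoustic-resemblance scoring of step (b) is assumed to have already produced `scores`.

```python
def best_subset(word_scores, subsets):
    """Assign each subset a composite score equal to the sum of its
    words' individual scores (step c), and return the subset with the
    highest composite score (step d)."""
    def composite(subset):
        return sum(word_scores[w] for w in subset)
    return max(subsets, key=composite)

# Toy individual scores from the acoustic-resemblance step (b).
scores = {"play": 0.9, "pause": 0.2, "stop": 0.1, "song": 0.8}
subsets = [("play", "song"), ("pause", "stop"), ("stop", "song")]
winner = best_subset(scores, subsets)
```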
  • Publication number: 20090086933
    Abstract: In an example embodiment, a system that appropriately routes calls to an agent at a contact center based on the agent's voice and/or hearing characteristics. The agent is selected by matching speech and hearing characteristics of a caller with the speech and hearing characteristics of an agent. In order to find the best match for the caller, the contact center determines if the caller is hearing impaired, and if so determines a suitable frequency range for the caller. If a match cannot be found, the agent's and/or caller's voice may be shifted in real time and adjusted to a frequency range that is best suited for the caller.
    Type: Application
    Filed: October 1, 2007
    Publication date: April 2, 2009
    Inventors: Labhesh Patel, Mukul Jain, Shmuel Shaffer, Sanjeev Kumar
  • Publication number: 20090048837
Abstract: A system and method that utilize common symbols for marking the tones of alphabet letters of different languages. The marking system and method employ symbols from the standard English typing keyboard to denote tones. There are seven phonetic tone marks, each representing a unique tone. The system can be applied to the alphabetic writing of any language to denote that language's tones. The method makes alphabetic writing possible for any language and helps people effectively capture the tones of words in different languages.
    Type: Application
    Filed: August 14, 2007
    Publication date: February 19, 2009
    Inventors: Ling Ju Su, Kuojui Su
  • Publication number: 20090024391
Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface implements a hands-free, voice-driven environment to control processes and applications. A natural language model is used to parse voice-initiated commands and data, and to route those voice-initiated inputs to the required applications or processes. The use of an intelligent context-based parser allows the system to determine what processes are required to complete a task initiated using natural language. A single-window environment provides an interface that is comfortable to the user by preventing distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.
    Type: Application
    Filed: September 29, 2008
    Publication date: January 22, 2009
    Applicant: EASTERN INVESTMENTS, LLC
    Inventors: Richard Grant, Pedro E. McGregor
  • Publication number: 20080294441
Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, and words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block, and with computer readable code for implementing the method.
    Type: Application
    Filed: December 6, 2006
    Publication date: November 27, 2008
    Inventor: Zsolt Saffer
  • Publication number: 20080270129
    Abstract: A method for automatically providing a hypothesis of a linguistic formulation that is uttered by users of a voice service based on an automatic speech recognition system and that is outside a recognition domain of the automatic speech recognition system. The method includes providing a constrained and an unconstrained speech recognition from an input speech signal, identifying a part of the constrained speech recognition outside the recognition domain, identifying a part of the unconstrained speech recognition corresponding to the identified part of the constrained speech recognition, and providing the linguistic formulation hypothesis based on the identified part of the unconstrained speech recognition.
    Type: Application
    Filed: February 17, 2005
    Publication date: October 30, 2008
    Applicant: Loquendo S.p.A.
    Inventors: Daniele Colibro, Claudio Vair, Luciano Fissore, Cosmin Popovici
  • Publication number: 20080262844
    Abstract: A method and system for determining the gender of a communicant in a communication is provided. According to the method, at least one aural segment corresponding to at least one word spoken by a communicant is identified. The aural segment is then analyzed by applying a gender detection model to the aural segment, and gender detection data is generated based on the application of the gender detection model.
    Type: Application
    Filed: March 28, 2008
    Publication date: October 23, 2008
    Inventors: Roger Warford, Chris Thoman
  • Publication number: 20080262843
    Abstract: An apparatus and method for recognizing paraphrases of uttered phrases, such as place names. At least one keyword contained in a speech utterance is recognized. Then, the keyword(s) contained in the speech utterance are re-recognized using a phrase including the keyword(s). Based on both recognition results, it is determined whether a paraphrase could have been uttered. If a paraphrase could have been uttered, a phrase corresponding to the paraphrase is determined as a result of speech recognition of the speech utterance.
    Type: Application
    Filed: October 22, 2007
    Publication date: October 23, 2008
    Applicant: NISSAN MOTOR CO., LTD.
    Inventors: Keiko Katsuragawa, Minoru Tomikashi, Takeshi Ono, Daisuke Saitoh, Eiji Tozuka
  • Publication number: 20080262842
Abstract: A portable computer with a speech recognition function and a method for processing a speech command thereof are disclosed. In the method, the speech command has Y command character strings, where Y is a positive integer greater than or equal to one. The method includes providing a plurality of speech recognition databases and loading the corresponding speech recognition database in response to executing the X-th command string of the speech command, where X is a positive integer greater than or equal to one and less than or equal to Y. When the string corresponding to the X-th command character string is found in the loaded speech recognition database, the operation designated by the X-th command string is executed, and when X is not equal to Y, one is added to X.
    Type: Application
    Filed: April 11, 2008
    Publication date: October 23, 2008
    Applicant: ASUSTeK COMPUTER INC.
    Inventors: Hung-Lung Liang, Po-Wei Chou
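The per-command-string loop described above (load the X-th database, match, execute, advance X) can be sketched as follows. The command names, database contents, and the `execute` callback are illustrative assumptions; the index `x` below is the zero-based counterpart of the abstract's X.

```python
def run_speech_command(command_strings, databases, execute):
    """For each command string in turn, load the corresponding
    recognition database; if the string is found there, execute its
    operation and advance to the next string, otherwise stop."""
    executed = []
    for x, cmd in enumerate(command_strings):   # x plays the role of X
        db = databases[x]                       # load the X-th database
        if cmd not in db:
            break                               # string not recognized
        execute(cmd)
        executed.append(cmd)
    return executed

# Toy two-stage command: "open" then "mail".
dbs = [{"open"}, {"browser", "mail"}]
done = run_speech_command(["open", "mail"], dbs, lambda c: None)
```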
  • Publication number: 20080221893
Abstract: New language constantly emerges from complex, collaborative human-human interactions like meetings, such as when a presenter handwrites a new term on a whiteboard while saying it redundantly. The system and method described include devices for receiving various types of human communication activities (e.g., speech, writing, and gestures) presented in a multimodally redundant manner; processors and recognizers for segmenting or parsing, and then recognizing, selected sub-word units such as phonemes and syllables; and alignment, refinement, and integration modules to find an exact, or at least an approximate, match to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and provide an association for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
    Type: Application
    Filed: February 29, 2008
    Publication date: September 11, 2008
    Applicant: Adapx, Inc.
    Inventor: Edward C. Kaiser
  • Publication number: 20080221890
    Abstract: Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.
    Type: Application
    Filed: March 6, 2008
    Publication date: September 11, 2008
    Inventors: Gakuto Kurata, Shinsuke Mori, Masafumi Nishimura
  • Publication number: 20080215325
Abstract: An apparatus, method and program for dividing a conversational dialog into utterances. The apparatus includes a computer processor; a word database for storing spellings and pronunciations of words; a grammar database for storing syntactic rules on words; a pause detecting section which detects a pause location in the channel carrying the main speech among conversational dialogs inputted in at least two channels; an acknowledgement detecting section which detects an acknowledgement location in a channel not carrying the main speech; a boundary-candidate extracting section which extracts boundary candidates in the main speech by extracting pauses existing within a predetermined range before and after a base point that is the acknowledgement location; and a recognizing unit which outputs a word string of the main speech segmented by one of the extracted boundary candidates, after dividing the segmented speech into optimal utterances by reference to the word database and grammar database.
    Type: Application
    Filed: December 27, 2007
    Publication date: September 4, 2008
    Inventors: Hiroshi Horii, Hideki Tai, Gaku Yamamoto
  • Publication number: 20080215327
    Abstract: Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet based on user queries at his/her computer, a PDA, or a workstation using a speech input interface.
    Type: Application
    Filed: May 19, 2008
    Publication date: September 4, 2008
    Inventor: Ian M. Bennett
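The NULL-terminated framing described above is easy to illustrate: on the receiving side, a transported byte stream is split into voice segments at the 0x00 markers. This is a minimal sketch of that framing convention only, not of the distributed recognition system; the payload bytes are illustrative.

```python
def split_voice_segments(stream: bytes):
    """Split a transported byte stream into voice segments, using
    NULL (0x00) bytes as end-of-segment markers."""
    return [seg for seg in stream.split(b"\x00") if seg]

# Two voice segments, each terminated by a NULL character.
payload = b"frame1data\x00frame2data\x00"
segs = split_voice_segments(payload)
```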
  • Publication number: 20080208582
    Abstract: Computer-implemented methods and apparatus are provided to facilitate the recognition of the content of a body of speech data. In one embodiment, a method for analyzing verbal communication is provided, comprising acts of producing an electronic recording of a plurality of spoken words; processing the electronic recording to identify a plurality of word alternatives for each of the spoken words, each of the plurality of word alternatives being identified by comparing a portion of the electronic recording with a lexicon, and each of the plurality of word alternatives being assigned a probability of correctly identifying a spoken word; loading the word alternatives and the probabilities to a database for subsequent analysis; and examining the word alternatives and the probabilities to determine at least one characteristic of the plurality of spoken words.
    Type: Application
    Filed: January 29, 2008
    Publication date: August 28, 2008
    Applicant: CallMiner, Inc.
    Inventor: Jeffrey A. Gallino
  • Publication number: 20080183471
Abstract: A system and method of recognizing speech comprise an audio receiving element and a computer server, which together perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally to a hypersphere. The phonemes received from the audio-receiving element are likewise converted into n-dimensional space and transformed using singular value decomposition to conform the data to a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from the center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Application
    Filed: March 28, 2008
    Publication date: July 31, 2008
    Applicant: AT&T Corp.
    Inventor: Bishnu Saroop Atal
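The comparison step in the abstract above, matching the received phoneme to stored phonemes by comparing distances from the hypersphere's center, can be sketched with plain vector norms. The two-dimensional vectors and phoneme labels below are toy stand-ins; the SVD-based transform into the hypersphere is assumed to have already been applied.

```python
import numpy as np

def closest_phoneme(received, stored):
    """Compare the received phoneme vector with each stored phoneme
    vector by the difference of their distances from the hypersphere's
    center (taken here as the origin), and return the closest label."""
    r = np.linalg.norm(received)
    return min(stored, key=lambda lbl: abs(np.linalg.norm(stored[lbl]) - r))

# Toy stored phonemes already transformed into the hypersphere space.
stored = {"ah": np.array([1.0, 0.0]), "ee": np.array([0.0, 2.0])}
label = closest_phoneme(np.array([0.9, 0.1]), stored)
```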
  • Publication number: 20080177544
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Application
    Filed: September 13, 2007
    Publication date: July 24, 2008
    Applicant: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Publication number: 20080154585
Abstract: In a sound signal processing apparatus, a frame information generation section generates frame information for each frame of a sound signal. A storage stores the frame information generated by the frame information generation section. A first interval determination section determines a first utterance interval in the sound signal. A second interval determination section determines a second utterance interval based on the frame information of the first utterance interval stored in the storage, such that the second utterance interval is shorter than the first utterance interval and confined within it, by trimming frames from either the start point or the end point of the first utterance interval.
    Type: Application
    Filed: December 21, 2007
    Publication date: June 26, 2008
    Applicant: Yamaha Corporation
    Inventor: Yasuo Yoshioka
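The trimming step above can be sketched with a simple per-frame criterion. Using frame energy as the stored frame information is an illustrative assumption; the abstract does not specify which frame features drive the trimming.

```python
def trim_interval(frame_energies, start, end, threshold=0.1):
    """Determine a second, shorter utterance interval inside the first
    [start, end) interval by trimming low-energy frames from both ends."""
    while start < end and frame_energies[start] < threshold:
        start += 1
    while end > start and frame_energies[end - 1] < threshold:
        end -= 1
    return start, end

# Toy per-frame energies for a first interval covering all frames.
energies = [0.0, 0.05, 0.4, 0.8, 0.6, 0.02, 0.0]
second = trim_interval(energies, 0, len(energies))
```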
  • Publication number: 20080089495
Abstract: A method, system and computer product for accessing, by means of a telephone, a database containing data arranged in at least one table including a plurality of table entries, each table entry including a plurality of information elements. The invention retrieves identifiers of the information elements from the table. Each retrieved identifier is associated with a choice key, where the choice keys are adapted to be input by a user through a telephone. The invention presents, through the telephone, the identifiers of the information elements together with their respective choice keys to a user. Once a choice key input by the user is received through the telephone, the invention retrieves data from the table using the input choice key, where said retrieved data includes the content of the information element in at least one table entry whose identifier corresponds to the input choice key. The retrieved data is presented to the user through the telephone.
    Type: Application
    Filed: October 11, 2007
    Publication date: April 17, 2008
    Inventors: Scot MacLellan, Ivan Orlandi
  • Publication number: 20080084976
    Abstract: A method and system are provided for automatically presenting information during a telephone conversation between two parties to a telephone call. A search term connected with parameters of the telephone connection or with the content of the telephone conversation is generated. A search request is generated from the search term in order to be transferred to a search function which will use the search term to search a volume of data. Information is returned as a search result and the information is presented to at least one of the parties to the conversation during the telephone call.
    Type: Application
    Filed: October 2, 2007
    Publication date: April 10, 2008
    Applicant: Deutsche Telekom AG
    Inventor: Ludwig Brackmann
  • Publication number: 20080071546
    Abstract: The invention provides a system and method for selective vehicle component control. Receiving a speech recognition engine activation signal activates a speech recognition engine in an in-vehicle telematics unit. A voice command is then received at the speech recognition engine of the in-vehicle telematics unit. A vehicle component control command is sent to a control entity from the in-vehicle telematics unit based on the voice command received. Another aspect of the invention provides a computer usable medium that includes a program for selective vehicle component control.
    Type: Application
    Filed: November 30, 2007
    Publication date: March 20, 2008
    Applicant: GENERAL MOTORS CORPORATION
    Inventors: Frederick Beiermeister, Christopher Oesterling, Jeffrey Stefan
  • Publication number: 20080059178
    Abstract: An interface apparatus of an embodiment of the present invention is configured to perform a device operation in response to a voice instruction from a user.
    Type: Application
    Filed: June 28, 2007
    Publication date: March 6, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Daisuke Yamamoto, Miwako Doi
  • Publication number: 20080046242
Abstract: A computer-loadable data structure is provided that represents a state-and-transition-based description of a speech grammar. The data structure includes first and second transition entries that both represent transitions from a first state. The second transition entry is contiguous with the first transition entry in the data structure and includes a last-transition value. The last-transition value indicates that the second transition is the last transition from the first state in the data structure.
    Type: Application
    Filed: September 18, 2007
    Publication date: February 21, 2008
    Applicant: Microsoft Corporation
    Inventors: Philipp Schmid, Ralph Lipe
  • Publication number: 20080046243
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Application
    Filed: September 13, 2007
    Publication date: February 21, 2008
    Applicant: AT&T Corp.
    Inventors: Allen Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Wright
  • Publication number: 20070288238
    Abstract: An end-pointer determines a beginning and an end of a speech segment. The end-pointer includes a voice triggering module that identifies a portion of an audio stream that has an audio speech segment. A rule module communicates with the voice triggering module. The rule module includes a plurality of rules used to analyze a part of the audio stream to detect a beginning and an end of the audio speech segment. A consonant detector detects occurrences of a high frequency consonant in the portion of the audio stream.
    Type: Application
    Filed: May 18, 2007
    Publication date: December 13, 2007
    Inventors: Phillip Hetherington, Mark Fallat
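The end-pointing described above can be illustrated with one simple rule of the kind the rule module applies: the segment begins at the first speech frame and ends once enough consecutive non-speech frames follow. The energy threshold, frame values, and `min_silence` parameter are illustrative assumptions; the patented end-pointer combines multiple rules and a consonant detector.

```python
def end_point(frames, is_speech, min_silence=2):
    """Return (begin, end) frame indices of the first speech segment:
    begin at the first speech frame, end after `min_silence`
    consecutive non-speech frames are observed."""
    begin = end = None
    silence = 0
    for i, frame in enumerate(frames):
        if is_speech(frame):
            if begin is None:
                begin = i       # beginning of the speech segment
            end = i + 1
            silence = 0
        elif begin is not None:
            silence += 1
            if silence >= min_silence:
                break           # enough trailing silence: segment ended
    return begin, end

# Toy frame energies; frames above 0.3 count as speech.
frames = [0.0, 0.6, 0.7, 0.5, 0.0, 0.0, 0.4]
seg = end_point(frames, lambda e: e > 0.3)
```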