Subportions Patents (Class 704/254)
  • Patent number: 7508393
    Abstract: A system comprising a plurality of three dimensional artificially animated portraits for performing preprogrammed animations of voice and facial expressions in the form of a scripted dialogue orchestrated by a central source. The system is operable to prepare animations of recorded voice and selected depictions of facial expressions to be transferred to the animated portraits and performed by the animated portraits. The system is operable to combine prepared animations in a scripted dialogue to be performed so as to mimic an interactive conversation.
    Type: Grant
    Filed: June 6, 2006
    Date of Patent: March 24, 2009
    Inventors: Patricia L. Gordon, Robert E. Glaser
  • Publication number: 20090063151
    Abstract: In some aspects, a wordspotter is used to locate occurrences in an audio corpus of each of a set of predetermined subword units, which may be phoneme sequences. To locate a query (e.g., a keyword or phrase) in the audio corpus, constituent subword units in the query are identified, and then the locations of those subword units are determined based on the locations determined earlier by the wordspotter, for example, using a pre-built inverted index that maps subword units to their locations.
    Type: Application
    Filed: August 27, 2008
    Publication date: March 5, 2009
    Applicant: NEXIDIA INC.
    Inventors: Jon A. Arrowood, Robert W. Morris, Mark Finlay, Scott A. Judy
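The inverted-index idea this abstract describes can be sketched in a few lines. The sketch below assumes the wordspotter has already produced `(subword, start_time)` hits over the corpus; the index layout, the `max_gap` threshold, and the phoneme-sequence strings are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of a subword inverted index, assuming a wordspotter has
# already located each predetermined subword unit in the audio corpus.

def build_index(hits):
    """Map each subword unit (e.g. a phoneme sequence) to its time locations."""
    index = {}
    for subword, time in hits:
        index.setdefault(subword, []).append(time)
    return index

def locate_query(index, query_subwords, max_gap=0.5):
    """Return start times where the query's subword units occur in order,
    each within max_gap seconds of the previous unit."""
    matches = []
    for start in index.get(query_subwords[0], []):
        t = start
        ok = True
        for sw in query_subwords[1:]:
            nxt = [x for x in index.get(sw, []) if t < x <= t + max_gap]
            if not nxt:
                ok = False
                break
            t = min(nxt)
        if ok:
            matches.append(start)
    return matches
```

A query such as "cat a" would first be decomposed into its subword units, then located purely through index lookups rather than rescanning the audio.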
  • Publication number: 20090048832
    Abstract: [Problems] To provide a speech-to-text system and the like capable of matching edit result text acquired by editing recognition result text or edit result text which is newly-written text information with speech data. [Means for Solving Problems] A speech-to-text system (1) includes a matching unit (27) which collates edit result text acquired by a text editor unit (22) with speech recognition result information having time information created by a speech recognition unit (11) to thereby match the edit result text and speech data.
    Type: Application
    Filed: November 8, 2006
    Publication date: February 19, 2009
    Applicant: NEC Corporation
    Inventor: Makoto Terao
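The collation step the abstract describes (matching edited text against a recognition result that carries time information) can be approximated with a standard sequence alignment. The sketch below uses Python's `difflib` as a stand-in for the patent's matching unit; the word-level granularity and the example times are assumptions.

```python
# Rough sketch of matching edited text to timestamped recognition output:
# align the two word sequences so edited words inherit the recognizer's
# time information wherever the alignment finds agreement.
import difflib

def match_edit_to_speech(edited_words, recognized):
    """recognized: list of (word, start_time, end_time) from the recognizer.
    Returns each edited word paired with a timestamp, or (None, None)
    where the edit diverges from the recognition result."""
    rec_words = [w for w, _, _ in recognized]
    sm = difflib.SequenceMatcher(a=edited_words, b=rec_words)
    aligned = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        for k in range(i2 - i1):
            if op == "equal":
                _, start, end = recognized[j1 + k]
                aligned.append((edited_words[i1 + k], start, end))
            else:
                aligned.append((edited_words[i1 + k], None, None))
    return aligned
```

Words the editor rewrote keep their position in the alignment, so their timestamps could later be interpolated from the surrounding matched words.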
  • Publication number: 20090048837
    Abstract: A system and method that utilizes common symbols for marking the tones of alphabet letters of different languages. The marking system and method employs the symbols from the standard English typing keyboard to denote tones. There are seven phonetic tone marks. Each mark represents a unique tone. The system can be applied to any alphabetic writing letters of different languages to denote specific language tones. The method makes it possible for alphabetic writing of any kind of language and for people to effectively capture the tones of words in different languages.
    Type: Application
    Filed: August 14, 2007
    Publication date: February 19, 2009
    Inventors: Ling Ju Su, Kuojui Su
  • Publication number: 20090048838
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected from both an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved and stored on the server and then set up on the server to build a voice database using text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Application
    Filed: May 29, 2008
    Publication date: February 19, 2009
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20090043581
    Abstract: This invention relates to a method of searching spoken audio data for one or more search terms comprising performing a phonetic search of the audio data to identify likely matches to a search term and producing textual data corresponding to a portion of the spoken audio data including a likely match. An embodiment of the method comprises the steps of taking phonetic index data corresponding to the spoken audio data, searching the phonetic index data for likely matches to the search term, wherein when a likely match is detected a portion of the spoken audio data or phonetic index data is selected which includes the likely match and said selected portion of the spoken audio data or phonetic index data is processed using a large vocabulary speech recogniser. The large vocabulary speech recogniser may derive textual data which can be used for further processing or may be used to present a transcript to a user.
    Type: Application
    Filed: August 7, 2008
    Publication date: February 12, 2009
    Applicant: AURIX LIMITED
    Inventors: Martin G. Abbott, Keith M. Ponting
  • Publication number: 20090030680
    Abstract: A method and system of indexing speech data. The method includes indexing word transcripts including a timestamp for a word occurrence; and indexing sub-word transcripts including a timestamp for a sub-word occurrence. A timestamp in the index indicates the time and duration of occurrence of the word or sub-word in the speech data, and word and sub-word occurrences can be correlated using the timestamps. A method of searching speech transcripts is also provided in which a search query in the form of a phrase to be searched includes at least one in-vocabulary word and at least one out-of-vocabulary word.
    Type: Application
    Filed: July 23, 2007
    Publication date: January 29, 2009
    Inventor: Jonathan Joseph Mamou
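A minimal sketch of how the two indexes might be correlated through timestamps for a phrase containing one in-vocabulary word followed by an out-of-vocabulary word rendered as sub-word units. The index layouts, the gap threshold, and all data are illustrative assumptions.

```python
# Sketch of the dual index: the in-vocabulary word is looked up in the word
# index, the OOV word's sub-word units in the sub-word index, and hits are
# joined when their timestamps are adjacent.

def search_phrase(word_index, subword_index, iv_word, oov_subwords, max_gap=0.3):
    """word_index: word -> [(start, duration)];
    subword_index: subword -> [(start, duration)].
    Returns start times where iv_word is immediately followed by the
    OOV sub-word sequence."""
    results = []
    for w_start, w_dur in word_index.get(iv_word, []):
        t = w_start + w_dur
        ok = True
        for sw in oov_subwords:
            hits = [h for h in subword_index.get(sw, []) if t <= h[0] <= t + max_gap]
            if not hits:
                ok = False
                break
            t = hits[0][0] + hits[0][1]
        if ok:
            results.append(w_start)
    return results
```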
  • Publication number: 20090024392
    Abstract: A speech recognition dictionary making supporting system for efficiently making/updating a speech recognition dictionary/language model with reduced speech recognition errors by using text data available at low cost. The speech recognition dictionary making supporting system comprises a recognition dictionary storage section (105), a language model storage section (106), and a sound model storage section (107). A virtual speech recognizing section (102) creates virtual speech recognition result text data in regard to an analyzed text data created by a text analyzing section (101) with reference to a recognition dictionary, language model, and sound model, and compares the virtual speech recognition result text data with the original analyzed text data. An updating section (103) updates the recognition dictionary and language model so that the different portions in both the text data may be lessened.
    Type: Application
    Filed: February 2, 2007
    Publication date: January 22, 2009
    Applicant: NEC CORPORATION
    Inventor: Takafumi Koshinaka
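The update loop can be caricatured as: recognize the analyzed text virtually, diff the result against the original, and add the missed words to the dictionary so the differences lessen on the next pass. The positional `zip` diff and set-based dictionary below are illustrative simplifications that assume the two word sequences stay aligned.

```python
# Sketch of the comparison/update cycle: words where the virtual speech
# recognition result disagrees with the analyzed text become candidates
# for the recognition dictionary.

def find_update_candidates(analyzed_text, virtual_result):
    """Return words in the analyzed text that the virtual recognizer missed
    (assumes the two equal-length sequences are position-aligned)."""
    return [w for w, v in zip(analyzed_text, virtual_result) if w != v]

def update_dictionary(dictionary, candidates):
    """Add missed words so a later pass can generate their pronunciations."""
    dictionary.update(candidates)
    return dictionary
```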
  • Patent number: 7480616
    Abstract: Information relating to an amount of muscle activity is extracted from a myo-electrical signal by activity amount information extraction means, and information recognition is performed by activity amount information recognition means using the information relating to the amount of muscle activity of a speaker. There is a prescribed correspondence relationship between the amount of muscle activity of a speaker and a phoneme uttered by a speaker, so the content of an utterance can be recognized with a high recognition rate by information recognition using information relating to an amount of muscle activity.
    Type: Grant
    Filed: February 27, 2003
    Date of Patent: January 20, 2009
    Assignee: NTT DoCoMo, Inc.
    Inventors: Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura
  • Patent number: 7472061
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: March 31, 2008
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
  • Publication number: 20080319749
    Abstract: A system and method for creating a mnemonics Language Model for use with a speech recognition software application, wherein the method includes generating an n-gram Language Model containing a predefined large body of characters, wherein the n-gram Language Model includes at least one character from the predefined large body of characters, constructing a new Language Model (LM) token for each of the at least one character, extracting pronunciations for each of the at least one character responsive to a predefined pronunciation dictionary to obtain a character pronunciation representation, creating at least one alternative pronunciation for each of the at least one character responsive to the character pronunciation representation to create an alternative pronunciation dictionary and compiling the n-gram Language Model for use with the speech recognition software application, wherein compiling the Language Model is responsive to the new Language Model token and the alternative pronunciation dictionary.
    Type: Application
    Filed: July 11, 2008
    Publication date: December 25, 2008
    Applicant: MICROSOFT CORPORATION
    Inventors: David Mowatt, Robert Chambers, Ciprian Chelba, Qiang Wu
  • Patent number: 7467086
    Abstract: A system and method for effectively performing speech recognition procedures includes enhanced demiphone acoustic models that a speech recognition engine utilizes to perform the speech recognition procedures. The enhanced demiphone acoustic models each have three states that are collectively arranged to form a preceding demiphone and a succeeding demiphone. An acoustic model generator may utilize a decision tree for analyzing speech context information from a training database. The acoustic model generator then effectively configures each of the enhanced demiphone acoustic models as either a succeeding-dominant enhanced demiphone acoustic model or a preceding-dominant enhanced demiphone acoustic model to accurately model speech characteristics.
    Type: Grant
    Filed: December 16, 2004
    Date of Patent: December 16, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lex S. Olorenshaw, Gustavo Hernandez Abrego
  • Publication number: 20080294441
    Abstract: The invention deals with speech recognition, such as a system for recognizing words in continuous speech. A speech recognition system is disclosed which is capable of recognizing a huge number of words, and in principle even an unlimited number of words. The speech recognition system comprises a word recognizer for deriving a best path through a word graph, and wherein words are assigned to the speech based on the best path. The word score is obtained by applying a phonemic language model to each word of the word graph. Moreover, the invention deals with an apparatus and a method for identifying words from a sound block and to computer readable code for implementing the method.
    Type: Application
    Filed: December 6, 2006
    Publication date: November 27, 2008
    Inventor: Zsolt Saffer
  • Patent number: 7457751
    Abstract: A speech recognition system and method are provided to correctly distinguish among multiple interpretations of an utterance. This system is particularly useful when the set of possible interpretations is large, changes dynamically, and/or contains items that are not phonetically distinctive. The speech recognition system extends the capabilities of mobile wireless communication devices that are voice operated after their initial activation.
    Type: Grant
    Filed: November 30, 2004
    Date of Patent: November 25, 2008
    Assignee: Vocera Communications, Inc.
    Inventor: Robert E. Shostak
  • Patent number: 7453994
    Abstract: A method of presenting instructions to a user sending an incoming communication to a service center includes presenting a menu to the user. The menu includes a plurality of procedure descriptors to the user. The user is presented, according to a selection of one of the procedure descriptors by the user, a sequence of instructions which enable completion of a procedure described by the selected procedure descriptor. The incoming communication is transferred at a position in the sequence of instructions to a representative. The incoming communication is also transferred back to the same position in the sequence of instructions.
    Type: Grant
    Filed: October 22, 2007
    Date of Patent: November 18, 2008
    Assignee: AT&T Labs, Inc.
    Inventors: Philip Ted Kortum, Robert R. Bushey
  • Patent number: 7451125
    Abstract: A system, a method, and a machine-readable medium are provided. A group of linear rules and associated weights are provided as a result of machine learning. Each one of the group of linear rules is partitioned into a respective one of a group of types of rules. A respective transducer for each of the linear rules is compiled. A combined finite state transducer is created from a union of the respective transducers compiled from the linear rules.
    Type: Grant
    Filed: November 7, 2005
    Date of Patent: November 11, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Srinivas Bangalore
  • Patent number: 7447634
    Abstract: A recognizing target vocabulary comparing unit calculates a compared likelihood of recognizing target vocabulary, i.e., a compared likelihood of registered vocabulary, by using the time series of the amount of characteristics of an input speech. An environment adaptive noise model comparing unit obtains a likelihood that respective recognizing-unit standard patterns coincide with a time series of the amount of characteristics representing the characteristics of the input speech.
    Type: Grant
    Filed: June 11, 2007
    Date of Patent: November 4, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryosuke Koshiba
  • Publication number: 20080270129
    Abstract: A method for automatically providing a hypothesis of a linguistic formulation that is uttered by users of a voice service based on an automatic speech recognition system and that is outside a recognition domain of the automatic speech recognition system. The method includes providing a constrained and an unconstrained speech recognition from an input speech signal, identifying a part of the constrained speech recognition outside the recognition domain, identifying a part of the unconstrained speech recognition corresponding to the identified part of the constrained speech recognition, and providing the linguistic formulation hypothesis based on the identified part of the unconstrained speech recognition.
    Type: Application
    Filed: February 17, 2005
    Publication date: October 30, 2008
    Applicant: Loquendo S.p.A.
    Inventors: Daniele Colibro, Claudio Vair, Luciano Fissore, Cosmin Popovici
  • Patent number: 7440897
    Abstract: In an embodiment, a lattice of phone strings in an input communication of a user may be recognized, wherein the lattice may represent a distribution over the phone strings. Morphemes in the input communication of the user may be detected using the recognized lattice. Task-type classification decisions may be made based on the detected morphemes in the input communication of the user.
    Type: Grant
    Filed: May 27, 2006
    Date of Patent: October 21, 2008
    Assignee: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Publication number: 20080228485
    Abstract: The aural similarity measuring system and method provides a measure of the aural similarity between a target text (10) and one or more reference texts (11). Both the target text (10) and the reference texts (11) are converted into a string of phonemes (15) and then one or other of the phoneme strings are adjusted (16) so that both are equal in length. The phoneme strings are compared (12) and a score generated representative of the degree of similarity of the two phoneme strings. Finally, where there is a plurality of reference texts the similarity scores for each of the reference texts are ranked (13). With this aural similarity measuring system the analysis is automated thereby reducing risks of errors and omissions. Moreover, the system provides an objective measure of aural similarity enabling consistency of comparison in results and reproducibility of results.
    Type: Application
    Filed: March 5, 2008
    Publication date: September 18, 2008
    Applicant: MONGOOSE VENTURES LIMITED
    Inventor: Mark Owen
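The pipeline above (phoneme conversion, comparison, scoring, ranking) can be sketched with a normalized edit-distance score. Note one substitution: edit distance handles strings of unequal length directly, so it stands in here for the patent's explicit length-adjustment step. The phoneme lists are illustrative; the grapheme-to-phoneme conversion is assumed to have happened already.

```python
# Sketch of an aural similarity score over phoneme strings, with 1.0 for
# identical strings and 0.0 for strings sharing nothing, plus ranking of
# reference texts by that score.

def edit_distance(a, b):
    """Classic Levenshtein distance over phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def aural_similarity(target_phonemes, reference_phonemes):
    d = edit_distance(target_phonemes, reference_phonemes)
    longest = max(len(target_phonemes), len(reference_phonemes), 1)
    return 1.0 - d / longest

def rank_references(target, references):
    """Order reference phoneme strings from most to least similar."""
    return sorted(references, key=lambda r: aural_similarity(target, r), reverse=True)
```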
  • Publication number: 20080215328
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Application
    Filed: September 13, 2007
    Publication date: September 4, 2008
    Applicant: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Patent number: 7418385
    Abstract: This voice detection device is composed of a myoelectric signal acquisition part for acquiring, from a plurality of regions, myoelectric signals generated at the time of a vocalization operation, a parameter calculation part for calculating, as parameters, the fluctuations of the acquired myoelectric signals relative to a predetermined value in every channel corresponding to one of the plurality of regions, a vowel vocalization recognition part for specifying the vowel vocalization operation timing at the time of the vocalization operation, based on the fluctuations of the calculated parameters, and a vowel specification part for specifying a vowel corresponding to the vocalization operation, based on the fluctuation condition of the parameters before and after the specified vocalization operation timing in every channel.
    Type: Grant
    Filed: June 18, 2004
    Date of Patent: August 26, 2008
    Assignee: NTT DoCoMo, Inc.
    Inventors: Hiroyuki Manabe, Yumiko Hiraiwa, legal representative, Kouki Hayashi, Takashi Ninjouji, Toshiaki Sugimura, Akira Hiraiwa
  • Publication number: 20080201147
    Abstract: Provided are a distributed speech recognition system, a distributed speech recognition speech method, and a terminal and a server for distributed speech recognition. The distributed speech recognition system includes a terminal which decodes a feature vector that is extracted from an input speech signal into a sequence of phonemes and generates the final recognition result by rescoring a candidate list provided from the outside; and a server which generates the candidate list by performing symbol matching on the recognized sequence of phonemes provided from the terminal and transmits the candidate list for the rescoring to the terminal.
    Type: Application
    Filed: July 13, 2007
    Publication date: August 21, 2008
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ick-sang Han, Kyu-hong Kim, Jeong-su Kim
  • Patent number: 7415408
    Abstract: A recognizing target vocabulary comparing unit calculates a compared likelihood of recognizing target vocabulary, i.e., a compared likelihood of registered vocabulary, by using the time series of the amount of characteristics of an input speech. An environment adaptive noise model comparing unit compares the time series of the amount of characteristics with one recognizing standard pattern or with two or more combined recognizing standard patterns one-by-one to obtain a likelihood that respective environment adaptive noise models coincide with the time series of the amount of characteristics. A rejection determining unit determines whether or not the input signal is noise by comparing the likelihood obtained by the recognizing target vocabulary comparing step with the likelihood obtained by the environment adaptive noise model comparing step.
    Type: Grant
    Filed: June 11, 2007
    Date of Patent: August 19, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryosuke Koshiba
  • Patent number: 7412386
    Abstract: A method, apparatus, computer program product and service for directory dialer name recognition. The directory dialer has a directory of names and a first name grammar and a second name grammar representing phonetic baseforms of first names and second names respectively. The method includes: receiving voice data for a spoken name after requesting a user to speak the required name; extracting a set of phonetic baseforms for the voice data; and finding the best matches between the extracted set of phonetic baseforms voice data and any combination of the first name grammar and the second name grammar. The method can further include: checking the best match against the directory of names; if the best match does not exist in the directory, informing the user and prompting the next best match as an alternative; and if the best match does exist in the directory, forwarding the call to that best match.
    Type: Grant
    Filed: November 24, 2004
    Date of Patent: August 12, 2008
    Assignee: International Business Machines Corporation
    Inventors: Eric William Janke, Keith Sloan
  • Patent number: 7409341
    Abstract: A recognizing target vocabulary comparing unit calculates a compared likelihood of recognizing target vocabulary, i.e., a compared likelihood of registered vocabulary, by using the time series of the amount of characteristics of an input speech. An environment adaptive noise model comparing unit compares the time series of the amount of characteristics with one recognizing standard pattern or with two or more combined recognizing standard patterns one-by-one to obtain a likelihood that respective environment adaptive noise models coincide with the time series of the amount of characteristics. A rejection determining unit determines whether or not the input signal is noise by comparing the likelihood obtained by the recognizing target vocabulary comparing step with the likelihood obtained by the environment adaptive noise model comparing step.
    Type: Grant
    Filed: June 11, 2007
    Date of Patent: August 5, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Ryosuke Koshiba
  • Patent number: 7409345
    Abstract: Techniques for improving an automatic baseform generation system. More particularly, the invention provides techniques for reducing insertion of spurious speech events in a word or phone sequence generated by an automatic baseform generation system. Such automatic baseform generation techniques may be accomplished by enhancing the scores of long-lasting speech events with respect to the scores of short-lasting events. For example, this may be achieved by merging competing candidates that relate to the same speech event (e.g., phone or word) and that overlap in time into a single candidate, the score of which may be equal to the sum of the scores of the merged candidates.
    Type: Grant
    Filed: April 4, 2003
    Date of Patent: August 5, 2008
    Assignee: International Business Machines Corporation
    Inventors: Sabine V. Deligne, Lidia L. Mangu
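The merging idea this abstract outlines is small enough to sketch directly: candidates for the same speech event that overlap in time collapse into one candidate whose score is the sum of the merged scores, which boosts long-lasting events over spurious short ones. The tuple layout and data are illustrative assumptions.

```python
# Sketch of merging competing, time-overlapping candidates for the same
# phone into a single candidate scored by the sum of the merged scores.

def merge_candidates(candidates):
    """candidates: list of (phone, start, end, score), in any order."""
    merged = []
    for phone, start, end, score in sorted(candidates, key=lambda c: (c[0], c[1])):
        if merged and merged[-1][0] == phone and start < merged[-1][2]:
            # Same phone and the intervals overlap: fold into the previous
            # candidate, extending its span and summing the scores.
            p, s, e, sc = merged[-1]
            merged[-1] = (p, s, max(e, end), sc + score)
        else:
            merged.append((phone, start, end, score))
    return merged
```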
  • Patent number: 7409346
    Abstract: A structured generative model of a speech coarticulation and reduction is described with a novel two-stage implementation. At the first stage, the dynamics of formants or vocal tract resonance (VTR) are generated using prior information of resonance targets in the phone sequence. Bi-directional temporal filtering with finite impulse response (FIR) is applied to the segmental target sequence as the FIR filter's input. At the second stage the dynamics of speech cepstra are predicted analytically based on the FIR filtered VTR targets. The combined system of these two stages thus generates correlated and causally related VTR and cepstral dynamics where phonetic reduction is represented explicitly in the hidden resonance space and implicitly in the observed cepstral space. The combined system also gives the acoustic observation probability given a phone sequence. Using this probability, different phone sequences can be compared and ranked in terms of their respective probability values.
    Type: Grant
    Filed: March 1, 2005
    Date of Patent: August 5, 2008
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, Dong Yu, Li Deng
  • Publication number: 20080183471
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Application
    Filed: March 28, 2008
    Publication date: July 31, 2008
    Applicant: AT&T Corp.
    Inventor: Bishnu Saroop Atal
  • Publication number: 20080177544
    Abstract: The invention concerns a method and system for detecting morphemes in a user's communication. The method may include recognizing a lattice of phone strings from the user's input communication, the lattice representing a distribution over the phone strings, and detecting morphemes in the user's input communication using the lattice. The morphemes may be acoustic and/or non-acoustic. The morphemes may represent any unit or sub-unit of communication including phones, diphones, phone-phrases, syllables, grammars, words, gestures, tablet strokes, body movements, mouse clicks, etc. The training speech may be verbal, non-verbal, a combination of verbal and non-verbal, or multimodal.
    Type: Application
    Filed: September 13, 2007
    Publication date: July 24, 2008
    Applicant: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Publication number: 20080172224
    Abstract: A representation of a speech signal is received and is decoded to identify a sequence of position-dependent phonetic tokens wherein each token comprises a phone and a position indicator that indicates the position of the phone within a syllable.
    Type: Application
    Filed: January 11, 2007
    Publication date: July 17, 2008
    Applicant: Microsoft Corporation
    Inventors: Peng Liu, Yu Shi, Frank Kao-ping Soong
  • Patent number: 7401019
    Abstract: A method of searching audio data is provided including receiving a query defining multiple phonetic possibilities. The method also includes comparing the query with a lattice of phonetic hypotheses associated with the audio data to identify if at least one of the multiple phonetic possibilities is approximated by at least one phonetic hypothesis in the lattice of phonetic hypotheses.
    Type: Grant
    Filed: January 15, 2004
    Date of Patent: July 15, 2008
    Assignee: Microsoft Corporation
    Inventors: Frank T. Seide, Eric I-Chao Chang
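The comparison the abstract describes can be sketched by expanding the query into several phonetic possibilities and checking each against the lattice's phonetic hypotheses with a small edit-distance tolerance. Representing the lattice as a flat list of hypothesis sequences is an illustrative simplification of a real lattice structure.

```python
# Sketch of matching a multi-possibility phonetic query against phonetic
# hypotheses, where "approximated" is modeled as edit distance within a
# tolerance.

def _dist(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def query_matches_lattice(possibilities, hypotheses, tolerance=1):
    """True if any phonetic possibility is approximated by any hypothesis."""
    return any(_dist(p, h) <= tolerance
               for p in possibilities for h in hypotheses)
```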
  • Publication number: 20080167873
    Abstract: A method for pronouncing English alphas according to indications at different orientations of the alpha, comprising the steps of: dividing the area around an alpha into six sections; indicating short sounds, long sounds and strong sounds by points, lines and slashes; placing a small piece of line (at different angles) or a point on an alpha to indicate that it is pronounced with the pronunciation of another alpha; using underlines to indicate long and short sounds of the phonetic symbols of a set of double alphas; using a delete line to indicate that the alpha is not pronounced; using a space area to divide the syllables of a word; using a vertical cut line to indicate that one alpha is pronounced with two sounds; indicating an original-sound line at the upper side of the first stroke to represent that the alpha is pronounced with an original sound; and placing a “?” under a double-alpha set to represent that the alpha is pronounced with a reverse sound.
    Type: Application
    Filed: January 8, 2007
    Publication date: July 10, 2008
    Inventor: Wei-Chou Su
  • Patent number: 7392178
    Abstract: The present invention is a preprocessing apparatus including a voice input apparatus for acquiring an uttered voice, an analog-digital conversion apparatus for converting the acquired uttered voice to digital voice data, and a comparator for selecting voice data, having a level which is equal to or higher than a certain level, from the digital voice data and for outputting the selected voice data. The preprocessing apparatus also includes a voice data cutout apparatus capable of cutting out voice data having a level which is equal to or higher than a certain level output from the comparator, while taking a phoneme as a unit, and a voice data output apparatus for outputting voice data of the phoneme unit output from the voice data cutout apparatus.
    Type: Grant
    Filed: February 26, 2003
    Date of Patent: June 24, 2008
    Assignees: Electronic Navigation Research Institute, An Independent Administration Institution, Mitsubishi Space Software Co., Ltd.
    Inventors: Kakuichi Shiomi, Naritomo Meguro, Tomoya Maruyama
  • Patent number: 7392189
    Abstract: A speech recognition system for processing voice inputs from a user to select a list element from a list or group of list elements. Recognition procedures are carried out on the voice input of the user. One recognition procedure separates the voice input of a whole word into at least one sequence of speech subunits to produce a vocabulary of list elements. Another recognition procedure compares the voice input of the whole word with the vocabulary of list elements.
    Type: Grant
    Filed: February 21, 2003
    Date of Patent: June 24, 2008
    Assignee: Harman Becker Automotive Systems GmbH
    Inventors: Marcus Hennecke, Walter Koch, Gerhard Nüβle, Richard Reng
  • Publication number: 20080147403
    Abstract: A method, system and article of manufacture for recognizing a voice command. One embodiment of the invention comprises: receiving a voice input; determining a number of sound fragments to be processed in a first set of sound fragments; determining whether the first set of sound fragments of the voice input matches the first set of sound fragments of a voice command; and, if the first set matches, determining whether one or more remaining sound fragments match one or more remaining sound fragments of the voice command.
    Type: Application
    Filed: March 3, 2008
    Publication date: June 19, 2008
    Inventors: Joseph Herbert McIntyre, Victor S. Moore
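The two-stage matching in this abstract can be sketched as follows (a hedged illustration; the fragment representation and names are assumptions, not taken from the patent):

```python
def matches_command(fragments, command, first_n):
    """Stage 1: compare the first first_n sound fragments of the
    voice input against the command; stage 2: only on a match,
    compare the remaining fragments."""
    if fragments[:first_n] != command[:first_n]:
        return False  # early rejection without processing the rest
    return fragments[first_n:] == command[first_n:]
```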
  • Publication number: 20080133239
    Abstract: Provided are an apparatus and method for recognizing continuous speech using search space restriction based on phoneme recognition. In the apparatus and method, a search space can be primarily reduced by restricting connection words to be shifted at a boundary between words based on the phoneme recognition result. In addition, the search space can be secondarily reduced by rapidly calculating a degree of similarity between the connection word to be shifted and the phoneme recognition result using a phoneme code and shifting the corresponding phonemes to only connection words having degrees of similarity equal to or higher than a predetermined reference value. Therefore, the speed and performance of the speech recognition process can be improved in various speech recognition services.
    Type: Application
    Filed: December 4, 2007
    Publication date: June 5, 2008
    Inventors: Hyung Bae Jeon, Jun Park, Seung Hi Kim, Kyu Woong Hwang
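As a rough illustration of the secondary search-space reduction described above (the similarity measure here is a simple positional match ratio, an assumption standing in for the patent's phoneme-code calculation):

```python
def similarity(code_a, code_b):
    """Fraction of phoneme codes that agree at aligned positions,
    normalized by the longer code."""
    if not code_a or not code_b:
        return 0.0
    same = sum(a == b for a, b in zip(code_a, code_b))
    return same / max(len(code_a), len(code_b))

def restrict_connection_words(candidates, recognized, reference):
    """Keep only connection words whose similarity to the phoneme
    recognition result is equal to or higher than the reference value."""
    return [w for w in candidates if similarity(w, recognized) >= reference]
```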
  • Publication number: 20080126093
    Abstract: An apparatus for providing a language based interactive multimedia system includes a selection element, a comparison element and a processing element. The selection element may be configured to select a phoneme graph based on a type of speech processing associated with an input sequence of phonemes. The comparison element may be configured to compare the input sequence of phonemes to the selected phoneme graph. The processing element may be in communication with the comparison element and configured to process the input sequence of phonemes based on the comparison.
    Type: Application
    Filed: November 28, 2006
    Publication date: May 29, 2008
    Inventor: Sunil Sivadas
  • Publication number: 20080120108
    Abstract: Speech recognition on a tonal language is performed using a plurality of tonal models. Each tonal model has a multi-space distribution and corresponds to a known syllable in the language. A first data stream indicative of an observation of an utterance is received. The observation has both a discrete and a continuous tonal feature. A second data stream indicative of spectral features of a syllable of the utterance is also received. The first data stream is compared against at least one of the plurality of tonal models, and the second data stream is compared against a spectral model.
    Type: Application
    Filed: November 16, 2006
    Publication date: May 22, 2008
    Inventors: Frank Kao-Ping Soong, Yao Qian
  • Patent number: 7376648
    Abstract: A computer-implemented method for selecting a desired Roman or non-Roman-alphabet character or object from a set of non-Roman-alphabet characters or objects may include steps of: providing an association database that includes, for each non-Roman-alphabet character of the set, a Roman-alphabet or other phonetic transliteration associated with that character and a plurality of entries associated with that character; receiving a phonetic transliteration of the desired non-Roman-alphabet character or data object and at least one associated entry that is associated with the desired non-Roman-alphabet character or other similar symbolic input; accessing the association database and identifying as candidate characters those characters of the set that are associated with the received phonetic transliteration and with the at least one received associated entry; and, if the number of candidate characters is greater than one, receiving additional associated entries and repeating
    Type: Grant
    Filed: October 20, 2004
    Date of Patent: May 20, 2008
    Assignee: Oracle International Corporation
    Inventor: Richard C. Johnson
  • Publication number: 20080114598
    Abstract: A motor vehicle has a speech interface for acoustic input of commands for operating the motor vehicle or a module of the motor vehicle. The speech interface includes a speech recognition database in which a substantial portion of the commands or command components that can be input is stored both in a version according to a pronunciation in a first language and in a version according to a pronunciation in at least a second language, and a speech recognition engine for automatically comparing an acoustic command to the commands and/or command components stored in the speech recognition database in the version according to the pronunciation in the first language and to those stored in the version according to the pronunciation in the second language.
    Type: Application
    Filed: November 9, 2006
    Publication date: May 15, 2008
    Applicant: Volkswagen of America, Inc.
    Inventors: Ramon Prieto, M. Kashif Imam, Carsten Bergmann, Wai Yin Cheung, Carly Williams
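A toy version of the dual-language lookup (the database structure mapping each command to its stored pronunciation versions is an assumption for illustration):

```python
def lookup_command(acoustic, db):
    """Compare an acoustic command against stored pronunciation
    versions in both languages; return the matched command, or
    None if no stored version matches."""
    for command, pronunciations in db.items():
        if acoustic in pronunciations:
            return command
    return None
```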
  • Patent number: 7369993
    Abstract: A system and method of recognizing speech comprises an audio-receiving element and a computer server, which together perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are likewise converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from the center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: December 29, 2006
    Date of Patent: May 6, 2008
    Assignee: AT&T Corp.
    Inventor: Bishnu Saroop Atal
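The radial-distance comparison can be sketched like this (a simplification: the SVD transform is omitted, vectors are plain lists, and comparing only radii from the center is the assumption the sketch makes explicit):

```python
import math

def radius(vec):
    """Distance from the hypersphere's center (taken as the origin)."""
    return math.sqrt(sum(x * x for x in vec))

def closest_stored(received, stored):
    """Pick the stored phoneme whose radial distance best matches
    that of the received phoneme; stored is a list of
    (name, vector) pairs."""
    r = radius(received)
    return min(stored, key=lambda pair: abs(radius(pair[1]) - r))[0]
```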
  • Publication number: 20080103774
    Abstract: A method of and a system for processing speech. A spoken utterance of a plurality of characters can be received. A plurality of known character sequences that potentially correspond to the spoken utterance can be selected. Each selected known character sequence can be scored based, at least in part, on a weighting of the individual characters that comprise the known character sequence.
    Type: Application
    Filed: October 30, 2006
    Publication date: May 1, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Kenneth D. White
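One plausible form of the per-character weighted scoring (the exact scoring function is not given in the abstract; this additive version is an assumption):

```python
def score_sequence(candidate, spoken, weights):
    """Score a known character sequence against the spoken characters:
    each position that matches contributes that character's weight
    (default weight 1.0)."""
    return sum(weights.get(c, 1.0)
               for c, s in zip(candidate, spoken) if c == s)
```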
  • Publication number: 20080091428
    Abstract: The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, identifying which units are distinctive enough to keep and which are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g. words or characters expressed as Unicode strings) are mapped onto the feature space and clustered in that space using a suitable similarity measure. Since all units in a given cluster are, by construction, closely related from the point of view of the measure used, they are suitably redundant and can be replaced by a single instance. The disclosed method can detect near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy.
    Type: Application
    Filed: October 10, 2006
    Publication date: April 17, 2008
    Inventor: Jerome R. Bellegarda
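The pruning-as-clustering idea can be sketched with a greedy pass (featurize and similar are placeholders standing in for the patent's feature extraction and similarity measure):

```python
def prune_units(units, featurize, similar):
    """Keep one representative per cluster: a unit instance is
    discarded as near-redundant when it is similar, in feature
    space, to an instance already kept."""
    kept = []
    for unit in units:
        feat = featurize(unit)
        if not any(similar(feat, featurize(k)) for k in kept):
            kept.append(unit)
    return kept
```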
  • Publication number: 20080091427
    Abstract: Systems and methods are provided for compressing data models, for example, N-gram language models used in speech recognition applications. Words in the vocabulary of the language model are assigned to classes of words, for example, by syntactic criteria, semantic criteria, or statistical analysis of an existing language model. After word classes are defined, the follower lists for words in the vocabulary may be stored as hierarchical sets of class indexes and word indexes within each class. Hierarchical word indexes may reduce the storage requirements for the N-gram language model by more efficiently representing multiple words in a single list in the same follower list.
    Type: Application
    Filed: October 11, 2006
    Publication date: April 17, 2008
    Applicant: Nokia Corporation
    Inventor: Jesper Olsen
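A sketch of the hierarchical follower-list layout the abstract describes (the mapping from a word to its (class index, word index within class) pair is assumed to exist already):

```python
def compress_followers(followers, word_index):
    """Store a follower list as {class index: [word indexes within
    that class]} instead of one flat list of words, so words sharing
    a class share a single class entry."""
    compressed = {}
    for word in followers:
        cls, idx = word_index[word]
        compressed.setdefault(cls, []).append(idx)
    return compressed
```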
  • Publication number: 20080082337
    Abstract: A method and system for obtaining a pool of speech syllable models. The model pool is generated by first detecting a training segment using unsupervised speech segmentation or speech unit spotting. If the model pool is empty, a first speech syllable model is trained and added to the pool. If the model pool is not empty, the existing model that best matches the training segment is determined and scored against the segment. If the score is less than a predefined threshold, a new model is created for the training segment and added to the pool. If the score equals or exceeds the threshold, the training segment is used to improve or re-estimate the existing model.
    Type: Application
    Filed: September 21, 2007
    Publication date: April 3, 2008
    Applicant: HONDA RESEARCH INSTITUTE EUROPE GMBH
    Inventors: Frank Joublin, Holger Brandl
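The pool-update logic above reads as simple control flow; it is sketched below with the unspecified subroutines (train, best_match, score) passed in as callables and a model represented as the list of segments it was built from (all assumptions for illustration):

```python
def update_pool(pool, segment, train, best_match, score, threshold):
    """Add the first model if the pool is empty; otherwise score the
    best-matching existing model and either create a new model
    (score below threshold) or refine the existing one with the
    training segment (score at or above threshold)."""
    if not pool:
        pool.append(train(segment))
    else:
        model = best_match(pool, segment)
        if score(model, segment) < threshold:
            pool.append(train(segment))
        else:
            model.append(segment)  # stand-in for re-estimating the model
    return pool
```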
  • Publication number: 20080082336
    Abstract: Included are embodiments for providing speech analysis. At least one embodiment of a method includes receiving audio data associated with a communication, determining at least one phoneme from the audio data, and providing the at least one phoneme in a phonetic transcript, the phonetic transcript including at least one character from a phonetic alphabet.
    Type: Application
    Filed: September 29, 2006
    Publication date: April 3, 2008
    Inventors: Gary Duke, Joseph Watson
  • Publication number: 20080082335
    Abstract: A method and a system for automatically converting alphabetic words into a plurality of independent spellings. The method can include parsing textual input to identify at least one word and converting the word into a first word object having a first spelling including letter objects. The method also can include converting the word into a second word object having a second spelling including phonetic objects, each of the phonetic objects correlating to at least one of the letter objects. Further, the first word object and the second word object can be presented in a visual field such that each of the phonetic objects is visually associated with the letter object to which it correlates.
    Type: Application
    Filed: September 28, 2006
    Publication date: April 3, 2008
    Inventor: Howard Engelsen
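The side-by-side presentation of the two spellings can be illustrated minimally (assuming a precomputed one-to-one letter-to-phoneme correspondence, which the patent's actual correlation step would relax):

```python
def align_spellings(letters, phonetics):
    """Pair each letter object with the phonetic object it
    correlates to, for presentation in a shared visual field."""
    return list(zip(letters, phonetics))
```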
  • Patent number: 7353174
    Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7353173
    Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw