Creation Of Reference Templates; Training Of Speech Recognition Systems, E.g., Adaptation To The Characteristics Of The Speaker's Voice, Etc. (EPO) Patents (Class 704/E15.007)
  • Publication number: 20100312556
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for speaker recognition personalization. The method recognizes speech received from a speaker interacting with a speech interface using a set of allocated resources, the set of allocated resources including bandwidth, processor time, memory, and storage. The method records metrics associated with the recognized speech, and after recording the metrics, modifies at least one of the allocated resources in the set of allocated resources commensurate with the recorded metrics. The method recognizes additional speech from the speaker using the modified set of allocated resources. Metrics can include a speech recognition confidence score, processing speed, dialog behavior, requests for repeats, negative responses to confirmations, and task completions.
    Type: Application
    Filed: June 9, 2009
    Publication date: December 9, 2010
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Alistair D. CONKIE, Ann K. SYRDAL
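    A minimal sketch of the adaptation loop described in 20100312556, assuming a hypothetical policy (thresholds, scaling factors, and field names are illustrative, not from the patent): record metrics for a session, then modify the allocated resources commensurate with them.

```python
# Hypothetical sketch: adjust allocated resources from recorded recognition metrics.
from dataclasses import dataclass

@dataclass
class Resources:
    bandwidth_kbps: int = 64
    cpu_ms_per_utterance: int = 200
    memory_mb: int = 128
    storage_mb: int = 512

@dataclass
class Metrics:
    confidence: float          # mean recognition confidence score, 0..1
    repeat_requests: int       # "please repeat" events in the session
    task_completed: bool

def adapt_resources(res: Resources, m: Metrics) -> Resources:
    """Scale resources commensurate with the recorded metrics (assumed policy)."""
    if m.confidence < 0.6 or m.repeat_requests > 2 or not m.task_completed:
        # Struggling speaker: grant more compute and model memory.
        res.cpu_ms_per_utterance = int(res.cpu_ms_per_utterance * 1.5)
        res.memory_mb = int(res.memory_mb * 1.25)
    elif m.confidence > 0.9 and m.task_completed:
        # Easy speaker: reclaim resources for other sessions.
        res.cpu_ms_per_utterance = max(100, int(res.cpu_ms_per_utterance * 0.8))
    return res

print(adapt_resources(Resources(), Metrics(confidence=0.5, repeat_requests=3, task_completed=False)))
```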
  • Publication number: 20100250235
    Abstract: In one example, a phrase analyzer may analyze a text input stream to identify phrases contained in the stream. The phrase analyzer may receive a specification, which includes dictionaries of phrases and synonyms, and a specification of the phrases or sequences of phrases to be matched. The phrase analyzer may compare the input stream to the specification and may produce, as output, an identification of which phrases appear in the input stream and where in the input stream those phrases occur.
    Type: Application
    Filed: March 24, 2009
    Publication date: September 30, 2010
    Applicant: MICROSOFT CORPORATION
    Inventor: Umesh Madan
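    A toy sketch of the phrase analyzer from 20100250235, assuming a hypothetical specification format (the `PHRASES` and `SYNONYMS` dictionaries and their labels are illustrative): expand synonyms, then report each matched phrase and where it occurs.

```python
# Hypothetical sketch: match a specification of phrases and synonyms against an input stream.
PHRASES = {"credit card": "PAYMENT", "gift card": "PAYMENT", "order status": "STATUS"}
SYNONYMS = {"cc": "credit card", "voucher": "gift card"}

def analyze(text: str):
    tokens = text.lower().split()
    # Expand synonyms first, then scan the normalized stream for phrases.
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    stream = " ".join(tokens)
    hits = []
    for phrase, label in PHRASES.items():
        start = stream.find(phrase)
        while start != -1:
            hits.append((label, phrase, start))   # which phrase, and where
            start = stream.find(phrase, start + 1)
    return hits

print(analyze("I paid with my cc and checked order status"))
```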
  • Publication number: 20100250251
    Abstract: Architecture that suppresses the unexpected appearance of words by applying appropriate restrictions to long-term and short-term memory. Quick adaptation is also realized by leveraging these restrictions. The architecture includes a history component for processing user input history for conversion of a phonetic string by a conversion process that outputs conversion results, and an adaptation component for adapting the conversion process to the user input history based on restrictions applied to short-term memory that impact word appearances during the conversion process. The architecture performs probability boosting based on context-dependent probability differences (short-term memory), and dynamic linear interpolation between the long-term memory and baseline language models based on the frequency of a word's preceding context (long-term memory).
    Type: Application
    Filed: March 30, 2009
    Publication date: September 30, 2010
    Applicant: Microsoft Corporation
    Inventors: Katsutoshi Ohtsuki, Takashi Umeoka
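    The long-term-memory half of 20100250251 lends itself to a worked sketch. This assumes a simple interpolation weight that grows with how often the preceding context has been observed (the `capacity` knob and count model are illustrative, not from the patent):

```python
# Hypothetical sketch: dynamic linear interpolation between a user-history
# ("long-term memory") model and the baseline language model, weighted by
# the frequency of the word's preceding context.
from collections import defaultdict

history_counts = defaultdict(int)   # (context, word) -> count
context_counts = defaultdict(int)   # context -> count

def observe(context: str, word: str) -> None:
    history_counts[(context, word)] += 1
    context_counts[context] += 1

def p_adapted(word: str, context: str, p_baseline: float, capacity: int = 20) -> float:
    n = context_counts[context]
    lam = min(1.0, n / capacity)    # more evidence -> trust the user history more
    p_history = history_counts[(context, word)] / n if n else 0.0
    return lam * p_history + (1.0 - lam) * p_baseline

observe("watashi no", "namae")
observe("watashi no", "namae")
print(p_adapted("namae", "watashi no", p_baseline=0.01))   # -> 0.109
```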
  • Publication number: 20100241430
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Application
    Filed: June 3, 2010
    Publication date: September 23, 2010
    Applicant: AT&T Intellectual Property II, L.P., via transfer from AT&T Corp.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark
  • Publication number: 20100204990
    Abstract: A speech analyzer includes a vocal tract and sound source separating unit which separates a vocal tract feature and a sound source feature from an input speech, based on a speech generation model, a fundamental frequency stability calculating unit which calculates a temporal stability of a fundamental frequency of the input speech in the sound source feature, from the separated sound source feature, a stable analyzed period extracting unit which extracts time information of a stable period, based on the temporal stability, and a vocal tract feature interpolation unit which interpolates a vocal tract feature which is not included in the stable period, using a vocal tract feature included in the extracted stable period.
    Type: Application
    Filed: May 3, 2010
    Publication date: August 12, 2010
    Inventors: Yoshifumi Hirose, Takahiro Kamai
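    A rough sketch of the stability-and-interpolation idea in 20100204990, under stated assumptions: "temporal stability" is approximated here as low local standard deviation of F0, and unstable frames get their vocal tract features linearly interpolated from the surrounding stable frames. Both choices are illustrative stand-ins for the patent's units.

```python
# Hypothetical sketch: flag frames with temporally stable F0, then interpolate
# vocal tract features across the unstable gaps using only stable frames.
import numpy as np

def stable_mask(f0: np.ndarray, win: int = 1, tol: float = 5.0) -> np.ndarray:
    """A frame is 'stable' if local F0 standard deviation stays under tol (Hz)."""
    mask = np.zeros(len(f0), dtype=bool)
    for i in range(len(f0)):
        seg = f0[max(0, i - win): i + win + 1]
        mask[i] = seg.std() < tol and f0[i] > 0
    return mask

def interpolate_features(feat: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Replace features of unstable frames by interpolating between stable ones."""
    if not mask.any():
        return feat.copy()          # no stable frames to interpolate from
    idx = np.arange(len(feat))
    out = feat.copy()
    for d in range(feat.shape[1]):
        out[~mask, d] = np.interp(idx[~mask], idx[mask], feat[mask, d])
    return out

f0 = np.array([120, 121, 119, 300, 122, 120], dtype=float)   # one octave error
feat = np.random.rand(6, 3)                                   # toy vocal tract features
print(interpolate_features(feat, stable_mask(f0)))
```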
  • Publication number: 20100169093
    Abstract: An information processing apparatus for speech recognition includes a first speech dataset storing speech data uttered by low recognition rate speakers; a second speech dataset storing speech data uttered by a plurality of speakers; a third speech dataset storing speech data to be mixed with the speech data of the second speech dataset; a similarity calculating part obtaining, for each piece of the speech data in the second speech dataset, a degree of similarity to a given average voice in the first speech dataset; a speech data selecting part recording the speech data, the degree of similarity of which is within a given selection range, as selected speech data in the third speech dataset; and an acoustic model generating part generating a first acoustic model using the speech data recorded in the second speech dataset and the third speech dataset.
    Type: Application
    Filed: December 22, 2009
    Publication date: July 1, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Nobuyuki Washio
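    A minimal sketch of the data-selection step in 20100169093, assuming cosine similarity over utterance embeddings and an illustrative selection range; the patent does not specify the similarity measure used here.

```python
# Hypothetical sketch: keep only speech data whose similarity to the average
# voice of low-recognition-rate speakers falls inside a selection range.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def select(second_dataset: list[np.ndarray], average_voice: np.ndarray,
           low: float = 0.7, high: float = 0.95) -> list[np.ndarray]:
    """Return data similar enough to the target average voice to help,
    but not so close that it adds no new variation (assumed rationale)."""
    return [x for x in second_dataset if low <= cosine(x, average_voice) <= high]

avg = np.array([1.0, 0.0, 0.5])
pool = [np.array([0.8, 0.5, 0.2]), np.array([-1.0, 0.2, 0.0])]
print(len(select(pool, avg)))   # -> 1
```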
  • Publication number: 20100169094
    Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating what type of the phoneme or the word is included in a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-to-phoneme basis or a word-to-word basis, and the decision trees being configured to reply to questions relating to the feature value and output likelihoods in the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker, the decision trees being adapted using speaker adaptation data vocalized by the speaker of an input speech.
    Type: Application
    Filed: September 17, 2009
    Publication date: July 1, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
  • Publication number: 20100161327
    Abstract: A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
    Type: Application
    Filed: December 16, 2009
    Publication date: June 24, 2010
    Inventors: Nishant CHANDRA, Reiner Wilhelms-Tricarico, Rattima Nitisaroj, Brian Mottershead, Gary A. Marple, John B. Reichenbach
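    One common way to obtain a per-sample "phase state" of an acoustic wave is via the analytic signal; the sketch below uses that as an assumed stand-in for the patent's phase-state metrics (20100161327 does not say it uses the Hilbert transform) and reads amplitude and instantaneous frequency at fixed intervals with reference to the phase.

```python
# Hypothetical sketch: phase state via the analytic signal, then acoustic
# parameters sampled at predetermined time intervals.
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(0, 0.05, 1 / fs)
x = 0.6 * np.sin(2 * np.pi * 150 * t)            # toy voiced segment at 150 Hz

analytic = hilbert(x)
phase = np.unwrap(np.angle(analytic))            # phase state of the waveform
amplitude = np.abs(analytic)                     # instantaneous amplitude
inst_freq = np.diff(phase) * fs / (2 * np.pi)    # instantaneous frequency (Hz)

step = fs // 100                                 # sample every 10 ms
for i in range(0, len(inst_freq), step):
    print(f"{t[i]*1000:5.1f} ms  phase={phase[i]:6.2f} rad  "
          f"amp={amplitude[i]:.2f}  f={inst_freq[i]:6.1f} Hz")
```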
  • Publication number: 20100153108
    Abstract: The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from phone conversations of a user by a voice learning processor to generate a personalized voice font (PVF), and delivering the PVF to a server.
    Type: Application
    Filed: February 10, 2009
    Publication date: June 17, 2010
    Inventors: Zsolt Szalai, Philippe Bazot, Bernard Pucci, Joel Viale
  • Publication number: 20100138222
    Abstract: A method for adapting a codebook for speech recognition is disclosed, wherein the codebook is from a set of codebooks comprising a speaker-independent codebook and at least one speaker-dependent codebook. A speech input is received and a feature vector based on the received speech input is determined. For each of the Gaussian densities, a first mean vector is estimated using an expectation process that takes into account the determined feature vector. For each of the Gaussian densities, a second mean vector is determined using an Eigenvoice adaptation, again taking into account the determined feature vector. For each of the Gaussian densities, the mean vector is then set to a convex combination of the first and second mean vectors. This process thus allows adaptation during operation and does not require a lengthy training phase.
    Type: Application
    Filed: November 20, 2009
    Publication date: June 3, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Franz Gerl
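    The combination step in 20100138222 maps directly to a one-liner. In this sketch the convex weight grows with the number of observations (the count-based weighting and the `tau` constant are assumptions; the patent only requires a convex combination):

```python
# Hypothetical sketch: per-Gaussian adapted mean as a convex combination of
# the EM estimate and the Eigenvoice estimate.
import numpy as np

def combine_means(mu_em: np.ndarray, mu_ev: np.ndarray, n_obs: float,
                  tau: float = 10.0) -> np.ndarray:
    """With little data, lean on the constrained Eigenvoice estimate; with
    more data, trust the expectation-process estimate (assumed weighting)."""
    alpha = n_obs / (n_obs + tau)
    return alpha * mu_em + (1.0 - alpha) * mu_ev

mu_em = np.array([1.2, -0.3])   # first mean vector (expectation process)
mu_ev = np.array([1.0, 0.0])    # second mean vector (Eigenvoice adaptation)
print(combine_means(mu_em, mu_ev, n_obs=5.0))   # lies between the two
```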
  • Publication number: 20100131272
    Abstract: Apparatuses and methods for generating and verifying a voice signature of a message and computer readable medium thereof are provided. The generation and verification ends both use the same set of pronounceable symbols. The set of pronounceable symbols comprises a plurality of pronounceable units, and each of the pronounceable units comprises an index and a pronounceable symbol. The generation end converts the message into a message digest by a hash function and generates a plurality of designated pronounceable symbols according to the message digest. A user utters the designated pronounceable symbols to generate the voice signature. After receiving the message and the voice signature, the verification end performs voice authentication to determine a user identity of the voice signature, performs speech recognition to determine the relation between the message and the voice signature, and determines whether the user generates the voice signature for the message.
    Type: Application
    Filed: January 6, 2009
    Publication date: May 27, 2010
    Applicant: INSTITUTE FOR INFORMATION INDUSTRY
    Inventor: Jui-Ming WU
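    A compact sketch of the generation end of 20100131272, assuming SHA-256 as the hash function and a toy symbol table (the patent specifies neither): digest bytes index into the shared set of pronounceable units, and the user utters the resulting symbols to form the voice signature.

```python
# Hypothetical sketch: message -> digest -> designated pronounceable symbols.
import hashlib

# Shared set of pronounceable units: index -> pronounceable symbol.
SYMBOLS = ["ba", "ko", "mi", "ren", "su", "ta", "vo", "zu"]

def designated_symbols(message: str, n: int = 6) -> list[str]:
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return [SYMBOLS[b % len(SYMBOLS)] for b in digest[:n]]

print(designated_symbols("transfer $100 to Alice"))
# The verification end recomputes this list from the received message and
# checks, via speech recognition, that the uttered signature matches it.
```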
  • Publication number: 20100121640
    Abstract: The present invention relates to a method for modeling common-language speech recognition, by a computer, under the influence of multiple dialects, and concerns the technical field of speech recognition by a computer. In this method, a triphone standard common-language model is first generated based on training data of the standard common language, and first and second monophone dialectal-accented common-language models are generated based on development data of dialectal-accented common language of the first kind and the second kind, respectively. A temporary merged model is then obtained by merging the first dialectal-accented common-language model into the standard common-language model according to a first confusion matrix, obtained by recognizing the development data of the first dialectal-accented common language using the standard common-language model.
    Type: Application
    Filed: October 29, 2009
    Publication date: May 13, 2010
    Applicants: SONY COMPUTER ENTERTAINMENT INC., TSINGHUA UNIVERSITY
    Inventors: Fang Zheng, Xi Xiao, Linquan Liu, Zhan You, Wenxiao Cao, Makoto Akabane, Ruxin Chen, Yoshikazu Takahashi
  • Publication number: 20100121638
    Abstract: Speech recognition is performed in near-real-time and improved by exploiting events and event sequences, employing machine learning techniques including boosted classifiers, ensembles, detectors and cascades and using perceptual clusters. Speech recognition is also improved using tandem processing. An automatic punctuator injects punctuation into recognized text streams.
    Type: Application
    Filed: November 11, 2009
    Publication date: May 13, 2010
    Inventors: Mark PINSON, David Pinson, SR., Mary Flanagan, Shahrokh Makanvand
  • Publication number: 20100106501
    Abstract: Updating a voice template for recognizing a speaker on the basis of a voice uttered by the speaker is disclosed. Stored voice templates indicate distinctive characteristics of utterances from speakers. Distinctive characteristics are extracted for a specific speaker based on a voice message utterance received from that speaker. The distinctive characteristics are compared to the characteristics indicated by the stored voice templates to select a template that matches within a predetermined threshold. The selected template is updated on the basis of the extracted characteristics.
    Type: Application
    Filed: October 27, 2009
    Publication date: April 29, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yukari Miki, Masami Noguchi
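    A small sketch of the select-and-update flow in 20100106501, under stated assumptions: Euclidean distance as the matching measure and a running-average update rule, neither of which the abstract specifies.

```python
# Hypothetical sketch: pick the closest stored voice template; if it matches
# within a threshold, nudge it toward the new utterance's characteristics.
import numpy as np

def update_templates(templates: dict[str, np.ndarray], feats: np.ndarray,
                     threshold: float = 1.0, rate: float = 0.1) -> str | None:
    """Return the id of the updated template, or None if nothing matched."""
    best_id, best_dist = None, float("inf")
    for tid, tmpl in templates.items():
        d = float(np.linalg.norm(tmpl - feats))
        if d < best_dist:
            best_id, best_dist = tid, d
    if best_id is not None and best_dist <= threshold:
        templates[best_id] = (1 - rate) * templates[best_id] + rate * feats
        return best_id
    return None

db = {"alice": np.array([0.2, 1.0]), "bob": np.array([3.0, -1.0])}
print(update_templates(db, np.array([0.3, 0.9])))   # -> 'alice'
```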
  • Publication number: 20100100380
    Abstract: A system, method and computer-readable medium provide a multitask learning method for intent or call-type classification in a spoken language understanding system. Multitask learning aims at training tasks in parallel while using a shared representation. A computing device automatically re-uses the existing labeled data from various applications, which are similar but may have different call-types, intents or intent distributions to improve the performance. An automated intent mapping algorithm operates across applications. In one aspect, active learning is employed to selectively sample the data to be re-used.
    Type: Application
    Filed: December 28, 2009
    Publication date: April 22, 2010
    Applicant: AT&T Corp.
    Inventor: Gokhan Tur
  • Publication number: 20100076765
    Abstract: Described is a technology by which a structured model of repetition is used to determine the words spoken by a user, and/or a corresponding database entry, based in part on a prior utterance. For a repeated utterance, a joint probability analysis is performed on (at least some of) the corresponding word sequences, as recognized by one or more recognizers, and the associated acoustic data. For example, a generative probabilistic model or a maximum entropy model may be used in the analysis. The second utterance may be a repetition of the first utterance using the exact words, or another structural transformation thereof relative to the first utterance, such as an extension that adds one or more words, a truncation that removes one or more words, or a whole or partial spelling of one or more words.
    Type: Application
    Filed: September 19, 2008
    Publication date: March 25, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Geoffrey G. Zweig, Xiao Li, Dan Bohus, Alejandro Acero, Eric J. Horvitz
  • Publication number: 20100076968
    Abstract: Implementations relate to systems and methods for aggregating and presenting data related to geographic locations. Geotag data related to geographic locations and associated features or attributes can be collected to build a regional profile characterizing a set of locations within the region. Geotag data related to the constituent locations, such as user ratings or popularity ranks for restaurants, shops, parks, or other features, sites, or attractions, can be combined to generate a profile of characteristics of locations in the region. The platform can generate recommendations of locations to transmit to the user of a mobile device, based for instance on the location of the device in the region as reported by GPS or other location service and the regional profile. Geotag data can include audio data analyzed using region-specific terms, and user recommendations can be presented via dynamic menus based on regional profiles, user preferences or other criteria.
    Type: Application
    Filed: May 21, 2009
    Publication date: March 25, 2010
    Inventors: Mark R. BOYNS, Chand MEHTA, Jeffrey C. TSAY, Giridhar D. MANDYAM
  • Publication number: 20100063817
    Abstract: An acoustic model registration apparatus, a talker recognition apparatus, an acoustic model registration method, and an acoustic model registration processing program are provided, each of which reliably prevents an acoustic model having a low talker-recognition capability from being registered.
    Type: Application
    Filed: March 14, 2007
    Publication date: March 11, 2010
    Applicant: Pioneer Corporation
    Inventors: Soichi Toyama, Ikuo Fujita, Yukio Kamoshida
  • Publication number: 20100057461
    Abstract: In a method and a system (20) for creating or updating entries in a speech recognition (SR) lexicon (7) of a speech recognition system, the entries map SR phoneme sequences to words. The method comprises entering a respective word and, in the case that the word is a new word to be added to the SR lexicon, also entering at least one associated SR phoneme sequence through input means (26). The SR phoneme sequence associated with the respective word is converted into speech by phoneme-to-speech conversion means (4.4), and the speech is played back by playback means (28), to control the match between the phoneme sequence and the word.
    Type: Application
    Filed: February 4, 2008
    Publication date: March 4, 2010
    Inventors: Andreas Neubacher, Gerhard Grobauer
  • Publication number: 20100049519
    Abstract: A system and a method are provided. A speech recognition processor receives unconstrained input speech and outputs a string of words. The speech recognition processor is based on a numeric language that represents a subset of a vocabulary. The subset includes a set of words identified as being for interpreting and understanding number strings. A numeric understanding processor contains classes of rules for converting the string of words into a sequence of digits. The speech recognition processor utilizes an acoustic model database. A validation database stores a set of valid sequences of digits. A string validation processor outputs validity information based on a comparison of a sequence of digits output by the numeric understanding processor with valid sequences of digits in the validation database.
    Type: Application
    Filed: November 5, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Mazin G. Rahim, Giuseppe Riccardi, Jeremy Huntley Wright, Bruce Melvin Buntschuh, Allen Louis Gorin
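    A toy sketch of the downstream stages of 20100049519, assuming an illustrative word-to-digit table, one sample conversion rule ("double"), and a stand-in validation set; the patent's numeric language and rule classes are richer than this.

```python
# Hypothetical sketch: word string -> digit sequence -> validity check.
WORD_TO_DIGIT = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
                 "four": "4", "five": "5", "six": "6", "seven": "7",
                 "eight": "8", "nine": "9"}

def words_to_digits(words: str) -> str:
    digits, repeat = [], 1
    for w in words.lower().split():
        if w == "double":                # one example of a conversion rule
            repeat = 2
        elif w in WORD_TO_DIGIT:
            digits.append(WORD_TO_DIGIT[w] * repeat)
            repeat = 1
    return "".join(digits)

VALID = {"40852", "18000"}               # stand-in for the validation database

seq = words_to_digits("four oh eight five two")
print(seq, "valid" if seq in VALID else "invalid")   # -> 40852 valid
```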
  • Publication number: 20100036661
    Abstract: A computing system, comprising: an I/O platform for interfacing with a user; and a processing entity configured to implement a dialog with the user via the I/O platform. The processing entity is further configured for: identifying a grammar template and an instantiation context associated with a current point in the dialog; causing creation of an instantiated grammar model from the grammar template and the instantiation context; storing the instantiated grammar model in a memory; and interpreting user input received via the I/O platform in accordance with the instantiated grammar model. Also, a grammar authoring environment supporting a variety of grammar development tools is disclosed.
    Type: Application
    Filed: July 15, 2009
    Publication date: February 11, 2010
    Applicant: NU ECHO INC.
    Inventors: Dominique Boucher, Yves Normandin
  • Publication number: 20090326952
    Abstract: [Problem] To convert a non-audible murmur signal obtained through an in-vivo conduction microphone into a speech signal that a receiving person can recognize with maximum accuracy (and is unlikely to misrecognize).
    Type: Application
    Filed: February 7, 2007
    Publication date: December 31, 2009
    Applicant: NATIONAL UNIVERSITY CORPORATION NARA INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Tomoki Toda, Mikihiro Nakagiri, Hideki Kashioka, Kiyohiro Shikano
  • Publication number: 20090265166
    Abstract: A boundary estimation apparatus includes a first boundary estimation unit which estimates a first boundary separating a speech into first meaning units, a second boundary estimation unit configured to estimate a second boundary separating a related speech into second meaning units related to the first meaning units, a pattern generating unit configured to generate a representative pattern showing a representative characteristic in an analysis interval, and a similarity calculation unit configured to calculate a similarity between the representative pattern and a characteristic pattern showing a feature in a calculation interval of the related speech. The second boundary estimation unit estimates the second boundary from the calculation intervals in which the similarity is higher than a threshold value or relatively high.
    Type: Application
    Filed: June 30, 2009
    Publication date: October 22, 2009
    Inventor: Kazuhiko Abe
  • Publication number: 20090259467
    Abstract: A voice recognition apparatus 10 carries out voice recognition of an inputted voice with reference to a voice recognition dictionary, and outputs a voice recognition result. In this voice recognition apparatus, a plurality of voice recognition dictionaries 23-1 to 23-N are provided according to predetermined classification items.
    Type: Application
    Filed: August 16, 2006
    Publication date: October 15, 2009
    Inventors: Yuki Sumiyoshi, Reiko Okada
  • Publication number: 20090210230
    Abstract: A system and method is provided for recognizing a speech input and selecting an entry from a list of entries. The method includes recognizing a speech input. A fragment list of fragmented entries is provided and compared to the recognized speech input to generate a candidate list of best matching entries based on the comparison result. The system includes a speech recognition module, and a data base for storing the list of entries and the fragmented list. The speech recognition module may obtain the fragmented list from the data base and store a candidate list of best matching entries in memory. A display may also be provided to allow the user to select from a list of best matching entries.
    Type: Application
    Filed: January 16, 2009
    Publication date: August 20, 2009
    Applicant: Harman Becker Automotive Systems GmbH
    Inventor: Markus Schwarz
  • Publication number: 20090204398
    Abstract: The fluency of a spoken utterance or passage is measured and presented to the speaker and to others. In one embodiment, a method is described that includes recording a spoken utterance, evaluating the spoken utterance for accuracy, evaluating the spoken utterance for duration, and assigning a score to the spoken utterance based on the accuracy and the duration.
    Type: Application
    Filed: June 24, 2005
    Publication date: August 13, 2009
    Inventors: Robert Du, Lingfei Song, Nan N. Li, Minerva Yeung
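    The scoring step of 20090204398 can be illustrated with a simple combination rule; the weights and the duration penalty below are assumptions, since the abstract only says the score is based on accuracy and duration.

```python
# Hypothetical sketch: fluency score from word accuracy plus a duration
# penalty relative to an expected reading time.
def fluency_score(correct_words: int, total_words: int,
                  duration_s: float, expected_s: float) -> float:
    accuracy = correct_words / total_words
    # Penalize deviation from the expected duration in either direction.
    pace = max(0.0, 1.0 - abs(duration_s - expected_s) / expected_s)
    return round(100 * (0.7 * accuracy + 0.3 * pace), 1)   # assumed weighting

print(fluency_score(correct_words=48, total_words=50,
                    duration_s=22.0, expected_s=20.0))     # -> 94.2
```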
  • Publication number: 20090171663
    Abstract: The present invention discloses creating and using speech recognition grammars of reduced size. The reduced speech recognition grammars can include a set of entries, each entry having a unique identifier and a phonetic representation that is used when matching speech input against the entries. Each entry can lack a textual spelling corresponding to the phonetic representation. The reduced speech recognition grammar can be digitally encoded and stored in a computer readable media, such as a hard drive or flash memory of a portable speech enabled device.
    Type: Application
    Filed: January 2, 2008
    Publication date: July 2, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: DANIEL E. BADT, VLADIMIR BERGL, JOHN W. ECKHART, RADEK HAMPL, JONATHAN PALGON, HARVEY M. RUBACK
  • Publication number: 20090157401
    Abstract: An intelligent query system for processing voiced-based queries is disclosed, which uses semantic based processing to identify the question posed by the user by understanding the meaning of the user's utterance. Based on identifying the meaning of the utterance, the system selects a single answer that best matches the user's query. The answer that is paired to this single question is then retrieved and presented to the user. The system, as implemented, accepts environmental variables selected by the user and is scalable to provide answers to a variety and quantity of user-initiated queries.
    Type: Application
    Filed: June 23, 2008
    Publication date: June 18, 2009
    Inventor: Ian M. Bennett
  • Publication number: 20090150153
    Abstract: Described is the use of acoustic data to improve grapheme-to-phoneme conversion for speech recognition, such as to more accurately recognize spoken names in a voice-dialing system. A joint model of acoustics and graphonemes (acoustic data, phonemes sequences, grapheme sequences and an alignment between phoneme sequences and grapheme sequences) is described, as is retraining by maximum likelihood training and discriminative training in adapting graphoneme model parameters using acoustic data. Also described is the unsupervised collection of grapheme labels for received acoustic data, thereby automatically obtaining a substantial number of actual samples that may be used in retraining. Speech input that does not meet a confidence threshold may be filtered out so as to not be used by the retrained model.
    Type: Application
    Filed: December 7, 2007
    Publication date: June 11, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Xiao Li, Asela J. R. Gunawardana, Alejandro Acero
  • Publication number: 20090150148
    Abstract: A voice recognition apparatus reduces false recognition caused by matching against phrases composed of a small number of syllables. The apparatus performs a recognition process on voice data, produced by a speaker, by pronunciation unit (such as a syllable), and further performs recognition by a method such as Word Spotting to match against the phrases stored in a phrase database. To reduce false matches, it compares the result of the per-pronunciation-unit recognition process with extended phrases obtained by adding an additional phrase before and/or behind the respective stored phrases.
    Type: Application
    Filed: October 1, 2008
    Publication date: June 11, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Kenji ABE
  • Publication number: 20090138263
    Abstract: Provided are a data process unit and a data process unit control program that are suitable for generating acoustic models for unspecified speakers, taking the distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment, and that are suitable for providing acoustic models intended for unspecified speakers yet adapted to the speech of a specific person. A data process unit 1 comprises a data classification section 1a, data storing section 1b, pattern model generating section 1c, data control section 1d, mathematical distance calculating section 1e, pattern model converting section 1f, pattern model display section 1g, region dividing section 1h, division changing section 1i, region selecting section 1j, and specific pattern model generating section 1k.
    Type: Application
    Filed: December 30, 2008
    Publication date: May 28, 2009
    Inventors: Makoto Shozakai, Goshu Nagino
  • Publication number: 20090119104
    Abstract: Systems and methods are described that automatically control modules of dialog systems. The systems and methods include a dialog module that receives and processes utterances from a speaker and outputs data used to generate synthetic speech outputs as responses to the utterances. A controller is coupled to the dialog module, and the controller detects an abnormal output of the dialog module when the dialog module is processing in an automatic mode. The controller comprises a mode control for an agent to control the dialog module by correcting the abnormal output and transferring a corrected output to a downstream dialog module that follows, in a processing path, the dialog module. The corrected output is used in further processing the utterances.
    Type: Application
    Filed: November 7, 2007
    Publication date: May 7, 2009
    Applicant: Robert Bosch GmbH
    Inventors: Fuliang Weng, Baoshi Yan, Zhe Feng
  • Publication number: 20090119103
    Abstract: A method automatically recognizes speech received through an input. The method accesses one or more speaker-independent speaker models and detects whether the received speech input matches a speaker model according to an adaptable predetermined criterion. When no match occurs, the method creates a speaker model based on the input and assigns it to a speaker model set.
    Type: Application
    Filed: October 10, 2008
    Publication date: May 7, 2009
    Inventors: Franz Gerl, Tobias Herbig
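    A minimal sketch of the match-or-create step in 20090119103, under stated assumptions: cosine similarity over feature vectors as the matching score and a fixed criterion value, both stand-ins for the patent's adaptable criterion.

```python
# Hypothetical sketch: score the input against stored speaker models; enroll a
# new model into the speaker model set when nothing clears the criterion.
import numpy as np

def match_or_create(models: dict[str, np.ndarray], feats: np.ndarray,
                    criterion: float = 0.8) -> str:
    def score(m: np.ndarray) -> float:    # cosine similarity as a stand-in
        return float(m @ feats / (np.linalg.norm(m) * np.linalg.norm(feats)))
    if models:
        best = max(models, key=lambda k: score(models[k]))
        if score(models[best]) >= criterion:
            return best                    # recognized an existing speaker
    new_id = f"speaker_{len(models) + 1}"
    models[new_id] = feats.copy()          # create and assign a new model
    return new_id

speaker_set: dict[str, np.ndarray] = {}
print(match_or_create(speaker_set, np.array([0.1, 0.9])))   # -> 'speaker_1'
print(match_or_create(speaker_set, np.array([0.1, 0.9])))   # -> 'speaker_1'
```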
  • Publication number: 20080300865
    Abstract: In a natural language, mixed-initiative system, a method of processing user dialogue can include receiving a user input and determining whether the user input specifies an action to be performed or a token of an action. The user input can be selectively routed to an action interpreter or a token interpreter according to the determining step.
    Type: Application
    Filed: April 30, 2008
    Publication date: December 4, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rajesh Balchandran, Linda Boyer
  • Publication number: 20080281582
    Abstract: An input system for mobile search and a method therefor are provided. The input system includes an input module that receives a code input for a specific term and a corresponding voice input; a database including a glossary and an acoustic model, wherein the glossary includes a plurality of terms and a sequence list, and each term has a search weight based on its order in the sequence list; a process module that selects a first number of candidate terms from the glossary according to the code input using an input algorithm, and obtains a second number of candidate terms by using a speech recognition algorithm to compare the voice input with the first number of candidate terms via the acoustic model, the second number of candidate terms being listed in a particular order based on their respective search weights; and an output module that shows the second number of candidate terms in that order for selecting the specific term therefrom.
    Type: Application
    Filed: October 1, 2007
    Publication date: November 13, 2008
    Inventors: Tien-ming Hsu, Ming-hong Wang, Yuan-chia Lu, Jia-lin Shen
  • Publication number: 20080177546
    Abstract: A novel system for speech recognition uses differential cepstra over time frames as acoustic features, together with the traditional static cepstral features, for hidden trajectory modeling, and provides greater accuracy and performance in automatic speech recognition. According to one illustrative embodiment, an automatic speech recognition method includes receiving a speech input, generating an interpretation of the speech, and providing an output based at least in part on the interpretation of the speech input. The interpretation of the speech uses hidden trajectory modeling with observation vectors that are based on cepstra and on differential cepstra derived from the cepstra. A method is developed that can automatically train the hidden trajectory model's parameters corresponding to the components of the differential cepstra in the full acoustic feature vectors.
    Type: Application
    Filed: January 19, 2007
    Publication date: July 24, 2008
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Dong Yu
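    The feature construction in 20080177546 is easy to show concretely. This sketch derives differential cepstra from static cepstra with a central difference and appends them; the difference scheme and edge padding are common conventions, not taken from the patent.

```python
# Hypothetical sketch: append differential cepstra (frame deltas) to static
# cepstral vectors to form the full acoustic feature vectors.
import numpy as np

def add_differential_cepstra(cepstra: np.ndarray) -> np.ndarray:
    """cepstra: (frames, coeffs) static features -> (frames, 2*coeffs)."""
    delta = np.zeros_like(cepstra)
    delta[1:-1] = (cepstra[2:] - cepstra[:-2]) / 2.0    # central difference
    delta[0], delta[-1] = delta[1], delta[-2]           # edge padding
    return np.hstack([cepstra, delta])

static = np.random.rand(100, 13)           # e.g. 13 MFCCs over 100 frames
full = add_differential_cepstra(static)
print(full.shape)                           # -> (100, 26)
```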
  • Publication number: 20080172232
    Abstract: A voice recognition emergency system and method. The system includes a microphone, a speaker, and a voice recognition emergency device comprising a processor and a transmitter. The processor analyzes sound received from the microphone, continuously listening for any of a set of pre-defined, spoken emergency phrases; upon detecting that one has been spoken, it triggers an alert condition and conveys it to a gateway device via the transmitter. The alert condition may be conveyed to the gateway device through Power Line Communications (PLC), radio communications, Wi-Fi communications, or Ethernet communications. The processor may also recognize an emergency phrase spoken by a particular person.
    Type: Application
    Filed: January 14, 2008
    Publication date: July 17, 2008
    Inventor: Scott A. Gurley
  • Publication number: 20080162135
    Abstract: A method, article of manufacture, and apparatus for monitoring data traffic on a network is disclosed. In an embodiment, this includes obtaining intrinsic data from at least a portion of the traffic, obtaining extrinsic data from at least a portion of the traffic, associating the intrinsic data with the extrinsic data, and logging the intrinsic data and extrinsic data. The portion of the traffic from which the intrinsic data and extrinsic data are derived may not be stored, or may be stored in encrypted form.
    Type: Application
    Filed: December 30, 2006
    Publication date: July 3, 2008
    Inventors: Christopher Hercules Claudatos, William Dale Andruss, Scott R. Bevan
  • Publication number: 20080154613
    Abstract: A voice processing system for a vehicle environment is provided for detecting a sound signal in the vehicle environment and identifying a voice command that originates from a vehicle user outside the vehicle. In detecting the sound signal, the voice processing system takes into account position information relating to the position of the vehicle user in the vehicle environment. Information on the position of the vehicle user may be obtained from a keyless-go system or another monitoring device of the motor vehicle, for example an optical imaging device of a parking-assistance system or a driver-assistance system.
    Type: Application
    Filed: August 6, 2007
    Publication date: June 26, 2008
    Applicant: Harman Becker Automotive Systems GmbH
    Inventors: Tim Haulick, Markus Buck, Hans-Joerg Koepf
  • Publication number: 20080103771
    Abstract: A method for the distributed construction of a voice recognition model that is intended to be used by a device comprising a model base and a reference base in which the modeling elements are stored. The method includes the steps of obtaining the entity to be modeled, transmitting data representative of the entity over a communication link to a server, determining a set of modeling parameters indicating the modeling elements, transmitting the modeling parameters to the device, determining the voice recognition model of the entity to be modeled as a function of at least the modeling parameters received and at least one modeling element that is stored in the reference base and indicated in the transmitted parameters, and subsequently saving the voice recognition model in the model base.
    Type: Application
    Filed: October 27, 2005
    Publication date: May 1, 2008
    Applicant: France Telecom
    Inventors: Denis Jouvet, Jean Monne
  • Publication number: 20080077405
    Abstract: A password grammar for speech recognition is described. A password is normalized into a list of strings of a plurality of character types such as letters and numerals. For each string of letters, one or more corresponding letter permutations are determined which represent pronounceable combinations of that string. Then, for each letter permutation, a corresponding recognition grammar entry is created for a speech recognition grammar.
    Type: Application
    Filed: September 21, 2007
    Publication date: March 27, 2008
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Richard Breuer
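    A sketch of the normalization and permutation steps in 20080077405, with stated assumptions: the split into letter and digit runs follows the abstract, while the "pronounceability" test (presence of a vowel) and the two variant forms are illustrative stand-ins for the patent's permutation rules.

```python
# Hypothetical sketch: normalize a password into letter/digit strings and
# generate candidate pronunciations for each letter run.
import re

def normalize(password: str) -> list[str]:
    return re.findall(r"[A-Za-z]+|[0-9]+", password)

def letter_permutations(run: str) -> list[str]:
    spelled = " ".join(run.lower())           # "abc" -> "a b c"
    variants = [spelled]
    if re.search(r"[aeiou]", run.lower()):    # crude pronounceability test
        variants.append(run.lower())          # read the run as one word
    return variants

for part in normalize("sun42xyz"):
    if part.isalpha():
        print(part, "->", letter_permutations(part))   # grammar entries
    else:
        print(part, "-> digits", list(part))
```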
  • Publication number: 20080015858
    Abstract: A speech reference enrollment method involves the following steps: (a) requesting a user speak a vocabulary word; (b) detecting a first utterance (354); (c) requesting the user speak the vocabulary word; (d) detecting a second utterance (358); (e) determining a first similarity between the first utterance and the second utterance (362); (f) when the first similarity is less than a predetermined similarity, requesting the user speak the vocabulary word; (g) detecting a third utterance (366); (h) determining a second similarity between the first utterance and the third utterance (370); and (i) when the second similarity is greater than or equal to the predetermined similarity, creating a reference (364).
    Type: Application
    Filed: July 9, 2007
    Publication date: January 17, 2008
    Inventor: Robert Bossemeyer
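    The enrollment steps (a) through (i) in 20080015858 form a simple control loop, sketched below. The `capture` and `similarity` functions are assumed stand-ins (the patent does not define the similarity measure), and averaging the two matching utterances into the reference is an illustrative choice.

```python
# Hypothetical sketch of the enrollment loop: prompt, capture, compare
# utterance pairs, and create the reference once two utterances agree.
import numpy as np

def capture(prompt: str) -> np.ndarray:
    print(prompt)
    return np.random.rand(16)            # stand-in for detecting an utterance

def similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def enroll(word: str, threshold: float = 0.8, max_tries: int = 5):
    first = capture(f"Please say '{word}'")                    # steps (a)-(b)
    for _ in range(max_tries):
        nxt = capture(f"Please say '{word}' again")            # (c)-(d), (f)-(g)
        if similarity(first, nxt) >= threshold:                # (e), (h)
            return (first + nxt) / 2.0                         # (i) create reference
    return None                                                # enrollment failed

print(enroll("geranium") is not None)
```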