Feature Extraction For Speech Recognition; Selection Of Recognition Unit (epo) Patents (Class 704/E15.004)
  • Publication number: 20110071826
    Abstract: A method and apparatus for ordering results from a query is provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example, a set of search strings may be created from the n-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings is sent to at least one search engine, and search results are obtained. The search results are then rearranged or reordered based on a semantic similarity between the search results and the word lattice.
    Type: Application
    Filed: September 23, 2009
    Publication date: March 24, 2011
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Harry M. Bliss
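The n-gram-to-search-string step in 20110071826 is concrete enough to sketch. The snippet below builds unigram and bigram search strings from a toy word lattice, orders them by confidence, and truncates the list; the lattice representation, the min-confidence scoring for bigrams, and the cutoff are illustrative assumptions, not details from the patent.

```python
# Minimal sketch of query building per 20110071826: unigram and bigram
# search strings from a word lattice, ordered by confidence, truncated.
# The lattice format and scoring below are illustrative assumptions.

def ngram_search_strings(lattice, max_strings=5):
    """lattice: list of (word, confidence) pairs along the best path."""
    candidates = [(conf, word) for word, conf in lattice]  # unigrams
    # Bigrams: adjacent words, scored by the weaker of the two confidences.
    for (w1, c1), (w2, c2) in zip(lattice, lattice[1:]):
        candidates.append((min(c1, c2), f"{w1} {w2}"))
    candidates.sort(reverse=True)          # highest confidence first
    return [s for _, s in candidates[:max_strings]]

print(ngram_search_strings([("cheap", 0.9), ("flights", 0.95), ("boston", 0.6)]))
```

The reordering stage would then score each returned result against the lattice, for example by cosine similarity between the result text and the lattice words in some semantic space.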
  • Publication number: 20110066434
    Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since we only find the F most similar unknown voices from m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 17, 2011
    Inventors: Tze-Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
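The lookup step in 20110066434 reads as a nearest-neighbor search over the m unknown-voice prototypes. A minimal sketch under that reading, with feature vectors and Euclidean distance as assumptions:

```python
import numpy as np

# Sketch of the category lookup in 20110066434: find the F unknown-voice
# prototypes most similar to the input pronunciation; the words of those
# F categories are then ranked. Features and distance are assumptions.

def top_f_categories(x, prototypes, f=3):
    """x: (d,) feature vector; prototypes: (m, d) unknown-voice vectors."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return np.argsort(dists)[:f]           # indices of the F nearest voices
```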
  • Publication number: 20110040561
    Abstract: A method for compensating intersession variability for automatic extraction of information from an input voice signal representing an utterance of a speaker includes: processing the input voice signal to provide feature vectors, each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector.
    Type: Application
    Filed: May 16, 2006
    Publication date: February 17, 2011
    Inventors: Claudio Vair, Daniele Colibro, Pietro Laface
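The abstract of 20110040561 does not state how the compensation vector and the extracted features are combined; the natural reading is a per-frame subtraction. A minimal sketch under that assumption:

```python
import numpy as np

def compensate(features, compensation):
    """features: (T, d) feature vectors, one row per time frame.
    compensation: (d,) intersession variability compensation vector.
    Assumes an additive distortion model, which the abstract leaves open."""
    return features - compensation
```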
  • Publication number: 20110029312
    Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system.
    Type: Application
    Filed: October 11, 2010
    Publication date: February 3, 2011
    Applicant: VOCOLLECT, INC.
    Inventors: Keith P. Braho, Jeffrey P. Pike, Lori A. Pike
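One plausible shape for the adaptation control in 20110029312 is to estimate an error rate from recognition confidences (no transcript needed) and gate adaptation on it. The estimator, threshold, and `adapt` call below are illustrative assumptions:

```python
# Sketch of transcript-free, error-rate-gated adaptation per 20110029312.
# Treating low-confidence recognitions as likely errors is an assumption;
# the patent only requires an error rate estimated without a transcript.

def estimate_error_rate(confidences, floor=0.5):
    errors = sum(1 for c in confidences if c < floor)
    return errors / max(len(confidences), 1)

def maybe_adapt(model, samples, confidences, threshold=0.2):
    if estimate_error_rate(confidences) > threshold:
        model.adapt(samples)               # hypothetical adaptation hook
```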
  • Publication number: 20110029306
    Abstract: An audio discriminating device includes a plurality of audio discriminators for discriminating an input audio signal as a speech signal or a non-speech signal by using at least one feature parameter, and determines whether to drive the audio discriminator connected next to a given audio discriminator according to that discriminator's discrimination result.
    Type: Application
    Filed: June 22, 2010
    Publication date: February 3, 2011
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Manho PARK, Sook Jin Lee, Jee Hwan Ahn
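The chained drive decision in 20110029306 resembles a classifier cascade. A minimal sketch, where each stage either decides confidently (and the next stage is never driven) or defers:

```python
# Cascade sketch per 20110029306: each discriminator uses one feature
# and decides whether the next discriminator runs at all. The stage
# interface and the deferral rule are illustrative assumptions.

def cascade_discriminate(frame, stages):
    """stages: callables returning (label, confident) for one feature."""
    label = "non-speech"
    for stage in stages:
        label, confident = stage(frame)
        if confident:                      # confident: don't drive the next
            return label
    return label                           # last stage's opinion stands
```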
  • Publication number: 20110029311
    Abstract: There is provided a voice processing device. The device includes: a score calculation unit configured to calculate a score indicating the compatibility of a voice signal, input on the basis of an utterance of a user, with each of plural pieces of intention information indicating each of a plurality of intentions; an intention selection unit configured to select, from among the plural pieces of intention information, the intention information indicating the intention of the utterance of the user, on the basis of the score calculated by the score calculation unit; and an intention reliability calculation unit configured to calculate the reliability of the intention information selected by the intention selection unit, on the basis of the score calculated by the score calculation unit.
    Type: Application
    Filed: June 17, 2010
    Publication date: February 3, 2011
    Applicant: Sony Corporation
    Inventors: Katsuki MINAMINO, Hitoshi Honda, Yoshinori Maeda, Hiroaki Ogawa
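A compact way to read 20110029311 is: score every intention, select the argmax, and derive a reliability from the score distribution. Using a softmax over the scores as the reliability is our assumption; the abstract only says the reliability is calculated from the scores.

```python
import math

def select_intention(scores):
    """scores: dict mapping intention -> compatibility score."""
    best = max(scores, key=scores.get)
    z = sum(math.exp(s) for s in scores.values())
    reliability = math.exp(scores[best]) / z   # softmax mass of the winner
    return best, reliability

print(select_intention({"play_music": 2.1, "set_alarm": 0.3, "weather": -0.5}))
```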
  • Publication number: 20110010171
    Abstract: A system and method for providing speech recognition functionality offers improved accuracy and robustness in noisy environments having multiple speakers. The described technique includes receiving speech energy and converting the received speech energy to a digitized form. The digitized speech energy is decomposed into features that are then projected into a feature space having multiple speaker subspaces. The projected features fall either into one of the multiple speaker subspaces or outside of all speaker subspaces. A speech recognition operation is performed on a selected one of the multiple speaker subspaces to resolve the utterance to a command or data.
    Type: Application
    Filed: July 7, 2009
    Publication date: January 13, 2011
    Applicant: General Motors Corporation
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
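The subspace assignment in 20110010171 can be sketched as: project the feature vector onto each speaker subspace and keep the subspace with the smallest reconstruction error, rejecting vectors that sit far from every subspace. Orthonormal bases and the rejection threshold are assumptions:

```python
import numpy as np

def assign_speaker(x, subspace_bases, reject_threshold=1.0):
    """subspace_bases: list of (d, k) orthonormal bases, one per speaker.
    Returns the index of the best-fitting subspace, or None if x falls
    outside all speaker subspaces (large residual everywhere)."""
    residuals = []
    for U in subspace_bases:
        projection = U @ (U.T @ x)         # project onto the subspace
        residuals.append(np.linalg.norm(x - projection))
    best = int(np.argmin(residuals))
    return best if residuals[best] < reject_threshold else None
```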
  • Publication number: 20110004624
    Abstract: A method, a system and a computer program product for enabling a customer response speech recognition unit to dynamically receive customer feedback. The customer response speech recognition unit is positioned at a customer location. The speech recognition unit is automatically initialized when one or more spoken words are detected. The response statements of customers are dynamically received by the customer response speech recognition unit at the customer location, in real time. The customer response speech recognition unit determines when the one or more spoken words of the customer response statement are associated with a score in a database. An analysis of the words is performed to generate a score that reflects the customer's evaluation of the subject. The score is dynamically updated as new evaluations are received, and the score is displayed within a graphical user interface (GUI) to be viewed by one or more potential customers.
    Type: Application
    Filed: July 2, 2009
    Publication date: January 6, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ravi P. Bansal, Mike V. Macias, Saidas T. Kottawar, Salil P. Gandhi, Sandip D. Mahajan
  • Publication number: 20100332228
    Abstract: According to some embodiments, a method and apparatus are provided to buffer N audio frames of a plurality of audio frames associated with an audio signal, pre-compute scores for a subset of context dependent models (CDMs), and perform a graphical model search associated with the N audio frames where a score of a context independent model (CIM) associated with a CDM is used in lieu of a score for the CDM when a score for the CDM is needed and has not been pre-computed.
    Type: Application
    Filed: June 25, 2009
    Publication date: December 30, 2010
    Inventors: Michael Eugene Deisher, Tao Ma
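The fallback rule in 20100332228 is simple to state in code: use the pre-computed context-dependent model (CDM) score when it exists, otherwise substitute the score of the corresponding context-independent model (CIM). The dictionary representation is an assumption:

```python
def model_score(cdm_id, cdm_scores, cim_scores, cdm_to_cim):
    """cdm_scores: pre-computed scores for a subset of CDMs;
    cim_scores: scores for all CIMs; cdm_to_cim: CDM -> parent CIM."""
    if cdm_id in cdm_scores:
        return cdm_scores[cdm_id]
    return cim_scores[cdm_to_cim[cdm_id]]  # CIM score stands in for the CDM
```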
  • Publication number: 20100312561
    Abstract: An apparatus and a method for performing a grounding process using a POMDP (Partially Observable Markov Decision Process) are provided. The configuration is designed so that, in order to understand a request from a user through the user's utterances, a grounding process is performed using the POMDP, in which analysis information acquired from a language analyzing unit that receives the utterances of the user and performs language analysis, together with pragmatic information including task feasibility information acquired from the task manager that performs a task, is set as observation information. Accordingly, understanding can be achieved efficiently, and high-speed and accurate recognition of the user request and task execution based on the user request can be provided.
    Type: Application
    Filed: December 4, 2008
    Publication date: December 9, 2010
    Inventor: Ugo Di Profio
  • Publication number: 20100312560
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Application
    Filed: June 9, 2009
    Publication date: December 9, 2010
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Alistair D. CONKIE, Ann K. SYRDAL
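The restructured model in 20100312560 scores each dictionary phoneme as a weighted sum over the acoustic models of the plausible phonemes, with the pronouncing dictionary itself unchanged. A minimal sketch; the weights would come from the transcription lattice, and `base_score` is a stand-in for the native acoustic models:

```python
def restructured_score(frame, weights, base_score):
    """weights: dict plausible_phoneme -> weight (summing to 1);
    base_score(frame, phoneme): likelihood under the native model."""
    return sum(w * base_score(frame, ph) for ph, w in weights.items())
```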
  • Publication number: 20100292987
    Abstract: A circuit startup method utilizing utterance estimation in a speech processing system including a sound collecting device is provided. The circuit startup method includes a subset power supply step of supplying power to the sound collecting device and a signal processing circuit, and a sound collecting step of inputting a sound from the sound collecting device through the signal processing circuit. The circuit startup method further includes an utterance estimation step of estimating whether or not speech is contained in the inputted sound, and a power supply step of supplying power to the speech processing circuit for an utterance interval when the estimation result of the utterance estimation step indicates that speech is contained.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 18, 2010
    Inventors: Hiroshi Kawaguchi, Masahiko Yoshimoto, Hiroki Noguchi, Tomoya Takagi
  • Publication number: 20100286985
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Application
    Filed: July 19, 2010
    Publication date: November 11, 2010
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
  • Publication number: 20100280983
    Abstract: Disclosed are an apparatus and method of deducing a user's intention using multimodal information. The user's intention deduction apparatus includes a first predictor to predict a part of a user's intention using at least one piece of motion information, and a second predictor to predict the user's intention using the predicted part of the user's intention and multimodal information received from at least one multimodal sensor.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 4, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jeong-Mi CHO, Jeong-Su KIM, Won-Chul BANG, Nam-Hoon KIM
  • Publication number: 20100278377
    Abstract: The invention relates to a method for electronically evaluating a dialogue between at least two persons, comprising receiving audio data, analysing the audio data to determine the repartition of utterances between the at least two persons in the course of the dialogue, and comparing the results of the analysis with predetermined communication patterns.
    Type: Application
    Filed: June 25, 2008
    Publication date: November 4, 2010
    Applicant: Zero to One Technology
    Inventors: Philippe Hamel, Jean-Paul Audrain, Pierre-Sylvain Luquet, Eric Faurot
  • Publication number: 20100280827
    Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
    Type: Application
    Filed: April 30, 2009
    Publication date: November 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
  • Publication number: 20100278505
    Abstract: The present disclosure provides a multi-media data editing system for editing multi-media data. The multi-media data editing system includes a sample memory and a processor. The sample memory stores a plurality of undesired voice samples. The processor includes a voice obtaining module, a voice comparing module, and a voice editing module. The voice obtaining module is configured for obtaining audio data from the multi-media data. The voice comparing module is configured for comparing the obtained audio data with the plurality of undesired voice samples to look for a match. The voice editing module is configured for editing the audio data when the audio data matches one of the undesired voice samples. The present disclosure also provides a multi-media data editing method, and an electronic device using the multi-media data editing system.
    Type: Application
    Filed: December 18, 2009
    Publication date: November 4, 2010
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventor: CHUAN-FENG WU
  • Publication number: 20100268533
    Abstract: A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.
    Type: Application
    Filed: April 16, 2010
    Publication date: October 21, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Chi-youn PARK, Nam-hoon Kim, Jeong-mi Cho
  • Publication number: 20100262423
    Abstract: Described is a technology by which a feature compensation approach to speech recognition uses a high-order vector Taylor series (HOVTS) approximation of a model of distortions to improve recognition accuracy. Speech recognizer models trained with clean speech degrade when later dealing with speech that is corrupted by additive noises and convolutional distortions. The approach attempts to remove any such noise/distortions from the input speech. To use the HOVTS approximation, a Gaussian mixture model is trained and used to convert cepstral domain feature vectors to log spectrum components. HOVTS computes statistics for the components, which are transformed back to the cepstral domain. A noise/distortion estimate is obtained, and used to provide a clean speech estimate to the recognizer.
    Type: Application
    Filed: April 13, 2009
    Publication date: October 14, 2010
    Applicant: Microsoft Corporation
    Inventors: Qiang Huo, Jun Du
  • Publication number: 20100250241
    Abstract: The invention provides a dialogue-based learning apparatus for learning through dialogue with users, comprising: a speech input unit (10) for inputting speeches; a speech recognition unit (20) for recognizing the input speech; and a behavior and dialogue controller (30) for controlling behaviors and dialogues according to speech recognition results, wherein the behavior and dialogue controller (30) has a topic recognition expert (34) to memorise contents of utterances and to retrieve the topic that best matches the speech recognition results, and a mode switching expert (35) to control mode switching in accordance with a user utterance, wherein the mode switching expert switches modes in accordance with a user utterance, and wherein the topic recognition expert registers a plurality of words in the utterance as topics in a first mode, and performs searches among the registered topics and selects the maximum-likelihood topic in a second mode.
    Type: Application
    Filed: August 29, 2008
    Publication date: September 30, 2010
    Inventors: Naoto Iwahashi, Noriyuki Kimura, Mikio Nakano, Kotaro Funakoshi
  • Publication number: 20100223285
    Abstract: A critical test results management system and method for capturing test data from a test results providing program and delivering messages to interested recipients. The system and method generate alerts, escalate the alerts to message receiving devices, and track the status of the alerts. The recipient of an alert can access the system to obtain the contents of the alert, acknowledge receipt of the alert, and record notes related to the alert. The invention tracks when alerts were sent and whether they have been acknowledged. If the alerts are not acknowledged prior to expiration of a predetermined time period, further alerts are escalated to the recipient, to different receiving devices, or to different recipients.
    Type: Application
    Filed: April 28, 2010
    Publication date: September 2, 2010
    Inventor: Brian Biddulph-Krentar
  • Publication number: 20100217592
    Abstract: The present invention provides a method for identifying a turn, such as a sentence or phrase, for addition to a platform dialog comprising a plurality of turns. Lexical features of each of a set of candidate turns relative to one or more turns in the platform dialog are determined. Semantic features associated with each candidate turn and associated with the platform dialog are determined to identify one or more topics associated with each candidate turn and with the platform dialog. Lexical features of each candidate turn are compared to lexical features of the platform dialog and semantic features associated with each candidate turn are compared to semantic features of the platform dialog to rank the candidate turns based on similarity of lexical features and semantic features of each candidate turn to lexical features and semantic features of the platform dialog.
    Type: Application
    Filed: October 14, 2009
    Publication date: August 26, 2010
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Rakesh Gupta, Lev-Arie Ratinov
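The ranking in 20100217592 combines lexical and semantic comparisons. The sketch below mixes a Jaccard word overlap with a histogram intersection of topic distributions; both similarity choices and the mixing weight are assumptions, since the abstract does not fix them:

```python
def rank_turns(candidates, dialog_words, dialog_topics, alpha=0.5):
    """candidates: list of (turn_words: set, turn_topics: dict)."""
    def jaccard(a, b):
        return len(a & b) / max(len(a | b), 1)
    def topic_sim(t1, t2):                 # histogram intersection
        return sum(min(v, t2.get(k, 0.0)) for k, v in t1.items())
    def score(cand):
        words, topics = cand
        return (alpha * jaccard(words, dialog_words)
                + (1 - alpha) * topic_sim(topics, dialog_topics))
    return sorted(candidates, key=score, reverse=True)
```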
  • Publication number: 20100209003
    Abstract: In one embodiment, a method includes obtaining a target template and processing the target template to identify at least one component of the target template. The method also includes searching at least one collection of content to identify at least a first instance of content that substantially matches the component of the target template. The first instance of content is presented as substantially matching the component. Finally, a first arrangement that includes the first instance of content is created. Such a first arrangement is associated with a mash-up related to the target template.
    Type: Application
    Filed: February 16, 2009
    Publication date: August 19, 2010
    Applicant: Cisco Technology, Inc.
    Inventors: John Toebes, Glenn Thomas Millican, III
  • Publication number: 20100185446
    Abstract: A speech recognition system installed in a terminal coupled to a server via a network is provided. The terminal holds map data including a landmark. The speech recognition system manages recognition data including a word corresponding to a name of the landmark, and sends update area information and an updated time to the server. When recognition data of the area of the update area information sent from the terminal has been changed after the updated time, the server generates difference data between the latest recognition data and the recognition data of the update area information as of the updated time, and sends the generated difference data and map data of the update area information to the terminal. The terminal updates the map data based on the map data sent from the server. The speech recognition system updates the recognition data managed by the terminal based on the difference data.
    Type: Application
    Filed: January 20, 2010
    Publication date: July 22, 2010
    Inventors: Takeshi HOMMA, Hiroaki Kokubo, Akinori Asahara, Hisashi Takahashi
  • Publication number: 20100174533
    Abstract: Techniques are described for automatically measuring fluency of a patient's speech based on prosodic characteristics thereof. The prosodic characteristics may include statistics regarding silent pauses, filled pauses, repetitions, or fundamental frequency of the patient's speech. The statistics may include a count, average number of occurrences, duration, average duration, frequency of occurrence, standard deviation, or other statistics. In one embodiment, a method includes receiving an audio sample that includes speech of a patient, analyzing the audio sample to identify prosodic characteristics of the speech of the patient, and automatically measuring fluency of the speech of the patient based on the prosodic characteristics. These techniques may present several advantages, such as objectively measuring fluency of a patient's speech without requiring a manual transcription or other manual intervention in the analysis process.
    Type: Application
    Filed: January 5, 2010
    Publication date: July 8, 2010
    Applicant: Regents of the University of Minnesota
    Inventor: Serguei V.S. Pakhomov
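The pause statistics named in 20100174533 (count, average duration, frequency of occurrence, standard deviation) are straightforward once pause intervals are available. The sketch assumes a pause detector has already produced (start, end) intervals:

```python
from statistics import mean, pstdev

def pause_stats(pauses, total_seconds):
    """pauses: list of (start, end) silent-pause intervals in seconds."""
    durations = [end - start for start, end in pauses]
    return {
        "count": len(durations),
        "mean_duration": mean(durations) if durations else 0.0,
        "stdev_duration": pstdev(durations) if durations else 0.0,
        "pauses_per_minute": 60.0 * len(durations) / total_seconds,
    }
```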
  • Publication number: 20100161328
    Abstract: Embodiments are provided for utilizing a client-side cache for utterance processing to facilitate network based speech recognition. An utterance comprising a query is received in a client computing device. The query is sent from the client to a network server for results processing. The utterance is processed to determine a speech profile. A cache lookup is performed based on the speech profile to determine whether results data for the query is stored in the cache. If the results data is stored in the cache, then a query is sent to cancel the results processing on the network server and the cached results data is displayed on the client computing device.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 24, 2010
    Applicant: Microsoft Corporation
    Inventors: Andrew K. Krumel, Shuangyu Chang, Robert L. Chambers
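The client-side flow in 20100161328 can be sketched as: submit the query to the server, compute a speech profile, and cancel the in-flight server request on a cache hit. The `compute_profile` function and the server submit/cancel/wait interface are hypothetical:

```python
def handle_utterance(utterance, cache, server):
    request_id = server.submit(utterance)   # hypothetical async submit
    profile = compute_profile(utterance)    # hypothetical speech profiler
    if profile in cache:
        server.cancel(request_id)           # skip network results processing
        return cache[profile]               # display cached results instead
    results = server.wait(request_id)
    cache[profile] = results
    return results
```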
  • Publication number: 20100100381
    Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
    Type: Application
    Filed: December 22, 2009
    Publication date: April 22, 2010
    Applicant: AT&T Corp.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
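The three checks in 20100100381 are each a comparison against a pre-determined threshold. The threshold values below are illustrative; the patent only requires that each measurement has its own threshold:

```python
def message_intelligible(level_db, snr_db, intelligibility,
                         level_min=-30.0, snr_min=10.0, intel_min=0.7):
    """Returns (ok, reason). All threshold values are assumptions."""
    if level_db < level_min:
        return False, "speech level below threshold"
    if snr_db < snr_min:
        return False, "signal-to-noise ratio below threshold"
    if intelligibility < intel_min:
        return False, "estimated intelligibility below threshold"
    return True, "ok"
```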
  • Publication number: 20100094625
    Abstract: A system and method are disclosed for noise level/spectrum estimation and speech activity detection. Some embodiments include a probabilistic model to estimate noise level and subsequently detect the presence of speech. These embodiments outperform standard voice activity detectors (VADs), producing improved detection in a variety of noisy environments.
    Type: Application
    Filed: October 14, 2009
    Publication date: April 15, 2010
    Applicant: QUALCOMM Incorporated
    Inventors: Asif I. Mohammad, Dinesh Ramakrishnan
  • Publication number: 20100088098
    Abstract: A speech recognition apparatus includes a speech collating unit that calculates similarities at each time between a feature amount converted by a speech analyzing unit and a word model generated by a word model generating unit. The speech collating unit extracts, from the word models generated by the word model generating unit, a word model whose minimum similarity among the similarities at each time, or whose overall similarity obtained from the similarities at each time, satisfies a second threshold value condition, and whose similarity at each time, in a section of the vocalization of the utterance corresponding to a phoneme or phoneme string associated with a first threshold value condition, satisfies that first threshold value condition; it then outputs the recognized word corresponding to the extracted word model as a recognition result.
    Type: Application
    Filed: December 9, 2009
    Publication date: April 8, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Shouji Harada
  • Publication number: 20100061529
    Abstract: An interactive voice and data response system that directs input to a voice, text, and web-capable software-based router, which is able to intelligently respond to the input by drawing on a combination of human agents, advanced speech recognition and expert systems, connected to the router via a TCP/IP network. The digitized input is broken down into components so that the customer interaction is managed as a series of small tasks performed by a pool of human agents, rather than one ongoing conversation between the customer and a single agent. The router manages the interactions and keeps pace with a real-time conversation. The system utilizes both speech recognition and human intelligence for purposes of interpreting customer utterances or customer text, where the role of the human agent(s) is to input the intent of caller utterances, and where the computer system—not the human agent—determines which response to provide given the customer's stated intent (as interpreted/captured by the human agents).
    Type: Application
    Filed: September 1, 2009
    Publication date: March 11, 2010
    Applicant: INTERACTIONS CORPORATION
    Inventor: Michael Eric Cloran
  • Publication number: 20100063880
    Abstract: A method of providing information including providing a communication session of at least one of audio and video media and applying automatic recognition to media transferred on the communication session. An advertisement is selected by a processor based on the automatic recognition, and non-advertisement information is selected by the processor responsive to the automatic recognition. The selected advertisements and the selected non-advertisement information are presented during the communication session.
    Type: Application
    Filed: March 13, 2009
    Publication date: March 11, 2010
    Inventors: Alon Atsmon, Saar Shai
  • Publication number: 20100057462
    Abstract: The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.
    Type: Application
    Filed: September 2, 2009
    Publication date: March 4, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Martin Raab, Raymond Brueckner, Rainer Gruhn
  • Publication number: 20100049521
    Abstract: A method for processing speech audio in a network connected client device can include selecting a speech grammar for use in a speech recognition system in the network connected client device; characterizing the selected speech grammar; and, based on the characterization, determining whether to process the speech grammar locally in the network connected client device, or remotely in a speech server in the network. In one aspect of the invention, the selecting step can include establishing a communications session with a speech server; and, querying the speech server for a speech grammar over the established communications session. Additionally, the selecting step can further include registering the speech grammar in the speech recognition system.
    Type: Application
    Filed: October 26, 2009
    Publication date: February 25, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Harvey Ruback, Steven Woodward
  • Publication number: 20100036660
    Abstract: A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, client loads etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training of the prosody analyzer with real world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude and semantic meaning in the speaker's utterances.
    Type: Application
    Filed: October 14, 2009
    Publication date: February 11, 2010
    Applicant: PHOENIX SOLUTIONS, INC.
    Inventor: Ian M. Bennett
  • Publication number: 20100004932
    Abstract: A speech recognition system includes the following: a feature calculating unit; a sound level calculating unit that calculates an input sound level in each frame; a decoding unit that matches the feature of each frame with an acoustic model and a linguistic model, and outputs a recognized word sequence; a start-point detector that determines a start frame of a speech section based on a reference value; an end-point detector that determines an end frame of the speech section based on a reference value; and a reference value updating unit that updates the reference value in accordance with variations in the input sound level. The start-point detector updates the start frame every time the reference value is updated. The decoding unit starts matching before being notified of the end frame and corrects the matching results every time it is notified of the start frame. The speech recognition system can suppress a delay in response time while performing speech recognition based on a proper speech section.
    Type: Application
    Filed: September 11, 2009
    Publication date: January 7, 2010
    Applicant: FUJITSU LIMITED
    Inventors: Nobuyuki Washio, Shouji Harada
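The adaptive reference in 20100004932 can be sketched with an exponentially tracked sound level and a margin. The patent further revises the start frame on every reference update and has the decoder correct its partial matches, which this sketch omits; the update rule and constants are assumptions:

```python
def detect_speech_section(levels, margin=10.0, alpha=0.95):
    """levels: per-frame input sound levels (dB). Returns (start, end)."""
    reference = levels[0]
    start = end = None
    for t, level in enumerate(levels):
        reference = alpha * reference + (1 - alpha) * level  # track floor
        if level > reference + margin:     # frame exceeds reference: speech
            if start is None:
                start = t
            end = t
    return start, end
```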
  • Publication number: 20090316862
    Abstract: An object of the present invention is to provide an information processing terminal that identifies emotions from a voice and audibly outputs music suited to the identified emotions, enabling the emotions of the speaker who uttered the voice to be recognized readily. In an information processing terminal according to the present invention, an emotion inferring unit 23 detects, from sound information, at least two emotions of the utterer of a voice included in the sound information; a music data generating unit 24 synthesizes music data stored in a music parts database 242 and corresponding to the emotions detected by the emotion inferring unit 23; and a controller 22 reproduces the music data generated by the music data generating unit 24.
    Type: Application
    Filed: September 6, 2007
    Publication date: December 24, 2009
    Applicant: PANASONIC CORPORATION
    Inventors: Tetsurou Sugimoto, Yusuke Satoh, Tomoko Obama, Hideaki Matsuo
  • Publication number: 20090299741
    Abstract: A computer-driven device assists a user in self-regulating speech control of the device. The device processes an input signal representing human speech to compute acoustic signal quality indicators indicating conditions likely to be problematic to speech recognition, and advises the user of those conditions.
    Type: Application
    Filed: April 3, 2007
    Publication date: December 3, 2009
    Inventors: Naren Chittar, Vikas Gulati, Matthew Pratt, Harry Printz
  • Publication number: 20090292530
    Abstract: The method and system for modification of grammars presented in this invention applies to automatic speech recognition systems which take a spoken utterance as input and use a grammar to assign word sequence(s) and, possibly, one or more semantic interpretations to that utterance. One type of modification may take the form of reducing the importance of select grammar components based on the analysis of the occurrence of these components in the original grammar. Another type of modification may take the form of adding new grammar components to the grammar of some semantic interpretations based on the analysis of the occurrence of these components in a select set of other semantic interpretations. Both modifications can be carried out automatically or offered for validation. Some benefits of the presented method and system are: reduced effort for building grammars, improved recognition accuracy, and automatic adaptation of dynamic grammars to the context.
    Type: Application
    Filed: May 1, 2009
    Publication date: November 26, 2009
    Applicant: Resolvity, Inc.
    Inventors: Jacek Jarmulak, Yevgeniy Lyudovyk
  • Publication number: 20090265170
    Abstract: An audio feature is extracted from audio signal data for each analysis frame and stored in a storage part. Then, the audio feature is read from the storage part, and an emotional state probability of the audio feature corresponding to an emotional state is calculated using one or more statistical models constructed based on previously input learning audio signal data. Then, based on the calculated emotional state probability, the emotional state of a section including the analysis frame is determined.
    Type: Application
    Filed: September 13, 2007
    Publication date: October 22, 2009
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Go Irie, Kouta Hidaka, Takashi Satou, Yukinobu Taniguchi, Shinya Nakajima
  • Publication number: 20090259465
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Application
    Filed: June 24, 2009
    Publication date: October 15, 2009
    Applicant: AT&T Corp.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
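The warping-factor search in 20090259465 is a grid search per speaker segment: warp the spectral data by each candidate factor, score it against the speech model, and keep the best factor. The grid and the `warp`/`model_score` stand-ins are assumptions:

```python
import numpy as np

def best_warping_factor(spectrum, warp, model_score,
                        factors=np.arange(0.88, 1.13, 0.02)):
    """warp(spectrum, a): frequency-warped spectral data (stand-in);
    model_score(warped): match against the speech model (stand-in)."""
    best, best_score = None, -np.inf
    for a in factors:
        s = model_score(warp(spectrum, a))
        if s > best_score:                 # closer match: keep this factor
            best, best_score = a, s
    return best
```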
  • Publication number: 20090259467
    Abstract: A voice recognition apparatus 10 carries out voice recognition of an inputted voice with reference to a voice recognition dictionary, and outputs a voice recognition result. In this voice recognition apparatus, a plurality of voice recognition dictionaries 23-1 to 23-N are provided according to predetermined classification items.
    Type: Application
    Filed: August 16, 2006
    Publication date: October 15, 2009
    Inventors: Yuki Sumiyoshi, Reiko Okada
  • Publication number: 20090222264
    Abstract: An augmented version of a Low-Complexity Sub-band Coder (LC-SBC) is described herein that is better suited than conventional LC-SBC for wideband voice communication in the Bluetooth™ framework, where minimizing the power consumption is of paramount importance. The augmented version of LC-SBC utilizes certain techniques associated with speech coding, such as Voice Activity Detection (VAD), to reduce bandwidth usage and power consumption while maintaining voice quality. The augmented version of LC-SBC reduces the average bit rate used for transmitting wideband speech in a manner that does not add significant computational complexity. Furthermore, the augmented version of LC-SBC may advantageously be implemented in a manner that does not require any modification of the underlying logic/structure of LC-SBC.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 3, 2009
    Applicant: BROADCOM CORPORATION
    Inventors: Laurent Pilati, Syavosh Zad-Issa
  • Publication number: 20090222266
    Abstract: A phoneme model clustering apparatus stores a classification condition of a phoneme context, generates a cluster by performing a clustering of context-dependent phoneme models having different acoustic characteristics of central phoneme for each model having a common central phoneme according to the classification condition, sets a conditional response for each cluster according to acoustic characteristics of context-dependent phoneme models included in the cluster, generates a set of clusters by performing a clustering on clusters according to the conditional response, and outputs the context-dependent phoneme models included in the set of clusters.
    Type: Application
    Filed: February 26, 2009
    Publication date: September 3, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Masaru Sakai
  • Publication number: 20090222262
    Abstract: Signal separation techniques based on frequency dependency are described. In one implementation, a blind signal separation process is provided that avoids the permutation problem of previous signal separation processes. In the process, two or more signal sources are provided, with each signal source having recognized frequency dependencies. The process uses these inter-frequency dependencies to more robustly separate the source signals. The process receives a set of mixed signal input signals, and samples each input signal using a rolling window process. The sampled data is transformed into the frequency domain, which provides channel inputs to the inter-frequency dependent separation process. Since frequency dependencies have been defined for each source, the process is able to use the frequency dependency to more accurately separate the signals.
    Type: Application
    Filed: March 1, 2006
    Publication date: September 3, 2009
    Applicant: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
    Inventors: Taesu Kim, Te-Won Lee
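The front end described in 20090222262 (rolling window, transform to the frequency domain, per-frequency channels) is a short-time Fourier transform. A minimal sketch, with window length and hop as illustrative choices:

```python
import numpy as np

def stft_channels(x, win=512, hop=256):
    """x: one mixed input signal. Returns an array of shape
    (frames, win // 2 + 1): one complex channel per frequency bin,
    which would feed the inter-frequency-dependent separation."""
    window = np.hanning(win)
    frames = [np.fft.rfft(window * x[i:i + win])
              for i in range(0, len(x) - win + 1, hop)]
    return np.array(frames)
```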
  • Publication number: 20090216526
    Abstract: A system enhances speech by detecting a speaker's utterance through a first microphone positioned a first distance from a source of interference. A second microphone may detect the speaker's utterance at a different position. A monitoring device may estimate the power level of a first microphone signal. A synthesizer may synthesize part of the first microphone signal by processing the second microphone signal. The synthesis may occur when power level is below a predetermined level.
    Type: Application
    Filed: November 12, 2008
    Publication date: August 27, 2009
    Inventors: Gerhard Uwe Schmidt, Mohamed Krini
  • Publication number: 20090209341
    Abstract: A gaming apparatus of the present invention comprises: a microphone; a speaker; a display; a memory storing text data for each language type; and a controller. The controller is programmed to conduct the processing of: (A) recognizing a language type from a sound inputted from the microphone by executing a language recognition program; (B) conducting a conversation with a player by recognizing a voice inputted from the microphone, in addition to outputting a voice from the speaker by executing a conversation program corresponding to the language recognized in the processing (A); and (C) displaying to the display a text based on text data corresponding to the language type recognized in the processing (A) according to progress of a game, the text data being read from the memory.
    Type: Application
    Filed: January 21, 2009
    Publication date: August 20, 2009
    Applicant: ARUZE GAMING AMERICA, INC.
    Inventor: Kazuo OKADA
  • Publication number: 20090198496
    Abstract: A dialogue system enabling a natural language interaction between a user and a machine having a script interpreter capable of executing dialogue specifications formed according to the rules of an aspect oriented programming language. The script interpreter further contains an advice executor which operates in a weaver type fashion using an appropriately defined select function to determine at most one advice to be executed at join points identified by pointcuts.
    Type: Application
    Filed: February 2, 2009
    Publication date: August 6, 2009
    Inventor: Matthias Denecke
  • Publication number: 20090192795
    Abstract: A steering wheel system for a vehicle. The steering wheel system includes a first microphone mounted in a steering wheel and a second microphone mounted in the vehicle. The first and second microphones are each configured to receive an audible input. The audible input includes an oral command component and a noise component. The steering wheel system also includes a controller configured to identify the noise component by determining that the noise component received at the first microphone is out of phase with the noise component received at the second microphone. The controller is configured to cancel the noise component from the audible input.
    Type: Application
    Filed: November 12, 2008
    Publication date: July 30, 2009
    Inventor: Leonard Cech
  • Publication number: 20090157404
    Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
    Type: Application
    Filed: December 17, 2007
    Publication date: June 18, 2009
    Applicant: Verizon Business Network Services Inc.
    Inventor: Kevin W. BROWN
  • Publication number: 20090157392
    Abstract: The present invention discloses a solution for providing a phonetic representation for a content item along with the content item delivered to a speech enabled computing device. The phonetic representation can be specified in a manner that enables it to be added to a speech recognition grammar of the speech enabled computing device. Thus, the device can recognize speech commands involving the content item using the newly added phonetic representation. Current implementations of speech recognition systems of this type rely on internal generation of speech recognition data that is added to the speech recognition grammar. Generation of speech recognition data can, however, be resource intensive, which can be particularly problematic when the speech enabled device is resource limited. The disclosed solution offloads the task of providing the speech recognition data to an external device, such as a relatively resource rich server or a desktop device.
    Type: Application
    Filed: December 18, 2007
    Publication date: June 18, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Neal J. ALEWINE, Daniel E. BADT