Feature Extraction For Speech Recognition; Selection Of Recognition Unit (epo) Patents (Class 704/E15.004)
  • Publication number: 20110071826
    Abstract: A method and apparatus for ordering results from a query is provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example, a set of search strings may be created from the n-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings is sent to at least one search engine, and search results are obtained. The search results are then rearranged or reordered based on a semantic similarity between the search results and the word lattice.
    Type: Application
    Filed: September 23, 2009
    Publication date: March 24, 2011
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Harry M. Bliss
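The n-gram-to-search-string step in 20110071826 is concrete enough to sketch. The snippet below builds unigram and bigram search strings from a toy word lattice, orders them by confidence, and truncates the list; the lattice representation, the min-confidence scoring for bigrams, and the cutoff are illustrative assumptions, not details from the patent.

```python
# Minimal sketch of query building per 20110071826: unigram and bigram
# search strings from a word lattice, ordered by confidence, truncated.
# The lattice format and scoring below are illustrative assumptions.

def ngram_search_strings(lattice, max_strings=5):
    """lattice: list of (word, confidence) pairs along the best path."""
    candidates = [(conf, word) for word, conf in lattice]  # unigrams
    # Bigrams: adjacent words, scored by the weaker of the two confidences.
    for (w1, c1), (w2, c2) in zip(lattice, lattice[1:]):
        candidates.append((min(c1, c2), f"{w1} {w2}"))
    candidates.sort(reverse=True)          # highest confidence first
    return [s for _, s in candidates[:max_strings]]

print(ngram_search_strings([("cheap", 0.9), ("flights", 0.95), ("boston", 0.6)]))
```

The reordering stage would then score each returned result against the lattice, for example by cosine similarity between the result text and the lattice words in some semantic space.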
  • Publication number: 20110066434
    Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since we only find the F most similar unknown voices from m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 17, 2011
    Inventors: Tze-Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
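The lookup step in 20110066434 reads as a nearest-neighbor search over the m unknown-voice prototypes. A minimal sketch under that reading, with feature vectors and Euclidean distance as assumptions:

```python
import numpy as np

# Sketch of the category lookup in 20110066434: find the F unknown-voice
# prototypes most similar to the input pronunciation; the words of those
# F categories are then ranked. Features and distance are assumptions.

def top_f_categories(x, prototypes, f=3):
    """x: (d,) feature vector; prototypes: (m, d) unknown-voice vectors."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return np.argsort(dists)[:f]           # indices of the F nearest voices
```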
  • Publication number: 20110040561
    Abstract: A method for compensating intersession variability for automatic extraction of information from an input voice signal representing an utterance of a speaker includes: processing the input voice signal to provide feature vectors, each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector.
    Type: Application
    Filed: May 16, 2006
    Publication date: February 17, 2011
    Inventors: Claudio Vair, Daniele Colibro, Pietro Laface
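The abstract of 20110040561 does not state how the compensation vector and the extracted features are combined; the natural reading is a per-frame subtraction. A minimal sketch under that assumption:

```python
import numpy as np

def compensate(features, compensation):
    """features: (T, d) feature vectors, one row per time frame.
    compensation: (d,) intersession variability compensation vector.
    Assumes an additive distortion model, which the abstract leaves open."""
    return features - compensation
```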
  • Publication number: 20110029312
    Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system.
    Type: Application
    Filed: October 11, 2010
    Publication date: February 3, 2011
    Applicant: VOCOLLECT, INC.
    Inventors: Keith P. Braho, Jeffrey P. Pike, Lori A. Pike
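One plausible shape for the adaptation control in 20110029312 is to estimate an error rate from recognition confidences (no transcript needed) and gate adaptation on it. The estimator, threshold, and `adapt` call below are illustrative assumptions:

```python
# Sketch of transcript-free, error-rate-gated adaptation per 20110029312.
# Treating low-confidence recognitions as likely errors is an assumption;
# the patent only requires an error rate estimated without a transcript.

def estimate_error_rate(confidences, floor=0.5):
    errors = sum(1 for c in confidences if c < floor)
    return errors / max(len(confidences), 1)

def maybe_adapt(model, samples, confidences, threshold=0.2):
    if estimate_error_rate(confidences) > threshold:
        model.adapt(samples)               # hypothetical adaptation hook
```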
  • Publication number: 20110029306
    Abstract: An audio discriminating device includes a plurality of audio discriminators for discriminating an input audio signal as a speech signal or a non-speech signal by using at least one feature parameter, and determines whether to drive the audio discriminator connected next to a given audio discriminator according to that discriminator's discrimination result.
    Type: Application
    Filed: June 22, 2010
    Publication date: February 3, 2011
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Manho PARK, Sook Jin Lee, Jee Hwan Ahn
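The chained drive decision in 20110029306 resembles a classifier cascade. A minimal sketch, where each stage either decides confidently (and the next stage is never driven) or defers:

```python
# Cascade sketch per 20110029306: each discriminator uses one feature
# and decides whether the next discriminator runs at all. The stage
# interface and the deferral rule are illustrative assumptions.

def cascade_discriminate(frame, stages):
    """stages: callables returning (label, confident) for one feature."""
    label = "non-speech"
    for stage in stages:
        label, confident = stage(frame)
        if confident:                      # confident: don't drive the next
            return label
    return label                           # last stage's opinion stands
```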
  • Publication number: 20110029311
    Abstract: There is provided a voice processing device. The device includes: a score calculation unit configured to calculate a score indicating the compatibility of a voice signal, input on the basis of an utterance of a user, with each of plural pieces of intention information indicating each of a plurality of intentions; an intention selection unit configured to select, from among the plural pieces of intention information, the intention information indicating the intention of the utterance of the user, on the basis of the score calculated by the score calculation unit; and an intention reliability calculation unit configured to calculate the reliability of the intention information selected by the intention selection unit, on the basis of the score calculated by the score calculation unit.
    Type: Application
    Filed: June 17, 2010
    Publication date: February 3, 2011
    Applicant: Sony Corporation
    Inventors: Katsuki MINAMINO, Hitoshi Honda, Yoshinori Maeda, Hiroaki Ogawa
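A compact way to read 20110029311 is: score every intention, select the argmax, and derive a reliability from the score distribution. Using a softmax over the scores as the reliability is our assumption; the abstract only says the reliability is calculated from the scores.

```python
import math

def select_intention(scores):
    """scores: dict mapping intention -> compatibility score."""
    best = max(scores, key=scores.get)
    z = sum(math.exp(s) for s in scores.values())
    reliability = math.exp(scores[best]) / z   # softmax mass of the winner
    return best, reliability

print(select_intention({"play_music": 2.1, "set_alarm": 0.3, "weather": -0.5}))
```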
  • Publication number: 20110010171
    Abstract: A system and method for providing speech recognition functionality offers improved accuracy and robustness in noisy environments having multiple speakers. The described technique includes receiving speech energy and converting the received speech energy to a digitized form. The digitized speech energy is decomposed into features that are then projected into a feature space having multiple speaker subspaces. The projected features fall either into one of the multiple speaker subspaces or outside of all speaker subspaces. A speech recognition operation is performed on a selected one of the multiple speaker subspaces to resolve the utterance to a command or data.
    Type: Application
    Filed: July 7, 2009
    Publication date: January 13, 2011
    Applicant: General Motors Corporation
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
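The subspace assignment in 20110010171 can be sketched as: project the feature vector onto each speaker subspace and keep the subspace with the smallest reconstruction error, rejecting vectors that sit far from every subspace. Orthonormal bases and the rejection threshold are assumptions:

```python
import numpy as np

def assign_speaker(x, subspace_bases, reject_threshold=1.0):
    """subspace_bases: list of (d, k) orthonormal bases, one per speaker.
    Returns the index of the best-fitting subspace, or None if x falls
    outside all speaker subspaces (large residual everywhere)."""
    residuals = []
    for U in subspace_bases:
        projection = U @ (U.T @ x)         # project onto the subspace
        residuals.append(np.linalg.norm(x - projection))
    best = int(np.argmin(residuals))
    return best if residuals[best] < reject_threshold else None
```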
  • Publication number: 20110004624
    Abstract: A method, a system and a computer program product for enabling a customer response speech recognition unit to dynamically receive customer feedback. The customer response speech recognition unit is positioned at a customer location. The speech recognition unit is automatically initialized when one or more spoken words are detected. The response statements of customers are dynamically received by the customer response speech recognition unit at the customer location, in real time. The customer response speech recognition unit determines when the one or more spoken words of the customer response statement are associated with a score in a database. An analysis of the words is performed to generate a score that reflects the customer's evaluation of the subject. The score is dynamically updated as new evaluations are received, and the score is displayed within a graphical user interface (GUI) to be viewed by one or more potential customers.
    Type: Application
    Filed: July 2, 2009
    Publication date: January 6, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ravi P. Bansal, Mike V. Macias, Saidas T. Kottawar, Salil P. Gandhi, Sandip D. Mahajan
  • Publication number: 20100332228
    Abstract: According to some embodiments, a method and apparatus are provided to buffer N audio frames of a plurality of audio frames associated with an audio signal, pre-compute scores for a subset of context dependent models (CDMs), and perform a graphical model search associated with the N audio frames where a score of a context independent model (CIM) associated with a CDM is used in lieu of a score for the CDM when a score for the CDM is needed and has not been pre-computed.
    Type: Application
    Filed: June 25, 2009
    Publication date: December 30, 2010
    Inventors: Michael Eugene Deisher, Tao Ma
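The fallback rule in 20100332228 is simple to state in code: use the pre-computed context-dependent model (CDM) score when it exists, otherwise substitute the score of the corresponding context-independent model (CIM). The dictionary representation is an assumption:

```python
def model_score(cdm_id, cdm_scores, cim_scores, cdm_to_cim):
    """cdm_scores: pre-computed scores for a subset of CDMs;
    cim_scores: scores for all CIMs; cdm_to_cim: CDM -> parent CIM."""
    if cdm_id in cdm_scores:
        return cdm_scores[cdm_id]
    return cim_scores[cdm_to_cim[cdm_id]]  # CIM score stands in for the CDM
```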
  • Publication number: 20100312561
    Abstract: An apparatus and a method for performing a grounding process using a POMDP (Partially Observable Markov Decision Process) are provided. The configuration is designed so that, in order to understand a request from a user through the user's utterances, a grounding process is performed using the POMDP, in which analysis information acquired from a language analyzing unit that receives the utterances of the user and performs language analysis, together with pragmatic information including task feasibility information acquired from the task manager that performs a task, is set as observation information. Accordingly, understanding can be achieved efficiently, and high-speed and accurate recognition of the user request and task execution based on the user request can be provided.
    Type: Application
    Filed: December 4, 2008
    Publication date: December 9, 2010
    Inventor: Ugo Di Profio
  • Publication number: 20100312560
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Application
    Filed: June 9, 2009
    Publication date: December 9, 2010
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Alistair D. CONKIE, Ann K. SYRDAL
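The restructured model in 20100312560 scores each dictionary phoneme as a weighted sum over the acoustic models of the plausible phonemes, with the pronouncing dictionary itself unchanged. A minimal sketch; the weights would come from the transcription lattice, and `base_score` is a stand-in for the native acoustic models:

```python
def restructured_score(frame, weights, base_score):
    """weights: dict plausible_phoneme -> weight (summing to 1);
    base_score(frame, phoneme): likelihood under the native model."""
    return sum(w * base_score(frame, ph) for ph, w in weights.items())
```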
  • Publication number: 20100292987
    Abstract: A circuit startup method utilizing utterance estimation in a speech processing system including a sound collecting device is provided. The circuit startup method includes a subset power supply step of supplying power to the sound collecting device and a signal processing circuit, and a sound collecting step of inputting a sound from the sound collecting device through the signal processing circuit. The circuit startup method further includes an utterance estimation step of estimating whether or not speech is contained in the inputted sound, and a power supply step of supplying power to the speech processing circuit for an utterance interval when the estimation result of the utterance estimation step indicates that speech is contained.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 18, 2010
    Inventors: Hiroshi Kawaguchi, Masahiko Yoshimoto, Hiroki Noguchi, Tomoya Takagi
  • Publication number: 20100286985
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Application
    Filed: July 19, 2010
    Publication date: November 11, 2010
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
  • Publication number: 20100280983
    Abstract: Disclosed are an apparatus and method of deducing a user's intention using multimodal information. The user's intention deduction apparatus includes a first predictor to predict a part of a user's intention using at least one piece of motion information, and a second predictor to predict the user's intention using the predicted part of the user's intention and multimodal information received from at least one multimodal sensor.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 4, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jeong-Mi CHO, Jeong-Su KIM, Won-Chul BANG, Nam-Hoon KIM
  • Publication number: 20100278377
    Abstract: The invention relates to a method for electronically evaluating a dialogue between at least two persons, comprising receiving audio data, analysing the audio data to determine the repartition of utterances between the at least two persons in the course of the dialogue, and comparing the results of the analysis with predetermined communication patterns.
    Type: Application
    Filed: June 25, 2008
    Publication date: November 4, 2010
    Applicant: Zero to One Technology
    Inventors: Philippe Hamel, Jean-Paul Audrain, Pierre-Sylvain Luquet, Eric Faurot
  • Publication number: 20100280827
    Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
    Type: Application
    Filed: April 30, 2009
    Publication date: November 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
  • Publication number: 20100278505
    Abstract: The present disclosure provides a multi-media data editing system for editing multi-media data. The multi-media data editing system includes a sample memory and a processor. The sample memory stores a plurality of undesired voice samples. The processor includes a voice obtaining module, a voice comparing module, and a voice editing module. The voice obtaining module is configured for obtaining audio data from the multi-media data. The voice comparing module is configured for comparing the obtained audio data with the plurality of undesired voice samples to look for a match. The voice editing module is configured for editing the audio data when the audio data matches one of the undesired voice samples. The present disclosure also provides a multi-media data editing method, and an electronic device using the multi-media data editing system.
    Type: Application
    Filed: December 18, 2009
    Publication date: November 4, 2010
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventor: CHUAN-FENG WU
  • Publication number: 20100268533
    Abstract: A speech detection apparatus and method are provided. The speech detection apparatus and method determine whether a frame is speech or not using feature information extracted from an input signal. The speech detection apparatus may estimate a situation related to an input frame and determine which feature information is required for speech detection for the input frame in the estimated situation. The speech detection apparatus may detect a speech signal using dynamic feature information that may be more suitable to the situation of a particular frame, instead of using the same feature information for each and every frame.
    Type: Application
    Filed: April 16, 2010
    Publication date: October 21, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Chi-youn PARK, Nam-hoon Kim, Jeong-mi Cho
  • Publication number: 20100262423
    Abstract: Described is a technology by which a feature compensation approach to speech recognition uses a high-order vector Taylor series (HOVTS) approximation of a model of distortions to improve recognition accuracy. Speech recognizer models trained with clean speech degrade when later dealing with speech that is corrupted by additive noises and convolutional distortions. The approach attempts to remove any such noise/distortions from the input speech. To use the HOVTS approximation, a Gaussian mixture model is trained and used to convert cepstral domain feature vectors to log spectrum components. HOVTS computes statistics for the components, which are transformed back to the cepstral domain. A noise/distortion estimate is obtained, and used to provide a clean speech estimate to the recognizer.
    Type: Application
    Filed: April 13, 2009
    Publication date: October 14, 2010
    Applicant: Microsoft Corporation
    Inventors: Qiang Huo, Jun Du
  • Publication number: 20100250241
    Abstract: The invention provides a dialogue-based learning apparatus for learning through dialogue with users, comprising: a speech input unit (10) for inputting speeches; a speech recognition unit (20) for recognizing the input speech; and a behavior and dialogue controller (30) for controlling behaviors and dialogues according to speech recognition results, wherein the behavior and dialogue controller (30) has a topic recognition expert (34) to memorise contents of utterances and to retrieve the topic that best matches the speech recognition results, and a mode switching expert (35) to control mode switching in accordance with a user utterance, wherein the mode switching expert switches modes in accordance with a user utterance, and wherein the topic recognition expert registers a plurality of words in the utterance as topics in a first mode, and performs searches among the registered topics and selects the maximum-likelihood topic in a second mode.
    Type: Application
    Filed: August 29, 2008
    Publication date: September 30, 2010
    Inventors: Naoto Iwahashi, Noriyuki Kimura, Mikio Nakano, Kotaro Funakoshi
  • Publication number: 20100223285
    Abstract: A critical test results management system and method for capturing test data from a test results providing program and delivering messages to interested recipients. The system and method generate alerts, escalate the alerts to message receiving devices, and track the status of the alerts. The recipient of an alert can access the system to obtain the contents of the alert, acknowledge receipt of the alert, and record notes related to the alert. The invention tracks when alerts were sent and whether they have been acknowledged. If the alerts are not acknowledged prior to expiration of a predetermined time period, further alerts are escalated to the recipient, to different receiving devices, or to different recipients.
    Type: Application
    Filed: April 28, 2010
    Publication date: September 2, 2010
    Inventor: Brian Biddulph-Krentar
  • Publication number: 20100217592
    Abstract: The present invention provides a method for identifying a turn, such as a sentence or phrase, for addition to a platform dialog comprising a plurality of turns. Lexical features of each of a set of candidate turns relative to one or more turns in the platform dialog are determined. Semantic features associated with each candidate turn and associated with the platform dialog are determined to identify one or more topics associated with each candidate turn and with the platform dialog. Lexical features of each candidate turn are compared to lexical features of the platform dialog and semantic features associated with each candidate turn are compared to semantic features of the platform dialog to rank the candidate turns based on similarity of lexical features and semantic features of each candidate turn to lexical features and semantic features of the platform dialog.
    Type: Application
    Filed: October 14, 2009
    Publication date: August 26, 2010
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Rakesh Gupta, Lev-Arie Ratinov
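The ranking in 20100217592 combines lexical and semantic comparisons. The sketch below mixes a Jaccard word overlap with a histogram intersection of topic distributions; both similarity choices and the mixing weight are assumptions, since the abstract does not fix them:

```python
def rank_turns(candidates, dialog_words, dialog_topics, alpha=0.5):
    """candidates: list of (turn_words: set, turn_topics: dict)."""
    def jaccard(a, b):
        return len(a & b) / max(len(a | b), 1)
    def topic_sim(t1, t2):                 # histogram intersection
        return sum(min(v, t2.get(k, 0.0)) for k, v in t1.items())
    def score(cand):
        words, topics = cand
        return (alpha * jaccard(words, dialog_words)
                + (1 - alpha) * topic_sim(topics, dialog_topics))
    return sorted(candidates, key=score, reverse=True)
```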
  • Publication number: 20100209003
    Abstract: In one embodiment, a method includes obtaining a target template and processing the target template to identify at least one component of the target template. The method also includes searching at least one collection of content to identify at least a first instance of content that substantially matches the component of the target template. The first instance of content is presented as substantially matching the component. Finally, a first arrangement that includes the first instance of content is created. Such a first arrangement is associated with a mash-up related to the target template.
    Type: Application
    Filed: February 16, 2009
    Publication date: August 19, 2010
    Applicant: Cisco Technology, Inc.
    Inventors: John Toebes, Glenn Thomas Millican, III
  • Publication number: 20100185446
    Abstract: A speech recognition system installed in a terminal coupled to a server via a network is provided. The terminal holds map data including a landmark. The speech recognition system manages recognition data including a word corresponding to a name of the landmark, and sends update area information and an updated time to the server. When recognition data of the area of the update area information sent from the terminal has been changed after the updated time, the server generates difference data between the latest recognition data and the recognition data of the update area information as of the updated time, and sends the generated difference data and map data of the update area information to the terminal. The terminal updates the map data based on the map data sent from the server. The speech recognition system updates the recognition data managed by the terminal based on the difference data.
    Type: Application
    Filed: January 20, 2010
    Publication date: July 22, 2010
    Inventors: Takeshi HOMMA, Hiroaki Kokubo, Akinori Asahara, Hisashi Takahashi
  • Publication number: 20100174533
    Abstract: Techniques are described for automatically measuring fluency of a patient's speech based on prosodic characteristics thereof. The prosodic characteristics may include statistics regarding silent pauses, filled pauses, repetitions, or fundamental frequency of the patient's speech. The statistics may include a count, average number of occurrences, duration, average duration, frequency of occurrence, standard deviation, or other statistics. In one embodiment, a method includes receiving an audio sample that includes speech of a patient, analyzing the audio sample to identify prosodic characteristics of the speech of the patient, and automatically measuring fluency of the speech of the patient based on the prosodic characteristics. These techniques may present several advantages, such as objectively measuring fluency of a patient's speech without requiring a manual transcription or other manual intervention in the analysis process.
    Type: Application
    Filed: January 5, 2010
    Publication date: July 8, 2010
    Applicant: Regents of the University of Minnesota
    Inventor: Serguei V.S. Pakhomov
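The pause statistics named in 20100174533 (count, average duration, frequency of occurrence, standard deviation) are straightforward once pause intervals are available. The sketch assumes a pause detector has already produced (start, end) intervals:

```python
from statistics import mean, pstdev

def pause_stats(pauses, total_seconds):
    """pauses: list of (start, end) silent-pause intervals in seconds."""
    durations = [end - start for start, end in pauses]
    return {
        "count": len(durations),
        "mean_duration": mean(durations) if durations else 0.0,
        "stdev_duration": pstdev(durations) if durations else 0.0,
        "pauses_per_minute": 60.0 * len(durations) / total_seconds,
    }
```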
  • Publication number: 20100161328
    Abstract: Embodiments are provided for utilizing a client-side cache for utterance processing to facilitate network based speech recognition. An utterance comprising a query is received in a client computing device. The query is sent from the client to a network server for results processing. The utterance is processed to determine a speech profile. A cache lookup is performed based on the speech profile to determine whether results data for the query is stored in the cache. If the results data is stored in the cache, then a query is sent to cancel the results processing on the network server and the cached results data is displayed on the client computing device.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 24, 2010
    Applicant: Microsoft Corporation
    Inventors: Andrew K. Krumel, Shuangyu Chang, Robert L. Chambers
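The client-side flow in 20100161328 can be sketched as: submit the query to the server, compute a speech profile, and cancel the in-flight server request on a cache hit. The `compute_profile` function and the server submit/cancel/wait interface are hypothetical:

```python
def handle_utterance(utterance, cache, server):
    request_id = server.submit(utterance)   # hypothetical async submit
    profile = compute_profile(utterance)    # hypothetical speech profiler
    if profile in cache:
        server.cancel(request_id)           # skip network results processing
        return cache[profile]               # display cached results instead
    results = server.wait(request_id)
    cache[profile] = results
    return results
```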
  • Publication number: 20100100381
    Abstract: The present invention relates to a system and method for automatically verifying that a message received from a user is intelligible. In an exemplary embodiment, a message is received from the user. A speech level of the user's message may be measured and compared to a pre-determined speech level threshold to determine whether the measured speech level is below the pre-determined speech level threshold. A signal-to-noise ratio of the user's message may be measured and compared to a pre-determined signal-to-noise ratio threshold to determine whether the measured signal-to-noise ratio of the message is below the pre-determined signal-to-noise ratio threshold. An estimate of intelligibility for the user's message may be calculated and compared to an intelligibility threshold to determine whether the calculated estimate of intelligibility is below the intelligibility threshold.
    Type: Application
    Filed: December 22, 2009
    Publication date: April 22, 2010
    Applicant: AT&T Corp.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
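The three checks in 20100100381 are each a comparison against a pre-determined threshold. The threshold values below are illustrative; the patent only requires that each measurement has its own threshold:

```python
def message_intelligible(level_db, snr_db, intelligibility,
                         level_min=-30.0, snr_min=10.0, intel_min=0.7):
    """Returns (ok, reason). All threshold values are assumptions."""
    if level_db < level_min:
        return False, "speech level below threshold"
    if snr_db < snr_min:
        return False, "signal-to-noise ratio below threshold"
    if intelligibility < intel_min:
        return False, "estimated intelligibility below threshold"
    return True, "ok"
```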
  • Publication number: 20100094625
    Abstract: A system and method are disclosed for noise level/spectrum estimation and speech activity detection. Some embodiments include a probabilistic model to estimate noise level and subsequently detect the presence of speech. These embodiments outperform standard voice activity detectors (VADs), producing improved detection in a variety of noisy environments.
    Type: Application
    Filed: October 14, 2009
    Publication date: April 15, 2010
    Applicant: QUALCOMM Incorporated
    Inventors: Asif I. Mohammad, Dinesh Ramakrishnan
  • Publication number: 20100088098
    Abstract: A speech recognition apparatus includes a speech collating unit that calculates similarities at each time between a feature amount converted by a speech analyzing unit and a word model generated by a word model generating unit. The speech collating unit extracts, from the word models generated by the word model generating unit, a word model whose minimum similarity among the similarities at each time, or whose overall similarity obtained from the similarities at each time, satisfies a second threshold value condition, and whose similarity at each time, in a section of the vocalization of the utterance corresponding to a phoneme or phoneme string associated with a first threshold value condition, satisfies that first threshold value condition; it then outputs the recognized word corresponding to the extracted word model as a recognition result.
    Type: Application
    Filed: December 9, 2009
    Publication date: April 8, 2010
    Applicant: FUJITSU LIMITED
    Inventor: Shouji Harada
  • Publication number: 20100061529
    Abstract: An interactive voice and data response system that directs input to a voice, text, and web-capable software-based router, which is able to intelligently respond to the input by drawing on a combination of human agents, advanced speech recognition and expert systems, connected to the router via a TCP/IP network. The digitized input is broken down into components so that the customer interaction is managed as a series of small tasks performed by a pool of human agents, rather than one ongoing conversation between the customer and a single agent. The router manages the interactions and keeps pace with a real-time conversation. The system utilizes both speech recognition and human intelligence for purposes of interpreting customer utterances or customer text, where the role of the human agent(s) is to input the intent of caller utterances, and where the computer system—not the human agent—determines which response to provide given the customer's stated intent (as interpreted/captured by the human agents).
    Type: Application
    Filed: September 1, 2009
    Publication date: March 11, 2010
    Applicant: INTERACTIONS CORPORATION
    Inventor: Michael Eric Cloran
  • Publication number: 20100063880
    Abstract: A method of providing information including providing a communication session of at least one of audio and video media and applying automatic recognition to media transferred on the communication session. An advertisement is selected by a processor based on the automatic recognition, and non-advertisement information is selected by the processor responsive to the automatic recognition. The selected advertisements and the selected non-advertisement information are presented during the communication session.
    Type: Application
    Filed: March 13, 2009
    Publication date: March 11, 2010
    Inventors: Alon Atsmon, Saar Shai
  • Publication number: 20100057462
    Abstract: The present invention relates to a method for speech recognition of a speech signal comprising the steps of providing at least one codebook comprising codebook entries, in particular, multivariate Gaussians of feature vectors, that are frequency weighted such that higher weights are assigned to entries corresponding to frequencies below a predetermined level than to entries corresponding to frequencies above the predetermined level and processing the speech signal for speech recognition comprising extracting at least one feature vector from the speech signal and matching the feature vector with the entries of the codebook.
    Type: Application
    Filed: September 2, 2009
    Publication date: March 4, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Tobias Herbig, Martin Raab, Raymond Brueckner, Rainer Gruhn
  • Publication number: 20100049521
    Abstract: A method for processing speech audio in a network connected client device can include selecting a speech grammar for use in a speech recognition system in the network connected client device; characterizing the selected speech grammar; and, based on the characterization, determining whether to process the speech grammar locally in the network connected client device, or remotely in a speech server in the network. In one aspect of the invention, the selecting step can include establishing a communications session with a speech server; and, querying the speech server for a speech grammar over the established communications session. Additionally, the selecting step can further include registering the speech grammar in the speech recognition system.
    Type: Application
    Filed: October 26, 2009
    Publication date: February 25, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Harvey Ruback, Steven Woodward
  • Publication number: 20100036660
    Abstract: A prosody analyzer enhances the interpretation of natural language utterances. The analyzer is distributed over a client/server architecture, so that the scope of emotion recognition processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, client loads etc. The partially processed prosodic data can be sent separately or combined with other speech data from the client device and streamed to a server for a real-time response. Training of the prosody analyzer with real world expected responses improves emotion modeling and the real-time identification of potential features such as emphasis, intent, attitude and semantic meaning in the speaker's utterances.
    Type: Application
    Filed: October 14, 2009
    Publication date: February 11, 2010
    Applicant: PHOENIX SOLUTIONS, INC.
    Inventor: Ian M. Bennett
  • Publication number: 20100004932
    Abstract: A speech recognition system includes the following: a feature calculating unit; a sound level calculating unit that calculates an input sound level in each frame; a decoding unit that matches the feature of each frame with an acoustic model and a linguistic model, and outputs a recognized word sequence; a start-point detector that determines a start frame of a speech section based on a reference value; an end-point detector that determines an end frame of the speech section based on a reference value; and a reference value updating unit that updates the reference value in accordance with variations in the input sound level. The start-point detector updates the start frame every time the reference value is updated. The decoding unit starts matching before being notified of the end frame and corrects the matching results every time it is notified of the start frame. The speech recognition system can suppress a delay in response time while performing speech recognition based on a proper speech section.
    Type: Application
    Filed: September 11, 2009
    Publication date: January 7, 2010
    Applicant: FUJITSU LIMITED
    Inventors: Nobuyuki Washio, Shouji Harada
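The adaptive reference in 20100004932 can be sketched with an exponentially tracked sound level and a margin. The patent further revises the start frame on every reference update and has the decoder correct its partial matches, which this sketch omits; the update rule and constants are assumptions:

```python
def detect_speech_section(levels, margin=10.0, alpha=0.95):
    """levels: per-frame input sound levels (dB). Returns (start, end)."""
    reference = levels[0]
    start = end = None
    for t, level in enumerate(levels):
        reference = alpha * reference + (1 - alpha) * level  # track floor
        if level > reference + margin:     # frame exceeds reference: speech
            if start is None:
                start = t
            end = t
    return start, end
```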
  • Publication number: 20090316862
    Abstract: An object of the present invention is to provide an information processing terminal that identifies emotions from a voice and audibly outputs music suited to the identified emotions, enabling the emotions of the speaker who uttered the voice to be recognized readily. In an information processing terminal according to the present invention, an emotion inferring unit 23 detects, from sound information, at least two emotions of the utterer of a voice included in the sound information; a music data generating unit 24 synthesizes music data stored in a music parts database 242 and corresponding to the emotions detected by the emotion inferring unit 23; and a controller 22 reproduces the music data generated by the music data generating unit 24.
    Type: Application
    Filed: September 6, 2007
    Publication date: December 24, 2009
    Applicant: PANASONIC CORPORATION
    Inventors: Tetsurou Sugimoto, Yusuke Satoh, Tomoko Obama, Hideaki Matsuo
  • Publication number: 20090299741
    Abstract: A computer-driven device assists a user in self-regulating speech control of the device. The device processes an input signal representing human speech to compute acoustic signal quality indicators indicating conditions likely to be problematic to speech recognition, and advises the user of those conditions.
    Type: Application
    Filed: April 3, 2007
    Publication date: December 3, 2009
    Inventors: Naren Chittar, Vikas Gulati, Matthew Pratt, Harry Printz
  • Publication number: 20090292530
    Abstract: The method and system for modification of grammars presented in this invention applies to automatic speech recognition systems which take a spoken utterance as input and use a grammar to assign word sequence(s) and, possibly, one or more semantic interpretations to that utterance. One type of modification may take the form of reducing the importance of select grammar components based on the analysis of the occurrence of these components in the original grammar. Another type of modification may take the form of adding new grammar components to the grammar of some semantic interpretations based on the analysis of the occurrence of these components in a select set of other semantic interpretations. Both modifications can be carried out automatically or offered for validation. Some benefits of the presented method and system are: reduced effort for building grammars, improved recognition accuracy, and automatic adaptation of dynamic grammars to the context.
    Type: Application
    Filed: May 1, 2009
    Publication date: November 26, 2009
    Applicant: Resolvity, Inc.
    Inventors: Jacek Jarmulak, Yevgeniy Lyudovyk
  • Publication number: 20090265170
    Abstract: An audio feature is extracted from audio signal data for each analysis frame and stored in a storage part. Then, the audio feature is read from the storage part, and an emotional state probability of the audio feature corresponding to an emotional state is calculated using one or more statistical models constructed based on previously input learning audio signal data. Then, based on the calculated emotional state probability, the emotional state of a section including the analysis frame is determined.
    Type: Application
    Filed: September 13, 2007
    Publication date: October 22, 2009
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Go Irie, Kouta Hidaka, Takashi Satou, Yukinobu Taniguchi, Shinya Nakajima
  • Publication number: 20090259465
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Application
    Filed: June 24, 2009
    Publication date: October 15, 2009
    Applicant: AT&T Corp.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
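The warping-factor search in 20090259465 is a grid search per speaker segment: warp the spectral data by each candidate factor, score it against the speech model, and keep the best factor. The grid and the `warp`/`model_score` stand-ins are assumptions:

```python
import numpy as np

def best_warping_factor(spectrum, warp, model_score,
                        factors=np.arange(0.88, 1.13, 0.02)):
    """warp(spectrum, a): frequency-warped spectral data (stand-in);
    model_score(warped): match against the speech model (stand-in)."""
    best, best_score = None, -np.inf
    for a in factors:
        s = model_score(warp(spectrum, a))
        if s > best_score:                 # closer match: keep this factor
            best, best_score = a, s
    return best
```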
  • Publication number: 20090259467
    Abstract: A voice recognition apparatus 10 carries out voice recognition of an inputted voice with reference to a voice recognition dictionary, and outputs a voice recognition result. In this voice recognition apparatus, a plurality of voice recognition dictionaries 23-1 to 23-N are provided according to predetermined classification items.
    Type: Application
    Filed: August 16, 2006
    Publication date: October 15, 2009
    Inventors: Yuki Sumiyoshi, Reiko Okada
  • Publication number: 20090222264
    Abstract: An augmented version of a Low-Complexity Sub-band Coder (LC-SBC) is described herein that is better suited than conventional LC-SBC for wideband voice communication in the Bluetooth™ framework, where minimizing the power consumption is of paramount importance. The augmented version of LC-SBC utilizes certain techniques associated with speech coding, such as Voice Activity Detection (VAD), to reduce bandwidth usage and power consumption while maintaining voice quality. The augmented version of LC-SBC reduces the average bit rate used for transmitting wideband speech in a manner that does not add significant computational complexity. Furthermore, the augmented version of LC-SBC may advantageously be implemented in a manner that does not require any modification of the underlying logic/structure of LC-SBC.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 3, 2009
    Applicant: BROADCOM CORPORATION
    Inventors: Laurent Pilati, Syavosh Zad-Issa
  • Publication number: 20090222266
    Abstract: A phoneme model clustering apparatus stores a classification condition of a phoneme context, generates a cluster by performing a clustering of context-dependent phoneme models having different acoustic characteristics of central phoneme for each model having a common central phoneme according to the classification condition, sets a conditional response for each cluster according to acoustic characteristics of context-dependent phoneme models included in the cluster, generates a set of clusters by performing a clustering on clusters according to the conditional response, and outputs the context-dependent phoneme models included in the set of clusters.
    Type: Application
    Filed: February 26, 2009
    Publication date: September 3, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Masaru Sakai
  • Publication number: 20090222262
    Abstract: Signal separation techniques based on frequency dependency are described. In one implementation, a blind signal separation process is provided that avoids the permutation problem of previous signal separation processes. In the process, two or more signal sources are provided, with each signal source having recognized frequency dependencies. The process uses these inter-frequency dependencies to more robustly separate the source signals. The process receives a set of mixed signal input signals, and samples each input signal using a rolling window process. The sampled data is transformed into the frequency domain, which provides channel inputs to the inter-frequency dependent separation process. Since frequency dependencies have been defined for each source, the process is able to use the frequency dependency to more accurately separate the signals.
    Type: Application
    Filed: March 1, 2006
    Publication date: September 3, 2009
    Applicant: THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
    Inventors: Taesu Kim, Te-Won Lee
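The front end described in 20090222262 (rolling window, transform to the frequency domain, per-frequency channels) is a short-time Fourier transform. A minimal sketch, with window length and hop as illustrative choices:

```python
import numpy as np

def stft_channels(x, win=512, hop=256):
    """x: one mixed input signal. Returns an array of shape
    (frames, win // 2 + 1): one complex channel per frequency bin,
    which would feed the inter-frequency-dependent separation."""
    window = np.hanning(win)
    frames = [np.fft.rfft(window * x[i:i + win])
              for i in range(0, len(x) - win + 1, hop)]
    return np.array(frames)
```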
  • Publication number: 20090216526
    Abstract: A system enhances speech by detecting a speaker's utterance through a first microphone positioned a first distance from a source of interference. A second microphone may detect the speaker's utterance at a different position. A monitoring device may estimate the power level of a first microphone signal. A synthesizer may synthesize part of the first microphone signal by processing the second microphone signal. The synthesis may occur when power level is below a predetermined level.
    Type: Application
    Filed: November 12, 2008
    Publication date: August 27, 2009
    Inventors: Gerhard Uwe Schmidt, Mohamed Krini
  • Publication number: 20090209341
    Abstract: A gaming apparatus of the present invention comprises: a microphone; a speaker; a display; a memory storing text data for each language type; and a controller. The controller is programmed to conduct the processing of: (A) recognizing a language type from a sound inputted from the microphone by executing a language recognition program; (B) conducting a conversation with a player by recognizing a voice inputted from the microphone, in addition to outputting a voice from the speaker by executing a conversation program corresponding to the language recognized in the processing (A); and (C) displaying to the display a text based on text data corresponding to the language type recognized in the processing (A) according to progress of a game, the text data being read from the memory.
    Type: Application
    Filed: January 21, 2009
    Publication date: August 20, 2009
    Applicant: ARUZE GAMING AMERICA, INC.
    Inventor: Kazuo OKADA
  • Publication number: 20090198496
    Abstract: A dialogue system enabling a natural language interaction between a user and a machine having a script interpreter capable of executing dialogue specifications formed according to the rules of an aspect oriented programming language. The script interpreter further contains an advice executor which operates in a weaver type fashion using an appropriately defined select function to determine at most one advice to be executed at join points identified by pointcuts.
    Type: Application
    Filed: February 2, 2009
    Publication date: August 6, 2009
    Inventor: Matthias Denecke
  • Publication number: 20090192795
    Abstract: A steering wheel system for a vehicle. The steering wheel system includes a first microphone mounted in a steering wheel and a second microphone mounted in the vehicle. The first and second microphones are each configured to receive an audible input. The audible input includes an oral command component and a noise component. The steering wheel system also includes a controller configured to identify the noise component by determining that the noise component received at the first microphone is out of phase with the noise component received at the second microphone. The controller is configured to cancel the noise component from the audible input.
    Type: Application
    Filed: November 12, 2008
    Publication date: July 30, 2009
    Inventor: Leonard Cech
  • Publication number: 20090157404
    Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
    Type: Application
    Filed: December 17, 2007
    Publication date: June 18, 2009
    Applicant: Verizon Business Network Services Inc.
    Inventor: Kevin W. BROWN
  • Publication number: 20090157392
    Abstract: The present invention discloses a solution for providing a phonetic representation for a content item along with the content item delivered to a speech enabled computing device. The phonetic representation can be specified in a manner that enables it to be added to a speech recognition grammar of the speech enabled computing device. Thus, the device can recognize speech commands involving the content item using the newly added phonetic representation. Current implementations of speech recognition systems of this type rely on internal generation of speech recognition data that is added to the speech recognition grammar. Generation of speech recognition data can, however, be resource intensive, which can be particularly problematic when the speech enabled device is resource limited. The disclosed solution offloads the task of providing the speech recognition data to an external device, such as a relatively resource rich server or a desktop device.
    Type: Application
    Filed: December 18, 2007
    Publication date: June 18, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Neal J. ALEWINE, Daniel E. BADT