Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20120310646
    Abstract: A speech recognition device and a speech recognition method thereof are disclosed. In the speech recognition method, a key phrase containing at least one key word is received. The speech recognition method comprises the steps of: receiving a sound source signal of a key word and generating a plurality of audio signals; transforming the audio signals into a plurality of frequency signals; receiving the frequency signals to obtain a space-frequency spectrum and an angular estimation value thereof; receiving the space-frequency spectrum to define and output at least one spatial eigenparameter and, using the angular estimation value and the frequency signals, performing spotting and evaluation and outputting a Bhattacharyya distance; and receiving the spatial eigenparameter and the Bhattacharyya distance and using corresponding thresholds to determine the correctness of the key phrase. Thereby, this invention robustly achieves a high speech recognition rate under very low SNR conditions.
    Type: Application
    Filed: July 7, 2011
    Publication date: December 6, 2012
    Applicant: NATIONAL CHIAO TUNG UNIVERSITY
    Inventors: JWU-SHENG HU, MING-TANG LEE, TING-CHAO WANG, CHIA HSIN YANG
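The entry above gates keyword acceptance on a Bhattacharyya distance compared against a threshold. As an illustrative sketch only (not the patent's actual computation), the closed-form Bhattacharyya distance between two univariate Gaussian score distributions can be computed and thresholded like this; `accept_keyword` and its default threshold are hypothetical:

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between two univariate Gaussians."""
    term1 = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    term2 = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term1 + term2

def accept_keyword(mu_obs, var_obs, mu_ref, var_ref, threshold=0.5):
    """Accept the key word if the observed score distribution is close
    enough to the reference model (hypothetical decision rule)."""
    return bhattacharyya_gaussian(mu_obs, var_obs, mu_ref, var_ref) < threshold
```

The distance is zero for identical distributions and grows as the means separate or the variances diverge, which is what makes it usable as a verification score.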
  • Publication number: 20120303365
    Abstract: Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio.
    Type: Application
    Filed: November 23, 2011
    Publication date: November 29, 2012
    Inventors: Michael Finke, Detlef Koll
  • Publication number: 20120303370
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Application
    Filed: May 31, 2012
    Publication date: November 29, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Publication number: 20120296645
    Abstract: A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition.
    Type: Application
    Filed: August 1, 2012
    Publication date: November 22, 2012
    Inventors: Eric Carraux, Detlef Koll
  • Publication number: 20120296651
    Abstract: Methods and system for authenticating a user are disclosed. The present invention includes accessing a collection of personal information related to the user. The present invention also includes performing an authentication operation that is based on the collection of personal information. The authentication operation incorporates at least one dynamic component and prompts the user to give an audible utterance. The audible utterance is compared to a stored voiceprint.
    Type: Application
    Filed: July 26, 2012
    Publication date: November 22, 2012
    Applicant: MICROSOFT CORPORATION
    Inventor: Kuansan Wang
  • Publication number: 20120296648
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Application
    Filed: July 30, 2012
    Publication date: November 22, 2012
    Applicant: AT&T Corp.
    Inventors: Mehryar Mohri, Michael Dennis Riley
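A simple way to see the N-best problem the entry above addresses: best-first search over a weighted automaton with a priority queue yields the N lowest-cost paths. The sketch below (integer states, tropical weights, all names illustrative) returns N-best paths; note that distinct paths may carry the same label string, which is exactly why the patent determinizes on the fly to obtain N-best strings rather than N-best paths:

```python
import heapq

def n_best_paths(arcs, start, finals, n):
    """Return up to n lowest-cost (cost, label-sequence) paths.
    arcs: dict mapping state -> list of (next_state, label, weight)."""
    heap = [(0.0, start, ())]
    visits = {}
    results = []
    while heap and len(results) < n:
        cost, state, labels = heapq.heappop(heap)
        # Each state needs to be expanded at most n times for n-best search.
        visits[state] = visits.get(state, 0) + 1
        if visits[state] > n:
            continue
        if state in finals:
            results.append((cost, labels))
        for nxt, label, w in arcs.get(state, []):
            heapq.heappush(heap, (cost + w, nxt, labels + (label,)))
    return results
```

The potentials mentioned in the abstract serve the same role as an A*-style heuristic: they let the search prune expansions that cannot reach a final state cheaply enough, so only the needed portion of the automaton is explored.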
  • Publication number: 20120296644
    Abstract: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.
    Type: Application
    Filed: August 1, 2012
    Publication date: November 22, 2012
    Inventor: Detlef Koll
  • Publication number: 20120296652
    Abstract: A method for obtaining information on an audio video program being presented on a consumer electronics (CE) device includes receiving at the CE device a viewer command to recognize the audio video program being presented on the CE device. The method also includes receiving signals from a microphone representative of audio from the audio video program as sensed by the microphone as the audio is played real time on the CE device. The method then includes executing voice recognition on the signals from the microphone to determine words in the audio from the audio video program as sensed by the microphone. Words are then uploaded to an Internet server, where they are correlated to at least one audio video script. The method then includes receiving back from the Internet server information correlated by the server using the words to the audio video program.
    Type: Application
    Filed: May 18, 2011
    Publication date: November 22, 2012
    Inventors: Seth Hill, Frederick J. Zustak
  • Publication number: 20120296638
    Abstract: In embodiments of the present invention, capabilities are described for understanding and responding to user intents and questions quickly, wherein the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and the personal index for successfully resolved user intents and questions. Collectively, these technologies improve the response time for correctly recognizing and responding to the user's intents and questions.
    Type: Application
    Filed: May 18, 2012
    Publication date: November 22, 2012
    Inventor: Ashish Patwa
  • Publication number: 20120296655
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing predictive pre-recording of audio for voice input. In one aspect, a method includes obtaining sensor data from one or more sensors of a mobile device while the mobile device is operating in an inactive state, determining that a user of the mobile device is interacting with the mobile device based on the sensor data, invoking voice input functionality of the mobile device in response to determining that the user of the mobile device is interacting with the mobile device, detecting a voice input, and activating the mobile device in response to detecting the voice input.
    Type: Application
    Filed: July 31, 2012
    Publication date: November 22, 2012
    Applicant: GOOGLE, INC.
    Inventors: Trausti Kristjansson, Matthew I. Lloyd
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method is disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for the rear-stage voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Patent number: 8311233
    Abstract: A multi-channel audio system having multiple loudspeakers is used to obtain information on the location of one or more independent noise sources within an area covered by the loudspeakers. Within the multi-channel audio system, an audio output device has an input for coupling to and receiving audio signals from one or more audio sources; an audio processing module for generating audio drive signals and providing them on respective outputs to a number of loudspeakers. A sensing module has inputs connected to respective outputs of the audio processing module, for receiving signals corresponding to sound sensed by the loudspeakers. The sensing module includes a discriminator for discriminating between signals corresponding to the audio drive signals and sensed signals from an independent noise source within range of the loudspeakers. A position computation module determines a two- or three-dimensional position of each independent noise source sensed, relative to the loudspeakers.
    Type: Grant
    Filed: November 30, 2005
    Date of Patent: November 13, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: John Kinghorn
  • Publication number: 20120284022
    Abstract: Speech detection is a technique to determine and classify periods of speech. In a normal conversation, each speaker speaks less than half the time. The remaining time is devoted to listening to the other end and pauses between speech and silence. Embodiments of the current invention provide systems and methods that may be implemented in a communication device. A system may include one or more sensors for detecting information corresponding to a user. The user is in a state of verbal communication. The system further includes one or more sensors for determining periods of speech and non-speech, in the verbal communication, based on the detected information and the audio signal captured by the microphones. The determined periods of speech and non-speech may be used in the coding, compression, noise reduction and other aspects of signal processing.
    Type: Application
    Filed: July 18, 2012
    Publication date: November 8, 2012
    Inventor: Alon Konchitsky
  • Publication number: 20120278076
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information are described. A method includes determining, for each of multiple communications that were initiated by a user of a mobile device, a time when the communication was initiated or received; determining, for each of multiple contacts associated with the user, a probability associated with the contact based at least on the times when the communications were initiated or received; weighting a contact disambiguation grammar according to the probabilities; and processing audio data using the contact disambiguation grammar to select a particular contact.
    Type: Application
    Filed: July 10, 2012
    Publication date: November 1, 2012
    Applicant: GOOGLE INC.
    Inventors: Matthew I. Lloyd, Willard Van Tuyl Rusch, II
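The disambiguation idea above, biasing recognition toward contacts the user tends to call at the current time of day, can be sketched as a smoothed, time-windowed frequency estimate. All names, the circular-hour window, and the additive smoothing are illustrative, not the patent's actual scheme:

```python
from collections import Counter

def contact_weights(history, current_hour, window=2, alpha=1.0):
    """Weight each contact by how often they were contacted near this hour.
    history: list of (contact_name, hour_of_day) pairs.
    alpha: additive smoothing so unseen contacts keep nonzero weight."""
    contacts = {name for name, _ in history}
    # Count contacts within a circular window of hours around current_hour.
    near = Counter(
        name for name, hour in history
        if min(abs(hour - current_hour), 24 - abs(hour - current_hour)) <= window
    )
    total = sum(near.values()) + alpha * len(contacts)
    return {name: (near[name] + alpha) / total for name in contacts}
```

The resulting probabilities could then be attached to the corresponding names in a recognition grammar, so that "call Alice" outscores an acoustically similar but rarely called contact at that hour.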
  • Publication number: 20120278075
    Abstract: A system and method for collecting, from an ASR, a first rating of the intelligibility of human speech, and collecting a second intelligibility rating of such speech from networked listeners. The first rating and the second rating are weighted based on the importance of each rating to a user, and a third rating is created from the two weighted ratings.
    Type: Application
    Filed: April 25, 2012
    Publication date: November 1, 2012
    Inventors: Sherrie Ellen Shammass, Eyal Eshed, Ariel Velikovsky
  • Publication number: 20120278078
    Abstract: Methods and systems for providing contextually relevant information to a user are provided. In particular, a user context is determined. The determination of the user context can be made from information stored on or entered in a user device. The determined user context is provided to an automatic speech recognition (ASR) engine as a watch list. A voice stream is monitored by the ASR engine. In response to the detection of a word on the watch list by the ASR engine, a context engine is notified. The context engine then modifies a display presented to the user, to provide a selectable item that the user can select to access relevant information.
    Type: Application
    Filed: April 26, 2011
    Publication date: November 1, 2012
    Applicant: AVAYA INC.
    Inventors: Christopher Ricci, Shane Ricci
  • Publication number: 20120271636
    Abstract: A voice input device includes: a mastery level identifying device identifying a mastery level of a user with respect to voice input; and an input mode setting device switching a voice input mode between a guided input mode and an unguided input mode. In the guided input mode, preliminary registered contents of the voice input are presented to the user. The input mode setting device sets the voice input mode to the unguided input mode at a starting time when the voice input device starts to receive the voice input. The input mode setting device switches the voice input mode from the unguided input mode to the guided input mode at a switching time. The input mode setting device sets a time interval between the starting time and the switching time in proportion to the mastery level.
    Type: Application
    Filed: April 16, 2012
    Publication date: October 25, 2012
    Applicant: DENSO CORPORATION
    Inventor: Yuki Fujisawa
  • Publication number: 20120271632
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters spaced linearly at lower frequencies and logarithmically at higher frequencies, features that model the speaker's vocal tract transfer function, and features that indicate the vibration rate of the vocal folds of the speaker of the sample data.
    Type: Application
    Filed: April 25, 2011
    Publication date: October 25, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
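A standard example of the kind of frequency-warped filterbank described in the entry above is the mel scale, which is roughly linear at low frequencies and logarithmic at high ones. For illustration, a common Hz-to-mel conversion (the O'Shaughnessy constants; the patent does not specify this exact formula):

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to mel: approximately linear below
    ~1 kHz and logarithmic above (O'Shaughnessy formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)
```

Filter center frequencies placed uniformly on the mel axis end up densely packed at low frequencies and sparse at high frequencies, mirroring human pitch perception.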
  • Publication number: 20120262296
    Abstract: A speaker intent analysis system and method for validating the truthfulness and intent of a plurality of participants' responses to questions. A computer stores, retrieves, and transmits a series of questions to be answered audibly by participants. The participants' answers are received by a data processor. The data processor analyzes and records the participants' speech parameters for determining the likelihood of dishonesty. In addition to analyzing participants' speech parameters for distinguishing stress or other abnormality, the processor may be equipped with voice recognition software to screen responses that while not dishonest, are indicative of possible malfeasance on the part of the participants. Once the responses are analyzed, the processor produces an output that is indicative of the participant's credibility. The output may be sent to proper parties and/or devices such as a web page, computer, e-mail, PDA, pager, database, report, etc. for appropriate action.
    Type: Application
    Filed: June 12, 2012
    Publication date: October 18, 2012
    Inventor: DAVID BEZAR
  • Publication number: 20120265538
    Abstract: A device may include a display and logic. The logic may be configured to receive, from a user, a selection of a first control action associated with an application stored in the device, provide, via the display, a number of choices associated with the first control action, and receive, from the user, a word or a phrase to use as a voice command corresponding to the first control action, wherein the word or phrase is selected from the choices.
    Type: Application
    Filed: May 10, 2012
    Publication date: October 18, 2012
    Applicant: SONY MOBILE COMMUNICATIONS AB
    Inventors: Mats Gustafsson, Julian Charles Hope
  • Publication number: 20120265535
    Abstract: A personal voice operated reminder system. In one embodiment, the system is worn as a device on the body in a form similar to a watch, bracelet or necklace. In another embodiment, the system is a device normally held in a person's pocket or purse, and in another embodiment the system is implemented as an application added to existing devices such as PDAs or cellular telephones. This device is configured to record reminders using speech recognition and to play back the reminder message in accordance with directions received using speech recognition, and/or position and/or motion inputs.
    Type: Application
    Filed: September 6, 2010
    Publication date: October 18, 2012
    Inventors: Donald Ray Bryant-Rich, Diana Eve Barshaw-Rich
  • Publication number: 20120264487
    Abstract: According to an aspect, a mobile electronic device, includes a display unit, a detection unit, and a display control unit. The display unit displays a standard screen on which objects are superimposed. The detection unit detects occurrence of a predetermined event. The display control unit changes arrangement of the objects displayed on the display unit in accordance with the predetermined event detected by the detection unit.
    Type: Application
    Filed: December 13, 2010
    Publication date: October 18, 2012
    Applicant: KYOCERA CORPORATION
    Inventors: Nayu Nomachi, Takayuki Sato
  • Publication number: 20120265530
    Abstract: A rule-based end-pointer isolates spoken utterances contained within an audio stream from background noise and non-speech transients. The rule-based end-pointer includes a plurality of rules to determine the beginning and/or end of a spoken utterance based on various speech characteristics. The rules may analyze an audio stream or a portion of an audio stream based upon an event, a combination of events, the duration of an event, or a duration relative to an event. The rules may be manually or dynamically customized depending upon factors that may include characteristics of the audio stream itself, an expected response contained within the audio stream, or environmental conditions.
    Type: Application
    Filed: April 25, 2012
    Publication date: October 18, 2012
    Inventors: Phil Hetherington, Alex Escott
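A minimal flavor of a rule-based end-pointer like the one above: classify frames by an energy threshold, then apply duration rules for the beginning and end of speech. Everything here (the threshold and frame-count rules) is an illustrative stand-in for the patent's configurable rule set:

```python
def endpoint(frames, energy_thresh=0.1, min_speech=3, max_silence=5):
    """Find (start, end) frame indices of an utterance from per-frame energies.
    Rules: speech starts after min_speech consecutive loud frames;
    it ends after max_silence consecutive quiet frames."""
    start = end = None
    loud = quiet = 0
    for i, e in enumerate(frames):
        if e >= energy_thresh:
            loud += 1
            quiet = 0
            if start is None and loud >= min_speech:
                start = i - min_speech + 1
        else:
            quiet += 1
            loud = 0
            if start is not None and quiet >= max_silence:
                end = i - max_silence + 1
                break
    if start is not None and end is None:
        end = len(frames)
    return start, end
```

The duration rules are what reject short non-speech transients (a door slam exceeds the energy threshold but not the minimum speech duration), which is the behavior the abstract attributes to its event-duration rules.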
  • Publication number: 20120259640
    Abstract: A voice control unit controlling and outputting a first voice signal includes an analysis unit configured to calculate an average value of a gradient of spectrum at a high frequency of an inputted second voice signal as a voice characteristic, a determination unit configured to determine an amplification band and an amplification amount of a spectrum of the first voice signal based on the gradient, and an amplification unit configured to amplify the spectrum of the first voice signal to realize the determined amplification band and the determined amplification amount.
    Type: Application
    Filed: June 20, 2012
    Publication date: October 11, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Taro TOGAWA, Takeshi OTANI, Masanao SUZUKI, Yasuji OTA
  • Publication number: 20120259630
    Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Aditi GARG, Kasthuri Jayachand YADLAPALLI
  • Publication number: 20120259637
    Abstract: An electronic apparatus and method for retrieving a song, and a storage medium. The electronic apparatus includes: a storage unit which stores a plurality of songs; a user input unit which receives a hummed query which is inputted for retrieving a song; and a song retrieving unit which retrieves a song based on the hummed query from among the plurality of stored songs when the hummed query is received. The song retrieving unit extracts a pitch and a duration of the hummed query, converts each of the extracted pitch and duration into multi-level symbols, calculates a string edit distance between the hummed query and one of the plurality of songs based on the symbols, and determines a similarity between the hummed query and a song based on edit operations which are performed within the calculated string edit distance.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: A. Srinivas, P. Krishnamoorthy, Rajen Bhatt, Sarvesh Kumar
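The core matching step in the entry above is a string edit distance over quantized pitch and duration symbols. A plain Levenshtein distance over symbol sequences captures the idea; the up/down/repeat symbol alphabet in the test is illustrative, not necessarily the patent's multi-level symbol set:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two symbol sequences,
    computed with a rolling one-row dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

Ranking stored songs by this distance to the hummed query's symbol string, then refining the score from the specific edit operations involved, matches the two-step similarity computation the abstract describes.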
  • Publication number: 20120259641
    Abstract: Methods and apparatus for initiating an action using a voice-controlled human interface. The interface provides a hands free, voice driven environment to control processes and applications. According to one embodiment, a method comprises electronically receiving first user input, parsing the first user input to determine whether the first user input contains a command activation statement that cues a voice-controlled human interface to enter a command mode in which a second user input comprising a voice signal is processed to identify at least one executable command and, in response to determining that the first user input comprises the command activation statement, identifying the at least one executable command in the second user input.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Richard Grant, Pedro E. McGregor
  • Publication number: 20120259639
    Abstract: A television, or other device with television tuner, can be controlled to directly tune to a specific channel name, such as a broadcaster's station name, by using EPG metadata to provide a correlation between a channel number and channel name.
    Type: Application
    Filed: June 8, 2011
    Publication date: October 11, 2012
    Inventors: Sabrina Tai-Chen Yeh, David Young, Steven Friedlander
  • Publication number: 20120259627
    Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
    Type: Application
    Filed: May 27, 2010
    Publication date: October 11, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Daniel Willett, Venkatesh Nagesha
  • Publication number: 20120253800
    Abstract: The system provides a speech recognition program, an update website for updating the speech recognition program, and a way of storing data. A user may utilize the update website to add, modify, and delete items that may comprise speech commands, dll's, multimedia files, executable code, and other information. The speech recognition program may communicate with the update website to request information about possible updates. The update website may send a response consisting of information to the speech recognition program, which may utilize the received information to decide what items to download. The speech recognition program may send one or more requests to the update website to download items. The update website may respond by transmitting requested items to the speech recognition program, which overwrites existing items with the newly received items.
    Type: Application
    Filed: September 23, 2011
    Publication date: October 4, 2012
    Inventors: Michael D. Goller, Stuart E. Goller
  • Publication number: 20120253817
    Abstract: A system and method for connecting to a telephone extension listed in a telephone number database is disclosed. The method comprises recording an audio token on a mobile communication device. The audio token is associated with a telephone number included in the database. The audio token is transmitted from the mobile communication device to a server over a digital channel. The telephone number in the database that is associated with the audio token is selected using speech recognition. The mobile communication device is then connected with the telephone number.
    Type: Application
    Filed: April 4, 2011
    Publication date: October 4, 2012
    Inventor: Trung (Tim) Trinh
  • Publication number: 20120253799
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
    Type: Application
    Filed: March 28, 2011
    Publication date: October 4, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Robert Bell, Diamantino Antonio Caseiro, Mazin Gilbert, Patrick Haffner
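One common concrete way to "combine and tune" existing models, as the entry above describes, is linear interpolation, with the mixture weights tuned on whatever small amount of in-domain data is available. A toy unigram version for illustration only; the patent does not specify this exact scheme:

```python
def interpolate_lms(models, weights):
    """Linearly interpolate word probabilities from several unigram LMs.
    models: list of dicts mapping word -> probability; weights sum to 1."""
    vocab = set().union(*models)
    return {w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
            for w in vocab}
```

Because each source model is a proper distribution and the weights sum to one, the interpolated model is also a proper distribution; tuning then reduces to choosing the weights that minimize perplexity on the limited in-domain sample.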
  • Publication number: 20120253807
    Abstract: A speaker state detecting apparatus comprises: an audio input unit for acquiring, at least, a first voice emanated by a first speaker and a second voice emanated by a second speaker; a speech interval detecting unit for detecting an overlap period between a first speech period of the first speaker included in the first voice and a second speech period of the second speaker included in the second voice, which starts before the first speech period, or an interval between the first speech period and the second speech period; a state information extracting unit for extracting state information representing a state of the first speaker from the first speech period; and a state detecting unit for detecting the state of the first speaker in the first speech period based on the overlap period or the interval and the state information.
    Type: Application
    Filed: February 3, 2012
    Publication date: October 4, 2012
    Applicant: FUJITSU LIMITED
    Inventor: Akira KAMANO
  • Publication number: 20120253810
    Abstract: Authenticating a purported user attempting to access a secure resource includes enrolling a user's voice sample by requiring the user to orally speak preselected enrollment utterances, generating prompts and respective predetermined correct responses where each prompt has only one correct response, presenting a prompt to the user in real time, and analyzing the user's real time live response to determine if the live response matches the predetermined correct response and if voice characteristics of the user's live voice sample match characteristics of the enrolled voice sample.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 4, 2012
    Inventors: Timothy S. Sutton, Stephen T. Dispensa
  • Publication number: 20120253805
    Abstract: Systems, methods, and media for determining fraud risk from audio signals and non-audio data are provided herein. Some exemplary methods include receiving an audio signal and an associated audio signal identifier, receiving a fraud event identifier associated with a fraud event, determining a speaker model based on the received audio signal, determining a channel model based on a path of the received audio signal, using a server system, updating a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier, and updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier.
    Type: Application
    Filed: March 8, 2012
    Publication date: October 4, 2012
    Inventors: Anthony Rajakumar, Torsten Zeppenfeld, Lisa Guerra, Vipul Vyas
  • Publication number: 20120253806
    Abstract: A system and method for distributed speech recognition is provided. Audio data is obtained from a caller participating in a call with an agent. A main recognizer receives a main grammar template and the audio data. A plurality of secondary recognizers each receive the audio data and a reference that identifies a secondary grammar, which is a non-overlapping section of the main grammar template. Speech recognition is performed on each of the secondary recognizers and speech recognition results are identified by applying the secondary grammar to the audio data. The n most likely speech recognition results are selected. The main recognizer constructs a new grammar based on the main grammar template using the speech recognition results from each of the secondary recognizers as a new vocabulary. Further speech recognition results are identified by applying the new grammar to the audio data.
    Type: Application
    Filed: June 18, 2012
    Publication date: October 4, 2012
    Inventor: Gilad Odinak
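The result-merging step described above (selecting the n most likely results across the secondary recognizers to serve as the vocabulary for the main recognizer's new grammar) could be sketched as below. The function name and the (word, score) hypothesis format are hypothetical:

```python
import heapq

# Illustrative sketch of merging secondary-recognizer output: each secondary
# recognizer returns scored hypotheses from its own non-overlapping grammar
# section; the n most likely results overall become the new vocabulary.
def merge_hypotheses(secondary_results, n):
    """secondary_results: list of per-recognizer lists of (word, score)."""
    all_hyps = [h for results in secondary_results for h in results]
    best = heapq.nlargest(n, all_hyps, key=lambda h: h[1])
    return [word for word, _ in best]      # new vocabulary, best first
```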
  • Publication number: 20120253809
    Abstract: A voice verification module 308, for example for an interactive voice response system, is disclosed. The voice verification module 308 is configured to select, from a store 310 of verification words, one or more verification words in response to a request for a verification phrase, and to form a verification phrase in which said one or more verification words are distributed throughout.
    Type: Application
    Filed: March 26, 2012
    Publication date: October 4, 2012
    Applicant: BIOMETRIC SECURITY LTD
    Inventors: Trevor Thomas, Nicholas Wise, David Cowell
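Constructing a phrase with the verification words distributed throughout could be sketched as below, e.g. by interleaving filler words between the selected verification words. This is one illustrative reading; the use of filler words and all names here are assumptions, not the patented method:

```python
import random

# Illustrative sketch: pick n_words verification words from the store and
# distribute them throughout the phrase by interleaving filler words.
def build_verification_phrase(store, fillers, n_words, seed=0):
    rng = random.Random(seed)              # seeded for reproducibility
    words = rng.sample(store, n_words)     # distinct verification words
    phrase = []
    for i, w in enumerate(words):
        phrase.append(w)
        if i < len(words) - 1:             # filler between verification words
            phrase.append(rng.choice(fillers))
    return " ".join(phrase)
```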
  • Publication number: 20120245932
    Abstract: According to one embodiment, a voice recognition apparatus includes a determination unit, an estimating unit, and a voice recognition unit. The determination unit determines whether a component with a frequency of not less than 1000 Hz and with a level not lower than a predetermined level is included in a sound input from a plurality of microphones. The estimating unit estimates a sound source direction of the sound when the determination unit determines that the component is included in the sound. The voice recognition unit recognizes whether the sound obtained in the sound source direction coincides with a voice model registered beforehand.
    Type: Application
    Filed: March 26, 2012
    Publication date: September 27, 2012
    Inventors: Kazushige OUCHI, Toshiyuki Koga, Daisuke Yamamoto, Miwako Doi
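The determination step above (checking for a component at 1000 Hz or above whose level meets a threshold) can be sketched with a plain-Python DFT. This is a minimal illustration, not the patented determination unit; the normalization and function name are assumptions:

```python
import math
import cmath

# Illustrative sketch: scan DFT bins at >= 1000 Hz and report whether any
# has a (normalized) magnitude at or above the given threshold.
def has_high_freq_component(samples, sample_rate, threshold):
    n = len(samples)
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq < 1000.0:                  # only bins at 1000 Hz and above
            continue
        x = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        if abs(x) * 2 / n >= threshold:    # amplitude-normalized magnitude
            return True
    return False
```

Only when this check passes would the apparatus go on to estimate the sound source direction and run recognition against the registered voice model.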
  • Publication number: 20120239393
    Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams contain the data of the first audio and/or video data streams without the emotional attributes. The computing system stores the second audio and/or video data streams.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Publication number: 20120239383
    Abstract: A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning that domain action and/or domain object as the current domain action and/or current domain object, and then recursively qualifying the new current domain action and/or current domain object. This process continues until nothing is left to qualify.
    Type: Application
    Filed: May 25, 2012
    Publication date: September 20, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Narendra K. Gupta, Mazin G. Rahim
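The recursive qualification loop described above can be sketched as follows. This is an illustrative simplification (a single association chain, assumed acyclic); the data representation and names are assumptions, not the patented method:

```python
# Illustrative sketch of the qualification loop: starting from a dialog
# act's current domain object/action, keep following associations to the
# next object/action until nothing is left to qualify.
def qualify(current, associations):
    """associations maps a domain object/action to the one that further
    qualifies it; assumed acyclic so the loop terminates."""
    chain = [current]
    while current in associations:
        current = associations[current]    # reassign as the new current item
        chain.append(current)
    return chain                           # full qualification chain
```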
  • Publication number: 20120232890
    Abstract: According to one embodiment, an apparatus for discriminating speech/non-speech of a first acoustic signal includes a weight assignment unit, a feature extraction unit, and a speech/non-speech discrimination unit. The first acoustic signal includes a user's speech and a reproduced sound. The reproduced sound is a system sound having a plurality of channels reproduced from a plurality of speakers. The weight assignment unit is configured to assign a weight to each frequency band based on the system sound. The feature extraction unit is configured to extract a feature from a second acoustic signal based on the weight of each frequency band. The second acoustic signal is the first acoustic signal in which the reproduced sound is suppressed. The speech/non-speech discrimination unit is configured to discriminate speech/non-speech of the first acoustic signal based on the feature.
    Type: Application
    Filed: September 14, 2011
    Publication date: September 13, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kaoru Suzuki, Masaru Sakai, Yusuke Kida
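The weight assignment and feature extraction described above can be sketched as follows: bands dominated by the reproduced system sound get small weights, and a feature is computed from the suppressed signal's spectrum using those weights. This is a minimal illustration under assumed representations (per-band power lists, a log-energy feature), not the patented units:

```python
import math

# Illustrative sketch of the weight assignment unit: bands where the
# reproduced system sound is strong receive small weights.
def band_weights(system_spectrum, floor=1e-6):
    total = sum(system_spectrum) + floor
    return [1.0 - s / total for s in system_spectrum]

# Illustrative sketch of the feature extraction unit: a weighted log-energy
# feature over the (reproduced-sound-suppressed) signal spectrum.
def weighted_feature(signal_spectrum, weights):
    energy = sum(w * p for w, p in zip(weights, signal_spectrum))
    return math.log(energy + 1e-12)
```

A speech/non-speech discriminator would then threshold or classify this feature.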
  • Publication number: 20120230526
    Abstract: Method and apparatus for microphone matching for wearable directional hearing assistance devices are provided. An embodiment includes a method for matching at least a first microphone to a second microphone, using a user's voice from the user's mouth. The user's voice is processed as received by at least one microphone to determine a frequency profile associated with voice of the user. Intervals are detected where the user is speaking using the frequency profile. Variations in microphone reception between the first microphone and the second microphone are adaptively canceled during the intervals and when the first microphone and second microphone are in relatively constant spatial position with respect to the user's mouth.
    Type: Application
    Filed: October 3, 2011
    Publication date: September 13, 2012
    Applicant: Starkey Laboratories, Inc.
    Inventor: Tao Zhang
  • Publication number: 20120232891
    Abstract: This invention realizes a speech communication system and method, and a robot apparatus, capable of significantly improving entertainment value. A speech communication system with a function to converse with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracking the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls the conversation so that it continues in accordance with the tracking by the tracking control means.
    Type: Application
    Filed: May 16, 2012
    Publication date: September 13, 2012
    Applicant: SONY CORPORATION
    Inventors: Kazumi Aoyama, Hideki Shimomura
  • Publication number: 20120232904
    Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
    Type: Application
    Filed: March 12, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee
  • Publication number: 20120232897
    Abstract: A user can locate products by dialing a number from any phone and accessing an automatic voice recognition system. A reply is made to the user with information locating the product, using a store's product location data converted to automatic voice responses. Smart phone and mobile web access to a product database is enabled using voice-to-text and text search. A taxonomy enables product search requests by product descriptions and/or product brand names, and enables synonyms and phonetic enhancements to the system. Search results are related to products and product categories with concise organization. Relevant advertisements, promotional offers and coupons are delivered based upon search and taxonomy elements. Search requests generate dynamic interior maps of a product's location inside the shopper's location, assisting a shopper to efficiently shop the location for listed items. Business intelligence of product categories enables rapid scaling across retail segments.
    Type: Application
    Filed: May 1, 2012
    Publication date: September 13, 2012
    Inventors: Nathan Pettyjohn, Matthew Kulig, Niarcas Jeffrey, Edward Saunders
  • Publication number: 20120232892
    Abstract: A method, system and machine-readable medium are provided. Speech input is received at a speech recognition component and recognized output is produced. A common dialog cue from the received speech input or input from a second source is recognized. An action is performed corresponding to the recognized common dialog cue. The performed action includes sending a communication from the speech recognition component to the speech generation component while bypassing a dialog component.
    Type: Application
    Filed: May 21, 2012
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Vincent J. Goffin, Sarangarajan Parthasarathy
  • Publication number: 20120232899
    Abstract: A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the first phonogram and the second phonogram. Referential utterances of speech signals are selected from the first phonogram and the second phonogram, where the referential utterances include formant paths of at least three formant frequencies. The selected referential utterances including at least two identical formant frequencies are compared therebetween. Similarity of the compared referential utterances is evaluated from matching the other formant frequencies, where similarity of the phonograms is determined from the evaluation of similarity of all the compared referential utterances.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 13, 2012
    Applicant: Obschestvo s orgranichennoi otvetstvennost'yu "Centr Rechevyh Technologij"
    Inventor: Sergey Lvovich Koval
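The comparison rule above (utterances are comparable when at least two formant frequencies match, and similarity is then evaluated from the remaining formants) could be sketched as below. The tolerance-based notion of "identical" frequencies and the similarity score are illustrative assumptions, not the patented evaluation:

```python
# Illustrative sketch: two referential utterances, each given as a list of
# formant frequencies in Hz, are comparable when at least two formants
# match within a tolerance; similarity then reflects the overall mismatch.
def formant_similarity(formants_a, formants_b, tol=50.0):
    diffs = [abs(a - b) for a, b in zip(formants_a, formants_b)]
    if sum(d <= tol for d in diffs) < 2:
        return None                        # utterances are not comparable
    return 1.0 / (1.0 + sum(diffs) / len(diffs))   # in (0, 1], 1 = identical
```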
  • Publication number: 20120232896
    Abstract: A voice activity detection apparatus (1) comprising: a signal condition analyzing unit (3) which analyses at least one signal parameter of an input signal to detect a signal condition SC of said input signal; at least two voice activity detection units (4-i) comprising different voice detection characteristics, wherein each voice activity detection unit (4-i) performs separately a voice activity detection of said input signal to provide a voice activity detection decision VADD; and a decision combination unit (5) which combines the voice activity detection decisions VADDs provided by said voice activity detection units (4-i) depending on the detected signal condition SC to provide a combined voice activity detection decision cVADD.
    Type: Application
    Filed: May 21, 2012
    Publication date: September 13, 2012
    Applicant: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Anisse TALEB, Zhe WANG, Jianfeng XU, Lei MIAO
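The condition-dependent decision combination described above can be sketched as below. The specific combination rules (strict agreement under noise, any-vote otherwise) are illustrative assumptions, not the patented combination unit:

```python
# Illustrative sketch of the decision combination unit: several VADs with
# different characteristics each vote, and the rule used to combine their
# boolean decisions depends on the detected signal condition.
def combine_vad(decisions, signal_condition):
    if signal_condition == "noisy":
        return all(decisions)      # require agreement when noise is high
    return any(decisions)          # any detector suffices in clean conditions
```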
  • Publication number: 20120232901
    Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set representing both 1) all phonemes occurring in the set of two or more spoken languages, and 2) phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify the most likely phoneme occurring at each point in the audio files, in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language.
    Type: Application
    Filed: May 24, 2012
    Publication date: September 13, 2012
    Applicant: Autonomy Corporation Ltd.
    Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
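The run-time identification step above can be sketched as below: score the decoded phoneme sequence against a per-language phoneme statistical language model and pick the highest-scoring language. A bigram SLM with a probability floor for unseen patterns is an illustrative assumption, not the patented model:

```python
import math

# Illustrative sketch: score the decoded phoneme sequence against a bigram
# SLM per language and return the language with the best log-probability.
def identify_language(phonemes, slms, floor=1e-6):
    """slms maps a language name to a dict of (phoneme, phoneme) -> prob."""
    best_lang, best_score = None, float("-inf")
    for lang, bigram_probs in slms.items():
        score = sum(math.log(bigram_probs.get(bg, floor))
                    for bg in zip(phonemes, phonemes[1:]))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```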
  • Publication number: 20120232895
    Abstract: According to one embodiment, an apparatus for discriminating speech/non-speech of a first acoustic signal includes a weight assignment unit, a feature extraction unit, and a speech/non-speech discrimination unit. The weight assignment unit is configured to assign a weight to each frequency band, based on a frequency spectrum of the first acoustic signal including a user's speech and a frequency spectrum of a second acoustic signal including a disturbance sound. The feature extraction unit is configured to extract a feature from the frequency spectrum of the first acoustic signal, based on the weight of each frequency band. The speech/non-speech discrimination unit is configured to discriminate speech/non-speech of the first acoustic signal, based on the feature.
    Type: Application
    Filed: September 14, 2011
    Publication date: September 13, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kaoru Suzuki, Masaru Sakai, Yusuke Kida