Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20120310646
    Abstract: A speech recognition device and a speech recognition method thereof are disclosed. In the speech recognition method, a key phrase containing at least one key word is received. The speech recognition method comprises the steps of: receiving a sound source signal of a key word and generating a plurality of audio signals; transforming the audio signals into a plurality of frequency signals; receiving the frequency signals to obtain a space-frequency spectrum and an angular estimation value thereof; receiving the space-frequency spectrum to define and output at least one spatial eigenparameter and, using the angular estimation value and the frequency signals, performing spotting and evaluation and outputting a Bhattacharyya distance; and receiving the spatial eigenparameter and the Bhattacharyya distance and using corresponding thresholds to determine the correctness of the key phrase. Thereby, this invention robustly achieves a high speech recognition rate under very low SNR conditions.
    Type: Application
    Filed: July 7, 2011
    Publication date: December 6, 2012
    Applicant: NATIONAL CHIAO TUNG UNIVERSITY
    Inventors: JWU-SHENG HU, MING-TANG LEE, TING-CHAO WANG, CHIA HSIN YANG
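The entry above gates keyword acceptance on a Bhattacharyya distance compared against a threshold. As an illustrative sketch only (not the patent's actual computation), the closed-form Bhattacharyya distance between two univariate Gaussian score distributions can be computed and thresholded like this; `accept_keyword` and its default threshold are hypothetical:

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between two univariate Gaussians."""
    term1 = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    term2 = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term1 + term2

def accept_keyword(mu_obs, var_obs, mu_ref, var_ref, threshold=0.5):
    """Accept the key word if the observed score distribution is close
    enough to the reference model (hypothetical decision rule)."""
    return bhattacharyya_gaussian(mu_obs, var_obs, mu_ref, var_ref) < threshold
```

The distance is zero for identical distributions and grows as the means separate or the variances diverge, which is what makes it usable as a verification score.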
  • Publication number: 20120303365
    Abstract: Techniques are disclosed for automatically de-identifying spoken audio signals. In particular, techniques are disclosed for automatically removing personally identifying information from spoken audio signals and replacing such information with non-personally identifying information. De-identification of a spoken audio signal may be performed by automatically generating a report based on the spoken audio signal. The report may include concept content (e.g., text) corresponding to one or more concepts represented by the spoken audio signal. The report may also include timestamps indicating temporal positions of speech in the spoken audio signal that corresponds to the concept content. Concept content that represents personally identifying information is identified. Audio corresponding to the personally identifying concept content is removed from the spoken audio signal. The removed audio may be replaced with non-personally identifying audio.
    Type: Application
    Filed: November 23, 2011
    Publication date: November 29, 2012
    Inventors: Michael Finke, Detlef Koll
  • Publication number: 20120303370
    Abstract: Multimodal utterances contain a number of different modes. These modes can include speech, gestures, and pen, haptic, and gaze inputs, and the like. This invention uses recognition results from one or more of these modes to provide compensation to the recognition process of one or more other ones of these modes. In various exemplary embodiments, a multimodal recognition system inputs one or more recognition lattices from one or more of these modes, and generates one or more models to be used by one or more mode recognizers to recognize the one or more other modes. In one exemplary embodiment, a gesture recognizer inputs a gesture input and outputs a gesture recognition lattice to a multimodal parser. The multimodal parser generates a language model and outputs it to an automatic speech recognition system, which uses the received language model to recognize the speech input that corresponds to the recognized gesture input.
    Type: Application
    Filed: May 31, 2012
    Publication date: November 29, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Publication number: 20120296645
    Abstract: A speech recognition client sends a speech stream and control stream in parallel to a server-side speech recognizer over a network. The network may be an unreliable, low-latency network. The server-side speech recognizer recognizes the speech stream continuously. The speech recognition client receives recognition results from the server-side recognizer in response to requests from the client. The client may remotely reconfigure the state of the server-side recognizer during recognition.
    Type: Application
    Filed: August 1, 2012
    Publication date: November 22, 2012
    Inventors: Eric Carraux, Detlef Koll
  • Publication number: 20120296651
    Abstract: Methods and system for authenticating a user are disclosed. The present invention includes accessing a collection of personal information related to the user. The present invention also includes performing an authentication operation that is based on the collection of personal information. The authentication operation incorporates at least one dynamic component and prompts the user to give an audible utterance. The audible utterance is compared to a stored voiceprint.
    Type: Application
    Filed: July 26, 2012
    Publication date: November 22, 2012
    Applicant: MICROSOFT CORPORATION
    Inventor: Kuansan Wang
  • Publication number: 20120296648
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Application
    Filed: July 30, 2012
    Publication date: November 22, 2012
    Applicant: AT&T Corp.
    Inventors: Mehryar Mohri, Michael Dennis Riley
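A simple way to see the N-best problem the entry above addresses: best-first search over a weighted automaton with a priority queue yields the N lowest-cost paths. The sketch below (integer states, tropical weights, all names illustrative) returns N-best paths; note that distinct paths may carry the same label string, which is exactly why the patent determinizes on the fly to obtain N-best strings rather than N-best paths:

```python
import heapq

def n_best_paths(arcs, start, finals, n):
    """Return up to n lowest-cost (cost, label-sequence) paths.
    arcs: dict mapping state -> list of (next_state, label, weight)."""
    heap = [(0.0, start, ())]
    visits = {}
    results = []
    while heap and len(results) < n:
        cost, state, labels = heapq.heappop(heap)
        # Each state needs to be expanded at most n times for n-best search.
        visits[state] = visits.get(state, 0) + 1
        if visits[state] > n:
            continue
        if state in finals:
            results.append((cost, labels))
        for nxt, label, w in arcs.get(state, []):
            heapq.heappush(heap, (cost + w, nxt, labels + (label,)))
    return results
```

The potentials mentioned in the abstract serve the same role as an A*-style heuristic: they let the search prune expansions that cannot reach a final state cheaply enough, so only the needed portion of the automaton is explored.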
  • Publication number: 20120296644
    Abstract: A hybrid speech recognition system uses a client-side speech recognition engine and a server-side speech recognition engine to produce speech recognition results for the same speech. An arbitration engine produces speech recognition output based on one or both of the client-side and server-side speech recognition results.
    Type: Application
    Filed: August 1, 2012
    Publication date: November 22, 2012
    Inventor: Detlef Koll
  • Publication number: 20120296652
    Abstract: A method for obtaining information on an audio video program being presented on a consumer electronics (CE) device includes receiving at the CE device a viewer command to recognize the audio video program being presented on the CE device. The method also includes receiving signals from a microphone representative of audio from the audio video program as sensed by the microphone as the audio is played real time on the CE device. The method then includes executing voice recognition on the signals from the microphone to determine words in the audio from the audio video program as sensed by the microphone. Words are then uploaded to an Internet server, where they are correlated to at least one audio video script. The method then includes receiving back from the Internet server information correlated by the server using the words to the audio video program.
    Type: Application
    Filed: May 18, 2011
    Publication date: November 22, 2012
    Inventors: Seth Hill, Frederick J. Zustak
  • Publication number: 20120296638
    Abstract: In embodiments of the present invention, capabilities are described for understanding and responding to user intents and questions quickly, wherein the understanding is based on supervised system learning, intelligent layered semantic and syntactic information processing, and a personalized adaptive semantic interface. Supervised system learning creates a reference pattern set for the intent repository and possible question categories. Each layer in the layered processing increases the probability of intent/question recognition. The personalized adaptive voice interface learns from the user's interactions over time by enriching the pattern sets and the personal index for successfully resolved user intents and questions. Collectively, these technologies improve the response time for correctly recognizing and responding to the user's intents and questions.
    Type: Application
    Filed: May 18, 2012
    Publication date: November 22, 2012
    Inventor: Ashish Patwa
  • Publication number: 20120296655
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for providing predictive pre-recording of audio for voice input. In one aspect, a method includes obtaining sensor data from one or more sensors of a mobile device while the mobile device is operating in an inactive state, determining that a user of the mobile device is interacting with the mobile device based on the sensor data, invoking voice input functionality of the mobile device in response to determining that the user of the mobile device is interacting with the mobile device, detecting a voice input, and activating the mobile device in response to detecting the voice input.
    Type: Application
    Filed: July 31, 2012
    Publication date: November 22, 2012
    Applicant: GOOGLE, INC.
    Inventors: Trausti Kristjansson, Matthew I. Lloyd
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method is disclosed. Firstly, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and word arcs of the word lattice are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag, which correspond to the speech signal. The present invention performs rescoring in a two-stage way to promote the recognition rate of basic speech information and labels the language tag, prosodic tag and phonetic segmentation tag to provide the prosodic structure and language information for the rear-stage voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Patent number: 8311233
    Abstract: A multi-channel audio system having multiple loudspeakers is used to obtain information on the location of one or more independent noise sources within an area covered by the loudspeakers. Within the multi-channel audio system, an audio output device has an input for coupling to and receiving audio signals from one or more audio sources; an audio processing module for generating audio drive signals and providing them on respective outputs to a number of loudspeakers. A sensing module has inputs connected to respective outputs of the audio processing module, for receiving signals corresponding to sound sensed by the loudspeakers. The sensing module includes a discriminator for discriminating between signals corresponding to the audio drive signals and sensed signals from an independent noise source within range of the loudspeakers. A position computation module determines a two- or three-dimensional position of each independent noise source sensed, relative to the loudspeakers.
    Type: Grant
    Filed: November 30, 2005
    Date of Patent: November 13, 2012
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: John Kinghorn
  • Publication number: 20120284022
    Abstract: Speech detection is a technique to determine and classify periods of speech. In a normal conversation, each speaker speaks less than half the time. The remaining time is devoted to listening to the other end and pauses between speech and silence. Embodiments of the current invention provide systems and methods that may be implemented in a communication device. A system may include one or more sensors for detecting information corresponding to a user. The user is in a state of verbal communication. The system further includes one or more sensors for determining periods of speech and non-speech, in the verbal communication, based on the detected information and the audio signal captured by the microphones. The determined periods of speech and non-speech may be used in the coding, compression, noise reduction and other aspects of signal processing.
    Type: Application
    Filed: July 18, 2012
    Publication date: November 8, 2012
    Inventor: Alon Konchitsky
  • Publication number: 20120278076
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for disambiguating contact information are described. A method includes determining, for each of multiple communications that were initiated by a user of a mobile device, a time when the communication was initiated or received; determining, for each of multiple contacts associated with the user, a probability associated with the contact based at least on the times when the communications were initiated or received; weighting a contact disambiguation grammar according to the probabilities; and processing audio data using the contact disambiguation grammar to select a particular contact.
    Type: Application
    Filed: July 10, 2012
    Publication date: November 1, 2012
    Applicant: GOOGLE INC.
    Inventors: Matthew I. Lloyd, Willard Van Tuyl Rusch, II
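The disambiguation idea above, biasing recognition toward contacts the user tends to call at the current time of day, can be sketched as a smoothed, time-windowed frequency estimate. All names, the circular-hour window, and the additive smoothing are illustrative, not the patent's actual scheme:

```python
from collections import Counter

def contact_weights(history, current_hour, window=2, alpha=1.0):
    """Weight each contact by how often they were contacted near this hour.
    history: list of (contact_name, hour_of_day) pairs.
    alpha: additive smoothing so unseen contacts keep nonzero weight."""
    contacts = {name for name, _ in history}
    # Count contacts within a circular window of hours around current_hour.
    near = Counter(
        name for name, hour in history
        if min(abs(hour - current_hour), 24 - abs(hour - current_hour)) <= window
    )
    total = sum(near.values()) + alpha * len(contacts)
    return {name: (near[name] + alpha) / total for name in contacts}
```

The resulting probabilities could then be attached to the corresponding names in a recognition grammar, so that "call Alice" outscores an acoustically similar but rarely called contact at that hour.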
  • Publication number: 20120278075
    Abstract: A system and method for collecting, from an ASR, a first rating of the intelligibility of human speech, and collecting a second intelligibility rating of such speech from networked listeners. The first rating and the second rating are weighted based on the importance of each rating to a user, and a third rating is created from the two weighted ratings.
    Type: Application
    Filed: April 25, 2012
    Publication date: November 1, 2012
    Inventors: Sherrie Ellen Shammass, Eyal Eshed, Ariel Velikovsky
  • Publication number: 20120278078
    Abstract: Methods and systems for providing contextually relevant information to a user are provided. In particular, a user context is determined. The determination of the user context can be made from information stored on or entered in a user device. The determined user context is provided to an automatic speech recognition (ASR) engine as a watch list. A voice stream is monitored by the ASR engine. In response to the detection of a word on the watch list by the ASR engine, a context engine is notified. The context engine then modifies a display presented to the user, to provide a selectable item that the user can select to access relevant information.
    Type: Application
    Filed: April 26, 2011
    Publication date: November 1, 2012
    Applicant: AVAYA INC.
    Inventors: Christopher Ricci, Shane Ricci
  • Publication number: 20120271636
    Abstract: A voice input device includes: a mastery level identifying device identifying a mastery level of a user with respect to voice input; and an input mode setting device switching a voice input mode between a guided input mode and an unguided input mode. In the guided input mode, preliminary registered contents of the voice input are presented to the user. The input mode setting device sets the voice input mode to the unguided input mode at a starting time when the voice input device starts to receive the voice input. The input mode setting device switches the voice input mode from the unguided input mode to the guided input mode at a switching time. The input mode setting device sets a time interval between the starting time and the switching time in proportion to the mastery level.
    Type: Application
    Filed: April 16, 2012
    Publication date: October 25, 2012
    Applicant: DENSO CORPORATION
    Inventor: Yuki Fujisawa
  • Publication number: 20120271632
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters spaced linearly at lower frequencies and logarithmically at higher frequencies, features that model the speaker's vocal tract transfer function, and features that indicate the vibration rate of the vocal folds of the speaker of the sample data.
    Type: Application
    Filed: April 25, 2011
    Publication date: October 25, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
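A standard example of the kind of frequency-warped filterbank described in the entry above is the mel scale, which is roughly linear at low frequencies and logarithmic at high ones. For illustration, a common Hz-to-mel conversion (the O'Shaughnessy constants; the patent does not specify this exact formula):

```python
import math

def hz_to_mel(hz):
    """Convert frequency in Hz to mel: approximately linear below
    ~1 kHz and logarithmic above (O'Shaughnessy formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)
```

Filter center frequencies placed uniformly on the mel axis end up densely packed at low frequencies and sparse at high frequencies, mirroring human pitch perception.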
  • Publication number: 20120262296
    Abstract: A speaker intent analysis system and method for validating the truthfulness and intent of a plurality of participants' responses to questions. A computer stores, retrieves, and transmits a series of questions to be answered audibly by participants. The participants' answers are received by a data processor. The data processor analyzes and records the participants' speech parameters for determining the likelihood of dishonesty. In addition to analyzing participants' speech parameters for distinguishing stress or other abnormality, the processor may be equipped with voice recognition software to screen responses that while not dishonest, are indicative of possible malfeasance on the part of the participants. Once the responses are analyzed, the processor produces an output that is indicative of the participant's credibility. The output may be sent to proper parties and/or devices such as a web page, computer, e-mail, PDA, pager, database, report, etc. for appropriate action.
    Type: Application
    Filed: June 12, 2012
    Publication date: October 18, 2012
    Inventor: DAVID BEZAR
  • Publication number: 20120265538
    Abstract: A device may include a display and logic. The logic may be configured to receive, from a user, a selection of a first control action associated with an application stored in the device, provide, via the display, a number of choices associated with the first control action, and receive, from the user, a word or a phrase to use as a voice command corresponding to the first control action, wherein the word or phrase is selected from the choices.
    Type: Application
    Filed: May 10, 2012
    Publication date: October 18, 2012
    Applicant: SONY MOBILE COMMUNICATIONS AB
    Inventors: Mats Gustafsson, Julian Charles Hope
  • Publication number: 20120265535
    Abstract: A personal voice operated reminder system. In one embodiment, the system is worn as a device on the body in a form similar to a watch, bracelet or necklace. In another embodiment, the system is a device normally held in a person's pocket or purse, and in another embodiment the system is implemented as an application added to existing devices such as PDAs or cellular telephones. This device is configured to record reminders using speech recognition and to play back the reminder message in accordance with directions received using speech recognition, and/or position and/or motion inputs.
    Type: Application
    Filed: September 6, 2010
    Publication date: October 18, 2012
    Inventors: Donald Ray Bryant-Rich, Diana Eve Barshaw-Rich
  • Publication number: 20120264487
    Abstract: According to an aspect, a mobile electronic device, includes a display unit, a detection unit, and a display control unit. The display unit displays a standard screen on which objects are superimposed. The detection unit detects occurrence of a predetermined event. The display control unit changes arrangement of the objects displayed on the display unit in accordance with the predetermined event detected by the detection unit.
    Type: Application
    Filed: December 13, 2010
    Publication date: October 18, 2012
    Applicant: KYOCERA CORPORATION
    Inventors: Nayu Nomachi, Takayuki Sato
  • Publication number: 20120265530
    Abstract: A rule-based end-pointer isolates spoken utterances contained within an audio stream from background noise and non-speech transients. The rule-based end-pointer includes a plurality of rules to determine the beginning and/or end of a spoken utterance based on various speech characteristics. The rules may analyze an audio stream or a portion of an audio stream based upon an event, a combination of events, the duration of an event, or a duration relative to an event. The rules may be manually or dynamically customized depending upon factors that may include characteristics of the audio stream itself, an expected response contained within the audio stream, or environmental conditions.
    Type: Application
    Filed: April 25, 2012
    Publication date: October 18, 2012
    Inventors: Phil Hetherington, Alex Escott
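A minimal flavor of a rule-based end-pointer like the one above: classify frames by an energy threshold, then apply duration rules for the beginning and end of speech. Everything here (the threshold and frame-count rules) is an illustrative stand-in for the patent's configurable rule set:

```python
def endpoint(frames, energy_thresh=0.1, min_speech=3, max_silence=5):
    """Find (start, end) frame indices of an utterance from per-frame energies.
    Rules: speech starts after min_speech consecutive loud frames;
    it ends after max_silence consecutive quiet frames."""
    start = end = None
    loud = quiet = 0
    for i, e in enumerate(frames):
        if e >= energy_thresh:
            loud += 1
            quiet = 0
            if start is None and loud >= min_speech:
                start = i - min_speech + 1
        else:
            quiet += 1
            loud = 0
            if start is not None and quiet >= max_silence:
                end = i - max_silence + 1
                break
    if start is not None and end is None:
        end = len(frames)
    return start, end
```

The duration rules are what reject short non-speech transients (a door slam exceeds the energy threshold but not the minimum speech duration), which is the behavior the abstract attributes to its event-duration rules.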
  • Publication number: 20120259640
    Abstract: A voice control unit controlling and outputting a first voice signal includes an analysis unit configured to calculate an average value of a gradient of spectrum at a high frequency of an inputted second voice signal as a voice characteristic, a determination unit configured to determine an amplification band and an amplification amount of a spectrum of the first voice signal based on the gradient, and an amplification unit configured to amplify the spectrum of the first voice signal to realize the determined amplification band and the determined amplification amount.
    Type: Application
    Filed: June 20, 2012
    Publication date: October 11, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Taro TOGAWA, Takeshi OTANI, Masanao SUZUKI, Yasuji OTA
  • Publication number: 20120259630
    Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Aditi GARG, Kasthuri Jayachand YADLAPALLI
  • Publication number: 20120259637
    Abstract: An electronic apparatus and method for retrieving a song, and a storage medium. The electronic apparatus includes: a storage unit which stores a plurality of songs; a user input unit which receives a hummed query which is inputted for retrieving a song; and a song retrieving unit which retrieves a song based on the hummed query from among the plurality of stored songs when the hummed query is received. The song retrieving unit extracts a pitch and a duration of the hummed query, converts each of the extracted pitch and duration into multi-level symbols, calculates a string edit distance between the hummed query and one of the plurality of songs based on the symbols, and determines a similarity between the hummed query and a song based on edit operations which are performed within the calculated string edit distance.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: A. Srinivas, P. Krishnamoorthy, Rajen Bhatt, Sarvesh Kumar
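The core matching step in the entry above is a string edit distance over quantized pitch and duration symbols. A plain Levenshtein distance over symbol sequences captures the idea; the up/down/repeat symbol alphabet in the test is illustrative, not necessarily the patent's multi-level symbol set:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two symbol sequences,
    computed with a rolling one-row dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]
```

Ranking stored songs by this distance to the hummed query's symbol string, then refining the score from the specific edit operations involved, matches the two-step similarity computation the abstract describes.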
  • Publication number: 20120259641
    Abstract: Methods and apparatus for initiating an action using a voice-controlled human interface. The interface provides a hands free, voice driven environment to control processes and applications. According to one embodiment, a method comprises electronically receiving first user input, parsing the first user input to determine whether the first user input contains a command activation statement that cues a voice-controlled human interface to enter a command mode in which a second user input comprising a voice signal is processed to identify at least one executable command and, in response to determining that the first user input comprises the command activation statement, identifying the at least one executable command in the second user input.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Richard Grant, Pedro E. McGregor
  • Publication number: 20120259639
    Abstract: A television, or other device with television tuner, can be controlled to directly tune to a specific channel name, such as a broadcaster's station name, by using EPG metadata to provide a correlation between a channel number and channel name.
    Type: Application
    Filed: June 8, 2011
    Publication date: October 11, 2012
    Inventors: Sabrina Tai-Chen Yeh, David Young, Steven Friedlander
  • Publication number: 20120259627
    Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
    Type: Application
    Filed: May 27, 2010
    Publication date: October 11, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Daniel Willett, Venkatesh Nagesha
  • Publication number: 20120253800
    Abstract: The system provides a speech recognition program, an update website for updating the speech recognition program, and a way of storing data. A user may utilize the update website to add, modify, and delete items that may comprise speech commands, dll's, multimedia files, executable code, and other information. The speech recognition program may communicate with the update website to request information about possible updates. The update website may send a response consisting of information to the speech recognition program, which may utilize the received information to decide what items to download. The speech recognition program may send one or more requests to the update website to download items. The update website may respond by transmitting requested items to the speech recognition program, which overwrites existing items with the newly received items.
    Type: Application
    Filed: September 23, 2011
    Publication date: October 4, 2012
    Inventors: Michael D. Goller, Stuart E. Goller
  • Publication number: 20120253817
    Abstract: A system and method for connecting to a telephone extension listed in a telephone number database is disclosed. The method comprises recording an audio token on a mobile communication device. The audio token is associated with a telephone number included in the database. The audio token is transmitted from the mobile communication device to a server over a digital channel. The telephone number in the database that is associated with the audio token is selected using speech recognition. The mobile communication device is then connected with the telephone number.
    Type: Application
    Filed: April 4, 2011
    Publication date: October 4, 2012
    Inventor: Trung (Tim) Trinh
  • Publication number: 20120253799
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating domain-specific speech recognition models for a domain of interest by combining and tuning existing speech recognition models when a speech recognizer does not have access to a speech recognition model for that domain of interest and when available domain-specific data is below a minimum desired threshold to create a new domain-specific speech recognition model. A system configured to practice the method identifies a speech recognition domain and combines a set of speech recognition models, each speech recognition model of the set of speech recognition models being from a respective speech recognition domain. The system receives an amount of data specific to the speech recognition domain, wherein the amount of data is less than a minimum threshold to create a new domain-specific model, and tunes the combined speech recognition model for the speech recognition domain based on the data.
    Type: Application
    Filed: March 28, 2011
    Publication date: October 4, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Robert Bell, Diamantino Antonio Caseiro, Mazin Gilbert, Patrick Haffner
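One common concrete way to "combine and tune" existing models, as the entry above describes, is linear interpolation, with the mixture weights tuned on whatever small amount of in-domain data is available. A toy unigram version for illustration only; the patent does not specify this exact scheme:

```python
def interpolate_lms(models, weights):
    """Linearly interpolate word probabilities from several unigram LMs.
    models: list of dicts mapping word -> probability; weights sum to 1."""
    vocab = set().union(*models)
    return {w: sum(lam * m.get(w, 0.0) for lam, m in zip(weights, models))
            for w in vocab}
```

Because each source model is a proper distribution and the weights sum to one, the interpolated model is also a proper distribution; tuning then reduces to choosing the weights that minimize perplexity on the limited in-domain sample.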
  • Publication number: 20120253807
    Abstract: A speaker state detecting apparatus comprises: an audio input unit for acquiring, at least, a first voice emanated by a first speaker and a second voice emanated by a second speaker; a speech interval detecting unit for detecting an overlap period between a first speech period of the first speaker included in the first voice and a second speech period of the second speaker included in the second voice, which starts before the first speech period, or an interval between the first speech period and the second speech period; a state information extracting unit for extracting state information representing a state of the first speaker from the first speech period; and a state detecting unit for detecting the state of the first speaker in the first speech period based on the overlap period or the interval and the state information.
    Type: Application
    Filed: February 3, 2012
    Publication date: October 4, 2012
    Applicant: FUJITSU LIMITED
    Inventor: Akira KAMANO
  • Publication number: 20120253810
    Abstract: Authenticating a purported user attempting to access a secure resource includes enrolling a user's voice sample by requiring the user to orally speak preselected enrollment utterances, generating prompts and respective predetermined correct responses where each prompt has only one correct response, presenting a prompt to the user in real time, and analyzing the user's real time live response to determine if the live response matches the predetermined correct response and if voice characteristics of the user's live voice sample match characteristics of the enrolled voice sample.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 4, 2012
    Inventors: Timothy S. Sutton, Stephen T. Dispensa
  • Publication number: 20120253805
    Abstract: Systems, methods, and media for determining fraud risk from audio signals and non-audio data are provided herein. Some exemplary methods include receiving an audio signal and an associated audio signal identifier, receiving a fraud event identifier associated with a fraud event, determining a speaker model based on the received audio signal, determining a channel model based on a path of the received audio signal, using a server system, updating a fraudster channel database to include the determined channel model based on a comparison of the audio signal identifier and the fraud event identifier, and updating a fraudster voice database to include the determined speaker model based on a comparison of the audio signal identifier and the fraud event identifier.
    Type: Application
    Filed: March 8, 2012
    Publication date: October 4, 2012
    Inventors: Anthony Rajakumar, Torsten Zeppenfeld, Lisa Guerra, Vipul Vyas
  • Publication number: 20120253806
    Abstract: A system and method for distributed speech recognition is provided. Audio data is obtained from a caller participating in a call with an agent. A main recognizer receives a main grammar template and the audio data. A plurality of secondary recognizers each receive the audio data and a reference that identifies a secondary grammar, which is a non-overlapping section of the main grammar template. Speech recognition is performed on each of the secondary recognizers and speech recognition results are identified by applying the secondary grammar to the audio data. The n most likely speech recognition results are selected. The main recognizer constructs a new grammar based on the main grammar template using the speech recognition results from each of the secondary recognizers as a new vocabulary. Further speech recognition results are identified by applying the new grammar to the audio data.
    Type: Application
    Filed: June 18, 2012
    Publication date: October 4, 2012
    Inventor: Gilad Odinak
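The result-merging step described above (selecting the n most likely results across the secondary recognizers to serve as the vocabulary for the main recognizer's new grammar) could be sketched as below. The function name and the (word, score) hypothesis format are hypothetical:

```python
import heapq

# Illustrative sketch of merging secondary-recognizer output: each secondary
# recognizer returns scored hypotheses from its own non-overlapping grammar
# section; the n most likely results overall become the new vocabulary.
def merge_hypotheses(secondary_results, n):
    """secondary_results: list of per-recognizer lists of (word, score)."""
    all_hyps = [h for results in secondary_results for h in results]
    best = heapq.nlargest(n, all_hyps, key=lambda h: h[1])
    return [word for word, _ in best]      # new vocabulary, best first
```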
  • Publication number: 20120253809
    Abstract: A voice verification module 308, for example for an interactive voice response system, is disclosed. The voice verification module 308 is configured to select, from a store 310 of verification words, one or more verification words in response to a request for a verification phrase, and to form a verification phrase in which said one or more verification words are distributed throughout.
    Type: Application
    Filed: March 26, 2012
    Publication date: October 4, 2012
    Applicant: BIOMETRIC SECURITY LTD
    Inventors: Trevor Thomas, Nicholas Wise, David Cowell
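Constructing a phrase with the verification words distributed throughout could be sketched as below, e.g. by interleaving filler words between the selected verification words. This is one illustrative reading; the use of filler words and all names here are assumptions, not the patented method:

```python
import random

# Illustrative sketch: pick n_words verification words from the store and
# distribute them throughout the phrase by interleaving filler words.
def build_verification_phrase(store, fillers, n_words, seed=0):
    rng = random.Random(seed)              # seeded for reproducibility
    words = rng.sample(store, n_words)     # distinct verification words
    phrase = []
    for i, w in enumerate(words):
        phrase.append(w)
        if i < len(words) - 1:             # filler between verification words
            phrase.append(rng.choice(fillers))
    return " ".join(phrase)
```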
  • Publication number: 20120245932
    Abstract: According to one embodiment, a voice recognition apparatus includes a determination unit, an estimating unit, and a voice recognition unit. The determination unit determines whether a component with a frequency of not less than 1000 Hz and with a level not lower than a predetermined level is included in a sound input from a plurality of microphones. The estimating unit estimates a sound source direction of the sound when the determination unit determines that the component is included in the sound. The voice recognition unit recognizes whether the sound obtained in the sound source direction coincides with a voice model registered beforehand.
    Type: Application
    Filed: March 26, 2012
    Publication date: September 27, 2012
    Inventors: Kazushige OUCHI, Toshiyuki Koga, Daisuke Yamamoto, Miwako Doi
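The determination step above (checking for a component at 1000 Hz or above whose level meets a threshold) can be sketched with a plain-Python DFT. This is a minimal illustration, not the patented determination unit; the normalization and function name are assumptions:

```python
import math
import cmath

# Illustrative sketch: scan DFT bins at >= 1000 Hz and report whether any
# has a (normalized) magnitude at or above the given threshold.
def has_high_freq_component(samples, sample_rate, threshold):
    n = len(samples)
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq < 1000.0:                  # only bins at 1000 Hz and above
            continue
        x = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        if abs(x) * 2 / n >= threshold:    # amplitude-normalized magnitude
            return True
    return False
```

Only when this check passes would the apparatus go on to estimate the sound source direction and run recognition against the registered voice model.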
  • Publication number: 20120239393
    Abstract: A multiple audio/video data stream simulation method and system. A computing system receives first audio and/or video data streams. The first audio and/or video data streams include data associated with a first person and a second person. The computing system monitors the first audio and/or video data streams. The computing system identifies emotional attributes comprised by the first audio and/or video data streams. The computing system generates second audio and/or video data streams associated with the first audio and/or video data streams. The second audio and/or video data streams contain the data of the first audio and/or video data streams without the emotional attributes. The computing system stores the second audio and/or video data streams.
    Type: Application
    Filed: May 31, 2012
    Publication date: September 20, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Publication number: 20120239383
    Abstract: A system and method are disclosed that improve automatic speech recognition in a spoken dialog system. The method comprises partitioning speech recognizer output into self-contained clauses, identifying a dialog act in each of the self-contained clauses, qualifying dialog acts by identifying a current domain object and/or a current domain action, and determining whether further qualification is possible for the current domain object and/or current domain action. If further qualification is possible, then the method comprises identifying another domain action and/or another domain object associated with the current domain object and/or current domain action, reassigning that domain action and/or domain object as the current domain action and/or current domain object, and then recursively qualifying the new current domain action and/or current domain object. This process continues until nothing is left to qualify.
    Type: Application
    Filed: May 25, 2012
    Publication date: September 20, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Narendra K. Gupta, Mazin G. Rahim
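The recursive qualification loop described above can be sketched as follows. This is an illustrative simplification (a single association chain, assumed acyclic); the data representation and names are assumptions, not the patented method:

```python
# Illustrative sketch of the qualification loop: starting from a dialog
# act's current domain object/action, keep following associations to the
# next object/action until nothing is left to qualify.
def qualify(current, associations):
    """associations maps a domain object/action to the one that further
    qualifies it; assumed acyclic so the loop terminates."""
    chain = [current]
    while current in associations:
        current = associations[current]    # reassign as the new current item
        chain.append(current)
    return chain                           # full qualification chain
```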
  • Publication number: 20120232890
    Abstract: According to one embodiment, an apparatus for discriminating speech/non-speech of a first acoustic signal includes a weight assignment unit, a feature extraction unit, and a speech/non-speech discrimination unit. The first acoustic signal includes a user's speech and a reproduced sound. The reproduced sound is a system sound having a plurality of channels reproduced from a plurality of speakers. The weight assignment unit is configured to assign a weight to each frequency band based on the system sound. The feature extraction unit is configured to extract a feature from a second acoustic signal based on the weight of each frequency band. The second acoustic signal is the first acoustic signal in which the reproduced sound is suppressed. The speech/non-speech discrimination unit is configured to discriminate speech/non-speech of the first acoustic signal based on the feature.
    Type: Application
    Filed: September 14, 2011
    Publication date: September 13, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kaoru Suzuki, Masaru Sakai, Yusuke Kida
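The weight assignment and feature extraction described above can be sketched as follows: bands dominated by the reproduced system sound get small weights, and a feature is computed from the suppressed signal's spectrum using those weights. This is a minimal illustration under assumed representations (per-band power lists, a log-energy feature), not the patented units:

```python
import math

# Illustrative sketch of the weight assignment unit: bands where the
# reproduced system sound is strong receive small weights.
def band_weights(system_spectrum, floor=1e-6):
    total = sum(system_spectrum) + floor
    return [1.0 - s / total for s in system_spectrum]

# Illustrative sketch of the feature extraction unit: a weighted log-energy
# feature over the (reproduced-sound-suppressed) signal spectrum.
def weighted_feature(signal_spectrum, weights):
    energy = sum(w * p for w, p in zip(weights, signal_spectrum))
    return math.log(energy + 1e-12)
```

A speech/non-speech discriminator would then threshold or classify this feature.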
  • Publication number: 20120230526
    Abstract: Method and apparatus for microphone matching for wearable directional hearing assistance devices are provided. An embodiment includes a method for matching at least a first microphone to a second microphone, using a user's voice from the user's mouth. The user's voice is processed as received by at least one microphone to determine a frequency profile associated with voice of the user. Intervals are detected where the user is speaking using the frequency profile. Variations in microphone reception between the first microphone and the second microphone are adaptively canceled during the intervals and when the first microphone and second microphone are in relatively constant spatial position with respect to the user's mouth.
    Type: Application
    Filed: October 3, 2011
    Publication date: September 13, 2012
    Applicant: Starkey Laboratories, Inc.
    Inventor: Tao Zhang
  • Publication number: 20120232891
    Abstract: This invention realizes a speech communication system and method, and a robot apparatus, capable of significantly improving entertainment value. A speech communication system with a function to converse with a conversation partner is provided with a speech recognition means for recognizing speech of the conversation partner, a conversation control means for controlling conversation with the conversation partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracking the conversation partner based on one or both of the recognition result of the image recognition means and the recognition result of the speech recognition means. The conversation control means controls the conversation so that it continues in accordance with the tracking by the tracking control means.
    Type: Application
    Filed: May 16, 2012
    Publication date: September 13, 2012
    Applicant: SONY CORPORATION
    Inventors: Kazumi Aoyama, Hideki Shimomura
  • Publication number: 20120232904
    Abstract: A method and apparatus for correcting a named entity word in a speech input text. The method includes recognizing a speech input signal from a user, obtaining a recognition result including named entity vocabulary mark-up information, determining a named entity word recognized incorrectly in the recognition result according to the named entity vocabulary mark-up information, displaying the named entity word recognized incorrectly, and correcting the named entity word recognized incorrectly.
    Type: Application
    Filed: March 12, 2012
    Publication date: September 13, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Xuan ZHU, Hua Zhang, Tengrong Su, Ki-Wan Eom, Jae-Won Lee
  • Publication number: 20120232897
    Abstract: A user can locate products by dialing a number from any phone and accessing an automatic voice recognition system. A reply is made to the user with information locating the product, using a store's product location data converted to automatic voice responses. Smart phone and mobile web access to a product database is enabled using voice-to-text and text search. A taxonomy enables product search requests by product descriptions and/or product brand names, and enables synonyms and phonetic enhancements to the system. Search results are related to products and product categories with concise organization. Relevant advertisements, promotional offers and coupons are delivered based upon search and taxonomy elements. Search requests generate dynamic interior maps of a product's location inside the shopper's location, assisting a shopper to efficiently shop the location for listed items. Business intelligence of product categories enables rapid scaling across retail segments.
    Type: Application
    Filed: May 1, 2012
    Publication date: September 13, 2012
    Inventors: Nathan Pettyjohn, Matthew Kulig, Niarcas Jeffrey, Edward Saunders
  • Publication number: 20120232892
    Abstract: A method, system and machine-readable medium are provided. Speech input is received at a speech recognition component and recognized output is produced. A common dialog cue from the received speech input or input from a second source is recognized. An action is performed corresponding to the recognized common dialog cue. The performed action includes sending a communication from the speech recognition component to the speech generation component while bypassing a dialog component.
    Type: Application
    Filed: May 21, 2012
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Vincent J. Goffin, Sarangarajan Parthasarathy
  • Publication number: 20120232899
    Abstract: A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the first phonogram and the second phonogram. Referential utterances of speech signals are selected from the first phonogram and the second phonogram, where the referential utterances include formant paths of at least three formant frequencies. The selected referential utterances including at least two identical formant frequencies are compared therebetween. Similarity of the compared referential utterances is evaluated from matching the other formant frequencies, where similarity of the phonograms is determined from the evaluation of similarity of all the compared referential utterances.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 13, 2012
    Applicant: Obschestvo s orgranichennoi otvetstvennost'yu "Centr Rechevyh Technologij"
    Inventor: Sergey Lvovich Koval
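The comparison rule above (utterances are comparable when at least two formant frequencies match, and similarity is then evaluated from the remaining formants) could be sketched as below. The tolerance-based notion of "identical" frequencies and the similarity score are illustrative assumptions, not the patented evaluation:

```python
# Illustrative sketch: two referential utterances, each given as a list of
# formant frequencies in Hz, are comparable when at least two formants
# match within a tolerance; similarity then reflects the overall mismatch.
def formant_similarity(formants_a, formants_b, tol=50.0):
    diffs = [abs(a - b) for a, b in zip(formants_a, formants_b)]
    if sum(d <= tol for d in diffs) < 2:
        return None                        # utterances are not comparable
    return 1.0 / (1.0 + sum(diffs) / len(diffs))   # in (0, 1], 1 = identical
```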
  • Publication number: 20120232896
    Abstract: A voice activity detection apparatus (1) comprising: a signal condition analyzing unit (3) which analyses at least one signal parameter of an input signal to detect a signal condition SC of said input signal; at least two voice activity detection units (4-i) comprising different voice detection characteristics, wherein each voice activity detection unit (4-i) performs separately a voice activity detection of said input signal to provide a voice activity detection decision VADD; and a decision combination unit (5) which combines the voice activity detection decisions VADDs provided by said voice activity detection units (4-i) depending on the detected signal condition SC to provide a combined voice activity detection decision cVADD.
    Type: Application
    Filed: May 21, 2012
    Publication date: September 13, 2012
    Applicant: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Anisse TALEB, Zhe WANG, Jianfeng XU, Lei MIAO
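The condition-dependent decision combination described above can be sketched as below. The specific combination rules (strict agreement under noise, any-vote otherwise) are illustrative assumptions, not the patented combination unit:

```python
# Illustrative sketch of the decision combination unit: several VADs with
# different characteristics each vote, and the rule used to combine their
# boolean decisions depends on the detected signal condition.
def combine_vad(decisions, signal_condition):
    if signal_condition == "noisy":
        return all(decisions)      # require agreement when noise is high
    return any(decisions)          # any detector suffices in clean conditions
```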
  • Publication number: 20120232901
    Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set representing both 1) all phonemes occurring in the set of two or more spoken languages, and 2) phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify the most likely phoneme occurring at each point in the audio files, in the set of two or more potential languages on which the UPD was trained. Each statistical language model (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the SLMs that are based on the set of unique phoneme patterns created for each language.
    Type: Application
    Filed: May 24, 2012
    Publication date: September 13, 2012
    Applicant: Autonomy Corporation Ltd.
    Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
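The run-time identification step above can be sketched as below: score the decoded phoneme sequence against a per-language phoneme statistical language model and pick the highest-scoring language. A bigram SLM with a probability floor for unseen patterns is an illustrative assumption, not the patented model:

```python
import math

# Illustrative sketch: score the decoded phoneme sequence against a bigram
# SLM per language and return the language with the best log-probability.
def identify_language(phonemes, slms, floor=1e-6):
    """slms maps a language name to a dict of (phoneme, phoneme) -> prob."""
    best_lang, best_score = None, float("-inf")
    for lang, bigram_probs in slms.items():
        score = sum(math.log(bigram_probs.get(bg, floor))
                    for bg in zip(phonemes, phonemes[1:]))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```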
  • Publication number: 20120232895
    Abstract: According to one embodiment, an apparatus for discriminating speech/non-speech of a first acoustic signal includes a weight assignment unit, a feature extraction unit, and a speech/non-speech discrimination unit. The weight assignment unit is configured to assign a weight to each frequency band, based on a frequency spectrum of the first acoustic signal including a user's speech and a frequency spectrum of a second acoustic signal including a disturbance sound. The feature extraction unit is configured to extract a feature from the frequency spectrum of the first acoustic signal, based on the weight of each frequency band. The speech/non-speech discrimination unit is configured to discriminate speech/non-speech of the first acoustic signal, based on the feature.
    Type: Application
    Filed: September 14, 2011
    Publication date: September 13, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kaoru Suzuki, Masaru Sakai, Yusuke Kida