Voice Recognition Patents (Class 704/246)
  • Patent number: 8775454
    Abstract: A system and method for collecting data may include a data collection device to obtain the data from a user, an apparatus for obtaining metadata for each word of the data from the user, an apparatus for obtaining a searchable transcript of the data and a device to store the searchable transcript. The metadata may be date data, time data, name data or location data and the data collection device may include a speech recognition engine to translate speech into searchable words. The speech recognition engine may provide a confidence level corresponding to the translation of the speech into searchable words.
    Type: Grant
    Filed: July 29, 2008
    Date of Patent: July 8, 2014
    Inventor: James L. Geer
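The combination of searchable words, per-word metadata, and a recognition confidence level described above maps naturally onto a small data structure. A minimal Python sketch (class names, fields, and the search interface are illustrative, not taken from the patent):

```python
from dataclasses import dataclass, field

@dataclass
class TranscriptWord:
    """One recognized word plus per-word metadata and recognition confidence."""
    text: str
    confidence: float                              # recognition confidence, 0.0-1.0
    metadata: dict = field(default_factory=dict)   # e.g. date, time, name, location

class SearchableTranscript:
    """Stores recognized words and supports simple keyword search."""
    def __init__(self):
        self.words = []

    def add(self, word: TranscriptWord):
        self.words.append(word)

    def search(self, term: str, min_confidence: float = 0.0):
        """Return stored words matching `term` whose confidence meets the threshold."""
        term = term.lower()
        return [w for w in self.words
                if w.text.lower() == term and w.confidence >= min_confidence]
```

The `min_confidence` filter reflects the abstract's note that the engine may attach a confidence level to each translated word.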
  • Patent number: 8775179
    Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase match one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase match one or more voice features of the speaker of the reference passphrase.
    Type: Grant
    Filed: May 6, 2010
    Date of Patent: July 8, 2014
    Assignee: Senam Consulting, Inc.
    Inventor: Serge Olegovich Seyfetdinov
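The two-factor check in this abstract (passphrase text match AND voice-feature match) can be sketched as follows; the cosine-similarity comparison and the 0.8 threshold are illustrative stand-ins for a real voice-feature matcher:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def authenticate(ref_phrase, ref_features, test_phrase, test_features,
                 threshold=0.8):
    """Accept only if BOTH the passphrase text and the voice features match,
    mirroring the two determinations the abstract describes."""
    phrase_ok = ref_phrase.strip().lower() == test_phrase.strip().lower()
    voice_ok = cosine_similarity(ref_features, test_features) >= threshold
    return phrase_ok and voice_ok
```

Either failure alone rejects the speaker, matching the abstract's conjunctive condition.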
  • Patent number: 8775181
    Abstract: Interpretation from a first language to a second language via one or more communication devices is performed through a communication network (e.g. phone network or the internet) using a server for performing recognition and interpretation tasks, comprising the steps of: receiving an input speech utterance in a first language on a first mobile communication device; conditioning said input speech utterance; first transmitting said conditioned input speech utterance to a server; recognizing said first transmitted speech utterance to generate one or more recognition results; interpreting said recognition results to generate one or more interpretation results in an interlingua; mapping the interlingua to a second language in a first selected format; second transmitting said interpretation results in the first selected format to a second mobile communication device; and presenting said interpretation results in a second selected format on said second communication device.
    Type: Grant
    Filed: July 2, 2013
    Date of Patent: July 8, 2014
    Assignee: Fluential, LLC
    Inventors: Farzad Ehsani, Demitrios Master, Elaine Drom Zuber
  • Patent number: 8775189
    Abstract: A wireless communication device is disclosed that accepts recorded audio data from an end-user. The audio data can be in the form of a command requesting user action. Likewise, the audio data can be converted into a text file. The audio data is reduced to a digital file in a format that is supported by the device hardware, such as a .wav, .mp3, .vnf file, or the like. The digital file is sent via secured or unsecured wireless communication to one or more server computers for further processing. In accordance with an important aspect of the invention, the system evaluates the confidence level of the speech recognition process. If the confidence level is high, the system automatically builds the application command or creates the text file for transmission to the communication device.
    Type: Grant
    Filed: August 9, 2006
    Date of Patent: July 8, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Stephen S. Burns, Mickey W. Kowitz
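The confidence-gated routing the abstract describes might look like the following sketch: when the recognizer's confidence is high, automatically build the application command or the text file; otherwise defer (for example, to a human transcriber). The 0.85 threshold and return shape are illustrative assumptions:

```python
def handle_recognition(text, confidence, is_command, high=0.85):
    """Route a recognition result on its confidence level:
    high confidence -> build the command or text file automatically;
    low confidence  -> defer for further processing."""
    if confidence >= high:
        return ("command" if is_command else "text_file", text)
    return ("defer", text)
```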
  • Patent number: 8775166
    Abstract: An encoding method includes extracting core layer characteristic parameters and enhancement layer characteristic parameters of a background noise signal, and encoding the core layer characteristic parameters and enhancement layer characteristic parameters to obtain a core layer codestream and an enhancement layer codestream. The disclosure also provides an encoding device, a decoding device and method, an encapsulating method, a reconstructing method, an encoding-decoding system and an encoding-decoding method. By describing the background noise signal with the enhancement layer characteristic parameters, the background noise signal can be processed using a more accurate encoding and decoding method, so as to improve the quality of encoding and decoding the background noise signal.
    Type: Grant
    Filed: August 14, 2009
    Date of Patent: July 8, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Hualin Wan, Libin Zhang
  • Patent number: 8775180
    Abstract: Apparatus and methods are provided for using automatic speech recognition to analyze a voice interaction and verify compliance of an agent reading a script to a client during the voice interaction. In one aspect of the invention, a communications system includes a user interface, a communications network, and a call center having an automatic speech recognition component. In other aspects of the invention, a script compliance method includes the steps of conducting a voice interaction between an agent and a client and evaluating the voice interaction with an automatic speech recognition component adapted to analyze the voice interaction and determine whether the agent has adequately followed the script. In still further aspects of the invention, the duration of a given interaction can be analyzed, either apart from or in combination with the script compliance analysis above, to seek to identify instances of agent non-compliance, of fraud, or of quality-analysis issues.
    Type: Grant
    Filed: November 26, 2012
    Date of Patent: July 8, 2014
    Assignee: West Corporation
    Inventors: Mark J. Pettay, Fonda J. Narke
  • Patent number: 8775178
    Abstract: Updating a voice template for recognizing a speaker on the basis of a voice uttered by the speaker is disclosed. Stored voice templates indicate distinctive characteristics of utterances from speakers. Distinctive characteristics are extracted for a specific speaker based on a voice message utterance received from that speaker. The distinctive characteristics are compared to the characteristics indicated by the stored voice templates to select a template that matches within a predetermined threshold. The selected template is updated on the basis of the extracted characteristics.
    Type: Grant
    Filed: October 27, 2009
    Date of Patent: July 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Yukari Miki, Masami Noguchi
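The select-then-update loop the abstract describes (find the closest stored template within a threshold, then adapt it toward the new utterance) might look like this; the Euclidean distance, threshold, and update rate are illustrative:

```python
def match_and_update(templates, features, threshold=0.25, rate=0.1):
    """Pick the stored template closest to the extracted features; if it lies
    within `threshold`, nudge it toward the new utterance (running average,
    mutating `templates` in place) and return the matched speaker name."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best = min(templates, key=lambda name: dist(templates[name], features))
    if dist(templates[best], features) <= threshold:
        templates[best] = [t + rate * (f - t)
                           for t, f in zip(templates[best], features)]
        return best
    return None          # no template matched within the threshold
```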
  • Patent number: 8774392
    Abstract: A system and method for processing calls in a call center are described. A call session from a caller via a session manager and including incoming text messages of a verbal speech stream is assigned. The incoming text messages are progressively visually presented throughout the call session to a live agent on an agent console operatively coupled to the session manager. The incoming text messages are progressively processed through a customer support scenario interactively monitored and controlled by the live agent via the agent console. The incoming text messages are processed through automated script execution in concert with the live agent. Outgoing text messages are converted into a synthesized speech stream. The synthesized speech stream is sent via the agent console to the caller.
    Type: Grant
    Filed: June 17, 2013
    Date of Patent: July 8, 2014
    Assignee: Intellisist, Inc.
    Inventors: Gilad Odinak, Alastair Sutherland, William A. Tolhurst
  • Publication number: 20140188468
    Abstract: An apparatus, system and method for calculating passphrase variability are disclosed. The passphrase variability value can then be used for generating phonetically rich passwords in text-dependent speaker recognition systems, or for estimating the variability of the input passphrase in a text-independent system during the enrollment process in a speech recognition security system.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Dmitry Dyrmovskiy, Mikhail Khitrov
  • Publication number: 20140188471
    Abstract: This is directed to processing voice inputs received by an electronic device. In particular, this is directed to receiving a voice input and identifying the user providing the voice input. The voice input can be processed using a subset of words from a library used to identify the words or phrases of the voice input. The particular subset can be selected such that voice inputs provided by the user are more likely to include words from the subset. The subset of the library can be selected using any suitable approach, including for example based on the user's interests and words that relate to those interests. For example, the subset can include one or more words related to media items selected by the user for storage on the electronic device, names of the user's contacts, applications or processes used by the user, or any other words relating to the user's interactions with the device.
    Type: Application
    Filed: March 4, 2014
    Publication date: July 3, 2014
    Applicant: Apple Inc.
    Inventor: Allen P. HAUGHAY
  • Patent number: 8768711
    Abstract: A method of voice-enabling an application for command and control and content navigation can include the application dynamically generating a markup language fragment specifying a command and control and content navigation grammar for the application, instantiating an interpreter from a voice library, and providing the markup language fragment to the interpreter. The method also can include the interpreter processing a speech input using the command and control and content navigation grammar specified by the markup language fragment and providing an event to the application indicating an instruction representative of the speech input.
    Type: Grant
    Filed: June 17, 2004
    Date of Patent: July 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Brien H. Muschett
  • Publication number: 20140180689
    Abstract: An apparatus and method for recognizing voice using multiple acoustic models are disclosed. An apparatus for recognizing voice using multiple acoustic models includes a voice data database (DB) configured to store voice data collected in various noise environments; a model generating means configured to perform classification for each speaker and environment based on the collected voice data, and to generate an acoustic model of a binary tree structure as the classification result; and a voice recognizing means configured to extract feature data of voice data when the voice data is received from a user, to select multiple models from the generated acoustic model based on the extracted feature data, to recognize the voice data in parallel based on the selected multiple models, and to output a word string corresponding to the voice data as the recognition result.
    Type: Application
    Filed: March 18, 2013
    Publication date: June 26, 2014
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Electronics and Telecommunications Research Institute
  • Patent number: 8762138
    Abstract: The present invention relates to a method as well as to a computing device (20) for editing a noise-database (13) containing noise information, said noise information being derived from noise signals within an audio stream (19). In order to enhance the possibilities for creating and utilizing context information that emerges from tracking noise signals in an audio stream, for example a telephone call, the above method is characterized by the following steps: A) in a localizing step (14), determining geographical data of the location the noise signals originate from; B) in an analyzing step (15), analyzing the noise signals with reference to the noise content; C) in a linking step, linking the analyzed noise signals to said geographical data to create noise information; D) in a storing step, storing said noise information within said noise-database (13).
    Type: Grant
    Filed: August 30, 2010
    Date of Patent: June 24, 2014
    Assignee: Vodafone Holding GmbH
    Inventors: Stefan Holtel, Jad Noueihed
  • Patent number: 8762156
    Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: June 24, 2014
    Assignee: Apple Inc.
    Inventor: Lik Harry Chen
  • Patent number: 8762147
    Abstract: A signal portion is extracted from an input signal for each frame having a specific duration to generate a per-frame input signal. The per-frame input signal in a time domain is converted into a per-frame input signal in a frequency domain, thereby generating a spectral pattern. Subband average energy is derived in each of subbands adjacent one another in the spectral pattern. The subband average energy is compared in at least one subband pair of a first subband and a second subband that is a higher frequency band than the first subband, the first and second subbands being consecutive subbands in the spectral pattern. It is determined that the per-frame input signal includes a consonant segment if the subband average energy of the second subband is higher than the subband average energy of the first subband.
    Type: Grant
    Filed: February 1, 2012
    Date of Patent: June 24, 2014
    Assignee: JVC KENWOOD Corporation
    Inventors: Akiko Akechi, Takaaki Yamabe
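The subband comparison the abstract describes can be sketched directly: split a frame's spectral pattern into subbands, compute each subband's average energy, and flag a consonant segment when a higher-frequency subband carries more average energy than the one below it. The subband count is illustrative, and the frame's spectrum is assumed already computed:

```python
def is_consonant_frame(spectrum, n_subbands=4):
    """Return True when any higher-frequency subband has greater average
    energy than the consecutive subband below it, per the abstract's test.
    `spectrum` is a list of spectral magnitudes for one frame (length
    assumed to be at least `n_subbands`)."""
    step = len(spectrum) // n_subbands
    energies = [sum(x * x for x in spectrum[i * step:(i + 1) * step]) / step
                for i in range(n_subbands)]
    # Compare each pair of consecutive subbands (first vs. second).
    return any(hi > lo for lo, hi in zip(energies, energies[1:]))
```

Consonants tend to concentrate energy at higher frequencies than vowels, which is the intuition behind the comparison.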
  • Patent number: 8762149
    Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice, comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker whose identity is to be verified, based on the received voice utterance; and c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; d) accepting (16) the speaker's identity to be verified in case both verification steps give a positive result, and not accepting (15) the speaker's identity to be verified if either of the verification steps gives a negative result. The invention further refers to a corresponding computer readable medium and a computer.
    Type: Grant
    Filed: December 10, 2008
    Date of Patent: June 24, 2014
    Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martín de los Santos de las Heras, Marta García Gomar
  • Patent number: 8756057
    Abstract: A speech analysis system and method for analyzing speech. The system includes: a voice recognition system for converting inputted speech to text; an analytics system for generating feedback information by analyzing the inputted speech and text; and a feedback system for outputting the feedback information.
    Type: Grant
    Filed: November 2, 2005
    Date of Patent: June 17, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Steven Michael Miller, Anne R. Sand
  • Publication number: 20140163984
    Abstract: A method of voice recognition and an electronic apparatus applying the method are described. The method includes taking i=1 and detecting corresponding i-th voice sub-information at a moment Ti when the electronic apparatus detects that a user starts to talk at a moment T0, wherein the i-th voice sub-information is corresponding voice information from the moment T0 to the moment Ti, the i-th voice sub-information is partial voice information of voice information with integral semantics corresponding to a moment Tj after the moment T0 to the moment Ti, and i is an integer greater than or equal to 1; and analyzing the i-th voice sub-information to obtain M results of analysis, M being an integer greater than or equal to 1.
    Type: Application
    Filed: December 10, 2013
    Publication date: June 12, 2014
    Applicants: Lenovo (Beijing) Co., Ltd., Beijing Lenovo Software Ltd.
    Inventors: Haisheng Dai, Qianying Wang, Hao Wang
  • Publication number: 20140163985
    Abstract: A first gender-specific speaker adaptation technique may be selected based on characteristics of a first set of feature vectors that correspond to a first unit of input speech. The first set of feature vectors may be configured for use in automatic speech recognition (ASR) of the first unit of input speech. A second set of feature vectors, which correspond to a second unit of input speech, may be modified based on the first gender-specific speaker adaptation technique. The modified second set of feature vectors may be configured for use in ASR of the second unit of input speech. A first speaker-dependent speaker adaptation technique may be selected based on characteristics of the second set of feature vectors. A third set of feature vectors, which correspond to a third unit of speech, may be modified based on the first speaker-dependent speaker adaptation technique.
    Type: Application
    Filed: February 17, 2014
    Publication date: June 12, 2014
    Applicant: Google Inc.
    Inventors: Petar Aleksic, Xin Lei
  • Patent number: 8751241
    Abstract: The current invention provides a method and system for enabling a device function of a vehicle. A speech input stream is received at a telematics unit. A speech input context is determined for the received speech input stream. The received speech input stream is processed based on the determination and the device function of the vehicle is enabled responsive to the processed speech input stream. A vehicle device in control of the enabled device function of the vehicle is directed based on the processed speech input stream. A computer usable medium with suitable computer program code is employed for enabling a device function of a vehicle.
    Type: Grant
    Filed: April 10, 2008
    Date of Patent: June 10, 2014
    Assignee: General Motors LLC
    Inventors: Christopher L. Oesterling, William E. Mazzara, Jr., Jeffrey M. Stefan
  • Patent number: 8751145
    Abstract: A voice recognition method for finding a street uses a database including information about a plurality of streets. The streets are characterized by respective street names and street types. A user provides a voice input for the street that the user tries to find. The voice input includes a street name and a street type. The street type is recognized by processing the voice input. Streets having the recognized street type are then selected from the database, and a street name of at least one of the streets selected from the database is recognized by processing the voice input.
    Type: Grant
    Filed: November 30, 2005
    Date of Patent: June 10, 2014
    Assignees: Volkswagen of America, Inc., Audi AG
    Inventors: Ramon Eduardo Prieto, Carsten Bergmann, William B. Lathrop, M. Kashif Imam, Gerd Gruchalski, Markus Möhrle
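The two-pass lookup the abstract describes (recognize the street type first, then match the name only among streets of that type) can be sketched with plain substring matching standing in for the real recognizer; the `streets` mapping is an illustrative stand-in for the database:

```python
def find_street(utterance, streets):
    """Two-pass street lookup: recognize the street type from the utterance,
    restrict candidates to streets of that type, then recognize the name
    among the reduced candidate set. `streets` maps name -> type."""
    types = set(streets.values())
    spoken_type = next((t for t in types if t in utterance.lower()), None)
    if spoken_type is None:
        return None
    candidates = [n for n, t in streets.items() if t == spoken_type]
    return next((n for n in candidates if n.lower() in utterance.lower()), None)
```

Restricting the name search to one street type shrinks the recognition vocabulary, which is the accuracy benefit the patent targets.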
  • Patent number: 8751240
    Abstract: A combination and a method are provided. Automatic speech recognition is performed on a received utterance. A meaning of the utterance is determined based, at least in part, on the recognized speech. At least one query is formed based, at least in part, on the determined meaning of the utterance. The at least one query is sent to at least one searching mechanism to search for an address of at least one web page that satisfies the at least one query.
    Type: Grant
    Filed: May 13, 2005
    Date of Patent: June 10, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Steven Hart Lewis, Kenneth H. Rosen
  • Patent number: 8750489
    Abstract: A system and method for automatic call segmentation including steps and means for automatically detecting boundaries between utterances in the call transcripts; automatically classifying utterances into target call sections; automatically partitioning the call transcript into call segments; and outputting a segmented call transcript. A training method and apparatus for training the system to perform automatic call segmentation includes steps and means for providing at least one training transcript with annotated call sections; normalizing the at least one training transcript; and performing statistical analysis on the at least one training transcript.
    Type: Grant
    Filed: October 23, 2008
    Date of Patent: June 10, 2014
    Assignee: International Business Machines Corporation
    Inventor: Youngja Park
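The detect/classify/partition flow in this abstract might be sketched as follows, with a pluggable classifier standing in for the statistically trained models the patent describes:

```python
def segment_call(utterances, classify):
    """Partition a call transcript into contiguous segments by merging
    consecutive utterances that `classify` assigns the same section label.
    Returns a list of (label, utterances) pairs in call order."""
    segments = []
    for utt in utterances:
        label = classify(utt)
        if segments and segments[-1][0] == label:
            segments[-1][1].append(utt)      # same section: extend segment
        else:
            segments.append((label, [utt]))  # boundary: start a new segment
    return segments
```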
  • Patent number: 8751233
    Abstract: A speaker-verification digital signature system is disclosed that provides greater confidence in communications having digital signatures because a signing party may be prompted to speak a text-phrase that may be different for each digital signature, thus making it difficult for anyone other than the legitimate signing party to provide a valid signature.
    Type: Grant
    Filed: July 31, 2012
    Date of Patent: June 10, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Pradeep K. Bansal, Lee Begeja, Carroll W. Creswell, Jeffrey Farah, Benjamin J. Stern, Jay Wilpon
  • Publication number: 20140156276
    Abstract: A dialogue system which correctly identifies an utterance directed to a dialogue system by using various pieces of information including information other than a voice recognition result without requiring a special signal is provided. A dialogue system includes an utterance detection/voice recognition unit that detects an utterance and recognizes a voice and an utterance feature extraction unit that extracts features of an utterance. The utterance feature extraction unit determines whether or not a target utterance is directed to the dialogue system based on features including a length of the target utterance, time relation between the target utterance and a previous utterance, and a system state.
    Type: Application
    Filed: May 23, 2013
    Publication date: June 5, 2014
    Applicant: Honda Motor Co., Ltd.
    Inventors: Mikio NAKANO, Kazunori KOMATANI, Akira HIRANO
  • Patent number: 8744850
    Abstract: Challenge items for an audible based electronic challenge system are generated using a variety of techniques to identify optimal candidates. The challenge items are intended for use in a computing system that discriminates between humans and text-to-speech (TTS) systems.
    Type: Grant
    Filed: January 14, 2013
    Date of Patent: June 3, 2014
    Assignee: John Nicholas and Kristin Gross
    Inventor: John Nicholas Gross
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
  • Patent number: 8731926
    Abstract: In a spoken term detection apparatus, processing performed by a processor includes: a feature extraction process extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part; a first calculation process calculating a standard score from a similarity between an acoustic feature stored in the acoustic feature storage part and an acoustic model stored in the acoustic model storage part; a second calculation process comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword; and a retrieval process retrieving speech data including the keyword from the speech data accumulated in the accumulation part, based on the score of the keyword calculated by the second calculation process and the standard score stored in the standard score storage part.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: May 20, 2014
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Washio, Shouji Harada
  • Patent number: 8731925
    Abstract: The present invention can include a speech enrollment system including an ordered stack of grammars and a recognition engine. The ordered stack of grammars can include an application grammars layer, a confusable grammar layer, a personal grammar layer, a phrase enrolled grammar layer, and an enrollment grammar layer. The recognition engine can return recognition results for speech input by processing the input using the ordered stack of grammars. The processing can occur from the topmost layer in the stack to the bottommost layer in the stack. Each layer in the stack can include exit criteria based upon a defined condition. When the exit criteria are satisfied, a result can be returned based upon that layer, and lower layers of the ordered stack can be ignored.
    Type: Grant
    Filed: December 22, 2006
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: William V. Da Palma, Brien H. Muschett
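The top-to-bottom walk over the ordered grammar stack might look like this; representing each layer as a (match, exit_criterion) pair of callables is an assumption made for illustration:

```python
def recognize_with_stack(speech, layers):
    """Process `speech` through an ordered stack of grammar layers, topmost
    first. Each layer is a (match, exit_criterion) pair; when a layer's exit
    criterion is satisfied by its result, that result is returned and all
    lower layers are ignored, per the abstract."""
    for match, exit_criterion in layers:
        result = match(speech)
        if exit_criterion(result):
            return result
    return None  # no layer's exit criteria were satisfied
```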
  • Patent number: 8731935
    Abstract: A method, system, and computer program product for issuing an alert in response to detecting a content of interest in a conference. A listening logic comprising multiple conference engines monitors speakers, topics, and words spoken during a conference. A speech-to-text engine monitors the conference and records a transcription. A word emphasis engine monitors the transcription for key words. A voice identification engine monitors the live conversation and the recorded transcript, in real time, for a particular individual to begin speaking. An outline engine may create an outline of the transcription. The listening device may issue an alert upon detecting a content of interest in the conference. The listening device may additionally display an outline or a selected portion of the transcript regarding a particular content of interest to inform a user of the listening device of a portion of content of the conference that may have been missed.
    Type: Grant
    Filed: September 10, 2009
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Timothy R. Chavez, Jacob Daniel Eisinger, Jennifer Elizabeth King, William R. Reichert
  • Patent number: 8731929
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Grant
    Filed: February 4, 2009
    Date of Patent: May 20, 2014
    Assignee: VoiceBox Technologies Corporation
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, Sr., Michael R. Kennewick, Jr., Richard Kennewick, Tom Freeman
  • Patent number: 8731940
    Abstract: A method of controlling a system which includes the steps of obtaining at least one signal representative of information communicated by a user via an input device in an environment of the user, wherein a signal from a first source is available in a perceptible form in the environment; estimating at least a point in time when a transition between information flowing from the first source and information flowing from the user is expected to occur; and timing the performance of a function by the system in relation to the estimated time.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: May 20, 2014
    Assignee: Koninklijke Philips N.V.
    Inventor: Aki Sakari Harma
  • Publication number: 20140136203
    Abstract: Some implementations provide a method for identifying a speaker. The method determines position and orientation of a second device based on data from a first device that is for capturing the position and orientation of the second device. The second device includes several microphones for capturing sound. The second device has movable position and movable orientation. The method assigns an object as a representation of a known user. The object has a moveable position. The method receives a position of the object. The position of the object corresponds to a position of the known user. The method processes the captured sound to identify a sound originating from the direction of the object. The direction of the object is relative to the position and the orientation of the second device. The method identifies the sound originating from the direction of the object as belonging to the known user.
    Type: Application
    Filed: December 21, 2012
    Publication date: May 15, 2014
    Applicant: QUALCOMM Incorporated
    Inventors: Kexi Liu, Pei Xiang
  • Publication number: 20140136194
    Abstract: The methods, apparatus, and systems described herein are designed to identify fraudulent callers. A voice print of a call is created and compared to known voice prints to determine if it matches one or more of the known voice prints. The methods include a pre-processing step to separate speech from non-speech, selecting a number of elements that affect the voice print the most, and/or computing an adjustment factor based on the scores of each received voice print against known voice prints.
    Type: Application
    Filed: November 9, 2012
    Publication date: May 15, 2014
    Applicant: Mattersight Corporation
    Inventors: Roger Warford, Douglas Brown, Christopher Danson, David Gustafson
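The adjustment-factor idea (normalizing each comparison score by how the incoming voice print scores against all known prints) can be sketched as follows; using the mean score as the adjustment factor and a fixed margin are illustrative assumptions, not the patent's actual computation:

```python
def flag_fraud(scores, margin=0.1):
    """Given `scores` mapping known-fraudster voice print name -> similarity
    score for one incoming call, compute an adjustment factor (the mean score)
    and flag prints that exceed it by `margin`. This normalizes away callers
    whose voices score uniformly high or low against every print."""
    if not scores:
        return []
    adjustment = sum(scores.values()) / len(scores)
    return [name for name, s in scores.items() if s - adjustment >= margin]
```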
  • Patent number: 8725508
    Abstract: A computer-implemented method and apparatus for searching for an element sequence, the method comprising: receiving a signal; determining an initial segment of the signal; inputting the initial segment into an element extraction engine to obtain a first element sequence; determining one or more second segments, each of the second segments at least partially overlapping with the initial segment; inputting the second segments into the element extraction engine to obtain at least one second element sequence; and searching for an element subsequence common to at least a predetermined number of sequences of the first element sequence and the second element sequences.
    Type: Grant
    Filed: March 27, 2012
    Date of Patent: May 13, 2014
    Assignee: Novospeech
    Inventor: Yossef Ben-Ezra
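The search flow above (initial segment, overlapping shifted segments, element extraction, then a subsequence common to at least a predetermined number of the resulting sequences) can be sketched as follows. The `extract` callable stands in for the element extraction engine, and restricting the search to contiguous subsequences of the first sequence is a simplifying assumption:

```python
def common_subsequence(sequences, min_count):
    """Longest contiguous subsequence of the first sequence that occurs in
    at least min_count of the given sequences (the first included)."""
    first = sequences[0]

    def occurs(sub, seq):
        m = len(sub)
        return any(seq[i:i + m] == sub for i in range(len(seq) - m + 1))

    # Try longer candidates first so the first hit is the longest.
    for length in range(len(first), 0, -1):
        for start in range(len(first) - length + 1):
            sub = first[start:start + length]
            if sum(1 for seq in sequences if occurs(sub, seq)) >= min_count:
                return sub
    return []

def search_signal(signal, extract, seg_len, shifts, min_count):
    """Extract element sequences from an initial segment and from shifted,
    partially overlapping segments, then search for an element subsequence
    common to at least min_count of them."""
    segments = [signal[s:s + seg_len] for s in [0] + shifts]
    sequences = [extract(seg) for seg in segments]
    return common_subsequence(sequences, min_count)
```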
  • Publication number: 20140129223
    Abstract: A method and apparatus for voice recognition are disclosed. The apparatus includes: a voice receiver which receives a user's voice signal; a first voice recognition engine which receives the voice signal and recognizes voice based on it; a communicator which receives the voice signal and transmits it to an external second voice recognition engine; and a controller. The controller transmits the voice signal from the voice receiver to the first voice recognition engine. If the first voice recognition engine can recognize voice from the voice signal, the controller outputs that engine's recognition results; if it cannot, the controller transmits the voice signal to the second voice recognition engine through the communicator.
    Type: Application
    Filed: October 3, 2013
    Publication date: May 8, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Eun-sang BAK
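The controller logic in the abstract above, local engine first with a server-side fallback, reduces to a few lines. The engine interfaces (callables returning a result or `None`) are an assumption for the sketch:

```python
def recognize(voice_signal, local_engine, remote_engine):
    """Hybrid recognition controller sketch: try the embedded engine first,
    and fall back to the external server-side engine only when the local
    one cannot produce a result. Returns (result, which_engine)."""
    result = local_engine(voice_signal)
    if result is not None:  # local engine recognized the utterance
        return result, "local"
    return remote_engine(voice_signal), "remote"
```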
  • Patent number: 8719017
    Abstract: Speech recognition models are dynamically re-configurable based on user information, background information such as background noise, and transducer information such as transducer response characteristics, to provide users with alternate input modes to keyboard text entry. The techniques of dynamically re-configurable speech recognition allow speech recognition to be deployed on small devices such as mobile phones and personal digital assistants, as well as in environments such as the office, home, or vehicle, while maintaining recognition accuracy.
    Type: Grant
    Filed: May 15, 2008
    Date of Patent: May 6, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C Rose, Bojana Gajic
  • Patent number: 8719019
    Abstract: Speaker identification techniques are described. In one or more implementations, sample data is received at a computing device of one or more user utterances captured using a microphone. The sample data is processed by the computing device to identify a speaker of the one or more user utterances. The processing involves use of a feature set that includes features obtained using a filterbank having filters spaced linearly at higher frequencies and logarithmically at lower frequencies, features that model the speaker's vocal tract transfer function, and features that indicate the vibration rate of the vocal folds of the speaker of the sample data.
    Type: Grant
    Filed: April 25, 2011
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Hoang T. Do, Ivan J. Tashev, Alejandro Acero, Jason S. Flaks, Robert N. Heitkamp, Molly R. Suver
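One of the three feature families named above, the vibration rate of the vocal folds (pitch), can be estimated with a plain autocorrelation search. A minimal sketch; real systems add windowing, score normalization, and a voicing decision:

```python
import math

def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate vocal-fold vibration rate (pitch, in Hz) by finding the lag
    that maximizes the frame's autocorrelation within the plausible
    pitch range [fmin, fmax]."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        corr = sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0
```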
  • Patent number: 8719020
    Abstract: Embodiments of the present invention provide systems, methods, and computer-readable media for generating a voice characteristic profile based on detected sound components. In embodiments, a call is initiated between a first caller and a second caller. Information communicated during the call is monitored to determine that sound components have been spoken by the first caller. The sound components are determined to be associated with a language dialect. Further, the sound components are stored in association with the first caller. In particular, the sound components are stored in association with the first caller in a voice characteristic profile of the first caller.
    Type: Grant
    Filed: January 7, 2013
    Date of Patent: May 6, 2014
    Assignee: Sprint Communications Company L.P.
    Inventors: Mark D. Peden, Simon Youngs, Gary D. Koller, Piyush Jethwa
  • Patent number: 8719023
    Abstract: An apparatus to improve robustness to environmental changes of a context dependent speech recognizer for an application, that includes a training database to store sounds for speech recognition training, a dictionary to store words supported by the speech recognizer, and a speech recognizer training module to train a set of one or more multiple state Hidden Markov Models (HMMs) with use of the training database and the dictionary. The speech recognizer training module performs a non-uniform state clustering process on each of the states of each HMM, which includes using a different non-uniform cluster threshold for at least some of the states of each HMM to more heavily cluster and correspondingly reduce a number of observation distributions for those of the states of each HMM that are less empirically affected by one or more contextual dependencies.
    Type: Grant
    Filed: May 21, 2010
    Date of Patent: May 6, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Xavier Menendez-Pidal, Ruxin Chen
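The non-uniform clustering idea above, a per-state threshold that clusters more heavily (fewer observation distributions) for states less affected by context, can be sketched in one dimension. The greedy merge rule and the threshold-scaling formula are illustrative assumptions, not the patented procedure:

```python
def cluster_distributions(means, threshold):
    """Greedy 1-D agglomerative clustering: merge distributions whose means
    lie within `threshold` of a neighbor, returning the cluster centers."""
    clusters = []
    for m in sorted(means):
        if clusters and m - clusters[-1][-1] <= threshold:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return [sum(c) / len(c) for c in clusters]

def nonuniform_state_clustering(hmm_states, context_sensitivity, base=0.5):
    """Give each HMM state its own cluster threshold: larger (heavier
    clustering, fewer distributions) for states that are less empirically
    affected by contextual dependencies.

    hmm_states: state -> list of observation-distribution means.
    context_sensitivity: state -> value in (0, 1]; low means context-insensitive.
    """
    clustered = {}
    for state, means in hmm_states.items():
        threshold = base / context_sensitivity[state]  # low sensitivity => big threshold
        clustered[state] = cluster_distributions(means, threshold)
    return clustered
```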
  • Patent number: 8719016
    Abstract: A method for converting speech to text in a speech analytics system is provided. The method includes receiving audio data containing speech made up of sounds from an audio source, processing the sounds with a phonetic module resulting in symbols corresponding to the sounds, and processing the symbols with a language module and occurrence table resulting in text. The method also includes determining a probability of correct translation for each word in the text, comparing the probability of correct translation for each word in the text to the occurrence table, and adjusting the occurrence table based on the probability of correct translation for each word in the text.
    Type: Grant
    Filed: April 7, 2010
    Date of Patent: May 6, 2014
    Assignee: Verint Americas Inc.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira
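The feedback loop above, comparing per-word translation confidence to the occurrence table and adjusting the table, can be sketched as a running update. The table layout (word to weight) and the exponential-style update rule are assumptions:

```python
def adjust_occurrence_table(transcript, occurrence, learning_rate=0.1):
    """Nudge each word's weight in the occurrence table toward its observed
    probability of correct translation, so words that keep decoding with
    high confidence become more likely in future decoding.

    transcript: list of (word, probability_correct) pairs.
    occurrence: dict mapping word -> relative frequency weight (mutated).
    """
    for word, prob in transcript:
        current = occurrence.get(word, 0.0)
        # Move the stored weight part of the way toward the observed confidence.
        occurrence[word] = current + learning_rate * (prob - current)
    return occurrence
```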
  • Patent number: 8719018
    Abstract: A biometric speaker-identification apparatus is disclosed that generates ordered speaker-identity candidates for a probe based on prototypes. Probe match scores are clustered, and templates that correspond to clusters having top M probe match scores are compared with the prototypes to obtain template-prototype match scores. The probe is also compared with the prototypes, and those templates corresponding to template-prototype match scores that are nearest to probe-prototype match scores are selected as speaker-identity candidates. The speaker-identity candidates are ordered based on their similarity to the probe.
    Type: Grant
    Filed: October 25, 2010
    Date of Patent: May 6, 2014
    Assignee: Lockheed Martin Corporation
    Inventor: Jonathan J. Dinerstein
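The prototype idea above, representing both probe and templates by their match scores against a small prototype set and picking templates whose score vectors lie nearest the probe's, can be sketched directly. The `similarity` scorer and the Euclidean nearness measure in prototype space are assumptions:

```python
import math

def prototype_candidates(probe, templates, prototypes, similarity, top_k=3):
    """Return ordered speaker-identity candidates for the probe.

    Instead of exhaustively re-scoring the probe against every template,
    each voice is summarized by its vector of match scores against the
    prototypes; candidates are templates nearest the probe in that space.
    `similarity(a, b)` is any scorer where higher means more alike.
    """
    probe_vec = [similarity(probe, p) for p in prototypes]
    ranked = []
    for name, tmpl in templates.items():
        tmpl_vec = [similarity(tmpl, p) for p in prototypes]
        dist = math.dist(probe_vec, tmpl_vec)  # nearness in prototype space
        ranked.append((dist, name))
    ranked.sort()
    return [name for _, name in ranked[:top_k]]  # ordered by similarity to probe
```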
  • Publication number: 20140122075
    Abstract: A voice recognition apparatus is provided. The voice recognition apparatus comprises: a voice receiver which receives a user's voice signal; a first voice recognition engine which receives the voice signal and performs a voice recognition process; a communication unit which receives the voice signal and transmits the voice signal to an external second voice recognition engine; and a controller which transmits the voice signal received through the voice receiver to at least one of the first voice recognition engine and the communication unit.
    Type: Application
    Filed: August 1, 2013
    Publication date: May 1, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Eun-sang BAK, Myung-jae KIM, Yu LIU, Geo-geun PARK
  • Publication number: 20140122076
    Abstract: A voice command system for a stitcher includes a tablet device in operative communication with the stitcher; the tablet device further comprising a display screen; a memory; a microprocessor; a communication module; and a microphone; and a speech recognition algorithm operatively communicating with said tablet device. An associated method includes the steps of digitizing a user's spoken command; transmitting the digitized spoken command to the speech recognition algorithm; producing a list of words possibly comprising the spoken command; parsing the list of possible words to identify the spoken command; and initiating execution of the spoken command.
    Type: Application
    Filed: October 25, 2013
    Publication date: May 1, 2014
    Applicant: GAMMILL, INC.
    Inventors: Theodore Stokes, Joseph W. Bauman
  • Publication number: 20140118472
    Abstract: In one embodiment, a method includes receiving requests to join a conference from a plurality of user devices proximate a first endpoint. The requests include a username. The method also includes receiving an audio signal for the conference from the first endpoint. The first endpoint is operable to capture audio proximate the first endpoint. The method also includes transmitting the audio signal to a second endpoint, remote from the first endpoint. The method also includes identifying, by a processor, an active speaker proximate the first endpoint based on information received from the plurality of user devices.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 1, 2014
    Inventors: Yanghua Liu, Weidong Chen, Biren Gandhi, Raghurama Bhat, Joseph Fouad Khouri, John Joseph Houston, Brian Thomas Toombs
  • Publication number: 20140122074
    Abstract: In one exemplary embodiment, a computer-implemented method includes the step of determining an age group of a first user. Media content available to the first user is identified, and it is determined whether the user has permission to listen to the media content. When the user does not have permission, the media content is jammed with a sound wave at a frequency that can be heard by the user. Optionally, a voice age-recognition algorithm is used to determine the age group of the first user. An age group of a second user can also be determined; the first user and the second user may be proximate to a media player providing the ambient sound stream.
    Type: Application
    Filed: October 29, 2012
    Publication date: May 1, 2014
    Inventors: Amit V. Karmarkar, Richard Ross Peters
  • Patent number: 8713542
    Abstract: Pausing a VoiceXML dialog of a multimodal application, including: generating, by the multimodal application, a pause event; responsive to the pause event, temporarily pausing the dialog by the VoiceXML interpreter; generating, by the multimodal application, a resume event; and responsive to the resume event, resuming the dialog. Embodiments are implemented with the multimodal application operating on a multimodal device supporting multiple modes of interaction, including a voice mode and one or more non-voice modes; the multimodal application is operatively coupled to a VoiceXML interpreter, and the VoiceXML interpreter is interpreting the VoiceXML dialog to be paused.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: April 29, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., David Jaramillo, Gerald M. McCobb
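The pause/resume control flow above, where the application posts events and the interpreter suspends or continues the dialog in response, can be sketched as a small event-driven interpreter. Class and method names are illustrative, not Nuance's API:

```python
class DialogInterpreter:
    """Minimal dialog interpreter that plays prompts in order and honors
    pause/resume events posted by the (multimodal) application."""

    def __init__(self, prompts):
        self.prompts = list(prompts)
        self.position = 0
        self.paused = False

    def handle_event(self, event):
        if event == "pause":
            self.paused = True   # temporarily suspend the dialog
        elif event == "resume":
            self.paused = False  # continue from where we stopped

    def run_step(self):
        """Play the next prompt unless the dialog is paused or finished."""
        if self.paused or self.position >= len(self.prompts):
            return None
        prompt = self.prompts[self.position]
        self.position += 1
        return prompt
```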
  • Patent number: 8706493
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module, and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates that prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce synthesized speech.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: April 22, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
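The re-estimation step above, remapping predicted prosody under controllable parameters, can be sketched with a linear mapping around the utterance mean. Treating prosody as a per-syllable pitch contour and the specific parameter names are assumptions:

```python
def re_estimate_prosody(predicted, params):
    """Re-estimate a predicted pitch contour (Hz values) under controllable
    parameters: scale the excursion around the utterance mean, then apply a
    global shift. >1 scale exaggerates intonation; shift raises overall pitch."""
    mean = sum(predicted) / len(predicted)
    scale = params.get("pitch_range_scale", 1.0)
    shift = params.get("pitch_shift", 0.0)
    return [mean + scale * (f - mean) + shift for f in predicted]
```

With scale 2.0 and shift 10.0, a flat-ish contour becomes noticeably more expressive while keeping its shape, which is the point of exposing these knobs through a parameter interface.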
  • Patent number: 8706485
    Abstract: The present invention pertains to a method and a communication device (100) for associating a contact record pertaining to a remote speaker (220) with a mnemonic image (191) based on attributes of the speaker (220). The method comprises receiving voice data of the speaker (220) in a communication session with a source device (200). A source determination representing the speaker (220) is registered, and the received voice data is then analyzed so that voice data characteristics can be extracted. Based on these characteristics, a mnemonic image (191) can be selected and associated with the contact record in which the source determination is stored. The mnemonic image (191) may be selected from among images previously stored in the device, or derived by editing such images.
    Type: Grant
    Filed: May 17, 2011
    Date of Patent: April 22, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventor: Joakim Martensson
  • Patent number: 8706488
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 22, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen