Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 8924212
    Abstract: A method, apparatus and machine-readable medium are provided. A phonotactic grammar is utilized to perform speech recognition on received speech and to generate a phoneme lattice. A document shortlist is generated based on using the phoneme lattice to query an index. A grammar is generated from the document shortlist. Data for each of at least one input field is identified based on the received speech and the generated grammar.
    Type: Grant
    Filed: August 26, 2005
    Date of Patent: December 30, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Cyril Georges Luc Allauzen, Sarangarajan Parthasarathy
  • Patent number: 8918406
    Abstract: A method of processing content files may include receiving the content file, employing processing circuitry to determine an identity score of a source of at least a portion of the content file, to determine a word score for the content file and to determine a metadata score for the content file, determining a composite priority score based on the identity score, the word score and the metadata score, and associating the composite priority score with the content file for electronic provision of the content file together with the composite priority score to a human analyst.
    Type: Grant
    Filed: December 14, 2012
    Date of Patent: December 23, 2014
    Assignee: Second Wind Consulting LLC
    Inventor: Donna Rober
  • Patent number: 8918319
    Abstract: In a speech recognition device and a speech recognition method, a key phrase containing at least one key word is received. The speech recognition method comprises the steps of: receiving a sound source signal of a key word and generating a plurality of audio signals; transforming the audio signals into a plurality of frequency signals; receiving the frequency signals to obtain a space-frequency spectrum and an angular estimation value thereof; receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, and using the angular estimation value and the frequency signals to perform spotting and evaluation and outputting a Bhattacharyya distance; and receiving the spatial eigenparameter and the Bhattacharyya distance and using corresponding thresholds to determine the correctness of the key phrase. Thereby, the invention robustly achieves a high speech recognition rate under very low SNR conditions.
    Type: Grant
    Filed: July 7, 2011
    Date of Patent: December 23, 2014
    Assignee: National Chiao Tung University
    Inventors: Jwu-Sheng Hu, Ming-Tang Lee, Ting-Chao Wang, Chia Hsin Yang
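The abstract above gates key-phrase acceptance on a Bhattacharyya distance compared against a threshold. A minimal sketch of that idea, using the closed form for two univariate Gaussians (the patent's actual feature distributions and thresholds are not specified here; all names are illustrative):

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    term_var = 0.25 * math.log(0.25 * (var1 / var2 + var2 / var1 + 2.0))
    term_mean = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    return term_var + term_mean

def accept_keyword(distance, threshold):
    """Accept the spotted key phrase only if it is close enough to the model."""
    return distance <= threshold
```

Identical distributions yield a distance of zero, and the distance grows as the means separate, which is what makes it usable as an acceptance score.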
  • Patent number: 8909522
    Abstract: A voice activity detector (100) includes a frame divider (201) for dividing frames of an input signal into consecutive sub-frames, an energy level estimator (202) for estimating an energy level of the input signal in each of the consecutive sub-frames, a noise eliminator (203) for analyzing the estimated energy levels of sets of the sub-frames to detect and eliminate from enhancement noise sub-frames and to indicate remaining sub-frames as speech sub-frames, and an energy level enhancer (205) for enhancing the estimated energy level for each of the indicated speech sub-frames by an amount which relates to a detected change of the estimated energy level for a current speech sub-frame relative to that for neighboring speech sub-frames.
    Type: Grant
    Filed: July 8, 2008
    Date of Patent: December 9, 2014
    Assignee: Motorola Solutions, Inc.
    Inventors: Itzhak Shperling, Sergey Bondarenko, Eitan Koren, Yosi Rahamim, Tomer Yablonka
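The sub-frame energy estimation and noise/speech labeling described above can be sketched as follows (a simplified illustration; the patent's enhancement step and neighbor-relative adjustment are omitted, and all names are assumptions):

```python
def subframe_energies(signal, subframe_len):
    """Split a signal into consecutive sub-frames and estimate energy in each."""
    energies = []
    for start in range(0, len(signal) - subframe_len + 1, subframe_len):
        sub = signal[start:start + subframe_len]
        energies.append(sum(x * x for x in sub) / subframe_len)
    return energies

def classify_subframes(energies, noise_floor):
    """Mark sub-frames at or below the noise floor as noise; the rest as speech."""
    return ['speech' if e > noise_floor else 'noise' for e in energies]
```

In the patented detector the speech sub-frames would then be enhanced by an amount tied to the energy change relative to neighboring speech sub-frames.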
  • Patent number: 8903724
    Abstract: A speech recognition device includes, a speech recognition section that conducts a search, by speech recognition, on audio data stored in a first memory section to extract word-spoken portions where plural words transferred are each spoken and, of the word-spoken portions extracted, rejects the word-spoken portion for the word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word being set to the outputting object or the rejecting object by the acquisition section; and an output section that outputs the word-spoken portion extracted and not rejected in the search.
    Type: Grant
    Filed: February 1, 2012
    Date of Patent: December 2, 2014
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Washio, Shouji Harada
  • Patent number: 8892436
    Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech by reflecting at least one frame from among the frames that are previously positioned with respect to a frame of the first speech.
    Type: Grant
    Filed: October 19, 2011
    Date of Patent: November 18, 2014
    Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation
    Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
  • Patent number: 8892424
    Abstract: An audio analysis system includes a terminal apparatus and a host system. The terminal apparatus acquires an audio signal of a sound containing utterances of a user and another person, discriminates between portions of the audio signal corresponding to the utterances of the user and the other person, detects an utterance feature based on the portion corresponding to the utterance of the user or the other person, and transmits utterance information including the discrimination and detection results to the host system. The host system detects a part corresponding to a conversation from the received utterance information, detects portions of the part of the utterance information corresponding to the user and the other person, compares a combination of plural utterance features corresponding to the portions of the part of the utterance information of the user and the other person with relation information to estimate an emotion, and outputs estimation information.
    Type: Grant
    Filed: February 10, 2012
    Date of Patent: November 18, 2014
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Haruo Harada, Hirohito Yoneyama, Kei Shimotani, Yohei Nishino, Kiyoshi Iida, Takao Naito
  • Patent number: 8886533
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Grant
    Filed: October 25, 2011
    Date of Patent: November 11, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Sumit Chopra, Dimitrios Dimitriadis, Patrick Haffner
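Temporal pooling of frame-level features into segment-level vectors, as described above, might look like this in outline (mean and max pooling stand in for the patent's unspecified selection strategies; names are illustrative):

```python
def mean_pool(frames):
    """Average frame-level feature vectors over time into one segment vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

def max_pool(frames):
    """Keep, per dimension, the largest value seen across all frames."""
    dim = len(frames[0])
    return [max(f[d] for f in frames) for d in range(dim)]

def pooled_ensemble_inputs(frames, strategies):
    """Each pooling strategy feeds its own segmental classifier (PIU-SCU pair);
    varying the strategies diversifies the ensemble."""
    return [pool(frames) for pool in strategies]
```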
  • Patent number: 8886532
    Abstract: On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
    Type: Grant
    Filed: October 27, 2010
    Date of Patent: November 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Levit, Bruce Melvin Buntschuh
  • Publication number: 20140330563
    Abstract: Some aspects of the invention may include a computer-implemented method for enrolling voice prints generated from audio streams, in a database. The method may include receiving an audio stream of a communication session and creating a preliminary association between the audio stream and an identity of a customer that has engaged in the communication session based on identification information. The method may further include determining a confidence level of the preliminary association based on authentication information related to the customer and if the confidence level is higher than a threshold, sending a request to compare the audio stream to a database of voice prints of known fraudsters. If the audio stream does not match any known fraudsters, sending a request to generate from the audio stream a current voice print associated with the customer and enrolling the voice print in a customer voice print database.
    Type: Application
    Filed: May 2, 2013
    Publication date: November 6, 2014
    Applicant: NICE-SYSTEMS LTD.
    Inventors: Shahar FAIANS, Avraham Lousky, Elad Hoffman, Alon Sabban, Jade Tarni Kahn, Roie Mandler
  • Patent number: 8880399
    Abstract: In the field of language learning systems, proper pronunciation of words and phrases is an integral aspect of language learning, and determining the proximity of the language learner's pronunciation to a standardized, i.e. ‘perfect’, pronunciation is utilized to guide the learner from imperfect toward perfect pronunciation. In this regard, a phoneme lattice scoring system is utilized, whereby an input from a user is transduced into the perfect pronunciation example in a phoneme lattice. The cost of this transduction may be determined based on a summation of the substitutions, deletions and insertions of phonemes needed to transduce from the input to the perfect pronunciation of the utterance.
    Type: Grant
    Filed: September 27, 2010
    Date of Patent: November 4, 2014
    Assignee: Rosetta Stone, Ltd.
    Inventors: Andreas Hagen, Bryan Pellom
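The transduction cost described above, a summation of substitution, deletion and insertion costs, is essentially a weighted edit distance over phoneme sequences. A minimal dynamic-programming sketch (unit costs and names are assumptions; the patented system works over a phoneme lattice rather than a single sequence):

```python
def transduction_cost(hyp, ref, sub=1.0, ins=1.0, dele=1.0):
    """Minimum total cost of substitutions, insertions and deletions
    needed to turn the learner's phoneme sequence into the reference."""
    m, n = len(hyp), len(ref)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0.0 if hyp[i - 1] == ref[j - 1] else sub
            d[i][j] = min(d[i - 1][j - 1] + cost,  # substitute or match
                          d[i - 1][j] + dele,      # delete a learner phone
                          d[i][j - 1] + ins)       # insert a reference phone
    return d[m][n]
```

A zero cost corresponds to a pronunciation matching the reference exactly; higher costs guide the learner feedback.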
  • Publication number: 20140324425
    Abstract: A voice control method is applied in an electronic device. The electronic device includes a voice input unit, a play unit, and a storage unit storing a conversation database and an association table between different ranges of voice characteristics and styles of response voice. The method includes the following steps. Obtaining voice signals input via the voice input unit. Determining which content is input according to the obtained voice signals. Searching in the conversation database to find a response corresponding to the input content. Analyzing voice characteristics of the obtained voice signals. Comparing the voice characteristics of the obtained voice signals with the pre-stored ranges. Selecting the associated response voice. Finally, outputting the found response using the associated response voice via the play unit.
    Type: Application
    Filed: May 21, 2013
    Publication date: October 30, 2014
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., FU TAI HUA INDUSTRY (SHENZHEN) CO., LTD.
    Inventor: REN-WEN HUANG
  • Patent number: 8856002
    Abstract: A universal pattern processing system receives input data and produces output patterns that are best associated with said data. The system uses input means receiving and processing input data, a universal pattern decoder means transforming models using the input data and associating output patterns with original models that are changed least during transforming, and output means outputting best associated patterns chosen by a pattern decoder means.
    Type: Grant
    Filed: April 11, 2008
    Date of Patent: October 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
  • Patent number: 8843367
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Grant
    Filed: May 4, 2012
    Date of Patent: September 23, 2014
    Assignee: 8758271 Canada Inc.
    Inventors: Phillip Alan Hetherington, Xueman Li
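The weighted long-term speech curve, built from two predetermined curves and an intelligibility measurement, can be sketched as a simple convex blend (the actual weighting function in the patent is not specified; this linear form and all names are assumptions):

```python
def weighted_speech_curve(curve_a, curve_b, intelligibility):
    """Blend two predetermined long-term average speech curves; a low
    intelligibility score pulls the target toward the second curve."""
    w = max(0.0, min(1.0, intelligibility))  # clamp the measurement to [0, 1]
    return [w * a + (1.0 - w) * b for a, b in zip(curve_a, curve_b)]
```

The adaptive equalizer would then adapt its coefficients toward the blended target curve.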
  • Patent number: 8831941
    Abstract: Disclosed are systems, methods, and computer readable media for comparing customer voice prints comprising uncommonly spoken words with a database of known fraudulent voice signatures and continually updating the database to decrease the risk of identity theft. The method embodiment comprises comparing a received voice signal against a database of known fraudulent voice signatures, denying the caller's transaction if the voice signal substantially matches the database of known fraudulent voice signatures, and adding the caller's voice signal to the database of known fraudulent voice signatures if the voice signal does not substantially match a separate speaker verification database and received additional information is not verified.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: September 9, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Jay Wilpon
  • Publication number: 20140249816
    Abstract: An automatic speech recognition (ASR) system includes a speech-responsive application and a recognition engine. The ASR system generates user prompts to elicit certain spoken inputs, and the speech-responsive application performs operations when the spoken inputs are recognised. The recognition engine compares sounds within an input audio signal with phones within an acoustic model, to identify candidate matching phones. A recognition confidence score is calculated for each candidate matching phone, and the confidence scores are used to help identify one or more likely sequences of matching phones that appear to match a word within the grammar of the speech-responsive application. The per-phone confidence scores are evaluated against predefined confidence score criteria (for example, identifying scores below a ‘low confidence’ threshold) and the results of the evaluation are used to influence subsequent selection of user prompts.
    Type: Application
    Filed: February 26, 2014
    Publication date: September 4, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: John Brian Pickering, Timothy David Poultney, Benjamin Terrick Staniford, Matthew Whitbourne
  • Patent number: 8825479
    Abstract: A computerized method, software, and system for recognizing emotions from a speech signal, wherein statistical and MFCC features are extracted from the speech signal, the MFCC features are sorted to provide a basis for comparison between the speech signal and reference samples, the statistical and MFCC features are compared between the speech signal and reference samples, a scoring system is used to compare relative correlation to different emotions, a probable emotional state is assigned to the speech signal based on the scoring system and the probable emotional state is communicated to a user.
    Type: Grant
    Filed: October 24, 2013
    Date of Patent: September 2, 2014
    Assignee: Simple Emotion, Inc.
    Inventors: Akash Krishnan, Matthew Fernandez
  • Patent number: 8812315
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: October 1, 2013
    Date of Patent: August 19, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8812291
    Abstract: Systems, methods, and computer program products for machine translation are provided. In some implementations a system is provided. The system includes a language model including a collection of n-grams from a corpus, each n-gram having a corresponding relative frequency in the corpus and an order n corresponding to a number of tokens in the n-gram, each n-gram corresponding to a backoff n-gram having an order of n-1, and a collection of backoff scores, each backoff score associated with an n-gram, the backoff score determined as a function of a backoff factor and a relative frequency of a corresponding backoff n-gram in the corpus.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: August 19, 2014
    Assignee: Google Inc.
    Inventors: Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean
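The backoff score described above, a function of a backoff factor and the relative frequency of the corresponding lower-order n-gram, resembles the "Stupid Backoff" scheme. A rough sketch over a count table (the alpha value, tuple-keyed dictionary layout, and function names are assumptions):

```python
def backoff_score(ngram, counts, alpha=0.4):
    """Score an n-gram by relative frequency; if the n-gram or its history
    is unseen, back off to the (n-1)-gram score scaled by alpha."""
    hist = ngram[:-1]
    if ngram in counts and hist in counts and counts[hist] > 0:
        return counts[ngram] / counts[hist]
    if len(ngram) == 1:
        total = sum(c for k, c in counts.items() if len(k) == 1)
        return counts.get(ngram, 0) / total if total else 0.0
    return alpha * backoff_score(ngram[1:], counts, alpha)
```

Unlike a smoothed probability, these scores are not normalized; in large-scale translation systems that trade-off keeps training a simple counting pass.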
  • Patent number: 8812321
    Abstract: Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination.
    Type: Grant
    Filed: September 30, 2010
    Date of Patent: August 19, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Mazin Gilbert, Srinivas Bangalore, Patrick Haffner, Robert Bell
  • Patent number: 8812923
    Abstract: A decoder and method of decoding a sub-band coded digital audio signal. The decoder comprises: an input, for receiving sub-band coefficients for a plurality of sub-bands of the audio signal; an error detection unit, adapted to analyze the content of a sequence of coefficients in one of the sub-bands, to derive for each coefficient an indication of whether the coefficient has been corrupted by an error of a predefined type; an error masking unit, adapted to generate from the sequence a modified sequence of coefficients for the sub-band, wherein errors of the predefined type are attenuated; a coefficient combination unit, adapted to combine the received coefficients and the modified coefficients, in dependence upon the indication of error; and a signal reconstruction unit, adapted to reconstruct the audio signal using the combined coefficients.
    Type: Grant
    Filed: November 22, 2011
    Date of Patent: August 19, 2014
    Assignee: NXP, B.V.
    Inventor: Christophe Marc Macours
  • Publication number: 20140229177
    Abstract: Methods of incrementally modifying a word-level finite state transducer (FST) are described for adding and removing sentences. A prefix subset of states and arcs in the FST is determined that matches a prefix portion of the sentence. A suffix subset of states and arcs in the FST is determined that matches a suffix portion of the sentence. A new sentence can then be added to the FST by appending a new sequence of states and arcs to the FST corresponding to a remainder of the sentence between the prefix and suffix. An existing sentence can be removed from the FST by removing any arcs and states between the prefix subset and the suffix subset. The resulting modified FST is locally efficient but does not satisfy global optimization criteria such as minimization.
    Type: Application
    Filed: September 21, 2011
    Publication date: August 14, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Stephan Kanthak, Oliver Bender
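The prefix-matching half of the incremental FST update can be illustrated with a plain trie: new arcs are appended only for the part of the sentence not already covered by a matching prefix (suffix sharing and the arc/state machinery of a real FST are omitted; names are illustrative):

```python
def add_sentence(trie, words):
    """Walk the longest matching prefix, then append new arcs
    only for the remainder of the sentence."""
    node = trie
    for w in words:
        node = node.setdefault(w, {})
    node['<final>'] = True

def contains(trie, words):
    """Check whether the exact sentence was added."""
    node = trie
    for w in words:
        if w not in node:
            return False
        node = node[w]
    return node.get('<final>', False)
```

As the abstract notes, such local updates are efficient but leave the structure short of global minimization.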
  • Publication number: 20140222423
    Abstract: Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computation and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of an orthogonal operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
    Type: Application
    Filed: February 7, 2013
    Publication date: August 7, 2014
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Sandro Cumani, Pietro Laface
  • Publication number: 20140207456
    Abstract: A waveform analysis of speech is disclosed. Embodiments include methods for analyzing captured sounds produced by animals, such as human vowel sounds, and accurately determining the sound produced. Some embodiments utilize computer processing to identify the location of the sound within a waveform, select a particular time within the sound, and measure a fundamental frequency and one or more formants at the particular time. Embodiments compare the fundamental frequency and the one or more formants to known thresholds and multiples of the fundamental frequency, such as by a computer-run algorithm. The results of this comparison identify the sound with a high degree of accuracy.
    Type: Application
    Filed: March 24, 2014
    Publication date: July 24, 2014
    Applicant: Waveform Communications, LLC
    Inventor: Michael A. Stokes
  • Patent number: 8788266
    Abstract: The present invention uses a language model creation device 200 that creates a new language model using a standard language model created from standard language text. The language model creation device 200 includes a transformation rule storage section 201 that stores transformation rules used for transforming dialect-containing word strings into standard language word strings, and a dialect language model creation section 203 that creates dialect-containing n-grams by applying the transformation rules to word n-grams in the standard language model and, furthermore, creates the new language model (dialect language model) by adding the created dialect-containing n-grams to the word n-grams.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: July 22, 2014
    Assignee: NEC Corporation
    Inventors: Tasuku Kitade, Takafumi Koshinaka, Yoshifumi Onishi
  • Publication number: 20140200890
    Abstract: Embodiments reduce the complexity of speaker-dependent speech recognition systems and methods by representing the code word (i.e., the word to be recognized) using a single Gaussian Mixture Model (GMM) which is adapted from a Universal Background Model (UBM). Only the parameters of the GMM need to be stored. Further reduction in computation is achieved by checking only the GMM component that is relevant to the keyword template. In this scheme, the keyword template is represented by a sequence of the indices of the best-performing components of the GMM of the keyword model. Only one template is saved, by combining the registration templates using the Longest Common Subsequence algorithm. The quality of the word model is continuously updated by performing expectation-maximization iterations using test words that are accepted as matching the keyword model.
    Type: Application
    Filed: March 31, 2013
    Publication date: July 17, 2014
    Applicant: STMicroelectronics Asia Pacific Pte Ltd.
    Inventors: Evelyn Kurniawati, Sapna George
  • Patent number: 8781825
    Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: July 15, 2014
    Assignee: Sensory, Incorporated
    Inventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
  • Publication number: 20140188470
    Abstract: A disclosed speech processor includes a front end to receive a speech input and generate a feature vector indicative of a portion of the speech input, and a Gaussian mixture model (GMM) circuit to receive the feature vector, model any one of a plurality of GMM speech recognition algorithms, and generate a GMM score for the feature vector based on the GMM speech recognition algorithm modeled. In at least one embodiment, the GMM circuit includes a common compute block to generate a feature vector sum indicative of a weighted sum of squared differences between the feature vector and a mixture component of the GMM speech recognition algorithm. In at least one embodiment, the GMM speech recognition algorithm being modeled includes a plurality of Gaussian mixture components and the common compute block is operable to generate feature vector scores corresponding to each of the plurality of mixture components.
    Type: Application
    Filed: December 31, 2012
    Publication date: July 3, 2014
    Inventors: Jenny Chang, Michael E. Deisher, Ravishankar Iyer
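The common compute block's weighted sum of squared differences is the core of a diagonal-covariance Gaussian log-likelihood. A software sketch of that scoring path (the hardware block, normalization constants, and mixture weights are abstracted into a per-component `log_const`; all names are assumptions):

```python
import math

def component_score(x, mean, inv_var, log_const):
    """Per-component log score via a weighted sum of squared differences
    between the feature vector and the mixture-component mean."""
    wssd = sum(iv * (xi - mi) ** 2 for xi, mi, iv in zip(x, mean, inv_var))
    return log_const - 0.5 * wssd

def gmm_score(x, components):
    """Log-sum-exp over the per-component scores gives the frame's GMM score."""
    scores = [component_score(x, m, iv, lc) for (m, iv, lc) in components]
    mx = max(scores)
    return mx + math.log(sum(math.exp(s - mx) for s in scores))
```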
  • Patent number: 8768697
    Abstract: In some embodiments, a method includes measuring a disparity between two speech samples by segmenting both a reference speech sample and a student speech sample into speech units. A duration disparity can be determined for units that are not adjacent to each other in the reference speech sample. A duration disparity can also be determined for the corresponding units in the student speech sample. A difference can then be calculated between the student speech sample duration disparity and the reference speech sample duration disparity.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: July 1, 2014
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
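The duration-disparity comparison above reduces to simple differences over unit durations; a minimal sketch (unit segmentation is assumed to be done already, and all names are illustrative):

```python
def duration_disparity(durations, i, j):
    """Duration difference between two (possibly non-adjacent) speech units."""
    return durations[i] - durations[j]

def disparity_difference(ref_durations, student_durations, i, j):
    """How far the student's timing contrast between units i and j
    deviates from the same contrast in the reference sample."""
    return (duration_disparity(student_durations, i, j)
            - duration_disparity(ref_durations, i, j))
```

A value near zero means the student reproduced the reference's relative timing for that unit pair, even if absolute speaking rate differs.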
  • Patent number: 8768706
    Abstract: Techniques are disclosed for facilitating the process of proofreading draft transcripts of spoken audio streams. In general, proofreading of a draft transcript is facilitated by playing back the corresponding spoken audio stream with an emphasis on those regions in the audio stream that are highly relevant or likely to have been transcribed incorrectly. Regions may be emphasized by, for example, playing them back more slowly than regions that are of low relevance and likely to have been transcribed correctly. Emphasizing those regions of the audio stream that are most important to transcribe correctly and those regions that are most likely to have been transcribed incorrectly increases the likelihood that the proofreader will accurately correct any errors in those regions, thereby improving the overall accuracy of the transcript.
    Type: Grant
    Filed: August 20, 2010
    Date of Patent: July 1, 2014
    Assignee: Multimodal Technologies, LLC
    Inventors: Kjell Schubert, Juergen Fritsch, Michael Finke, Detlef Koll
  • Patent number: 8768692
    Abstract: A speech recognition apparatus predicts, based on the occurrence cycle and duration time of impulse noise that occurs periodically, a segment in which impulse noise occurs, and executes speech recognition processing based on the feature components of the remaining frames excluding a feature component of a frame corresponding to the predicted segment, or the feature components extracted from frames created from sound data excluding a part corresponding to the predicted segment.
    Type: Grant
    Filed: May 3, 2007
    Date of Patent: July 1, 2014
    Assignee: Fujitsu Limited
    Inventor: Shoji Hayakawa
  • Patent number: 8768700
    Abstract: A system may receive a voice search query and may determine word hypotheses for the voice query. Each word hypothesis may include one or more terms. The system may obtain a search query log and may determine, for each word hypothesis, a quantity of other search queries, in the search query log, that include the one or more terms. The system may determine weights based on the determined quantities. The system may generate, based on the weights, a first search query from the word hypotheses and may obtain a first set of search results. The system may modify, based on the first set of search results, one or more of the weights. The system may generate a second search query from the word hypotheses and obtain, based on the second search query, a second set of search results for the voice query.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: July 1, 2014
    Assignee: Google Inc.
    Inventors: Alexander Mark Franz, Monika H. Henzinger, Sergey Brin, Brian Christopher Milch
  • Publication number: 20140180688
    Abstract: A speech recognition device comprises a corpus processor, which includes a refiner to classify collected corpora into domains corresponding to functions of the speech recognition device and an extractor which extracts collected basic sentences based on functions of the speech recognition device with respect to the corpora in the domains; a database (DB) which stores the extracted basic sentences based on functions of the speech recognition device; a corpus receiver which receives a user's corpora; and a controller which compares a received basic sentence extracted by the extractor with the collected basic sentences stored in the DB and determines the function intended by the user's corpora.
    Type: Application
    Filed: September 12, 2013
    Publication date: June 26, 2014
    Inventors: Oh-yun KWON, Jae-cheol KIM, Seung-il YOON, Cheon-seong LEE
  • Patent number: 8762152
    Abstract: Methods and systems for performing speech recognition using an electronic interactive agent are disclosed. In embodiments of the invention, an electronic agent is presented in a form perceptible to a user. The electronic agent is used to solicit speech input from a user and to respond to the user's recognized speech, and mimics the behavior of a human agent in a natural language query session with the user. The electronic agent may be implemented in a distributed speech recognition system in which speech recognition tasks are divided between client and server.
    Type: Grant
    Filed: October 1, 2007
    Date of Patent: June 24, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Ian Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
  • Patent number: 8762147
    Abstract: A signal portion is extracted from an input signal for each frame having a specific duration to generate a per-frame input signal. The per-frame input signal in a time domain is converted into a per-frame input signal in a frequency domain, thereby generating a spectral pattern. Subband average energy is derived in each of subbands adjacent one another in the spectral pattern. The subband average energy is compared in at least one subband pair of a first subband and a second subband that is a higher frequency band than the first subband, the first and second subbands being consecutive subbands in the spectral pattern. It is determined that the per-frame input signal includes a consonant segment if the subband average energy of the second subband is higher than the subband average energy of the first subband.
    Type: Grant
    Filed: February 1, 2012
    Date of Patent: June 24, 2014
    Assignee: JVC KENWOOD Corporation
    Inventors: Akiko Akechi, Takaaki Yamabe
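The subband-energy test in the abstract above is concrete enough to sketch. The frame length and subband count below are assumptions; the core idea is the stated one: flag a frame as a consonant segment when a higher-frequency subband carries more average energy than the subband just below it.

```python
import numpy as np

def is_consonant_frame(frame, num_subbands=8):
    """Flag a frame as a consonant segment if any consecutive subband
    pair has more average energy in the higher band than the lower."""
    spectrum = np.abs(np.fft.rfft(frame))          # spectral pattern
    bands = np.array_split(spectrum, num_subbands)
    avg_energy = [np.mean(b ** 2) for b in bands]  # subband average energy
    return any(avg_energy[i + 1] > avg_energy[i]
               for i in range(num_subbands - 1))
```

Vowels concentrate energy in low subbands, so their energies decay with frequency; noise-like consonants tilt energy upward somewhere in the spectrum, triggering the test.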
  • Publication number: 20140172427
    Abstract: A method for processing messages pertaining to an event includes receiving a plurality of messages pertaining to the event from electronic communication devices associated with a plurality of observers of the event, generating a first message stream that includes only a portion of the plurality of messages corresponding to a first participant in the event, identifying a first sub-event in the first message stream with reference to a time distribution of messages and content distribution of messages in the first message stream, generating a sub-event summary with reference to a portion of the plurality of messages in the first message stream that are associated with the first sub-event, and transmitting the sub-event summary to a plurality of electronic communication devices associated with a plurality of users who are not observers of the event.
    Type: Application
    Filed: December 13, 2013
    Publication date: June 19, 2014
    Applicant: Robert Bosch GmbH
    Inventors: Fei Liu, Fuliang Weng, Chao Shen, Lin Zhao
  • Patent number: 8751229
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for handling missing speech data. The computer-implemented method includes receiving speech with a missing segment, generating a plurality of hypotheses for the missing segment, identifying a best hypothesis for the missing segment, and recognizing the received speech by inserting the identified best hypothesis for the missing segment. In another method embodiment, the final step is replaced with synthesizing the received speech by inserting the identified best hypothesis for the missing segment. In one aspect, the method further includes identifying a duration for the missing segment and generating the plurality of hypotheses of the identified duration for the missing segment. The step of identifying the best hypothesis for the missing segment can be based on speech context, a pronouncing lexicon, and/or a language model. Each hypothesis can have an identical acoustic score.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: June 10, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie
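Since the abstract above notes that each hypothesis can have an identical acoustic score, selection reduces to a language-model comparison. The sketch below assumes a bigram log-probability callable; the real system could equally use speech context or a pronouncing lexicon, as the abstract states.

```python
def best_hypothesis(left_context, hypotheses, bigram_logprob):
    """Pick the hypothesis for the missing segment that maximizes the
    bigram language-model score given the preceding context.
    bigram_logprob: callable (prev_word, word) -> log probability."""
    def lm_score(words):
        prev = left_context[-1]
        total = 0.0
        for w in words:
            total += bigram_logprob(prev, w)
            prev = w
        return total
    return max(hypotheses, key=lm_score)
```

The chosen hypothesis would then be inserted into the missing segment for recognition or synthesis.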
  • Patent number: 8744845
    Abstract: A noise estimation method for a noisy speech signal according to an embodiment of the present invention includes the steps of approximating a transformation spectrum by transforming an input noisy speech signal to a frequency domain, calculating a smoothed magnitude spectrum having a decreased difference in a magnitude of the transformation spectrum between neighboring frames, calculating a search spectrum to represent an estimated noise component of the smoothed magnitude spectrum, and estimating a noise spectrum by a recursive average method using an adaptive forgetting factor defined from the search spectrum. According to an embodiment of the present invention, the amount of calculation for noise estimation is small, and large-capacity memory is not required. Accordingly, the present invention can be easily implemented in hardware or software. Further, the accuracy of noise estimation can be increased because an adaptive procedure can be performed on each frequency sub-band.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: June 3, 2014
    Assignee: Transono Inc.
    Inventors: Sung Il Jung, Dong Gyung Ha
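The recursive-average update in the abstract above can be sketched as below. The smoothing constant, the floor bounds on the forgetting factor, and the use of a running minimum as the "search spectrum" are all illustrative assumptions; only the overall shape (smoothed magnitude spectrum, search spectrum, adaptive forgetting factor, recursive average) follows the abstract.

```python
import numpy as np

def estimate_noise(frames, alpha_min=0.7, alpha_max=0.995):
    """Estimate a noise magnitude spectrum by recursive averaging with a
    per-bin adaptive forgetting factor."""
    smoothed = search = noise = None
    for frame in frames:
        mag = np.abs(np.fft.rfft(frame))
        # Smoothed magnitude spectrum: damp frame-to-frame differences.
        smoothed = mag if smoothed is None else 0.9 * smoothed + 0.1 * mag
        # Search spectrum: running minimum as a crude noise-floor tracker.
        search = smoothed if search is None else np.minimum(search, smoothed)
        if noise is None:
            noise = smoothed.copy()
            continue
        # Bins near the floor (likely noise-only) get a smaller alpha,
        # i.e. a faster noise update; speech-dominated bins update slowly.
        ratio = search / (smoothed + 1e-12)           # in (0, 1]
        alpha = alpha_max - (alpha_max - alpha_min) * ratio
        noise = alpha * noise + (1.0 - alpha) * smoothed
    return noise
```

Because each step is a per-bin multiply-add, the memory and computation stay small, which is the advantage the abstract claims.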
  • Patent number: 8737571
    Abstract: A method, apparatus and computer readable medium for call quality testing is presented. A query is transmitted over a communications network from a first location to a second location. The query results in an audio signal at the second location, which is received at the first location. The audio signal is analyzed by comparing the signal with a reference signal clip. A statistical parameter is generated, the statistical parameter indicative of a quality of the received signal.
    Type: Grant
    Filed: June 29, 2004
    Date of Patent: May 27, 2014
    Assignee: Empirix Inc.
    Inventors: Albert R. Seeley, Nathan David, Zhongyi Chen, Douglas C. Williams, Andrew Ullmann
  • Patent number: 8738374
    Abstract: Described is a speech-to-text conversion system and method that provides secure, real-time and high-accuracy conversion of general-quality speech into text. The system is designed to interface with external devices and services, providing a simple and convenient manner to transcribe audio that may be stored elsewhere, such as a wireless phone's voice mail, or occurring between two or more parties, such as a conference call. The first step in the system's process ensures secure and private transcription by separating an audio stream into many audio shreds, each of which has a duration of only a few seconds and cannot reveal the context of the conversation. A workforce of geographically distributed transcription agents who transcribe the audio shreds is able to generate transcription in real time, with many agents working in parallel on a single conversation. No one agent (or group of agents) receives a sufficient number of audio shreds to reconstruct the context of any conversation.
    Type: Grant
    Filed: May 22, 2009
    Date of Patent: May 27, 2014
    Assignee: j2 Global Communications, Inc.
    Inventor: Jon Jaroker
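The shredding step described in the abstract above can be sketched simply. Round-robin assignment is an illustrative simplification: a real deployment would presumably randomize or otherwise constrain assignment so no agent accumulates enough consecutive shreds to recover context.

```python
def shred_audio(samples, sample_rate, shred_seconds, num_agents):
    """Split an audio stream into short shreds and deal them across
    agents; each agent receives (index, shred) pairs so transcripts can
    be reassembled in order afterwards."""
    shred_len = int(sample_rate * shred_seconds)
    shreds = [samples[i:i + shred_len]
              for i in range(0, len(samples), shred_len)]
    assignment = {a: [] for a in range(num_agents)}
    for idx, shred in enumerate(shreds):
        assignment[idx % num_agents].append((idx, shred))
    return assignment
```

With many agents transcribing in parallel, the full transcript is produced in near real time while no single agent sees the whole conversation.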
  • Patent number: 8731928
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Patent number: 8731921
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: May 20, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard Vandervoort Cox, Hong Kook Kim
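The deletion technique in the abstract above amounts to filtering the observation sequence, which a one-line sketch captures:

```python
def conceal_erasures(feature_frames, erasure_flags):
    """Drop every frame in which an erasure is declared, shortening the
    observation sequence passed to the recognizer."""
    return [f for f, erased in zip(feature_frames, erasure_flags)
            if not erased]
```

The abstract reports that the shortened sequence still supports adequate recognition in both single-word and string tests.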
  • Patent number: 8725518
    Abstract: A system for providing automatic quality management regarding a level of conformity to a specific accent, including a recording system, a statistical model database with statistical models representing speech data of different levels of conformity to a specific accent, a speech analysis system, and a quality management system. The recording system is adapted to record one or more samples of a speaker's speech and provide them to the speech analysis system for analysis, and the speech analysis system is adapted to provide a score of the speaker's speech samples to the quality management system by analyzing the recorded speech samples relative to the statistical models in the statistical model database.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: May 13, 2014
    Assignee: Nice Systems Ltd.
    Inventors: Moshe Waserblat, Barak Eilam
  • Patent number: 8725829
    Abstract: A method and system are described which allow users to identify (pre-recorded) sounds such as music, radio broadcasts, commercials, and other audio signals in almost any environment. The audio signal (or sound) must be a recording represented in a database of recordings. The service can quickly identify the signal from just a few seconds of excerpted audio, while tolerating high noise and distortion. Once the signal is identified to the user, the user may perform transactions interactively in real-time or offline using the identification information.
    Type: Grant
    Filed: April 26, 2004
    Date of Patent: May 13, 2014
    Assignee: Shazam Investments Limited
    Inventors: Avery Li-Chun Wang, Christopher Jacques Penrose Barton, Dheeraj Shankar Mukherjee, Philip Inghelbrecht
  • Patent number: 8719021
    Abstract: A speech recognition dictionary compilation assisting system can create and update speech recognition dictionaries and language models efficiently so as to reduce speech recognition errors by utilizing text data available at a low cost. The system includes speech recognition dictionary storage section 105, language model storage section 106 and acoustic model storage section 107. A virtual speech recognition processing section 102 processes analyzed text data generated by the text analyzing section 101 by making reference to the recognition dictionary, language models and acoustic models so as to generate virtual speech-recognition result text data, and compares that virtual result text data with the analyzed text data. The update processing section 103 updates the recognition dictionary and language models so as to reduce the differences between the two sets of text data.
    Type: Grant
    Filed: February 2, 2007
    Date of Patent: May 6, 2014
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
  • Patent number: 8719035
    Abstract: Techniques are disclosed for recognizing user personality in accordance with a speech recognition system. For example, a technique for recognizing a personality trait associated with a user interacting with a speech recognition system includes the following steps/operations. One or more decoded spoken utterances of the user are obtained. The one or more decoded spoken utterances are generated by the speech recognition system. The one or more decoded spoken utterances are analyzed to determine one or more linguistic attributes (morphological and syntactic filters) that are associated with the one or more decoded spoken utterances. The personality trait associated with the user is then determined based on the analyzing step/operation.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: May 6, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Osamuyimen Thompson Stewart, Liwei Dai
  • Patent number: 8712775
    Abstract: A method and system for generating a finite state grammar is provided. The method comprises receiving user input of at least two sample phrases; analyzing the sample phrases to determine common words that occur in each of the sample phrases and optional words that occur in only some of the sample phrases; creating a mathematical expression representing the sample phrases, the expression including each word found in the sample phrases and an indication of whether a word is a common word or an optional word; displaying the mathematical expression to a user; allowing the user to alter the mathematical expression; generating a finite state grammar corresponding to the altered mathematical expression; and displaying the finite state grammar to the user.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: April 29, 2014
    Assignee: West Interactive Corporation II
    Inventor: Ashok Mitter Khosla
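The common/optional word analysis in the abstract above can be sketched as follows. The `(word)?` rendering is an illustrative notation for optional words, not the patented expression format, and word ordering follows first occurrence across the sample phrases.

```python
def analyze_phrases(phrases):
    """Classify words as common (in every sample phrase) or optional
    (in only some), and render a simple expression over them."""
    tokenized = [p.lower().split() for p in phrases]
    common = set(tokenized[0])
    for words in tokenized[1:]:
        common &= set(words)
    expression, seen = [], set()
    for words in tokenized:
        for w in words:
            if w not in seen:
                seen.add(w)
                expression.append(w if w in common else "(%s)?" % w)
    return " ".join(expression)
```

In the patented flow, this expression would be shown to the user for editing before a finite state grammar is generated from the edited form.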
  • Patent number: 8711015
    Abstract: The invention relates to compressing sparse data sets containing sequences of data values and position information therefor. The position information may be in the form of position indices defining active positions of the data values in a sparse vector of length N. The position information is encoded into the data values by adjusting one or more of the data values within a pre-defined tolerance range, so that a pre-defined mapping function of the data values and their positions is close to a target value. In one embodiment, the mapping function is defined using a sub-set of N filler values whose elements are used to fill empty positions in the input sparse data vector. At the decoder, the correct data positions are identified by searching through possible sub-sets of filler values.
    Type: Grant
    Filed: August 24, 2011
    Date of Patent: April 29, 2014
    Assignee: Her Majesty the Queen in Right of Canada as represented by the Minister of Industry, through the Communications Research Centre Canada
    Inventors: Frederic Mustiere, Hossein Najaf-Zadeh, Ramin Pishehvar, Hassan Lahdili, Louis Thibault, Martin Bouchard
  • Patent number: 8706489
    Abstract: A system and method for selecting audio contents by using speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching the speech input.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: April 22, 2014
    Assignee: Delta Electronics Inc.
    Inventors: Jia-lin Shen, Chien-Chou Hung
  • Patent number: 8704649
    Abstract: Disclosed herein is a vibrotactile device intuitively providing information by inducing a tactile sense to a user, and a method using the same. The device according to an embodiment includes a vibrating contact panel contacting with a user's hand; a plurality of vibratory modules that are attached to the lower part of the vibrating contact panel and vibrate with different intensities according to the amount of supplied power; and a plurality of vibration isolating links that are coupled, respectively, to an end of each of the modules to support the modules and to isolate the vibration from the modules.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: April 22, 2014
    Assignee: Korea Institute of Science and Technology
    Inventors: Dong Seok Ryu, Sung Chul Kang, Gi Hun Yang