Specialized Equations Or Comparisons Patents (Class 704/236)
  • Publication number: 20130275135
    Abstract: Automatically adjusting confidence scoring functionality is described for a speech recognition engine. Operation of the speech recognition system is revised so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA). Then a confidence scoring functionality related to recognition reliability for a given input utterance is automatically adjusted such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system.
    Type: Application
    Filed: January 7, 2011
    Publication date: October 17, 2013
    Inventors: Nicolas Morales, Dermot Connolly, Andrew Halberstadt
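The "double gain" constraint above can be sketched as a small threshold search: after revising the recognizer, accept only operating points whose false-accept (FA) rate is no higher and whose correct-accept (CA) rate is no lower than before. The scores, old rates, and the margin-based tie-break below are invented for illustration and are not the patented adjustment procedure.

```python
def rates(threshold, true_scores, false_scores):
    """CA = fraction of correct recognitions accepted; FA = fraction of
    incorrect recognitions accepted, at a given confidence threshold."""
    ca = sum(s >= threshold for s in true_scores) / len(true_scores)
    fa = sum(s >= threshold for s in false_scores) / len(false_scores)
    return ca, fa

def pick_threshold(true_scores, false_scores, old_ca, old_fa):
    """Among candidate thresholds on the revised system, keep those that
    satisfy the double-gain constraint (CA >= old CA, FA <= old FA) and
    return the one with the largest CA - FA margin (tie-break invented)."""
    admissible = []
    for t in sorted(set(true_scores + false_scores)):
        ca, fa = rates(t, true_scores, false_scores)
        if ca >= old_ca and fa <= old_fa:
            admissible.append((ca - fa, t, ca, fa))
    return max(admissible) if admissible else None

# Toy confidence scores of the revised system, plus the old operating point.
margin, t, ca, fa = pick_threshold(
    true_scores=[0.9, 0.8, 0.7, 0.4],
    false_scores=[0.5, 0.3, 0.2, 0.1],
    old_ca=0.75, old_fa=0.0)
```

With these toy scores the search settles on the threshold 0.7, which keeps CA at 0.75 while holding FA at zero.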
  • Publication number: 20130268270
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
  • Patent number: 8554546
    Abstract: A logarithmic frequency spectrum within a predetermined time range is calculated from a speech signal. The logarithmic frequency spectrum has a frequency element at equal intervals along a logarithmic frequency axis. A logarithmic frequency spectrogram is calculated by connecting a plurality of logarithmic frequency spectrums. A value of the frequency element along a straight line on the logarithmic frequency spectrogram is voted onto a Hough plane. The Hough plane has a voted value in correspondence with a gradient of the straight line. The voted value above a threshold and the gradient corresponding to the voted value are extracted from the Hough plane. A fundamental frequency change is calculated using the voted value and the gradient extracted.
    Type: Grant
    Filed: September 9, 2009
    Date of Patent: October 8, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Kida, Takashi Masuko
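A minimal sketch of the voting step described in this abstract: on a log-frequency spectrogram a pitch glide follows a straight line of constant gradient, so each candidate line accumulates the spectral values it passes through onto a Hough plane, and cells whose vote exceeds a threshold yield the gradient, i.e. the rate of fundamental-frequency change. The toy spectrogram, gradient grid, and threshold are invented; a real accumulator would interpolate rather than round to the nearest bin.

```python
import numpy as np

def hough_vote(logspec, gradients, threshold):
    """Vote spectrogram energy onto a (gradient, intercept) Hough plane.

    logspec: shape (n_frames, n_logfreq_bins); each row is one
    log-frequency spectrum, so a pitch glide is a straight line."""
    n_frames, n_bins = logspec.shape
    times = np.arange(n_frames)
    plane = {}  # (gradient, intercept-bin) -> accumulated vote
    for g in gradients:
        for b in range(n_bins):
            # Line: bin(t) = b + g * t; sum the values along it.
            bins = np.round(b + g * times).astype(int)
            valid = (bins >= 0) & (bins < n_bins)
            plane[(g, b)] = logspec[times[valid], bins[valid]].sum()
    # Extract only cells whose vote exceeds the threshold.
    return {k: v for k, v in plane.items() if v > threshold}

# Toy spectrogram: one rising harmonic track, gradient +1 bin per frame.
spec = np.zeros((5, 10))
for frame in range(5):
    spec[frame, 2 + frame] = 1.0
peaks = hough_vote(spec, gradients=[-1, 0, 1], threshold=4.0)
best = max(peaks, key=peaks.get)   # (gradient, starting bin)
```

The extracted cell `(1, 2)` recovers both the track's gradient and its starting bin.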
  • Publication number: 20130262112
    Abstract: A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive an audio stream of a communication between a plurality of participants. Additionally, the programming instructions are operable to filter the audio stream of the communication into separate audio streams, one for each of the plurality of participants, wherein each of the separate audio streams contains portions of the communication attributable to a respective participant of the plurality of participants. Furthermore, the programming instructions are operable to output the separate audio streams to a storage system.
    Type: Application
    Filed: May 23, 2013
    Publication date: October 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peeyush JAISWAL, Naveen NARAYAN
  • Patent number: 8548804
    Abstract: This invention relates to generation of a sample error coefficient suitable for use in an audio signal quality assessment system. The invention provides a method of determining a sample error coefficient between a first signal and a similar second signal comprising the steps of: determining a first periodicity measure from the first signal; determining a second periodicity measure from the second signal; generating a ratio in dependence upon said first periodicity measure and said second periodicity measure; and determining a sampling rate error coefficient in dependence upon said ratio.
    Type: Grant
    Filed: October 19, 2007
    Date of Patent: October 1, 2013
    Assignee: Psytechnics Limited
    Inventors: Paul Barrett, Ludovic Malfait
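The ratio idea can be illustrated with autocorrelation-based periodicity measures: if the degraded signal was resampled at the wrong rate, its apparent period (in samples) shifts, and the ratio of the two periods deviates from 1. The `min_lag` heuristic and the pure-tone test signals are inventions of this sketch, not part of the patented method.

```python
import numpy as np

def periodicity(signal, min_lag=20):
    """Period in samples: lag of the strongest autocorrelation peak,
    ignoring very small lags (min_lag is a heuristic of this sketch)."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    return int(np.argmax(ac[min_lag:]) + min_lag)

def sample_error_coefficient(reference, degraded):
    """Ratio of the two periodicity measures; 1.0 means no apparent
    sampling-rate error between the two signals."""
    return periodicity(reference) / periodicity(degraded)

fs = 4000
t = np.arange(fs) / fs
reference = np.sin(2 * np.pi * 100 * t)   # period 40 samples
# Simulate a clock that ran 25% fast: the tone lands at 125 Hz.
degraded = np.sin(2 * np.pi * 125 * t)    # period 32 samples
coef = sample_error_coefficient(reference, degraded)
```

The coefficient of 1.25 directly exposes the simulated 25% sampling-rate error.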
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael Lewis Seltzer, Alejandro Acero
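A minimal sketch of the cascaded adaptation, with the two variability sources keyed by invented labels (noise condition and speaker cluster) and identity-scaled matrices standing in for trained linear transforms:

```python
import numpy as np

def adapt_features(x, env_value, spk_value, env_transforms, spk_transforms):
    """Select one transform per variability source, then apply them in
    cascade: environment first, then speaker (the ordering is a choice
    of this sketch, not mandated by the abstract)."""
    A_env = env_transforms[env_value]   # compensates the first source
    A_spk = spk_transforms[spk_value]   # compensates the second source
    intermediate = A_env @ x            # intermediate transformed data
    return A_spk @ intermediate         # final transformed speech data

dim = 3
env_transforms = {'noisy': 2.0 * np.eye(dim), 'clean': np.eye(dim)}
spk_transforms = {'spk1': 0.5 * np.eye(dim)}
x = np.ones(dim)                        # toy feature vector
y = adapt_features(x, 'noisy', 'spk1', env_transforms, spk_transforms)
```

Here the two scalings cancel, so the cascaded output equals the input, which makes the composition easy to verify.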
  • Patent number: 8543400
    Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: September 24, 2013
    Assignee: National Taiwan University
    Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
  • Patent number: 8538755
    Abstract: An automated emotional recognition system is adapted to determine emotional states of a speaker based on the analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract therefrom a set of speech features indicative of the emotional state of the speaker. The emotional state recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal.
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: September 17, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Gianmario Bollano, Donato Ettorre, Antonio Esiliato
  • Patent number: 8538751
    Abstract: A speech recognition system and a speech recognizing method for high-accuracy speech recognition in the environment with ego noise are provided. A speech recognition system according to the present invention includes a sound source separating and speech enhancing section; an ego noise predicting section; and a missing feature mask generating section for generating missing feature masks using outputs of the sound source separating and speech enhancing section and the ego noise predicting section; an acoustic feature extracting section for extracting an acoustic feature of each sound source using an output for said each sound source of the sound source separating and speech enhancing section; and a speech recognizing section for performing speech recognition using outputs of the acoustic feature extracting section and the missing feature masks.
    Type: Grant
    Filed: June 10, 2011
    Date of Patent: September 17, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Gokhan Ince
  • Publication number: 20130231932
    Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, as noted above, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
    Type: Application
    Filed: August 20, 2012
    Publication date: September 5, 2013
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
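One crude way to approximate the sub-band idea, hedged heavily: treat any sub-band in which a single spectral line stands far above the band's median level as evidence of a dominant harmonic (a glottal-pulse component standing above the noise), and declare voice activity. The band edges and the dominance threshold below are invented for this sketch.

```python
import numpy as np

def subband_voice_activity(frame, fs, n_bands=4, ratio_thresh=10.0):
    """Declare voice activity if, in any sub-band of the speech range,
    one spectral line dominates the band's typical (median) level."""
    spec = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    edges = np.linspace(80, fs / 2, n_bands + 1)    # invented band edges
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spec[(freqs >= lo) & (freqs < hi)]
        if band.size and band.max() > ratio_thresh * np.median(band):
            return True    # a harmonic dominates this sub-band
    return False

fs, n = 8000, 512
t = np.arange(n) / fs
voiced = subband_voice_activity(np.sin(2 * np.pi * 200 * t), fs)
silent = subband_voice_activity(np.zeros(n), fs)
```

A 200 Hz tone trips the detector in the lowest sub-band, while a silent frame does not.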
  • Publication number: 20130226578
    Abstract: Aspects of an asynchronous video interview system and related techniques include a server that receives a plurality of pre-recorded video prompts, generates an interview script, transmits a video prompt from the interview script to be displayed at a client computing device, and receives a streamed video response from the client computing device. The server can perform algorithmic analysis on content of the video response. In another aspect, a server obtains response preference data indicating a timing parameter for a response. In another aspect, a video prompt and an information supplement (e.g., a news item) that relates to the content of the video prompt are transmitted. In another aspect, a server automatically selects a video prompt (e.g., a follow-up question) to be displayed at the client computing device (e.g., based on a response or information about an interviewee).
    Type: Application
    Filed: February 22, 2013
    Publication date: August 29, 2013
    Applicant: COLLEGENET, INC.
    Inventor: CollegeNET, Inc.
  • Patent number: 8521527
    Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 27, 2013
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Patent number: 8521523
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting training data. In one aspect, a method comprises: selecting a target out of vocabulary rate; selecting a target percentage of user sessions; and determining a minimum training data freshness for a vocabulary of words, the minimum training data freshness corresponding to the target percentage of user sessions experiencing the target out of vocabulary rate.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 27, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
  • Patent number: 8521526
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing spoken query terms. In one aspect, a method includes performing speech recognition on an audio signal to select two or more textual, candidate transcriptions that match a spoken query term, and to establish a speech recognition confidence value for each candidate transcription, obtaining a search history for a user who spoke the spoken query term, where the search history references one or more past search queries that have been submitted by the user, generating one or more n-grams from each candidate transcription, where each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription, and determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency.
    Type: Grant
    Filed: July 28, 2010
    Date of Patent: August 27, 2013
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Johan Schalkwyk, Pankaj Risbood
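The n-gram rescoring step can be sketched as follows; the flat 0.1 interpolation weight and the toy search history are invented, and the real method derives per-n-gram weighting values rather than adding a uniform bonus:

```python
from collections import Counter

def ngrams(text, n):
    """All word n-grams of a query, as tuples."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def rescore(candidates, history, n=1):
    """Boost each (transcription, ASR confidence) candidate by how often
    its n-grams occur in the user's past queries."""
    counts = Counter(g for q in history for g in ngrams(q, n))
    scored = []
    for text, asr_conf in candidates:
        grams = ngrams(text, n)
        freq = sum(counts[g] for g in grams) / max(len(grams), 1)
        scored.append((text, asr_conf + 0.1 * freq))  # weight is invented
    return max(scored, key=lambda s: s[1])[0]

history = ["pictures of new york", "new york weather"]
candidates = [("new york", 0.60), ("newark", 0.62)]
best = rescore(candidates, history)
```

Even though "newark" had the slightly higher acoustic confidence, the user's history pulls "new york" ahead.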
  • Patent number: 8515746
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting training data. In an aspect, a method comprises: selecting a target out of vocabulary rate; selecting a target percentage of user sessions; and determining a minimum training data collection duration for a vocabulary of words, the minimum training data collection duration corresponding to the target percentage of user sessions experiencing the target out of vocabulary rate.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
  • Patent number: 8515745
    Abstract: Methods, systems, and apparatus for selecting training data. In an aspect, a method comprises: obtaining search session data comprising search sessions that include search queries, wherein each search query comprises words; determining a threshold out of vocabulary rate indicating a rate at which a word in a search query is not included in a vocabulary; determining a threshold session out of vocabulary rate, the session out of vocabulary rate indicating a rate at which search sessions have an out of vocabulary rate that meets the threshold out of vocabulary rate; selecting a vocabulary of words that, for a set of test data, has a session out of vocabulary rate that meets the threshold session out of vocabulary rate, the vocabulary of words being selected from the one or more words included in each of the search queries included in the search sessions.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
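The session-level criterion can be sketched directly: a session's out-of-vocabulary (OOV) rate is the fraction of its query words missing from the vocabulary, and a vocabulary passes if the fraction of sessions exceeding a per-word OOV target stays under the session-level target. The thresholds below are arbitrary examples.

```python
def session_oov_rate(session, vocab):
    """Fraction of the session's query words that are not in the vocabulary."""
    words = [w for q in session for w in q.split()]
    return sum(w not in vocab for w in words) / len(words)

def vocab_meets_target(sessions, vocab, max_word_oov, max_session_rate):
    """True if at most max_session_rate of sessions exceed the per-word
    OOV target max_word_oov."""
    bad = sum(session_oov_rate(s, vocab) > max_word_oov for s in sessions)
    return bad / len(sessions) <= max_session_rate

sessions = [["cat videos", "dog videos"], ["zyzzyva facts"]]
vocab = {"cat", "dog", "videos", "facts"}
ok = vocab_meets_target(sessions, vocab, max_word_oov=0.1,
                        max_session_rate=0.5)
```

One of the two sessions exceeds the per-word OOV target, which is exactly the allowed session rate, so the vocabulary passes.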
  • Patent number: 8515752
    Abstract: A system provides search results from a voice search query. The system receives a voice search query from a user, derives one or more recognition hypotheses, each being associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses. The system then provides the weighted boolean query to a search system and provides the results of the search system to a user.
    Type: Grant
    Filed: March 12, 2008
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Alexander Mark Franz, Monika H. Henzinger, Sergey Brin, Brian Christopher Milch
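A toy rendering of the weighted boolean construction: each recognition hypothesis becomes an AND clause whose terms carry the hypothesis weight, and the clauses are ORed together. The `term^weight` syntax is invented for illustration; real search systems each have their own weighting notation.

```python
def weighted_boolean_query(hypotheses):
    """Build an OR-of-ANDs query from (transcription, weight) hypotheses,
    attaching each hypothesis's weight to its terms."""
    clauses = []
    for text, weight in hypotheses:
        terms = " AND ".join(f"{w}^{weight:.2f}" for w in text.split())
        clauses.append(f"({terms})")
    return " OR ".join(clauses)

query = weighted_boolean_query([("paris weather", 0.7),
                                ("pairs weather", 0.3)])
```

Both hypotheses survive into the query, so a downstream search system can let document evidence arbitrate between them.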
  • Patent number: 8510105
    Abstract: For an enhanced sequential compression of data vectors in a respective compression pass, a current data vector is mapped to at least one current code vector of at least one codebook in at least one quantization stage. The at least one codebook is reordered taking account of at least one intermediate result from the current compression pass and at least one intermediate result from a preceding compression pass. At least one codebook index that is associated in the at least one reordered codebook to the at least one current code vector is then provided for further use. For a decompression of compressed data vectors represented by such codebook indices, at least one codebook index is mapped to at least one code vector of at least one equally reordered codebook.
    Type: Grant
    Filed: October 21, 2005
    Date of Patent: August 13, 2013
    Assignee: Nokia Corporation
    Inventor: Jani K. Nurminen
  • Patent number: 8498864
    Abstract: Methods and systems for predicting a text are described. In an example, a computing device may be configured to receive one or more typed characters that compose a portion of a text; and receive, a voice input corresponding to a spoken utterance of at least a portion of the text. The computing device may be configured to determine, based on the one or more typed characters and the voice input, one or more candidate texts predicting the text. Further, the computing device may be configured to provide the one or more candidate texts.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: July 30, 2013
    Assignee: Google Inc.
    Inventors: Yu Liang, Xiaotao Duan
  • Patent number: 8494847
    Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputting the recognition result; a weighting factor updating section that updates a weighting factor applied to a score obtained from an acoustic model and a language model so that the difference between a correct-answer score calculated with the use of a correct-answer text of the learning audio data and a score of the recognition result becomes large; a convergence determination section that determines, with the use of the score after updating, whether to return to the weighting factor updating section to update the weighting factor again; and a weighting factor convergence determination section that determines, with the use of the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: July 23, 2013
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Yoshifumi Onishi
  • Patent number: 8489405
    Abstract: The embodiments of the present invention relate to a compression coding and decoding method, a coder, a decoder and a coding device. The compression coding method includes: extracting sign information of an input signal to obtain an absolute value signal of the input signal; obtaining a residual signal of the absolute value signal by using a prediction coefficient, where the prediction coefficient is obtained by prediction and analysis that are performed according to a signal characteristic of the absolute value signal of the input signal; and multiplexing the residual signal, the sign information and a coding parameter to output a coding code stream, after the residual signal, the sign information and the coding parameter are respectively coded, so as to improve compression efficiency of a voice and audio signal.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: July 16, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Fengyan Qi, Lei Miao, Qing Zhang
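The sign-extraction and residual steps can be sketched with a trivial "repeat the last magnitude" predictor standing in for the analysed prediction coefficient; the point is that the residual has smaller magnitudes than the input, which is what makes the subsequent coding cheaper.

```python
import numpy as np

def encode(x):
    """Split sign from magnitude, then predict each magnitude from the
    previous one and keep only the residual (toy first-order predictor)."""
    sign = np.sign(x).astype(int)
    mag = np.abs(x)
    pred = np.concatenate(([0], mag[:-1]))   # predict "same as last"
    residual = mag - pred
    return sign, residual

def decode(sign, residual):
    """Invert the predictor (cumulative sum) and restore the signs."""
    mag = np.cumsum(residual)
    return sign * mag

x = np.array([3, -4, 4, -5, 5])
sign, residual = encode(x)
rec = decode(sign, residual)
```

The round trip is lossless, and the residual's total magnitude (5) is far below the input's (21), illustrating the compression gain.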
  • Publication number: 20130173266
    Abstract: A voice analyzer includes a first voice acquisition unit provided in a place where a distance of a sound wave propagation path from a mouth of a user is a first distance, plural second voice acquisition units provided in places where distances of sound wave propagation paths from the mouth of the user are smaller than the first distance, and an identification unit that identifies whether the voices acquired by the first and second voice acquisition units are voices of the user or voices of others excluding the user on the basis of a result of comparison between first sound pressure of a voice signal of the voice acquired by the first voice acquisition unit and second sound pressure calculated from sound pressure of a voice signal of the voice acquired by each of the plural second voice acquisition units.
    Type: Application
    Filed: May 7, 2012
    Publication date: July 4, 2013
    Applicant: FUJI XEROX CO., LTD.
    Inventors: Yohei NISHINO, Haruo HARADA, Kei SHIMOTANI, Hirohito YONEYAMA, Kiyoshi IIDA, Akira FUJII
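The identification rule reduces to a sound-pressure comparison: the wearer's mouth is much closer to the near microphones than to the far one, so the wearer's own voice shows a large near/far pressure ratio, while a distant talker's voice reaches all microphones at roughly the same level. The 2.0 ratio threshold below is invented.

```python
def identify_speaker(p_far, p_near, ratio_threshold=2.0):
    """Classify a voice as the wearer's or another person's from the
    ratio of average near-microphone pressure to far-microphone pressure."""
    p_near_avg = sum(p_near) / len(p_near)
    return "wearer" if p_near_avg / p_far >= ratio_threshold else "other"

# Wearer: strong decay between near and far microphones.
own = identify_speaker(p_far=0.2, p_near=[0.9, 0.8])
# Distant talker: nearly equal pressure everywhere.
other = identify_speaker(p_far=0.5, p_near=[0.55, 0.5])
```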
  • Publication number: 20130166291
    Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
    Type: Application
    Filed: August 23, 2010
    Publication date: June 27, 2013
    Applicant: RMIT UNIVERSITY
    Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
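The feature-extraction recipe in the last sentence (threshold the spectral amplitudes, then take the area under what survives in each sub-band) can be sketched directly; the band count and the 10%-of-peak threshold are arbitrary choices of this sketch, not the patent's trained values.

```python
import numpy as np

def subband_area_features(signal, n_bands=4, threshold=0.1):
    """Zero spectral amplitudes below a relative threshold, then return
    the area (sum) of the surviving amplitudes in each sub-band."""
    amps = np.abs(np.fft.rfft(signal))
    amps[amps < threshold * amps.max()] = 0.0   # thresholding step
    bands = np.array_split(amps, n_bands)       # equal-width sub-bands
    return np.array([band.sum() for band in bands])

t = np.arange(256) / 8000.0
feats = subband_area_features(np.sin(2 * np.pi * 500 * t))
```

For a 500 Hz tone all surviving spectral area lands in the lowest sub-band, so the feature vector cleanly localizes the signal energy.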
  • Patent number: 8457959
    Abstract: New language constantly emerges from complex, collaborative human-human interactions like meetings—such as when a presenter handwrites a new term on a whiteboard while saying it redundantly. The system and method described includes devices for receiving various types of human communication activities (e.g., speech, writing and gestures) presented in a multimodally redundant manner, includes processors and recognizers for segmenting or parsing, and then recognizing selected sub-word units such as phonemes and syllables, and then includes alignment, refinement, and integration modules to find an exact or at least an approximate match to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and provide an association for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
    Type: Grant
    Filed: February 29, 2008
    Date of Patent: June 4, 2013
    Inventor: Edward C. Kaiser
  • Publication number: 20130132082
    Abstract: Methods and systems for recognition of concurrent, superimposed, or otherwise overlapping signals are described. A Markov Selection Model is introduced that, together with probabilistic decomposition methods, enable recognition of simultaneously emitted signals from various sources. For example, a signal mixture may include overlapping speech from different persons. In some instances, recognition may be performed without the need to separate signals or sources. As such, some of the techniques described herein may be useful in automatic transcription, noise reduction, teaching, electronic games, audio search and retrieval, medical and scientific applications, etc.
    Type: Application
    Filed: February 21, 2011
    Publication date: May 23, 2013
    Inventor: Paris Smaragdis
  • Patent number: 8442827
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.
    Type: Grant
    Filed: June 18, 2010
    Date of Patent: May 14, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Nicholas Duffield
  • Patent number: 8433582
    Abstract: A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. A high-band energy level corresponding to the input digital audio signal is estimated (103) based on a transition-band of the processed digital audio signal within a predetermined upper frequency range of a narrow-band bandwidth. A high-band digital audio signal is generated (104) based on the high-band energy level and an estimated high-band spectrum corresponding to the high-band energy level.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: April 30, 2013
    Assignee: Motorola Mobility LLC
    Inventors: Tenkasi V. Ramabadran, Mark A. Jasiuk
  • Patent number: 8433566
    Abstract: Video material is divided into temporal segments. Each segment is examined to determine whether the soundtrack of the segment contains speech sufficient for analysis and if so, metadata are generated based on analysis of the speech. If not, the segment is analysed by comparing frames thereof with those of stored segments that already have metadata assigned to them. One then assigns to the segment under consideration stored metadata associated with one or more stored segments that are similar.
    Type: Grant
    Filed: February 7, 2008
    Date of Patent: April 30, 2013
    Assignee: BRITISH TELECOMMUNICATIONS public limited company
    Inventors: Zhan Cui, Nader Azarmi, Gery M Ducatel
  • Patent number: 8433567
    Abstract: A method, system, and computer program product for compensation of intra-speaker variability in speaker diarization are provided. The method includes: dividing a speech session into segments of duration less than an average duration between speaker change; parameterizing each segment by a time dependent probability density function supervector, for example, using a Gaussian Mixture Model; computing a difference between successive segment supervectors; and computing a scatter measure such as a covariance matrix of the difference as an estimate of intra-speaker variability. The method further includes compensating the speech session for intra-speaker variability using the estimate of intra-speaker variability.
    Type: Grant
    Filed: April 8, 2010
    Date of Patent: April 30, 2013
    Assignee: International Business Machines Corporation
    Inventor: Hagai Aronowitz
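A compact sketch of the estimation steps, with plain random vectors standing in for GMM supervectors: because the segments are shorter than the average time between speaker changes, successive pairs usually share a speaker, so the covariance of successive differences estimates intra-speaker variability. The toy data below are invented.

```python
import numpy as np

def intra_speaker_covariance(supervectors):
    """Covariance of differences between successive segment supervectors,
    used as an estimate of intra-speaker variability."""
    diffs = np.diff(supervectors, axis=0)   # successive-segment differences
    return np.cov(diffs, rowvar=False)      # scatter measure of the diffs

rng = np.random.default_rng(0)
base = rng.normal(size=5)                      # one speaker's mean vector
segs = base + 0.1 * rng.normal(size=(20, 5))   # jitter = intra-speaker
cov = intra_speaker_covariance(segs)
```

The resulting matrix captures only the small within-speaker jitter, not the speaker's mean, which is what the compensation step needs.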
  • Patent number: 8433575
    Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing for example a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: April 30, 2013
    Assignee: AMBX UK Limited
    Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
  • Publication number: 20130103402
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: October 25, 2011
    Publication date: April 25, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
  • Publication number: 20130090925
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer.
    Type: Application
    Filed: November 30, 2012
    Publication date: April 11, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
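A minimal sketch of one way to combine results from parallel main and supplemental recognizers; selecting the higher-confidence hypothesis is an invented combination rule for illustration, not the method claimed above.

```python
# Illustrative only: combine main and supplemental recognizer outputs by
# keeping the hypothesis with the higher confidence score.

def combine_results(main_result, supplemental_result):
    """Each result is a (transcript, confidence) pair; pick the more confident."""
    return max([main_result, supplemental_result], key=lambda r: r[1])

main = ("recognize speech", 0.72)
supplemental = ("wreck a nice beach", 0.41)
print(combine_results(main, supplemental))  # ('recognize speech', 0.72)
```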
  • Patent number: 8417527
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: April 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
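The vocabulary-reduction idea above can be sketched as filtering a pronunciation dictionary down to a speaker's attributed style; the words, ARPAbet-style pronunciations, and style labels below are invented for the example.

```python
# Hedged sketch: reduce a phonetic vocabulary to the variants matching a
# speaker's observed pronunciation style (all entries here are invented).

PHONETIC_VOCAB = {
    "tomato": {"US": "T AH M EY T OW", "UK": "T AH M AA T OW"},
    "either": {"US": "IY DH ER", "UK": "AY DH ER"},
}

def adapt_vocabulary(vocab, speaker_style):
    """Keep only the pronunciation variant matching the speaker's style."""
    return {word: {speaker_style: variants[speaker_style]}
            for word, variants in vocab.items()
            if speaker_style in variants}

reduced = adapt_vocabulary(PHONETIC_VOCAB, "UK")
print(reduced["tomato"])  # {'UK': 'T AH M AA T OW'}
```

The smaller search space is what the abstract credits for the accuracy and speed gains.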
  • Patent number: 8412527
    Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
    Type: Grant
    Filed: June 24, 2009
    Date of Patent: April 2, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: I. Dan Melamed, Yeon-Jun Kim, Andrej Ljolje, Bernard S. Renger, David J. Smith
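A toy sketch of the phrase-detection-and-rating flow described above; the phrase lists, penalty weights, and rating scale are all invented for illustration.

```python
# Simplified sketch: detect pre-determined event and precursor-event phrases
# in a communication and rate the recipient on their presence.

EVENT_PHRASES = {"cancel my account", "speak to a supervisor"}
PRECURSOR_PHRASES = {"i am not happy", "this is the third time"}

def rate_recipient(transcript):
    text = transcript.lower()
    events = [p for p in EVENT_PHRASES if p in text]
    precursors = [p for p in PRECURSOR_PHRASES if p in text]
    # Toy rating: start from 1.0 and penalize each detected phrase.
    rating = 1.0 - 0.3 * len(events) - 0.1 * len(precursors)
    return max(rating, 0.0), events, precursors

rating, events, precursors = rate_recipient(
    "This is the third time I called; I want to cancel my account.")
print(round(rating, 1))  # 0.6
```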
  • Publication number: 20130080166
    Abstract: A system for biometrically securing business transactions uses speech recognition and voiceprint authentication to biometrically secure a transaction from a variety of client devices in a variety of media. A voiceprint authentication server receives a request from a third party requestor to authenticate a previously enrolled end user of a client device. A signature collection applet presents the user with a randomly generated signature string, prompts the user to speak the string, and records the user's voice as the string is spoken. After transmittal to the authentication server, the signature string is recognized using voice recognition software and compared with a stored voiceprint using voiceprint authentication software. An authentication result is reported to both user and requestor. Voiceprints are stored in a repository along with the associated user data. Enrollment is by way of a separate enrollment applet, wherein the end user provides user information and records a voiceprint, which is subsequently stored.
    Type: Application
    Filed: November 19, 2012
    Publication date: March 28, 2013
    Applicant: EMC Corporation
  • Publication number: 20130080165
    Abstract: Online histogram equalization may be provided. Upon receiving a spoken phrase from a user, a histogram/frequency distribution may be estimated on the spoken phrase according to a prior distribution. The histogram distribution may be equalized and then provided to a spoken language understanding application.
    Type: Application
    Filed: September 24, 2011
    Publication date: March 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Shizen Wang, Yifan Gong
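The equalization step above can be sketched with standard rank-based histogram equalization: map each observed feature value to the matching quantile of a reference (prior) distribution. The mapping below is a generic textbook form, not the specific estimator claimed in the application.

```python
# Rough sketch of histogram equalization against a reference distribution.
# Assumes distinct values; a real implementation would handle ties and
# interpolate between reference quantiles.

def equalize(values, reference):
    """Map each value to the reference-distribution quantile of its rank."""
    order = sorted(values)
    ref = sorted(reference)
    def mapped(v):
        # Empirical CDF position of v among `values`, then the matching
        # quantile in `reference`.
        rank = order.index(v) / (len(order) - 1)
        return ref[round(rank * (len(ref) - 1))]
    return [mapped(v) for v in values]

observed = [5.0, 9.0, 1.0]
reference = [0.0, 0.5, 1.0]
print(equalize(observed, reference))  # [0.5, 1.0, 0.0]
```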
  • Patent number: 8406382
    Abstract: A method includes registering a voice of a party in order to provide voice verification for communications with an entity. A call is received from a party at a voice response system. The party is prompted for information and verbal communication spoken by the party is captured. A voice model associated with the party is created by processing the captured verbal communication spoken by the party and is stored. The identity of the party is verified and a previously stored voice model of the party, registered during a previous call from the party, is updated. The creation of the voice model is imperceptible to the party.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: March 26, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 8406525
    Abstract: A method is disclosed for recognition of high-dimensional data in the presence of occlusion, including: receiving a target data that includes an occlusion and is of an unknown class, wherein the target data includes a known object; sampling a plurality of training data files comprising a plurality of distinct classes of the same object as that of the target data; and identifying the class of the target data through linear superposition of the sampled training data files using l1 minimization, wherein a linear superposition with a sparsest number of coefficients is used to identify the class of the target data.
    Type: Grant
    Filed: January 29, 2009
    Date of Patent: March 26, 2013
    Assignees: The Regents of the University of California, The Board of Trustees of the University of Illinois
    Inventors: Yi Ma, Allen Yang Yang, John Norbert Wright, Andrew William Wagner
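A greatly simplified stand-in for the classification step above: instead of an actual l1-minimized linear superposition over the training dictionary, the sketch scores each class by its smallest residual to the target and picks the class with the best fit. It ignores sparsity and true occlusion handling; the data is invented.

```python
# Simplified per-class residual classification (NOT the patent's l1 method).

def residual(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(target, training):
    """training maps class label -> list of sample vectors; return the class
    whose samples leave the smallest residual to the target."""
    return min(training,
               key=lambda c: min(residual(target, s) for s in training[c]))

training = {
    "A": [[1.0, 1.0, 1.0], [1.0, 0.9, 1.1]],
    "B": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1]],
}
target = [0.9, 1.0, 0.2]   # last coordinate corrupted, as if occluded
print(classify(target, training))  # A
```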
  • Patent number: 8401850
    Abstract: Methods and systems for handling speech recognition processing in effectively real time over the Internet, so that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the Internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real time. The server evaluates the user speech in the context of the current speech recognition exercise being executed. The server receives a first value and a first packet of encoded speech from a first client, a second value and a second packet of encoded speech from a second client, and services the first and second packets using first and second levels of processing based on the first and second values.
    Type: Grant
    Filed: January 10, 2011
    Date of Patent: March 19, 2013
    Assignee: GlobalEnglish Corporation
    Inventor: Christopher S. Jochumson
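The value-based servicing described above can be sketched as a priority queue over incoming packets; the server class, value semantics, and FIFO tie-breaking are assumptions for the example.

```python
import heapq

# Sketch (invented details): packets of encoded speech arrive with a value
# indicating their processing priority; higher-value packets are serviced first.

class SpeechServer:
    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker to keep FIFO order at equal value

    def receive(self, value, packet):
        # Negate the value so higher values pop first from the min-heap.
        heapq.heappush(self._queue, (-value, self._counter, packet))
        self._counter += 1

    def service_next(self):
        _, _, packet = heapq.heappop(self._queue)
        return packet

server = SpeechServer()
server.receive(1, "client-1 speech")   # lower processing level
server.receive(2, "client-2 speech")   # higher processing level
print(server.service_next())  # client-2 speech
```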
  • Patent number: 8396712
    Abstract: A method and system for generating a finite state grammar is provided. The method comprises receiving user input of at least two sample phrases; analyzing the sample phrases to determine common words that occur in each of the sample phrases and optional words that occur in only some of the sample phrases; creating a mathematical expression representing the sample phrases, the expression including each word found in the sample phrases and an indication of whether a word is a common word or an optional word; displaying the mathematical expression to a user; allowing the user to alter the mathematical expression; generating a finite state grammar corresponding to the altered mathematical expression; and displaying the finite state grammar to the user.
    Type: Grant
    Filed: August 26, 2004
    Date of Patent: March 12, 2013
    Assignee: West Corporation
    Inventor: Ashok Mitter Khosla
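The common-versus-optional analysis above can be sketched directly; the expression syntax (optional words wrapped as `(word)?`) is an assumption, and this toy version does not preserve the optional words' positions within each phrase.

```python
# Sketch: derive common vs. optional words from sample phrases and emit a
# simple expression (syntax invented for illustration).

def build_expression(phrases):
    tokenized = [p.lower().split() for p in phrases]
    # Words present in every sample phrase are "common"; the rest are optional.
    common = set(tokenized[0]).intersection(*tokenized[1:])
    seen, parts = set(), []
    for sentence in tokenized:
        for word in sentence:
            if word in seen:
                continue
            seen.add(word)
            parts.append(word if word in common else f"({word})?")
    return " ".join(parts)

samples = ["call the office", "please call the main office"]
print(build_expression(samples))  # call the office (please)? (main)?
```

In the claimed method the resulting expression is shown to the user for editing before the finite state grammar is generated from it.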
  • Patent number: 8390669
    Abstract: The present disclosure describes a method for identifying individuals in a multimedia stream originating from a video conferencing terminal or a Multipoint Control Unit, including executing a face detection process on the multimedia stream; defining subsets including facial images of one or more individuals, where the subsets are ranked according to the probability that their respective individuals will appear in a video stream; comparing a detected face to the subsets in consecutive order starting with the most probable subset, until a match is found; and storing an identity of the detected face as searchable metadata in a content database in response to the detected face matching a facial image in one of the subsets.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: March 5, 2013
    Assignee: Cisco Technology, Inc.
    Inventors: Jason Catchpole, Craig Cockerton
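The ranked-subset search above can be sketched as iterating over subsets in probability order and stopping at the first sufficiently close face; the feature vectors, distance measure, and threshold are invented for the example.

```python
# Illustrative sketch: compare a detected face against subsets ranked by the
# probability their members appear, stopping at the first match.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def identify(face, ranked_subsets, threshold=0.5):
    """ranked_subsets: list of (identity -> feature vector) dicts,
    most probable subset first."""
    for subset in ranked_subsets:
        for identity, reference in subset.items():
            if distance(face, reference) < threshold:
                return identity
    return None

frequent = {"alice": [0.1, 0.2]}   # people who often appear on this terminal
rare = {"bob": [0.9, 0.8]}         # everyone else
print(identify([0.88, 0.79], [frequent, rare]))  # bob
```

Checking the most probable subset first means the common case (a regular participant) terminates the search early.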
  • Patent number: 8392185
    Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: March 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
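A minimal sketch of the soft-mask idea above: map a per-element separation-reliability score to a continuous value in [0, 1]. The logistic mapping is an assumption for illustration; the abstract does not specify the mask-generating function.

```python
import math

# Sketch: convert separation-reliability scores into a soft mask in [0, 1].

def soft_mask(reliabilities, sharpness=4.0):
    """Higher reliability -> mask value closer to 1 (logistic mapping)."""
    return [1.0 / (1.0 + math.exp(-sharpness * (r - 0.5)))
            for r in reliabilities]

mask = soft_mask([0.0, 0.5, 1.0])
print([round(m, 2) for m in mask])  # [0.12, 0.5, 0.88]
```

Unlike a binary mask, unreliable elements are attenuated rather than discarded outright, which is the property the abstract emphasizes.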
  • Patent number: 8392193
    Abstract: A method for performing speech recognition includes receiving a voice input and generating at least one possible result corresponding to the voice input. The method may also include calculating a value for the speech recognition result and comparing the calculated value to a particular portion of the speech recognition result. The method may further include retrieving information based on one or more factors associated with the voice input and using the retrieved information to determine a likelihood that the speech recognition result is correct.
    Type: Grant
    Filed: June 1, 2004
    Date of Patent: March 5, 2013
    Assignee: Verizon Business Global LLC
    Inventors: Paul T. Schultz, Robert A. Sartini
  • Patent number: 8392187
    Abstract: Methods, speech recognition systems, and computer readable media are provided that recognize speech using dynamic pruning techniques. A search network is expanded based on a frame from a speech signal, a best hypothesis is determined in the search network, a default beam threshold is modified, and the search network is pruned using the modified beam threshold. The search network may be further pruned based on the search depth of the best hypothesis and/or the average number of frames per state for a search path.
    Type: Grant
    Filed: January 30, 2009
    Date of Patent: March 5, 2013
    Assignee: Texas Instruments Incorporated
    Inventor: Qifeng Zhu
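The core beam-pruning step above can be sketched in a few lines: keep only hypotheses whose score falls within the beam of the best hypothesis. The hypothesis set, log scores, and beam width are invented; the depth- and frames-per-state refinements from the abstract are omitted.

```python
# Sketch of beam pruning over a set of scored hypotheses.

def prune(hypotheses, beam):
    """hypotheses: dict of hypothesis -> log score (higher is better).
    Keep those within `beam` of the best score."""
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= best - beam}

hyps = {"recognize speech": -10.0,
        "wreck a nice beach": -14.5,
        "rec nice": -30.0}
print(sorted(prune(hyps, beam=5.0)))  # ['recognize speech', 'wreck a nice beach']
```

Dynamically modifying the beam, as the abstract describes, amounts to recomputing `beam` each frame before this pruning step runs.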
  • Patent number: 8392189
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Grant
    Filed: November 30, 2009
    Date of Patent: March 5, 2013
    Assignee: Broadcom Corporation
    Inventor: Nambirajan Seshadri
  • Patent number: 8380501
    Abstract: A system, method, and computer-readable medium for parcel address recognition. A method includes receiving an address input and producing candidate address results corresponding to the address input. The method includes receiving operational scheme knowledge describing the mode of operation of a parcel processing system, and receiving at least one operational rule corresponding to the operational scheme knowledge. The method includes applying the at least one operational rule to the candidate address results and producing and storing a finalized result according to the operational rule and the candidate address results.
    Type: Grant
    Filed: July 30, 2010
    Date of Patent: February 19, 2013
    Assignee: Siemens Industry, Inc.
    Inventor: Stanley W. Sipe
  • Patent number: 8380502
    Abstract: A system receives a voice search query from a user, derives recognition hypotheses from the voice search query, and determines scores associated with the recognition hypotheses, the scores being based on a comparison of the recognition hypotheses to previously received search queries. The system discards at least one of the recognition hypotheses that is associated with a first score that is less than a threshold value, and constructs a first query using at least one non-discarded recognition hypothesis, where the at least one non-discarded recognition hypothesis is associated with a second score that at least meets the threshold value. The system forwards the first query to a search system, receives first results associated with the first query, and provides the first results to the user.
    Type: Grant
    Filed: October 14, 2011
    Date of Patent: February 19, 2013
    Assignee: Google Inc.
    Inventors: Alexander Mark Franz, Monika H. Henzinger, Sergey Brin, Brian Christopher Milch
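The thresholding flow above can be sketched as scoring each hypothesis against previously received queries, discarding those below the threshold, and building the search query from a survivor; the frequency-based scoring rule and the query data are invented for illustration.

```python
# Simplified sketch: score hypotheses against past queries, discard low
# scorers, and build a query from the remainder.

PAST_QUERIES = ["weather today", "weather tomorrow", "whether or not"]

def score(hypothesis):
    """Toy score: fraction of past queries containing the first word."""
    word = hypothesis.split()[0]
    return sum(word in q.split() for q in PAST_QUERIES) / len(PAST_QUERIES)

def build_query(hypotheses, threshold=0.5):
    kept = [h for h in hypotheses if score(h) >= threshold]
    return kept[0] if kept else None

hyps = ["weather today", "whether today"]
print(build_query(hyps))  # weather today
```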
  • Patent number: 8370132
    Abstract: Apparatus and methods are provided for measuring perceptual quality of a signal transmitted over a communication network, such as a circuit-switching network, packet-switching network, or a combination thereof. In accordance with one embodiment, a distributed apparatus is provided for measuring perceptual quality of a signal transmitted over a communication network. The distributed apparatus includes communication ports located at various locations in the network. The distributed apparatus may also include a signal processor including a processor for providing non-intrusive measurement of the perceptual quality of the signal. The distributed apparatus may further include recorders operatively connected to the communication ports and to the signal processor, wherein at least one of the recorders processes the signal at one of the communication ports and the recorder sends the signal to the signal processor to measure the perceptual quality of the signal.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: February 5, 2013
    Assignee: Verizon Services Corp.
    Inventor: Adrian E. Conway
  • Publication number: 20130030808
    Abstract: Systems and methods are provided for scoring non-native speech. Two or more speech samples are received, where each of the samples is of speech spoken by a non-native speaker, and where each of the samples is spoken in response to a distinct prompt. The two or more samples are concatenated to generate a concatenated response for the non-native speaker, where the concatenated response is based on the two or more speech samples that were elicited using the distinct prompts. A concatenated speech proficiency metric is computed based on the concatenated response, and the concatenated speech proficiency metric is provided to a scoring model, where the scoring model generates a speaking score based on the concatenated speech proficiency metric.
    Type: Application
    Filed: July 24, 2012
    Publication date: January 31, 2013
    Inventors: Klaus Zechner, Su-Youn Yoon, Lei Chen, Shasha Xie, Xiaoming Xi, Chaitanya Ramineni
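The concatenation step above can be sketched directly; the speaking-rate metric used below (words per second) is an invented placeholder for the claimed concatenated speech proficiency metric, and the sample data is made up.

```python
# Sketch: concatenate responses to distinct prompts, then compute a single
# proficiency metric over the concatenated response.

def concatenate(samples):
    """samples: list of (transcript, duration_seconds) pairs."""
    transcript = " ".join(t for t, _ in samples)
    duration = sum(d for _, d in samples)
    return transcript, duration

def proficiency_metric(samples):
    transcript, duration = concatenate(samples)
    return len(transcript.split()) / duration  # placeholder: words per second

samples = [("I like reading", 2.0), ("my favorite city is Paris", 3.0)]
print(proficiency_metric(samples))  # 1.6
```

Computing the metric after concatenation, rather than averaging per-sample metrics, is the point the abstract stresses.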