Specialized Equations Or Comparisons Patents (Class 704/236)
  • Publication number: 20130275135
    Abstract: Automatically adjusting confidence scoring functionality is described for a speech recognition engine. Operation of the speech recognition system is revised so as to change an associated receiver operating characteristic (ROC) curve describing performance of the speech recognition system with respect to rates of false acceptance (FA) versus correct acceptance (CA). Then a confidence scoring functionality related to recognition reliability for a given input utterance is automatically adjusted such that where the ROC curve is better for a given operating point after revising the operation of the speech recognition system, the adjusting reflects a double gain constraint to maintain FA and CA rates at least as good as before revising operation of the speech recognition system.
    Type: Application
    Filed: January 7, 2011
    Publication date: October 17, 2013
    Inventors: Nicolas Morales, Dermot Connolly, Andrew Halberstadt
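The "double gain" constraint above can be sketched as a small threshold search: after revising the recognizer, accept only operating points whose false-accept (FA) rate is no higher and whose correct-accept (CA) rate is no lower than before. The scores, old rates, and the margin-based tie-break below are invented for illustration and are not the patented adjustment procedure.

```python
def rates(threshold, true_scores, false_scores):
    """CA = fraction of correct recognitions accepted; FA = fraction of
    incorrect recognitions accepted, at a given confidence threshold."""
    ca = sum(s >= threshold for s in true_scores) / len(true_scores)
    fa = sum(s >= threshold for s in false_scores) / len(false_scores)
    return ca, fa

def pick_threshold(true_scores, false_scores, old_ca, old_fa):
    """Among candidate thresholds on the revised system, keep those that
    satisfy the double-gain constraint (CA >= old CA, FA <= old FA) and
    return the one with the largest CA - FA margin (tie-break invented)."""
    admissible = []
    for t in sorted(set(true_scores + false_scores)):
        ca, fa = rates(t, true_scores, false_scores)
        if ca >= old_ca and fa <= old_fa:
            admissible.append((ca - fa, t, ca, fa))
    return max(admissible) if admissible else None

# Toy confidence scores of the revised system, plus the old operating point.
margin, t, ca, fa = pick_threshold(
    true_scores=[0.9, 0.8, 0.7, 0.4],
    false_scores=[0.5, 0.3, 0.2, 0.1],
    old_ca=0.75, old_fa=0.0)
```

With these toy scores the search settles on the threshold 0.7, which keeps CA at 0.75 while holding FA at zero.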
  • Publication number: 20130268270
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
  • Patent number: 8554546
    Abstract: A logarithmic frequency spectrum within a predetermined time range is calculated from a speech signal. The logarithmic frequency spectrum has a frequency element at equal intervals along a logarithmic frequency axis. A logarithmic frequency spectrogram is calculated by connecting a plurality of logarithmic frequency spectrums. A value of the frequency element along a straight line on the logarithmic frequency spectrogram is voted onto a Hough plane. The Hough plane has a voted value in correspondence with a gradient of the straight line. The voted value above a threshold and the gradient corresponding to the voted value are extracted from the Hough plane. A fundamental frequency change is calculated using the voted value and the gradient extracted.
    Type: Grant
    Filed: September 9, 2009
    Date of Patent: October 8, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Kida, Takashi Masuko
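A minimal sketch of the voting step described in this abstract: on a log-frequency spectrogram a pitch glide follows a straight line of constant gradient, so each candidate line accumulates the spectral values it passes through onto a Hough plane, and cells whose vote exceeds a threshold yield the gradient, i.e. the rate of fundamental-frequency change. The toy spectrogram, gradient grid, and threshold are invented; a real accumulator would interpolate rather than round to the nearest bin.

```python
import numpy as np

def hough_vote(logspec, gradients, threshold):
    """Vote spectrogram energy onto a (gradient, intercept) Hough plane.

    logspec: shape (n_frames, n_logfreq_bins); each row is one
    log-frequency spectrum, so a pitch glide is a straight line."""
    n_frames, n_bins = logspec.shape
    times = np.arange(n_frames)
    plane = {}  # (gradient, intercept-bin) -> accumulated vote
    for g in gradients:
        for b in range(n_bins):
            # Line: bin(t) = b + g * t; sum the values along it.
            bins = np.round(b + g * times).astype(int)
            valid = (bins >= 0) & (bins < n_bins)
            plane[(g, b)] = logspec[times[valid], bins[valid]].sum()
    # Extract only cells whose vote exceeds the threshold.
    return {k: v for k, v in plane.items() if v > threshold}

# Toy spectrogram: one rising harmonic track, gradient +1 bin per frame.
spec = np.zeros((5, 10))
for frame in range(5):
    spec[frame, 2 + frame] = 1.0
peaks = hough_vote(spec, gradients=[-1, 0, 1], threshold=4.0)
best = max(peaks, key=peaks.get)   # (gradient, starting bin)
```

The extracted cell `(1, 2)` recovers both the track's gradient and its starting bin.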
  • Publication number: 20130262112
    Abstract: A method implemented in a computer infrastructure having computer executable code having programming instructions tangibly embodied on a computer readable storage medium. The programming instructions are operable to receive an audio stream of a communication between a plurality of participants. Additionally, the programming instructions are operable to filter the audio stream of the communication into separate audio streams, one for each of the plurality of participants, wherein each of the separate audio streams contains portions of the communication attributable to a respective participant of the plurality of participants. Furthermore, the programming instructions are operable to output the separate audio streams to a storage system.
    Type: Application
    Filed: May 23, 2013
    Publication date: October 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peeyush JAISWAL, Naveen NARAYAN
  • Patent number: 8548804
    Abstract: This invention relates to generation of a sample error coefficient suitable for use in an audio signal quality assessment system. The invention provides a method of determining a sample error coefficient between a first signal and a similar second signal comprising the steps of: determining a first periodicity measure from the first signal; determining a second periodicity measure from the second signal; generating a ratio in dependence upon said first periodicity measure and said second periodicity measure; and determining a sampling rate error coefficient in dependence upon said ratio.
    Type: Grant
    Filed: October 19, 2007
    Date of Patent: October 1, 2013
    Assignee: Psytechnics Limited
    Inventors: Paul Barrett, Ludovic Malfait
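The ratio idea can be illustrated with autocorrelation-based periodicity measures: if the degraded signal was resampled at the wrong rate, its apparent period (in samples) shifts, and the ratio of the two periods deviates from 1. The `min_lag` heuristic and the pure-tone test signals are inventions of this sketch, not part of the patented method.

```python
import numpy as np

def periodicity(signal, min_lag=20):
    """Period in samples: lag of the strongest autocorrelation peak,
    ignoring very small lags (min_lag is a heuristic of this sketch)."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    return int(np.argmax(ac[min_lag:]) + min_lag)

def sample_error_coefficient(reference, degraded):
    """Ratio of the two periodicity measures; 1.0 means no apparent
    sampling-rate error between the two signals."""
    return periodicity(reference) / periodicity(degraded)

fs = 4000
t = np.arange(fs) / fs
reference = np.sin(2 * np.pi * 100 * t)   # period 40 samples
# Simulate a clock that ran 25% fast: the tone lands at 125 Hz.
degraded = np.sin(2 * np.pi * 125 * t)    # period 32 samples
coef = sample_error_coefficient(reference, degraded)
```

The coefficient of 1.25 directly exposes the simulated 25% sampling-rate error.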
  • Patent number: 8548807
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: June 9, 2009
    Date of Patent: October 1, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael Lewis Seltzer, Alejandro Acero
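A minimal sketch of the cascaded adaptation, with the two variability sources keyed by invented labels (noise condition and speaker cluster) and identity-scaled matrices standing in for trained linear transforms:

```python
import numpy as np

def adapt_features(x, env_value, spk_value, env_transforms, spk_transforms):
    """Select one transform per variability source, then apply them in
    cascade: environment first, then speaker (the ordering is a choice
    of this sketch, not mandated by the abstract)."""
    A_env = env_transforms[env_value]   # compensates the first source
    A_spk = spk_transforms[spk_value]   # compensates the second source
    intermediate = A_env @ x            # intermediate transformed data
    return A_spk @ intermediate         # final transformed speech data

dim = 3
env_transforms = {'noisy': 2.0 * np.eye(dim), 'clean': np.eye(dim)}
spk_transforms = {'spk1': 0.5 * np.eye(dim)}
x = np.ones(dim)                        # toy feature vector
y = adapt_features(x, 'noisy', 'spk1', env_transforms, spk_transforms)
```

Here the two scalings cancel, so the cascaded output equals the input, which makes the composition easy to verify.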
  • Patent number: 8543400
    Abstract: Voice processing methods and systems are provided. An utterance is received. The utterance is compared with teaching materials according to at least one matching algorithm to obtain a plurality of matching values corresponding to a plurality of voice units of the utterance. Respective voice units are scored in at least one first scoring item according to the matching values and a personified voice scoring algorithm. The personified voice scoring algorithm is generated according to training utterances corresponding to at least one training sentence in a phonetic-balanced sentence set of a plurality of learners and at least one real teacher, and scores corresponding to the respective voice units of the training utterances of the learners in the first scoring item provided by the real teacher.
    Type: Grant
    Filed: June 6, 2008
    Date of Patent: September 24, 2013
    Assignee: National Taiwan University
    Inventors: Lin-Shan Lee, Che-Kuang Lin, Chia-Lin Chang, Yi-Jing Lin, Yow-Bang Wang, Yun-Huan Lee, Li-Wei Cheng
  • Patent number: 8538755
    Abstract: An automated emotional recognition system is adapted to determine emotional states of a speaker based on the analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract therefrom a set of speech features indicative of the emotional state of the speaker. The emotional state recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal.
    Type: Grant
    Filed: January 31, 2007
    Date of Patent: September 17, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Gianmario Bollano, Donato Ettorre, Antonio Esiliato
  • Patent number: 8538751
    Abstract: A speech recognition system and a speech recognizing method for high-accuracy speech recognition in the environment with ego noise are provided. A speech recognition system according to the present invention includes a sound source separating and speech enhancing section; an ego noise predicting section; and a missing feature mask generating section for generating missing feature masks using outputs of the sound source separating and speech enhancing section and the ego noise predicting section; an acoustic feature extracting section for extracting an acoustic feature of each sound source using an output for said each sound source of the sound source separating and speech enhancing section; and a speech recognizing section for performing speech recognition using outputs of the acoustic feature extracting section and the missing feature masks.
    Type: Grant
    Filed: June 10, 2011
    Date of Patent: September 17, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Gokhan Ince
  • Publication number: 20130231932
    Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, as noted above, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
    Type: Application
    Filed: August 20, 2012
    Publication date: September 5, 2013
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
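One crude way to approximate the sub-band idea, hedged heavily: treat any sub-band in which a single spectral line stands far above the band's median level as evidence of a dominant harmonic (a glottal-pulse component standing above the noise), and declare voice activity. The band edges and the dominance threshold below are invented for this sketch.

```python
import numpy as np

def subband_voice_activity(frame, fs, n_bands=4, ratio_thresh=10.0):
    """Declare voice activity if, in any sub-band of the speech range,
    one spectral line dominates the band's typical (median) level."""
    spec = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    edges = np.linspace(80, fs / 2, n_bands + 1)    # invented band edges
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = spec[(freqs >= lo) & (freqs < hi)]
        if band.size and band.max() > ratio_thresh * np.median(band):
            return True    # a harmonic dominates this sub-band
    return False

fs, n = 8000, 512
t = np.arange(n) / fs
voiced = subband_voice_activity(np.sin(2 * np.pi * 200 * t), fs)
silent = subband_voice_activity(np.zeros(n), fs)
```

A 200 Hz tone trips the detector in the lowest sub-band, while a silent frame does not.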
  • Publication number: 20130226578
    Abstract: Aspects of an asynchronous video interview system and related techniques include a server that receives a plurality of pre-recorded video prompts, generates an interview script, transmits a video prompt from the interview script to be displayed at a client computing device, and receives a streamed video response from the client computing device. The server can perform algorithmic analysis on content of the video response. In another aspect, a server obtains response preference data indicating a timing parameter for a response. In another aspect, a video prompt and an information supplement (e.g., a news item) that relates to the content of the video prompt are transmitted. In another aspect, a server automatically selects a video prompt (e.g., a follow-up question) to be displayed at the client computing device (e.g., based on a response or information about an interviewee).
    Type: Application
    Filed: February 22, 2013
    Publication date: August 29, 2013
    Applicant: COLLEGENET, INC.
    Inventor: CollegeNET, Inc.
  • Patent number: 8521527
    Abstract: A computer-implemented system and method for processing audio in a voice response environment is provided. A database of host scripts each comprising signature files of audio phrases and actions to take when one of the audio phrases is recognized is maintained. The host scripts are loaded and a call to a voice mail server is initiated. Incoming audio buffers are received during the call from voice messages stored on the voice mail server. The incoming audio buffers are processed. A signature data structure is created for each audio buffer. The signature data structure is compared with signatures of expected phrases in the host scripts. The actions stored in the host scripts are executed when the signature data structure matches the signature of the expected phrase.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 27, 2013
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
  • Patent number: 8521523
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting training data. In one aspect, a method comprises: selecting a target out of vocabulary rate; selecting a target percentage of user sessions; and determining a minimum training data freshness for a vocabulary of words, the minimum training data freshness corresponding to the target percentage of user sessions experiencing the target out of vocabulary rate.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 27, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
  • Patent number: 8521526
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing spoken query terms. In one aspect, a method includes performing speech recognition on an audio signal to select two or more textual, candidate transcriptions that match a spoken query term, and to establish a speech recognition confidence value for each candidate transcription, obtaining a search history for a user who spoke the spoken query term, where the search history references one or more past search queries that have been submitted by the user, generating one or more n-grams from each candidate transcription, where each n-gram is a subsequence of n phonemes, syllables, letters, characters, words or terms from a respective candidate transcription, and determining, for each n-gram, a frequency with which the n-gram occurs in the past search queries, and a weighting value that is based on the respective frequency.
    Type: Grant
    Filed: July 28, 2010
    Date of Patent: August 27, 2013
    Assignee: Google Inc.
    Inventors: Matthew I. Lloyd, Johan Schalkwyk, Pankaj Risbood
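The n-gram rescoring step can be sketched as follows; the flat 0.1 interpolation weight and the toy search history are invented, and the real method derives per-n-gram weighting values rather than adding a uniform bonus:

```python
from collections import Counter

def ngrams(text, n):
    """All word n-grams of a query, as tuples."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def rescore(candidates, history, n=1):
    """Boost each (transcription, ASR confidence) candidate by how often
    its n-grams occur in the user's past queries."""
    counts = Counter(g for q in history for g in ngrams(q, n))
    scored = []
    for text, asr_conf in candidates:
        grams = ngrams(text, n)
        freq = sum(counts[g] for g in grams) / max(len(grams), 1)
        scored.append((text, asr_conf + 0.1 * freq))  # weight is invented
    return max(scored, key=lambda s: s[1])[0]

history = ["pictures of new york", "new york weather"]
candidates = [("new york", 0.60), ("newark", 0.62)]
best = rescore(candidates, history)
```

Even though "newark" had the slightly higher acoustic confidence, the user's history pulls "new york" ahead.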
  • Patent number: 8515746
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting training data. In an aspect, a method comprises: selecting a target out of vocabulary rate; selecting a target percentage of user sessions; and determining a minimum training data collection duration for a vocabulary of words, the minimum training data collection duration corresponding to the target percentage of user sessions experiencing the target out of vocabulary rate.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
  • Patent number: 8515745
    Abstract: Methods, systems, and apparatus for selecting training data. In an aspect, a method comprises: obtaining search session data comprising search sessions that include search queries, wherein each search query comprises words; determining a threshold out of vocabulary rate indicating a rate at which a word in a search query is not included in a vocabulary; determining a threshold session out of vocabulary rate, the session out of vocabulary rate indicating a rate at which search sessions have an out of vocabulary rate that meets the threshold out of vocabulary rate; selecting a vocabulary of words that, for a set of test data, has a session out of vocabulary rate that meets the threshold session out of vocabulary rate, the vocabulary of words being selected from the one or more words included in each of the search queries included in the search sessions.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
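The session-level criterion can be sketched directly: a session's out-of-vocabulary (OOV) rate is the fraction of its query words missing from the vocabulary, and a vocabulary passes if the fraction of sessions exceeding a per-word OOV target stays under the session-level target. The thresholds below are arbitrary examples.

```python
def session_oov_rate(session, vocab):
    """Fraction of the session's query words that are not in the vocabulary."""
    words = [w for q in session for w in q.split()]
    return sum(w not in vocab for w in words) / len(words)

def vocab_meets_target(sessions, vocab, max_word_oov, max_session_rate):
    """True if at most max_session_rate of sessions exceed the per-word
    OOV target max_word_oov."""
    bad = sum(session_oov_rate(s, vocab) > max_word_oov for s in sessions)
    return bad / len(sessions) <= max_session_rate

sessions = [["cat videos", "dog videos"], ["zyzzyva facts"]]
vocab = {"cat", "dog", "videos", "facts"}
ok = vocab_meets_target(sessions, vocab, max_word_oov=0.1,
                        max_session_rate=0.5)
```

One of the two sessions exceeds the per-word OOV target, which is exactly the allowed session rate, so the vocabulary passes.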
  • Patent number: 8515752
    Abstract: A system provides search results from a voice search query. The system receives a voice search query from a user, derives one or more recognition hypotheses, each being associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses. The system then provides the weighted boolean query to a search system and provides the results of the search system to a user.
    Type: Grant
    Filed: March 12, 2008
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Alexander Mark Franz, Monika H. Henzinger, Sergey Brin, Brian Christopher Milch
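A toy rendering of the weighted boolean construction: each recognition hypothesis becomes an AND clause whose terms carry the hypothesis weight, and the clauses are ORed together. The `term^weight` syntax is invented for illustration; real search systems each have their own weighting notation.

```python
def weighted_boolean_query(hypotheses):
    """Build an OR-of-ANDs query from (transcription, weight) hypotheses,
    attaching each hypothesis's weight to its terms."""
    clauses = []
    for text, weight in hypotheses:
        terms = " AND ".join(f"{w}^{weight:.2f}" for w in text.split())
        clauses.append(f"({terms})")
    return " OR ".join(clauses)

query = weighted_boolean_query([("paris weather", 0.7),
                                ("pairs weather", 0.3)])
```

Both hypotheses survive into the query, so a downstream search system can let document evidence arbitrate between them.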
  • Patent number: 8510105
    Abstract: For an enhanced sequential compression of data vectors in a respective compression pass, a current data vector is mapped to at least one current code vector of at least one codebook in at least one quantization stage. The at least one codebook is reordered taking account of at least one intermediate result from the current compression pass and at least one intermediate result from a preceding compression pass. At least one codebook index that is associated in the at least one reordered codebook to the at least one current code vector is then provided for further use. For a decompression of compressed data vectors represented by such codebook indices, at least one codebook index is mapped to at least one code vector of at least one equally reordered codebook.
    Type: Grant
    Filed: October 21, 2005
    Date of Patent: August 13, 2013
    Assignee: Nokia Corporation
    Inventor: Jani K. Nurminen
  • Patent number: 8498864
    Abstract: Methods and systems for predicting a text are described. In an example, a computing device may be configured to receive one or more typed characters that compose a portion of a text; and receive, a voice input corresponding to a spoken utterance of at least a portion of the text. The computing device may be configured to determine, based on the one or more typed characters and the voice input, one or more candidate texts predicting the text. Further, the computing device may be configured to provide the one or more candidate texts.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: July 30, 2013
    Assignee: Google Inc.
    Inventors: Yu Liang, Xiaotao Duan
  • Patent number: 8494847
    Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputting the recognition result; a weighting factor updating section that updates a weighting factor applied to a score obtained from an acoustic model and a language model so that the difference between a correct-answer score calculated with the use of a correct-answer text of the learning audio data and a score of the recognition result becomes large; a convergence determination section that determines, with the use of the score after updating, whether to return to the weighting factor updating section to update the weighting factor again; and a weighting factor convergence determination section that determines, with the use of the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: July 23, 2013
    Assignee: NEC Corporation
    Inventors: Tadashi Emori, Yoshifumi Onishi
  • Patent number: 8489405
    Abstract: The embodiments of the present invention relate to a compression coding and decoding method, a coder, a decoder and a coding device. The compression coding method includes: extracting sign information of an input signal to obtain an absolute value signal of the input signal; obtaining a residual signal of the absolute value signal by using a prediction coefficient, where the prediction coefficient is obtained by prediction and analysis that are performed according to a signal characteristic of the absolute value signal of the input signal; and multiplexing the residual signal, the sign information and a coding parameter to output a coding code stream, after the residual signal, the sign information and the coding parameter are respectively coded, so as to improve compression efficiency of a voice and audio signal.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: July 16, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Fengyan Qi, Lei Miao, Qing Zhang
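The sign-extraction and residual steps can be sketched with a trivial "repeat the last magnitude" predictor standing in for the analysed prediction coefficient; the point is that the residual has smaller magnitudes than the input, which is what makes the subsequent coding cheaper.

```python
import numpy as np

def encode(x):
    """Split sign from magnitude, then predict each magnitude from the
    previous one and keep only the residual (toy first-order predictor)."""
    sign = np.sign(x).astype(int)
    mag = np.abs(x)
    pred = np.concatenate(([0], mag[:-1]))   # predict "same as last"
    residual = mag - pred
    return sign, residual

def decode(sign, residual):
    """Invert the predictor (cumulative sum) and restore the signs."""
    mag = np.cumsum(residual)
    return sign * mag

x = np.array([3, -4, 4, -5, 5])
sign, residual = encode(x)
rec = decode(sign, residual)
```

The round trip is lossless, and the residual's total magnitude (5) is far below the input's (21), illustrating the compression gain.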
  • Publication number: 20130173266
    Abstract: A voice analyzer includes a first voice acquisition unit provided in a place where a distance of a sound wave propagation path from a mouth of a user is a first distance, plural second voice acquisition units provided in places where distances of sound wave propagation paths from the mouth of the user are smaller than the first distance, and an identification unit that identifies whether the voices acquired by the first and second voice acquisition units are voices of the user or voices of others excluding the user on the basis of a result of comparison between first sound pressure of a voice signal of the voice acquired by the first voice acquisition unit and second sound pressure calculated from sound pressure of a voice signal of the voice acquired by each of the plural second voice acquisition units.
    Type: Application
    Filed: May 7, 2012
    Publication date: July 4, 2013
    Applicant: FUJI XEROX CO., LTD.
    Inventors: Yohei NISHINO, Haruo HARADA, Kei SHIMOTANI, Hirohito YONEYAMA, Kiyoshi IIDA, Akira FUJII
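The identification rule reduces to a sound-pressure comparison: the wearer's mouth is much closer to the near microphones than to the far one, so the wearer's own voice shows a large near/far pressure ratio, while a distant talker's voice reaches all microphones at roughly the same level. The 2.0 ratio threshold below is invented.

```python
def identify_speaker(p_far, p_near, ratio_threshold=2.0):
    """Classify a voice as the wearer's or another person's from the
    ratio of average near-microphone pressure to far-microphone pressure."""
    p_near_avg = sum(p_near) / len(p_near)
    return "wearer" if p_near_avg / p_far >= ratio_threshold else "other"

# Wearer: strong decay between near and far microphones.
own = identify_speaker(p_far=0.2, p_near=[0.9, 0.8])
# Distant talker: nearly equal pressure everywhere.
other = identify_speaker(p_far=0.5, p_near=[0.55, 0.5])
```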
  • Publication number: 20130166291
    Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
    Type: Application
    Filed: August 23, 2010
    Publication date: June 27, 2013
    Applicant: RMIT UNIVERSITY
    Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
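The feature-extraction recipe in the last sentence (threshold the spectral amplitudes, then take the area under what survives in each sub-band) can be sketched directly; the band count and the 10%-of-peak threshold are arbitrary choices of this sketch, not the patent's trained values.

```python
import numpy as np

def subband_area_features(signal, n_bands=4, threshold=0.1):
    """Zero spectral amplitudes below a relative threshold, then return
    the area (sum) of the surviving amplitudes in each sub-band."""
    amps = np.abs(np.fft.rfft(signal))
    amps[amps < threshold * amps.max()] = 0.0   # thresholding step
    bands = np.array_split(amps, n_bands)       # equal-width sub-bands
    return np.array([band.sum() for band in bands])

t = np.arange(256) / 8000.0
feats = subband_area_features(np.sin(2 * np.pi * 500 * t))
```

For a 500 Hz tone all surviving spectral area lands in the lowest sub-band, so the feature vector cleanly localizes the signal energy.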
  • Patent number: 8457959
    Abstract: New language constantly emerges from complex, collaborative human-human interactions like meetings—such as when a presenter handwrites a new term on a whiteboard while saying it redundantly. The system and method described includes devices for receiving various types of human communication activities (e.g., speech, writing and gestures) presented in a multimodally redundant manner, includes processors and recognizers for segmenting or parsing, and then recognizing selected sub-word units such as phonemes and syllables, and then includes alignment, refinement, and integration modules to find an exact or at least an approximate match to the one or more terms that were presented in the multimodally redundant manner. Once the system has performed a successful integration, one or more terms may be newly enrolled into a database of the system, which permits the system to continuously learn and provide an association for proper names, abbreviations, acronyms, symbols, and other forms of communicated language.
    Type: Grant
    Filed: February 29, 2008
    Date of Patent: June 4, 2013
    Inventor: Edward C. Kaiser
  • Publication number: 20130132082
    Abstract: Methods and systems for recognition of concurrent, superimposed, or otherwise overlapping signals are described. A Markov Selection Model is introduced that, together with probabilistic decomposition methods, enable recognition of simultaneously emitted signals from various sources. For example, a signal mixture may include overlapping speech from different persons. In some instances, recognition may be performed without the need to separate signals or sources. As such, some of the techniques described herein may be useful in automatic transcription, noise reduction, teaching, electronic games, audio search and retrieval, medical and scientific applications, etc.
    Type: Application
    Filed: February 21, 2011
    Publication date: May 23, 2013
    Inventor: Paris Smaragdis
  • Patent number: 8442827
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating an accent source. A system practicing the method collects data associated with customer specific services, generates country-specific or dialect-specific weights for each service in the customer specific services list, generates a summary weight based on an aggregation of the country-specific or dialect-specific weights, and sets an interactive voice response system language model based on the summary weight and the country-specific or dialect-specific weights. The interactive voice response system can also change the user interface based on the interactive voice response system language model. The interactive voice response system can tune a voice recognition algorithm based on the summary weight and the country-specific weights. The interactive voice response system can adjust phoneme matching in the language model based on a possibility that the speaker is using other languages.
    Type: Grant
    Filed: June 18, 2010
    Date of Patent: May 14, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Nicholas Duffield
  • Patent number: 8433582
    Abstract: A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. A high-band energy level corresponding to the input digital audio signal is estimated (103) based on a transition-band of the processed digital audio signal within a predetermined upper frequency range of a narrow-band bandwidth. A high-band digital audio signal is generated (104) based on the high-band energy level and an estimated high-band spectrum corresponding to the high-band energy level.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: April 30, 2013
    Assignee: Motorola Mobility LLC
    Inventors: Tenkasi V. Ramabadran, Mark A. Jasiuk
  • Patent number: 8433566
    Abstract: Video material is divided into temporal segments. Each segment is examined to determine whether the soundtrack of the segment contains speech sufficient for analysis and if so, metadata are generated based on analysis of the speech. If not, the segment is analysed by comparing frames thereof with those of stored segments that already have metadata assigned to them. One then assigns to the segment under consideration stored metadata associated with one or more stored segments that are similar.
    Type: Grant
    Filed: February 7, 2008
    Date of Patent: April 30, 2013
    Assignee: BRITISH TELECOMMUNICATIONS public limited company
    Inventors: Zhan Cui, Nader Azarmi, Gery M Ducatel
  • Patent number: 8433567
    Abstract: A method, system, and computer program product for compensation of intra-speaker variability in speaker diarization are provided. The method includes: dividing a speech session into segments of duration less than an average duration between speaker change; parameterizing each segment by a time dependent probability density function supervector, for example, using a Gaussian Mixture Model; computing a difference between successive segment supervectors; and computing a scatter measure such as a covariance matrix of the difference as an estimate of intra-speaker variability. The method further includes compensating the speech session for intra-speaker variability using the estimate of intra-speaker variability.
    Type: Grant
    Filed: April 8, 2010
    Date of Patent: April 30, 2013
    Assignee: International Business Machines Corporation
    Inventor: Hagai Aronowitz
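A compact sketch of the estimation steps, with plain random vectors standing in for GMM supervectors: because the segments are shorter than the average time between speaker changes, successive pairs usually share a speaker, so the covariance of successive differences estimates intra-speaker variability. The toy data below are invented.

```python
import numpy as np

def intra_speaker_covariance(supervectors):
    """Covariance of differences between successive segment supervectors,
    used as an estimate of intra-speaker variability."""
    diffs = np.diff(supervectors, axis=0)   # successive-segment differences
    return np.cov(diffs, rowvar=False)      # scatter measure of the diffs

rng = np.random.default_rng(0)
base = rng.normal(size=5)                      # one speaker's mean vector
segs = base + 0.1 * rng.normal(size=(20, 5))   # jitter = intra-speaker
cov = intra_speaker_covariance(segs)
```

The resulting matrix captures only the small within-speaker jitter, not the speaker's mean, which is what the compensation step needs.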
  • Patent number: 8433575
    Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing for example a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: April 30, 2013
    Assignee: AMBX UK Limited
    Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
  • Publication number: 20130103402
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: October 25, 2011
    Publication date: April 25, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
  • Publication number: 20130090925
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for improving automatic speech recognition performance. A system practicing the method identifies idle speech recognition resources and establishes a supplemental speech recognizer on the idle resources based on overall speech recognition demand. The supplemental speech recognizer can differ from a main speech recognizer, and, along with the main speech recognizer, can be associated with a particular speaker. The system performs speech recognition on speech received from the particular speaker in parallel with the main speech recognizer and the supplemental speech recognizer and combines results from the main and supplemental speech recognizer. The system recognizes the received speech based on the combined results. The system can use beam adjustment in place of or in combination with a supplemental speech recognizer.
    Type: Application
    Filed: November 30, 2012
    Publication date: April 11, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
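A minimal sketch of one way to combine results from parallel main and supplemental recognizers; selecting the higher-confidence hypothesis is an invented combination rule for illustration, not the method claimed above.

```python
# Illustrative only: combine main and supplemental recognizer outputs by
# keeping the hypothesis with the higher confidence score.

def combine_results(main_result, supplemental_result):
    """Each result is a (transcript, confidence) pair; pick the more confident."""
    return max([main_result, supplemental_result], key=lambda r: r[1])

main = ("recognize speech", 0.72)
supplemental = ("wreck a nice beach", 0.41)
print(combine_results(main, supplemental))  # ('recognize speech', 0.72)
```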
  • Patent number: 8417527
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: April 9, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
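The vocabulary-reduction idea above can be sketched as filtering a pronunciation dictionary down to a speaker's attributed style; the words, ARPAbet-style pronunciations, and style labels below are invented for the example.

```python
# Hedged sketch: reduce a phonetic vocabulary to the variants matching a
# speaker's observed pronunciation style (all entries here are invented).

PHONETIC_VOCAB = {
    "tomato": {"US": "T AH M EY T OW", "UK": "T AH M AA T OW"},
    "either": {"US": "IY DH ER", "UK": "AY DH ER"},
}

def adapt_vocabulary(vocab, speaker_style):
    """Keep only the pronunciation variant matching the speaker's style."""
    return {word: {speaker_style: variants[speaker_style]}
            for word, variants in vocab.items()
            if speaker_style in variants}

reduced = adapt_vocabulary(PHONETIC_VOCAB, "UK")
print(reduced["tomato"])  # {'UK': 'T AH M AA T OW'}
```

The smaller search space is what the abstract credits for the accuracy and speed gains.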
  • Patent number: 8412527
    Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
    Type: Grant
    Filed: June 24, 2009
    Date of Patent: April 2, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: I. Dan Melamed, Yeon-Jun Kim, Andrej Ljolje, Bernard S. Renger, David J. Smith
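A toy sketch of the phrase-detection-and-rating flow described above; the phrase lists, penalty weights, and rating scale are all invented for illustration.

```python
# Simplified sketch: detect pre-determined event and precursor-event phrases
# in a communication and rate the recipient on their presence.

EVENT_PHRASES = {"cancel my account", "speak to a supervisor"}
PRECURSOR_PHRASES = {"i am not happy", "this is the third time"}

def rate_recipient(transcript):
    text = transcript.lower()
    events = [p for p in EVENT_PHRASES if p in text]
    precursors = [p for p in PRECURSOR_PHRASES if p in text]
    # Toy rating: start from 1.0 and penalize each detected phrase.
    rating = 1.0 - 0.3 * len(events) - 0.1 * len(precursors)
    return max(rating, 0.0), events, precursors

rating, events, precursors = rate_recipient(
    "This is the third time I called; I want to cancel my account.")
print(round(rating, 1))  # 0.6
```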
  • Publication number: 20130080166
    Abstract: A system for biometrically securing business transactions uses speech recognition and voiceprint authentication to biometrically secure a transaction from a variety of client devices in a variety of media. A voiceprint authentication server receives a request from a third party requestor to authenticate a previously enrolled end user of a client device. A signature collection applet presents the user with a randomly generated signature string, prompts the user to speak the string, and records the user's voice as the string is spoken. After transmittal to the authentication server, the signature string is recognized using voice recognition software and compared with a stored voiceprint using voiceprint authentication software. An authentication result is reported to both user and requestor. Voiceprints are stored in a repository along with the associated user data. Enrollment is by way of a separate enrollment applet, wherein the end user provides user information and records a voiceprint, which is subsequently stored.
    Type: Application
    Filed: November 19, 2012
    Publication date: March 28, 2013
    Applicant: EMC Corporation
  • Publication number: 20130080165
    Abstract: Online histogram equalization may be provided. Upon receiving a spoken phrase from a user, a histogram/frequency distribution may be estimated on the spoken phrase according to a prior distribution. The histogram distribution may be equalized and then provided to a spoken language understanding application.
    Type: Application
    Filed: September 24, 2011
    Publication date: March 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Shizen Wang, Yifan Gong
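The equalization step above can be sketched with standard rank-based histogram equalization: map each observed feature value to the matching quantile of a reference (prior) distribution. The mapping below is a generic textbook form, not the specific estimator claimed in the application.

```python
# Rough sketch of histogram equalization against a reference distribution.
# Assumes distinct values; a real implementation would handle ties and
# interpolate between reference quantiles.

def equalize(values, reference):
    """Map each value to the reference-distribution quantile of its rank."""
    order = sorted(values)
    ref = sorted(reference)
    def mapped(v):
        # Empirical CDF position of v among `values`, then the matching
        # quantile in `reference`.
        rank = order.index(v) / (len(order) - 1)
        return ref[round(rank * (len(ref) - 1))]
    return [mapped(v) for v in values]

observed = [5.0, 9.0, 1.0]
reference = [0.0, 0.5, 1.0]
print(equalize(observed, reference))  # [0.5, 1.0, 0.0]
```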
  • Patent number: 8406382
    Abstract: A method includes registering a voice of a party in order to provide voice verification for communications with an entity. A call is received from a party at a voice response system. The party is prompted for information and verbal communication spoken by the party is captured. A voice model associated with the party is created by processing the captured verbal communication spoken by the party and is stored. The identity of the party is verified and a previously stored voice model of the party, registered during a previous call from the party, is updated. The creation of the voice model is imperceptible to the party.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: March 26, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Mazin Gilbert
  • Patent number: 8406525
    Abstract: A method is disclosed for recognition of high-dimensional data in the presence of occlusion, including: receiving a target data that includes an occlusion and is of an unknown class, wherein the target data includes a known object; sampling a plurality of training data files comprising a plurality of distinct classes of the same object as that of the target data; and identifying the class of the target data through linear superposition of the sampled training data files using l1 minimization, wherein a linear superposition with a sparsest number of coefficients is used to identify the class of the target data.
    Type: Grant
    Filed: January 29, 2009
    Date of Patent: March 26, 2013
    Assignees: The Regents of the University of California, The Board of Trustees of the University of Illinois
    Inventors: Yi Ma, Allen Yang Yang, John Norbert Wright, Andrew William Wagner
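A greatly simplified stand-in for the classification step above: instead of an actual l1-minimized linear superposition over the training dictionary, the sketch scores each class by its smallest residual to the target and picks the class with the best fit. It ignores sparsity and true occlusion handling; the data is invented.

```python
# Simplified per-class residual classification (NOT the patent's l1 method).

def residual(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify(target, training):
    """training maps class label -> list of sample vectors; return the class
    whose samples leave the smallest residual to the target."""
    return min(training,
               key=lambda c: min(residual(target, s) for s in training[c]))

training = {
    "A": [[1.0, 1.0, 1.0], [1.0, 0.9, 1.1]],
    "B": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.1]],
}
target = [0.9, 1.0, 0.2]   # last coordinate corrupted, as if occluded
print(classify(target, training))  # A
```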
  • Patent number: 8401850
    Abstract: Methods and systems for handling speech recognition processing in effectively real time over the Internet, so that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the Internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real time. The server evaluates the user speech in the context of the current speech recognition exercise being executed. The server receives a first value and a first packet of encoded speech from a first client, a second value and a second packet of encoded speech from a second client, and services the first and second packets using first and second levels of processing based on the first and second values.
    Type: Grant
    Filed: January 10, 2011
    Date of Patent: March 19, 2013
    Assignee: GlobalEnglish Corporation
    Inventor: Christopher S. Jochumson
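The value-based servicing described above can be sketched as a priority queue over incoming packets; the server class, value semantics, and FIFO tie-breaking are assumptions for the example.

```python
import heapq

# Sketch (invented details): packets of encoded speech arrive with a value
# indicating their processing priority; higher-value packets are serviced first.

class SpeechServer:
    def __init__(self):
        self._queue = []
        self._counter = 0  # tie-breaker to keep FIFO order at equal value

    def receive(self, value, packet):
        # Negate the value so higher values pop first from the min-heap.
        heapq.heappush(self._queue, (-value, self._counter, packet))
        self._counter += 1

    def service_next(self):
        _, _, packet = heapq.heappop(self._queue)
        return packet

server = SpeechServer()
server.receive(1, "client-1 speech")   # lower processing level
server.receive(2, "client-2 speech")   # higher processing level
print(server.service_next())  # client-2 speech
```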
  • Patent number: 8396712
    Abstract: A method and system for generating a finite state grammar is provided. The method comprises receiving user input of at least two sample phrases; analyzing the sample phrases to determine common words that occur in each of the sample phrases and optional words that occur in only some of the sample phrases; creating a mathematical expression representing the sample phrases, the expression including each word found in the sample phrases and an indication of whether a word is a common word or an optional word; displaying the mathematical expression to a user; allowing the user to alter the mathematical expression; generating a finite state grammar corresponding to the altered mathematical expression; and displaying the finite state grammar to the user.
    Type: Grant
    Filed: August 26, 2004
    Date of Patent: March 12, 2013
    Assignee: West Corporation
    Inventor: Ashok Mitter Khosla
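The common-versus-optional analysis above can be sketched directly; the expression syntax (optional words wrapped as `(word)?`) is an assumption, and this toy version does not preserve the optional words' positions within each phrase.

```python
# Sketch: derive common vs. optional words from sample phrases and emit a
# simple expression (syntax invented for illustration).

def build_expression(phrases):
    tokenized = [p.lower().split() for p in phrases]
    # Words present in every sample phrase are "common"; the rest are optional.
    common = set(tokenized[0]).intersection(*tokenized[1:])
    seen, parts = set(), []
    for sentence in tokenized:
        for word in sentence:
            if word in seen:
                continue
            seen.add(word)
            parts.append(word if word in common else f"({word})?")
    return " ".join(parts)

samples = ["call the office", "please call the main office"]
print(build_expression(samples))  # call the office (please)? (main)?
```

In the claimed method the resulting expression is shown to the user for editing before the finite state grammar is generated from it.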
  • Patent number: 8390669
    Abstract: The present disclosure describes a method for identifying individuals in a multimedia stream originating from a video conferencing terminal or a Multipoint Control Unit, including executing a face detection process on the multimedia stream; defining subsets including facial images of one or more individuals, where the subsets are ranked according to the probability that their respective individuals will appear in a video stream; comparing a detected face to the subsets in consecutive order starting with the most probable subset, until a match is found; and storing an identity of the detected face as searchable metadata in a content database in response to the detected face matching a facial image in one of the subsets.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: March 5, 2013
    Assignee: Cisco Technology, Inc.
    Inventors: Jason Catchpole, Craig Cockerton
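The ranked-subset search above can be sketched as iterating over subsets in probability order and stopping at the first sufficiently close face; the feature vectors, distance measure, and threshold are invented for the example.

```python
# Illustrative sketch: compare a detected face against subsets ranked by the
# probability their members appear, stopping at the first match.

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def identify(face, ranked_subsets, threshold=0.5):
    """ranked_subsets: list of (identity -> feature vector) dicts,
    most probable subset first."""
    for subset in ranked_subsets:
        for identity, reference in subset.items():
            if distance(face, reference) < threshold:
                return identity
    return None

frequent = {"alice": [0.1, 0.2]}   # people who often appear on this terminal
rare = {"bob": [0.9, 0.8]}         # everyone else
print(identify([0.88, 0.79], [frequent, rare]))  # bob
```

Checking the most probable subset first means the common case (a regular participant) terminates the search early.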
  • Patent number: 8392185
    Abstract: The speech recognition system of the present invention includes: a sound source separating section which separates mixed speeches from multiple sound sources; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each separated speech according to reliability of separation in separating operation of the sound source separating section; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Grant
    Filed: August 19, 2009
    Date of Patent: March 5, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
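A minimal sketch of the soft-mask idea above: map a per-element separation-reliability score to a continuous value in [0, 1]. The logistic mapping is an assumption for illustration; the abstract does not specify the mask-generating function.

```python
import math

# Sketch: convert separation-reliability scores into a soft mask in [0, 1].

def soft_mask(reliabilities, sharpness=4.0):
    """Higher reliability -> mask value closer to 1 (logistic mapping)."""
    return [1.0 / (1.0 + math.exp(-sharpness * (r - 0.5)))
            for r in reliabilities]

mask = soft_mask([0.0, 0.5, 1.0])
print([round(m, 2) for m in mask])  # [0.12, 0.5, 0.88]
```

Unlike a binary mask, unreliable elements are attenuated rather than discarded outright, which is the property the abstract emphasizes.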
  • Patent number: 8392193
    Abstract: A method for performing speech recognition includes receiving a voice input and generating at least one possible result corresponding to the voice input. The method may also include calculating a value for the speech recognition result and comparing the calculated value to a particular portion of the speech recognition result. The method may further include retrieving information based on one or more factors associated with the voice input and using the retrieved information to determine a likelihood that the speech recognition result is correct.
    Type: Grant
    Filed: June 1, 2004
    Date of Patent: March 5, 2013
    Assignee: Verizon Business Global LLC
    Inventors: Paul T. Schultz, Robert A. Sartini
  • Patent number: 8392187
    Abstract: Methods, speech recognition systems, and computer readable media are provided that recognize speech using dynamic pruning techniques. A search network is expanded based on a frame from a speech signal, a best hypothesis is determined in the search network, a default beam threshold is modified, and the search network is pruned using the modified beam threshold. The search network may be further pruned based on the search depth of the best hypothesis and/or the average number of frames per state for a search path.
    Type: Grant
    Filed: January 30, 2009
    Date of Patent: March 5, 2013
    Assignee: Texas Instruments Incorporated
    Inventor: Qifeng Zhu
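The core beam-pruning step above can be sketched in a few lines: keep only hypotheses whose score falls within the beam of the best hypothesis. The hypothesis set, log scores, and beam width are invented; the depth- and frames-per-state refinements from the abstract are omitted.

```python
# Sketch of beam pruning over a set of scored hypotheses.

def prune(hypotheses, beam):
    """hypotheses: dict of hypothesis -> log score (higher is better).
    Keep those within `beam` of the best score."""
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= best - beam}

hyps = {"recognize speech": -10.0,
        "wreck a nice beach": -14.5,
        "rec nice": -30.0}
print(sorted(prune(hyps, beam=5.0)))  # ['recognize speech', 'wreck a nice beach']
```

Dynamically modifying the beam, as the abstract describes, amounts to recomputing `beam` each frame before this pruning step runs.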
  • Patent number: 8392189
    Abstract: A speech recognition module includes an acoustic front-end module, a sound detection module, and a word detection module. The acoustic front-end module generates a plurality of representations of frames from a digital audio signal and generates speech characteristic probabilities for the plurality of frames. The sound detection module determines a plurality of estimated utterances from the plurality of representations and the speech characteristic probabilities. The word detection module determines one or more words based on the plurality of estimated utterances and the speech characteristic probabilities.
    Type: Grant
    Filed: November 30, 2009
    Date of Patent: March 5, 2013
    Assignee: Broadcom Corporation
    Inventor: Nambirajan Seshadri
  • Patent number: 8380501
    Abstract: A system, method, and computer-readable medium for parcel address recognition. A method includes receiving an address input and producing candidate address results corresponding to the address input. The method includes receiving operational scheme knowledge describing the mode of operation of a parcel processing system, and receiving at least one operational rule corresponding to the operational scheme knowledge. The method includes applying the at least one operational rule to the candidate address results and producing and storing a finalized result according to the operational rule and the candidate address results.
    Type: Grant
    Filed: July 30, 2010
    Date of Patent: February 19, 2013
    Assignee: Siemens Industry, Inc.
    Inventor: Stanley W. Sipe
  • Patent number: 8380502
    Abstract: A system receives a voice search query from a user, derives recognition hypotheses from the voice search query, and determines scores associated with the recognition hypotheses, the scores being based on a comparison of the recognition hypotheses to previously received search queries. The system discards at least one of the recognition hypotheses that is associated with a first score that is less than a threshold value, and constructs a first query using at least one non-discarded recognition hypothesis, where the at least one non-discarded recognition hypothesis is associated with a second score that at least meets the threshold value. The system forwards the first query to a search system, receives first results associated with the first query, and provides the first results to the user.
    Type: Grant
    Filed: October 14, 2011
    Date of Patent: February 19, 2013
    Assignee: Google Inc.
    Inventors: Alexander Mark Franz, Monika H. Henzinger, Sergey Brin, Brian Christopher Milch
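The thresholding flow above can be sketched as scoring each hypothesis against previously received queries, discarding those below the threshold, and building the search query from a survivor; the frequency-based scoring rule and the query data are invented for illustration.

```python
# Simplified sketch: score hypotheses against past queries, discard low
# scorers, and build a query from the remainder.

PAST_QUERIES = ["weather today", "weather tomorrow", "whether or not"]

def score(hypothesis):
    """Toy score: fraction of past queries containing the first word."""
    word = hypothesis.split()[0]
    return sum(word in q.split() for q in PAST_QUERIES) / len(PAST_QUERIES)

def build_query(hypotheses, threshold=0.5):
    kept = [h for h in hypotheses if score(h) >= threshold]
    return kept[0] if kept else None

hyps = ["weather today", "whether today"]
print(build_query(hyps))  # weather today
```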
  • Patent number: 8370132
    Abstract: Apparatus and methods are provided for measuring perceptual quality of a signal transmitted over a communication network, such as a circuit-switching network, packet-switching network, or a combination thereof. In accordance with one embodiment, a distributed apparatus is provided for measuring perceptual quality of a signal transmitted over a communication network. The distributed apparatus includes communication ports located at various locations in the network. The distributed apparatus may also include a signal processor including a processor for providing non-intrusive measurement of the perceptual quality of the signal. The distributed apparatus may further include recorders operatively connected to the communication ports and to the signal processor, wherein at least one of the recorders processes the signal at one of the communication ports and the recorder sends the signal to the signal processor to measure the perceptual quality of the signal.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: February 5, 2013
    Assignee: Verizon Services Corp.
    Inventor: Adrian E. Conway
  • Publication number: 20130030808
    Abstract: Systems and methods are provided for scoring non-native speech. Two or more speech samples are received, where each of the samples is of speech spoken by a non-native speaker, and where each of the samples is spoken in response to a distinct prompt. The two or more samples are concatenated to generate a concatenated response for the non-native speaker, where the concatenated response is based on the two or more speech samples that were elicited using the distinct prompts. A concatenated speech proficiency metric is computed based on the concatenated response, and the concatenated speech proficiency metric is provided to a scoring model, where the scoring model generates a speaking score based on the concatenated speech proficiency metric.
    Type: Application
    Filed: July 24, 2012
    Publication date: January 31, 2013
    Inventors: Klaus Zechner, Su-Youn Yoon, Lei Chen, Shasha Xie, Xiaoming Xi, Chaitanya Ramineni
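The concatenation step above can be sketched directly; the speaking-rate metric used below (words per second) is an invented placeholder for the claimed concatenated speech proficiency metric, and the sample data is made up.

```python
# Sketch: concatenate responses to distinct prompts, then compute a single
# proficiency metric over the concatenated response.

def concatenate(samples):
    """samples: list of (transcript, duration_seconds) pairs."""
    transcript = " ".join(t for t, _ in samples)
    duration = sum(d for _, d in samples)
    return transcript, duration

def proficiency_metric(samples):
    transcript, duration = concatenate(samples)
    return len(transcript.split()) / duration  # placeholder: words per second

samples = [("I like reading", 2.0), ("my favorite city is Paris", 3.0)]
print(proficiency_metric(samples))  # 1.6
```

Computing the metric after concatenation, rather than averaging per-sample metrics, is the point the abstract stresses.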