Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 7925505
    Abstract: Architecture is disclosed herein for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application. The architecture allows assignment of an improved weighting value to each term or phrase to reduce empirical error. Empirical errors are minimized whether or not a user provides correction results, based on criteria for discriminatively adapting the user language model (LM)/context-free grammar (CFG) to the target. Moreover, algorithms are provided for the training and adaptation processes of LM/CFG parameters for criteria optimization.
    Type: Grant
    Filed: April 10, 2007
    Date of Patent: April 12, 2011
    Assignee: Microsoft Corporation
    Inventor: Jian Wu
  • Publication number: 20110082695
    Abstract: An electronic device includes a call analysis module that is configured to analyze characteristics of a phone call and to generate an indicium that represents a prevailing mood associated with the phone call based on the analyzed characteristics.
    Type: Application
    Filed: October 2, 2009
    Publication date: April 7, 2011
    Inventor: Emil Morgan Billing Bengt
  • Patent number: 7917361
    Abstract: A method for training a spoken language identification system to identify an unknown language as one of a plurality of known candidate languages includes the process of creating a sound inventory comprising a plurality of sound tokens, the collective plurality of sound tokens provided from a subset of the known candidate languages. The method further includes providing a plurality of training samples, each training sample composed in one of the known candidate languages. Further included is the process of generating one or more training vectors from each training sample, wherein each training vector is defined as a function of said plurality of sound tokens provided from said subset of the known candidate languages. The method further includes associating each training vector with the candidate language of the corresponding training sample.
    Type: Grant
    Filed: September 19, 2005
    Date of Patent: March 29, 2011
    Assignee: Agency for Science, Technology and Research
    Inventors: Haizhou Li, Bin Ma, George M. White
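The vector construction in patent 7917361 above lends itself to a compact illustration: each training sample is tokenized against a shared sound inventory and mapped to a normalized token-count vector tagged with its candidate language. The sketch below is an assumption-laden reading of the abstract; the inventory, tokenizer, and normalization are illustrative, not taken from the patent.

```python
# Illustrative sketch: bag-of-sound-tokens training vectors (names and
# inventory are assumptions, not from the patent).
from collections import Counter

SOUND_INVENTORY = ["aa", "iy", "k", "t", "sh", "n"]  # tokens drawn from a language subset

def training_vector(sound_tokens):
    """Map a token sequence to a fixed-length, normalized count vector."""
    counts = Counter(sound_tokens)
    total = sum(counts[t] for t in SOUND_INVENTORY) or 1
    return [counts[t] / total for t in SOUND_INVENTORY]

def build_training_set(samples):
    """samples: list of (candidate_language, sound_token_sequence) pairs."""
    return [(lang, training_vector(tokens)) for lang, tokens in samples]

# Each training vector stays associated with its candidate language.
train = build_training_set([
    ("en", ["k", "aa", "t", "t", "iy"]),
    ("zh", ["sh", "iy", "n", "n", "aa"]),
])
```

A language identifier would then score an unknown utterance's vector against these labelled training vectors.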
  • Patent number: 7912713
    Abstract: An automatic speech recognition method for identifying words from an input speech signal includes providing at least one hypothesis recognition based on the input speech signal, the hypothesis recognition being an individual hypothesis word or a sequence of individual hypothesis words, and computing a confidence measure for the hypothesis recognition based on the input speech signal. Computing the confidence measure includes computing differential contributions to the confidence measure, each as a difference between a constrained acoustic score and an unconstrained acoustic score; weighting each differential contribution by applying to it a cumulative distribution function of the differential contribution, so as to make the distributions of the confidence measures homogeneous in terms of rejection capability as the language, vocabulary and grammar vary; and computing the confidence measure by averaging the weighted differential contributions.
    Type: Grant
    Filed: December 28, 2004
    Date of Patent: March 22, 2011
    Assignee: Loquendo S.p.A.
    Inventors: Claudio Vair, Daniele Colibro
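One reading of the confidence measure in patent 7912713 above can be sketched directly: each differential contribution is a constrained acoustic score minus an unconstrained one, a cumulative distribution function maps each contribution onto [0, 1] so the measure stays comparable across languages, vocabularies and grammars, and the mapped values are averaged. The empirical CDF and the choice to average the CDF-mapped values are assumptions; the abstract does not fix either.

```python
# Sketch: differential contributions weighted by an empirical CDF and
# averaged (the CDF form and the averaging of CDF-mapped values are
# assumptions drawn from the abstract).
import bisect

def make_empirical_cdf(reference_differences):
    """Build a CDF from constrained-minus-unconstrained score differences
    observed on held-out data."""
    ref = sorted(reference_differences)
    def cdf(x):
        return bisect.bisect_right(ref, x) / len(ref)
    return cdf

def confidence_measure(constrained_scores, unconstrained_scores, cdf):
    diffs = [c - u for c, u in zip(constrained_scores, unconstrained_scores)]
    weighted = [cdf(d) for d in diffs]   # each contribution mapped onto [0, 1]
    return sum(weighted) / len(weighted)
```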
  • Publication number: 20110064302
    Abstract: A method is disclosed for recognition of high-dimensional data in the presence of occlusion, including: receiving target data that includes an occlusion and is of an unknown class, wherein the target data includes a known object; sampling a plurality of training data files comprising a plurality of distinct classes of the same object as that of the target data; and identifying the class of the target data through linear superposition of the sampled training data files using l1 minimization, wherein the linear superposition with the sparsest number of coefficients is used to identify the class of the target data.
    Type: Application
    Filed: January 29, 2009
    Publication date: March 17, 2011
    Inventors: Yi Ma, Allen Yang Yang, John Norbert Wright, Andrew William Wagner
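Publication 20110064302 above describes the sparse-representation classification recipe, which can be sketched with a generic l1 solver: solve min ||x||_1 subject to Ax = y (cast here as a linear program), then assign the class whose training columns reconstruct y with the smallest residual. The occlusion-handling extension (augmenting A with an error basis) is omitted, and the solver choice is an assumption.

```python
# Sketch of sparse-representation classification: l1-minimal superposition
# of training columns, class chosen by smallest reconstruction residual.
# The LP formulation and solver are assumptions.
import numpy as np
from scipy.optimize import linprog

def l1_superposition(A, y):
    """Solve min ||x||_1 s.t. A x = y by splitting x into x+ and x-."""
    n = A.shape[1]
    c = np.ones(2 * n)                 # objective: sum of |x| components
    A_eq = np.hstack([A, -A])          # A (x+ - x-) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:n] - res.x[n:]

def classify(A, labels, y):
    """labels[i] is the class of training column A[:, i]."""
    x = l1_superposition(A, y)
    residuals = {}
    for cls in set(labels):
        keep = np.array([lab == cls for lab in labels])
        residuals[cls] = np.linalg.norm(y - A @ np.where(keep, x, 0.0))
    return min(residuals, key=residuals.get)
```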
  • Publication number: 20110066433
    Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model, and selecting a quantity of speaker dependent models, the quantity of speaker dependent models being based on available computing resources, the selected models including the speaker independent model and the quantity of speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy using the group of selected models. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
    Type: Application
    Filed: September 16, 2009
    Publication date: March 17, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Andrej LJOLJE, Diamantino Antonio CASEIRO, Alistair D. CONKIE
  • Patent number: 7904294
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that are based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: April 9, 2007
    Date of Patent: March 8, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C. Rose, Sarangarajan Parthasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Publication number: 20110046951
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a production dialog system, together with log data recording the association between each utterance and the state of the system at the moment the utterance was recorded. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Application
    Filed: August 21, 2009
    Publication date: February 24, 2011
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Patent number: 7894849
    Abstract: Methods, systems, and apparatus, including computer program products, for generating feedback. In one aspect, a method includes receiving sensor data from a plurality of sensors, wherein at least one of the plurality of sensors is associated with a mobile device of a user; aggregating the received sensor data to generate aggregated sensor data; processing the aggregated sensor data to determine an aggregated metric; comparing the aggregated metric to a target associated with the user to determine a measure of performance; and generating feedback based on the determined measure of performance. Further, the mobile device can comprise a mobile personal services device that includes one or more of an audio sensor, a video sensor, an environmental sensor, a biometric sensor, a location sensor, an activity detector, and a health monitor. The feedback can be displayed on the mobile personal services device. The feedback also can be displayed in near real-time.
    Type: Grant
    Filed: July 9, 2007
    Date of Patent: February 22, 2011
    Assignee: Accenture Global Services Limited
    Inventors: Alex M. Kass, Lucian P. Hughes, Owen E. Richter, Dana Le, Daniel Farina
  • Patent number: 7865364
    Abstract: A method for improving speech recognition accuracy includes utilizing skiplists, or lists of values that cannot occur because of improbability or impossibility. A table or list is stored in a dialog manager module. The table includes a plurality of information items and a corresponding list of improbable values for each of the plurality of information items. A plurality of recognized ordered interpretations is received from an automatic speech recognition (ASR) engine. Each of the plurality of recognized ordered interpretations includes a number of information items. A value of one or more of the received information items for a first recognized ordered interpretation is compared to the table to determine if the value of the one of the received information items matches any of the list of improbable values for the corresponding information item.
    Type: Grant
    Filed: May 5, 2006
    Date of Patent: January 4, 2011
    Assignee: Nuance Communications, Inc.
    Inventor: Marc Helbing
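The skiplist mechanism in patent 7865364 above reduces to a small lookup-and-filter step, sketched below with hypothetical field names: interpretations whose values appear in the improbable-value table are skipped in favour of the next-best interpretation.

```python
# Sketch: drop interpretations whose field values appear in the
# improbable-value table (field names are hypothetical).
SKIPLIST = {
    "departure_city": {"Atlantis"},
    "passenger_count": {"0", "1000"},
}

def first_plausible(ordered_interpretations):
    """ordered_interpretations: list of dicts from the ASR engine, best first."""
    for interp in ordered_interpretations:
        hits_skiplist = any(
            value in SKIPLIST.get(item, set())
            for item, value in interp.items()
        )
        if not hits_skiplist:
            return interp
    return None  # every interpretation matched an improbable value
```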
  • Publication number: 20100332228
    Abstract: According to some embodiments, a method and apparatus are provided to buffer N audio frames of a plurality of audio frames associated with an audio signal, pre-compute scores for a subset of context dependent models (CDMs), and perform a graphical model search associated with the N audio frames where a score of a context independent model (CIM) associated with a CDM is used in lieu of a score for the CDM when a score for the CDM is needed and has not been pre-computed.
    Type: Application
    Filed: June 25, 2009
    Publication date: December 30, 2010
    Inventors: Michael Eugene Deisher, Tao Ma
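The scoring shortcut in publication 20100332228 above can be shown in a few lines: scores for a subset of context dependent models (CDMs) are pre-computed for the buffered frames, and when the graphical-model search needs a CDM score that is not in that subset, the score of the corresponding context independent model (CIM) is substituted. The helper callables below are placeholders.

```python
# Sketch: use the pre-computed CDM score when available, otherwise fall
# back to the parent CIM's score (all helpers are placeholders).
def model_score(frame_index, cdm_id, precomputed, cim_score, cim_of):
    """precomputed: dict {(frame_index, cdm_id): score} for a CDM subset.
    cim_of(cdm_id) returns the context independent model behind the CDM."""
    key = (frame_index, cdm_id)
    if key in precomputed:
        return precomputed[key]
    # Score was not pre-computed: substitute the CIM score for this frame.
    return cim_score(frame_index, cim_of(cdm_id))
```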
  • Publication number: 20100332227
    Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
    Type: Application
    Filed: June 24, 2009
    Publication date: December 30, 2010
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: I. Dan MELAMED, Yeon-Jun KIM, Andrej LJOLJE, Bernard S. RENGER, David J. SMITH
  • Patent number: 7860713
    Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with a type of annotation to be performed for the utterances and an order in which the annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
    Type: Grant
    Filed: July 1, 2008
    Date of Patent: December 28, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
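A minimal sketch of the selection step in patent 7860713 above, under the assumption that low model confidence is the selection criterion: utterances whose recognition or understanding confidence falls below a threshold are queued for annotation, together with the type of annotation required, lowest confidence first.

```python
# Sketch: queue low-confidence utterances for annotation, lowest first,
# together with the type of annotation to perform (thresholds and field
# names are assumptions).
def build_annotation_list(utterances, asr_threshold=0.5, slu_threshold=0.5):
    """utterances: list of dicts with 'id', 'asr_conf' and 'slu_conf' keys."""
    queue = []
    for utt in utterances:
        if utt["asr_conf"] < asr_threshold:
            queue.append((utt["asr_conf"], utt["id"], "transcription"))
        if utt["slu_conf"] < slu_threshold:
            queue.append((utt["slu_conf"], utt["id"], "semantic_label"))
    queue.sort(key=lambda item: item[0])        # least confident first
    return [(utt_id, task) for _, utt_id, task in queue]
```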
  • Patent number: 7853448
    Abstract: An electronic instrument includes: a display control unit for displaying a control content corresponding to command information based on the result of speech recognition; an instruction unit for instructing that the control for the control content displayed by the display control unit be cancelled; and a control unit. The control unit performs the control based on the command information once a predetermined standby time has elapsed since the control content started to be displayed by the display control unit, provided the instruction unit has not instructed cancellation of the control within that standby time, and cancels the control based on the command information when the instruction unit instructs cancellation within the predetermined standby time.
    Type: Grant
    Filed: April 16, 2007
    Date of Patent: December 14, 2010
    Assignee: Funai Electric Co., Ltd.
    Inventors: Shusuke Narita, Susumu Tokoshima
  • Patent number: 7831422
    Abstract: Methods and systems for handling speech recognition processing in effectively real-time, via the Internet, in order that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the Internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech in the context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again, in approximate real-time, with minimal latency. The client, upon receiving responsive feedback from the server, displays, or otherwise provides, the feedback to the user.
    Type: Grant
    Filed: October 26, 2007
    Date of Patent: November 9, 2010
    Assignee: GlobalEnglish Corporation
    Inventor: Christopher S. Jochumson
  • Publication number: 20100280827
    Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
    Type: Application
    Filed: April 30, 2009
    Publication date: November 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
  • Publication number: 20100268535
    Abstract: The problem to be solved is to robustly detect pronunciation variation examples and to acquire pronunciation variation rules having a high generalization property, with little effort. The problem can be solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing the base form pronunciation of the speech data. The sub word language model generation unit generates a sub word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub word language model.
    Type: Application
    Filed: November 27, 2008
    Publication date: October 21, 2010
    Inventor: Takafumi Koshinaka
  • Patent number: 7818172
    Abstract: The method of recognizing speech in an acoustic signal comprises developing acoustic stochastic models of voice units in the form of a set of states of an acoustic signal and using the acoustic models for recognition by a comparison of the signal with predetermined acoustic models obtained via a prior learning process. While developing the acoustic models, the voice units are modeled by means of a first portion of the states independent of adjacent voice units and by means of a second portion of the states dependent on adjacent voice units. The second portion of states dependent on adjacent voice units shares common parameters with a plurality of units sharing the same phonemes.
    Type: Grant
    Filed: April 20, 2004
    Date of Patent: October 19, 2010
    Assignee: France Telecom
    Inventors: Ronaldo Messina, Denis Jouvet
  • Patent number: 7813928
    Abstract: A speech recognition device presents whether a user's utterance is an unregistered word and whether the utterance should be repeated. The device includes a vocabulary storage unit (102) defining a vocabulary for speech recognition, and a speech recognition unit (101) checking the uttered speech against registered words. The device also includes a similarity calculation unit (103) calculating a similarity between the uttered speech and acoustic units, a judgment unit (104) judging, based on the check by the speech recognition unit (101) and the calculation performed by the similarity calculation unit (103), whether the uttered speech is a registered or unregistered word, an unregistered word unit (106) storing unregistered words, an unregistered word candidate search unit (105) searching the unregistered word unit (106) for unregistered word candidates when the judgment unit (104) judges that the uttered speech is an unregistered word, and a display unit (107) displaying the result.
    Type: Grant
    Filed: June 2, 2005
    Date of Patent: October 12, 2010
    Assignee: Panasonic Corporation
    Inventors: Yoshiyuki Okimoto, Tsuyoshi Inoue, Takashi Tsuzuki
  • Patent number: 7813925
    Abstract: When observation signals at adjacent times are similar, or when the change in the observation signal is small, the component that maximizes the output probability of a mixture distribution is very unlikely to change. Exploiting this fact, when the output probability of a mixture-distribution HMM is obtained, the component yielding the maximum output probability is stored. At adjacent times, or when the change in the observation signal is small, the output probability of the stored component is then used as the output probability of the mixture distribution. This avoids evaluating the other components when calculating the output probability of the mixture distribution, thereby reducing the amount of calculation required for output probabilities.
    Type: Grant
    Filed: April 6, 2006
    Date of Patent: October 12, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hiroki Yamamoto, Masayuki Yamada
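The caching trick in patent 7813925 above is easy to sketch for a diagonal-covariance Gaussian mixture: when consecutive observations barely change, only the previously dominant component is re-evaluated and its score stands in for the mixture output probability; otherwise all components are evaluated and the new best one is cached. The change threshold and the max-component approximation of the mixture are assumptions.

```python
# Sketch: reuse the previously dominant Gaussian when the observation
# changes little; otherwise rescore all components and cache the best.
import numpy as np

def log_gauss(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def mixture_logprob(x, prev_x, weights, means, vars_, cache, threshold=1e-3):
    """cache: dict carrying the index of the previously dominant component."""
    if prev_x is not None and "best" in cache and np.max(np.abs(x - prev_x)) < threshold:
        k = cache["best"]
        return np.log(weights[k]) + log_gauss(x, means[k], vars_[k])
    scores = [np.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, vars_)]
    cache["best"] = int(np.argmax(scores))
    return scores[cache["best"]]  # max component stands in for the mixture
```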
  • Patent number: 7801280
    Abstract: Described are methods, systems, and devices that include obtaining a first measured perceptual quality by measuring, at a first location associated with a communications network, a perceptual quality of a first communication transmitted from the first location to a second location associated with the communications network, obtaining a second measured perceptual quality by measuring perceptual quality of the first communication at the second location; and, based on the first measured perceptual quality and the second measured perceptual quality, generating a first value representative of degradation in the quality of the first communication.
    Type: Grant
    Filed: December 15, 2004
    Date of Patent: September 21, 2010
    Assignee: Verizon Laboratories Inc.
    Inventor: Adrian Evans Conway
  • Patent number: 7797157
    Abstract: Channel normalization for automatic speech recognition is provided. Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters. In some examples, the measured statistics comprise measures of an energy from the initial portion of the speech utterance. In some examples, measures of the energy comprise extreme values of the energy.
    Type: Grant
    Filed: January 10, 2005
    Date of Patent: September 14, 2010
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Igor Zlokarnik, Laurence S. Gillick, Jordan Cohen
  • Publication number: 20100217592
    Abstract: The present invention provides a method for identifying a turn, such as a sentence or phrase, for addition to a platform dialog comprising a plurality of turns. Lexical features of each of a set of candidate turns relative to one or more turns in the platform dialog are determined. Semantic features associated with each candidate turn and associated with the platform dialog are determined to identify one or more topics associated with each candidate turn and with the platform dialog. Lexical features of each candidate turn are compared to lexical features of the platform dialog and semantic features associated with each candidate turn are compared to semantic features of the platform dialog to rank the candidate turns based on similarity of lexical features and semantic features of each candidate turn to lexical features and semantic features of the platform dialog.
    Type: Application
    Filed: October 14, 2009
    Publication date: August 26, 2010
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Rakesh Gupta, Lev-Arie Ratinov
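The ranking step in publication 20100217592 above can be illustrated with two Jaccard similarities, one over words (lexical) and one over topics (semantic), combined into a single score. The equal weighting and the Jaccard measure itself are illustrative assumptions.

```python
# Sketch: combine lexical (word-overlap) and semantic (topic-overlap)
# similarity to rank candidate turns against the platform dialog.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_candidates(candidates, dialog_words, dialog_topics):
    """candidates: list of dicts with 'text' (str) and 'topics' (list) keys."""
    scored = []
    for cand in candidates:
        lexical = jaccard(cand["text"].lower().split(), dialog_words)
        semantic = jaccard(cand["topics"], dialog_topics)
        scored.append((0.5 * lexical + 0.5 * semantic, cand))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [cand for _, cand in scored]
```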
  • Patent number: 7783488
    Abstract: Methods and systems are provided for remote tuning and debugging of an automatic speech recognition system. Trace files are generated on-site from input speech by efficient, lossless compression of MFCC data, which is merged with compressed pitch and voicing information and stored as trace files. The trace files are transferred to a remote site where human-intelligible speech is reconstructed and analyzed. Based on the analysis, parameters of the automatic speech recognition system are remotely adjusted.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: August 24, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Shay Ben-David, Baiju Dhirajlal Mandalia, Zohar Sivan, Alexander Sorin
  • Publication number: 20100204985
    Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency-axis direction using the probability model representing voice and non-voice occurrence probabilities, the voice and non-voice labels, and a cepstrum.
    Type: Application
    Filed: September 22, 2008
    Publication date: August 12, 2010
    Inventor: Tadashi Emori
  • Patent number: 7774202
    Abstract: A speech activated control system for controlling aerial vehicle components, program product, and associated methods are provided. The system can include a host processor adapted to develop speech recognition models and to provide speech command recognition. The host processor can be positioned in communication with a database for storing and retrieving speech recognition models. The system can include an avionic computer in communication with the host processor and adapted to provide command function management, a display and control processor in communication with the avionic computer adapted to provide a user interface between a user and the avionic computer, and a data interface positioned in communication with the avionic computer and the host processor provided to divorce speech command recognition functionality from vehicle or aircraft-related speech-command functionality.
    Type: Grant
    Filed: June 12, 2006
    Date of Patent: August 10, 2010
    Assignee: Lockheed Martin Corporation
    Inventors: Richard P. Spengler, Jon C. Russo, Gregory W. Barnett, Kermit L. Armbruster
  • Publication number: 20100198597
    Abstract: Methods, speech recognition systems, and computer readable media are provided that recognize speech using dynamic pruning techniques. A search network is expanded based on a frame from a speech signal, a best hypothesis is determined in the search network, a default beam threshold is modified, and the search network is pruned using the modified beam threshold. The search network may be further pruned based on the search depth of the best hypothesis and/or the average number of frames per state for a search path.
    Type: Application
    Filed: January 30, 2009
    Publication date: August 5, 2010
    Inventor: Qifeng ZHU
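One way to read the dynamic pruning in publication 20100198597 above is sketched below: hypotheses are kept only if their score lies within a beam of the best hypothesis, and the default beam is scaled according to how deep the best hypothesis has progressed. The specific scaling rule is invented for illustration.

```python
# Sketch: beam pruning with a beam scaled by search depth (the scaling
# rule is invented for illustration).
def prune(hypotheses, default_beam=200.0, target_depth=None):
    """hypotheses: list of dicts with 'score' (higher is better) and 'depth'."""
    best = max(hypotheses, key=lambda h: h["score"])
    beam = default_beam
    if target_depth:
        # Narrow the beam when the best hypothesis lags the expected depth,
        # widen it when the search is ahead.
        beam *= min(1.5, max(0.5, best["depth"] / target_depth))
    threshold = best["score"] - beam
    return [h for h in hypotheses if h["score"] >= threshold]
```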
  • Patent number: 7761293
    Abstract: Systems and methods are disclosed to operate a mobile device. The system includes a message center; an engine coupled to the message center; and a mobile device wirelessly coupled to the message center, wherein the engine specifies one or more meeting locations and wherein at least one meeting location comprises a location designated by an advertiser.
    Type: Grant
    Filed: March 6, 2006
    Date of Patent: July 20, 2010
    Inventor: Bao Q. Tran
  • Patent number: 7752036
    Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing for a speech recognizer a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
    Type: Grant
    Filed: December 29, 2008
    Date of Patent: July 6, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
  • Patent number: 7752045
    Abstract: A method for comparing a first audio data source with a plurality of audio data sources, wherein the first audio data source has an utterance spoken by a first person and the plurality of audio data sources have the same utterance spoken by a second person. The method includes performing a speech recognition function on the first audio data source to isolate at least one element of the first audio data source. The method also includes comparing the isolated element with a corresponding element in the plurality of audio data sources and determining whether the utterance spoken by the first person contained an error based on the comparison.
    Type: Grant
    Filed: October 7, 2002
    Date of Patent: July 6, 2010
    Assignees: Carnegie Mellon University, Carnegie Speech Company, Inc.
    Inventor: Maxine Eskenazi
  • Publication number: 20100161328
    Abstract: Embodiments are provided for utilizing a client-side cache for utterance processing to facilitate network based speech recognition. An utterance comprising a query is received in a client computing device. The query is sent from the client to a network server for results processing. The utterance is processed to determine a speech profile. A cache lookup is performed based on the speech profile to determine whether results data for the query is stored in the cache. If the results data is stored in the cache, then a query is sent to cancel the results processing on the network server and the cached results data is displayed on the client computing device.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 24, 2010
    Applicant: Microsoft Corporation
    Inventors: Andrew K. Krumel, Shuangyu Chang, Robert L. Chambers
  • Patent number: 7738635
    Abstract: A method for improving the recognition confidence of alphanumeric spoken input, suitable for use in a speech recognition telephony application such as a voice response system. An alphanumeric candidate is determined from the spoken input, which may be the best available representation of the spoken input. Recognition confidence is compared with a preestablished threshold. If the recognition confidence exceeds the threshold, the alphanumeric candidate is selected to represent the spoken input. Otherwise, present call data associated with the spoken input is determined. Call data may include automatic number identification (ANI) information, caller-ID information, and/or dialed number information service (DNIS) information. Information associated with the alphanumeric candidate and information associated with the present call data are correlated in order to select alphanumeric information that best represents the spoken input.
    Type: Grant
    Filed: January 6, 2005
    Date of Patent: June 15, 2010
    Assignees: International Business Machines Corporation, Nuance Communications, Inc.
    Inventors: Christopher Ryan Groves, Kevin James Muterspaugh
  • Publication number: 20100145680
    Abstract: A speech recognition method using a domain ontology includes: constructing a domain ontology DB; forming a speech recognition grammar using the constructed domain ontology DB; extracting a feature vector from a speech signal; and modeling the speech signal using an acoustic model. The method performs speech recognition by using the acoustic model, a speech recognition dictionary, and the speech recognition grammar on the basis of the feature vector.
    Type: Application
    Filed: September 1, 2009
    Publication date: June 10, 2010
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Seung YUN, Soo Jong Lee, Jeong Se Kim, Il Bin Lee, Jun Park, Sang Kyu Park
  • Patent number: 7734472
    Abstract: The invention concerns a speech recognition enhancer (51) and a speech recognition system comprising such speech recognition enhancer (51), an audio input unit (41) and a speech recognizer (61, 3). The speech recognition enhancer (51) is arranged between the audio input unit (41) and the speech recognizer (61, 3). The speech recognition enhancer (51) has a parametrizable pre-filtering unit (511), a parametrizable dynamic voice level control unit (512), a parametrizable noise reduction unit (513) and a parametrizable voice level control unit (514). The parameters of these parametrizable units (511, 512, 513, 514) are adjusted to the characteristics of the specific audio input unit (41) and/or the characteristics of the specific speech recognizer (61, 3) for adapting the audio input unit (41) to the speech recognizer (61, 3).
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: June 8, 2010
    Assignee: Alcatel
    Inventor: Michael Walker
  • Patent number: 7729910
    Abstract: The invention comprises a method for lossy data compression, akin to vector quantization, in which there is no explicit codebook and no search, i.e. the codebook memory and associated search computation are eliminated. Some memory and computation are still required, but these are dramatically reduced, compared to systems that do not exploit this method. For this reason, both the memory and computation requirements of the method are exponentially smaller than comparable methods that do not exploit the invention. Because there is no explicit codebook to be stored or searched, no such codebook need be generated either. This makes the method well suited to adaptive coding schemes, where the compression system adapts to the statistics of the data presented for processing: both the complexity of the algorithm executed for adaptation, and the amount of data transmitted to synchronize the sender and receiver, are exponentially smaller than comparable existing methods.
    Type: Grant
    Filed: June 25, 2004
    Date of Patent: June 1, 2010
    Assignee: Agiletv Corporation
    Inventor: Harry Printz
  • Patent number: 7729912
    Abstract: A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.
    Type: Grant
    Filed: December 23, 2003
    Date of Patent: June 1, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel Adriaan Unico Bacchiani, Brian Scott Amento
  • Publication number: 20100131262
    Abstract: Embodiments of the invention relate to methods for generating a multilingual acoustic model. A main acoustic model having probability distribution functions and a probabilistic state sequence model including first states is provided to a processor. At least one second acoustic model including probability distribution functions and a probabilistic state sequence model including states is also provided to the processor. The processor replaces each of the probability distribution functions of the at least one second acoustic model with one of the probability distribution functions of the main acoustic model, and/or each of the states of the probabilistic state sequence model of the at least one second acoustic model with a state of the probabilistic state sequence model of the main acoustic model, based on a criteria set, to obtain at least one modified second acoustic model. The criteria set may be a distance measurement.
    Type: Application
    Filed: November 25, 2009
    Publication date: May 27, 2010
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Rainer Gruhn, Martin Raab, Raymond Brueckner
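The replacement step in publication 20100131262 above can be sketched as a nearest-neighbour mapping: every Gaussian of the second acoustic model is replaced by the closest Gaussian of the main model under some distance. Plain Euclidean distance between means is used below, which is only one possible "criteria set".

```python
# Sketch: map every Gaussian of the second model to the nearest Gaussian
# of the main model (Euclidean distance between means is one possible
# "criteria set").
import numpy as np

def replace_with_main(second_means, main_means):
    """Return, for each second-model Gaussian, the index of the main-model
    Gaussian that replaces it."""
    return [int(np.argmin(np.linalg.norm(main_means - mu, axis=1)))
            for mu in second_means]

# Example: a 3-Gaussian second model mapped onto a 5-Gaussian main model
# of 13-dimensional cepstral means.
main = np.random.randn(5, 13)
second = np.random.randn(3, 13)
print(replace_with_main(second, main))
```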
  • Patent number: 7725317
    Abstract: An interactive control system is disclosed with which the recognition rate and the responsiveness when operating a plurality of interactive services in parallel can be improved. A recognition lexicon and consolidated and reorganized information are generated for each individual interaction. Thus, excessive growth of the recognition lexicon can be avoided and a lowering of the recognition rate can be prevented. Moreover, based on the consolidated and reorganized information, it is possible to specify interactive services that may respond to the same input information, so that responses that are unexpected for the user can be prevented.
    Type: Grant
    Filed: July 14, 2004
    Date of Patent: May 25, 2010
    Assignee: Fujitsu Limited
    Inventors: Eiji Kitagawa, Toshiyuki Fukuoka, Ryosuke Miyata
  • Patent number: 7725314
    Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define a gain on a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. Under some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator being guaranteed to be positive.
    Type: Grant
    Filed: February 16, 2004
    Date of Patent: May 25, 2010
    Assignee: Microsoft Corporation
    Inventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
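A hedged sketch of the gain described in patent 7725314 above: the estimated clean-speech and noise values appear in both numerator and denominator of a Wiener-style gain, with the numerator floored so it is guaranteed positive. This general form is an illustrative reading of the abstract, not the patent's exact equations.

```python
# Sketch: Wiener-style gain with clean and noise estimates in numerator
# and denominator, numerator floored to stay positive (an illustrative
# form, not the patent's equations).
import numpy as np

def filter_gain(clean_est, noise_est, floor=1e-10):
    numerator = np.maximum(clean_est, floor)          # guaranteed positive
    return numerator / (numerator + np.maximum(noise_est, 0.0))

def apply_filter(noisy_spectrum, clean_est, noise_est):
    return filter_gain(clean_est, noise_est) * noisy_spectrum
```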
  • Patent number: 7716047
    Abstract: A system and method for an automatic set-up of speech recognition engines may include a speech recognizer configured to perform speech recognition procedures to identify input speech data according to one or more operating parameters. A merit manager may be utilized to automatically calculate merit values corresponding to the foregoing recognition procedures. These merit values may incorporate recognition accuracy information, recognition speed information, and a user-specified weighting factor that shifts the relative effect of the recognition accuracy information and the recognition speed information on the merit values. The merit manager may then automatically perform a merit value optimization procedure to select operating parameters that correspond to an optimal one of the merit values.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: May 11, 2010
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Thomas Kemp, Katsuki Minamino, Helmut Lucke
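The merit computation in patent 7716047 above combines recognition accuracy and speed through a user-specified weighting factor; a linear combination, as sketched below, is one natural instantiation (the abstract only says the factor shifts the relative effect of the two terms).

```python
# Sketch: merit as a weighted combination of accuracy and speed, used to
# pick operating parameters (the linear form is an assumption).
def merit(accuracy, speed, weight):
    """weight in [0, 1]: 1.0 rewards accuracy only, 0.0 rewards speed only."""
    return weight * accuracy + (1.0 - weight) * speed

def best_parameters(candidates, weight=0.7):
    """candidates: list of (params, accuracy, speed) triples with both
    metrics normalized to [0, 1]."""
    return max(candidates, key=lambda c: merit(c[1], c[2], weight))[0]
```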
  • Patent number: 7707032
    Abstract: A method and system used to determine the similarity between an input speech data and a sample speech data is provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.
    Type: Grant
    Filed: October 20, 2005
    Date of Patent: April 27, 2010
    Assignee: National Cheng Kung University
    Inventors: Jhing-Fa Wang, Po-Chuan Lin, Li-Chang Wen
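The matching-matrix comparison in patent 7707032 above maps naturally onto a dynamic-time-warping style computation: frame-wise distances between the input and sample utterances form the matrix, and a dynamic-programming alignment over it yields the matching score. The DTW recursion below is an assumed concrete form.

```python
# Sketch: frame-wise distance matrix plus a DTW-style alignment producing
# a length-normalized matching score (lower means more similar).
import numpy as np

def matching_score(input_frames, sample_frames):
    """Both arguments: 2-D arrays of shape (num_frames, num_features)."""
    n, m = len(input_frames), len(sample_frames)
    dist = np.linalg.norm(
        input_frames[:, None, :] - sample_frames[None, :, :], axis=2
    )                                              # the matching matrix
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)
```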
  • Patent number: 7684987
    Abstract: A phone set for use in speech processing such as speech recognition or text-to-speech conversion is used to model or form syllables of a tonal language having a plurality of different tones. Each syllable includes an initial part that can be glide dependent and a final part. The final part includes a plurality of phones. Each phone carries partial tonal information such that the phones taken together implicitly and jointly represent the different tones.
    Type: Grant
    Filed: January 21, 2004
    Date of Patent: March 23, 2010
    Assignee: Microsoft Corporation
    Inventors: Min Chu, Chao Huang
  • Patent number: 7680659
    Abstract: A method of training language model parameters trains discriminative model parameters in the language model based on a performance measure having discrete values.
    Type: Grant
    Filed: June 1, 2005
    Date of Patent: March 16, 2010
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Hisami Suzuki
  • Patent number: 7680658
    Abstract: A method and apparatus for enhancing the performance of speech recognition by adaptively changing the process of determining the final recognized word depending on a user's selection in a list of alternative words produced by speech recognition. The speech recognition method comprises: inputting speech uttered by a user; recognizing the input speech and creating a predetermined number of alternative words, ordered by similarity; and displaying a list of the alternative words arranged in a predetermined order and determining the alternative word that the cursor currently indicates as the final recognized word if the user's selection from the list of alternative words has not been changed within a predetermined standby time.
    Type: Grant
    Filed: December 31, 2003
    Date of Patent: March 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Seung-nyung Chung, Myung-hyun Yoo, Jay-woo Kim, Joon-ah Park
  • Publication number: 20100063816
    Abstract: A method for processing an analog speech signal for speech recognition. The analog speech signal is sampled to produce a sampled speech signal. The sampled speech signal is framed into multiple frames of the sampled speech signal. The absolute value of the sampled speech signal is integrated within the frames and respective integrated-absolute values of the frames are determined. Based on the integrated-absolute values, the sampled speech signal is cut into segments of non-uniform duration. The segments are not yet identified as parts of speech prior to and during the cutting.
    Type: Application
    Filed: September 7, 2008
    Publication date: March 11, 2010
    Inventors: Ronen Faifkov, Rabin Cohen-Tov
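The segmentation front end in publication 20100063816 above can be sketched directly: frame the sampled signal, integrate (sum) the absolute value per frame, and place cut points where the integrated value drops below a threshold, yielding segments of non-uniform duration. Frame length and threshold are assumptions.

```python
# Sketch: per-frame integrated absolute value, cuts placed at low-energy
# frames (frame length and threshold are assumptions).
import numpy as np

def cut_segments(samples, frame_len=160, threshold_ratio=0.1):
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return []
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    integrated = np.sum(np.abs(frames), axis=1)       # integrated |x| per frame
    threshold = threshold_ratio * np.max(integrated)
    segments, start = [], None
    for i, value in enumerate(integrated):
        if value >= threshold and start is None:
            start = i * frame_len                     # a segment begins
        elif value < threshold and start is not None:
            segments.append((start, i * frame_len))   # it ends at a quiet frame
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments
```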
  • Patent number: 7668716
    Abstract: A device improves speech recognition accuracy by utilizing an external knowledge source. The device receives a speech recognition result from an automatic speech recognition (ASR) engine, the speech recognition result including a plurality of ordered interpretations, wherein each of the ordered interpretations includes a plurality of information items. The device analyzes and filters the plurality of interpretations using an external knowledge source to create a filtered plurality of ordered interpretations. The device stores the filtered plurality of ordered interpretations to a memory. The device transmits the filtered plurality of ordered interpretations to a dialog manager module to create a textual output. Alternatively, the dialog manager module retrieves the filtered plurality of ordered interpretations from a memory.
    Type: Grant
    Filed: May 5, 2006
    Date of Patent: February 23, 2010
    Assignee: Dictaphone Corporation
    Inventors: Marc Helbing, Klaus Reifenrath
  • Patent number: 7657433
    Abstract: A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: February 2, 2010
    Assignee: TellMe Networks, Inc.
    Inventor: Shuangyu Chang
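The threshold selection in patent 7657433 above can be illustrated with a single utterance feature; duration is used below purely as an example of a feature that picks which confidence threshold to apply before a recognition result is accepted.

```python
# Sketch: the utterance's duration (one possible feature) selects the
# confidence threshold used to accept or reject the recognition result.
def select_threshold(utterance_duration_s):
    # Short utterances are easier to confuse, so demand higher confidence.
    if utterance_duration_s < 1.0:
        return 0.80
    if utterance_duration_s < 3.0:
        return 0.65
    return 0.50

def accept(result_text, confidence, utterance_duration_s):
    threshold = select_threshold(utterance_duration_s)
    return result_text if confidence >= threshold else None
```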
  • Publication number: 20100017207
    Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.
    Type: Application
    Filed: September 24, 2009
    Publication date: January 21, 2010
    Applicant: Infineon Technologies AG
    Inventors: Werner Hemmert, Marcus Holmberg
  • Patent number: 7647224
    Abstract: A speech recognition apparatus includes a sound information acquiring unit that acquires sound information, and a unit segment dividing unit that divides the sound information into plural unit segments. A segment information acquiring unit acquires segment information that indicates a feature of each unit segment. A segment relation value calculating unit calculates a segment relation value that indicates a relative feature of a target segment (the unit segment to be processed) with respect to an adjacent segment, based on the segment information of the target segment and the segment information of the adjacent segment. A recognition candidate storing unit stores recognition candidates that are targets of speech recognition, and a recognition result selecting unit selects a recognition result from the stored recognition candidates by utilizing the segment relation value.
    Type: Grant
    Filed: November 23, 2005
    Date of Patent: January 12, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masahide Ariu, Shinichi Tanaka, Takashi Masuko
  • Publication number: 20090319268
    Abstract: A method and an apparatus for measuring the intelligibility level of an audio announcement device (40) employ at least one speech recognition module (418; 518) for analyzing the reconstructed verbal content of the audio message announced by the audio announcement device (40), optionally by comparison with the verbal content of an original audio message.
    Type: Application
    Filed: June 19, 2009
    Publication date: December 24, 2009
    Applicant: ARCHEAN TECHNOLOGIES
    Inventors: Xavier AUMONT, Antoine WILHELM-JAUREGUIBERRY
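The measurement idea in publication 20090319268 above reduces to recognizing the announcement as heard through the public-address chain and comparing it with the original verbal content; the word-level match rate below serves as a stand-in intelligibility score (the use of difflib and the scoring formula are illustrative).

```python
# Sketch: word-level match rate between the recognized announcement and
# the original message as a stand-in intelligibility score.
import difflib

def intelligibility(recognized_text, original_text):
    rec = recognized_text.lower().split()
    ref = original_text.lower().split()
    matcher = difflib.SequenceMatcher(None, rec, ref)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref) if ref else 0.0

# 1.0 means every word of the original message was recovered in order.
print(intelligibility("please proceed to the nearest exit",
                      "please proceed to the nearest exit"))
```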