Specialized Equations Or Comparisons Patents (Class 704/236)
-
Patent number: 7925505
Abstract: Architecture is disclosed herewith for minimizing an empirical error rate by discriminative adaptation of a statistical language model in a dictation and/or dialog application. The architecture allows assignment of an improved weighting value to each term or phrase to reduce empirical error. Empirical errors are minimized, whether or not a user provides correction results, based on criteria for discriminatively adapting the user language model (LM)/context-free grammar (CFG) to the target. Moreover, algorithms are provided for the training and adaptation processes of LM/CFG parameters for criteria optimization.
Type: Grant
Filed: April 10, 2007
Date of Patent: April 12, 2011
Assignee: Microsoft Corporation
Inventor: Jian Wu
-
Publication number: 20110082695
Abstract: An electronic device includes a call analysis module that is configured to analyze characteristics of a phone call and to generate an indicium that represents a prevailing mood associated with the phone call based on the analyzed characteristics.
Type: Application
Filed: October 2, 2009
Publication date: April 7, 2011
Inventor: Emil Morgan Billing Bengt
-
Patent number: 7917361
Abstract: A method for training a spoken language identification system to identify an unknown language as one of a plurality of known candidate languages includes creating a sound inventory comprising a plurality of sound tokens, the collective plurality of sound tokens provided from a subset of the known candidate languages. The method further includes providing a plurality of training samples, each training sample composed within one of the known candidate languages. Further included is the process of generating one or more training vectors from each training sample, wherein each training vector is defined as a function of said plurality of sound tokens provided from said subset of the known candidate languages. The method further includes associating each training vector with the candidate language of the corresponding training sample.
Type: Grant
Filed: September 19, 2005
Date of Patent: March 29, 2011
Assignee: Agency for Science, Technology and Research
Inventors: Haizhou Li, Bin Ma, George M. White
-
Patent number: 7912713
Abstract: An automatic speech recognition method for identifying words from an input speech signal includes providing at least one hypothesis recognition based on the input speech signal, the hypothesis recognition being an individual hypothesis word or a sequence of individual hypothesis words, and computing a confidence measure for the hypothesis recognition based on the input speech signal. Computing the confidence measure includes: computing differential contributions to the confidence measure, each as a difference between a constrained acoustic score and an unconstrained acoustic score; weighting each differential contribution by applying thereto a cumulative distribution function of the differential contribution, so as to make the distributions of the confidence measures homogeneous in terms of rejection capability as the language, vocabulary and grammar vary; and computing the confidence measure by averaging the weighted differential contributions.
Type: Grant
Filed: December 28, 2004
Date of Patent: March 22, 2011
Assignee: Loquendo S.p.A.
Inventors: Claudio Vair, Daniele Colibro
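The confidence measure this abstract describes averages per-word differential scores after mapping each through a cumulative distribution function. A minimal sketch of that idea follows; the empirical CDF built from held-out differential scores is an assumption (the patent only says the CDF of the differential contribution), and all names are hypothetical:

```python
import bisect

def empirical_cdf(samples):
    """Build an empirical CDF from differential scores observed on held-out data."""
    ordered = sorted(samples)
    def cdf(x):
        # Fraction of observed samples less than or equal to x.
        return bisect.bisect_right(ordered, x) / len(ordered)
    return cdf

def confidence(constrained, unconstrained, cdf):
    """Average of differential contributions mapped through the CDF.

    Each contribution is (constrained acoustic score - unconstrained acoustic
    score) for one hypothesis word; the CDF mapping normalizes the score
    distributions across languages, vocabularies, and grammars.
    """
    diffs = [c - u for c, u in zip(constrained, unconstrained)]
    return sum(cdf(d) for d in diffs) / len(diffs)
```

Because the CDF maps every contribution into [0, 1], the averaged confidence is directly comparable across recognition tasks, which is the homogenization the abstract claims.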
-
Publication number: 20110064302
Abstract: A method is disclosed for recognition of high-dimensional data in the presence of occlusion, including: receiving target data that includes an occlusion and is of an unknown class, wherein the target data includes a known object; sampling a plurality of training data files comprising a plurality of distinct classes of the same object as that of the target data; and identifying the class of the target data through linear superposition of the sampled training data files using l1 minimization, wherein the linear superposition with the sparsest number of coefficients is used to identify the class of the target data.
Type: Application
Filed: January 29, 2009
Publication date: March 17, 2011
Inventors: Yi Ma, Allen Yang Yang, John Norbert Wright, Andrew William Wagner
-
Publication number: 20110066433
Abstract: Disclosed herein are methods, systems, and computer-readable storage media for automatic speech recognition. The method includes selecting a speaker independent model and selecting a quantity of speaker dependent models, the quantity being based on available computing resources, the selected models comprising the speaker independent model and the speaker dependent models. The method also includes recognizing an utterance using each of the selected models in parallel, and selecting a dominant speech model from the selected models based on recognition accuracy. The system includes a processor and modules configured to control the processor to perform the method. The computer-readable storage medium includes instructions for causing a computing device to perform the steps of the method.
Type: Application
Filed: September 16, 2009
Publication date: March 17, 2011
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Andrej Ljolje, Diamantino Antonio Caseiro, Alistair D. Conkie
-
Patent number: 7904294
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device in a communications network. The ASR system can perform ASR on speech utterances input into a mobile device, apply compensating techniques using at least one characteristic, and update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Grant
Filed: April 9, 2007
Date of Patent: March 8, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Publication number: 20110046951
Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system are disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production, together with log data reporting the association between each utterance and the state of the system at the moment the utterance was recorded. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
Type: Application
Filed: August 21, 2009
Publication date: February 24, 2011
Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
-
Patent number: 7894849
Abstract: Methods, systems, and apparatus, including computer program products, for generating feedback. In one aspect, a method includes receiving sensor data from a plurality of sensors, wherein at least one of the plurality of sensors is associated with a mobile device of a user; aggregating the received sensor data to generate aggregated sensor data; processing the aggregated sensor data to determine an aggregated metric; comparing the aggregated metric to a target associated with the user to determine a measure of performance; and generating feedback based on the determined measure of performance. Further, the mobile device can comprise a mobile personal services device that includes one or more of an audio sensor, a video sensor, an environmental sensor, a biometric sensor, a location sensor, an activity detector, and a health monitor. The feedback can be displayed on the mobile personal services device. The feedback also can be displayed in near real-time.
Type: Grant
Filed: July 9, 2007
Date of Patent: February 22, 2011
Assignee: Accenture Global Services Limited
Inventors: Alex M. Kass, Lucian P. Hughes, Owen E. Richter, Dana Le, Daniel Farina
-
Patent number: 7865364
Abstract: A method for improving speech recognition accuracy utilizes skiplists, or lists of values that cannot occur because of improbability or impossibility. A table or list is stored in a dialog manager module. The table includes a plurality of information items and a corresponding list of improbable values for each of the plurality of information items. A plurality of recognized ordered interpretations is received from an automatic speech recognition (ASR) engine. Each of the plurality of recognized ordered interpretations includes a number of information items. A value of one or more of the received information items for a first recognized ordered interpretation is compared to the table to determine whether it matches any of the improbable values listed for the corresponding information item.
Type: Grant
Filed: May 5, 2006
Date of Patent: January 4, 2011
Assignee: Nuance Communications, Inc.
Inventor: Marc Helbing
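The skiplist check the abstract describes amounts to filtering an n-best list against a table of impossible values. A minimal sketch, assuming string-valued items and a hypothetical date-field table (neither is specified by the abstract):

```python
SKIPLIST = {  # hypothetical table: information item -> values that cannot occur
    "month": {"0", "13", "14"},
    "day": {"0", "32"},
}

def first_plausible(interpretations, skiplist):
    """Return the first (best-ranked) ASR interpretation none of whose item
    values appears in the skiplist for that item; None if all are implausible."""
    for interp in interpretations:  # interpretations arrive ordered best-first
        if all(value not in skiplist.get(item, ()) for item, value in interp.items()):
            return interp
    return None
```

In practice the dialog manager would fall back to the next-ranked interpretation whenever the top one hits a skiplist entry, exactly as the loop above does.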
-
Publication number: 20100332228
Abstract: According to some embodiments, a method and apparatus are provided to buffer N audio frames of a plurality of audio frames associated with an audio signal, pre-compute scores for a subset of context dependent models (CDMs), and perform a graphical model search associated with the N audio frames, where the score of the context independent model (CIM) associated with a CDM is used in lieu of the score for the CDM when a score for the CDM is needed but has not been pre-computed.
Type: Application
Filed: June 25, 2009
Publication date: December 30, 2010
Inventors: Michael Eugene Deisher, Tao Ma
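The CIM-fallback lookup described here is a simple two-level dictionary scheme. A sketch under assumed data shapes (triphone-style CDM names and a CDM-to-CIM map are illustrative, not from the abstract):

```python
def model_score(cdm_id, precomputed_cdm, cim_scores, cdm_to_cim):
    """Return the pre-computed CDM score when available; otherwise fall back
    to the score of the context independent model the CDM belongs to."""
    if cdm_id in precomputed_cdm:
        return precomputed_cdm[cdm_id]
    return cim_scores[cdm_to_cim[cdm_id]]
```

This lets the search proceed without stalling on scores that were never pre-computed, at the cost of a coarser (context independent) approximation for those states.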
-
Publication number: 20100332227
Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred, based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence in the communication of the pre-determined phrases associated with the event or with the precursor event.
Type: Application
Filed: June 24, 2009
Publication date: December 30, 2010
Applicant: AT&T Intellectual Property I, L.P.
Inventors: I. Dan Melamed, Yeon-Jun Kim, Andrej Ljolje, Bernard S. Renger, David J. Smith
-
Patent number: 7860713
Abstract: Systems and methods for annotating speech data. The present invention reduces the time required to annotate speech data by selecting utterances for annotation that will be of greatest benefit. A selection module uses speech models, including speech recognition models and spoken language understanding models, to identify utterances that should be annotated based on criteria such as confidence scores generated by the models. These utterances are placed in an annotation list along with the type of annotation to be performed for each utterance and the order in which annotation should proceed. The utterances in the annotation list can be annotated for speech recognition purposes, spoken language understanding purposes, labeling purposes, etc. The selection module can also select utterances for annotation based on previously annotated speech data and deficiencies in the various models.
Type: Grant
Filed: July 1, 2008
Date of Patent: December 28, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Tirso M. Alonso, Ilana Bromberg, Dilek Z. Hakkani-Tur, Barbara B. Hollister, Mazin G. Rahim, Giuseppe Riccardi, Lawrence Lyon Rose, Daniel Leon Stern, Gokhan Tur, James M. Wilson
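Confidence-based selection of utterances for annotation, as this abstract describes it, can be sketched in a few lines. The fixed "transcription" annotation type and the annotation budget are assumptions for illustration; the patent allows multiple annotation types and ordering criteria:

```python
def build_annotation_list(utterances, confidences, budget):
    """Pick the utterances the models are least confident about, pairing each
    with an annotation type; the list is ordered lowest-confidence first, so
    annotators work on the most informative utterances before the budget runs out."""
    ranked = sorted(zip(utterances, confidences), key=lambda uc: uc[1])
    return [(utt, "transcription") for utt, _ in ranked[:budget]]
```

Selecting low-confidence utterances first is a standard active-learning heuristic; the patent's selection module additionally weighs model deficiencies and prior annotations.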
-
Patent number: 7853448
Abstract: An electronic instrument includes: a display control unit that displays a control content corresponding to command information obtained from speech recognition; an instruction unit for instructing that the control for the displayed control content be cancelled; and a control unit that performs the control based on the command information after a predetermined standby time has elapsed since the control content was first displayed, provided the instruction unit has not instructed cancellation within that standby time, and that cancels the control when the instruction unit instructs cancellation within the standby time.
Type: Grant
Filed: April 16, 2007
Date of Patent: December 14, 2010
Assignee: Funai Electric Co., Ltd.
Inventors: Shusuke Narita, Susumu Tokoshima
-
Patent number: 7831422
Abstract: Methods and systems for handling speech recognition processing in effectively real time, via the Internet, so that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the Internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real time. The server evaluates the user speech in the context of the current speech recognition exercise being executed and provides responsive feedback to the client, again in approximate real time, with minimal latency. The client, upon receiving responsive feedback from the server, displays or otherwise provides the feedback to the user.
Type: Grant
Filed: October 26, 2007
Date of Patent: November 9, 2010
Assignee: GlobalEnglish Corporation
Inventor: Christopher S. Jochumson
-
Publication number: 20100280827
Abstract: Embodiments for implementing a speech recognition system that includes a speech classifier ensemble are disclosed. In accordance with one embodiment, the speech recognition system includes a classifier ensemble to convert feature vectors that represent a speech vector into log probability sets. The classifier ensemble includes a plurality of classifiers. The speech recognition system includes a decoder ensemble to transform the log probability sets into output symbol sequences. The speech recognition system further includes a query component to retrieve one or more speech utterances from a speech database using the output symbol sequences.
Type: Application
Filed: April 30, 2009
Publication date: November 4, 2010
Applicant: Microsoft Corporation
Inventors: Kunal Mukerjee, Kazuhito Koishida, Shankar Regunathan
-
Publication number: 20100268535
Abstract: The problem to be solved is to robustly detect pronunciation variation examples and to acquire pronunciation variation rules with a high generalization property, with less effort. The problem is solved by a pronunciation variation rule extraction apparatus including a speech data storage unit, a base form pronunciation storage unit, a sub word language model generation unit, a speech recognition unit, and a difference extraction unit. The speech data storage unit stores speech data. The base form pronunciation storage unit stores base form pronunciation data representing the base form pronunciation of the speech data. The sub word language model generation unit generates a sub word language model from the base form pronunciation data. The speech recognition unit recognizes the speech data by using the sub word language model.
Type: Application
Filed: November 27, 2008
Publication date: October 21, 2010
Inventor: Takafumi Koshinaka
-
Patent number: 7818172
Abstract: The method of recognizing speech in an acoustic signal comprises developing acoustic stochastic models of voice units in the form of a set of states of an acoustic signal, and using the acoustic models for recognition by comparing the signal with predetermined acoustic models obtained via a prior learning process. While developing the acoustic models, the voice units are modeled by means of a first portion of states independent of adjacent voice units and a second portion of states dependent on adjacent voice units. The second portion of states dependent on adjacent voice units shares common parameters with a plurality of units sharing the same phonemes.
Type: Grant
Filed: April 20, 2004
Date of Patent: October 19, 2010
Assignee: France Telecom
Inventors: Ronaldo Messina, Denis Jouvet
-
Patent number: 7813928
Abstract: A speech recognition device presenting whether a user's utterance is an unregistered word and whether the utterance should be repeated. The device includes a vocabulary storage unit (102) defining a vocabulary for speech recognition, and a speech recognition unit (101) checking the uttered speech against registered words. The device also includes a similarity calculation unit (103) calculating a similarity between the uttered speech and acoustic units, a judgment unit (104) judging, based on the check by the speech recognition unit (101) and the calculation performed by the similarity calculation unit (103), whether the uttered speech is a registered or an unregistered word, an unregistered word unit (106) storing unregistered words, an unregistered word candidate search unit (105) searching the unregistered word unit (106) for unregistered word candidates when the judgment unit (104) judges that the uttered speech is an unregistered word, and a display unit (107) displaying the result.
Type: Grant
Filed: June 2, 2005
Date of Patent: October 12, 2010
Assignee: Panasonic Corporation
Inventors: Yoshiyuki Okimoto, Tsuyoshi Inoue, Takashi Tsuzuki
-
Patent number: 7813925
Abstract: When the observation signal changes little between adjacent times, the distribution that maximizes the output probability of a mixture distribution is highly unlikely to change. Using this fact, when obtaining the output probability of a mixture distribution HMM, the distribution yielding the maximum output probability is stored. When the observation signal at an adjacent time has changed little, the output probability of the stored distribution serves as the output probability of the mixture distribution. This avoids computing the output probabilities of the other distributions when calculating the output probability of the mixture distribution, thereby reducing the calculation amount required for output probabilities.
Type: Grant
Filed: April 6, 2006
Date of Patent: October 12, 2010
Assignee: Canon Kabushiki Kaisha
Inventors: Hiroki Yamamoto, Masayuki Yamada
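The caching idea here can be sketched for a one-dimensional Gaussian mixture under the common max-approximation (scoring only the dominant component), which matches the abstract's "output probability of the stored distribution serves as the output probability of the mixture". The change threshold `eps` and the 1-D features are illustrative assumptions:

```python
import math

def gaussian_log(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

class CachedMixture:
    """Mixture whose dominant component index is cached between frames."""
    def __init__(self, weights, means, variances):
        self.comps = list(zip(weights, means, variances))
        self.best = None    # index of the last dominant component
        self.last_x = None

    def log_output(self, x, eps=0.05):
        if self.best is not None and abs(x - self.last_x) < eps:
            # Small change since the previous frame: score only the cached
            # component, skipping all other component evaluations.
            w, m, v = self.comps[self.best]
            score = math.log(w) + gaussian_log(x, m, v)
        else:
            # Full evaluation; remember which component maximizes the output.
            scores = [math.log(w) + gaussian_log(x, m, v) for w, m, v in self.comps]
            self.best = max(range(len(scores)), key=scores.__getitem__)
            score = scores[self.best]
        self.last_x = x
        return score
```

On slowly varying frames only one Gaussian is evaluated instead of the whole mixture, which is the computation saving the patent targets.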
-
Patent number: 7801280
Abstract: Described are methods, systems, and devices that include obtaining a first measured perceptual quality by measuring, at a first location associated with a communications network, a perceptual quality of a first communication transmitted from the first location to a second location associated with the communications network; obtaining a second measured perceptual quality by measuring perceptual quality of the first communication at the second location; and, based on the first measured perceptual quality and the second measured perceptual quality, generating a first value representative of degradation in the quality of the first communication.
Type: Grant
Filed: December 15, 2004
Date of Patent: September 21, 2010
Assignee: Verizon Laboratories Inc.
Inventor: Adrian Evans Conway
-
Patent number: 7797157
Abstract: Channel normalization for automatic speech recognition is provided. Statistics are measured from an initial portion of a speech utterance. Feature normalization parameters are estimated based on the measured statistics and a statistically derived mapping relating measured statistics and feature normalization parameters. In some examples, the measured statistics comprise measures of an energy from the initial portion of the speech utterance. In some examples, the measures of the energy comprise extreme values of the energy.
Type: Grant
Filed: January 10, 2005
Date of Patent: September 14, 2010
Assignee: Voice Signal Technologies, Inc.
Inventors: Igor Zlokarnik, Laurence S. Gillick, Jordan Cohen
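The estimation pipeline in this abstract (initial-portion statistics → mapping → normalization parameters) can be sketched as follows. The min/max-to-offset/scale mapping is a hypothetical stand-in for the statistically derived mapping the patent leaves unspecified:

```python
def estimate_norm_params(initial_energies,
                         mapping=lambda lo, hi: (-lo, 1.0 / (hi - lo))):
    """Estimate a feature-normalization (offset, scale) pair from extreme
    energy values of the initial portion of an utterance; `mapping` stands in
    for the statistically derived mapping of statistics to parameters."""
    lo, hi = min(initial_energies), max(initial_energies)
    return mapping(lo, hi)

def normalize(energies, offset, scale):
    """Apply the estimated parameters to the full utterance's energies."""
    return [(e + offset) * scale for e in energies]
```

Estimating the parameters from only the initial frames lets normalization start before the utterance ends, which matters for low-latency recognition.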
-
Publication number: 20100217592
Abstract: The present invention provides a method for identifying a turn, such as a sentence or phrase, for addition to a platform dialog comprising a plurality of turns. Lexical features of each of a set of candidate turns relative to one or more turns in the platform dialog are determined. Semantic features associated with each candidate turn and associated with the platform dialog are determined to identify one or more topics associated with each candidate turn and with the platform dialog. Lexical features of each candidate turn are compared to lexical features of the platform dialog, and semantic features associated with each candidate turn are compared to semantic features of the platform dialog, to rank the candidate turns based on similarity of lexical features and semantic features of each candidate turn to those of the platform dialog.
Type: Application
Filed: October 14, 2009
Publication date: August 26, 2010
Applicant: Honda Motor Co., Ltd.
Inventors: Rakesh Gupta, Lev-Arie Ratinov
-
Patent number: 7783488
Abstract: Methods and systems are provided for remote tuning and debugging of an automatic speech recognition system. Trace files are generated on-site from input speech by efficient, lossless compression of MFCC data, which is merged with compressed pitch and voicing information and stored as trace files. The trace files are transferred to a remote site where human-intelligible speech is reconstructed and analyzed. Based on the analysis, parameters of the automatic speech recognition system are remotely adjusted.
Type: Grant
Filed: December 19, 2005
Date of Patent: August 24, 2010
Assignee: Nuance Communications, Inc.
Inventors: Shay Ben-David, Baiju Dhirajlal Mandalia, Zohar Sivan, Alexander Sorin
-
Publication number: 20100204985
Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency axis direction using the probability model, the voice and non-voice labels, and a cepstrum.
Type: Application
Filed: September 22, 2008
Publication date: August 12, 2010
Inventor: Tadashi Emori
-
Patent number: 7774202
Abstract: A speech activated control system for controlling aerial vehicle components, program product, and associated methods are provided. The system can include a host processor adapted to develop speech recognition models and to provide speech command recognition. The host processor can be positioned in communication with a database for storing and retrieving speech recognition models. The system can include an avionic computer in communication with the host processor and adapted to provide command function management, a display and control processor in communication with the avionic computer adapted to provide a user interface between a user and the avionic computer, and a data interface positioned in communication with the avionic computer and the host processor provided to divorce speech command recognition functionality from vehicle or aircraft-related speech-command functionality.
Type: Grant
Filed: June 12, 2006
Date of Patent: August 10, 2010
Assignee: Lockheed Martin Corporation
Inventors: Richard P. Spengler, Jon C. Russo, Gregory W. Barnett, Kermit L. Armbruster
-
Publication number: 20100198597
Abstract: Methods, speech recognition systems, and computer-readable media are provided that recognize speech using dynamic pruning techniques. A search network is expanded based on a frame from a speech signal, a best hypothesis is determined in the search network, a default beam threshold is modified, and the search network is pruned using the modified beam threshold. The search network may be further pruned based on the search depth of the best hypothesis and/or the average number of frames per state for a search path.
Type: Application
Filed: January 30, 2009
Publication date: August 5, 2010
Inventor: Qifeng Zhu
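Beam pruning with a modified threshold, as this abstract describes, can be sketched in a few lines. Representing hypotheses as (label, log-score) pairs and tightening the beam by a fixed amount are illustrative choices; the publication derives its modification from search depth and frames-per-state statistics:

```python
def prune(hypotheses, default_beam, tighten=0.0):
    """Beam pruning: keep hypotheses whose score is within the (possibly
    tightened) beam of the best hypothesis; drop the rest.

    `hypotheses` is a list of (label, log_score) pairs.
    """
    best = max(score for _, score in hypotheses)
    beam = default_beam - tighten   # the modified, narrower beam threshold
    return [(h, s) for h, s in hypotheses if s >= best - beam]
```

Narrowing the beam dynamically trades a small accuracy risk for a large reduction in active hypotheses, which is the point of the dynamic pruning the abstract claims.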
-
Patent number: 7761293
Abstract: Systems and methods are disclosed to operate a mobile device. The system includes a message center; an engine coupled to the message center; and a mobile device wirelessly coupled to the message center, wherein the engine specifies one or more meeting locations and wherein at least one meeting location comprises a location designated by an advertiser.
Type: Grant
Filed: March 6, 2006
Date of Patent: July 20, 2010
Inventor: Bao Q. Tran
-
Patent number: 7752036
Abstract: Recognizing a stream of speech received as speech vectors over a lossy communications link includes constructing, for a speech recognizer, a series of speech vectors from packets received over a lossy packetized transmission link, wherein some of the packets associated with each speech vector are lost or corrupted during transmission. Each constructed speech vector is multi-dimensional and includes associated features. After waiting for a predetermined time, speech vectors are generated and potentially corrupted features within the speech vector are indicated to the speech recognizer when present. Speech recognition is attempted at the speech recognizer on the speech vectors when corrupted features are present. This recognition may be based only on certain or valid features within each speech vector. Retransmission of a missing or corrupted packet is requested when corrupted values are indicated by the indicating step and when the attempted recognition step fails.
Type: Grant
Filed: December 29, 2008
Date of Patent: July 6, 2010
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard Vandervoort Cox, Stephen Michael Marcus, Mazin G. Rahim, Nambirajan Seshadri, Robert Douglas Sharp
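The vector-construction step of this abstract (assemble a feature vector from whatever packets arrived, marking lost or corrupted features for the recognizer) can be sketched as follows. The per-dimension packet dictionary and the `crc_ok` flag are hypothetical representations, not the patent's wire format:

```python
def assemble_vector(packets, dims, missing_mark=None):
    """Rebuild a multi-dimensional speech vector from received packets,
    returning the vector plus the indices of features whose packets were lost
    or corrupted, so the recognizer can restrict itself to valid features.

    `packets` maps a feature index to {"value": float, "crc_ok": bool}.
    """
    vector, corrupted = [], []
    for d in range(dims):
        pkt = packets.get(d)
        if pkt is None or not pkt.get("crc_ok", False):
            vector.append(missing_mark)   # lost or failed integrity check
            corrupted.append(d)
        else:
            vector.append(pkt["value"])
    return vector, corrupted
```

A recognizer that marginalizes over the `corrupted` indices can still attempt recognition, and only when that attempt fails need the client request retransmission, as the abstract describes.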
-
Patent number: 7752045
Abstract: A method for comparing a first audio data source with a plurality of audio data sources, wherein the first audio data source has an utterance spoken by a first person and the plurality of audio data sources have the same utterance spoken by a second person. The method includes performing a speech recognition function on the first audio data source to isolate at least one element of the first audio data source. The method also includes comparing the isolated element with a corresponding element in the plurality of audio data sources and determining whether the utterance spoken by the first person contained an error based on the comparison.
Type: Grant
Filed: October 7, 2002
Date of Patent: July 6, 2010
Assignees: Carnegie Mellon University; Carnegie Speech Company, Inc.
Inventor: Maxine Eskenazi
-
Publication number: 20100161328
Abstract: Embodiments are provided for utilizing a client-side cache for utterance processing to facilitate network-based speech recognition. An utterance comprising a query is received at a client computing device. The query is sent from the client to a network server for results processing. The utterance is processed to determine a speech profile. A cache lookup is performed based on the speech profile to determine whether results data for the query is stored in the cache. If the results data is stored in the cache, a query is sent to cancel the results processing on the network server, and the cached results data is displayed on the client computing device.
Type: Application
Filed: December 18, 2008
Publication date: June 24, 2010
Applicant: Microsoft Corporation
Inventors: Andrew K. Krumel, Shuangyu Chang, Robert L. Chambers
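The cache-hit-then-cancel flow in this abstract is straightforward to sketch. Representing the speech profile as an opaque string key and injecting the cancellation as a callback are illustrative simplifications of the client/server protocol:

```python
class UtteranceCache:
    """Client-side results cache keyed by a speech profile (here, an opaque
    fingerprint string derived from the utterance)."""
    def __init__(self):
        self._store = {}

    def put(self, profile, results):
        self._store[profile] = results

    def lookup(self, profile):
        return self._store.get(profile)

def handle_utterance(profile, cache, cancel_server_request):
    """On a cache hit, cancel the in-flight server-side results processing
    and return the cached results; on a miss, return None and let the
    server response arrive normally."""
    results = cache.lookup(profile)
    if results is not None:
        cancel_server_request()
    return results
```

The query is sent to the server before the local lookup completes, so a miss costs nothing extra, while a hit saves both server work and round-trip latency.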
-
Patent number: 7738635
Abstract: A method for improving the recognition confidence of alphanumeric spoken input, suitable for use in a speech recognition telephony application such as a voice response system. An alphanumeric candidate is determined from the spoken input, which may be the best available representation of the spoken input. Recognition confidence is compared with a preestablished threshold. If the recognition confidence exceeds the threshold, the alphanumeric candidate is selected to represent the spoken input. Otherwise, present call data associated with the spoken input is determined. Call data may include automatic number identification (ANI) information, caller-ID information, and/or dialed number information service (DNIS) information. Information associated with the alphanumeric candidate and information associated with the present call data are correlated in order to select alphanumeric information that best represents the spoken input.
Type: Grant
Filed: January 6, 2005
Date of Patent: June 15, 2010
Assignees: International Business Machines Corporation; Nuance Communications, Inc.
Inventors: Christopher Ryan Groves, Kevin James Muterspaugh
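The threshold-then-fallback logic here is easy to sketch. Matching candidates against call-data records by positional character overlap is an illustrative stand-in for the correlation the patent describes, and all names are hypothetical:

```python
def select_alphanumeric(candidate, confidence, threshold, call_records):
    """Accept the ASR candidate when confidence clears the threshold;
    otherwise pick the stored record (e.g. an account number looked up via
    ANI/caller-ID) that best matches the candidate."""
    if confidence >= threshold:
        return candidate
    def overlap(record):
        # Count positions where candidate and record agree.
        return sum(a == b for a, b in zip(candidate, record))
    return max(call_records, key=overlap)
```

A voice response system would populate `call_records` from numbers previously associated with the caller, so even a low-confidence recognition can be snapped to a plausible known value.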
-
Publication number: 20100145680
Abstract: A speech recognition method using a domain ontology includes: constructing a domain ontology DB; forming a speech recognition grammar using the constructed domain ontology DB; extracting a feature vector from a speech signal; and modeling the speech signal using an acoustic model. The method performs speech recognition by using the acoustic model, the speech recognition dictionary, and the speech recognition grammar on the basis of the feature vector.
Type: Application
Filed: September 1, 2009
Publication date: June 10, 2010
Applicant: Electronics and Telecommunications Research Institute
Inventors: Seung Yun, Soo Jong Lee, Jeong Se Kim, Il Bin Lee, Jun Park, Sang Kyu Park
-
Patent number: 7734472
Abstract: The invention concerns a speech recognition enhancer (51) and a speech recognition system comprising such a speech recognition enhancer (51), an audio input unit (41), and a speech recognizer (61, 3). The speech recognition enhancer (51) is arranged between the audio input unit (41) and the speech recognizer (61, 3). The speech recognition enhancer (51) has a parametrizable pre-filtering unit (511), a parametrizable dynamic voice level control unit (512), a parametrizable noise reduction unit (513), and a parametrizable voice level control unit (514). The parameters of these parametrizable units (511, 512, 513, 514) are adjusted to the characteristics of the specific audio input unit (41) and/or the characteristics of the specific speech recognizer (61, 3) for adapting the audio input unit (41) to the speech recognizer (61, 3).
Type: Grant
Filed: September 29, 2004
Date of Patent: June 8, 2010
Assignee: Alcatel
Inventor: Michael Walker
-
Patent number: 7729910
Abstract: The invention comprises a method for lossy data compression, akin to vector quantization, in which there is no explicit codebook and no search, i.e., the codebook memory and associated search computation are eliminated. Some memory and computation are still required, but both are exponentially smaller than in comparable methods that do not exploit the invention. Because there is no explicit codebook to be stored or searched, no such codebook need be generated either. This makes the method well suited to adaptive coding schemes, where the compression system adapts to the statistics of the data presented for processing: both the complexity of the algorithm executed for adaptation and the amount of data transmitted to synchronize the sender and receiver are exponentially smaller than in comparable existing methods.
Type: Grant
Filed: June 25, 2004
Date of Patent: June 1, 2010
Assignee: Agiletv Corporation
Inventor: Harry Printz
-
Patent number: 7729912Abstract: A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.Type: GrantFiled: December 23, 2003Date of Patent: June 1, 2010Assignee: AT&T Intellectual Property II, L.P.Inventors: Michiel Adriaan Unico Bacchiani, Brian Scott Amento
-
Publication number: 20100131262Abstract: Embodiments of the invention relate to methods for generating a multilingual acoustic model. A main acoustic model having probability distribution functions and a probabilistic state sequence model including first states is provided to a processor. At least one second acoustic model including probability distribution functions and a probabilistic state sequence model including states is also provided to the processor. The processor replaces each of the probability distribution functions of the at least one second acoustic model with one of the probability distribution functions of the main acoustic model, and/or each of the states of the probabilistic state sequence model of the at least one second acoustic model with a state of the probabilistic state sequence model of the main acoustic model, based on a criteria set, to obtain at least one modified second acoustic model. The criteria set may be a distance measurement.Type: ApplicationFiled: November 25, 2009Publication date: May 27, 2010Applicant: NUANCE COMMUNICATIONS, INC.Inventors: Rainer Gruhn, Martin Raab, Raymond Brueckner
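The replacement step above can be sketched minimally. In this hypothetical Python sketch, each probability distribution function is reduced to a 1-D Gaussian mean and the "criteria set" is a squared-Euclidean distance; all names are illustrative, not from the patent:

```python
def nearest_index(pdf, main_pdfs, dist):
    # Return the index of the main-model PDF closest to `pdf` under `dist`.
    return min(range(len(main_pdfs)), key=lambda i: dist(pdf, main_pdfs[i]))

def merge_models(main_pdfs, second_pdfs, dist):
    # Replace each PDF of the second acoustic model by its nearest
    # counterpart in the main model, yielding a modified second model
    # that only reuses the main model's distributions.
    return [main_pdfs[nearest_index(p, main_pdfs, dist)] for p in second_pdfs]

# Toy distance on 1-D Gaussian means (stand-in for the patent's criteria set).
euclid = lambda a, b: (a - b) ** 2
```

Because the modified second model shares all distributions with the main model, the combined multilingual model needs no extra distribution storage, which is the apparent point of the replacement.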
-
Patent number: 7725317Abstract: An interactive control system is disclosed that improves the recognition rate and responsiveness when operating a plurality of interactive services in parallel. A recognition lexicon and consolidated and reorganized information are generated for each individual interaction. Thus, excessive growth of the recognition lexicon can be avoided and a lowering of the recognition rate can be prevented. Moreover, based on the consolidated and reorganized information, it is possible to identify interactive services that may respond to the same input information, so that responses that are unexpected for the user can be prevented.Type: GrantFiled: July 14, 2004Date of Patent: May 25, 2010Assignee: Fujitsu LimitedInventors: Eiji Kitagawa, Toshiyuki Fukuoka, Ryosuke Miyata
-
Patent number: 7725314Abstract: A method and apparatus identify a clean speech signal from a noisy speech signal. To do this, a clean speech value and a noise value are estimated from the noisy speech signal. The clean speech value and the noise value are then used to define a gain on a filter. The noisy speech signal is applied to the filter to produce the clean speech signal. Under some embodiments, the noise value and the clean speech value are used in both the numerator and the denominator of the filter gain, with the numerator being guaranteed to be positive.Type: GrantFiled: February 16, 2004Date of Patent: May 25, 2010Assignee: Microsoft CorporationInventors: Jian Wu, James G. Droppo, Li Deng, Alejandro Acero
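A rough sketch of the described filter, assuming a spectral-gain form in which both the clean-speech and noise estimates appear in numerator and denominator and the numerator is floored to stay positive; the exact gain formula here is a guess, not the patented one:

```python
def filter_gain(clean_est, noise_est, floor=1e-3):
    # Hypothetical Wiener-style gain: both estimates appear in the
    # numerator and the denominator, and flooring guarantees the
    # numerator is positive, as the abstract requires.
    num = max(clean_est - noise_est, floor)
    return num / (clean_est + noise_est)

def apply_filter(noisy_values, gains):
    # Scale each noisy spectral value by its corresponding gain to
    # produce the clean speech estimate.
    return [x * g for x, g in zip(noisy_values, gains)]
```

The floor matters: when the noise estimate exceeds the clean-speech estimate, a naive difference would go negative and flip the sign of the output.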
-
Patent number: 7716047Abstract: A system and method for an automatic set-up of speech recognition engines may include a speech recognizer configured to perform speech recognition procedures to identify input speech data according to one or more operating parameters. A merit manager may be utilized to automatically calculate merit values corresponding to the foregoing recognition procedures. These merit values may incorporate recognition accuracy information, recognition speed information, and a user-specified weighting factor that shifts the relative effect of the recognition accuracy information and the recognition speed information on the merit values. The merit manager may then automatically perform a merit value optimization procedure to select operating parameters that correspond to an optimal one of the merit values.Type: GrantFiled: March 31, 2003Date of Patent: May 11, 2010Assignees: Sony Corporation, Sony Electronics Inc.Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Thomas Kemp, Katsuki Minamino, Helmut Lucke
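The merit calculation might look like the following sketch, where the weighting factor `w` shifts the relative effect of accuracy and speed; the linear blend is an assumption, not the patented formula:

```python
def merit(accuracy, speed, w):
    # Weighted blend of recognition accuracy and recognition speed;
    # w in [0, 1] shifts the relative effect of the two terms.
    return w * accuracy + (1.0 - w) * speed

def best_parameters(candidates, w):
    # candidates: list of (params, accuracy, speed) triples from trial
    # recognition runs; pick the operating parameters whose merit value
    # is optimal (here: maximal).
    return max(candidates, key=lambda c: merit(c[1], c[2], w))[0]
```

With a high `w` the optimization favors accurate-but-slow parameter sets; with a low `w` it favors fast ones, which matches the abstract's description of the user-specified weighting factor.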
-
Patent number: 7707032Abstract: A method and system for determining the similarity between input speech data and sample speech data are provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.Type: GrantFiled: October 20, 2005Date of Patent: April 27, 2010Assignee: National Cheng Kung UniversityInventors: Jhing-Fa Wang, Po-Chuan Lin, Li-Chang Wen
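The matching-matrix construction can be illustrated with a short sketch; the scoring rule used here (average of each input frame's best match, lower meaning more similar) is a hypothetical stand-in for the patented score:

```python
def matching_matrix(input_frames, sample_frames, dist):
    # Distance between every input frame and every sample frame;
    # row i, column j holds dist(input_frames[i], sample_frames[j]).
    return [[dist(a, b) for b in sample_frames] for a in input_frames]

def matching_score(matrix):
    # Hypothetical scoring rule: average each input frame's distance to
    # its best-matching sample frame; lower means more similar.
    return sum(min(row) for row in matrix) / len(matrix)
```

In practice the frames would be feature vectors (e.g. MFCCs) and `dist` a vector distance; scalar frames are used here only to keep the sketch self-contained.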
-
Patent number: 7684987Abstract: A phone set for use in speech processing such as speech recognition or text-to-speech conversion is used to model or form syllables of a tonal language having a plurality of different tones. Each syllable includes an initial part that can be glide dependent and a final part. The final part includes a plurality of phones. Each phone carries partial tonal information such that the phones taken together implicitly and jointly represent the different tones.Type: GrantFiled: January 21, 2004Date of Patent: March 23, 2010Assignee: Microsoft CorporationInventors: Min Chu, Chao Huang
-
Patent number: 7680659Abstract: A method of training language model parameters trains discriminative model parameters in the language model based on a performance measure having discrete values.Type: GrantFiled: June 1, 2005Date of Patent: March 16, 2010Assignee: Microsoft CorporationInventors: Jianfeng Gao, Hisami Suzuki
-
Patent number: 7680658Abstract: A method and apparatus for enhancing the performance of speech recognition by adaptively changing the process of determining the final recognized word depending on the user's selection from a list of alternative words produced by speech recognition. A speech recognition method comprising: inputting speech uttered by a user; recognizing the input speech and creating a predetermined number of alternative words, ordered by similarity; and displaying a list of the alternative words arranged in a predetermined order and determining the alternative word that the cursor currently indicates as the final recognized word if the user's selection from the list of alternative words has not changed within a predetermined standby time.Type: GrantFiled: December 31, 2003Date of Patent: March 16, 2010Assignee: Samsung Electronics Co., Ltd.Inventors: Seung-nyung Chung, Myung-hyun Yoo, Jay-woo Kim, Joon-ah Park
-
Publication number: 20100063816Abstract: A method for processing an analog speech signal for speech recognition. The analog speech signal is sampled to produce a sampled speech signal. The sampled speech signal is framed into multiple frames of the sampled speech signal. The absolute value of the sampled speech signal is integrated within the frames and respective integrated-absolute values of the frames are determined. Based on the integrated-absolute values, the sampled speech signal is cut into segments of non-uniform duration. The segments are not as yet identified as parts of speech prior to and during the cutting.Type: ApplicationFiled: September 7, 2008Publication date: March 11, 2010Inventors: Ronen Faifkov, Rabin Cohen-Tov
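A toy version of the framing and cutting steps, with a simple energy threshold standing in for the patent's (unspecified) cutting rule:

```python
def frame_energies(samples, frame_len):
    # Frame the sampled signal and integrate (sum) the absolute value
    # of the samples within each frame.
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [sum(abs(x) for x in f) for f in frames]

def cut_segments(energies, threshold):
    # Hypothetical cutting rule: cut at frames whose integrated-absolute
    # value falls below the threshold, producing segments of consecutive
    # "active" frame indices with non-uniform duration.
    segments, current = [], []
    for i, e in enumerate(energies):
        if e >= threshold:
            current.append(i)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

Note that, as the abstract stresses, nothing here labels a segment as a part of speech; the cut is driven purely by the integrated-absolute values.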
-
Patent number: 7668716Abstract: A device improves speech recognition accuracy by utilizing an external knowledge source. The device receives a speech recognition result from an automatic speech recognition (ASR) engine, the speech recognition result including a plurality of ordered interpretations, wherein each of the ordered interpretations includes a plurality of information items. The device analyzes and filters the plurality of interpretations using an external knowledge source to create a filtered plurality of ordered interpretations. The device stores the filtered plurality of ordered interpretations to a memory. The device transmits the filtered plurality of ordered interpretations to a dialog manager module to create a textual output. Alternatively, the dialog manager module retrieves the filtered plurality of ordered interpretations from a memory.Type: GrantFiled: May 5, 2006Date of Patent: February 23, 2010Assignee: Dictaphone CorporationInventors: Marc Helbing, Klaus Reifenrath
-
Patent number: 7657433Abstract: A speech recognition system uses multiple confidence thresholds to improve the quality of speech recognition results. The choice of which confidence threshold to use for a particular utterance may be based on one or more features relating to the utterance. In one particular implementation, the speech recognition system includes a speech recognition engine that provides speech recognition results and a confidence score for an input utterance. The system also includes a threshold selection component that determines, based on the received input utterance, a threshold value corresponding to the input utterance. The system further includes a threshold component that accepts the recognition results based on a comparison of the confidence score to the threshold value.Type: GrantFiled: September 8, 2006Date of Patent: February 2, 2010Assignee: TellMe Networks, Inc.Inventor: Shuangyu Chang
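The threshold-selection idea can be sketched as a rule table keyed on utterance features; the rules and feature names below are hypothetical, not from the patent:

```python
def select_threshold(utterance_features, rules, default=0.5):
    # Pick a confidence threshold from per-feature rules; e.g. very
    # short utterances might warrant a stricter threshold. Each rule is
    # a (predicate, threshold) pair tried in order.
    for predicate, threshold in rules:
        if predicate(utterance_features):
            return threshold
    return default

def accept(confidence, utterance_features, rules):
    # Accept the recognition result only if its confidence score meets
    # the threshold chosen for this particular utterance.
    return confidence >= select_threshold(utterance_features, rules)
```

The point of per-utterance thresholds is that a single global threshold trades off false accepts and false rejects uniformly, while utterance features let the system be strict only where errors are likely.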
-
Publication number: 20100017207Abstract: A signal is used to form intermediate feature vectors which are subjected to high-pass filtering. The high-pass-filtered intermediate feature vectors have a respective prescribed addition feature vector added to them.Type: ApplicationFiled: September 24, 2009Publication date: January 21, 2010Applicant: Infineon Technologies AGInventors: Werner Hemmert, Marcus Holmberg
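One way to picture the pipeline, using a first-difference high-pass as a stand-in for the patent's (unspecified) filter:

```python
def highpass(vectors):
    # First-difference high-pass (one simple choice) applied per
    # dimension across the sequence of intermediate feature vectors.
    return [[c - p for c, p in zip(prev, cur)] and [c - p for p, c in zip(prev, cur)]
            for prev, cur in zip(vectors, vectors[1:])]

def add_offset(vectors, addition):
    # Add the prescribed addition feature vector to each
    # high-pass-filtered feature vector.
    return [[v + a for v, a in zip(vec, addition)] for vec in vectors]
```

Since high-pass filtering removes the per-dimension mean level, the added vector plausibly restores a prescribed operating point for the recognizer; that interpretation is a guess from the abstract alone.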
-
Patent number: 7647224Abstract: A speech recognition apparatus includes a sound information acquiring unit that acquires sound information, a unit segment dividing unit that divides the sound information into plural unit segments, a segment information acquiring unit that acquires segment information that indicates a feature of each unit segment, a segment relation value calculating unit that calculates a segment relation value that indicates a relative feature of a target segment which is a unit segment to be processed with respect to an adjacent segment which is a unit segment adjacent to the target segment, based on segment information of the target segment and segment information of the adjacent segment among the segment information, a recognition candidate storing unit that stores recognition candidates that are targets of speech recognition, and a recognition result selecting unit that selects a recognition result from the recognition candidates stored in the recognition candidate storing unit utilizing the segment relation value.Type: GrantFiled: November 23, 2005Date of Patent: January 12, 2010Assignee: Kabushiki Kaisha ToshibaInventors: Masahide Ariu, Shinichi Tanaka, Takashi Masuko
-
Publication number: 20090319268Abstract: A method and an apparatus for measuring the intelligibility level of an audio announcement device (40) employ at least one speech recognition module (418; 518) for analyzing the reconstructed verbal content of the audio message announced by the audio announcement device (40), optionally by comparison with the verbal content of an original audio message.Type: ApplicationFiled: June 19, 2009Publication date: December 24, 2009Applicant: ARCHEAN TECHNOLOGIESInventors: Xavier AUMONT, Antoine WILHELM-JAUREGUIBERRY