Distance Patents (Class 704/238)
-
Patent number: 8595009
Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) it includes at least one music segment longer than a predetermined minimum song duration, 2) it is shorter than a predetermined maximum song duration, 3) it both starts and ends with a music clip, and 4) the proportion of music clips in the section is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
Type: Grant
Filed: July 26, 2012
Date of Patent: November 26, 2013
Assignee: Dolby Laboratories Licensing Corporation
Inventors: Lie Lu, Claus Bauer
-
Patent number: 8593673
Abstract: A method for routing a confirmation of receipt of a facsimile or portion thereof according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing one or more confirmations to one or more destinations based on the analysis. A method for routing one or more confirmations according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing one or more confirmations to one or more destinations based on the correlation. Additional systems and methods are also presented.
Type: Grant
Filed: October 20, 2010
Date of Patent: November 26, 2013
Assignee: Kofax, Inc.
Inventors: Roy Couchman, Roland G. Borrey
-
Patent number: 8554560
Abstract: Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. It further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and, from that classification, determining values for at least one weighting factor. It still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors, and classifying the combined feature vector for each of the frames using a set of classifiers trained for at least two classes of events.
Type: Grant
Filed: September 4, 2012
Date of Patent: October 8, 2013
Assignee: International Business Machines Corporation
Inventor: Zica Valsan
-
Patent number: 8548806
Abstract: A voice recognition device, a voice recognition method and a voice recognition program capable of appropriately restricting recognition objects based on voice input from a user to recognize the input voice with accuracy are provided.
Type: Grant
Filed: September 11, 2007
Date of Patent: October 1, 2013
Assignee: Honda Motor Co. Ltd.
Inventor: Hisayuki Nagashima
-
Patent number: 8532988
Abstract: A method for searching for an input symbol string includes receiving (B) an input symbol string, proceeding (C) in a trie data structure to a calculation point indicated by the next symbol, calculating (D) distances at the calculation point, and repeatedly selecting (E) the next branch to follow (C) to the next calculation point and repeating the calculation (D). After the calculations, the symbol string having the shortest distance to the input symbol string is selected (G) on the basis of the performed calculations. To minimize the number of calculations, not only are the distances calculated (D) at the calculation points, but also the smallest possible length difference corresponding to each distance; from each distance and its corresponding length difference a reference value is calculated, and the branch is selected (E) such that the routine next proceeds from the calculation point producing the lowest reference value.
Type: Grant
Filed: July 3, 2003
Date of Patent: September 10, 2013
Assignee: Syslore Oy
Inventor: Jorkki Hyvonen
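Read brute-force, the shortest-distance search described above is a Levenshtein-style edit distance minimized over a stored vocabulary; the trie and the length-difference reference value exist to prune that search. A minimal sketch of the underlying distance (illustrative only; `best_match` is a hypothetical helper showing the brute-force baseline, not the patented trie routine):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming: the minimum
    number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(query, vocabulary):
    """Return the stored symbol string with the smallest edit distance
    to the query (brute force over the whole vocabulary)."""
    return min(vocabulary, key=lambda w: edit_distance(query, w))
```

The length-difference bound the abstract exploits is `abs(len(a) - len(b)) <= edit_distance(a, b)`, which allows a branch of the trie to be discarded before its distances are fully computed.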
-
Patent number: 8521529
Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
Type: Grant
Filed: April 18, 2005
Date of Patent: August 27, 2013
Assignee: Creative Technology Ltd
Inventors: Michael M. Goodwin, Jean Laroche
-
Publication number: 20130166294
Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
Type: Application
Filed: November 30, 2012
Publication date: June 27, 2013
Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
Inventor: AT&T Intellectual Property II, L.P.
-
Patent number: 8451475
Abstract: A method for routing a confirmation of receipt of a facsimile or portion thereof according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing one or more confirmations to one or more destinations based on the analysis. A method for routing one or more confirmations according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing one or more confirmations to one or more destinations based on the correlation. Additional systems and methods are also presented.
Type: Grant
Filed: May 1, 2007
Date of Patent: May 28, 2013
Assignee: Kofax, Inc.
Inventors: Roy Couchman, Roland G. Borrey
-
Patent number: 8380511
Abstract: A system and method for automatically performing semantic categorization are disclosed. In one embodiment, at least one text description pertaining to a category set is accepted along with words that are anticipated to be uttered by a user pertaining to that category set; a lexical chaining confidence score is attached to each pair matched between the anticipated words and the accepted text description. These confidence scores are used subsequently by a categorization circuit that accepts a text phrase utterance from an input source along with a category set pertaining to the accepted utterance. The categorization circuit, in one embodiment, creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word-pair scores, the category pertaining to the utterance is determined based, at least in part, on the previously assigned lexical chaining confidence scores.
Type: Grant
Filed: February 20, 2007
Date of Patent: February 19, 2013
Assignees: Intervoice Limited Partnership, Lymba Corporation
Inventors: Ellis K. Cave, Mithun Balakrishna, Vincent Mo
-
Patent number: 8352262
Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one n
Type: Grant
Filed: June 16, 2011
Date of Patent: January 8, 2013
Assignee: Zentian Limited
Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
-
Patent number: 8352263
Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since only the F most similar unknown voices are found from the m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
Type: Grant
Filed: September 29, 2009
Date of Patent: January 8, 2013
Inventors: Tze-Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
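The classification step described above amounts to ranking the m unknown voices by similarity to the input and keeping the top F. A toy sketch, assuming each unknown voice is summarized by a single centroid feature vector and similarity is Euclidean distance (both assumptions; the patent does not commit to this representation):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def top_f_categories(x, centroids, f):
    """Return the indices of the f category centroids closest to
    feature vector x, nearest first."""
    ranked = sorted(range(len(centroids)), key=lambda i: euclidean(x, centroids[i]))
    return ranked[:f]
```

Words belonging to the returned categories would then be ordered by pronunciation similarity, as the abstract describes.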
-
Patent number: 8346550
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Grant
Filed: February 14, 2011
Date of Patent: January 1, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Publication number: 20120310646
Abstract: A speech recognition device and a speech recognition method thereof are disclosed. In the speech recognition method, a key phrase containing at least one key word is received. The method comprises the steps of: receiving a sound source signal of a key word and generating a plurality of audio signals; transforming the audio signals into a plurality of frequency signals; receiving the frequency signals to obtain a space-frequency spectrum and an angular estimation value thereof; receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, and using the angular estimation value and the frequency signals to perform spotting and evaluation and to output a Bhattacharyya distance; and receiving the spatial eigenparameter and the Bhattacharyya distance and using corresponding thresholds to determine the correctness of the key phrase. Thereby the invention robustly achieves a high speech recognition rate under very low SNR conditions.
Type: Application
Filed: July 7, 2011
Publication date: December 6, 2012
Applicant: NATIONAL CHIAO TUNG UNIVERSITY
Inventors: JWU-SHENG HU, MING-TANG LEE, TING-CHAO WANG, CHIA HSIN YANG
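The verdict step above thresholds a Bhattacharyya distance. For reference, the distance has a simple closed form between two univariate Gaussians, which is a common simplification for sketching it (the patent computes it over its own spotting statistics, not necessarily this case):

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between univariate Gaussians N(mu1, var1)
    and N(mu2, var2); zero iff the two distributions coincide."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))
```

Unlike KL divergence, this distance is symmetric in its two arguments, which makes it convenient for accept/reject thresholding.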
-
Patent number: 8311813
Abstract: Discrimination between at least two classes of events in an input signal is carried out in the following way. A set of frames containing an input signal is received, and at least two different feature vectors are determined for each of said frames. Said at least two different feature vectors are classified using respective sets of preclassifiers trained for said at least two classes of events. Values for at least one weighting factor are determined based on outputs of said preclassifiers for each of said frames. A combined feature vector is calculated for each of said frames by applying said at least one weighting factor to said at least two different feature vectors. Said combined feature vector is classified using a set of classifiers trained for said at least two classes of events.
Type: Grant
Filed: October 26, 2007
Date of Patent: November 13, 2012
Assignee: International Business Machines Corporation
Inventor: Zica Valsan
-
Patent number: 8279465
Abstract: A method for routing a facsimile according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing the facsimile to one or more destinations based on the analysis. A method for routing a facsimile according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing the facsimile to one or more destinations based on the correlation. Additional systems and methods are also presented.
Type: Grant
Filed: May 1, 2007
Date of Patent: October 2, 2012
Assignee: Kofax, Inc.
Inventor: Roy Couchman
-
Patent number: 8265932
Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples, each having a preceding audio segment, a reference phrase audio segment, and a trailing audio segment. The trailing segment is removed and each of the preceding and reference phrase segments is divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. An audio command prompt matches a signature when the correlation value for that audio command prompt satisfies a threshold.
Type: Grant
Filed: October 3, 2011
Date of Patent: September 11, 2012
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
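The matching step above, comparing DFT buffers and thresholding a correlation value, can be sketched with a naive O(N²) DFT and normalized correlation. All function names and the 0.95 threshold are illustrative assumptions, not the patented implementation (which also performs the dissimilarity-based signature selection):

```python
import cmath
import math

def dft_magnitudes(buf):
    """Magnitude spectrum of a buffer via a naive O(N^2) DFT."""
    n = len(buf)
    return [abs(sum(buf[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def correlation(u, v):
    """Normalized correlation (cosine similarity) of two magnitude
    spectra; 1.0 means a perfect match.  Assumes non-silent buffers."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def matches(prompt_buf, signature_buf, threshold=0.95):
    """A prompt matches a signature when the spectral correlation
    satisfies the threshold."""
    return correlation(dft_magnitudes(prompt_buf),
                       dft_magnitudes(signature_buf)) >= threshold
```

Comparing magnitude spectra rather than raw samples makes the match insensitive to small time offsets between the prompt and the stored signature.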
-
Patent number: 8239201
Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user.
Type: Grant
Filed: October 24, 2008
Date of Patent: August 7, 2012
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Horst Schroeter
-
Patent number: 8234116
Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state-pair-by-state-pair comparisons.
Type: Grant
Filed: August 22, 2006
Date of Patent: July 31, 2012
Assignee: Microsoft Corporation
Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
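Context for the abstract above: KLD has a closed form between two single Gaussians, but not between Gaussian mixtures, which is why an approximation such as the unscented transform is needed. The univariate closed form (standard textbook result, not the patented mixture approximation):

```python
import math

def kld_gaussian(mu1, var1, mu2, var2):
    """Closed-form KL divergence KL(N1 || N2) between two univariate
    Gaussians.  No such closed form exists for mixtures of Gaussians,
    hence the need for approximations like the unscented transform."""
    return (0.5 * math.log(var2 / var1)
            + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2)
            - 0.5)
```

Note that KLD is asymmetric: KL(N1 || N2) generally differs from KL(N2 || N1), which matters when summing per-state-pair divergences as the abstract describes.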
-
Publication number: 20120166194
Abstract: Disclosed herein are an apparatus and method for recognizing speech. The apparatus includes a frame-based speech recognition unit, a segment division unit, a segment feature extraction unit, a segment speech recognition performance unit, and a combination and synchronization unit. The frame-based speech recognition unit extracts frame speech feature vectors from a speech signal, and performs speech recognition on frames of the speech signal using the frame speech feature vectors and a frame-based probability model. The segment division unit divides the speech signal into segments. The segment feature extraction unit extracts segment speech feature vectors around a boundary between the segments. The segment speech recognition performance unit performs speech recognition on the segments of the speech signal using the segment speech feature vectors and a segment-based probability model.
Type: Application
Filed: December 22, 2011
Publication date: June 28, 2012
Applicant: Electronics and Telecommunications Research Institute
Inventors: Ho-Young JUNG, Jeon-Gue PARK, Hoon CHUNG
-
Patent number: 8209174
Abstract: A text-independent speaker verification system utilizes mel frequency cepstral coefficients analysis in the feature extraction blocks, template modeling with vector quantization in the pattern matching blocks, an adaptive threshold and an adaptive decision verdict, and is implemented in a stand-alone device using less powerful microprocessors and smaller data storage devices than used by comparable systems of the prior art.
Type: Grant
Filed: April 17, 2009
Date of Patent: June 26, 2012
Assignee: Saudi Arabian Oil Company
Inventor: Essam Abed Al-Telmissani
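The vector-quantization pattern matching named above typically scores a claimed speaker by the average distortion of their feature vectors against a stored codebook. A minimal sketch with a fixed threshold (the patent's threshold is adaptive, and the function names are illustrative):

```python
import math

def distortion(features, codebook):
    """Average Euclidean distance from each feature vector to its
    nearest codeword; low distortion means the voice fits the template."""
    def d(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return sum(min(d(f, c) for c in codebook) for f in features) / len(features)

def verify(features, codebook, threshold):
    """Accept the claimed speaker when the distortion is within
    threshold (simplification of the adaptive decision verdict)."""
    return distortion(features, codebook) <= threshold
```

In practice the features would be MFCC vectors per frame and the codebook would be trained with an algorithm such as LBG/k-means during enrollment.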
-
Patent number: 8204749
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
Type: Grant
Filed: March 21, 2011
Date of Patent: June 19, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Patent number: 8195451
Abstract: In an information detecting apparatus (1), a speech kind discrimination unit (11) discriminates and classifies an audio signal at an information source into kinds (categories) such as music or speech on a predetermined time basis, and a memory unit/recording medium (13) records the discrimination information. A discrimination frequency calculating unit (15) calculates the discrimination frequency of each kind over a predetermined time period longer than the time unit.
Type: Grant
Filed: February 10, 2004
Date of Patent: June 5, 2012
Assignee: Sony Corporation
Inventor: Yasuhiro Toguri
-
Publication number: 20120109650
Abstract: Disclosed herein is an apparatus and method for creating an acoustic model. The apparatus includes a binary tree creation unit, an information creation unit, and a binary tree reduction unit. The binary tree creation unit creates a binary tree by repeatedly merging a plurality of Gaussian components for each Hidden Markov Model (HMM) state of an acoustic model, based on a distance measure reflecting the variation in likelihood score. The information creation unit creates information about the largest size of the acoustic model in accordance with a platform including a speech recognizer. The binary tree reduction unit reduces the binary tree in accordance with the information about the largest size of the acoustic model.
Type: Application
Filed: October 28, 2011
Publication date: May 3, 2012
Applicant: Electronics and Telecommunications Research Institute
Inventors: Hoon-Young CHO, Young-Ik Kim, Il-Bin Lee, Seung-Hi Kim, Jun Park, Dong-Hyun Kim, Sang-Hun Kim
-
Patent number: 8170873
Abstract: An approach to comparing events in word spotting, such as comparing putative and reference instances of a keyword, makes use of a set of models of subword units. For each of two acoustic events and for each of a series of times in each of the events, a probability associated with each of the models of the set of subword units is computed. Then, a quantity characterizing a comparison of the two acoustic events, one occurring in each of the two acoustic signals, is computed using the computed probabilities associated with each of the models.
Type: Grant
Filed: July 22, 2004
Date of Patent: May 1, 2012
Assignee: Nexidia Inc.
Inventor: Robert W. Morris
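One way to realize the comparison described above is to treat each event as a sequence of per-frame posterior vectors over the subword units and score aligned frames. The sketch below uses a Bhattacharyya-coefficient frame score and assumes equal-length, pre-aligned events; both choices are illustrative assumptions, not the formula claimed by the patent:

```python
import math

def frame_similarity(p, q):
    """Similarity of two frames' subword-unit probability vectors:
    the Bhattacharyya coefficient sum(sqrt(p_i * q_i)), 1.0 for
    identical distributions and 0.0 for disjoint support."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def event_similarity(event_a, event_b):
    """Compare two equal-length acoustic events frame by frame using
    the per-frame model probabilities (the patent does not require
    equal lengths; an alignment step would normally precede this)."""
    scores = [frame_similarity(p, q) for p, q in zip(event_a, event_b)]
    return sum(scores) / len(scores)
```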
-
Patent number: 8140330
Abstract: Embodiments of a method and system for detecting repeated patterns in dialog systems are described. The system includes a dynamic time warping (DTW) based pattern comparison algorithm that is used to find the best matching parts between a correction utterance and an original utterance. Reference patterns are generated from the correction utterance by an unsupervised segmentation scheme. No significant information about the position of the repeated parts in the correction utterance is assumed, as each reference pattern is compared with the original utterance from the beginning of the utterance to the end. A pattern comparison process with DTW is executed without knowledge of fixed end-points. A recursive DTW computation is executed to find the best matching parts that are considered as the repeated parts as well as the end-points of the utterance.
Type: Grant
Filed: June 13, 2008
Date of Patent: March 20, 2012
Assignee: Robert Bosch GmbH
Inventors: Mert Cevik, Fuliang Weng
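For reference, the core DTW comparison named above computes the minimum cumulative frame distance over all monotonic alignments of two sequences. A textbook sketch with fixed end-points (the patent's contribution is precisely relaxing the end-point constraint, which this sketch does not do):

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two sequences: the minimum
    cumulative frame distance over all monotonic alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j],      # a advances alone
                cost[i][j - 1],      # b advances alone
                cost[i - 1][j - 1])  # both advance (match)
    return cost[n][m]
```

Because the warp can repeat frames of either sequence, a slow repetition of the same words aligns to the original at zero cost, which is what makes DTW suitable for spotting repeated parts of a correction utterance.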
-
Patent number: 8050917
Abstract: A system including a conferencing telephone coupled to or in communication with an identification service. The identification service is configured to poll user devices of conference participants to determine or confirm identities. In response, the user devices transmit audio electronic business cards, which can include user voice samples and/or preprocessed voice recognition data. The identification service stores the resulting audio electronic business card data. When the corresponding participant speaks during the conference, the identification service identifies the speaker.
Type: Grant
Filed: September 27, 2007
Date of Patent: November 1, 2011
Assignee: Siemens Enterprise Communications, Inc.
Inventors: Rami Caspi, William J. Beyda
-
Patent number: 8036888
Abstract: A sound input from sound sources existing in a plurality of directions is accepted and converted into a signal on a frequency axis. A suppressing function to suppress the converted signal on the frequency axis is computed, the amplitude component of the signal is multiplied by the computed suppressing function, and the converted signal is thereby corrected. The phase component of each converted signal is computed for each frequency, and the difference of the phase components is computed. A probability value indicative of the probability that a sound source exists in a predetermined direction is specified based on the computed difference, and a suppressing function to suppress sound input from sound sources other than the one in the predetermined direction is computed based on the specified probability value.
Type: Grant
Filed: September 13, 2006
Date of Patent: October 11, 2011
Assignee: Fujitsu Limited
Inventor: Naoshi Matsuo
-
Patent number: 8024186
Abstract: Embodiments of these location-based systems and methods for device interaction may allow a content delivery system to provide certain content to a device, or restrict certain content from being delivered to the device, based on the location of the device. When a user requests certain content, the location of the device may be determined and compared against an access control list defining a set of rules regarding that content to determine whether the requested content may be accessed from that location. If the content may be accessed from this location, the content may be delivered; otherwise an error message, or another option, may be delivered to the device. Similarly, the location of a device may be utilized to tailor the delivery of content to a device, such that content may be provided to a user based on the user's location, in certain cases with little or no stimulus from the user.
Type: Grant
Filed: May 24, 2006
Date of Patent: September 20, 2011
Assignee: Mobitv, Inc.
Inventor: Jeremy S. De Bonet
-
Patent number: 8024188
Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
Type: Grant
Filed: August 24, 2007
Date of Patent: September 20, 2011
Assignee: Robert Bosch GmbH
Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
-
Patent number: 8005674
Abstract: A recognition model set is generated. A technique is described that uses the log likelihood of real data, via cross entropy, to measure the mismatch between training data and a model derived from that training data, and that compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change in cross entropies in the decision of adding class-independent Gaussian Mixture Models (GMMs), the good performance of class-dependent models is largely retained while the size and complexity of the model are decreased.
Type: Grant
Filed: July 10, 2007
Date of Patent: August 23, 2011
Assignee: International Business Machines Corporation
Inventors: Eric W Janke, Bin Jia
-
Publication number: 20110172999
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
Type: Application
Filed: March 21, 2011
Publication date: July 14, 2011
Applicant: AT&T Corp.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Patent number: 7979277
Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least o
Type: Grant
Filed: September 14, 2005
Date of Patent: July 12, 2011
Assignee: Zentian Limited
Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
-
Publication number: 20110166857
Abstract: A human voice distinguishing method and device are provided. The method involves taking every n sampling points of the current frame of audio signals as one subsection, where n is a positive integer, and judging whether two adjacent subsections exhibit a transition relative to a distinguishing threshold, i.e., whether the sliding maximum absolute values of the two adjacent subsections are respectively above and below the distinguishing threshold; if so, the current frame is determined to be human voice. The sliding maximum absolute value of a subsection is obtained as follows: take the maximum absolute intensity over the sampling points in the subsection as the initial maximum absolute value of the subsection, then take the maximum of the initial maximum absolute values of the subsection and the m subsections following it, where m is a positive integer, as the sliding maximum absolute value of the subsection.
Type: Application
Filed: September 15, 2009
Publication date: July 7, 2011
Applicant: ACTIONS SEMICONDUCTOR CO. LTD.
Inventors: Xiangyong Xie, Zhan Chen
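The procedure described above is concrete enough to sketch directly: compute per-subsection peaks, smooth them with a forward-looking sliding maximum, and flag the frame when adjacent smoothed peaks fall on opposite sides of the threshold. Function names and the sample layout are illustrative assumptions:

```python
def sliding_max_abs(samples, n, m):
    """For each subsection of n samples, the maximum absolute sample
    value over that subsection and the m subsections following it."""
    peaks = [max(abs(s) for s in samples[i:i + n])
             for i in range(0, len(samples) - n + 1, n)]
    return [max(peaks[i:i + m + 1]) for i in range(len(peaks))]

def is_human_voice(samples, n, m, threshold):
    """Declare the frame voice when two adjacent subsections cross the
    threshold in opposite directions (one above, the next below, or
    vice versa), i.e. the transition test from the abstract."""
    s = sliding_max_abs(samples, n, m)
    return any((a > threshold) != (b > threshold) for a, b in zip(s, s[1:]))
```

The intuition is that voiced speech alternates between loud and quiet stretches, whereas steady music or noise keeps its envelope on one side of the threshold.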
-
Patent number: 7970609
Abstract: Sound signals from sound sources present in multiple directions are accepted as inputs of multiple channels, and the signal of each channel is transformed into a signal on a frequency axis. A phase component of the transformed signal is calculated for each identical frequency, and the phase difference between the multiple channels is calculated. An amplitude component of the transformed signal is calculated, and a noise component is estimated from the calculated amplitude component. An SN ratio for each frequency is calculated on the basis of the amplitude component and the estimated noise component, and the frequencies at which the SN ratio is larger than a predetermined value are extracted. The difference between arrival distances is calculated on the basis of the phase difference at a selected frequency, and the arrival direction in which the target sound source is estimated to be present is calculated.
Type: Grant
Filed: July 20, 2007
Date of Patent: June 28, 2011
Assignee: Fujitsu Limited
Inventor: Shoji Hayakawa
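The final step above, going from an inter-channel phase difference to an arrival direction, follows standard two-microphone geometry: the phase difference at a frequency gives a time delay, the time delay gives a path-length difference, and the arcsine of that difference over the microphone spacing gives the angle. A sketch under those textbook assumptions (the patent's multi-channel, SNR-weighted procedure is more elaborate):

```python
import math

def arrival_angle(phase_diff, freq_hz, mic_spacing_m, c=343.0):
    """Estimate direction of arrival (radians from broadside) from the
    inter-channel phase difference at one frequency, for a two-mic
    array with the given spacing.  Clamps to [-1, 1] against rounding;
    assumes no phase wrapping (spacing below half a wavelength)."""
    time_delay = phase_diff / (2.0 * math.pi * freq_hz)   # seconds
    path_diff = c * time_delay                            # metres
    return math.asin(max(-1.0, min(1.0, path_diff / mic_spacing_m)))
```

Restricting the estimate to frequencies with a high SN ratio, as the abstract describes, keeps noisy phase measurements from corrupting the angle.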
-
Patent number: 7970115
Abstract: A communications system is provided that includes: (a) a speech discrimination agent 136 operable to generate a speech profile of a first party to a voice call; and (b) a speech modification agent 140 operable to adjust, based on the speech profile, a spectral characteristic of a voice stream from the first party to form a modified voice stream, the modified voice stream being provided to the second party.
Type: Grant
Filed: October 5, 2005
Date of Patent: June 28, 2011
Assignee: Avaya Inc.
Inventors: Marc W. J. Coughlan, Alexander Q. Forbes, Alexander M. Scholte, Peter D. Runcie, Ralph Warta
-
Patent number: 7949525
Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Grant
Filed: June 16, 2009
Date of Patent: May 24, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
-
Patent number: 7912720
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog between a human and a computing device, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include differential statistics, joint statistics and distance statistics.
Type: Grant
Filed: July 20, 2005
Date of Patent: March 22, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Patent number: 7904294
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Grant
Filed: April 9, 2007
Date of Patent: March 8, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Patent number: 7869997
Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
Type: Grant
Filed: March 28, 2008
Date of Patent: January 11, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Bishnu Saroop Atal
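The SVD transform described above can be approximated by SVD-based whitening, which rescales the principal axes so the training vectors sit roughly on a hypersphere, followed by nearest-neighbor comparison in the transformed space. This is a loose sketch under that assumption; the patent's specific distance-from-center comparison is not reproduced here.

```python
import numpy as np

def whiten_fit(X):
    """Fit an SVD-based whitening transform so the centered training
    vectors in X (rows) lie roughly on a hypersphere around their mean."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt.T / (s + 1e-12)  # project onto, then rescale, each principal axis
    return mu, W

def nearest_phoneme(x, stored, labels, mu, W):
    """Transform a received phoneme vector and return the label of the
    stored phoneme whose transformed point is closest."""
    z = (x - mu) @ W
    Z = (stored - mu) @ W
    d = np.linalg.norm(Z - z, axis=1)
    return labels[int(np.argmin(d))]
```

With two well-separated phoneme clusters, a query near either cluster picks up that cluster's label after whitening.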
-
Publication number: 20100324896
Abstract: A method for managing interactive dialog between a machine and a user. In one embodiment, an interaction between the machine and the user is managed by determining at least one likelihood value which is dependent upon a possible speech onset of the user. In another embodiment, the likelihood value can be dependent on a model of a desire of the user for specific items, a model of an attention of the user to specific items, or a model of turn-taking cues. The values can be used to determine a mode confidence value that is used by the system to determine the nature of prompts provided to the user.
Type: Application
Filed: August 24, 2010
Publication date: December 23, 2010
Inventors: David Attwater, Bruce Balentine
-
Patent number: 7853449
Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
Type: Grant
Filed: March 28, 2008
Date of Patent: December 14, 2010
Assignee: Nuance Communications, Inc.
Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
-
Publication number: 20100277579
Abstract: Disclosed are an apparatus and method of deducing a user's intention using motion information. The user's intention deduction apparatus includes a speech intention determining unit configured to predict a speech intention regarding a user's speech using motion information sensed by at least one motion capture sensor, and a controller configured to control operation of detecting a voice section from a received sound signal based on the predicted speech intention.
Type: Application
Filed: April 29, 2010
Publication date: November 4, 2010
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jeong-Mi CHO, Jeong-Su Kim, Won-Chul Bang, Nam-Hoon Kim
-
Patent number: 7797148
Abstract: A phrase-based translation system and method includes a statistically integrated phrase lattice (SIPL) (H) which represents an entire translation model. An input (I) is translated by determining a best path through an entire lattice (S) by performing an efficient composition operation between the input and the SIPL. The efficient composition operation is performed by a multiple level search where each operand in the efficient composition operation represents a different search level.
Type: Grant
Filed: June 4, 2008
Date of Patent: September 14, 2010
Assignee: International Business Machines Corporation
Inventors: Stanley Chen, Yuqing Gao, Bowen Zhou
-
Patent number: 7761296
Abstract: A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses.
Type: Grant
Filed: April 2, 1999
Date of Patent: July 20, 2010
Assignee: International Business Machines Corporation
Inventors: Raimo Bakis, Ellen M. Eide
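The distance described above — per-phoneme means of aligned feature vectors, compared between the original waveform and each synthesized hypothesis — can be sketched as below. The feature arrays and per-frame alignment labels are a hypothetical layout; the patent does not specify the feature set.

```python
import numpy as np

def phoneme_mean_distance(orig_feats, synth_feats, orig_align, synth_align):
    """For each phoneme, average the feature vectors aligned to it in the
    original and synthesized waveforms, then sum the Euclidean distances
    between the per-phoneme means.

    *_feats: (frames, dims) arrays; *_align: per-frame phoneme labels.
    """
    def means(feats, align):
        out = {}
        for ph in set(align):
            idx = [i for i, a in enumerate(align) if a == ph]
            out[ph] = feats[idx].mean(axis=0)
        return out

    m1 = means(orig_feats, orig_align)
    m2 = means(synth_feats, synth_align)
    shared = set(m1) & set(m2)
    return sum(float(np.linalg.norm(m1[p] - m2[p])) for p in shared)

def rescore(orig_feats, orig_align, hypotheses):
    """Pick the hypothesis text whose synthesized features are closest to
    the original. hypotheses: list of (text, feats, align) tuples."""
    return min(hypotheses,
               key=lambda h: phoneme_mean_distance(orig_feats, h[1],
                                                   orig_align, h[2]))[0]
```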
-
Patent number: 7747439
Abstract: A method is provided for correctly segmenting phonemes by determining a boundary indicating a start point and an end point of each of the segmented phonemes, and for correctly finding the phoneme in the speech signal by determining which phoneme in a phoneme recognition standard table corresponds to each of the segmented phonemes. Using this phoneme recognition method, the amount of computation can be significantly reduced, and the phoneme in the speech signal can be easily found by calculating probability distances between phonemes.
Type: Grant
Filed: March 5, 2007
Date of Patent: June 29, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventor: Hyun-Soo Kim
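A common choice for a "probability distance" between phoneme models is the Bhattacharyya distance between Gaussians. The abstract does not name a specific distance, so treating each table entry as a diagonal-covariance Gaussian is an assumption for illustration only.

```python
import numpy as np

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians,
    one common probability distance between phoneme models."""
    var = (var1 + var2) / 2.0
    term1 = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    term2 = 0.5 * np.log(np.prod(var) /
                         np.sqrt(np.prod(var1) * np.prod(var2)))
    return term1 + term2

def classify_segment(seg_mu, seg_var, table):
    """Match a segmented phoneme (summarized by its feature mean and
    variance) against a phoneme recognition standard table, modeled here
    as {name: (mu, var)}; return the closest phoneme name."""
    return min(table,
               key=lambda ph: bhattacharyya(seg_mu, seg_var, *table[ph]))
```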
-
Patent number: 7739102
Abstract: A system, method, and computer program product for domain-independent natural language understanding, including at least one of forming pairs of words and/or phrases in a sentence, wherein each word and/or phrase is paired with every other word and/or phrase; determining meanings for the words and/or phrases; assigning numeric codes that uniquely identify semantic concepts to those word and/or phrase meanings; comparing the numeric code of each word and/or phrase with each numeric code of the other word and/or phrase in the pair; selecting the pairs with the best relationships; combining highly-related pairs with other highly-related pairs to form longer groups of words; exchanging numeric codes for the longer groups with numeric codes having a weaker relationship to determine if the exchanged numeric codes provide an overall stronger relationship; and forming longer and longer groups until the sentence is understood.
Type: Grant
Filed: October 7, 2004
Date of Patent: June 15, 2010
Inventor: Howard J. Bender
-
Patent number: 7739111
Abstract: A pattern matching method for matching between a first symbol sequence and a second symbol sequence which is shorter than the first symbol sequence is provided. The method includes the steps of performing DP matching between the first and second symbol sequences to create a matrix of the DP matching transition, detecting the maximum length of lengths of consecutive correct answers based on the matrix of the DP matching transition, and calculating similarity based on the maximum length.
Type: Grant
Filed: August 9, 2006
Date of Patent: June 15, 2010
Assignee: Canon Kabushiki Kaisha
Inventor: Kazue Kaneko
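The core idea — align two symbol sequences, find the maximum run of consecutive matching symbols, and derive a similarity from it — can be sketched with the standard library's sequence matcher standing in for the patent's DP matching; the normalization by the shorter sequence's length is an assumption.

```python
import difflib

def max_consecutive_similarity(long_seq, short_seq):
    """Find the longest run of consecutive matching symbols between the
    two sequences and report it relative to the shorter sequence."""
    matcher = difflib.SequenceMatcher(None, long_seq, short_seq,
                                      autojunk=False)
    match = matcher.find_longest_match(0, len(long_seq), 0, len(short_seq))
    return match.size / len(short_seq)
```

For `"abcdefg"` vs `"xcdez"`, the longest consecutive match is `"cde"` (length 3), giving a similarity of 3/5.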
-
Patent number: 7725307
Abstract: An intelligent query system for processing voice-based queries is disclosed. This distributed client-server system, typically implemented on an intranet or over the Internet, accepts a user's queries at his/her computer, PDA or workstation using a speech input interface. After the user's query is converted from speech to text, a natural language engine, a database processor and a full-text SQL database are used to find a single answer that best matches the user's query. Both statistical and semantic decoding are used to assist and improve the performance of the query recognition.
Type: Grant
Filed: August 29, 2003
Date of Patent: May 25, 2010
Assignee: Phoenix Solutions, Inc.
Inventor: Ian M. Bennett
-
Patent number: 7707032
Abstract: A method and system used to determine the similarity between an input speech data and a sample speech data is provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.
Type: Grant
Filed: October 20, 2005
Date of Patent: April 27, 2010
Assignee: National Cheng Kung University
Inventors: Jhing-Fa Wang, Po-Chuan Lin, Li-Chang Wen
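The four steps above — frame the signals, build the matrix of pairwise frame distances, reduce it to a matching score, and threshold the score — can be sketched as below. The dynamic-time-warping reduction and the threshold rule are assumptions; the abstract does not specify how the score is computed from the matrix.

```python
import numpy as np

def similarity(input_frames, sample_frames, threshold=1.0):
    """Build the matching matrix of distances between every input frame
    and every sample frame, reduce it to a matching score via a standard
    DTW pass, and decide similarity against a threshold.

    Returns (score, is_similar); lower scores mean more similar.
    """
    A, B = np.asarray(input_frames, float), np.asarray(sample_frames, float)
    # Matching matrix: Euclidean distance between each pair of frames.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    # Accumulate the cheapest warping path through the matrix.
    n, m = D.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(acc[i - 1, j] if i else np.inf,
                       acc[i, j - 1] if j else np.inf,
                       acc[i - 1, j - 1] if i and j else np.inf)
            acc[i, j] = D[i, j] + best

    score = acc[-1, -1] / (n + m)  # path cost normalized by length
    return score, score <= threshold
```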
-
Patent number: 7672835
Abstract: An FFT unit performs an FFT process on high-frequency-eliminated, pitch-shifted voice data for one frame. A time scaling unit calculates a frequency amplitude, a phase, a phase difference between the present and immediately preceding frames, and an unwrapped version of the phase difference for each channel from which a frequency component was obtained by the FFT, detects a reference channel based on a peak among the frequency amplitudes, and calculates the phase of each channel in a synthesized voice based on the reference channel, using the results of the calculation. An IFFT unit processes each frequency component in accordance with the calculated phase, performs an IFFT process on the resulting frequency component, and produces synthesized voice data for one frame.
Type: Grant
Filed: December 19, 2005
Date of Patent: March 2, 2010
Assignee: Casio Computer Co., Ltd.
Inventor: Masaru Setoguchi
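The FFT / phase-unwrap / IFFT loop described above is the classic phase-vocoder structure. A minimal time-scaling sketch of that structure follows; it advances each channel's unwrapped phase difference at the stretched hop size and overlap-adds the IFFT output. The reference-channel step of the patent is omitted, and the frame sizes are arbitrary choices.

```python
import numpy as np

def time_scale(x, stretch, n_fft=512, hop=128):
    """Minimal phase-vocoder time scaling: FFT analysis frames, unwrapped
    per-channel phase differences, IFFT resynthesis at a stretched hop."""
    win = np.hanning(n_fft)
    # Analysis frames at the original hop.
    frames = [np.fft.rfft(win * x[i:i + n_fft])
              for i in range(0, len(x) - n_fft, hop)]
    bins = np.arange(n_fft // 2 + 1)
    expected = 2 * np.pi * bins * hop / n_fft  # nominal phase advance/frame

    out_hop = int(round(hop * stretch))
    out = np.zeros(len(frames) * out_hop + n_fft)
    phase = np.angle(frames[0])
    prev = np.angle(frames[0])
    for t, F in enumerate(frames):
        if t:
            # Unwrap the frame-to-frame phase difference around the
            # nominal advance, then scale it by the stretch factor.
            dphi = np.angle(F) - prev
            dphi = expected + np.mod(dphi - expected + np.pi,
                                     2 * np.pi) - np.pi
            phase = phase + dphi * stretch
            prev = np.angle(F)
        y = np.fft.irfft(np.abs(F) * np.exp(1j * phase))
        pos = t * out_hop
        out[pos:pos + n_fft] += win * y  # overlap-add synthesis
    return out
```

Stretching a 440 Hz tone by 2 roughly doubles its duration while keeping the pitch at 440 Hz, which is the point of scaling the phase advance rather than resampling.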