Distance Patents (Class 704/238)
  • Patent number: 8595009
    Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) being shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) having a proportion of music clips greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
    Type: Grant
    Filed: July 26, 2012
    Date of Patent: November 26, 2013
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Lie Lu, Claus Bauer
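The section-finding constraints in patent 8595009 above are concrete enough to illustrate with a small sketch. This is an illustration only, not Dolby's implementation; the one-second clip length and the three thresholds are assumed example values.

```python
# Sketch: derive candidate song sections from a sequence of clip classifications.
# Assumed parameters (not from the patent): 1-second clips, thresholds below.

def candidate_sections(labels, clip_len=1.0, min_song=60.0,
                       max_song=600.0, min_music_ratio=0.7):
    """labels: per-clip classes, e.g. ['music', 'speech', ...].
    Returns (start_clip, end_clip) index pairs meeting the four conditions."""
    n = len(labels)
    sections = []
    for start in range(n):
        if labels[start] != 'music':              # condition 3: starts with music
            continue
        for end in range(start, n):
            length = (end - start + 1) * clip_len
            if length > max_song:                 # condition 2: not too long
                break
            if labels[end] != 'music':            # condition 3: ends with music
                continue
            seg = labels[start:end + 1]
            if seg.count('music') / len(seg) < min_music_ratio:
                continue                          # condition 4: enough music overall
            run, longest = 0.0, 0.0               # condition 1: one long music run
            for lab in seg:
                run = run + clip_len if lab == 'music' else 0.0
                longest = max(longest, run)
            if longest >= min_song:
                sections.append((start, end))
    return sections
```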
  • Patent number: 8593673
    Abstract: A method for routing a confirmation of receipt of a facsimile or portion thereof according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing one or more confirmations to one or more destinations based on the analysis. A method for routing one or more confirmations according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing one or more confirmations to one or more destinations based on the correlation. Additional systems and methods are also presented.
    Type: Grant
    Filed: October 20, 2010
    Date of Patent: November 26, 2013
    Assignee: Kofax, Inc.
    Inventors: Roy Couchman, Roland G. Borrey
  • Patent number: 8554560
    Abstract: Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. Discrimination between two classes further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and, from that classification, determining values for at least one weighting factor. Discrimination between two classes still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors and classifying the combined feature vector for each of the frames by using a set of classifiers trained for at least two classes of events.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: October 8, 2013
    Assignee: International Business Machines Corporation
    Inventor: Zica Valsan
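The weighting step in patent 8554560 above (and in the related patent 8311813 further down this list) lends itself to a short sketch: per-frame confidences from two preclassifiers become a weighting factor applied to two feature streams before they are merged. The preclassifier interface and the max-posterior confidence measure below are assumptions for illustration, not IBM's implementation.

```python
import numpy as np

# Sketch: weight and combine two per-frame feature vectors using the outputs of
# two (hypothetical) preclassifiers, then hand the combined vector to a final
# classifier trained on the same classes of events.

def combine_features(f1, f2, preclf1, preclf2):
    """f1: (T, d1), f2: (T, d2) feature matrices for T frames.
    preclf1/preclf2: callables returning per-frame class posteriors of shape (T, C)."""
    conf1 = preclf1(f1).max(axis=1)         # confidence of stream 1, per frame
    conf2 = preclf2(f2).max(axis=1)         # confidence of stream 2, per frame
    w = conf1 / (conf1 + conf2 + 1e-12)     # per-frame weighting factor
    # weighted concatenation -> one combined feature vector per frame
    return np.hstack([w[:, None] * f1, (1.0 - w)[:, None] * f2])
```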
  • Patent number: 8548806
    Abstract: A voice recognition device, a voice recognition method and a voice recognition program capable of appropriately restricting recognition objects based on voice input from a user to recognize the input voice with accuracy are provided.
    Type: Grant
    Filed: September 11, 2007
    Date of Patent: October 1, 2013
    Assignee: Honda Motor Co. Ltd.
    Inventor: Hisayuki Nagashima
  • Patent number: 8532988
    Abstract: A method for searching for an input symbol string includes receiving (B) an input symbol string, proceeding (C) in a trie data structure to a calculation point indicated by the next symbol, calculating (D) distances at the calculation point, and repeatedly selecting (E) the next branch to follow (C) to the next calculation point, where the calculation (D) is repeated. After the calculations (G), the symbol string having the shortest distance to the input symbol string is selected on the basis of the performed calculations. To minimize the number of calculations, not only the distances are calculated (D) at the calculation points, but also the smallest possible length difference corresponding to each distance; on the basis of each distance and the corresponding length difference a reference value is calculated, and the branch is selected (E) in such a manner that the routine next proceeds from the calculation point producing the lowest reference value.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: September 10, 2013
    Assignee: Syslore Oy
    Inventor: Jorkki Hyvonen
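Patent 8532988 above describes a best-first search through a trie guided by edit-distance calculations. The sketch below captures the spirit under simplifying assumptions: each trie node carries a Levenshtein row against the query, and the node with the smallest row minimum (a valid lower bound on any completion beneath it) is expanded next. The patent's reference value additionally folds in a smallest-possible length difference; that refinement is omitted here, so this is a simplified sketch rather than the patented method.

```python
import heapq

# Sketch: best-first search over a trie, expanding the node whose Levenshtein
# row against the query has the smallest minimum (a lower bound on any word
# completed below that node).

def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node['$'] = w                          # terminal marker stores the word
    return root

def closest_word(trie, query):
    first_row = list(range(len(query) + 1))
    heap = [(min(first_row), 0, first_row, trie)]
    counter = 1                                # unique tie-breaker for the heap
    best = (float('inf'), None)
    while heap:
        bound, _, row, node = heapq.heappop(heap)
        if bound >= best[0]:
            break                              # no better completion is possible
        if '$' in node and row[-1] < best[0]:
            best = (row[-1], node['$'])
        for ch, child in node.items():
            if ch == '$':
                continue
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # insertion
                                   row[j] + 1,           # deletion
                                   row[j - 1] + cost))   # substitution / match
            heapq.heappush(heap, (min(new_row), counter, new_row, child))
            counter += 1
    return best                                # (distance, closest word)

# Example: closest_word(build_trie(['helsinki', 'house']), 'helsinky') -> (1, 'helsinki')
```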
  • Patent number: 8521529
    Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
    Type: Grant
    Filed: April 18, 2005
    Date of Patent: August 27, 2013
    Assignee: Creative Technology Ltd
    Inventors: Michael M. Goodwin, Jean Laroche
  • Publication number: 20130166294
    Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 27, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
    Inventor: AT&T Intellectual Property II, L.P.
  • Patent number: 8451475
    Abstract: A method for routing a confirmation of receipt of a facsimile or portion thereof according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing one or more confirmations to one or more destinations based on the analysis. A method for routing one or more confirmations according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing one or more confirmations to one or more destinations based on the correlation. Additional systems and methods are also presented.
    Type: Grant
    Filed: May 1, 2007
    Date of Patent: May 28, 2013
    Assignee: Kofax, Inc.
    Inventors: Roy Couchman, Roland G. Borrey
  • Patent number: 8380511
    Abstract: There is disclosed a system and method for automatically performing semantic categorization. In one embodiment, at least one text description pertaining to a category set is accepted along with words that are anticipated to be uttered by a user pertaining to that category set; a lexical chaining confidence score is attached to each pair matched between the anticipated words and the accepted text description. These confidence scores are used subsequently by a categorization circuit that accepts a text phrase utterance from an input source along with a category set pertaining to the accepted utterance. The categorization circuit, in one embodiment, creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word pairs, the category pertaining to the utterance is determined based, at least in part, on the previously assigned lexical chaining confidence scores.
    Type: Grant
    Filed: February 20, 2007
    Date of Patent: February 19, 2013
    Assignees: Intervoice Limited Partnership, Lymba Corporation
    Inventors: Ellis K. Cave, Mithun Balakrishna, Vincent Mo
  • Patent number: 8352262
    Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one n
    Type: Grant
    Filed: June 16, 2011
    Date of Patent: January 8, 2013
    Assignee: Zentian Limited
    Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
  • Patent number: 8352263
    Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices will be arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since only the F most similar unknown voices are found from the m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
    Type: Grant
    Filed: September 29, 2009
    Date of Patent: January 8, 2013
    Inventors: Tze-Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Patent number: 8346550
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: January 1, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Publication number: 20120310646
    Abstract: A speech recognition device and a speech recognition method thereof are disclosed. In the speech recognition method, a key phrase containing at least one key word is received. The speech recognition method comprises the steps of: receiving a sound source signal of a key word and generating a plurality of audio signals; transforming the audio signals into a plurality of frequency signals; receiving the frequency signals to obtain a space-frequency spectrum and an angular estimation value thereof; receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, and using the angular estimation value and the frequency signals to perform spotting and evaluation and output a Bhattacharyya distance; and receiving the spatial eigenparameter and the Bhattacharyya distance and using corresponding thresholds to determine the correctness of the key phrase. Thereby the invention robustly achieves a high speech recognition rate under very low SNR conditions.
    Type: Application
    Filed: July 7, 2011
    Publication date: December 6, 2012
    Applicant: NATIONAL CHIAO TUNG UNIVERSITY
    Inventors: JWU-SHENG HU, MING-TANG LEE, TING-CHAO WANG, CHIA HSIN YANG
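For reference, the Bhattacharyya distance named in publication 20120310646 above, written out for two multivariate Gaussians. The spotting, eigenparameter and thresholding stages of the application are not reproduced; the closed-form Gaussian case is just one common way the distance is computed.

```python
import numpy as np

# Bhattacharyya distance between two multivariate Gaussians N(mu1, cov1), N(mu2, cov2).

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1 = np.atleast_2d(cov1).astype(float)
    cov2 = np.atleast_2d(cov2).astype(float)
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return float(term1 + term2)

# Identical Gaussians give a distance of 0.
print(bhattacharyya_gaussian([0, 0], np.eye(2), [0, 0], np.eye(2)))  # 0.0
```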
  • Patent number: 8311813
    Abstract: Discrimination between at least two classes of events in an input signal is carried out in the following way. A set of frames containing an input signal is received, and at least two different feature vectors are determined for each of said frames. Said at least two different feature vectors are classified using respective sets of preclassifiers trained for said at least two classes of events. Values for at least one weighting factor are determined based on outputs of said preclassifiers for each of said frames. A combined feature vector is calculated for each of said frames by applying said at least one weighting factor to said at least two different feature vectors. Said combined feature vector is classified using a set of classifiers trained for said at least two classes of events.
    Type: Grant
    Filed: October 26, 2007
    Date of Patent: November 13, 2012
    Assignee: International Business Machines Corporation
    Inventor: Zica Valsan
  • Patent number: 8279465
    Abstract: A method for routing a facsimile according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing the facsimile to one or more destinations based on the analysis. A method for routing a facsimile according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing the facsimile to one or more destinations based on the correlation. Additional systems and methods are also presented.
    Type: Grant
    Filed: May 1, 2007
    Date of Patent: October 2, 2012
    Assignee: Kofax, Inc.
    Inventor: Roy Couchman
  • Patent number: 8265932
    Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples, each having preceding audio, reference phrase audio, and trailing audio segments. The trailing segment is removed and each of the preceding and reference phrase segments is divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. One such audio command prompt matches one such signature when the correlation value for that audio command prompt satisfies a threshold.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: September 11, 2012
    Assignee: Intellisist, Inc.
    Inventor: Martin R. M. Dunsmuir
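A sketch of the signature construction and matching described in patent 8265932 above, assuming fixed-size buffers and normalized correlation of DFT magnitudes as the similarity measure; the buffer size, correlation measure and threshold below are placeholders, not values from the patent.

```python
import numpy as np

# Sketch: pick, as the signature, the reference-phrase DFT buffer least similar to
# every buffer of the preceding audio, then match prompts against that signature.
# Assumes each segment is at least one buffer long.

def _dft_buffers(samples, size=512):
    n = len(samples) // size
    frames = np.reshape(samples[:n * size], (n, size))
    return np.abs(np.fft.rfft(frames, axis=1))

def _corr(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def make_signature(preceding, reference, size=512):
    pre = _dft_buffers(preceding, size)
    ref = _dft_buffers(reference, size)
    # worst-case similarity of each reference buffer to the preceding segment
    max_sim = [max(_corr(r, p) for p in pre) for r in ref]
    return ref[int(np.argmin(max_sim))]       # most dissimilar buffer -> signature

def prompt_matches(prompt, signature, size=512, threshold=0.8):
    sims = [_corr(buf, signature) for buf in _dft_buffers(prompt, size)]
    return max(sims) >= threshold             # match when correlation is high enough
```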
  • Patent number: 8239201
    Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. The method includes presenting text on a touch-sensitive display, with the text size kept within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user's touch has been received, the computing device identifies and interprets the portion of text to be selected, and subsequently presents that text audibly to the user.
    Type: Grant
    Filed: October 24, 2008
    Date of Patent: August 7, 2012
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Horst Schroeter
  • Patent number: 8234116
    Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing the individual KLDs calculated from state-pair-by-state-pair comparisons.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: July 31, 2012
    Assignee: Microsoft Corporation
    Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
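The unscented-transform approximation mentioned in patent 8234116 above can be sketched as follows: the KL divergence between two Gaussian mixtures is estimated by propagating sigma points of each component of the first mixture through the log-density ratio. The dynamic-programming state alignment is omitted, and the sigma-point scheme shown (2d symmetric points with equal weights) is one common choice, not necessarily the one used in the patent.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: unscented-transform approximation of KL(p || q) for Gaussian mixtures
# p and q, each given as (weights, means, covariances).

def _sigma_points(mu, cov):
    d = len(mu)
    L = np.linalg.cholesky(d * cov)           # columns are sqrt(d * cov)
    return np.vstack([mu + L[:, k] for k in range(d)] +
                     [mu - L[:, k] for k in range(d)])   # 2d equally weighted points

def gmm_logpdf(x, weights, means, covs):
    dens = sum(w * multivariate_normal.pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))
    return np.log(dens + 1e-300)

def kld_unscented(p, q):
    weights, means, covs = p
    total = 0.0
    for w, mu, cov in zip(weights, means, covs):
        pts = _sigma_points(np.asarray(mu, float), np.atleast_2d(cov).astype(float))
        # average log p(x) - log q(x) over the sigma points of this component
        vals = [gmm_logpdf(x, *p) - gmm_logpdf(x, *q) for x in pts]
        total += w * np.mean(vals)
    return total
```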
  • Publication number: 20120166194
    Abstract: Disclosed herein are an apparatus and method for recognizing speech. The apparatus includes a frame-based speech recognition unit, a segment division unit, a segment feature extraction unit, a segment speech recognition performance unit, and a combination and synchronization unit. The frame-based speech recognition unit extracts frame speech feature vectors from a speech signal, and performs speech recognition on frames of the speech signal using the frame speech feature vectors and a frame-based probability model. The segment division unit divides the speech signal into segments. The segment feature extraction unit extracts segment speech feature vectors around a boundary between the segments. The segment speech recognition performance unit performs speech recognition on the segments of the speech signal using the segment speech feature vectors and a segment-based probability model.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 28, 2012
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Ho-Young JUNG, Jeon-Gue PARK, Hoon CHUNG
  • Patent number: 8209174
    Abstract: A text-independent speaker verification system utilizes mel-frequency cepstral coefficient analysis in the feature extraction blocks, template modeling with vector quantization in the pattern matching blocks, and an adaptive threshold and an adaptive decision verdict, and is implemented in a stand-alone device using less powerful microprocessors and smaller data storage devices than those used by comparable systems of the prior art.
    Type: Grant
    Filed: April 17, 2009
    Date of Patent: June 26, 2012
    Assignee: Saudi Arabian Oil Company
    Inventor: Essam Abed Al-Telmissani
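The template-modeling and pattern-matching stages mentioned in patent 8209174 above reduce, in the simplest reading, to vector quantization of MFCC frames. The sketch below trains a k-means codebook from enrolment frames and scores a test utterance by its mean quantization distortion; MFCC extraction, the adaptive threshold and the adaptive verdict are not reproduced, and the codebook size and fixed threshold are placeholders.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Sketch: VQ-codebook speaker verification over precomputed MFCC frames.

def enroll(mfcc_frames, codebook_size=64):
    """mfcc_frames: (T, n_coeff) array of enrolment features for one speaker."""
    codebook, _ = kmeans(mfcc_frames.astype(float), codebook_size)
    return codebook

def verify(mfcc_frames, codebook, threshold=1.5):
    """Accept the claimed speaker when the mean quantization distortion is small."""
    _, distances = vq(mfcc_frames.astype(float), codebook)
    distortion = float(distances.mean())
    return distortion, distortion < threshold
```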
  • Patent number: 8204749
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include differential statistics, joint statistics and distance statistics.
    Type: Grant
    Filed: March 21, 2011
    Date of Patent: June 19, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Patent number: 8195451
    Abstract: In an information detecting apparatus (1), a speech kind discrimination unit (11) discriminates and classifies an audio signal from an information source into kinds (categories) such as music or speech on a predetermined time basis, and a memory unit/recording medium (13) records the resulting discrimination information. A discrimination frequency calculating unit (15) then calculates, for each kind, a discrimination frequency over a predetermined time period longer than that time unit.
    Type: Grant
    Filed: February 10, 2004
    Date of Patent: June 5, 2012
    Assignee: Sony Corporation
    Inventor: Yasuhiro Toguri
  • Publication number: 20120109650
    Abstract: Disclosed herein is an apparatus and method for creating an acoustic model. The apparatus includes a binary tree creation unit, an information creation unit, and a binary tree reduction unit. The binary tree creation unit creates a binary tree by repeatedly merging a plurality of Gaussian components for each Hidden Markov Model (HMM) state of an acoustic model based on a distance measure reflecting a variation in likelihood score. The information creation unit creates information about the largest size of the acoustic model in accordance with a platform including a speech recognizer. The binary tree reduction unit reduces the binary tree in accordance with the information about the largest size of the acoustic model.
    Type: Application
    Filed: October 28, 2011
    Publication date: May 3, 2012
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Hoon-Young CHO, Young-Ik Kim, Il-Bin Lee, Seung-Hi Kim, Jun Park, Dong-Hyun Kim, Sang-Hun Kim
  • Patent number: 8170873
    Abstract: An approach to comparing events in word spotting, such as comparing putative and reference instances of a keyword, makes use of a set of models of subword units. For each of two acoustic events and for each of a series of times in each of the events, a probability associated with each of the models of the set of subword units is computed. Then, a quantity characterizing a comparison of the two acoustic events, one occurring in each of the two acoustic signals, is computed using the computed probabilities associated with each of the models.
    Type: Grant
    Filed: July 22, 2004
    Date of Patent: May 1, 2012
    Assignee: Nexidia Inc.
    Inventor: Robert W. Morris
  • Patent number: 8140330
    Abstract: Embodiments of a method and system for detecting repeated patterns in dialog systems are described. The system includes a dynamic time warping (DTW) based pattern comparison algorithm that is used to find the best matching parts between a correction utterance and an original utterance. Reference patterns are generated from the correction utterance by an unsupervised segmentation scheme. No significant information about the position of the repeated parts in the correction utterance is assumed, as each reference pattern is compared with the original utterance from the beginning of the utterance to the end. A pattern comparison process with DTW is executed without knowledge of fixed end-points. A recursive DTW computation is executed to find the best matching parts that are considered as the repeated parts as well as the end-points of the utterance.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: March 20, 2012
    Assignee: Robert Bosch GmbH
    Inventors: Mert Cevik, Fuliang Weng
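Patent 8140330 above is built around dynamic time warping; for orientation, here is the textbook fixed-end-point DTW recursion between two feature sequences. The unsupervised segmentation of reference patterns and the recursive, open-end-point DTW of the patent are not shown.

```python
import numpy as np

# Sketch: classic DTW alignment cost between two feature sequences.

def dtw_distance(a, b):
    """a: (n, d) and b: (m, d) feature sequences; returns the DTW alignment cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])     # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],        # insertion
                                 cost[i, j - 1],        # deletion
                                 cost[i - 1, j - 1])    # match
    return float(cost[n, m])
```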
  • Patent number: 8050917
    Abstract: A system including a conferencing telephone coupled to or in communication with an identification service. The identification service is configured to poll user devices of conference participants to determine or confirm identities. In response, the user devices transmit audio electronic business cards, which can include user voice samples and/or preprocessed voice recognition data. The identification service stores the resulting audio electronic business card data. When the corresponding participant speaks during the conference, the identification service identifies the speaker.
    Type: Grant
    Filed: September 27, 2007
    Date of Patent: November 1, 2011
    Assignee: Siemens Enterprise Communications, Inc.
    Inventors: Rami Caspi, William J. Beyda
  • Patent number: 8036888
    Abstract: A sound input from sound sources existing in a plurality of directions is accepted and converted into a signal on a frequency axis. A suppressing function to suppress the converted signal on the frequency axis is computed, the amplitude component of the signal on the frequency axis is multiplied by the computed suppressing function, and the converted signal on the frequency axis is thereby corrected. A phase component of each converted signal on the frequency axis is computed for each frequency, and the difference of the phase components is computed. A probability value indicative of the probability that a sound source exists in a predetermined direction is specified based on the computed difference, and a suppressing function to suppress a sound input from sound sources other than the sound source in the predetermined direction is computed based on the specified probability value.
    Type: Grant
    Filed: September 13, 2006
    Date of Patent: October 11, 2011
    Assignee: Fujitsu Limited
    Inventor: Naoshi Matsuo
  • Patent number: 8024186
    Abstract: Embodiments of these location-based systems and methods for device interaction may allow a content delivery system to provide certain content to a device, or restrict certain content from being delivered to the device, based on the location of the device. When a user requests certain content, the location of the device may be determined and compared against an access control list defining a set of rules regarding that content to determine if the requested content may be accessed from that location. If the content may be accessed from this location the content may be delivered; otherwise an error message, or another option, may be delivered to the device. Similarly, the location of a device may be utilized to tailor the delivery of content to a device, such that content may be provided to a user based on the user's location, in certain cases with little or no stimulus from the user.
    Type: Grant
    Filed: May 24, 2006
    Date of Patent: September 20, 2011
    Assignee: Mobitv, Inc.
    Inventor: Jeremy S. De Bonet
  • Patent number: 8024188
    Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
    Type: Grant
    Filed: August 24, 2007
    Date of Patent: September 20, 2011
    Assignee: Robert Bosch GmbH
    Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
  • Patent number: 8005674
    Abstract: A recognition model set is generated. A technique is described that uses the log likelihood of real data, via cross entropy, to measure the mismatch between training data and a model derived from that training data, and that compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change of cross entropies in the decision of adding class-independent Gaussian Mixture Models (GMMs), the good performance of class-dependent models is largely retained, while the size and complexity of the model are decreased.
    Type: Grant
    Filed: July 10, 2007
    Date of Patent: August 23, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric W Janke, Bin Jia
  • Publication number: 20110172999
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include differential statistics, joint statistics and distance statistics.
    Type: Application
    Filed: March 21, 2011
    Publication date: July 14, 2011
    Applicant: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Patent number: 7979277
    Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least o
    Type: Grant
    Filed: September 14, 2005
    Date of Patent: July 12, 2011
    Assignee: Zentian Limited
    Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
  • Publication number: 20110166857
    Abstract: A human voice distinguishing method and device are provided. The method involves: taking every n sampling points of the current frame of audio signals as one subsection, wherein n is a positive integer; judging whether two adjacent subsections have a transition relative to a distinguishing threshold, i.e. whether the sliding maximum absolute values of the two adjacent subsections lie above and below the distinguishing threshold respectively; and, if so, determining the current frame to be human voice. The sliding maximum absolute value of a subsection is obtained as follows: take the maximum absolute intensity over the sampling points in the subsection as its initial maximum absolute value, then take the maximum of the initial maximum absolute values of this subsection and the m subsections following it as the sliding maximum absolute value, wherein m is a positive integer.
    Type: Application
    Filed: September 15, 2009
    Publication date: July 7, 2011
    Applicant: ACTIONS SEMICONDUCTOR CO. LTD.
    Inventors: Xiangyong Xie, Zhan Chen
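The sliding-maximum test in publication 20110166857 above can be sketched directly. The subsection length n, look-ahead m and distinguishing threshold below are assumed example values.

```python
import numpy as np

# Sketch: declare a frame "human voice" when two adjacent subsections have
# sliding maximum absolute values on opposite sides of the threshold.

def is_human_voice(frame, n=80, m=3, threshold=0.2):
    """frame: 1-D array of samples scaled to [-1, 1]."""
    k = len(frame) // n
    sub = np.abs(frame[:k * n]).reshape(k, n)
    initial_max = sub.max(axis=1)              # initial max |x| per subsection
    # sliding max = max over this subsection and the next m subsections
    sliding = np.array([initial_max[i:i + m + 1].max() for i in range(k)])
    for a, b in zip(sliding[:-1], sliding[1:]):
        if (a - threshold) * (b - threshold) < 0:   # opposite sides -> transition
            return True
    return False
```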
  • Patent number: 7970609
    Abstract: Sound signals from sound sources present in multiple directions are accepted as inputs of multiple channels, and the signal of each channel is transformed into a signal on a frequency axis. A phase component of the transformed signal is calculated for each identical frequency, and the phase difference between the multiple channels is calculated. An amplitude component of the transformed signal is calculated, and a noise component is estimated from the calculated amplitude component. An SN ratio for each frequency is calculated on the basis of the amplitude component and the estimated noise component, and frequencies at which the SN ratio is larger than a predetermined value are extracted. The difference between arrival distances is calculated on the basis of the phase difference at the selected frequencies, and the arrival direction in which the target sound source is estimated to be present is calculated.
    Type: Grant
    Filed: July 20, 2007
    Date of Patent: June 28, 2011
    Assignee: Fujitsu Limited
    Inventor: Shoji Hayakawa
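A sketch of the direction estimate outlined in patent 7970609 above, for a single two-microphone pair. The noise estimate (a fixed floor), the SNR threshold, the microphone spacing and the use of the median delay are assumptions for the example, not the patent's choices.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

# Sketch: select high-SNR frequency bins, convert their inter-channel phase
# differences to time delays, and turn the delay into an arrival angle.

def estimate_direction(ch1, ch2, fs=16000, mic_dist=0.05,
                       noise_floor=1e-3, snr_threshold=10.0):
    X1, X2 = np.fft.rfft(ch1), np.fft.rfft(ch2)
    freqs = np.fft.rfftfreq(len(ch1), 1.0 / fs)
    phase_diff = np.angle(X2) - np.angle(X1)
    phase_diff = (phase_diff + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    snr = np.abs(X1) / noise_floor                            # crude SNR estimate
    mask = (snr > snr_threshold) & (freqs > 0)
    if not np.any(mask):
        return None
    delays = phase_diff[mask] / (2 * np.pi * freqs[mask])     # per-bin time delay
    path_diff = np.median(delays) * SPEED_OF_SOUND            # arrival-distance difference
    cos_theta = np.clip(path_diff / mic_dist, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))            # angle from the mic axis
```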
  • Patent number: 7970115
    Abstract: A communications system is provided that includes: (a) a speech discrimination agent 136 operable to generate a speech profile of a first party to a voice call; and (b) a speech modification agent 140 operable to adjust, based on the speech profile, a spectral characteristic of a voice stream from the first party to form a modified voice stream, the modified voice stream being provided to the second party.
    Type: Grant
    Filed: October 5, 2005
    Date of Patent: June 28, 2011
    Assignee: Avaya Inc.
    Inventors: Marc W. J. Coughlan, Alexander Q. Forbes, Alexander M. Scholte, Peter D. Runcie, Ralph Warta
  • Patent number: 7949525
    Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: May 24, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
  • Patent number: 7912720
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog between a human and a computing device, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include differential statistics, joint statistics and distance statistics.
    Type: Grant
    Filed: July 20, 2005
    Date of Patent: March 22, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Patent number: 7904294
    Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and for updating an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
    Type: Grant
    Filed: April 9, 2007
    Date of Patent: March 8, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
  • Patent number: 7869997
    Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio-receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: January 11, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Bishnu Saroop Atal
  • Publication number: 20100324896
    Abstract: A method for managing interactive dialog between a machine and a user. In one embodiment, an interaction between the machine and the user is managed by determining at least one likelihood value which is dependent upon a possible speech onset of the user. In another embodiment, the likelihood value can be dependent on a model of a desire of the user for specific items, a model of an attention of the user to specific items, or a model of turn-taking cues. The values can be used to determine a mode confidence value that is used by the system to determine the nature of prompts provided to the user.
    Type: Application
    Filed: August 24, 2010
    Publication date: December 23, 2010
    Inventors: David Attwater, Bruce Balentine
  • Patent number: 7853449
    Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: December 14, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
  • Publication number: 20100277579
    Abstract: Disclosed are an apparatus and method of deducing a user's intention using motion information. The user's intention deduction apparatus includes a speech intention determining unit configured to predict a speech intention regarding a user's speech using motion information sensed by at least one motion capture sensor, and a controller configured to control operation of detecting a voice section from a received sound signal based on the predicted speech intention.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 4, 2010
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jeong-Mi CHO, Jeong-Su Kim, Won-Chul Bang, Nam-Hoon Kim
  • Patent number: 7797148
    Abstract: A phrase-based translation system and method includes a statistically integrated phrase lattice (SIPL) (H) which represents an entire translational model. An input (I) is translated by determining a best path through an entire lattice (S) by performing an efficient composition operation between the input and the SIPL. The efficient composition operation is performed by a multiple level search where each operand in the efficient composition operation represents a different search level.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: September 14, 2010
    Assignee: International Business Machines Corporation
    Inventors: Stanley Chen, Yuqing Gao, Bowen Zhou
  • Patent number: 7761296
    Abstract: A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses.
    Type: Grant
    Filed: April 2, 1999
    Date of Patent: July 20, 2010
    Assignee: International Business Machines Corporation
    Inventors: Raimo Bakis, Ellen M. Eide
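The rescoring idea in patent 7761296 above can be sketched once per-phoneme alignments and feature matrices are available: average the feature vectors aligned to each phoneme and accumulate distances between the per-phoneme means of the original utterance and of each synthesized hypothesis. Feature extraction, synthesis and the alignment itself are outside this sketch, and the Euclidean distance is an assumed choice.

```python
import numpy as np

# Sketch: pick the N-best hypothesis whose synthesized waveform is closest to the
# original, comparing per-phoneme mean feature vectors.

def phoneme_means(features, alignment):
    """features: (T, d); alignment: list of (phoneme, start_frame, end_frame),
    end exclusive and assumed non-empty."""
    return [(ph, features[s:e].mean(axis=0)) for ph, s, e in alignment]

def hypothesis_distance(orig_means, hyp_means):
    """Both lists are assumed to cover the same phoneme sequence."""
    return sum(np.linalg.norm(mo - mh)
               for (po, mo), (ph, mh) in zip(orig_means, hyp_means) if po == ph)

def rescore(orig_feats, orig_align, hypotheses):
    """hypotheses: list of (text, feats, align) triples; returns the closest text."""
    orig = phoneme_means(orig_feats, orig_align)
    scored = [(hypothesis_distance(orig, phoneme_means(f, a)), text)
              for text, f, a in hypotheses]
    return min(scored)[1]
```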
  • Patent number: 7747439
    Abstract: A method is provided for correctly segmenting phonemes by determining a boundary indicating the start point and end point of each segmented phoneme, and for correctly finding each phoneme in the speech signal by determining which phoneme in a phoneme recognition standard table corresponds to each of the segmented phonemes. Using this phoneme recognition method, the amount of computation can be significantly reduced, and the phonemes in the speech signal can be easily found by calculating probability distances between phonemes.
    Type: Grant
    Filed: March 5, 2007
    Date of Patent: June 29, 2010
    Assignee: Samsung Electronics Co., Ltd
    Inventor: Hyun-Soo Kim
  • Patent number: 7739102
    Abstract: A system, method, and computer program product for domain-independent natural language understanding, including at least one of forming pairs of words and/or phrases in a sentence, wherein each word and/or phrase is paired with every other word and/or phrase; determining meanings for the words and/or phrases; assigning numeric codes that uniquely identify semantic concepts to those word and/or phrase meanings; comparing the numeric code of each word and/or phrase with each numeric code of the other word and/or phrase in the pair; selecting the pairs with the best relationships; combining highly-related pairs with other highly-related pairs to form longer groups of words; exchanging numeric codes for the longer groups with numeric codes having a weaker relationship to determine if the exchanged numeric codes provide an overall stronger relationship; and forming longer and longer groups until the sentence is understood.
    Type: Grant
    Filed: October 7, 2004
    Date of Patent: June 15, 2010
    Inventor: Howard J. Bender
  • Patent number: 7739111
    Abstract: A pattern matching method for matching between a first symbol sequence and a second symbol sequence which is shorter than the first symbol sequence is provided. The method includes the steps of performing DP matching between the first and second symbol sequences to create a matrix of the DP matching transition, detecting the maximum length of lengths of consecutive correct answers based on the matrix of the DP matching transition, and calculating similarity based on the maximum length.
    Type: Grant
    Filed: August 9, 2006
    Date of Patent: June 15, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventor: Kazue Kaneko
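A sketch of the similarity measure in patent 7739111 above: build the DP (edit-distance) matching matrix between the two symbol sequences, trace the alignment back, and take the longest run of consecutively matched symbols. Normalizing that run length by the shorter sequence's length is an assumption, not necessarily the patented formula.

```python
# Sketch: DP matching between a long and a short symbol sequence, then the
# longest run of consecutive matches along the backtraced alignment.

def longest_match_similarity(long_seq, short_seq):
    n, m = len(long_seq), len(short_seq)
    # DP matrix of edit distances (the "matrix of the DP matching transition")
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if long_seq[i - 1] == short_seq[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Backtrace, tracking the longest run of exact matches
    i, j, run, longest = n, m, 0, 0
    while i > 0 and j > 0:
        if long_seq[i - 1] == short_seq[j - 1] and d[i][j] == d[i - 1][j - 1]:
            run += 1
            i, j = i - 1, j - 1
        else:
            longest = max(longest, run)
            run = 0
            if d[i][j] == d[i - 1][j - 1] + 1:
                i, j = i - 1, j - 1
            elif d[i][j] == d[i - 1][j] + 1:
                i -= 1
            else:
                j -= 1
    longest = max(longest, run)
    return longest / max(1, min(n, m))

# Example: "recognition" vs "cognit" share a run of 6 matched symbols.
print(longest_match_similarity("recognition", "cognit"))  # 1.0
```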
  • Patent number: 7725307
    Abstract: An intelligent query system for processing voice-based queries is disclosed. This distributed client-server system, typically implemented on an intranet or over the Internet, accepts a user's queries at his/her computer, PDA or workstation using a speech input interface. After converting the user's query from speech to text, a natural language engine, a database processor and a full-text SQL database are used to find a single answer that best matches the user's query. Both statistical and semantic decoding are used to assist and improve the performance of the query recognition.
    Type: Grant
    Filed: August 29, 2003
    Date of Patent: May 25, 2010
    Assignee: Phoenix Solutions, Inc.
    Inventor: Ian M. Bennett
  • Patent number: 7707032
    Abstract: A method and system used to determine the similarity between an input speech data and a sample speech data is provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.
    Type: Grant
    Filed: October 20, 2005
    Date of Patent: April 27, 2010
    Assignee: National Cheng Kung University
    Inventors: Jhing-Fa Wang, Po-Chuan Lin, Li-Chang Wen
  • Patent number: 7672835
    Abstract: An FFT unit performs an FFT process on high-frequency-eliminated, pitch-shifted voice data for one frame. A time scaling unit calculates a frequency amplitude, a phase, a phase difference between the present and immediately preceding frames, and an unwrapped version of the phase difference for each channel from which the frequency component was obtained by the FFT, detects a reference channel based on a peak one of the frequency amplitudes, and calculates the phase of each channel in a synthesized voice based on the reference channel, using results of the calculation. An IFFT unit processes each frequency component in accordance with the calculated phase, performs an IFFT process on the resulting frequency component, and produces synthesized voice data for one frame.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: March 2, 2010
    Assignee: Casio Computer Co., Ltd.
    Inventor: Masaru Setoguchi