Distance Patents (Class 704/238)
-
Patent number: 8595009
Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) it includes at least one music segment longer than a predetermined minimum song duration, 2) it is shorter than a predetermined maximum song duration, 3) it both starts and ends with a music clip, and 4) the proportion of music clips in the section is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
Type: Grant
Filed: July 26, 2012
Date of Patent: November 26, 2013
Assignee: Dolby Laboratories Licensing Corporation
Inventors: Lie Lu, Claus Bauer
-
Patent number: 8593673
Abstract: A method for routing a confirmation of receipt of a facsimile or portion thereof according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing one or more confirmations to one or more destinations based on the analysis. A method for routing one or more confirmations according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing one or more confirmations to one or more destinations based on the correlation. Additional systems and methods are also presented.
Type: Grant
Filed: October 20, 2010
Date of Patent: November 26, 2013
Assignee: Kofax, Inc.
Inventors: Roy Couchman, Roland G. Borrey
-
Patent number: 8554560
Abstract: Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. It further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and, from that classification, determining values for at least one weighting factor. It still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors, and classifying the combined feature vector for each of the frames using a set of classifiers trained for at least two classes of events.
Type: Grant
Filed: September 4, 2012
Date of Patent: October 8, 2013
Assignee: International Business Machines Corporation
Inventor: Zica Valsan
-
Patent number: 8548806
Abstract: A voice recognition device, a voice recognition method and a voice recognition program capable of appropriately restricting recognition objects based on voice input from a user to recognize the input voice with accuracy are provided.
Type: Grant
Filed: September 11, 2007
Date of Patent: October 1, 2013
Assignee: Honda Motor Co. Ltd.
Inventor: Hisayuki Nagashima
-
Patent number: 8532988
Abstract: A method for searching for an input symbol string includes receiving (B) an input symbol string, proceeding (C) in a trie data structure to a calculation point indicated by the next symbol, calculating (D) distances at the calculation point, and repeatedly selecting (E) the next branch to follow (C) to the next calculation point and repeating the calculation (D). After the calculations, the symbol string having the shortest distance to the input symbol string is selected (G) on the basis of the performed calculations. To minimize the number of calculations, not only are the distances calculated (D) at the calculation points, but also the smallest possible length difference corresponding to each distance; from each distance and its corresponding length difference a reference value is calculated, and the branch is selected (E) such that the routine next proceeds from the calculation point producing the lowest reference value.
Type: Grant
Filed: July 3, 2003
Date of Patent: September 10, 2013
Assignee: Syslore Oy
Inventor: Jorkki Hyvonen
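Read brute-force, the shortest-distance search described above is a Levenshtein-style edit distance minimized over a stored vocabulary; the trie and the length-difference reference value exist to prune that search. A minimal sketch of the underlying distance (illustrative only; `best_match` is a hypothetical helper showing the brute-force baseline, not the patented trie routine):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming: the minimum
    number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_match(query, vocabulary):
    """Return the stored symbol string with the smallest edit distance
    to the query (brute force over the whole vocabulary)."""
    return min(vocabulary, key=lambda w: edit_distance(query, w))
```

The length-difference bound the abstract exploits is `abs(len(a) - len(b)) <= edit_distance(a, b)`, which allows a branch of the trie to be discarded before its distances are fully computed.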
-
Patent number: 8521529
Abstract: An input signal is converted to a feature-space representation. The feature-space representation is projected onto a discriminant subspace using a linear discriminant analysis transform to enhance the separation of feature clusters. Dynamic programming is used to find global changes to derive optimal cluster boundaries. The cluster boundaries are used to identify the segments of the audio signal.
Type: Grant
Filed: April 18, 2005
Date of Patent: August 27, 2013
Assignee: Creative Technology Ltd
Inventors: Michael M. Goodwin, Jean Laroche
-
Publication number: 20130166294
Abstract: A frame erasure concealment technique for a bitstream-based feature extractor in a speech recognition system particularly suited for use in a wireless communication system operates to “delete” each frame in which an erasure is declared. The deletions thus reduce the length of the observation sequence, but have been found to provide for sufficient speech recognition based on both single word and “string” tests of the deletion technique.
Type: Application
Filed: November 30, 2012
Publication date: June 27, 2013
Applicant: AT&T INTELLECTUAL PROPERTY II, L.P.
Inventor: AT&T Intellectual Property II, L.P.
-
Patent number: 8451475
Abstract: A method for routing a confirmation of receipt of a facsimile or portion thereof according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing one or more confirmations to one or more destinations based on the analysis. A method for routing one or more confirmations according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing one or more confirmations to one or more destinations based on the correlation. Additional systems and methods are also presented.
Type: Grant
Filed: May 1, 2007
Date of Patent: May 28, 2013
Assignee: Kofax, Inc.
Inventors: Roy Couchman, Roland G. Borrey
-
Patent number: 8380511
Abstract: A system and method for automatically performing semantic categorization are disclosed. In one embodiment, at least one text description pertaining to a category set is accepted along with words that are anticipated to be uttered by a user pertaining to that category set; a lexical chaining confidence score is attached to each pair matched between the anticipated words and the accepted text description. These confidence scores are used subsequently by a categorization circuit that accepts a text phrase utterance from an input source along with a category set pertaining to the accepted utterance. The categorization circuit, in one embodiment, creates word pairs matched between the accepted text phrase utterance and the accepted category set. From these word-pair scores, the category pertaining to the utterance is determined based, at least in part, on the previously assigned lexical chaining confidence scores.
Type: Grant
Filed: February 20, 2007
Date of Patent: February 19, 2013
Assignees: Intervoice Limited Partnership, Lymba Corporation
Inventors: Ellis K. Cave, Mithun Balakrishna, Vincent Mo
-
Patent number: 8352262
Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, the memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying the scores by adding the score updates to the scores; and a selector circuit for selecting at least one n
Type: Grant
Filed: June 16, 2011
Date of Patent: January 8, 2013
Assignee: Zentian Limited
Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
-
Patent number: 8352263
Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since only the F most similar unknown voices are found from the m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
Type: Grant
Filed: September 29, 2009
Date of Patent: January 8, 2013
Inventors: Tze-Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
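The classification step described above amounts to ranking the m unknown voices by similarity to the input and keeping the top F. A toy sketch, assuming each unknown voice is summarized by a single centroid feature vector and similarity is Euclidean distance (both assumptions; the patent does not commit to this representation):

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def top_f_categories(x, centroids, f):
    """Return the indices of the f category centroids closest to
    feature vector x, nearest first."""
    ranked = sorted(range(len(centroids)), key=lambda i: euclidean(x, centroids[i]))
    return ranked[:f]
```

Words belonging to the returned categories would then be ordered by pronunciation similarity, as the abstract describes.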
-
Patent number: 8346550
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value that is based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Grant
Filed: February 14, 2011
Date of Patent: January 1, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Publication number: 20120310646
Abstract: A speech recognition device and a speech recognition method thereof are disclosed. In the speech recognition method, a key phrase containing at least one key word is received. The method comprises the steps of: receiving a sound source signal of a key word and generating a plurality of audio signals; transforming the audio signals into a plurality of frequency signals; receiving the frequency signals to obtain a space-frequency spectrum and an angular estimation value thereof; receiving the space-frequency spectrum to define and output at least one spatial eigenparameter, and using the angular estimation value and the frequency signals to perform spotting and evaluation and to output a Bhattacharyya distance; and receiving the spatial eigenparameter and the Bhattacharyya distance and using corresponding thresholds to determine the correctness of the key phrase. Thereby the invention robustly achieves a high speech recognition rate under very low SNR conditions.
Type: Application
Filed: July 7, 2011
Publication date: December 6, 2012
Applicant: NATIONAL CHIAO TUNG UNIVERSITY
Inventors: JWU-SHENG HU, MING-TANG LEE, TING-CHAO WANG, CHIA HSIN YANG
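The verdict step above thresholds a Bhattacharyya distance. For reference, the distance has a simple closed form between two univariate Gaussians, which is a common simplification for sketching it (the patent computes it over its own spotting statistics, not necessarily this case):

```python
import math

def bhattacharyya_gaussian(mu1, var1, mu2, var2):
    """Bhattacharyya distance between univariate Gaussians N(mu1, var1)
    and N(mu2, var2); zero iff the two distributions coincide."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2))))
```

Unlike KL divergence, this distance is symmetric in its two arguments, which makes it convenient for accept/reject thresholding.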
-
Patent number: 8311813
Abstract: Discrimination between at least two classes of events in an input signal is carried out in the following way. A set of frames containing an input signal is received, and at least two different feature vectors are determined for each of said frames. Said at least two different feature vectors are classified using respective sets of preclassifiers trained for said at least two classes of events. Values for at least one weighting factor are determined based on outputs of said preclassifiers for each of said frames. A combined feature vector is calculated for each of said frames by applying said at least one weighting factor to said at least two different feature vectors. Said combined feature vector is classified using a set of classifiers trained for said at least two classes of events.
Type: Grant
Filed: October 26, 2007
Date of Patent: November 13, 2012
Assignee: International Business Machines Corporation
Inventor: Zica Valsan
-
Patent number: 8279465
Abstract: A method for routing a facsimile according to one embodiment of the present invention includes analyzing text of a facsimile for at least one of a meaning and a context of the text; and routing the facsimile to one or more destinations based on the analysis. A method for routing a facsimile according to another embodiment of the present invention includes analyzing a pattern of light and dark areas of a facsimile; correlating the pattern to one or more forms; and routing the facsimile to one or more destinations based on the correlation. Additional systems and methods are also presented.
Type: Grant
Filed: May 1, 2007
Date of Patent: October 2, 2012
Assignee: Kofax, Inc.
Inventor: Roy Couchman
-
Patent number: 8265932
Abstract: A system and method for identifying audio command prompts for use in a voice response environment is provided. A signature is generated for audio samples, each having a preceding audio segment, a reference phrase audio segment, and a trailing audio segment. The trailing segment is removed and each of the preceding and reference phrase segments is divided into buffers. The buffers are transformed into discrete Fourier transform buffers. One of the discrete Fourier transform buffers from the reference phrase segment that is dissimilar to each of the discrete Fourier transform buffers from the preceding segment is selected as the signature. Audio command prompts are processed to generate a discrete Fourier transform. Each discrete Fourier transform for the audio command prompts is compared with each of the signatures and a correlation value is determined. An audio command prompt matches a signature when the correlation value for that audio command prompt satisfies a threshold.
Type: Grant
Filed: October 3, 2011
Date of Patent: September 11, 2012
Assignee: Intellisist, Inc.
Inventor: Martin R. M. Dunsmuir
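The matching step above, comparing DFT buffers and thresholding a correlation value, can be sketched with a naive O(N²) DFT and normalized correlation. All function names and the 0.95 threshold are illustrative assumptions, not the patented implementation (which also performs the dissimilarity-based signature selection):

```python
import cmath
import math

def dft_magnitudes(buf):
    """Magnitude spectrum of a buffer via a naive O(N^2) DFT."""
    n = len(buf)
    return [abs(sum(buf[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def correlation(u, v):
    """Normalized correlation (cosine similarity) of two magnitude
    spectra; 1.0 means a perfect match.  Assumes non-silent buffers."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def matches(prompt_buf, signature_buf, threshold=0.95):
    """A prompt matches a signature when the spectral correlation
    satisfies the threshold."""
    return correlation(dft_magnitudes(prompt_buf),
                       dft_magnitudes(signature_buf)) >= threshold
```

Comparing magnitude spectra rather than raw samples makes the match insensitive to small time offsets between the prompt and the stored signature.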
-
Patent number: 8239201
Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. This method includes presenting text on a touch-sensitive display and having that text size within a threshold level so that the computing device can accurately determine the intent of the user when the user touches the touch screen. Once the user touch has been received, the computing device identifies and interprets the portion of text that is to be selected, and subsequently presents the text audibly to the user.
Type: Grant
Filed: October 24, 2008
Date of Patent: August 7, 2012
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Horst Schroeter
-
Patent number: 8234116
Abstract: Measurement of Kullback-Leibler Divergence (KLD) between hidden Markov models (HMM) of acoustic units utilizes an unscented transform to approximate KLD between Gaussian mixtures. Dynamic programming equalizes the number of states between HMMs having a different number of states, while the total KLD of the HMMs is obtained by summing individual KLDs calculated by state-pair-by-state-pair comparisons.
Type: Grant
Filed: August 22, 2006
Date of Patent: July 31, 2012
Assignee: Microsoft Corporation
Inventors: Peng Liu, Frank Kao-Ping K. Soong, Jian-Lai Zhou
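Context for the abstract above: KLD has a closed form between two single Gaussians, but not between Gaussian mixtures, which is why an approximation such as the unscented transform is needed. The univariate closed form (standard textbook result, not the patented mixture approximation):

```python
import math

def kld_gaussian(mu1, var1, mu2, var2):
    """Closed-form KL divergence KL(N1 || N2) between two univariate
    Gaussians.  No such closed form exists for mixtures of Gaussians,
    hence the need for approximations like the unscented transform."""
    return (0.5 * math.log(var2 / var1)
            + (var1 + (mu1 - mu2) ** 2) / (2.0 * var2)
            - 0.5)
```

Note that KLD is asymmetric: KL(N1 || N2) generally differs from KL(N2 || N1), which matters when summing per-state-pair divergences as the abstract describes.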
-
Publication number: 20120166194
Abstract: Disclosed herein are an apparatus and method for recognizing speech. The apparatus includes a frame-based speech recognition unit, a segment division unit, a segment feature extraction unit, a segment speech recognition performance unit, and a combination and synchronization unit. The frame-based speech recognition unit extracts frame speech feature vectors from a speech signal, and performs speech recognition on frames of the speech signal using the frame speech feature vectors and a frame-based probability model. The segment division unit divides the speech signal into segments. The segment feature extraction unit extracts segment speech feature vectors around a boundary between the segments. The segment speech recognition performance unit performs speech recognition on the segments of the speech signal using the segment speech feature vectors and a segment-based probability model.
Type: Application
Filed: December 22, 2011
Publication date: June 28, 2012
Applicant: Electronics and Telecommunications Research Institute
Inventors: Ho-Young JUNG, Jeon-Gue PARK, Hoon CHUNG
-
Patent number: 8209174
Abstract: A text-independent speaker verification system utilizes mel frequency cepstral coefficients analysis in the feature extraction blocks, template modeling with vector quantization in the pattern matching blocks, an adaptive threshold and an adaptive decision verdict, and is implemented in a stand-alone device using less powerful microprocessors and smaller data storage devices than used by comparable systems of the prior art.
Type: Grant
Filed: April 17, 2009
Date of Patent: June 26, 2012
Assignee: Saudi Arabian Oil Company
Inventor: Essam Abed Al-Telmissani
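The vector-quantization pattern matching named above typically scores a claimed speaker by the average distortion of their feature vectors against a stored codebook. A minimal sketch with a fixed threshold (the patent's threshold is adaptive, and the function names are illustrative):

```python
import math

def distortion(features, codebook):
    """Average Euclidean distance from each feature vector to its
    nearest codeword; low distortion means the voice fits the template."""
    def d(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return sum(min(d(f, c) for c in codebook) for f in features) / len(features)

def verify(features, codebook, threshold):
    """Accept the claimed speaker when the distortion is within
    threshold (simplification of the adaptive decision verdict)."""
    return distortion(features, codebook) <= threshold
```

In practice the features would be MFCC vectors per frame and the codebook would be trained with an algorithm such as LBG/k-means during enrollment.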
-
Patent number: 8204749
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
Type: Grant
Filed: March 21, 2011
Date of Patent: June 19, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Patent number: 8195451
Abstract: In an information detecting apparatus (1), a speech kind discrimination unit (11) discriminates and classifies an audio signal at an information source into kinds (categories) such as music or speech on a predetermined time basis, and a memory unit/recording medium (13) records the discrimination information. A discrimination frequency calculating unit (15) calculates the discrimination frequency of each kind over a predetermined time period longer than the time unit.
Type: Grant
Filed: February 10, 2004
Date of Patent: June 5, 2012
Assignee: Sony Corporation
Inventor: Yasuhiro Toguri
-
Publication number: 20120109650
Abstract: Disclosed herein is an apparatus and method for creating an acoustic model. The apparatus includes a binary tree creation unit, an information creation unit, and a binary tree reduction unit. The binary tree creation unit creates a binary tree by repeatedly merging a plurality of Gaussian components for each Hidden Markov Model (HMM) state of an acoustic model, based on a distance measure reflecting the variation in likelihood score. The information creation unit creates information about the largest size of the acoustic model in accordance with a platform including a speech recognizer. The binary tree reduction unit reduces the binary tree in accordance with the information about the largest size of the acoustic model.
Type: Application
Filed: October 28, 2011
Publication date: May 3, 2012
Applicant: Electronics and Telecommunications Research Institute
Inventors: Hoon-Young CHO, Young-Ik Kim, Il-Bin Lee, Seung-Hi Kim, Jun Park, Dong-Hyun Kim, Sang-Hun Kim
-
Patent number: 8170873
Abstract: An approach to comparing events in word spotting, such as comparing putative and reference instances of a keyword, makes use of a set of models of subword units. For each of two acoustic events and for each of a series of times in each of the events, a probability associated with each of the models of the set of subword units is computed. Then, a quantity characterizing a comparison of the two acoustic events, one occurring in each of the two acoustic signals, is computed using the computed probabilities associated with each of the models.
Type: Grant
Filed: July 22, 2004
Date of Patent: May 1, 2012
Assignee: Nexidia Inc.
Inventor: Robert W. Morris
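One way to realize the comparison described above is to treat each event as a sequence of per-frame posterior vectors over the subword units and score aligned frames. The sketch below uses a Bhattacharyya-coefficient frame score and assumes equal-length, pre-aligned events; both choices are illustrative assumptions, not the formula claimed by the patent:

```python
import math

def frame_similarity(p, q):
    """Similarity of two frames' subword-unit probability vectors:
    the Bhattacharyya coefficient sum(sqrt(p_i * q_i)), 1.0 for
    identical distributions and 0.0 for disjoint support."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def event_similarity(event_a, event_b):
    """Compare two equal-length acoustic events frame by frame using
    the per-frame model probabilities (the patent does not require
    equal lengths; an alignment step would normally precede this)."""
    scores = [frame_similarity(p, q) for p, q in zip(event_a, event_b)]
    return sum(scores) / len(scores)
```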
-
Patent number: 8140330
Abstract: Embodiments of a method and system for detecting repeated patterns in dialog systems are described. The system includes a dynamic time warping (DTW) based pattern comparison algorithm that is used to find the best matching parts between a correction utterance and an original utterance. Reference patterns are generated from the correction utterance by an unsupervised segmentation scheme. No significant information about the position of the repeated parts in the correction utterance is assumed, as each reference pattern is compared with the original utterance from the beginning of the utterance to the end. A pattern comparison process with DTW is executed without knowledge of fixed end-points. A recursive DTW computation is executed to find the best matching parts that are considered as the repeated parts as well as the end-points of the utterance.
Type: Grant
Filed: June 13, 2008
Date of Patent: March 20, 2012
Assignee: Robert Bosch GmbH
Inventors: Mert Cevik, Fuliang Weng
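For reference, the core DTW comparison named above computes the minimum cumulative frame distance over all monotonic alignments of two sequences. A textbook sketch with fixed end-points (the patent's contribution is precisely relaxing the end-point constraint, which this sketch does not do):

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two sequences: the minimum
    cumulative frame distance over all monotonic alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j],      # a advances alone
                cost[i][j - 1],      # b advances alone
                cost[i - 1][j - 1])  # both advance (match)
    return cost[n][m]
```

Because the warp can repeat frames of either sequence, a slow repetition of the same words aligns to the original at zero cost, which is what makes DTW suitable for spotting repeated parts of a correction utterance.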
-
Patent number: 8050917
Abstract: A system including a conferencing telephone coupled to or in communication with an identification service. The identification service is configured to poll user devices of conference participants to determine or confirm identities. In response, the user devices transmit audio electronic business cards, which can include user voice samples and/or preprocessed voice recognition data. The identification service stores the resulting audio electronic business card data. When the corresponding participant speaks during the conference, the identification service identifies the speaker.
Type: Grant
Filed: September 27, 2007
Date of Patent: November 1, 2011
Assignee: Siemens Enterprise Communications, Inc.
Inventors: Rami Caspi, William J. Beyda
-
Patent number: 8036888
Abstract: A sound input from sound sources existing in a plurality of directions is accepted and converted into a signal on a frequency axis. A suppressing function to suppress the converted signal on the frequency axis is computed, the amplitude component of the signal is multiplied by the computed suppressing function, and the converted signal is thereby corrected. The phase component of each converted signal is computed for each frequency, and the difference of the phase components is computed. A probability value indicative of the probability that a sound source exists in a predetermined direction is specified based on the computed difference, and a suppressing function to suppress sound input from sound sources other than the one in the predetermined direction is computed based on the specified probability value.
Type: Grant
Filed: September 13, 2006
Date of Patent: October 11, 2011
Assignee: Fujitsu Limited
Inventor: Naoshi Matsuo
-
Patent number: 8024186
Abstract: Embodiments of these location-based systems and methods for device interaction may allow a content delivery system to provide certain content to a device, or restrict certain content from being delivered to the device, based on the location of the device. When a user requests certain content, the location of the device may be determined and compared against an access control list defining a set of rules regarding that content to determine whether the requested content may be accessed from that location. If the content may be accessed from this location, the content may be delivered; otherwise an error message, or another option, may be delivered to the device. Similarly, the location of a device may be utilized to tailor the delivery of content to a device, such that content may be provided to a user based on the user's location, in certain cases with little or no stimulus from the user.
Type: Grant
Filed: May 24, 2006
Date of Patent: September 20, 2011
Assignee: Mobitv, Inc.
Inventor: Jeremy S. De Bonet
-
Patent number: 8024188
Abstract: An optimal selection or decision strategy is described through an example that includes use in dialog systems. The selection strategy or method includes receiving multiple predictions and multiple probabilities. The received predictions predict the content of a received input and each of the probabilities corresponds to one of the predictions. In an example dialog system, the received input includes an utterance. The selection method includes dynamically selecting a set of predictions from the received predictions by generating ranked predictions. The ranked predictions are generated by ordering the plurality of predictions according to descending probability.
Type: Grant
Filed: August 24, 2007
Date of Patent: September 20, 2011
Assignee: Robert Bosch GmbH
Inventors: Junling Hu, Fabrizio Morbini, Fuliang Weng, Xue Liu
-
Patent number: 8005674
Abstract: A recognition model set is generated. A technique is described that uses the log likelihood of real data, via cross entropy, to measure the mismatch between training data and a model derived from that training data, and that compares such mismatches between class-dependent models and a class-independent model as evidence for model replacement. By using the change in cross entropies in the decision of adding class-independent Gaussian Mixture Models (GMMs), the good performance of class-dependent models is largely retained while the size and complexity of the model are decreased.
Type: Grant
Filed: July 10, 2007
Date of Patent: August 23, 2011
Assignee: International Business Machines Corporation
Inventors: Eric W Janke, Bin Jia
-
Publication number: 20110172999
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
Type: Application
Filed: March 21, 2011
Publication date: July 14, 2011
Applicant: AT&T Corp.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Patent number: 7979277
Abstract: A speech recognition circuit comprising a circuit for providing state identifiers which identify states corresponding to nodes or groups of adjacent nodes in a lexical tree, and for providing scores corresponding to said state identifiers, the lexical tree comprising a model of words; a memory structure for receiving and storing state identifiers identified by a node identifier identifying a node or group of adjacent nodes, said memory structure being adapted to allow lookup to identify particular state identifiers, reading of the scores corresponding to the state identifiers, and writing back of the scores to the memory structure after modification of the scores; an accumulator for receiving score updates corresponding to particular state identifiers from a score update generating circuit which generates the score updates using audio input, for receiving scores from the memory structure, and for modifying said scores by adding said score updates to said scores; and a selector circuit for selecting at least o
Type: Grant
Filed: September 14, 2005
Date of Patent: July 12, 2011
Assignee: Zentian Limited
Inventors: Guy Larri, Mark Catchpole, Damian Kelly Harris-Dowsett, Timothy Brian Reynolds
-
Publication number: 20110166857
Abstract: A human voice distinguishing method and device are provided. The method involves taking every n sampling points of the current frame of audio signals as one subsection, where n is a positive integer, and judging whether two adjacent subsections exhibit a transition relative to a distinguishing threshold, i.e., whether the sliding maximum absolute values of the two adjacent subsections are respectively above and below the distinguishing threshold; if so, the current frame is determined to be human voice. The sliding maximum absolute value of a subsection is obtained as follows: take the maximum absolute intensity over the sampling points in the subsection as the initial maximum absolute value of the subsection, then take the maximum of the initial maximum absolute values of the subsection and the m subsections following it, where m is a positive integer, as the sliding maximum absolute value of the subsection.
Type: Application
Filed: September 15, 2009
Publication date: July 7, 2011
Applicant: ACTIONS SEMICONDUCTOR CO. LTD.
Inventors: Xiangyong Xie, Zhan Chen
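The procedure described above is concrete enough to sketch directly: compute per-subsection peaks, smooth them with a forward-looking sliding maximum, and flag the frame when adjacent smoothed peaks fall on opposite sides of the threshold. Function names and the sample layout are illustrative assumptions:

```python
def sliding_max_abs(samples, n, m):
    """For each subsection of n samples, the maximum absolute sample
    value over that subsection and the m subsections following it."""
    peaks = [max(abs(s) for s in samples[i:i + n])
             for i in range(0, len(samples) - n + 1, n)]
    return [max(peaks[i:i + m + 1]) for i in range(len(peaks))]

def is_human_voice(samples, n, m, threshold):
    """Declare the frame voice when two adjacent subsections cross the
    threshold in opposite directions (one above, the next below, or
    vice versa), i.e. the transition test from the abstract."""
    s = sliding_max_abs(samples, n, m)
    return any((a > threshold) != (b > threshold) for a, b in zip(s, s[1:]))
```

The intuition is that voiced speech alternates between loud and quiet stretches, whereas steady music or noise keeps its envelope on one side of the threshold.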
-
Patent number: 7970609
Abstract: Sound signals from sound sources present in multiple directions are accepted as inputs of multiple channels, and the signal of each channel is transformed into a signal on a frequency axis. A phase component of the transformed signal is calculated for each identical frequency, and the phase difference between the multiple channels is calculated. An amplitude component of the transformed signal is calculated, and a noise component is estimated from the calculated amplitude component. An SN ratio for each frequency is calculated on the basis of the amplitude component and the estimated noise component, and the frequencies at which the SN ratio is larger than a predetermined value are extracted. The difference between arrival distances is calculated on the basis of the phase difference at a selected frequency, and the arrival direction in which the target sound source is estimated to be present is calculated.
Type: Grant
Filed: July 20, 2007
Date of Patent: June 28, 2011
Assignee: Fujitsu Limited
Inventor: Shoji Hayakawa
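The final step above, going from an inter-channel phase difference to an arrival direction, follows standard two-microphone geometry: the phase difference at a frequency gives a time delay, the time delay gives a path-length difference, and the arcsine of that difference over the microphone spacing gives the angle. A sketch under those textbook assumptions (the patent's multi-channel, SNR-weighted procedure is more elaborate):

```python
import math

def arrival_angle(phase_diff, freq_hz, mic_spacing_m, c=343.0):
    """Estimate direction of arrival (radians from broadside) from the
    inter-channel phase difference at one frequency, for a two-mic
    array with the given spacing.  Clamps to [-1, 1] against rounding;
    assumes no phase wrapping (spacing below half a wavelength)."""
    time_delay = phase_diff / (2.0 * math.pi * freq_hz)   # seconds
    path_diff = c * time_delay                            # metres
    return math.asin(max(-1.0, min(1.0, path_diff / mic_spacing_m)))
```

Restricting the estimate to frequencies with a high SN ratio, as the abstract describes, keeps noisy phase measurements from corrupting the angle.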
-
Patent number: 7970115
Abstract: A communications system is provided that includes: (a) a speech discrimination agent 136 operable to generate a speech profile of a first party to a voice call; and (b) a speech modification agent 140 operable to adjust, based on the speech profile, a spectral characteristic of a voice stream from the first party to form a modified voice stream, the modified voice stream being provided to the second party.
Type: Grant
Filed: October 5, 2005
Date of Patent: June 28, 2011
Assignee: Avaya Inc.
Inventors: Marc W. J. Coughlan, Alexander Q. Forbes, Alexander M. Scholte, Peter D. Runcie, Ralph Warta
-
Patent number: 7949525
Abstract: A spoken language understanding method and system are provided. The method includes classifying a set of labeled candidate utterances based on a previously trained classifier, generating classification types for each candidate utterance, receiving confidence scores for the classification types from the trained classifier, sorting the classified utterances based on an analysis of the confidence score of each candidate utterance compared to a respective label of the candidate utterance, and rechecking candidate utterances according to the analysis. The system includes modules configured to control a processor in the system to perform the steps of the method.
Type: Grant
Filed: June 16, 2009
Date of Patent: May 24, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Mazin G. Rahim, Gokhan Tur
-
Patent number: 7912720
Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog between a human and a computing device, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include differential statistics, joint statistics and distance statistics.
Type: Grant
Filed: July 20, 2005
Date of Patent: March 22, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
-
Patent number: 7904294
Abstract: An automatic speech recognition (ASR) system and method is provided for controlling the recognition of speech utterances generated by an end user operating a communications device. The ASR system and method can be used with a mobile device that is used in a communications network. The ASR system can be used for ASR of speech utterances input into a mobile device, to perform compensating techniques using at least one characteristic, and to update an ASR speech recognizer associated with the ASR system by determining and using a background noise value and a distortion value based on the features of the mobile device. The ASR system can be used to augment a limited data input capability of a mobile device, for example, caused by limited input devices physically located on the mobile device.
Type: Grant
Filed: April 9, 2007
Date of Patent: March 8, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Richard C. Rose, Sarangarajan Pathasarathy, Aaron Edward Rosenberg, Shrikanth Sambasivan Narayanan
-
Patent number: 7869997
Abstract: A system and method of recognizing speech comprises an audio receiving element and a computer server. The audio receiving element and the computer server perform the process steps of the method. The method involves training a stored set of phonemes by converting them into n-dimensional space, where n is a relatively large number. Once the stored phonemes are converted, they are transformed using singular value decomposition to conform the data generally into a hypersphere. The received phonemes from the audio receiving element are also converted into n-dimensional space and transformed using singular value decomposition to conform the data into a hypersphere. The method compares the transformed received phoneme to each transformed stored phoneme by comparing a first distance from a center of the hypersphere to a point associated with the transformed received phoneme and a second distance from the center of the hypersphere to a point associated with the respective transformed stored phoneme.
Type: Grant
Filed: March 28, 2008
Date of Patent: January 11, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Bishnu Saroop Atal
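The SVD transform described above can be approximated by SVD-based whitening, which rescales the principal axes so the training vectors sit roughly on a hypersphere, followed by nearest-neighbor comparison in the transformed space. This is a loose sketch under that assumption; the patent's specific distance-from-center comparison is not reproduced here.

```python
import numpy as np

def whiten_fit(X):
    """Fit an SVD-based whitening transform so the centered training
    vectors in X (rows) lie roughly on a hypersphere around their mean."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt.T / (s + 1e-12)  # project onto, then rescale, each principal axis
    return mu, W

def nearest_phoneme(x, stored, labels, mu, W):
    """Transform a received phoneme vector and return the label of the
    stored phoneme whose transformed point is closest."""
    z = (x - mu) @ W
    Z = (stored - mu) @ W
    d = np.linalg.norm(Z - z, axis=1)
    return labels[int(np.argmin(d))]
```

With two well-separated phoneme clusters, a query near either cluster picks up that cluster's label after whitening.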
-
Publication number: 20100324896
Abstract: A method for managing interactive dialog between a machine and a user. In one embodiment, an interaction between the machine and the user is managed by determining at least one likelihood value which is dependent upon a possible speech onset of the user. In another embodiment, the likelihood value can be dependent on a model of a desire of the user for specific items, a model of an attention of the user to specific items, or a model of turn-taking cues. The values can be used to determine a mode confidence value that is used by the system to determine the nature of prompts provided to the user.
Type: Application
Filed: August 24, 2010
Publication date: December 23, 2010
Inventors: David Attwater, Bruce Balentine
-
Patent number: 7853449
Abstract: Techniques are provided for generating improved language modeling. Such improved modeling is achieved by conditioning a language model on a state of a dialog for which the language model is employed. For example, the techniques of the invention may improve modeling of language for use in a speech recognizer of an automatic natural language based dialog system. Improved usability of the dialog system arises from better recognition of a user's utterances by a speech recognizer, associated with the dialog system, using the dialog state-conditioned language models. By way of example, the state of the dialog may be quantified as: (i) the internal state of the natural language understanding part of the dialog system; or (ii) words in the prompt that the dialog system played to the user.
Type: Grant
Filed: March 28, 2008
Date of Patent: December 14, 2010
Assignee: Nuance Communications, Inc.
Inventors: Satyanarayana Dharanipragada, Michael Daniel Monkowski, Harry W. Printz, Karthik Visweswariah
-
Publication number: 20100277579
Abstract: Disclosed are an apparatus and method of deducing a user's intention using motion information. The user's intention deduction apparatus includes a speech intention determining unit configured to predict a speech intention regarding a user's speech using motion information sensed by at least one motion capture sensor, and a controller configured to control operation of detecting a voice section from a received sound signal based on the predicted speech intention.
Type: Application
Filed: April 29, 2010
Publication date: November 4, 2010
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jeong-Mi CHO, Jeong-Su Kim, Won-Chul Bang, Nam-Hoon Kim
-
Patent number: 7797148
Abstract: A phrase-based translation system and method includes a statistically integrated phrase lattice (SIPL) (H) which represents an entire translation model. An input (I) is translated by determining a best path through an entire lattice (S) by performing an efficient composition operation between the input and the SIPL. The efficient composition operation is performed by a multiple level search where each operand in the efficient composition operation represents a different search level.
Type: Grant
Filed: June 4, 2008
Date of Patent: September 14, 2010
Assignee: International Business Machines Corporation
Inventors: Stanley Chen, Yuqing Gao, Bowen Zhou
-
Patent number: 7761296
Abstract: A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses.
Type: Grant
Filed: April 2, 1999
Date of Patent: July 20, 2010
Assignee: International Business Machines Corporation
Inventors: Raimo Bakis, Ellen M. Eide
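The distance described above — per-phoneme means of aligned feature vectors, compared between the original waveform and each synthesized hypothesis — can be sketched as below. The feature arrays and per-frame alignment labels are a hypothetical layout; the patent does not specify the feature set.

```python
import numpy as np

def phoneme_mean_distance(orig_feats, synth_feats, orig_align, synth_align):
    """For each phoneme, average the feature vectors aligned to it in the
    original and synthesized waveforms, then sum the Euclidean distances
    between the per-phoneme means.

    *_feats: (frames, dims) arrays; *_align: per-frame phoneme labels.
    """
    def means(feats, align):
        out = {}
        for ph in set(align):
            idx = [i for i, a in enumerate(align) if a == ph]
            out[ph] = feats[idx].mean(axis=0)
        return out

    m1 = means(orig_feats, orig_align)
    m2 = means(synth_feats, synth_align)
    shared = set(m1) & set(m2)
    return sum(float(np.linalg.norm(m1[p] - m2[p])) for p in shared)

def rescore(orig_feats, orig_align, hypotheses):
    """Pick the hypothesis text whose synthesized features are closest to
    the original. hypotheses: list of (text, feats, align) tuples."""
    return min(hypotheses,
               key=lambda h: phoneme_mean_distance(orig_feats, h[1],
                                                   orig_align, h[2]))[0]
```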
-
Patent number: 7747439
Abstract: A method is provided for correctly segmenting phonemes by determining a boundary indicating a start point and an end point of each of the segmented phonemes, and for correctly finding the phoneme in the speech signal by determining which phoneme in a phoneme recognition standard table corresponds to each of the segmented phonemes. Using this phoneme recognition method, the amount of computation can be significantly reduced, and the phoneme in the speech signal can be easily found by calculating probability distances between phonemes.
Type: Grant
Filed: March 5, 2007
Date of Patent: June 29, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventor: Hyun-Soo Kim
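A common choice for a "probability distance" between phoneme models is the Bhattacharyya distance between Gaussians. The abstract does not name a specific distance, so treating each table entry as a diagonal-covariance Gaussian is an assumption for illustration only.

```python
import numpy as np

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians,
    one common probability distance between phoneme models."""
    var = (var1 + var2) / 2.0
    term1 = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    term2 = 0.5 * np.log(np.prod(var) /
                         np.sqrt(np.prod(var1) * np.prod(var2)))
    return term1 + term2

def classify_segment(seg_mu, seg_var, table):
    """Match a segmented phoneme (summarized by its feature mean and
    variance) against a phoneme recognition standard table, modeled here
    as {name: (mu, var)}; return the closest phoneme name."""
    return min(table,
               key=lambda ph: bhattacharyya(seg_mu, seg_var, *table[ph]))
```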
-
Patent number: 7739102
Abstract: A system, method, and computer program product for domain-independent natural language understanding, including at least one of forming pairs of words and/or phrases in a sentence, wherein each word and/or phrase is paired with every other word and/or phrase; determining meanings for the words and/or phrases; assigning numeric codes that uniquely identify semantic concepts to those word and/or phrase meanings; comparing the numeric code of each word and/or phrase with each numeric code of the other word and/or phrase in the pair; selecting the pairs with the best relationships; combining highly-related pairs with other highly-related pairs to form longer groups of words; exchanging numeric codes for the longer groups with numeric codes having a weaker relationship to determine if the exchanged numeric codes provide an overall stronger relationship; and forming longer and longer groups until the sentence is understood.
Type: Grant
Filed: October 7, 2004
Date of Patent: June 15, 2010
Inventor: Howard J. Bender
-
Patent number: 7739111
Abstract: A pattern matching method for matching between a first symbol sequence and a second symbol sequence which is shorter than the first symbol sequence is provided. The method includes the steps of performing DP matching between the first and second symbol sequences to create a matrix of the DP matching transition, detecting the maximum length of lengths of consecutive correct answers based on the matrix of the DP matching transition, and calculating similarity based on the maximum length.
Type: Grant
Filed: August 9, 2006
Date of Patent: June 15, 2010
Assignee: Canon Kabushiki Kaisha
Inventor: Kazue Kaneko
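The core idea — align two symbol sequences, find the maximum run of consecutive matching symbols, and derive a similarity from it — can be sketched with the standard library's sequence matcher standing in for the patent's DP matching; the normalization by the shorter sequence's length is an assumption.

```python
import difflib

def max_consecutive_similarity(long_seq, short_seq):
    """Find the longest run of consecutive matching symbols between the
    two sequences and report it relative to the shorter sequence."""
    matcher = difflib.SequenceMatcher(None, long_seq, short_seq,
                                      autojunk=False)
    match = matcher.find_longest_match(0, len(long_seq), 0, len(short_seq))
    return match.size / len(short_seq)
```

For `"abcdefg"` vs `"xcdez"`, the longest consecutive match is `"cde"` (length 3), giving a similarity of 3/5.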
-
Patent number: 7725307
Abstract: An intelligent query system for processing voice-based queries is disclosed. This distributed client-server system, typically implemented on an intranet or over the Internet, accepts a user's queries at his/her computer, PDA or workstation using a speech input interface. After the user's query is converted from speech to text, a natural language engine, a database processor and a full-text SQL database are used to find a single answer that best matches the user's query. Both statistical and semantic decoding are used to assist and improve the performance of the query recognition.
Type: Grant
Filed: August 29, 2003
Date of Patent: May 25, 2010
Assignee: Phoenix Solutions, Inc.
Inventor: Ian M. Bennett
-
Patent number: 7707032
Abstract: A method and system used to determine the similarity between an input speech data and a sample speech data is provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.
Type: Grant
Filed: October 20, 2005
Date of Patent: April 27, 2010
Assignee: National Cheng Kung University
Inventors: Jhing-Fa Wang, Po-Chuan Lin, Li-Chang Wen
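The four steps above — frame the signals, build the matrix of pairwise frame distances, reduce it to a matching score, and threshold the score — can be sketched as below. The dynamic-time-warping reduction and the threshold rule are assumptions; the abstract does not specify how the score is computed from the matrix.

```python
import numpy as np

def similarity(input_frames, sample_frames, threshold=1.0):
    """Build the matching matrix of distances between every input frame
    and every sample frame, reduce it to a matching score via a standard
    DTW pass, and decide similarity against a threshold.

    Returns (score, is_similar); lower scores mean more similar.
    """
    A, B = np.asarray(input_frames, float), np.asarray(sample_frames, float)
    # Matching matrix: Euclidean distance between each pair of frames.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

    # Accumulate the cheapest warping path through the matrix.
    n, m = D.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(acc[i - 1, j] if i else np.inf,
                       acc[i, j - 1] if j else np.inf,
                       acc[i - 1, j - 1] if i and j else np.inf)
            acc[i, j] = D[i, j] + best

    score = acc[-1, -1] / (n + m)  # path cost normalized by length
    return score, score <= threshold
```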
-
Patent number: 7672835
Abstract: An FFT unit performs an FFT process on high-frequency-eliminated, pitch-shifted voice data for one frame. A time scaling unit calculates a frequency amplitude, a phase, a phase difference between the present and immediately preceding frames, and an unwrapped version of the phase difference for each channel from which a frequency component was obtained by the FFT, detects a reference channel based on a peak among the frequency amplitudes, and calculates the phase of each channel in a synthesized voice based on the reference channel, using the results of the calculation. An IFFT unit processes each frequency component in accordance with the calculated phase, performs an IFFT process on the resulting frequency component, and produces synthesized voice data for one frame.
Type: Grant
Filed: December 19, 2005
Date of Patent: March 2, 2010
Assignee: Casio Computer Co., Ltd.
Inventor: Masaru Setoguchi
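The FFT / phase-unwrap / IFFT loop described above is the classic phase-vocoder structure. A minimal time-scaling sketch of that structure follows; it advances each channel's unwrapped phase difference at the stretched hop size and overlap-adds the IFFT output. The reference-channel step of the patent is omitted, and the frame sizes are arbitrary choices.

```python
import numpy as np

def time_scale(x, stretch, n_fft=512, hop=128):
    """Minimal phase-vocoder time scaling: FFT analysis frames, unwrapped
    per-channel phase differences, IFFT resynthesis at a stretched hop."""
    win = np.hanning(n_fft)
    # Analysis frames at the original hop.
    frames = [np.fft.rfft(win * x[i:i + n_fft])
              for i in range(0, len(x) - n_fft, hop)]
    bins = np.arange(n_fft // 2 + 1)
    expected = 2 * np.pi * bins * hop / n_fft  # nominal phase advance/frame

    out_hop = int(round(hop * stretch))
    out = np.zeros(len(frames) * out_hop + n_fft)
    phase = np.angle(frames[0])
    prev = np.angle(frames[0])
    for t, F in enumerate(frames):
        if t:
            # Unwrap the frame-to-frame phase difference around the
            # nominal advance, then scale it by the stretch factor.
            dphi = np.angle(F) - prev
            dphi = expected + np.mod(dphi - expected + np.pi,
                                     2 * np.pi) - np.pi
            phase = phase + dphi * stretch
            prev = np.angle(F)
        y = np.fft.irfft(np.abs(F) * np.exp(1j * phase))
        pos = t * out_hop
        out[pos:pos + n_fft] += win * y  # overlap-add synthesis
    return out
```

Stretching a 440 Hz tone by 2 roughly doubles its duration while keeping the pitch at 440 Hz, which is the point of scaling the phase advance rather than resampling.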