Correlation Patents (Class 704/237)
-
Patent number: 8175882
Abstract: A method for task execution improvement, the method includes: generating a baseline model for executing a task; recording a user executing a task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences in the user's execution and the baseline model.
Type: Grant
Filed: January 25, 2008
Date of Patent: May 8, 2012
Assignee: International Business Machines Corporation
Inventors: Sara H. Basson, Dimitiri Kanevsky, Edward E. Kelley, Bhuvana Ramabhadran
-
Publication number: 20120095762
Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech by reflecting at least one frame from among the frames that are previously positioned with respect to a frame of the first speech.
Type: Application
Filed: October 19, 2011
Publication date: April 19, 2012
Applicants: SEOUL NATIONAL UNIVERSITY INDUSTRY FOUNDATION, SAMSUNG ELECTRONICS CO., LTD.
Inventors: Ki-wan EOM, Chang-woo HAN, Tae-gyoon KANG, Nam-soo KIM, Doo-hwa HONG, Jae-won LEE, Hyung-joon LIM
-
Patent number: 8140329
Abstract: A method and apparatus are proposed for automatically recognizing observed audio data. An observation vector is created of audio features extracted from the observed audio data and the observed audio data is recognized from the observation vector. The audio features include features selected from a group of three types of features obtained from the observed audio data: (i) ICA features obtained by processing the observed audio data, (ii) first MFCC features obtained by removing a logarithm step from the conventional MFCC process, or (iii) second MFCC features obtained by applying the ICA process to results of a mel scale filter bank.
Type: Grant
Filed: April 5, 2004
Date of Patent: March 20, 2012
Assignee: Sony Corporation
Inventors: Jian Zhang, Wei Lu, Xiaobing Sun
-
Patent number: 8103506
Abstract: The present disclosure provides a method and system for converting a free text expression of an identity to a phonetic equivalent code. The conversion follows a set of rules based on phonetic groupings and compresses the expression to a shorter series of characters than the expression. The phonetic equivalent code may be compared to one or more other phonetic equivalent codes to establish a correlation between the codes. The phonetic equivalent code of the free text expression may be associated with the code of a known identity. The known identity may be provided to a user for confirmation of the identity. Further, a plurality of expressions stored in a database may be consolidated by converting the expressions to phonetic equivalent codes, comparing the codes to find correlations, and if appropriate reducing the number of expressions or mapping the expressions to a fewer number of expressions.
Type: Grant
Filed: September 20, 2007
Date of Patent: January 24, 2012
Assignee: United Services Automobile Association
Inventors: Gregory Brian Meyer, James Elden Nicholson
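The idea of compressing a name into a code built from phonetic groupings can be illustrated with a minimal sketch. The abstract does not give the patent's rule set, so the grouping table below is an assumption borrowed from the classic Soundex consonant groups, and the function name `phonetic_code` is hypothetical.

```python
# Assumed consonant groups (classic Soundex style), not the patent's own rules.
GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODE = {letter: digit for letters, digit in GROUPS.items() for letter in letters}


def phonetic_code(text, length=4):
    """Compress a free-text name into a short code built from phonetic groups."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    previous = CODE.get(letters[0], "")
    for letter in letters[1:]:
        digit = CODE.get(letter, "")
        # Keep a digit only when it differs from the group of the previous letter.
        if digit and digit != previous:
            digits.append(digit)
        previous = digit
    # First letter plus group digits, padded or cut to a fixed length.
    return (letters[0].upper() + "".join(digits)).ljust(length, "0")[:length]


# Two spellings that sound alike map to the same code and can be consolidated.
same_identity = phonetic_code("Robert") == phonetic_code("Rupert")
```

Comparing codes rather than raw strings is what lets differently spelled expressions be correlated or merged.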
-
Publication number: 20110288864
Abstract: A method for detecting speech using a first microphone adapted to produce a first signal (x), and a second microphone adapted to produce a second signal (x2), the method comprising the steps of: (i) applying gain to the second signal to produce a normalised second signal, which signal is normalised relative to the first signal; (ii) constructing one or more signal components from the first signal and the normalised second signal; (iii) constructing an adaptive differential microphone (ADM) having a constructed microphone response constructed from the one or more signal components which response has at least one directional null; (iv) producing one or more ADM outputs (yf, yb) from the constructed microphone response in response to detected sound; (v) computing a ratio of a parameter of either a first signal component or a constructed microphone response to a parameter of an output of the ADM; (vi) comparing the ratio to an adaptive threshold value; (vii) detecting speech if the ratio is greater than or equ
Type: Application
Filed: November 19, 2010
Publication date: November 24, 2011
Applicant: NXP B.V.
Inventors: Patrick Kechichian, Cornelis Pieter Janse, Rene Martinus Maria Derkx, Wouter Joos Tirry
-
Patent number: 8060365
Abstract: A dialog processing system which includes a target expression data extraction unit for extracting a plurality of target expression data each including a pattern matching portion which matches an utterance pattern, which is inputted by an utterance pattern input unit and is an utterance structure derived from contents of field-independent general conversations, among a plurality of utterance data which are inputted by an utterance data input unit and obtained by converting contents of a plurality of conversations in one field; a feature extraction unit for retrieving the pattern matching portions, respectively, from the plurality of target expression data extracted, and then for extracting feature quantities common to the plurality of pattern matching portions; and a mandatory data extraction unit for extracting mandatory data in the one field included in the plurality of utterance data by use of the feature quantities extracted.
Type: Grant
Filed: July 3, 2008
Date of Patent: November 15, 2011
Assignee: Nuance Communications, Inc.
Inventors: Nobuyasu Itoh, Shiho Negishi, Hironori Takeuchi
-
Publication number: 20110276323
Abstract: The illustrative embodiments described herein provide systems and methods for authenticating a speaker. In one embodiment, a method includes receiving reference speech input including a reference passphrase to form a reference recording, and receiving test speech input including a test passphrase to form a test recording. The method includes determining whether the test passphrase matches the reference passphrase, and determining whether one or more voice features of the speaker of the test passphrase match one or more voice features of the speaker of the reference passphrase. The method authenticates the speaker of the test speech input in response to determining that the reference passphrase matches the test passphrase and that one or more voice features of the speaker of the test passphrase match one or more voice features of the speaker of the reference passphrase.
Type: Application
Filed: May 6, 2010
Publication date: November 10, 2011
Applicant: Senam Consulting, Inc.
Inventor: Serge Olegovich Seyfetdinov
-
Publication number: 20110231186
Abstract: A speech detection method is presented, which includes the following steps. A first voice captured device samples a first signal and a second voice captured device samples a second signal. The first voice captured device is closer to a speech signal source than the second voice captured device. A first energy corresponding to the first signal within an interval is calculated, a second energy corresponding to the second signal within the interval is calculated, and a first ratio is calculated according to the first energy and the second energy. The first ratio is transformed into a second ratio. A threshold value is set. It is determined whether the speech signal source is detected by comparing the second ratio and the threshold value.
Type: Application
Filed: July 30, 2010
Publication date: September 22, 2011
Applicant: ISSC TECHNOLOGIES CORP.
Inventors: Ying Tsung Lin, Yung Chen Ting, Pansop Kim
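The energy-ratio test in this abstract can be sketched in a few lines. The abstract does not say how the first ratio is transformed into the second, so the decibel conversion, the 6 dB threshold, and the names `frame_energy` and `speech_detected` are all assumptions made for illustration.

```python
import math


def frame_energy(samples):
    """Sum of squared samples within one analysis interval."""
    return sum(s * s for s in samples)


def speech_detected(near, far, threshold_db=6.0, floor=1e-12):
    """Compare the near/far microphone energy ratio against a threshold.

    Assumption: the "second ratio" is the first ratio expressed in decibels.
    The floor keeps the division and logarithm defined for silent frames.
    """
    first_ratio = (frame_energy(near) + floor) / (frame_energy(far) + floor)
    second_ratio = 10.0 * math.log10(first_ratio)
    return second_ratio > threshold_db


# A talker close to the first microphone is much louder there than at the second.
talking = speech_detected([0.5, -0.4, 0.6, -0.5], [0.1, -0.1, 0.1, -0.1])
# Diffuse background noise reaches both microphones at a similar level.
background = speech_detected([0.1, -0.1, 0.1, -0.1], [0.1, -0.1, 0.1, -0.1])
```

The design relies on the first device being closer to the speech source, so only near-field speech produces a large ratio.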
-
Patent number: 8010356
Abstract: Parameters for distributions of a hidden trajectory model including means and variances are estimated using an acoustic likelihood function for observation vectors as an objective function for optimization. The estimation includes only acoustic data and not any intermediate estimate on hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
Type: Grant
Filed: February 17, 2006
Date of Patent: August 30, 2011
Assignee: Microsoft Corporation
Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
-
Patent number: 7974392
Abstract: A communication device and method are provided for audibly outputting a received text message to a user, the text message being received from a sender. A text message to present audibly is received. An output voice to present the text message is retrieved, wherein the output voice is synthesized using predefined voice characteristic information to represent the sender's voice. The output voice is used to audibly present the text message to the user.
Type: Grant
Filed: March 2, 2010
Date of Patent: July 5, 2011
Assignee: Research In Motion Limited
Inventor: Eric Ng
-
Patent number: 7974844
Abstract: A speech recognition apparatus includes a first-candidate selecting unit that selects a recognition result of a first speech from first recognition candidates based on likelihood of the first recognition candidates; a second-candidate selecting unit that extracts recognition candidates of an object word contained in the first speech and recognition candidates of a clue word from second recognition candidates, acquires the relevance ratio associated with the semantic relation between the extracted recognition candidates of the object word and the extracted recognition candidates of the clue word, and selects a recognition result of the second speech based on the acquired relevance ratio; a correction-portion identifying unit that identifies a portion corresponding to the object word in the first speech; and a correcting unit that corrects the word on the identified portion.
Type: Grant
Filed: March 1, 2007
Date of Patent: July 5, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventor: Kazuo Sumita
-
Publication number: 20110153317
Abstract: An apparatus for wireless communications includes a processing system. The processing system is configured to receive an input sound stream of a user, split the input sound stream into a plurality of frames, classify each of the frames as one selected from the group consisting of a non-speech frame and a speech frame, determine a pitch of each of the frames in a subset of the speech frames, and identify a gender of the user from the determined pitch. To determine the pitch, the processing system is configured to filter the speech frames to compute an error signal, compute an autocorrelation of the error signal, find a maximum autocorrelation value, and set the pitch to an index of the maximum autocorrelation value.
Type: Application
Filed: December 23, 2009
Publication date: June 23, 2011
Applicant: QUALCOMM INCORPORATED
Inventors: Yinian Mao, Gene Marsh
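The pitch step in this abstract (take the index of the maximum autocorrelation value) can be illustrated with a small sketch. It is a simplification: the abstract filters the speech frames into an error signal first, whereas this example applies the autocorrelation directly to a synthetic frame. The sample rate, lag range, and function names are assumptions.

```python
import math


def autocorrelation(signal, max_lag):
    """Return r[k] = sum over n of signal[n] * signal[n + k] for k = 0..max_lag."""
    n = len(signal)
    return [sum(signal[i] * signal[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def pitch_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation."""
    r = autocorrelation(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


# Synthetic voiced frame: a 100 Hz tone sampled at 8 kHz, so one period is 80 samples.
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(400)]
lag = pitch_lag(frame, min_lag=20, max_lag=160)
pitch_hz = SAMPLE_RATE / lag
```

Restricting the search to a plausible lag range keeps the trivial peak at lag 0 from being selected.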
-
Patent number: 7917362
Abstract: A method for determining a bit boundary of a repetition-coded signal including bits each having a plurality of epochs includes (a) counting the epochs repeatedly from an initial number to a predetermined number in a predetermined time, (b) sensing sign changes in the epochs, (c) recording each sensed sign change with a weighting function to a corresponding counting number of the epoch, and (d) determining the bit boundary according to a result of step (c).
Type: Grant
Filed: April 19, 2006
Date of Patent: March 29, 2011
Assignee: MediaTek Inc.
Inventor: Jia-Horng Shieh
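Steps (a) to (d) amount to a histogram of sign changes over a cyclic epoch counter, which can be sketched as follows. The choice of 20 epochs per bit, the constant default weighting, and the name `find_bit_boundary` are assumptions; the abstract does not specify the weighting function.

```python
def find_bit_boundary(epochs, epochs_per_bit=20, weight=lambda magnitude: 1.0):
    """Return the cyclic counter value at which sign changes accumulate most."""
    histogram = [0.0] * epochs_per_bit
    for index in range(1, len(epochs)):
        # (b) a sign change between consecutive epochs hints at a bit edge
        if (epochs[index] >= 0) != (epochs[index - 1] >= 0):
            # (a) the counter wraps around once per bit period
            counter = index % epochs_per_bit
            # (c) record the change, weighted, under the current counter value
            histogram[counter] += weight(abs(epochs[index]))
    # (d) the most heavily voted counter value is taken as the boundary
    return max(range(epochs_per_bit), key=lambda c: histogram[c])


# Noise-free test stream: each bit repeats for 20 epochs, starting 7 epochs late.
bits = [1, -1, -1, 1, -1, 1, 1, -1]
stream = [-1.0] * 7 + [float(b) for b in bits for _ in range(20)]
boundary = find_bit_boundary(stream)
```

With noisy epochs, spurious sign changes spread across all counter values while true bit edges keep voting for the same one.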
-
Publication number: 20110019805
Abstract: Methods and systems are provided for searching audio records. Certain embodiments of the invention may be applied to search audio records containing a user's voice for instances where a specific sound, such as a word or phrase, is vocalized by the user. An audio sample is provided by recording the user vocalizing the sound. The audio sample is compared with the audio records to locate matches to the audio sample. In some embodiments, the audio records comprise recordings of calls between a near-end caller and a far-end caller, and the audio sample is a recording of a sound spoken by the near-end caller. The same input device may be used to record both the audio sample and the audio records.
Type: Application
Filed: January 14, 2009
Publication date: January 27, 2011
Applicant: ALGO COMMUNICATION PRODUCTS LTD.
Inventor: Paul William Zoehner
-
Publication number: 20100299148
Abstract: A method for measuring speech intelligibility includes inputting a speech waveform to a system. At least one acoustic feature is extracted from the waveform. From the acoustic feature, at least one phoneme is segmented. At least one acoustic correlate measure is extracted from the at least one phoneme and at least one intelligibility measure is determined. The at least one acoustic correlate measure is mapped to the at least one intelligibility measure.
Type: Application
Filed: March 29, 2010
Publication date: November 25, 2010
Inventors: Lee Krause, Mark Skowranski, Bonny Banerjee
-
Patent number: 7822614
Abstract: A language analyzer performs speech recognition on a speech input by a speech input unit, specifies a possible word which is represented by the speech, and the score thereof, and supplies word data representing them to an agent processing unit. The agent processing unit stores process item data which defines a data acquisition process to acquire word data or the like, a discrimination process, and an input/output process, and wires or data defining transition from one process to another and giving a weighting factor to the transition, and executes a flow represented generally by the process item data and the wires to thereby control devices belonging to an input/output target device group. To which process in the flow the transition takes place is determined by the weighting factor of each wire, which is determined by the connection relationship between a point where the process has proceeded and the wire, and the score of word data.
Type: Grant
Filed: December 6, 2004
Date of Patent: October 26, 2010
Assignee: Kabushikikaisha Kenwood
Inventor: Rika Koyama
-
Patent number: 7805301
Abstract: A reliable full covariance matrix estimation algorithm for a pattern unit's state output distribution in a pattern recognition system is discussed. An intermediate hierarchical tree structure is built to relate models for product units. Full covariance matrices of the pattern unit's state output distribution are estimated based on all the related nodes in the tree.
Type: Grant
Filed: July 1, 2005
Date of Patent: September 28, 2010
Assignee: Microsoft Corporation
Inventors: Ye Tian, Frank Kao-Ping Soong, Jian-Lai Zhou
-
Patent number: 7804956
Abstract: The present invention provides a biometrics-based cryptographic key generation system and method. A user-dependent distinguishable feature transform unit provides a feature transformation for each authentic user, which receives N-dimensional biometric features and performs a feature transformation to produce M-dimensional feature signals, such that the transformed feature signals of the authentic user are compact in the transformed feature space while those of other users presumed as imposters are either diverse or far away from those of the authentic user. A stable key generation unit receives the transformed feature signals to produce a cryptographic key based on bit information respectively provided by the M-dimensional feature signals, wherein the length of the bit information provided by the feature signal of each dimension is proportional to the degree of distinguishability in the dimension.
Type: Grant
Filed: March 11, 2005
Date of Patent: September 28, 2010
Assignee: Industrial Technology Research Institute
Inventors: Yao-Jen Chang, Tsu-Han Chen, Wen-De Zhang
-
Patent number: 7774337
Abstract: A method for controlling a relational database system, with a query statement comprised of keywords being analyzed, with the RTN being formed of independent RTN building blocks. Each RTN building block has an inner, directed decision graph which is defined independently from the inner, directed decision graphs of the other RTN building blocks with at least one decision position along at least one decision path. The inner decision graphs of all RTN building blocks are run by means of the keywords in a selection step and all possible paths of this decision graph are followed until either no match with the respectively selected path is determined by the decision graph and the process is interrupted, or the respectively chosen path is run until the end.
Type: Grant
Filed: July 10, 2007
Date of Patent: August 10, 2010
Assignee: Mediareif Moestl & Reif Kommunikations-und Informationstechnologien OEG
Inventor: Matthias Moestl
-
Patent number: 7756715
Abstract: Apparatus, method, and medium for processing an audio signal using a correlation between bands are provided. The apparatus includes an encoding unit encoding an input audio signal and a decoding unit decoding the encoded input audio signal.
Type: Grant
Filed: November 17, 2005
Date of Patent: July 13, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventors: Junghoe Kim, Dohyung Kim, Sihwa Lee
-
Patent number: 7738635
Abstract: A method for improving the recognition confidence of alphanumeric spoken input, suitable for use in a speech recognition telephony application such as a voice response system. An alphanumeric candidate is determined from the spoken input, which may be the best available representation of the spoken input. Recognition confidence is compared with a preestablished threshold. If the recognition confidence exceeds the threshold, the alphanumeric candidate is selected to represent the spoken input. Otherwise, present call data associated with the spoken input is determined. Call data may include automatic number identification (ANI) information, caller-ID information, and/or dialed number information service (DNIS) information. Information associated with the alphanumeric candidate and information associated with the present call data are correlated in order to select alphanumeric information that best represents the spoken input.
Type: Grant
Filed: January 6, 2005
Date of Patent: June 15, 2010
Assignee: International Business Machines Corporation, Nuance Communications, Inc.
Inventors: Christopher Ryan Groves, Kevin James Muterspaugh
-
Patent number: 7680657
Abstract: Possible segmentations for an audio signal are scored based on distortions for feature vectors of the audio signal and the total number of segments in the segmentation. The scores are used to select a segmentation and the selected segmentation is used to identify a starting point and an ending point for a speech signal in the audio signal.
Type: Grant
Filed: August 15, 2006
Date of Patent: March 16, 2010
Assignee: Microsoft Corporation
Inventors: Yu Shi, Frank Kao-ping Soong, Jian-lai Zhou
-
Patent number: 7624020
Abstract: An adapter for a text to text training. A main corpus is used for training, and a domain specific corpus is used to adapt the main corpus according to the training information in the domain specific corpus. The adaptation is carried out using a technique that may be faster than the main training. The parameter set from the main training is adapted using the domain specific part.
Type: Grant
Filed: September 9, 2005
Date of Patent: November 24, 2009
Assignee: Language Weaver, Inc.
Inventors: Kenji Yamada, Kevin Knight, Greg Langmead
-
Patent number: 7610198
Abstract: A method of searching a signed codebook to quantize a vector includes weighting a shape codevector in a set of shape codevectors with a weighting function for a Weighted Mean Square Error (WMSE) criteria, to produce a weighted shape codevector. The method further includes correlating the weighted shape codevector with the vector to produce a weighted correlation term. The method also includes determining, based on a sign of the weighted correlation term, a preferred one of a positive and a negative signed codevector associated with the shape codevector. The method further includes determining whether one of the signed codevectors does not belong to an illegal space defining illegal vectors.
Type: Grant
Filed: June 7, 2002
Date of Patent: October 27, 2009
Assignee: Broadcom Corporation
Inventor: Jes Thyssen
-
Patent number: 7603269
Abstract: A speech recognition grammar creating apparatus, which is capable of eliminating complex labor associated with preparing all rules by taking into account changes of the order of component elements of a speech-recognizing object and possible combinations of component elements including at least one component element that can be omitted. In the speech recognition grammar creating apparatus, an image edit section groups together at least one component element that cannot be omitted and at least one component element that can be omitted, as the speech-recognizing object, into a component element group as an omission-allowed group. An augmented BNF converting section creates the speech recognition grammar by expanding the component element group obtained by the grouping.
Type: Grant
Filed: June 29, 2005
Date of Patent: October 13, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Kazue Kaneko, Michio Aizawa
-
Patent number: 7596495
Abstract: A method is provided for recurrently estimating a spectrum of noise at each signal observation interval from a sound signal which contains the noise and which is observed at each signal observation interval. In the method, there are acquired an envelope of a previous spectrum of the noise which has been previously estimated from the sound signal observed at a previous signal observation interval, and an envelope of a current spectrum of the sound signal which is observed at a current signal observation interval subsequent to the previous signal observation interval. Then, a value of correlation is computed between the envelope of the previous spectrum of the noise and the envelope of the current spectrum of the sound signal. A current spectrum of the noise contained in the sound signal observed at the current signal observation interval is estimated in accordance with the computed value of the correlation and based on the previous spectrum of the noise and the current spectrum of the sound signal.
Type: Grant
Filed: March 29, 2005
Date of Patent: September 29, 2009
Assignee: Yamaha Corporation
Inventors: Michiko Kazama, Mikio Tohyama, Toru Hirai
-
Patent number: 7496513
Abstract: Input is received from at least two different input sources. Information from these sources is combined together to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined together to select a result.
Type: Grant
Filed: June 28, 2005
Date of Patent: February 24, 2009
Assignee: Microsoft Corporation
Inventors: Frank Kao-Ping Soong, Jian-Lai Zhou, Ye Tian
-
Patent number: 7412384
Abstract: A digital signal processing method and learning method and devices therefor, and a program storage medium which are capable of further improving the waveform reproducibility of a digital signal. Self correlation coefficients are calculated by cutting parts out of the digital signal by multiple windows having different sizes, and the parts are classified based on the calculation results of the self correlation coefficients. Then, the digital signal is converted by the prediction method corresponding to the classified class, so that the conversion further suitable for the features of the digital signal can be conducted.
Type: Grant
Filed: July 31, 2001
Date of Patent: August 12, 2008
Assignee: Sony Corporation
Inventors: Tetsujiro Kondo, Tsutomu Watanabe
-
Patent number: 7337109
Abstract: A multiple step adaptive method for time scaling. The method synthesizes an S3[n] signal from an S1[n] signal and an S2[n] signal and comprises the following steps: (a) calculating a first magnitude of a cross-correlation function of the S1[n] signal and the S2[n] signal according to a first index; (b) comparing the first magnitude with a threshold value; (c) if the first magnitude is smaller than the threshold value, calculating a first reference magnitude of the cross-correlation function of the S1[n] signal and the S2[n] signal according to a first reference index behind the first index by a first determined number, or calculating a second reference magnitude of the cross-correlation function of the S1[n] signal and the S2[n] signal according to a second reference index behind the first index by a second number; (d) synthesizing the S3[n] signal by adding the S1[n] signal to the S2[n] signal in accordance with a maximum index corresponding to a largest magnitude among all the magnitudes calculated in (c).
Type: Grant
Filed: October 2, 2003
Date of Patent: February 26, 2008
Assignee: ALI Corporation
Inventor: Gin-Der Wu
-
Patent number: 7284255
Abstract: A system and method are disclosed for performing audience surveys of broadcast audio from radio and television. A small body-worn portable collection unit samples the audio environment of the survey member and stores highly compressed features of the audio programming. A central computer simultaneously collects the audio outputs from a number of radio and television receivers representing the possible selections that a survey member may choose. On a regular schedule the central computer interrogates the portable units used in the survey and transfers the captured audio feature samples. The central computer then applies a feature pattern recognition technique to identify which radio or television station the survey member was listening to at various times of day. This information is then used to estimate the popularity of the various broadcast stations.
Type: Grant
Filed: November 16, 1999
Date of Patent: October 16, 2007
Inventors: Steven G. Apel, Stephen C. Kenyon
-
Patent number: 7212968
Abstract: A dynamic programming technique is provided for matching two sequences of phonemes both of which may be generated from text or speech. The scoring of the dynamic programming matching technique uses phoneme confusion scores, phoneme insertion scores and phoneme deletion scores which are obtained in advance in a training session and, if appropriate, confidence data generated by a recognition system if the sequences are generated from speech.
Type: Grant
Filed: October 25, 2000
Date of Patent: May 1, 2007
Assignee: Canon Kabushiki Kaisha
Inventors: Philip Neil Garner, Jason Peter Andrew Charlesworth, Asako Higuchi
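Matching two phoneme sequences with confusion, insertion, and deletion scores follows the general shape of a weighted edit-distance recurrence, sketched below. This is not the patent's scoring: the costs here are hand-picked illustrative values rather than scores obtained in a training session, confidence data is left out, and the name `align_cost` is hypothetical.

```python
def align_cost(seq_a, seq_b, confusion_cost, insertion_cost=1.0, deletion_cost=1.0):
    """Minimum total cost of aligning two phoneme sequences (lower is closer)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    table = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + deletion_cost
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + insertion_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = seq_a[i - 1], seq_b[j - 1]
            # Identical phonemes are free; unknown pairs get a default cost of 1.
            substitute = 0.0 if a == b else confusion_cost.get((a, b), 1.0)
            table[i][j] = min(
                table[i - 1][j - 1] + substitute,   # match or confusion
                table[i - 1][j] + deletion_cost,    # phoneme missing from seq_b
                table[i][j - 1] + insertion_cost,   # extra phoneme in seq_b
            )
    return table[-1][-1]


# Assumed confusion costs: /p/ and /b/ are easily mistaken for one another.
confusion = {("p", "b"): 0.2, ("b", "p"): 0.2}
close = align_cost(["p", "ae", "t"], ["b", "ae", "t"], confusion)
extra = align_cost(["k", "ae", "t"], ["k", "ae", "t", "s"], confusion)
```

Giving acoustically similar pairs a low confusion cost lets the alignment tolerate typical recognition errors while still penalising unrelated phonemes.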
-
Patent number: 7139705
Abstract: A method of determining the time relation between an original or input speech signal (10) and an output speech signal (15) affected by time warping in a communications system, such as a VoIP (Voice over Internet Protocol) system, wherein corresponding speech bursts (11, 12; 16, 17) of the input (10) and output speech signal (15) are located in accordance with a predefined signal property thereof. The corresponding speech bursts (11, 12; 16, 17) thus located are time aligned (10, 30) for the correction of continuous and discontinuous warping effects. A performance estimate is generated by comparing the time aligned input and output speech signals (10, 30) applying cross-correlation techniques and PSQM (Perceptual Speech Quality Measure) or PSQM+ (Enhanced Perceptual Speech Quality Measure) techniques.
Type: Grant
Filed: November 13, 2000
Date of Patent: November 21, 2006
Assignee: Koninklijke KPN N.V.
Inventors: John Gerard Beerends, Andries Pieter Hekstra
-
Patent number: 7130292
Abstract: A method and apparatus for enhancing the receiving and information identification functions of multiple access communications systems by employing one or more optical processors configured as a bank of 1-D correlators. The present invention is particularly useful in a DS/SS CDMA communications system, resulting in a multiuser CDMA system that approaches carrier to noise performance (C/N) as opposed to being limited by multiple access interference (MAI). The correlators are arranged in parallel to detect and/or demodulate the received signal, in conjunction with one or more complex algorithms to perform near-optimum multiuser detection, perform multipath combining and/or perform carrier Doppler compensation.
Type: Grant
Filed: January 19, 2001
Date of Patent: October 31, 2006
Assignee: Essex Corporation
Inventors: Terry M. Turpin, James L. Lafuse
-
Patent number: 6996291
Abstract: After one or both of a pair of images are obtained, an auto-correlation function for one of those images is generated to determine a smear amount and possibly a smear direction. The smear amount and direction are used to identify potential locations of a peak portion of the correlation function between the pair of images. The pair of images is then correlated only at offset positions corresponding to the one or more of the potential peak locations. In some embodiments, the pair of images is correlated according to a sparse set of image correlation function value points around the potential peak locations. In other embodiments, the pair of images is correlated at a dense set of correlation function value points around the potential peak locations. The correlation function values of these correlation function value points are then analyzed to determine the offset position of the true correlation function peak.
Type: Grant
Filed: August 6, 2001
Date of Patent: February 7, 2006
Assignee: Mitutoyo Corporation
Inventor: Michael Nahum
-
Patent number: 6965631
Abstract: One embodiment of the present invention includes a circular shift register, K storage elements, and a code register. The circular shift register having N data samples circularly shifts a first data sample of the N data samples into a data position at a first clock frequency. The N data samples correspond to a signal received from one of K satellites in a global positioning system (GPS). The N data samples are loaded into the circular shift register at a second clock frequency. The K storage elements store K code sequences, respectively. Each of the K code sequences has N code samples and includes a first code sample being written at a code position corresponding to the data position at a third clock frequency. The K storage elements correspond to the K satellites. The code register stores the N code samples loaded from one of the K storage elements at a fourth clock frequency. The fourth clock frequency is K times faster than the first clock frequency.
Type: Grant
Filed: March 13, 2001
Date of Patent: November 15, 2005
Assignee: PRI Research & Development Corp.
Inventors: Kaveh Shakeri, Alireza Mehrnia, Farshid Soheili-Najafabadi
-
Patent number: 6823305
Abstract: Speaker normalization is carried out based on biometric information available about a speaker, such as his height, or a dimension of a bodily member or article of clothing. The chosen biometric parameter correlates with the vocal tract length. Speech can be normalized based on the biometric parameter, which thus indirectly normalizes the speech based on the vocal tract length of the speaker. The inventive normalization can be used in model formation, or in actual speech recognition usage, or both. Substantial improvements in accuracy have been noted at little cost. The preferred biometric parameter is height, and the preferred form of scaling is linear scaling with the scale factor proportional to the height of the speaker.
Type: Grant
Filed: December 21, 2000
Date of Patent: November 23, 2004
Assignee: International Business Machines Corporation
Inventor: Ellen M. Eide
-
Patent number: 6687672
Abstract: Methods and apparatus for blind channel estimation of a speech signal corrupted by a communication channel are provided. One method includes converting a noisy speech signal into either a cepstral representation or a log-spectral representation; estimating a correlation of the representation of the noisy speech signal; determining an average of the noisy speech signal; constructing and solving, subject to a minimization constraint, a system of linear equations utilizing a correlation structure of a clean speech training signal, the correlation of the representation of the noisy speech signal, and the average of the noisy speech signal; and selecting a sign of the solution of the system of linear equations to estimate an average clean speech signal in a processing window.
Type: Grant
Filed: March 15, 2002
Date of Patent: February 3, 2004
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Younes Souilmi, Luca Rigazio, Patrick Nguyen, Jean-Claude Junqua
-
Publication number: 20030225581Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.Type: ApplicationFiled: March 14, 2003Publication date: December 4, 2003Applicant: International Business Machines CorporationInventors: Tetsuya Takiguchi, Masafumi Nishimura
-
Publication number: 20030182117Abstract: A question as to the physical condition and a question as to the feeling are asked, answers are accepted by voice, acoustic features are extracted from the answer to the question as to the physical condition, and character string information is extracted from the answer to the question as to the feeling. A correlation between the acoustic features and the character string information is set, and character string information is identified from a newly accepted acoustic feature, thereby performing feeling presumption. Feeling is presumed from the voice in response to the subject's severalty and changes in the subject's age and physical condition. A medical examination by interview on the mental condition is performed on the subject by voice output and voice input, and the subject's mental condition is diagnosed based on the contents of the subject's answer and analysis of the answer by voice.Type: ApplicationFiled: January 31, 2003Publication date: September 25, 2003Applicant: SANYO ELECTRIC CO., LTD.Inventors: Rie Monchi, Masakazu Asano, Hirokazu Genno
-
Publication number: 20030093273Abstract: There are provided: a data generating section 3 which differentiates an input aural signal, detects as a sample point a point where a differentiating value satisfies a predetermined condition, and obtains discrete amplitude data on detected sample points and timing data indicative of a time interval between the sample points, and a correlating section 4 for computing correlation data by using the amplitude data and the timing data. Input speech is recognized by matching correlation data, which is generated for input speech by a correlating section 4, with correlation data which is generated in the same manner in advance for a variety of speech and is stored in a data memory 6.Type: ApplicationFiled: October 3, 2002Publication date: May 15, 2003Inventor: Yukio Koyanagi
-
Patent number: 6560575Abstract: An apparatus is provided for checking the consistency between two training words which can be used in, for example, a speech recognition or verification system. Two training examples are aligned using a dynamic programming alignment process and an average frame score is calculated from the alignment results together with the worst score in a number of consecutive frames. These values are then compared with similar values obtained from training examples which are known to be consistent to determine if the training examples are consistent.Type: GrantFiled: September 30, 1999Date of Patent: May 6, 2003Assignee: Canon Kabushiki KaishaInventor: Robert Alexander Keiller
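A rough sketch of the two statistics named in this abstract, the average frame score along a dynamic programming alignment and the worst score over a number of consecutive frames, is given below. It assumes plain dynamic time warping with Euclidean frame distances; the function names, the window length, and the fixed limits in `is_consistent` are hypothetical. The abstract says only that the values are compared with similar values obtained from examples known to be consistent.

```python
import numpy as np

def dtw_path(a, b):
    """Align two feature sequences (frames x dims) with basic DTW."""
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the end of both sequences to recover the alignment path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path, dist

def consistency_scores(a, b, window=3):
    """Return (average frame score, worst mean score over `window` consecutive frames)."""
    path, dist = dtw_path(a, b)
    scores = np.array([dist[i, j] for i, j in path])
    w = min(window, len(scores))
    windowed = np.convolve(scores, np.ones(w) / w, mode="valid")
    return float(scores.mean()), float(windowed.max())

def is_consistent(a, b, avg_limit, worst_limit, window=3):
    """Hypothetical decision rule: both statistics must stay under limits
    that would be derived from training examples known to be consistent."""
    avg, worst = consistency_scores(a, b, window)
    return avg <= avg_limit and worst <= worst_limit
```

Keeping the windowed worst score alongside the average lets a short, badly mismatched stretch be detected even when the overall average looks acceptable.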
-
Publication number: 20030009331Abstract: Pre-computed context-dependent phoneme representations of a number of constituents of a grammar are processed dynamically by a speech recognizer. The approach provides a configurable tradeoff between data size and recognition-time computation. This tradeoff can be obtained without sacrificing recognition accuracy, and in particular, allows full modeling of all cross-word phoneme contexts. In one aspect of the invention, a specification of a grammar is processed. This specification includes specifications of a number of constituents of the grammar. A first subset of the constituents of the grammar are selected, and the remaining of the constituents form a second subset. For each of the constituents in the first subset the method first includes processing the specification of the constituent to form a first processed representation that defines sequences of elements that are associated with that constituent and that includes words and references to constituents in the first subset.Type: ApplicationFiled: July 16, 2001Publication date: January 9, 2003Inventors: Johan Schalkwyk, Michael S. Phillips
-
Publication number: 20020184018Abstract: To propose a digital signal processing method and learning method and devices therefor, and a program storage medium which are capable of further improving the waveform reproducibility of a digital signal. Self correlation coefficients D40 and D41 are calculated respectively by cutting parts out of the digital signal D10 by multiple windows having different sizes, and the parts are classified based on the calculation results D15 of the self correlation coefficients D40 and D41 and then, the digital signal D10 is converted by the prediction method corresponding to the classified class, so that the conversion further suitable for the features of the digital signal D10 can be conducted.Type: ApplicationFiled: March 29, 2002Publication date: December 5, 2002Inventor: Tetsujiro Kondo
-
Patent number: 6314392Abstract: In a computerized method a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets are determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.Type: GrantFiled: September 20, 1996Date of Patent: November 6, 2001Assignee: Digital Equipment CorporationInventors: Brian S. Eberman, William D. Goldenthal
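The iterative merging described in this abstract can be sketched as follows. The abstract does not name the statistical distance, so this example assumes a simple stand-in: the difference of the means divided by the pooled standard deviation. The function names and the threshold value are likewise illustrative.

```python
import numpy as np

def frame_distance(x, y, eps=1e-9):
    """Assumed distance: |mean difference| scaled by the pooled standard deviation."""
    pooled = np.sqrt((np.var(x) + np.var(y)) / 2.0) + eps
    return abs(np.mean(x) - np.mean(y)) / pooled

def segment(samples, frame_len, threshold):
    """Group samples into fixed-size frames, then merge adjacent sets
    while their distance stays below the threshold."""
    samples = np.asarray(samples, dtype=float)
    n_frames = len(samples) // frame_len
    segments = [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]
    merged = True
    while merged and len(segments) > 1:
        merged = False
        out = [segments[0]]
        for seg in segments[1:]:
            if frame_distance(out[-1], seg) < threshold:
                out[-1] = np.concatenate([out[-1], seg])  # grow the current cluster
                merged = True
            else:
                out.append(seg)  # start a new statistically distinct unit
        segments = out
    return segments

# Usage: two stationary stretches with different levels.
signal = np.concatenate([np.tile([0.0, 1.0], 100), np.tile([10.0, 11.0], 100)])
units = segment(signal, frame_len=20, threshold=2.0)
```

The loop repeats until a full pass produces no merge, which mirrors the iterative process in the abstract.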
-
Patent number: 6275799Abstract: A first parameter set constituting reference patterns of each category in speech recognition based on pattern matching with a reference pattern is to be determined from a plurality of learning utterance data. The first parameter set is determined so that a third evaluation function, represented by a sum of a first evaluation function and a second evaluation function is maximized. The first evaluation function represents a matching degree between all learning utterances and corresponding reference patterns. The second evaluation function represents a matching degree between elements of the first parameter set.Type: GrantFiled: February 2, 1995Date of Patent: August 14, 2001Assignee: NEC CorporationInventor: Ken-ichi Iso
-
Patent number: 6253175Abstract: Systems and methods for processing acoustic speech signals which utilize the wavelet transform (and alternatively, the Fourier transform) as a fundamental tool. The method essentially involves “synchrosqueezing” spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm. The amplitude, frequency and bandwidth of each of the components are, thus, extracted. The cepstrum generated from this information is referred to as “K-mean Wastrum.” In another aspect, the result of the K-mean clustering process is further processed to limit the set of primary components to formants. The resulting features are referred to as “formant-based wastrum.” Formants are interpolated in unvoiced regions and the contribution of unvoiced turbulent part of the spectrum are added.Type: GrantFiled: November 30, 1998Date of Patent: June 26, 2001Assignee: International Business Machines CorporationInventors: Sankar Basu, Stephane H. Maes
-
Patent number: 6201960Abstract: An improved method and system of measuring the perceived speech quality in mobile telecommunications networks is disclosed herein. In an embodiment of the invention, the method uses both radio link parameters and an objective measuring technique performed on received signals to estimate the speech quality perceived by the end-user. A radio link processing stage extracts temporal information from a set of available radio link parameters such as the BER, FER, RxLev, handover statistics, soft information, and speech energy. Concurrently, a speech processing stage is used to process a sequence of original signals and received signals, obtained from the output of a telecommunications system. The signal sequences are processed by an objective measuring technique such as Perceptual Speech Quality Measure (PSQM). The outputs from the radio link processing and speech processing stages are utilized to calculate an estimate for speech quality.Type: GrantFiled: June 24, 1997Date of Patent: March 13, 2001Assignee: Telefonaktiebolaget LM Ericsson (publ)Inventors: Tor Björn Minde, Anders Tomas Uvliden, Per Anders Karlsson, Per Gunnar Heikkilä
-
Patent number: 6199041Abstract: A method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients, converting the cepstral vector coefficients to energy bands in logarithmic spectra, filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency. Another method and system convert system prototypes for speech recognition systems from a reference frequency to a target frequency.Type: GrantFiled: November 20, 1998Date of Patent: March 6, 2001Assignee: International Business Machines CorporationInventors: Fu-Hua Liu, Michael A. Picheny
-
Patent number: 6157830Abstract: A method and system for measuring the speech quality in a mobile cellular telecommunications network using available radio link parameters is disclosed herein. In a preferred embodiment, the method includes receiving a set of radio link parameters, as defined in a standard or otherwise available, such as the BER, FER, RxLev, handover statistics, soft information, and speech energy. Temporal information is obtained from the radio link parameters to create a set of temporal parameters which can be statistically analyzed, for example, for the maximum and minimum, mean, standard deviation, and autocorrelation values for a time interval. The temporal parameters are combined to yield a set of correlated parameters that are more closely related to the speech quality. An estimator then uses the correlated parameters to calculate an estimate for the speech quality. The method of the present invention takes advantage of temporal information and correlated relationships from the transmitted parameters.Type: GrantFiled: May 22, 1997Date of Patent: December 5, 2000Assignee: Telefonaktiebolaget LM EricssonInventors: Tor Bjorn Minde, Anders Tomas Uvliden, Per Anders Karlsson, Per Gunnar Heikkilä
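The statistics listed in this abstract (maximum, minimum, mean, standard deviation, and autocorrelation over a time interval) can be computed for one radio link parameter as in the sketch below. The function name, the population form of the standard deviation, and the lag-based normalized autocorrelation are assumptions. The abstract does not specify how the temporal parameters are combined or how the estimator works, so neither is shown.

```python
import numpy as np

def temporal_parameters(values, lag=1):
    """Summarize one radio link parameter (for example, a BER series) over an interval.

    Assumption: autocorrelation is the lag-`lag` sample autocorrelation,
    normalized by the total variance of the interval.
    """
    x = np.asarray(values, dtype=float)
    if not 1 <= lag < len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(values)")
    centered = x - x.mean()
    denom = float(np.dot(centered, centered))
    autocorr = float(np.dot(centered[:-lag], centered[lag:]) / denom) if denom > 0 else 0.0
    return {
        "max": float(x.max()),
        "min": float(x.min()),
        "mean": float(x.mean()),
        "std": float(x.std()),
        "autocorr": autocorr,
    }

# Usage on a short, made-up series of measurements.
params = temporal_parameters([1.0, 2.0, 3.0, 4.0, 5.0])
```

A constant series has zero variance, so the sketch returns an autocorrelation of 0.0 in that case rather than dividing by zero.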
-
Patent number: 5787395Abstract: A voice recognizing method in which a plurality of voice recognition objective words are provided. Scores are accumulated for an unknown input voice signal as compared to the voice recognition objective words by using parameters which are calculated in advance. Upon receipt of an unknown voice signal, a corresponding voice recognition objective word is extracted and recognized. The voice recognition objective words are structured into an overlapping hierarchical structure by using correlation values between each pair of voice recognition objective words. This correlation may be computed from acoustic features, HMM parameters or the like. Score calculation is performed on the unknown input voice signal by using a dictionary of the voice recognition objective words structured in the hierarchical structure. Upon preliminary recognition, the dictionary of the voice recognition objective words is resorted without recalculation of the correlation values.Type: GrantFiled: July 18, 1996Date of Patent: July 28, 1998Assignee: Sony CorporationInventor: Katsuki Minamino