Endpoint Detection Patents (Class 704/248)

Voice region detection apparatus and method with color noise removal using run statistics

Patent number: 7630891

Abstract: The present invention relates to a voice region detection apparatus and method capable of accurately detecting a voice region even in a voice signal with color noise. The voice region detection method comprises the steps of, if a voice signal is input, dividing the input voice signal into frames; performing whitening of surrounding noise by combining white noise with the frames; extracting random parameters indicating randomness of frames from the frames subjected to the whitening; classifying the frames into voice frames and noise frames based on the extracted random parameters; and detecting a voice region by calculating start and end positions of a voice based on the voice and noise frames. According to the present invention, the voice region can be accurately detected even in a voice signal with a large amount of color noise mixed therewith.

Type: Grant

Filed: November 26, 2003

Date of Patent: December 8, 2009

Assignee: Samsung Electronics Co., Ltd.

Inventors: Kwang-cheol Oh, Yong-beom Lee
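
The detection pipeline in this abstract can be sketched in Python: framing, whitening by adding white noise, a per-frame randomness parameter, voice/noise classification, and start and end positions. This is an illustrative sketch, not the patented method. It assumes the randomness parameter is a Wald-Wolfowitz runs-test z-score computed around each frame's median, which is one reading of "run statistics" in the title. The frame length, whitening noise level and z-score threshold are hypothetical values.

```python
import math
import random

def split_frames(signal, frame_len):
    """Divide the signal into non-overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]

def whiten(frame, noise_std, rng):
    """Add low-level white Gaussian noise to the frame (the 'whitening' step)."""
    return [x + rng.gauss(0.0, noise_std) for x in frame]

def runs_z_score(frame):
    """Runs-test z-score of the sample signs around the frame median.

    Near 0 for random (noise-like) frames; strongly negative for smooth,
    correlated frames such as voiced speech, which produce few long runs.
    """
    med = sorted(frame)[len(frame) // 2]
    signs = [x > med for x in frame if x != med]
    n1 = sum(signs)
    n2 = len(signs) - n1
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return 0.0
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(var) if var > 0 else 0.0

def detect_voice_region(signal, frame_len=80, noise_std=0.01, z_threshold=4.0, seed=0):
    """Return (start, end) sample positions of the detected voice region, or None."""
    rng = random.Random(seed)
    frames = split_frames(signal, frame_len)
    voiced = [i for i, f in enumerate(frames)
              if abs(runs_z_score(whiten(f, noise_std, rng))) > z_threshold]
    if not voiced:
        return None
    # Start = first voice frame; end = one past the last voice frame (half-open).
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
```

A low-frequency tone embedded in white noise is reported as one region spanning the tone's frames. Real colored noise would need the whitening level tuned to the noise floor.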
Method and apparatus for automatically generating a general extraction function calculable on an input signal, e.g. an audio signal to extract therefrom a predetermined global characteristic value of its contents, e.g. a descriptor

Patent number: 7624012

Abstract: The invention enables the generation of a general function (4) that can operate on an input signal (Sx) to extract from it a value (DVex) of a global characteristic expressing a feature (De) of the information conveyed by that signal. It operates by: generating at least one compound function (CF1-CFn), said compound function being generated from at least one of a set of elementary functions (EF1, EF2, . . .

Type: Grant

Filed: December 16, 2003

Date of Patent: November 24, 2009

Assignee: Sony France S.A.

Inventors: François Pachet, Aymeric Zils
Functional operations for accessing and/or building interlocking trees datastores to enable their use with applications software

Patent number: 7593923

Abstract: A set of mechanisms handles communication with a Knowledge Store and its K Engine(s). The Knowledge Store (KStore) does not need indexes or tables to support it; instead, it is formed by the construction of interlocking trees of pointers in nodes of the interlocking trees. The K Engine builds and is used to query a KStore by using threads that use software objects together with a K Engine to learn particlized events, thus building the KStore, and these or other software objects can be used to make queries and get answers from the KStore, usually with the help of a K Engine. Under some circumstances, information can be obtained directly from the KStore, but it is generally available only through the actions of the K Engine. The mechanisms provide communications pathways for users and applications software to build and/or query the KStore. Both these processes can proceed simultaneously, and in multiple instances. There can be a plurality of engines operating on a KStore essentially simultaneously.

Type: Grant

Filed: June 29, 2004

Date of Patent: September 22, 2009

Assignee: Unisys Corporation

Inventors: Jane Campbell Mazzagatti, Jane Van Keuren Claar, Tony T. Phan
Selection of incoming call screening treatment based on emotional state criterion

Patent number: 7580512

Abstract: An incoming call screening treatment is selected for a call to a called communication device based on an emotional state criterion input by a user of the called communication device.

Type: Grant

Filed: June 28, 2005

Date of Patent: August 25, 2009

Assignee: Alcatel-Lucent USA Inc.

Inventors: Ramachendra P. Batni, Ranjan Sharma
Speaker selection training via a-posteriori Gaussian mixture model analysis, transformation, and combination of hidden Markov models

Patent number: 7574359

Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.

Type: Grant

Filed: October 1, 2004

Date of Patent: August 11, 2009

Assignee: Microsoft Corporation

Inventor: Chao Huang
Harmonic structure based acoustic speech interval detection method and device

Patent number: 7567900

Abstract: A harmonic structure acoustic signal detection device that does not depend on level fluctuations of the input signal, including: an FFT unit which performs FFT on an input signal and calculates a power spectrum component for each frame; a harmonic structure extraction unit which leaves only a harmonic structure from the power spectrum component; a voiced feature evaluation unit which evaluates correlation between the frames of harmonic structures extracted by the harmonic structure extraction unit, thereby evaluates whether or not the segment is a vowel segment, and extracts the voiced segment; and a speech segment determination unit which determines a speech segment according to the continuity and durability of the output of the voiced feature evaluation unit.

Type: Grant

Filed: June 3, 2004

Date of Patent: July 28, 2009

Assignee: Panasonic Corporation

Inventors: Tetsu Suzuki, Takeo Kanamori, Takashi Kawamura
Path change detector for echo cancellation

Patent number: 7555117

Abstract: A path change is detected by multiplying the energy of a signal on the output of a summation circuit in one channel of a telephone by a constant to produce a product. The product is compared with the energy of a signal on the input of the summation circuit. A path change is indicated when the energy of the product exceeds the energy of a signal on the input of the summation circuit. The comparison is made in successive frames of an audio signal and a path change is indicated when the energy of a signal on the input of the summation circuit is exceeded in two or more successive frames.

Type: Grant

Filed: July 12, 2005

Date of Patent: June 30, 2009

Assignee: Acoustic Technologies, Inc.

Inventors: Seth Suppappola, Franklyn H. Story
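
The path-change test in this abstract reduces to a per-frame energy comparison with a persistence requirement. The sketch below assumes that the summation circuit's input carries the echo-bearing signal and its output the residual after echo subtraction. The constant `k = 2.0` and the frame representation (lists of samples) are hypothetical choices.

```python
def frame_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def detect_path_change(input_frames, output_frames, k=2.0, min_successive=2):
    """Return the index of the frame at which a path change is declared, or None.

    input_frames:  frames at the summation circuit input (echo-bearing signal).
    output_frames: frames at its output (residual after echo subtraction).
    A frame 'fires' when k * E(output) > E(input); a path change is declared
    only after `min_successive` consecutive firing frames.
    """
    run = 0
    for i, (fin, fout) in enumerate(zip(input_frames, output_frames)):
        if k * frame_energy(fout) > frame_energy(fin):
            run += 1
            if run >= min_successive:
                return i
        else:
            run = 0
    return None
```

Requiring two or more successive frames keeps a single noisy frame from triggering a false path change.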
Service for providing speaker voice metrics

Patent number: 7487090

Abstract: A method of providing voice metrics over an established telephone call between a user and a subscriber can include receiving voice information from the user over the call and determining biometric information from the voice information for the user. The method further can include encoding the biometric information and sending the biometric information to the subscriber over the call.

Type: Grant

Filed: December 15, 2003

Date of Patent: February 3, 2009

Assignee: International Business Machines Corporation

Inventors: Thomas E. Creamer, Peeyush Jaiswal, Victor S. Moore
Optimization of detection systems using a detection error tradeoff analysis criterion

Patent number: 7424425

Abstract: In detection systems, such as speaker verification systems, for a given operating point range, with an associated detection “cost”, the detection cost is preferably reduced by essentially trading off the system error in the area of interest with areas essentially “outside” that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is preferably derived, such that its minimization provably leads to detection cost reduction in the area of interest. The criterion allows for selective access to the slope and offset of the DET curve (a line in case of normally distributed detection scores, a curve approximated by mixture of Gaussians in case of other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.

Type: Grant

Filed: May 19, 2002

Date of Patent: September 9, 2008

Assignee: International Business Machines Corporation

Inventors: Jiri Navratil, Ganesh N. Ramaswamy
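
The claim that the DET curve is a line for normally distributed scores follows from a short derivation. Suppose target scores follow N(mu_t, sigma_t^2) and non-target scores follow N(mu_n, sigma_n^2); these distributions are hypothetical. A threshold theta then gives P_miss = Phi((theta - mu_t)/sigma_t) and P_fa = Phi((mu_n - theta)/sigma_n). Applying the probit Phi^-1 to both and eliminating theta gives Phi^-1(P_miss) = -(sigma_n/sigma_t) * Phi^-1(P_fa) + (mu_n - mu_t)/sigma_t. The slope therefore depends only on the ratio of the standard deviations, and the offset only on the separation of the means. The sketch below checks this numerically with Python's `statistics.NormalDist`. It illustrates the DET geometry only and is not the patent's optimization criterion.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def det_point(theta, target, nontarget):
    """Map a decision threshold to DET coordinates (probit(P_fa), probit(P_miss))."""
    p_miss = target.cdf(theta)            # target scores rejected below theta
    p_fa = 1.0 - nontarget.cdf(theta)     # non-target scores accepted above theta
    return _STD_NORMAL.inv_cdf(p_fa), _STD_NORMAL.inv_cdf(p_miss)

def det_line(target, nontarget):
    """Slope and offset of the DET line for Gaussian target/non-target scores."""
    slope = -nontarget.stdev / target.stdev
    offset = (nontarget.mean - target.mean) / target.stdev
    return slope, offset
```

Every threshold maps to a point on the same line. Changing the slope therefore requires changing the relative spread of the two score distributions, not just moving the threshold.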
System and method for Mandarin Chinese speech recognition using an optimized phone set

Patent number: 7353173

Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.

Type: Grant

Filed: March 31, 2003

Date of Patent: April 1, 2008

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
System and method for cantonese speech recognition using an optimized phone set

Patent number: 7353172

Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.

Type: Grant

Filed: March 24, 2003

Date of Patent: April 1, 2008

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
System and method for effectively implementing a Mandarin Chinese speech recognition dictionary

Patent number: 7353174

Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.

Type: Grant

Filed: March 31, 2003

Date of Patent: April 1, 2008

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
Method and apparatus to perform speech reference enrollment based on input speech characteristics

Patent number: 7319956

Abstract: A speech reference enrollment method involves requesting a user speak a word; detecting a first utterance; requesting the user speak the word; detecting a second utterance; determining a first similarity between the first utterance and the second utterance; when the first similarity is less than a predetermined similarity, requesting the user speak the word; detecting a third utterance; determining a second similarity between the first utterance and the third utterance; and when the second similarity is greater than or equal to the predetermined similarity, creating a reference.

Type: Grant

Filed: March 23, 2001

Date of Patent: January 15, 2008

Assignee: SBC Properties, L.P.

Inventor: Robert Wesley Bossemeyer, Jr.
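
The enrollment steps in this abstract form a small prompt-and-compare loop. In the sketch below, several elements are hypothetical: utterances are fixed-length feature vectors, similarity is 1 / (1 + Euclidean distance), only one retry is allowed, and the reference is the element-wise mean of the two consistent utterances. The abstract does not specify any of these.

```python
import math
from typing import Callable, List, Optional

Features = List[float]

def similarity(a: Features, b: Features) -> float:
    """Hypothetical similarity in (0, 1]: 1 for identical feature vectors."""
    return 1.0 / (1.0 + math.dist(a, b))

def enroll(next_utterance: Callable[[], Features], threshold: float,
           max_retries: int = 1) -> Optional[Features]:
    """Prompt-and-compare enrollment loop.

    next_utterance() stands in for 'request the word, detect an utterance'.
    Each retry is compared against the FIRST utterance, as in the abstract.
    """
    first = next_utterance()
    candidate = next_utterance()
    retries = 0
    while similarity(first, candidate) < threshold:
        if retries == max_retries:
            return None                      # give up; no reference created
        retries += 1
        candidate = next_utterance()
    # Hypothetical reference: element-wise mean of the two consistent utterances.
    return [(x + y) / 2.0 for x, y in zip(first, candidate)]
```

An outlier second utterance is discarded, and the third utterance is compared against the first.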
System and method for a endpoint detection of speech for improved speech recognition in noisy environments

Patent number: 7277853

Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.

Type: Grant

Filed: September 5, 2001

Date of Patent: October 2, 2007

Assignee: Mindspeed Technologies, Inc.

Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
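
The endpointer logic described here compares each later frame against statistics of an initial background portion. The sketch makes the following assumptions:
- per-frame energies and cepstral feature vectors are already computed;
- the first `n_background` frames are non-speech;
- a frame's "distance" is its Euclidean distance from the background's mean feature vector;
- a frame counts as speech only if both the energy test and the distance test pass.

The combination rule and both ratios are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence

def classify_frames(energies: Sequence[float], features: Sequence[Sequence[float]],
                    n_background: int, energy_ratio: float = 2.0,
                    distance_ratio: float = 2.0) -> List[bool]:
    """Label each frame after the first `n_background` frames as speech (True) or not.

    energies: per-frame energy; features: per-frame cepstral vectors (precomputed).
    """
    # First portion: background energy and average cepstral distance.
    bg_energy = fmean(energies[:n_background])
    bg_feats = features[:n_background]
    centroid = [fmean(col) for col in zip(*bg_feats)]
    bg_distance = fmean(math.dist(f, centroid) for f in bg_feats)
    labels = []
    for energy, feat in zip(energies[n_background:], features[n_background:]):
        distance = math.dist(feat, centroid)
        # Assumed rule: speech only if BOTH the energy contrast and the
        # cepstral-distance comparison exceed their (hypothetical) ratios.
        labels.append(energy > energy_ratio * bg_energy
                      and distance > distance_ratio * bg_distance)
    return labels
```

A loud frame that is spectrally similar to the background, such as a noise burst, is rejected because it fails the distance test.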
Audio signal encoding method combining codes having different frame lengths and data rates

Patent number: 7222068

Abstract: A system for transmitting audio signals over a telecommunications link generates the signals as two or more alternative feeds, for example at different data rates. The two feeds are encoded using coding methods having a frame structure with different frame lengths. To facilitate switching between the two, the input signal is notionally divided into temporal portions and each is coded by taking it, plus enough of the next (or preceding) portion to make up a whole number of frames, and encoding it, whereby the encoded portions overlap—at least for one of the feeds. The overlap is lost upon decoding by discarding duplicate material.

Type: Grant

Filed: November 19, 2001

Date of Patent: May 22, 2007

Assignee: British Telecommunications public limited company

Inventors: Anthony R Leaning, Richard J Whiting
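
The overlap arithmetic in this abstract works as follows. Each temporal portion is extended into the next one until its length is a whole number of codec frames. The decoder then keeps only the portion's own span. The sketch below assumes the forward-extension variant, integer lengths in milliseconds, and hypothetical values: 1000 ms portions with 20 ms and 30 ms frames. A real coder would zero-pad the final portion past the end of the signal, whereas slicing here simply stops at the end.

```python
import math
from typing import List, Sequence, Tuple

def coding_ranges(total_len: int, portion_len: int, frame_len: int) -> List[Tuple[int, int]]:
    """(start, end) span actually encoded for each portion, rounded up to whole frames."""
    ranges = []
    for start in range(0, total_len, portion_len):
        length = min(portion_len, total_len - start)
        n_frames = math.ceil(length / frame_len)
        ranges.append((start, start + n_frames * frame_len))
    return ranges

def reassemble(signal: Sequence[int], portion_len: int, frame_len: int) -> List[int]:
    """Split as the encoder would, then discard the duplicated overlap on 'decoding'."""
    out: List[int] = []
    for start, end in coding_ranges(len(signal), portion_len, frame_len):
        chunk = signal[start:end]                  # stands in for encode + decode
        keep = min(portion_len, len(signal) - start)
        out.extend(chunk[:keep])                   # drop material owned by the next portion
    return out
```

With 20 ms frames a 1000 ms portion needs no overlap (50 frames). With 30 ms frames it needs 34 frames, or 1020 ms, so 20 ms of the next portion is duplicated and then discarded.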
Method and apparatus for speech recognition

Patent number: 7072835

Abstract: A method and apparatus for speech recognition of the present application collates an input utterance with an acoustic model corresponding to a hypothesis, which is expressed as a connection of utterance segments such as phonemes or syllables and is expanded according to the length of the input utterance by an inter-word connection rule, thereby obtaining a recognition score. Within a word of the hypothesis, all hypotheses whose scores lie within a predetermined threshold of the maximum score are held up to the word end, irrespective of the number of hypotheses. At a word end, by contrast, the hypotheses are narrowed to a predetermined number of top-ranking hypotheses in order of score.

Type: Grant

Filed: January 17, 2002

Date of Patent: July 4, 2006

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Tomohiro Konuma, Tsuyoshi Inoue, Mitsuru Endo, Natsuki Saito, Akira Ishida, Tatsuya Kimura
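
The two pruning rules in this abstract can be contrasted in a few lines. Within a word, a score beam keeps an unbounded number of hypotheses. At a word end, a fixed top-N cut is applied. Hypotheses are modeled as hypothetical (label, log score) pairs with higher scores better. The beam width and N are illustrative values.

```python
from typing import List, Tuple

Hypothesis = Tuple[str, float]   # (partial transcription, log score); higher is better

def prune_within_word(hyps: List[Hypothesis], beam: float) -> List[Hypothesis]:
    """Keep every hypothesis within `beam` of the best score, however many there are."""
    best = max(score for _, score in hyps)
    return [h for h in hyps if h[1] >= best - beam]

def prune_at_word_end(hyps: List[Hypothesis], top_n: int) -> List[Hypothesis]:
    """Keep only the `top_n` highest-scoring hypotheses."""
    return sorted(hyps, key=lambda h: h[1], reverse=True)[:top_n]
```

The beam keeps close competitors alive inside a word. The top-N cut bounds the number of hypotheses expanded across word boundaries, where branching is largest.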
Speaker recognition using dynamic time warp template spotting

Patent number: 7050973

Abstract: An improved template spotting technique may be implemented as part of a text-dependent speaker verification system to authenticate a user of a wireless communication device. This technique may be suitable for use in noisy environments and for wireless communication devices with limited processing power. Endpoints of a test utterance are identified by first computing local distances between test frames and a target template. Accumulated distances are then computed from the local distances. Endpoints of the utterance may be identified when one or more of the accumulated distances is below a predetermined threshold. Once endpoints of a test utterance are identified, a dynamic time warp (DTW) process may be used to determine whether the test utterance matches a training template. One embodiment of the present invention aligns multiple training templates to reduce the probability of failing to verify the identity of a speaker that should have been properly verified.

Type: Grant

Filed: April 22, 2002

Date of Patent: May 23, 2006

Assignee: Intel Corporation

Inventor: Hagai Aronowitz
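
The spotting step (local distances, accumulated distances, threshold test) can be sketched with textbook subsequence DTW. In subsequence DTW the template may begin and end at any test frame, and each cell records where its best path started. This is a standard technique swapped in for illustration, not necessarily the patent's exact recursion. Frames are scalars with absolute-difference local distance for brevity, accumulated distances are not length-normalized, and the threshold is hypothetical.

```python
def spot_template(test, template, threshold):
    """Subsequence DTW: find where `template` best matches inside `test`.

    Returns inclusive (start_frame, end_frame) if the best accumulated
    distance is below `threshold`, else None. Inputs must be non-empty.
    """
    n, m = len(test), len(template)
    cost = [[0.0] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            local = abs(test[i] - template[j])          # local distance
            if j == 0:
                # Open begin: the template may start at any test frame.
                cost[i][j], start[i][j] = local, i
                continue
            moves = [(cost[i][j - 1], start[i][j - 1])]  # template advances alone
            if i > 0:
                moves.append((cost[i - 1][j - 1], start[i - 1][j - 1]))  # diagonal
                moves.append((cost[i - 1][j], start[i - 1][j]))          # test advances alone
            best_cost, best_start = min(moves)
            cost[i][j], start[i][j] = local + best_cost, best_start
    # Open end: the template may finish at any test frame.
    end = min(range(n), key=lambda i: cost[i][m - 1])
    if cost[end][m - 1] >= threshold:
        return None
    return start[end][m - 1], end
```

The returned endpoints would then bound the segment passed to the full DTW verification against the training template.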
Method and apparatus for processing an input speech signal during presentation of an output audio signal

Patent number: 6937977

Abstract: A start of an input speech signal is detected during presentation of an output audio signal and an input start time, relative to the output audio signal, is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes a context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.

Type: Grant

Filed: October 5, 1999

Date of Patent: August 30, 2005

Assignee: fastmobile, Inc.

Inventor: Ira A. Gerson
Speaker recognition

Patent number: 6922668

Abstract: This invention relates to an improved method and apparatus for speaker recognition. In this invention, prior to comparing feature vectors derived from speech with a stored reference model the feature vectors are processed by applying a speaker dependent transform which matches the characteristics of a particular speaker's vocal tract. Features derived from speech which has very dissimilar characteristics to those of the speaker on which the transform is dependent may be severely distorted by the transform, whereas features from speech which has similar characteristics to those of the speaker on which the transform is dependent will be distorted far less.

Type: Grant

Filed: February 25, 2000

Date of Patent: July 26, 2005

Assignee: British Telecommunications Public Limited Company

Inventor: Simon N. Downey
Prosody based endpoint detection

Patent number: 6873953

Abstract: A method and apparatus are provided for performing prosody based endpoint detection of speech in a speech recognition system. Input speech represents an utterance, which has an intonation pattern. An end-of-utterance condition is identified based on prosodic parameters of the utterance, such as the intonation pattern and the duration of the final syllable of the utterance, as well as non-prosodic parameters, such as the log energy of the speech.

Type: Grant

Filed: May 22, 2000

Date of Patent: March 29, 2005

Assignee: Nuance Communications

Inventor: Matthew Lennig
Interactive process for recognition and evaluation of a partial search query and display of interactive results

Patent number: 6862713

Abstract: A method for presenting to an end-user the intermediate matching search results of a keyword search in an indexed list of information. The method comprises the steps of: coupling to a search engine a graphical user interface for accepting keyword search terms for searching the indexed list of information with the search engine; receiving one or more keyword search terms with one or more separation characters separating them; performing a keyword search with the one or more keyword search terms received when a separation character is received; and presenting the number of documents matching the keyword search terms to the end-user, and presenting a graphical menu item on a display. In accordance with another embodiment of the present invention, an information processing system and computer readable storage medium carry out the above method.

Type: Grant

Filed: August 31, 1999

Date of Patent: March 1, 2005

Assignee: International Business Machines Corporation

Inventors: Reiner Kraft, W. Scott Spangler
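
The core loop of this method re-runs the search each time a separation character arrives and reports the match count. The sketch assumes several hypothetical details: a tiny in-memory inverted index, AND semantics across terms, and space and comma as separators. The graphical interface and menu are omitted.

```python
from typing import Dict, Iterable, List, Set, Tuple

def build_index(docs: Iterable[str]) -> Dict[str, Set[int]]:
    """Tiny inverted index: lower-cased word -> set of document ids."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in enumerate(docs):
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

def count_matches(index: Dict[str, Set[int]], terms: List[str]) -> int:
    """Number of documents containing ALL terms (assumed AND semantics)."""
    postings = [index.get(t, set()) for t in terms]
    return len(set.intersection(*postings)) if postings else 0

def incremental_counts(index: Dict[str, Set[int]], typed: str,
                       separators: str = " ,") -> List[Tuple[Tuple[str, ...], int]]:
    """Re-run the search each time a separator character arrives."""
    results: List[Tuple[Tuple[str, ...], int]] = []
    terms: List[str] = []
    buf = ""
    for ch in typed:
        if ch in separators:
            if buf:                                   # a complete term was just typed
                terms.append(buf.lower())
                buf = ""
                results.append((tuple(terms), count_matches(index, terms)))
        else:
            buf += ch
    return results
```

Each completed term narrows the intermediate count shown to the user before the full query is submitted.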
Method and means for creating prosody in speech regeneration for laryngectomees

Patent number: 6795807

Abstract: A device and a method to be used by laryngeally impaired people to improve the naturalness of their speech. An artificial sound creating mechanism which forms a simulated glottal pulse in the vocal tract is utilized. An artificial glottal pulse is compared with the natural spectrum and an inverse filter is generated to provide an output signal which would better reproduce natural sound. A digital signal processor introduces a variation of pitch based on an algorithm developed for this purpose; i.e. creating prosody. The algorithm uses primarily the relative amplitude of the speech signal and the rise and fall rates of the amplitude as a basis for setting the frequency of the speech. The invention also clarifies speech of laryngectomees by sensing the presence of consonants in the speech and appropriately amplifying them with respect to the vowel sounds.

Type: Grant

Filed: August 17, 2000

Date of Patent: September 21, 2004

Inventor: David R. Baraff
Method and apparatus for performing real-time endpoint detection in automatic speech recognition

Patent number: 6782363

Abstract: A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.

Type: Grant

Filed: May 4, 2001

Date of Patent: August 24, 2004

Assignee: Lucent Technologies Inc.

Inventors: Chin-Hui Lee, Qi P. Li, Jinsong Zheng, Qiru Zhou
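
The structure described here is a filter followed by a three-state machine. The sketch below substitutes a simple, hypothetical edge detector for the patent's optimized filter: the mean of the next two frames' energy minus the mean of the previous two. The state names (SILENCE, SPEECH, LEAVING), the thresholds, which assume log-energy in dB, and the hangover count are also assumptions, not the patent's values.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple

class State(Enum):
    SILENCE = 0
    SPEECH = 1
    LEAVING = 2   # tentative end seen; waiting to confirm

def edge_filter(energy: Sequence[float], width: int = 2) -> List[float]:
    """Hypothetical edge detector: mean of the next `width` frames minus mean of the previous `width`."""
    out = []
    for t in range(len(energy)):
        ahead = energy[t:t + width]
        behind = energy[max(0, t - width):t]
        out.append(sum(ahead) / len(ahead) - sum(behind) / len(behind) if behind else 0.0)
    return out

def endpoints(energy: Sequence[float], up: float = 6.0, down: float = -6.0,
              hangover: int = 3) -> List[Tuple[int, int]]:
    """Return half-open (begin, end) frame ranges of detected speech."""
    state = State.SILENCE
    segments: List[Tuple[int, int]] = []
    begin: Optional[int] = None
    end: Optional[int] = None
    quiet = 0
    for t, f in enumerate(edge_filter(energy)):
        if state is State.SILENCE:
            if f > up:                         # rising edge: speech begins
                state, begin = State.SPEECH, t
        elif state is State.SPEECH:
            if f < down:                       # falling edge: tentative end point
                state, end, quiet = State.LEAVING, t, 0
        else:                                  # State.LEAVING
            if f > up:                         # speech resumed; drop tentative end
                state = State.SPEECH
            else:
                quiet += 1
                if quiet >= hangover:          # end confirmed
                    segments.append((begin, end))
                    state = State.SILENCE
    return segments
```

Because the filter smooths over two frames, a one-frame dip inside speech never crosses the falling-edge threshold, so the segment is not split. A segment still open when the input ends is not reported.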
Method and apparatus for performing double-talk detection with an adaptive decision threshold

Patent number: 6775653

Abstract: A double-talk detector (DTD), a method of performing double-talk detection, an echo canceller, and a method of performing echo cancellation are used with an echo canceller (EC) to sense when an echo is corrupted by near-end speech (NES). The double-talk detector inhibits the adaptation of a synthesizing filter when NES is present, in order to avoid divergence of the adaptive algorithm. Because the echo path and the signal levels vary over time, the accuracy of the DTD depends on a suitable decision threshold. The double-talk detector therefore utilizes an adaptive decision threshold which is capable of tracking variations in the echo path and signal/noise levels during a call.

Type: Grant

Filed: March 30, 2000

Date of Patent: August 10, 2004

Assignee: Agere Systems Inc.

Inventor: Xiong Guan Wei
Adaptation of compound gaussian mixture models

Publication number: 20040117183

Abstract: Methods and arrangements for enhancing speech recognition in noisy environments, via providing at least one initial Compound Gaussian Mixture model, applying an adaptation algorithm to at least one item associated with speech enrollment data and to the at least one initial Compound Gaussian Mixture model to yield an intermediate output, and mathematically combining the at least one initial Compound Gaussian Mixture model with the intermediate output to yield an adapted Compound Gaussian Mixture model.

Type: Application

Filed: December 13, 2002

Publication date: June 17, 2004

Applicant: IBM Corporation

Inventors: Sabine V. Deligne, Satyanarayana Dharanipragada
Methods and apparatus for identifying unknown speakers using a hierarchical tree structure

Patent number: 6748356

Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. A speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. A hierarchical speaker tree clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. A hierarchical enrolled speaker database is used that includes one or more background models for unenrolled speakers to assign a speaker to each identified segment.

Type: Grant

Filed: June 7, 2000

Date of Patent: June 8, 2004

Assignee: International Business Machines Corporation

Inventors: Homayoon Sadr Mohammad Beigi, Mahesh Viswanathan
Speech enabled, automatic telephone dialer using names, including seamless interface with computer-based address book programs

Patent number: 6563911

Abstract: The present invention is a speech enabled automatic telephone dialer device, system, and method using a spoken name corresponding to name-telephone number data of computer-based address book programs. The invention includes user telephones connected to a PBX-type telephony mechanism, which is connected to a telephony board of a name dialer device. User computer workstations containing loaded address book programs with name-telephone number data are connected to the name dialer device. The name dialer device includes a host computer in a network; a telephony board for controlling the PBX for dialing; memory within the host computer for storing software and name-telephone number data; and, software to access computer-based address book programs, to receive voice inputs from the PBX-type telephony mechanism, to create converted phonemes from names to match voice inputs with specific name-telephone number data from the computer-based address book programs for initiating an automatic dialing.

Type: Grant

Filed: January 23, 2001

Date of Patent: May 13, 2003

Assignee: iVoice, Inc.

Inventor: Jerome R. Mahoney
Method for identifying authorized users using a spectrogram and apparatus of the same

Publication number: 20020116189

Abstract: A method for identifying authorized users and the apparatus of the same, which identifies users by comparison with specific spectrograms of authorized users. The method comprises the steps of: (i) detecting the end point of a verbalized sample from the user requesting access; (ii) retrieving speech features from a spectrogram of the speech; (iii) determining whether training is necessary, and if so, taking the speech features as a reference template, setting a threshold and going back to (i), otherwise going on to the next step; (iv) matching patterns of the speech features and the reference template; (v) computing a distance between the speech features and the reference template according to the matching result of (iv) to obtain a distance score; (vi) comparing the distance score with the threshold; (vii) determining whether the user is authorized according to the result of the comparison in (vi).

Type: Application

Filed: June 19, 2001

Publication date: August 22, 2002

Applicant: Winbond Electronics Corp.

Inventors: Tsuei-Chi Yeh, Wen-Yuan Chen
Speech recognition apparatus, method and storage medium thereof

Patent number: 6341263

Abstract: A voice recognition system, method and storage medium are provided. The system includes a plurality of storage sections, a selection section, an adaptation section, a plurality of calculation sections, a normalization section and a decision section. The method includes steps for performing the functions associated with the sections.

Type: Grant

Filed: May 17, 1999

Date of Patent: January 22, 2002

Assignee: NEC Corporation

Inventors: Eiko Yamada, Hiroaki Hattori
Method and apparatus for accurate endpointing of speech in the presence of noise

Patent number: 6324509

Abstract: An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares with a second SNR threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance. The processor also then compares with the second SNR threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.

Type: Grant

Filed: February 8, 1999

Date of Patent: November 27, 2001

Assignee: Qualcomm Incorporated

Inventors: Ning Bi, Chienchung Chang, Andrew P. Dejaco
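
The two-threshold procedure in this abstract works in two passes. A strict SNR threshold first locates a confident start and end. A relaxed threshold then extends each point outward through the neighboring frames. The sketch makes the following assumptions:
- per-frame SNR values in dB are already computed;
- the extension stops at the first frame not above the low threshold;
- the periodic recalculation of both thresholds is omitted;
- all threshold values are hypothetical.

```python
from typing import Optional, Sequence, Tuple

def snr_endpoints(snr_db: Sequence[float], high: float, low: float) -> Optional[Tuple[int, int]]:
    """Two-pass endpointing: strict threshold first, then relaxed outward extension.

    Returns inclusive (start, end) frame indices, or None if no frame exceeds `high`.
    """
    if low > high:
        raise ValueError("the first (high) threshold should exceed the second (low) one")
    above = [i for i, s in enumerate(snr_db) if s > high]
    if not above:
        return None
    start, end = above[0], above[-1]              # first starting and ending points
    while start > 0 and snr_db[start - 1] > low:  # search the part that predates the start
        start -= 1
    while end < len(snr_db) - 1 and snr_db[end + 1] > low:  # part that postdates the end
        end += 1
    return start, end                             # second starting and ending points
```

The strict pass avoids triggering on noise, and the relaxed pass recovers weak onsets and trailing consonants that the strict pass cut off.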
Communication device and method for endpointing speech utterances

Patent number: 6321197

Abstract: A communication device capable of endpointing speech utterances includes a microprocessor (110) connected to communication interface circuitry (115), memory (120), audio circuitry (130), an optional keypad (140), a display (150), and a vibrator/buzzer (160). Audio circuitry (130) is connected to microphone (133) and speaker (135). Microprocessor (110) includes a speech/noise classifier and speech recognition technology. Microprocessor (110) analyzes a speech signal to determine speech waveform parameters within a speech acquisition window. Microprocessor (110) compares the speech waveform parameters to determine the start and end points of the speech utterance. Microprocessor (110) starts at a frame index based on the energy centroid of the speech utterance and analyzes the frames preceding and following the frame index to determine the endpoints.

Type: Grant

Filed: January 22, 1999

Date of Patent: November 20, 2001

Assignee: Motorola, Inc.

Inventors: William M. Kushner, Audrius Polikaitis
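
The search procedure in this abstract starts at the energy centroid and walks outward through the preceding and following frames. In the sketch below, the centroid is the energy-weighted mean frame index. The stopping rule is an assumption: the walk stops at the first frame below a floor set at 10% of the peak frame energy. The speech/noise classifier and acquisition-window handling are omitted.

```python
from typing import Optional, Sequence, Tuple

def centroid_endpoints(energies: Sequence[float], floor_ratio: float = 0.1
                       ) -> Optional[Tuple[int, int, int]]:
    """Return (centroid_frame, start_frame, end_frame), or None for an all-zero window."""
    total = sum(energies)
    if total <= 0:
        return None
    # Energy centroid: energy-weighted mean frame index, rounded to a frame.
    centroid = round(sum(i * e for i, e in enumerate(energies)) / total)
    floor = floor_ratio * max(energies)        # hypothetical speech/noise floor
    start = centroid
    while start > 0 and energies[start - 1] >= floor:              # walk back
        start -= 1
    end = centroid
    while end < len(energies) - 1 and energies[end + 1] >= floor:  # walk forward
        end += 1
    return centroid, start, end
```

Starting from the centroid rather than the first loud frame makes the search less sensitive to an isolated click early in the window.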
APPARATUS AND METHOD FOR RECOGNIZING VOICE WITH REDUCED SENSITIVITY TO AMBIENT NOISE

Publication number: 20010029449

Abstract: A voice recognizing apparatus includes a filter bank for deriving feature parameters of voice from a microphone and a filter bank for deriving feature parameters of noise represented by an electric signal which is directly inputted to a terminal. The voice parameters and the noise parameters are respectively stored in a voice parameter buffer and a noise parameter buffer. A threshold value for sampling the voice parameters is set to be changed in accordance with a noise level for each frame. From a series of the voice parameters exceeding the threshold value, a series of the noise parameters corresponding thereto are subtracted so that only a voice parameter pattern can be obtained. In a recognition mode, the voice pattern is compared with a reference pattern to recognize the voice from the microphone. In addition, in a registration mode, the voice pattern is registered as a reference pattern.

Type: Application

Filed: July 21, 1997

Publication date: October 11, 2001

Inventors: Shin-ichi Tsurufuji, Masayuki Iida, Ryuji Suzuki
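
The noise-dependent thresholding and subtraction described in this abstract can be sketched as follows, under stated assumptions. Voice and noise filter-bank outputs are taken to be time-aligned per frame. The per-frame threshold is assumed to be a hypothetical factor `k` times that frame's mean noise level. Subtracted values are floored at zero. None of these specifics come from the application.

```python
def voice_pattern(voice_params, noise_params, k=2.0):
    """Keep frames whose voice level exceeds a noise-dependent threshold and
    subtract the matching noise parameters channel by channel.

    voice_params, noise_params: per-frame lists of filter-bank outputs, aligned in time.
    k: hypothetical factor tying the threshold to each frame's noise level.
    """
    pattern = []
    for v, n in zip(voice_params, noise_params):
        noise_level = sum(n) / len(n)
        threshold = k * noise_level  # the threshold follows this frame's noise level
        if sum(v) / len(v) > threshold:
            # Subtract noise per channel; flooring at zero is an assumption.
            pattern.append([max(vi - ni, 0.0) for vi, ni in zip(v, n)])
    return pattern


voice = [[1.0, 1.0], [10.0, 8.0]]
noise = [[1.0, 1.0], [2.0, 2.0]]
print(voice_pattern(voice, noise))  # [[8.0, 6.0]]
```

Tying the threshold to the frame's own noise level means louder ambient noise raises the bar for counting a frame as voice. This matches the stated goal of reduced sensitivity to ambient noise.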
System and method for segmentation and recognition of speech signals

Patent number: 6278972

Abstract: A system and method for forming a segmented speech signal from an input speech signal having a plurality of frames. The input speech signal is converted from a time domain signal to a frequency domain signal having a plurality of speech frames, wherein each speech frame in the frequency domain signal is represented by at least one spectral value associated with the speech frame. A spectral difference value is then determined for each pair of adjacent frames in the frequency domain signal, wherein the spectral difference value for each pair of adjacent frames is representative of a difference between the at least one spectral value associated with each frame in the pair of adjacent frames. An initial cluster boundary is set between each pair of adjacent frames in the frequency domain signal, and a variance value is assigned to each cluster in the frequency domain signal, wherein the variance value for each cluster is equal to one of the determined spectral difference values.

Type: Grant

Filed: January 4, 1999

Date of Patent: August 21, 2001

Assignee: Qualcomm Incorporated

Inventors: Ning Bi, Chienchung Chang
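
This sketch covers only the first steps of the abstract: converting frames to the frequency domain and computing one spectral difference per adjacent pair, with an initial boundary between every pair. It assumes the "spectral value" is the DFT magnitude spectrum and that the difference is the Euclidean distance between adjacent spectra. The abstract specifies neither. A naive DFT is used so the example needs no third-party library. Assigning variances to clusters and any later merging are not shown.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def adjacent_spectral_differences(frames):
    """Euclidean distance between the spectra of each pair of adjacent frames."""
    spectra = [magnitude_spectrum(f) for f in frames]
    return [math.dist(a, b) for a, b in zip(spectra, spectra[1:])]


frames = [[1, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
diffs = adjacent_spectral_differences(frames)
# Initially each frame is its own cluster, so there is one boundary per difference.
print([round(d, 3) for d in diffs])  # [3.317, 0.0]
```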
Voice recognition process and device, associated remote control device

Publication number: 20010012998

Abstract: The invention relates to a voice recognition device.

Type: Application

Filed: December 14, 2000

Publication date: August 9, 2001

Inventors: Pierrick Jouet, Frederic Soufflet
Multi-resolution system and method for speaker verification

Patent number: 6272463

Abstract: A method is given for generating a speaker-dependent model of an utterance that has at least one occurrence. The method includes generating an initial model, having a first resolution, that encodes each occurrence of the utterance, and generating at least one additional speaker-specific model of all occurrences of the utterance, having a resolution different from that of the initial model.

Type: Grant

Filed: March 3, 1999

Date of Patent: August 7, 2001

Assignee: Lernout & Hauspie Speech Products N.V.

Inventor: Martine Lapere
Method and apparatus for detecting voice activity

Patent number: 6182035

Abstract: A voice activity detector that implements a fast wavelet transformation using filter pairs. A quadrature high-pass filter provides an output signal covering the upper half of the band up to the Nyquist frequency, and a quadrature low-pass filter provides an output signal covering the lower half. The high-pass filter is useful for catching and isolating transients in the input signal, and the low-pass filter is useful for fine frequency analysis. The voice activity detector can use multiple decomposition levels arranged in a pyramid or tree formation to increase the reliability of the voice activity decision; for example, the output of the quadrature low-pass filter can be further decomposed using a second pair of filters. The voice activity decision can be generated by comparing a signal power estimate for the output of the filter pairs with threshold levels specific to each filter or frequency range.

Type: Grant

Filed: March 26, 1998

Date of Patent: January 30, 2001

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventor: Fisseha Mekuria
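
The pyramid decomposition and per-band power comparison can be sketched with the Haar filter pair. This is a deliberately simple stand-in for the quadrature filters named in the abstract, not Ericsson's design. Only the low-pass branch is re-split at each level, as in the abstract's example. The rule "voice if any band exceeds its own threshold" and the threshold values are assumptions.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_split(x):
    """One analysis stage: Haar low-pass and high-pass outputs, each decimated by 2."""
    pairs = range(0, len(x) - 1, 2)
    low = [(x[i] + x[i + 1]) * INV_SQRT2 for i in pairs]
    high = [(x[i] - x[i + 1]) * INV_SQRT2 for i in pairs]
    return low, high


def band_powers(x, levels):
    """Mean power per band of a pyramid that re-splits only the low-pass branch.

    Returns [high_1, ..., high_levels, low_levels], from the top band downwards.
    """
    if len(x) < 2 ** levels:
        raise ValueError("frame too short for the requested number of levels")
    powers = []
    low = list(x)
    for _ in range(levels):
        low, high = haar_split(low)
        powers.append(sum(v * v for v in high) / len(high))
    powers.append(sum(v * v for v in low) / len(low))
    return powers


def voice_active(x, thresholds, levels=2):
    """Declare voice if any band's power exceeds that band's own threshold."""
    powers = band_powers(x, levels)
    return any(p > t for p, t in zip(powers, thresholds))


frame = [1.0, -1.0] * 4  # energy concentrated near the Nyquist frequency
print(band_powers(frame, 2))                 # approximately [2.0, 0.0, 0.0]
print(voice_active(frame, [1.0, 1.0, 1.0]))  # True
```

Each level halves the sample count, so a frame needs at least `2 ** levels` samples. That is why `band_powers` rejects shorter frames.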
Background energy estimation

Patent number: 6157670

Abstract: A method of estimating background noise in a signal. The signal is divided into blocks of equal, predetermined length, and the minimum energy of the signal within each block is determined. The minimum energy of the current block is compared with the previous estimate. If the current block minimum energy exceeds a predetermined maximum energy level, it is discarded and the previous estimate remains unchanged. If the current block minimum energy is below the previous estimate, the previous estimate is reduced by the difference between the two. If the current block minimum energy is above the previous estimate but below the maximum, the previous estimate is increased by half of the difference between them. Instead of one half, the increase factor may be set to any value from 0 to 1 inclusive.

Type: Grant

Filed: August 10, 1999

Date of Patent: December 5, 2000

Assignee: Telogy Networks, Inc.

Inventor: Bogdan Kosanovic
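
The update rule in this abstract maps almost line for line onto code. This is a sketch, assuming the block energies are precomputed (for example per-sample squared amplitudes or short-frame energies) and that the maximum level is a tuning constant. The names `update_background` and `up_factor` are hypothetical.

```python
def update_background(prev_estimate, block_energies, max_energy, up_factor=0.5):
    """One update of the block-minimum background energy estimate.

    block_energies: energy values measured over one fixed-length block.
    max_energy: blocks whose minimum exceeds this are treated as non-background.
    up_factor: fraction of an upward step applied (0.5 in the abstract; 0..1 allowed).
    """
    current_min = min(block_energies)
    if current_min > max_energy:
        return prev_estimate  # discard: block is not background noise
    if current_min < prev_estimate:
        # Reduce by the full difference, i.e. drop straight to the new minimum.
        return prev_estimate - (prev_estimate - current_min)
    # Rise only part of the way towards the new minimum.
    return prev_estimate + up_factor * (current_min - prev_estimate)


estimate = 10.0
for block in ([4.0, 6.0, 5.0], [8.0, 9.0], [50.0, 60.0]):
    estimate = update_background(estimate, block, max_energy=20.0)
    print(estimate)  # 4.0, then 6.0, then 6.0 (last block discarded)
```

The asymmetry is the point of the rule. The estimate follows quieter blocks immediately but climbs only gradually, and blocks above the maximum are ignored. Speech therefore has little chance to inflate the noise estimate.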
Method and apparatus to detect and delimit foreground speech

Patent number: 6134524

Abstract: The present invention provides improved foreground-speech signal endpointing by computing a spectral stationarity statistic. This statistic is used by a finite state machine to endpoint speech. Endpointing with the spectral stationarity statistic is less susceptible to background noise than endpointing with conventional measures. The present invention also uses frame-synchronous quantile estimation to generate a mask signal for signal-to-noise ratio (SNR) normalization.

Type: Grant

Filed: October 24, 1997

Date of Patent: October 17, 2000

Assignee: Nortel Networks Corporation

Inventors: Stephen Douglas Peters, Daniel Boies
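
The abstract defines neither the statistic nor the state machine. What follows is therefore only a generic sketch of a finite-state endpointer of the kind described. It has two states (silence and speech), an onset count before entering speech, and a hangover count before leaving it. It assumes the per-frame statistic is larger during speech, for instance a measure of spectral non-stationarity. The threshold and counts are hypothetical tuning parameters, and the quantile-based SNR normalization is not modeled.

```python
def endpoint_fsm(stat, threshold, onset=3, hangover=5):
    """Two-state (silence/speech) endpointer over a per-frame statistic.

    onset: consecutive frames above threshold needed to enter speech.
    hangover: consecutive frames at or below threshold needed to leave it.
    Returns a list of inclusive (start_frame, end_frame) pairs.
    """
    segments = []
    in_speech = False
    run = 0  # length of the current run that argues for a state change
    start = None
    for i, value in enumerate(stat):
        if not in_speech:
            run = run + 1 if value > threshold else 0
            if run >= onset:
                in_speech, start, run = True, i - onset + 1, 0
        else:
            run = run + 1 if value <= threshold else 0
            if run >= hangover:
                # The last speech frame is the one just before the quiet run began.
                segments.append((start, i - hangover))
                in_speech, run = False, 0
    if in_speech:
        segments.append((start, len(stat) - 1 - run))
    return segments


stat = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(endpoint_fsm(stat, threshold=0.5))  # [(2, 5)]
```

The onset count rejects short blips, and the hangover count keeps brief pauses inside an utterance from splitting it in two.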
Pattern recognition

Patent number: 6078884

Abstract: Pattern recognition apparatus uses a recognition processor to process an input signal and indicate its similarity to allowed sequences of reference patterns to be recognised. A speech recognition processor includes a classification arrangement that identifies a sequence of patterns corresponding to the input signal and repeatedly partitions the input signal into a speech-containing portion and, preceding and/or following it, noise or silence portions. A noise model generator generates a pattern of the noise or silence portion for subsequent use by the classification arrangement in pattern identification. The noise model generator may generate a noise model for each noise portion of the input signal, which may be used to adapt the reference patterns.

Type: Grant

Filed: March 26, 1998

Date of Patent: June 20, 2000

Assignee: British Telecommunications public limited company

Inventor: Simon N. Downey
Speech spurt detecting apparatus and method with threshold adapted by noise and speech statistics

Patent number: 6044342

Abstract: A speech spurt detecting apparatus for detecting speech spurts in a voice signal has a storage for storing an input voice signal. A decision portion determines speech spurt sections and mute sections using a threshold value, and sets one of the mute sections in the latter part of a hangover time. A mute level statistical processor estimates the noise distribution of the signal in the mute sections. A speech spurt detecting threshold value decision portion receives the mean and variance of the noise distribution from the mute level statistical processor, approximates the noise distribution by a gamma distribution, and uses it to decide the speech spurt detecting threshold. A speech spurt transmitting portion outputs the voice signal in the speech spurt sections from the storage. A speech spurt level statistical processor performs statistical processing of the speech spurt sections.

Type: Grant

Filed: November 25, 1997

Date of Patent: March 28, 2000

Assignee: Logic Corporation

Inventors: Nobuki Sato, Hiroshi Kamei, Takamasa Tomono, Makoto Aoki, Jina Baek
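
Deriving a threshold from the mean and variance of the mute-section levels through a gamma approximation can be sketched as below. The method-of-moments fit is assumed: shape = mean²/variance and scale = variance/mean. The threshold is assumed to be the level that noise exceeds with a hypothetical false-alarm probability (`false_alarm`). The gamma CDF is computed from its power series so that only the standard library is needed. This sketch is not the patentee's formula.

```python
import math
import statistics


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape  # n = 0 term of sum z**n / (shape * (shape+1) * ... * (shape+n))
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_threshold(mute_levels, false_alarm=0.01):
    """Fit a gamma distribution to mute-section levels by moments and return the
    level that noise exceeds with probability `false_alarm`."""
    mean = statistics.fmean(mute_levels)
    var = statistics.pvariance(mute_levels)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive mean and variance for a gamma fit")
    shape, scale = mean * mean / var, var / mean  # method-of-moments fit
    lo, hi = 0.0, mean + 20.0 * math.sqrt(var)    # bracket for the quantile
    for _ in range(100):                          # bisection on the CDF
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < 1.0 - false_alarm:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Mean 1 and variance 1 give shape 1: an exponential, whose 99% point is ln(100).
print(round(gamma_threshold([0.0, 2.0]), 4))  # 4.6052
```

Unlike a threshold at a fixed number of standard deviations, a gamma model allows for the skewed, non-negative distribution of noise levels.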
Integrated endpoint detection for improved speech recognition method and system

Patent number: 6029130

Abstract: A method and a system recognize speech using an approach that combines certain advantages of speech detection and word spotting for improved accuracy without sacrificing efficiency. The improved method and system are based on determining a total similarity value from a cumulative value and power information at or near a terminal frame.

Type: Grant

Filed: August 20, 1997

Date of Patent: February 22, 2000

Assignee: Ricoh Company, Ltd.

Inventor: Takashi Ariyoshi
Speech recognition apparatus for consumer electronic applications

Patent number: 6021387

Abstract: A spoken word or phrase recognition device. The device does not require a digital signal processor, large RAM, or extensive analog circuitry. The input audio signal is digitized and passed recursively through a digital difference filter to produce a multiplicity of filtered output waveforms. These waveforms are processed in real time by a microprocessor to generate a pattern that is recognized by a neural network pattern classifier that operates in software in the microprocessor. By application of additional techniques, this device has been shown to recognize an unknown speaker saying a digit from zero through nine with an accuracy greater than 99%. Because of the recognition accuracy and cost-effective design, the device may be used in cost sensitive applications such as toys, electronic learning aids, and consumer electronic products.

Type: Grant

Filed: February 23, 1998

Date of Patent: February 1, 2000

Assignee: Sensory Circuits, Inc.

Inventors: Forrest S. Mozer, Michael C. Mozer, Todd F. Mozer
Criteria for usable repetitions of an utterance during speech reference enrollment

Patent number: 6012027

Abstract: A speech reference enrollment method involves the following steps: (a) requesting a user speak a vocabulary word; (b) detecting a first utterance (354); (c) requesting the user speak the vocabulary word; (d) detecting a second utterance (358); (e) determining a first similarity between the first utterance and the second utterance (362); (f) when the first similarity is less than a predetermined similarity, requesting the user speak the vocabulary word; (g) detecting a third utterance (366); (h) determining a second similarity between the first utterance and the third utterance (370); and (i) when the second similarity is greater than or equal to the predetermined similarity, creating a reference (364).

Type: Grant

Filed: September 17, 1997

Date of Patent: January 4, 2000

Assignee: Ameritech Corporation

Inventor: Robert Wesley Bossemeyer, Jr.
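
Steps (a) through (i) form a short control flow, sketched below. Several pieces are hypothetical stand-ins. Utterances are fixed-length feature vectors. Similarity is cosine similarity. The reference is the element-wise mean of the two consistent utterances. `prompt_and_detect` replaces the prompting and detection steps. The abstract spells out only the retry path. Two behaviors are therefore assumed: a passing first comparison also creates a reference, and a failing third comparison returns `None`.

```python
import math


def cosine_similarity(a, b):
    """Hypothetical utterance similarity on fixed-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def make_reference(a, b):
    """Hypothetical reference: element-wise mean of two consistent utterances."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def enroll(prompt_and_detect, threshold):
    """Compare the first utterance with the second, then (if needed) with a third.

    prompt_and_detect: callable that asks the user to say the word and returns
    the detected utterance's features (stands in for steps a-d and f-g).
    Returns a reference, or None if neither comparison passes.
    """
    first = prompt_and_detect()
    second = prompt_and_detect()
    if cosine_similarity(first, second) >= threshold:
        return make_reference(first, second)  # assumed: passing first check enrolls
    third = prompt_and_detect()
    if cosine_similarity(first, third) >= threshold:
        return make_reference(first, third)
    return None  # assumed: the abstract does not say what happens here


takes = iter([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
print(enroll(lambda: next(takes), threshold=0.9))  # [1.0, 0.05]
```

Note that the third utterance is compared with the first, not the second, as steps (h) and (i) state.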
Method and system for efficiently avoiding partial matching in voice recognition

Patent number: 5974381

Abstract: To avoid waiting a predetermined amount of time and/or spending processing time before the number of frames in each speech input portion is determined, a fast voice recognition system performs real-time frame counting based on a comparison between a decreasing number of frames and an increasing time-dependent threshold. This real-time voice recognition also substantially reduces the rate of erroneous partial matching.

Type: Grant

Filed: December 19, 1997

Date of Patent: October 26, 1999

Assignee: Ricoh Company, Ltd.

Inventor: Syuji Kubota
Method and apparatus for adapting the language model's size in a speech recognition system

Patent number: 5899973

Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically while the system runs or at build or setup time. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase word recognition accuracy.

Type: Grant

Filed: September 25, 1997

Date of Patent: May 4, 1999

Assignee: International Business Machines Corporation

Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
Voice recognition and voice response apparatus using speech period start point and termination point

Patent number: 5884257

Abstract: A voice recognition apparatus is provided which includes a first detection circuit for receiving an electric signal corresponding to voice. The first detection circuit detects a voice termination point representing a time at which the input of the electric signal corresponding to the voice is terminated based on the electric signal. The apparatus further includes a second detection circuit for determining a speech period, the speech period being a period in which the voice is uttered within a whole period in which the voice is input, based on the electric signal. In addition, the apparatus includes a feature amount extracting circuit for producing a feature amount vector, on the basis of a part of the electric signal corresponding to the speech period. A memory is provided for storing feature amount vectors for a plurality of voice candidates which are previously generated.

Type: Grant

Filed: January 30, 1997

Date of Patent: March 16, 1999

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Hidetsugu Maekawa, Tatsumi Watanabe, Kazuaki Obara, Kazuhiro Kayashima, Kenji Matsui, Yoshihiko Matsukawa
Speech detection device

Patent number: 5826230

Abstract: The device detects the beginning and ending portions of speech contained within an input signal based on the variance of the smoothed frequency-band-limited energy and the history of that smoothed energy within the signal. Using the variance makes detection relatively independent of the absolute signal-to-noise ratio of the signal and allows accurate detection against a wide variety of backgrounds, such as music, motor noise, and background noise such as other voices. The device can easily be implemented with off-the-shelf hardware and a high-speed special-purpose digital signal processor integrated circuit.

Type: Grant

Filed: March 18, 1996

Date of Patent: October 20, 1998

Assignees: Matsushita Electric Industrial Co., Ltd., Panasonic Technologies, Inc.

Inventor: Benjamin Kerr Reaves
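
Detection based on the variance of smoothed band-limited energy can be sketched as follows. The band-limiting filter is not shown, so the input is assumed to be per-frame band energies. Exponential smoothing, a fixed sliding variance window, and a fixed variance threshold are all assumptions. The "history of the smoothed energy" mentioned in the abstract is not modeled.

```python
import statistics


def detect_speech_span(band_energies, alpha=0.5, window=4, var_threshold=0.5):
    """Return (begin, end) frames where the smoothed band energy fluctuates.

    band_energies: per-frame energy after band-limiting (filtering not shown).
    alpha: exponential smoothing factor; window: frames per variance estimate;
    var_threshold: decision level. All three are hypothetical tuning values.
    """
    smoothed, s = [], band_energies[0]
    for e in band_energies:
        s = alpha * e + (1 - alpha) * s
        smoothed.append(s)
    active = [
        i for i in range(window - 1, len(smoothed))
        if statistics.pvariance(smoothed[i - window + 1 : i + 1]) > var_threshold
    ]
    return (active[0], active[-1]) if active else None


energies = [1.0] * 10 + [1.0, 10.0, 1.0, 10.0, 1.0, 10.0] + [1.0] * 10
print(detect_speech_span(energies))  # (11, 19)
```

A steady background, however loud, gives near-zero variance. This illustrates why a variance measure depends less on the absolute signal-to-noise ratio than a plain energy threshold. The smoothing delays the detected end by a few frames, as the example shows.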
Method and device for rating of speech quality by calculating time delays from onset of vowel sounds

Patent number: 5806028

Abstract: A method and device for determining the quality of speech. A person listens to the speech to be evaluated and reproduces it. The ends of the vowel sounds in the produced and the reproduced speech are determined, and the difference between corresponding vowel ends is registered. An average value is computed from the resulting time differences; this average indicates the quality of the produced speech. The invention can be used to evaluate different speech sources.

Type: Grant

Filed: February 14, 1996

Date of Patent: September 8, 1998

Assignee: Telia AB

Inventor: Bertil Lyberg
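
The score in this abstract is the mean of paired time differences, d = (1/N) Σ (r_i - p_i). Here p_i and r_i are the end times of the i-th vowel in the produced and reproduced speech. The sketch below assumes the vowel ends have already been located and are paired in order. It also assumes the sign convention of reproduced minus produced. The function name is hypothetical.

```python
def mean_vowel_end_delay(produced_ends, reproduced_ends):
    """Average time difference between matching vowel ends (seconds).

    produced_ends / reproduced_ends: vowel end times, paired in order; the
    pairing and the sign convention (reproduced minus produced) are assumptions.
    """
    if len(produced_ends) != len(reproduced_ends) or not produced_ends:
        raise ValueError("need the same, non-zero number of vowel ends in both")
    diffs = [r - p for p, r in zip(produced_ends, reproduced_ends)]
    return sum(diffs) / len(diffs)


print(round(mean_vowel_end_delay([0.50, 1.20, 2.00], [0.70, 1.50, 2.10]), 3))  # 0.2
```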
Speech recognizing device and method assuming a current frame is an end point of a current reference pattern

Patent number: 5799275

Abstract: A speech recognition system automatically determines the scope of a partial reference pattern. Multiple partial reference patterns are assumed, each ending at one of the reference pattern's constituent frames and starting at a preceding frame, and cumulative distances are calculated at every frame. The partial reference pattern with the smallest distance value among all partial reference patterns is taken as the recognition result for the partial input speech.

Type: Grant

Filed: June 18, 1996

Date of Patent: August 25, 1998

Assignees: The Japan Iron and Steel Federation, Sharp Kabushiki Kaisha, Real World Computing Partnership

Inventors: Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka
Start/end point detection for word recognition

Patent number: 5794195

Abstract: Speech recognition of words requires precise and robust detection of the words' start and end points, even in very noisy surroundings. A feature with noise-resistant properties is described: in the feature vector, a function of the signal energy forms the first feature, and a function of the quadratic difference of an LPC (Linear Predictive Coding) cepstrum coefficient forms the second. A check quantity, or a maximum function of a distribution function, is calculated and compared with a threshold to detect the start and end points.

Type: Grant

Filed: May 12, 1997

Date of Patent: August 11, 1998

Assignee: Alcatel N.V.

Inventors: Thomas Hormann, Gregor Rozinaj
