Endpoint Detection Patents (Class 704/248)
  • Patent number: 7630891
    Abstract: The present invention relates to a voice region detection apparatus and method capable of accurately detecting a voice region even in a voice signal with color noise. The voice region detection method comprises the steps of, if a voice signal is input, dividing the input voice signal into frames; performing whitening of surrounding noise by combining white noise with the frames; extracting random parameters indicating randomness of frames from the frames subjected to the whitening; classifying the frames into voice frames and noise frames based on the extracted random parameters; and detecting a voice region by calculating start and end positions of a voice based on the voice and noise frames. According to the present invention, the voice region can be accurately detected even in a voice signal with a large amount of color noise mixed therewith.
    Type: Grant
    Filed: November 26, 2003
    Date of Patent: December 8, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kwang-cheol Oh, Yong-beom Lee
  • Patent number: 7624012
    Abstract: The invention enables to generate a general function (4) which can operate on an input signal (Sx) to extract from the latter a value (DVex) of a global characteristic value expressing a feature (De) of the information conveyed by that signal. It operates by: generating at least one compound function (CF1-CFn), said compound function being generated from at least one of a set of elementary functions (EF1, EF2, . . .
    Type: Grant
    Filed: December 16, 2003
    Date of Patent: November 24, 2009
    Assignee: Sony France S.A.
    Inventors: François Pachet, Aymeric Zils
  • Patent number: 7593923
    Abstract: A set of mechanisms handles communication with a Knowledge Store and its K Engine(s). The Knowledge Store (Kstore) does not need indexes or tables to support it but instead is formed by the construction of interlocking trees of pointers in nodes of the interlocking trees. The K Engine builds and is used to query a KStore by using threads that use software objects together with a K Engine to learn particlized events, thus building the KStore, and these or other software objects can be used to make queries and get answers from the KStore, usually with the help of a K Engine. Under some circumstances, information can be obtained directly from the KStore, but is generally only available through the actions of the K Engine. The mechanisms provide communications pathways for users and applications software to build and/or query the KStore. Both these processes can proceed simultaneously, and in multiple instances. There can be a plurality of engines operating on a KStore essentially simultaneously.
    Type: Grant
    Filed: June 29, 2004
    Date of Patent: September 22, 2009
    Assignee: Unisys Corporation
    Inventors: Jane Campbell Mazzagatti, Jane Van Keuren Claar, Tony T. Phan
  • Patent number: 7580512
    Abstract: An incoming call screening treatment is selected for a call to a called communication device based on an emotional state criterion input by a user of the called communication device.
    Type: Grant
    Filed: June 28, 2005
    Date of Patent: August 25, 2009
    Assignee: Alcatel-Lucent USA Inc.
    Inventors: Ramachendra P. Batni, Ranjan Sharma
  • Patent number: 7574359
    Abstract: The present invention is directed to a 3-stage adaptation framework based on speaker selection training. First a subset of cohort speakers is selected for a test speaker. Then cohort models are transformed to be closer to the test speaker. Finally the adapted model for the test speaker is obtained by combining these transformed cohort models. Combination weights as well as bias items can be adaptively learned from adaptation data.
    Type: Grant
    Filed: October 1, 2004
    Date of Patent: August 11, 2009
    Assignee: Microsoft Corporation
    Inventor: Chao Huang
  • Patent number: 7567900
    Abstract: A harmonic structure acoustic signal detection device not depending on the level fluctuation of the input signal including: an FFT unit which performs FFT on an input signal and calculates a power spectrum component for each frame; a harmonic structure extraction unit which leaves only a harmonic structure from the power spectrum component; a voiced feature evaluation unit which evaluates correlation between the frames of harmonic structures extracted by the harmonic structure extraction unit, thereby evaluates whether or not the segment is a vowel segment, and extracts the voiced segment; and a speech segment determination unit which determines a speech segment according to the continuity and durability of the output of the voiced feature evaluation unit.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: July 28, 2009
    Assignee: Panasonic Corporation
    Inventors: Tetsu Suzuki, Takeo Kanamori, Takashi Kawamura
  • Patent number: 7555117
    Abstract: A path change is detected by multiplying the energy of a signal on the output of a summation circuit in one channel of a telephone by a constant to produce a product. The product is compared with the energy of a signal on the input of the summation circuit. A path change is indicated when the energy of the product exceeds the energy of a signal on the input of the summation circuit. The comparison is made in successive frames of an audio signal and a path change is indicated when the energy of a signal on the input of the summation circuit is exceeded in two or more successive frames.
    Type: Grant
    Filed: July 12, 2005
    Date of Patent: June 30, 2009
    Assignee: Acoustic Technologies, Inc.
    Inventors: Seth Suppappola, Franklyn H. Story
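The comparison described in the abstract of 7555117 can be sketched roughly as follows. This is only an illustration of the stated rule (scaled output energy exceeding input energy in two or more successive frames); the function names, the constant `k`, and the frame representation as lists of samples are assumptions, not details taken from the patent.

```python
def frame_energy(samples):
    """Sum of squared samples for one frame."""
    return sum(s * s for s in samples)


def detect_path_change(input_frames, output_frames, k=0.5, min_consecutive=2):
    """Return the index of the frame at which a path change is declared, or None.

    input_frames  -- frames of the signal at the input of the summation circuit
    output_frames -- frames of the signal at the output of the summation circuit
    k             -- hypothetical scaling constant applied to the output energy
    """
    run = 0  # number of successive frames in which the product exceeded the input energy
    for idx, (x, e) in enumerate(zip(input_frames, output_frames)):
        if k * frame_energy(e) > frame_energy(x):
            run += 1
            if run >= min_consecutive:
                return idx
        else:
            run = 0
    return None
```

Requiring `min_consecutive` frames in a row keeps a single noisy frame from triggering a detection, which matches the "two or more successive frames" condition in the abstract.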
  • Patent number: 7487090
    Abstract: A method of providing voice metrics over an established telephone call between a user and a subscriber can include receiving voice information from the user over the call and determining biometric information from the voice information for the user. The method further can include encoding the biometric information and sending the biometric information to the subscriber over the call.
    Type: Grant
    Filed: December 15, 2003
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Thomas E. Creamer, Peeyush Jaiswal, Victor S. Moore
  • Patent number: 7424425
    Abstract: In detection systems, such as speaker verification systems, for a given operating point range, with an associated detection “cost”, the detection cost is preferably reduced by essentially trading off the system error in the area of interest with areas essentially “outside” that interest. Among the advantages achieved thereby are higher optimization gain and better generalization. From a measurable Detection Error Tradeoff (DET) curve of the given detection system, a criterion is preferably derived, such that its minimization provably leads to detection cost reduction in the area of interest. The criterion allows for selective access to the slope and offset of the DET curve (a line in case of normally distributed detection scores, a curve approximated by mixture of Gaussians in case of other distributions). By modifying the slope of the DET curve, the behavior of the detection system is changed favorably with respect to the given area of interest.
    Type: Grant
    Filed: May 19, 2002
    Date of Patent: September 9, 2008
    Assignee: International Business Machines Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy
  • Patent number: 7353173
    Abstract: The present invention comprises a system and method for implementing a Mandarin Chinese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Mandarin Chinese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7353172
    Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 24, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Patent number: 7353174
    Abstract: The present invention comprises a system and method for effectively implementing a Mandarin Chinese speech recognition dictionary, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Mandarin Chinese phone set. The optimized Mandarin Chinese phone set may efficiently be implemented by utilizing an allophone and phonemic variation technique. In addition, the foregoing vocabulary dictionary may be implemented by utilizing unified dictionary optimization techniques to provide robust and accurate speech recognition. Furthermore, the vocabulary dictionary may be implemented as an optimized dictionary to accurately recognize either Northern Mandarin Chinese speech or Southern Mandarin Chinese speech during the speech recognition procedure.
    Type: Grant
    Filed: March 31, 2003
    Date of Patent: April 1, 2008
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Xavier Menendez-Pidal, Lei Duan, Jingwen Lu, Lex Olorenshaw
  • Patent number: 7319956
    Abstract: A speech reference enrollment method involves requesting a user speak a word; detecting a first utterance; requesting the user speak the word; detecting a second utterance; determining a first similarity between the first utterance and the second utterance; when the first similarity is less than a predetermined similarity, requesting the user speak the word; detecting a third utterance; determining a second similarity between the first utterance and the third utterance; and when the second similarity is greater than or equal to the predetermined similarity, creating a reference.
    Type: Grant
    Filed: March 23, 2001
    Date of Patent: January 15, 2008
    Assignee: SBC Properties, L.P.
    Inventor: Robert Wesley Bossemeyer, Jr.
  • Patent number: 7277853
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Grant
    Filed: September 5, 2001
    Date of Patent: October 2, 2007
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
  • Patent number: 7222068
    Abstract: A system for transmitting audio signals over a telecommunications link generates the signals as two or more alternative feeds, for example at different data rates. The two feeds are encoded using coding methods having a frame structure with different frame lengths. To facilitate switching between the two, the input signal is notionally divided into temporal portions and each is coded by taking it, plus enough of the next (or preceding) portion to make up a whole number of frames, and encoding it, whereby the encoded portions overlap—at least for one of the feeds. The overlap is lost upon decoding by discarding duplicate material.
    Type: Grant
    Filed: November 19, 2001
    Date of Patent: May 22, 2007
    Assignee: British Telecommunications public limited company
    Inventors: Anthony R Leaning, Richard J Whiting
  • Patent number: 7072835
    Abstract: A method and apparatus for speech recognition of the present application has a process to collate, with an input utterance, an acoustic model corresponding to a hypothesis to be expressed by the connection of utterance segments, such as phonemes or syllables, and developed according to a length of an input utterance by an inter-word connection rule thereby obtaining a recognition score. Within a word of the hypothesis, the similar hypotheses high in utterance score within a predetermined threshold from the maximum value of the score are all held to a word end irrespectively of the number of hypotheses. Meanwhile, at a word end of the hypotheses, the hypotheses are narrowed to a predetermined number of upper ranking in the order of higher score.
    Type: Grant
    Filed: January 17, 2002
    Date of Patent: July 4, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Tomohiro Konuma, Tsuyoshi Inoue, Mitsuru Endo, Natsuki Saito, Akira Ishida, Tatsuya Kimura
  • Patent number: 7050973
    Abstract: An improved template spotting technique may be implemented as part of text dependent speaker verification system to authenticate a user of a wireless communication device. This technique may be suitable for use in noisy environments and for wireless communication devices with limited processing power. Endpoints of a test utterance are identified by first computing local distances between test frames and a target template. Accumulated distances are then computed from the local distances. Endpoints of the utterance may be identified when one or more of the accumulated distances is below a predetermined threshold. Once endpoints of a test utterance are identified, a dynamic time warp (DTW) process may be used to determine whether the test utterance matches a training template. One embodiment of the present invention aligns multiple training templates to reduce the probability of failing to verify the identity of a speaker that should have been properly verified.
    Type: Grant
    Filed: April 22, 2002
    Date of Patent: May 23, 2006
    Assignee: Intel Corporation
    Inventor: Hagai Aronowitz
  • Patent number: 6937977
    Abstract: A start of an input speech signal is detected during presentation of an output audio signal and an input start time, relative to the output audio signal, is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes a context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.
    Type: Grant
    Filed: October 5, 1999
    Date of Patent: August 30, 2005
    Assignee: fastmobile, Inc.
    Inventor: Ira A. Gerson
  • Patent number: 6922668
    Abstract: This invention relates to an improved method and apparatus for speaker recognition. In this invention, prior to comparing feature vectors derived from speech with a stored reference model the feature vectors are processed by applying a speaker dependent transform which matches the characteristics of a particular speaker's vocal tract. Features derived from speech which has very dissimilar characteristics to those of the speaker on which the transform is dependent may be severely distorted by the transform, whereas features from speech which has similar characteristics to those of the speaker on which the transform is dependent will be distorted far less.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: July 26, 2005
    Assignee: British Telecommunications Public Limited Company
    Inventor: Simon N. Downey
  • Patent number: 6873953
    Abstract: A method and apparatus are provided for performing prosody based endpoint detection of speech in a speech recognition system. Input speech represents an utterance, which has an intonation pattern. An end-of-utterance condition is identified based on prosodic parameters of the utterance, such as the intonation pattern and the duration of the final syllable of the utterance, as well as non-prosodic parameters, such as the log energy of the speech.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: March 29, 2005
    Assignee: Nuance Communications
    Inventor: Matthew Lennig
  • Patent number: 6862713
    Abstract: A method for presenting to an end-user the intermediate matching search results of a keyword search in an index list of information. The method comprising the steps of: coupling to a search engine a graphical user interface for accepting keyword search terms for searching the indexed list of information with the search engine; receiving one or more keyword search terms with one or more separation characters separating there between; performing a keyword search with the one or more keyword search terms received when a separation character is received; and presenting the number of documents matching the keyword search terms to the end-user, and presenting a graphical menu item on a display. In accordance with another embodiment of the present invention, an information processing system and computer readable storage medium carries out the above method.
    Type: Grant
    Filed: August 31, 1999
    Date of Patent: March 1, 2005
    Assignee: International Business Machines Corporation
    Inventors: Reiner Kraft, W. Scott Spangler
  • Patent number: 6795807
    Abstract: A device and a method to be used by laryngeally impaired people to improve the naturalness of their speech. An artificial sound creating mechanism which forms a simulated glottal pulse in the vocal tract is utilized. An artificial glottal pulse is compared with the natural spectrum and an inverse filter is generated to provide an output signal which would better reproduce natural sound. A digital signal processor introduces a variation of pitch based on an algorithm developed for this purpose; i.e. creating prosody. The algorithm uses primarily the relative amplitude of the speech signal and the rise and fall rates of the amplitude as a basis for setting the frequency of the speech. The invention also clarifies speech of laryngectomees by sensing the presence of consonants in the speech and appropriately amplifying them with respect to the vowel sounds.
    Type: Grant
    Filed: August 17, 2000
    Date of Patent: September 21, 2004
    Inventor: David R. Baraff
  • Patent number: 6782363
    Abstract: A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.
    Type: Grant
    Filed: May 4, 2001
    Date of Patent: August 24, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Qi P. Li, Jinsong Zheng, Qiru Zhou
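A three-state endpoint detector of the general kind described in the abstract of 6782363 might look like the sketch below. The abstract does not specify the filter or the states, so everything concrete here is an assumption: a moving-average filter stands in for the patent's filter, and the state names, thresholds, and hangover count are hypothetical.

```python
def smooth(energies, width):
    """Causal moving average used as a stand-in filter (an assumption)."""
    out = []
    for i in range(len(energies)):
        window = energies[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def find_segments(energies, upper=5.0, lower=2.0, hangover=2, width=3):
    """Return (start, end) frame index pairs, with end exclusive."""
    state = "SILENCE"
    start = end_candidate = quiet = 0
    segments = []
    for i, v in enumerate(smooth(energies, width)):
        if state == "SILENCE":
            if v >= upper:                 # filter output rises: speech begins
                state, start = "SPEECH", i
        elif state == "SPEECH":
            if v < lower:                  # possible end of speech
                state, end_candidate, quiet = "LEAVING", i, 1
        else:                              # state == "LEAVING"
            if v >= lower:                 # short dip only: resume speech
                state = "SPEECH"
            else:
                quiet += 1
                if quiet >= hangover:      # dip lasted long enough: confirm endpoint
                    segments.append((start, end_candidate))
                    state = "SILENCE"
    if state == "SPEECH":                  # input ended while still in speech
        segments.append((start, len(energies)))
    elif state == "LEAVING":               # input ended during an unconfirmed dip
        segments.append((start, end_candidate))
    return segments
```

The intermediate "LEAVING" state is what lets a brief dip in energy inside a word pass without splitting the utterance in two.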
  • Patent number: 6775653
    Abstract: A double-talk detector (DTD) method of performing double-talk detection, an echo canceller, and a method of performing echo cancellation is used with an echo canceller (EC) to sense when an echo is corrupted by near-end speech (NES). The double-talk detector inhibits the adaptation of a synthesizing filter when NES is present, in order to avoid divergence of the adaptive algorithm. Due to the time varying properties of the echo path and the signal levels, a suitable decision threshold ensures the accuracy of the DTD. The double-talk detector utilizes an adaptive decision threshold which is capable of tracking variations in the echo path and signal/noise levels during a call.
    Type: Grant
    Filed: March 30, 2000
    Date of Patent: August 10, 2004
    Assignee: Agere Systems Inc.
    Inventor: Xiong Guan Wei
  • Publication number: 20040117183
    Abstract: Methods and arrangements for enhancing speech recognition in noisy environments, via providing at least one initial Compound Gaussian Mixture model, applying an adaptation algorithm to at least one item associated with speech enrollment data and to the at least one initial Compound Gaussian Mixture model to yield an intermediate output, and mathematically combining the at least one initial Compound Gaussian Mixture model with the intermediate output to yield an adapted Compound Gaussian Mixture model.
    Type: Application
    Filed: December 13, 2002
    Publication date: June 17, 2004
    Applicant: IBM Corporation
    Inventors: Sabine V. Deligne, Satyanarayana Dharanipragada
  • Patent number: 6748356
    Abstract: A method and apparatus are disclosed for identifying speakers participating in an audio-video source, whether or not such speakers have been previously registered or enrolled. A speaker segmentation system separates the speakers and identifies all possible frames where there is a segment boundary between non-homogeneous speech portions. A hierarchical speaker tree clustering system clusters homogeneous segments (generally corresponding to the same speaker), and assigns a cluster identifier to each detected segment, whether or not the actual name of the speaker is known. A hierarchical enrolled speaker database is used that includes one or more background models for unenrolled speakers to assign a speaker to each identified segment.
    Type: Grant
    Filed: June 7, 2000
    Date of Patent: June 8, 2004
    Assignee: International Business Machines Corporation
    Inventors: Homayoon Sadr Mohammad Beigi, Mahesh Viswanathan
  • Patent number: 6563911
    Abstract: The present invention is a speech enabled automatic telephone dialer device, system, and method using a spoken name corresponding to name-telephone number data of computer-based address book programs. The invention includes user telephones connected to a PBX-type telephony mechanism, which is connected to a telephony board of a name dialer device. User computer workstations containing loaded address book programs with name-telephone number data are connected to the name dialer device. The name dialer device includes a host computer in a network; a telephony board for controlling the PBX for dialing; memory within the host computer for storing software and name-telephone number data; and, software to access computer-based address book programs, to receive voice inputs from the PBX-type telephony mechanism, to create converted phonemes from names to match voice inputs with specific name-telephone number data from the computer-based address book programs for initiating an automatic dialing.
    Type: Grant
    Filed: January 23, 2001
    Date of Patent: May 13, 2003
    Assignee: iVoice, Inc.
    Inventor: Jerome R. Mahoney
  • Publication number: 20020116189
    Abstract: A method for identifying authorized users and the apparatus of the same, which identifies users by comparison with specific spectrograms of authorized users. The method comprises the steps of: (i) detecting the end point of a verbalized sample from the user requesting access; (ii) retrieving speech features from a spectrogram of the speech; (iii) determining whether training is necessary, and if so, taking the speech features as a reference template, setting a threshold and going back to (i), otherwise going on to next step; (iv) matching patterns of the speech features and the reference template; (v) computing a distance between the speech features and the reference template according to the matching result of (iv) to obtain a distance scoring; (vi) comparing the distance scoring with the threshold; (vii) determining whether the user is authorized according to the compared result of (vi).
    Type: Application
    Filed: June 19, 2001
    Publication date: August 22, 2002
    Applicant: Winbond Electronics Corp.
    Inventors: Tsuei-Chi Yeh, Wen-Yuan Chen
  • Patent number: 6341263
    Abstract: A voice recognition system, method and storage medium is provided. The system includes a plurality of storage sections, a selection section, an adaptation section, a plurality of calculation sections, an adaptation section, a normalization section and a decision section. The method includes the steps for performing the functions associated with the sections.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: January 22, 2002
    Assignee: NEC Corporation
    Inventors: Eiko Yamada, Hiroaki Hattori
  • Patent number: 6324509
    Abstract: An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise-ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares with a second SNR threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance. The processor also then compares with the second SNR threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.
    Type: Grant
    Filed: February 8, 1999
    Date of Patent: November 27, 2001
    Assignee: Qualcomm Incorporated
    Inventors: Ning Bi, Chienchung Chang, Andrew P. Dejaco
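The two-threshold idea in the abstract of 6324509 can be illustrated with a short sketch. It assumes a per-frame SNR list is already available and reads "comparing the part that predates the first starting point" as walking outward frame by frame; that reading, the function name, and the parameter names are assumptions. The periodic recalculation of the thresholds mentioned in the abstract is left out.

```python
def two_pass_endpoints(snr, high, low):
    """Return (start, end) frame indices (inclusive), or None if nothing exceeds `high`.

    snr  -- per-frame SNR estimates (assumed to be precomputed)
    high -- first (stricter) SNR threshold
    low  -- second (looser) SNR threshold, expected to be below `high`
    """
    above = [i for i, v in enumerate(snr) if v > high]
    if not above:
        return None
    start, end = above[0], above[-1]      # first starting and ending points

    # Second pass: extend backward over earlier frames that still exceed `low`.
    while start > 0 and snr[start - 1] > low:
        start -= 1
    # Second pass: extend forward over later frames that still exceed `low`.
    while end < len(snr) - 1 and snr[end + 1] > low:
        end += 1
    return start, end
```

The strict threshold finds a region that is very likely speech, and the looser one then recovers weak onsets and trailing sounds that the strict pass would clip.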
  • Patent number: 6321197
    Abstract: A communication device capable of endpointing speech utterances includes a microprocessor (110) connected to communication interface circuitry (115), memory (120), audio circuitry (130), an optional keypad (140), a display (150), and a vibrator/buzzer (160). Audio circuitry (130) is connected to microphone (133) and speaker (135). Microprocessor (110) includes a speech/noise classifier and speech recognition technology. Microprocessor (110) analyzes a speech signal to determine speech waveform parameters within a speech acquisition window. Microprocessor (110) compares the speech waveform parameters to determine the start and end points of the speech utterance. Microprocessor (110) starts at a frame index based on the energy centroid of the speech utterance and analyzes the frames preceding and following the frame index to determine the endpoints.
    Type: Grant
    Filed: January 22, 1999
    Date of Patent: November 20, 2001
    Assignee: Motorola, Inc.
    Inventors: William M. Kushner, Audrius Polikaitis
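The centroid-based search in the abstract of 6321197 could be approximated as below. The abstract only says the search starts at a frame index based on the energy centroid and examines the frames before and after it; the stopping rule used here (a fixed energy threshold) and all names are assumptions made for illustration.

```python
def centroid_endpoints(energies, threshold):
    """Return (start, end) frame indices (inclusive) found by searching outward
    from the energy centroid, or None if the window contains no energy."""
    total = sum(energies)
    if total == 0:
        return None
    # Energy-weighted mean frame index within the acquisition window.
    centroid = round(sum(i * e for i, e in enumerate(energies)) / total)

    start = end = centroid
    # Walk toward the beginning while the preceding frame stays at or above threshold.
    while start > 0 and energies[start - 1] >= threshold:
        start -= 1
    # Walk toward the end while the following frame stays at or above threshold.
    while end < len(energies) - 1 and energies[end + 1] >= threshold:
        end += 1
    return start, end
```

Starting from the centroid rather than from the edges of the window means a noise burst far from the utterance is less likely to be mistaken for its beginning or end.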
  • Publication number: 20010029449
    Abstract: A voice recognizing apparatus includes a filter bank for deriving feature parameters of voice from a microphone and a filter bank for deriving feature parameters of noise represented by an electric signal which is directly inputted to a terminal. The voice parameters and the noise parameters are respectively stored in a voice parameter buffer and a noise parameter buffer. A threshold value for sampling the voice parameters is set to be changed in accordance with a noise level for each frame. From a series of the voice parameters exceeding the threshold value, a series of the noise parameters corresponding thereto are subtracted so that only a voice parameter pattern can be obtained. In a recognition mode, the voice pattern is compared with a reference pattern to recognize the voice from the microphone. In addition, in a registration mode, the voice pattern is registered as a reference pattern.
    Type: Application
    Filed: July 21, 1997
    Publication date: October 11, 2001
    Inventors: Shin-ichi Tsurufuji, Masayuki Iida, Ryuji Suzuki
  • Patent number: 6278972
    Abstract: A system and method for forming a segmented speech signal from an input speech signal having a plurality of frames. The input speech signal is converted from a time domain signal to a frequency domain signal having a plurality of speech frames, wherein each speech frame in the frequency domain signal is represented by at least one spectral value associated with the speech frame. A spectral difference value is then determined for each pair of adjacent frames in the frequency domain signal, wherein the spectral difference value for each pair of adjacent frames is representative of a difference between the at least one spectral value associated with each frame in the pair of adjacent frames. An initial cluster boundary is set between each pair of adjacent frames in the frequency domain signal, and a variance value is assigned to each cluster in the frequency domain signal, wherein the variance value for each cluster is equal to one of the determined spectral difference values.
    Type: Grant
    Filed: January 4, 1999
    Date of Patent: August 21, 2001
    Assignee: Qualcomm Incorporated
    Inventors: Ning Bi, Chienchung Chang
  • Publication number: 20010012998
    Abstract: The invention relates to a voice recognition device.
    Type: Application
    Filed: December 14, 2000
    Publication date: August 9, 2001
    Inventors: Pierrick Jouet, Frederic Soufflet
  • Patent number: 6272463
    Abstract: A method is given for generating a speaker-dependent model of an utterance that has at least one occurrence. The method includes generating an initial model, having a first resolution, that encodes each of the occurrences of the utterance; and generating at least one additional speaker-specific model, having a different resolution from that of the initial model, of all occurrences of the utterance.
    Type: Grant
    Filed: March 3, 1999
    Date of Patent: August 7, 2001
    Assignee: Lernout & Hauspie Speech Products N.V.
    Inventor: Martine Lapere
  • Patent number: 6182035
    Abstract: A voice activity detector that implements a fast wavelet transformation using filter pairs. A quadrature high pass filter provides an output signal corresponding to the upper half of the Nyquist frequency and a quadrature low pass filter provides an output signal corresponding to the lower half of the Nyquist frequency. The quadrature high pass filter is useful for catching and isolating transients in the input signal and the quadrature low pass filter is useful for fine frequency analysis. The voice activity detector can utilize multiple decomposition levels that are arranged in a pyramid or tree formation to increase the reliability of the voice activity decision. For example, the output of the quadrature low pass filter can be further decomposed using a second pair of filters. The voice activity decision can be generated by comparing a signal power estimate for the output of the filter pairs to threshold levels that are specific for each filter or frequency range.
    Type: Grant
    Filed: March 26, 1998
    Date of Patent: January 30, 2001
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Fisseha Mekuria
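
The filter-pair pyramid described in the abstract above can be illustrated with a short sketch. The Haar filter pair, the rule "active if any band exceeds its threshold", and all function names here are assumptions chosen for brevity; the abstract does not name the filters or the exact decision logic.

```python
import math

def haar_split(x):
    """One filter-pair stage: return (low, high) half-band outputs.

    Each output is downsampled by 2. The Haar pair is an assumed stand-in
    for the quadrature filters in the abstract.
    """
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return low, high

def power(band):
    """Mean squared value of a band (0.0 for an empty band)."""
    return sum(v * v for v in band) / len(band) if band else 0.0

def band_powers(frame, levels=2):
    """Split the low band repeatedly; return [high_1, ..., high_n, low_n]."""
    powers = []
    low = frame
    for _ in range(levels):
        low, high = haar_split(low)
        powers.append(power(high))
    powers.append(power(low))
    return powers

def voice_active(frame, thresholds, levels=2):
    """Assumed rule: active if any band power exceeds its own threshold."""
    return any(p > t for p, t in zip(band_powers(frame, levels), thresholds))

alternating = [1.0, -1.0] * 4   # energy concentrated in the upper half band
silent = [0.0] * 8
```

Each entry of `thresholds` pairs with one band, which mirrors the per-filter thresholds mentioned in the abstract.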
  • Patent number: 6157670
    Abstract: A method of estimating background noise in a signal. The signal is divided into blocks of equal predetermined length. The minimum energy of the signal during the length of each block is determined. The minimum energy determined for the current block is compared to a previous determination of minimum energy. If the current minimum energy exceeds a predetermined maximum energy level, the current block minimum energy is discarded and the previous determination remains unchanged. If the current block minimum energy is below the previous determination, the previous estimate is reduced by the difference between the previous determination and current minimum energy. If the current energy is above the previous determination but below the maximum, the previous estimate is increased by half of the difference between the current energy and the previous estimate. The increase factor may also be adjusted to increase the current estimated energy level by a factor of any amount between and including 0 and 1.
    Type: Grant
    Filed: August 10, 1999
    Date of Patent: December 5, 2000
    Assignee: Telogy Networks, Inc.
    Inventor: Bogdan Kosanovic
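
The update rule in the abstract above maps directly onto a few lines of code. The sketch below assumes the input is a sequence of short-term energy values; the names `update_noise_estimate`, `track_noise`, `rise`, and `initial` are hypothetical, and the starting estimate is an assumption since the abstract does not say how it is initialized.

```python
def update_noise_estimate(estimate, block_min, max_level, rise=0.5):
    """Apply one block's minimum energy to the running noise estimate."""
    if block_min > max_level:
        # Too loud to be background noise: discard, keep the old estimate.
        return estimate
    if block_min < estimate:
        # Reduce by the full difference, which lands on the block minimum.
        return estimate - (estimate - block_min)
    # Above the estimate but within the maximum: move up by a fraction
    # (0.5 in the abstract, adjustable anywhere in [0, 1]).
    return estimate + rise * (block_min - estimate)

def track_noise(energies, block_len, max_level, initial, rise=0.5):
    """Return the noise estimate after each full block of `block_len` values."""
    estimate = initial
    history = []
    for start in range(0, len(energies) - block_len + 1, block_len):
        block_min = min(energies[start:start + block_len])
        estimate = update_noise_estimate(estimate, block_min, max_level, rise)
        history.append(estimate)
    return history

energies = [5, 4, 6, 9, 8, 10, 100, 120, 110, 3, 7, 5]
history = track_noise(energies, block_len=3, max_level=50, initial=4.0)
```

The asymmetry is the point of the design: the estimate drops immediately when a quieter block appears but climbs only gradually, so brief loud passages do not inflate the noise floor.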
  • Patent number: 6134524
    Abstract: The present invention provides improved foreground-speech signal endpointing by computing a spectral stationarity statistic. This statistic is used by a finite state machine to endpoint speech. Endpointing using the spectral stationarity statistic is less susceptible to background noise than endpointing using conventional measures. The present invention uses frame-synchronous quantile estimation to generate a mask signal for signal-to-noise ratio normalization.
    Type: Grant
    Filed: October 24, 1997
    Date of Patent: October 17, 2000
    Assignee: Nortel Networks Corporation
    Inventors: Stephen Douglas Peters, Daniel Boies
  • Patent number: 6078884
    Abstract: Pattern recognition apparatus uses a recognition processor for processing an input signal to indicate its similarity to allowed sequences of reference patterns to be recognised. A speech recognition processor includes a classification arrangement to identify a sequence of patterns corresponding to said input signal and for repeatedly partitioning the input signal into a speech-containing portion and, preceding and/or following said speech-containing portion, noise or silence portions. A noise model generator is provided to generate a pattern of the noise or silence portion, for subsequent use by said classification means for pattern identification purposes. The noise model generator may generate a noise model for each noise portion of the input signal, which may be used to adapt the reference patterns.
    Type: Grant
    Filed: March 26, 1998
    Date of Patent: June 20, 2000
    Assignee: British Telecommunications public limited company
    Inventor: Simon N. Downey
  • Patent number: 6044342
    Abstract: A speech spurt detecting apparatus for detecting speech spurts in a voice signal has a storage for storing an input voice signal. A decision portion determines speech spurt sections and mute sections using a threshold value and sets one of the mute sections at a latter part of a hangover time. A mute level statistical processor estimates the noise distribution of a signal in the mute sections. A speech spurt detecting threshold value decision portion receives the average and the variance of the noise distribution from the mute level statistical processor and approximates the noise distribution to a gamma distribution to decide a speech spurt detecting threshold. A speech spurt transmitting portion outputs the voice signal in the speech spurt sections from the storage. A speech spurt level statistical processor carries out statistical processing of the speech spurt sections.
    Type: Grant
    Filed: November 25, 1997
    Date of Patent: March 28, 2000
    Assignee: Logic Corporation
    Inventors: Nobuki Sato, Hiroshi Kamei, Takamasa Tomono, Makoto Aoki, Jina Baek
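
One way to turn the noise average and variance from the abstract above into a threshold is to fit a gamma distribution by the method of moments and pick a high quantile. The sketch below does that under stated assumptions: the quantile rule, the `false_alarm` parameter, and all function names are illustrative, since the abstract does not say how the threshold is derived from the fitted distribution.

```python
import math

def gamma_params(mean, variance):
    """Method-of-moments fit: return (shape, scale) of a gamma distribution."""
    return mean * mean / variance, variance / mean

def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x/scale) by power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))

def detection_threshold(mean, variance, false_alarm=0.01):
    """Level that noise exceeds with probability `false_alarm` (assumed rule)."""
    shape, scale = gamma_params(mean, variance)
    target = 1.0 - false_alarm
    lo, hi = 0.0, mean
    while gamma_cdf(hi, shape, scale) < target:
        hi *= 2.0                      # grow the bracket until it covers the target
    for _ in range(60):                # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < target:
            lo = mid
        else:
            hi = mid
    return hi

threshold = detection_threshold(mean=1.0, variance=1.0, false_alarm=0.01)
```

With mean 1 and variance 1 the fitted gamma reduces to a unit exponential, so the threshold can be checked against the closed form -ln(0.01).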
  • Patent number: 6029130
    Abstract: A method and a system recognize speech based upon an approach which combines certain advantages of speech detection and word spotting for improved accuracy without sacrificing efficiency. The improved method and system is based upon the determination of a total similarity value based upon a cumulative value and power information at or substantially near a terminal frame.
    Type: Grant
    Filed: August 20, 1997
    Date of Patent: February 22, 2000
    Assignee: Ricoh Company, Ltd.
    Inventor: Takashi Ariyoshi
  • Patent number: 6021387
    Abstract: A spoken word or phrase recognition device. The device does not require a digital signal processor, large RAM, or extensive analog circuitry. The input audio signal is digitized and passed recursively through a digital difference filter to produce a multiplicity of filtered output waveforms. These waveforms are processed in real time by a microprocessor to generate a pattern that is recognized by a neural network pattern classifier that operates in software in the microprocessor. By application of additional techniques, this device has been shown to recognize an unknown speaker saying a digit from zero through nine with an accuracy greater than 99%. Because of the recognition accuracy and cost-effective design, the device may be used in cost sensitive applications such as toys, electronic learning aids, and consumer electronic products.
    Type: Grant
    Filed: February 23, 1998
    Date of Patent: February 1, 2000
    Assignee: Sensory Circuits, Inc.
    Inventors: Forrest S. Mozer, Michael C. Mozer, Todd F. Mozer
  • Patent number: 6012027
    Abstract: A speech reference enrollment method involves the following steps: (a) requesting a user speak a vocabulary word; (b) detecting a first utterance (354); (c) requesting the user speak the vocabulary word; (d) detecting a second utterance (358); (e) determining a first similarity between the first utterance and the second utterance (362); (f) when the first similarity is less than a predetermined similarity, requesting the user speak the vocabulary word; (g) detecting a third utterance (366); (h) determining a second similarity between the first utterance and the third utterance (370); and (i) when the second similarity is greater than or equal to the predetermined similarity, creating a reference (364).
    Type: Grant
    Filed: September 17, 1997
    Date of Patent: January 4, 2000
    Assignee: Ameritech Corporation
    Inventor: Robert Wesley Bossemeyer, Jr.
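
Steps (a) through (i) of the abstract above amount to a short control flow, sketched here. The callables `get_utterance` and `similarity` are hypothetical placeholders for prompting the user and comparing two utterances. Creating the reference directly when the first pair already matches is an assumption implied by step (f), and the abstract does not say what happens if the third utterance also fails, so the sketch returns `None` in that case.

```python
def enroll(get_utterance, similarity, threshold):
    """Return the pair of utterances used for the reference, or None."""
    first = get_utterance()            # steps (a)-(b)
    second = get_utterance()           # steps (c)-(d)
    if similarity(first, second) >= threshold:   # step (e)
        return (first, second)         # assumed: reference from the first pair
    third = get_utterance()            # steps (f)-(g)
    if similarity(first, third) >= threshold:    # step (h)
        return (first, third)          # step (i)
    return None                        # outcome not specified in the abstract

# Toy usage: utterances are numbers, similarity is 1 minus their distance.
spoken = iter([1.0, 0.2, 0.95])
reference = enroll(lambda: next(spoken),
                   lambda a, b: 1.0 - abs(a - b),
                   threshold=0.9)
```

In the toy run the second utterance is rejected and the third is accepted, so the reference is built from the first and third.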
  • Patent number: 5974381
    Abstract: To avoid a predetermined amount of time and/or a certain amount of processing time prior to determining a number of frames for each speech input portion, a fast voice recognition system enables real-time frame counting based upon a comparison between a decreasing number of frames and an increasing time-dependent threshold. The real-time voice recognition also enables a substantially reduced rate for erroneous partial matching.
    Type: Grant
    Filed: December 19, 1997
    Date of Patent: October 26, 1999
    Assignee: Ricoh Company, Ltd.
    Inventor: Syuji Kubota
  • Patent number: 5899973
    Abstract: In this speech recognition system, the size of the language model is reduced by discarding those n-grams that the acoustic part of the system can recognize most accurately without support from a language model. The n-grams can be discarded dynamically during the running of the system or during the build or setup-time of the system. Trigrams occurring infrequently in the text corpora are substituted for the discarded n-grams to increase the accuracy of the word recognitions.
    Type: Grant
    Filed: September 25, 1997
    Date of Patent: May 4, 1999
    Assignee: International Business Machines Corporation
    Inventors: Upali Bandara, Siegfried Kunzmann, Karlheinz Mohr, Burn L. Lewis
  • Patent number: 5884257
    Abstract: A voice recognition apparatus is provided which includes a first detection circuit for receiving an electric signal corresponding to voice. The first detection circuit detects a voice termination point representing a time at which the input of the electric signal corresponding to the voice is terminated based on the electric signal. The apparatus further includes a second detection circuit for determining a speech period, the speech period being a period in which the voice is uttered within a whole period in which the voice is input, based on the electric signal. In addition, the apparatus includes a feature amount extracting circuit for producing a feature amount vector, on the basis of a part of the electric signal corresponding to the speech period. A memory is provided for storing feature amount vectors for a plurality of voice candidates which are previously generated.
    Type: Grant
    Filed: January 30, 1997
    Date of Patent: March 16, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Hidetsugu Maekawa, Tatsumi Watanabe, Kazuaki Obara, Kazuhiro Kayashima, Kenji Matsui, Yoshihiko Matsukawa
  • Patent number: 5826230
    Abstract: The device detects the beginning and ending portions of speech contained within an input signal based on the variance of smoothed frequency band limited energy and the history of the smoothed frequency band limited energy within the signal. The use of the variance allows detection which is relatively independent of an absolute signal-to-noise ratio within the signal, and allows accurate detection within a wide variety of backgrounds such as music, motor noise, and background noise, such as other voices. The device can be easily implemented using off-the-shelf hardware along with a high-speed special purpose digital signal processor integrated circuit.
    Type: Grant
    Filed: March 18, 1996
    Date of Patent: October 20, 1998
    Assignees: Matsushita Electric Industrial Co., Ltd., Panasonic Technologies, Inc.
    Inventor: Benjamin Kerr Reaves
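
The idea in the abstract above, detecting speech from the variance of a smoothed energy track rather than from its absolute level, can be sketched as follows. The input is assumed to be per-frame band-limited energy; the exponential smoother, the sliding-window variance, the single threshold, and the function names are illustrative assumptions rather than the device's actual logic.

```python
def smooth(values, alpha=0.5):
    """Exponentially smooth a sequence (assumed smoother)."""
    if not values:
        return []
    out, s = [], values[0]
    for v in values:
        s = alpha * v + (1.0 - alpha) * s
        out.append(s)
    return out

def sliding_variance(values, window):
    """Population variance over the last `window` values at each index."""
    out = []
    for i in range(len(values)):
        w = values[max(0, i - window + 1): i + 1]
        m = sum(w) / len(w)
        out.append(sum((v - m) ** 2 for v in w) / len(w))
    return out

def find_endpoints(energies, window, threshold, alpha=0.5):
    """Return (first, last) frame index whose variance exceeds the threshold."""
    var = sliding_variance(smooth(energies, alpha), window)
    active = [i for i, v in enumerate(var) if v > threshold]
    return (active[0], active[-1]) if active else None

energies = [1.0] * 5 + [10.0, 1.0, 10.0, 1.0] + [1.0] * 5
endpoints = find_endpoints(energies, window=3, threshold=0.5)
```

A steady background of any loudness gives zero variance, which is why this kind of measure depends less on the absolute signal-to-noise ratio than a fixed energy threshold does.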
  • Patent number: 5806028
    Abstract: A method and device for determining quality of speech. The speech to be evaluated is listened to by a person who reproduces the speech. The ends of the vowel sounds in the produced and reproduced speech, respectively, are determined. The difference between the ends of the vowel sounds is registered. From the obtained time differences an average value is determined. The average value indicates the quality of the produced speech. The invention can be used for evaluation of different speech sources.
    Type: Grant
    Filed: February 14, 1996
    Date of Patent: September 8, 1998
    Assignee: Telia AB
    Inventor: Bertil Lyberg
  • Patent number: 5799275
    Abstract: A speech recognition system automatically designates a scope of a partial reference pattern. Plural reference patterns, each of which ends in each of composing frames and starts from a preceding frame, are supposed and cumulative distances at every frame are calculated. A partial reference pattern that has a minimal distance value as compared with all other partial reference patterns is taken as a partial input speech recognizing result.
    Type: Grant
    Filed: June 18, 1996
    Date of Patent: August 25, 1998
    Assignees: The Japan Iron and Steel Federation, Sharp Kabushiki Kaisha, Real World Computing Partnership
    Inventors: Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka
  • Patent number: 5794195
    Abstract: During speech recognition of words, a precise and strong detection of start/end points of the words must be ensured, even in very noisy surroundings. Use of a feature with noise-resistant properties is shown wherein for a feature vector, a function of the signal energy is formed as the first feature and a function of the quadratic difference of an LPC (Linear-Predictive-Coding) cepstrum coefficient as a second feature. A check quantity or a maximum function of a distribution function is calculated, which detects the start/end points by comparison with a threshold.
    Type: Grant
    Filed: May 12, 1997
    Date of Patent: August 11, 1998
    Assignee: Alcatel N.V.
    Inventors: Thomas Hormann, Gregor Rozinaj