Endpoint Detection Patents (Class 704/253)
  • Patent number: 7680662
    Abstract: A speech recognition system (105) includes an acoustic front end (115) and a processing unit (125). The acoustic front end (115) receives frames of acoustic data and determines cepstral coefficients for each of the received frames. The processing unit (125) determines a number of peaks in the cepstral coefficients for each of the received frames of acoustic data and compares the peaks in the cepstral coefficients of a first one of the received frames with the peaks in the cepstral coefficients of at least a second one of the received frames. The processing unit (125) then segments the received frames of acoustic data based on the comparison.
    Type: Grant
    Filed: May 13, 2005
    Date of Patent: March 16, 2010
    Assignees: Verizon Corporate Services Group Inc., BBN Technologies Corp.
    Inventors: Chang-Qing Shu, Han Shu
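    Illustrative sketch: the abstract above describes segmenting frames by comparing cepstral peak counts across frames. Below is a minimal Python/NumPy sketch of that general flow; the frame length, hop, peak criterion, and jump threshold are illustrative assumptions, not values from the patent.
      import numpy as np

      def real_cepstrum(frame):
          # Real cepstrum of one frame: inverse FFT of the log magnitude spectrum.
          spectrum = np.abs(np.fft.rfft(np.asarray(frame, float))) + 1e-10
          return np.fft.irfft(np.log(spectrum))

      def count_peaks(cepstrum, min_height=0.0):
          # Count local maxima in the cepstral coefficients (assumed peak criterion).
          c = cepstrum
          peaks = (c[1:-1] > c[:-2]) & (c[1:-1] > c[2:]) & (c[1:-1] > min_height)
          return int(np.sum(peaks))

      def segment_by_cepstral_peaks(signal, frame_len=400, hop=160, jump=3):
          # Mark a segment boundary wherever the peak count changes sharply
          # between neighbouring frames (illustrative comparison rule).
          boundaries, prev_peaks = [], None
          for start in range(0, len(signal) - frame_len, hop):
              n_peaks = count_peaks(real_cepstrum(signal[start:start + frame_len]))
              if prev_peaks is not None and abs(n_peaks - prev_peaks) >= jump:
                  boundaries.append(start)
              prev_peaks = n_peaks
          return boundaries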
  • Patent number: 7672846
    Abstract: A voice recognition system and a voice processing system in which a self-repair utterance can be inputted and recognized accurately, as in a conversation in which a human user makes a self-repair utterance. A signal processing unit converts speech voice data into a feature, a voice section detecting unit detects voice sections in the speech voice data, and a priority determining unit selects a voice section that includes a self-repair utterance from among the voice sections according to a priority criterion, without using any result of recognizing a speech vocabulary sequence. Priority criteria can include the length of the voice section, signal-to-noise ratio, chronological order of the voice section, and speech speed. A decoder calculates a matching score with a recognition vocabulary using the feature of the voice section and an acoustic model.
    Type: Grant
    Filed: January 4, 2006
    Date of Patent: March 2, 2010
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Washio, Shouji Harada
  • Publication number: 20100030559
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Application
    Filed: June 25, 2009
    Publication date: February 4, 2010
    Applicant: MINDSPEED TECHNOLOGIES, INC.
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
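    Illustrative sketch: a minimal Python/NumPy rendering of the endpointer logic in the abstract above: derive background energy and an average cepstral distance from a first portion, then classify a later portion by contrasting its energy and distance against that background. The frame features, distance measure, and ratio thresholds are assumptions for illustration only.
      import numpy as np

      def frame_energy(frame):
          return float(np.mean(np.asarray(frame, float) ** 2))

      def cepstral_features(frame, n_coeffs=12):
          # Low-order real cepstral coefficients of one frame (assumed feature set).
          spec = np.abs(np.fft.rfft(np.asarray(frame, float))) + 1e-10
          return np.fft.irfft(np.log(spec))[1:n_coeffs + 1]

      def classify_portion(background_frames, test_frames,
                           energy_ratio=3.0, distance_ratio=1.5):
          # Label the test portion as speech or non-speech by comparing its
          # energy and cepstral distance with the background portion.
          bg_energy = np.mean([frame_energy(f) for f in background_frames])
          bg_ceps = np.mean([cepstral_features(f) for f in background_frames], axis=0)
          bg_dist = np.mean([np.linalg.norm(cepstral_features(f) - bg_ceps)
                             for f in background_frames])
          test_energy = np.mean([frame_energy(f) for f in test_frames])
          test_dist = np.mean([np.linalg.norm(cepstral_features(f) - bg_ceps)
                               for f in test_frames])
          is_speech = (test_energy > energy_ratio * bg_energy
                       and test_dist > distance_ratio * bg_dist)
          return 'speech' if is_speech else 'non-speech'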
  • Patent number: 7653541
    Abstract: A speech processing device, speech processing method, storage medium, and program that decrease deletion errors and increase the speech recognition rate. A network of words and syllables is generated, and the network has two kinds of paths: paths that do not contain a particular syllable and paths that contain the syllable at a position corresponding to a boundary between words. An optimal sub-word sequence on the network is thus selected for an input utterance.
    Type: Grant
    Filed: November 12, 2003
    Date of Patent: January 26, 2010
    Assignee: Sony Corporation
    Inventor: Hiroaki Ogawa
  • Patent number: 7587320
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: August 1, 2007
    Date of Patent: September 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7529670
    Abstract: A speech recognition system is provided that, in one embodiment, includes an input 104 operable to receive voice utterances from a user, a speaker monitoring agent 132 operable to determine a pulmonary state of the user as a function of time, and a frame analyzer 120 operable to (i) determine a respective pulmonary state of the user at an approximate time when each of the voice utterances was made and (ii) process each of the voice utterances in a manner dependent upon the respective pulmonary state.
    Type: Grant
    Filed: May 16, 2005
    Date of Patent: May 5, 2009
    Assignee: Avaya Inc.
    Inventor: Paul Roller Michaelis
  • Patent number: 7523034
    Abstract: Methods and arrangements for enhancing speech recognition in noisy environments, via providing at least one initial Compound Gaussian Mixture model, applying an adaptation algorithm to at least one item associated with speech enrollment data and to the at least one initial Compound Gaussian Mixture model to yield an intermediate output, and mathematically combining the at least one initial Compound Gaussian Mixture model with the intermediate output to yield an adapted Compound Gaussian Mixture model.
    Type: Grant
    Filed: December 13, 2002
    Date of Patent: April 21, 2009
    Assignee: International Business Machines Corporation
    Inventors: Sabine V. Deligne, Satyanarayana Dharanipragada
  • Patent number: 7507894
    Abstract: The processing load of playing back sound data having a loop part is reduced. A sound data encoding apparatus comprises a block dividing means that divides the sound data into blocks according to predetermined rules, and an encoding means that encodes the blocks in groups of a plurality of consecutive blocks. The block dividing means divides the sound data such that, when the encoded blocks are decoded to output decoded blocks, the loop end position lies nearer to the end of the decoded block that contains it than a predetermined position. Specifically, input-delay dummy data are added ahead of the sound data before the division. Loop information is also output together with the encoded data.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: March 24, 2009
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Hiromitsu Matsuura, Takehiro Tominaga
  • Publication number: 20090063150
    Abstract: Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries.
    Type: Application
    Filed: August 27, 2007
    Publication date: March 5, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tetsuya Nasukawa, Diwakar Punjani, Shourya Roy, L. Venkata Subramaniam, Hironori Takeuchi
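    Illustrative sketch: the abstract above describes learning head and tail n-grams, filtering those that are common mid-sentence, and marking boundaries around them. The Python sketch below uses unigrams by default; the filtering ratio and the '<s>' marker are assumptions for illustration.
      from collections import Counter

      def train_boundary_ngrams(sentences, n=1, mid_ratio=0.5):
          # Collect head/tail n-grams from training sentences and drop any that
          # also occur frequently in sentence-internal positions.
          heads, tails, mids = Counter(), Counter(), Counter()
          for sent in sentences:
              words = sent.split()
              if len(words) < 2 * n:
                  continue
              heads[tuple(words[:n])] += 1
              tails[tuple(words[-n:])] += 1
              for i in range(n, len(words) - n):
                  mids[tuple(words[i:i + n])] += 1
          head_set = {g for g, c in heads.items() if mids[g] <= mid_ratio * c}
          tail_set = {g for g, c in tails.items() if mids[g] <= mid_ratio * c}
          return head_set, tail_set

      def mark_boundaries(words, heads, tails, n=1):
          # Insert a '<s>' marker before every head n-gram and after every tail n-gram.
          out = []
          for i, w in enumerate(words):
              if tuple(words[i:i + n]) in heads and out and out[-1] != '<s>':
                  out.append('<s>')
              out.append(w)
              if i - n + 1 >= 0 and tuple(words[i - n + 1:i + 1]) in tails:
                  out.append('<s>')
          return out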
  • Patent number: 7493257
    Abstract: To handle portions of a recognized sentence that contain an error, the user is questioned about the content associated with those portions, and a result is obtained from the user's answer. A speech recognition unit extracts a speech feature from a speech signal input by the user and finds the phoneme nearest to the speech feature to recognize a word. A recognition error determination unit computes a sentence confidence based on the confidence of the recognized words, examines the semantic structure of the recognized sentence, and determines, according to a predetermined criterion based on both the sentence confidence and the result of the semantic examination, whether an error exists in the recognized sentence. A meta-dialogue generation unit generates a question asking the user for additional information based on the content of the portion where the error exists and the type of the error.
    Type: Grant
    Filed: August 5, 2004
    Date of Patent: February 17, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jung-eun Kim, Jae-won Lee
  • Publication number: 20090037176
    Abstract: A wordspotting system is applied to a speech source in a preliminary processing phase. The putative hits corresponding to queries (e.g., keywords, key phrases, or more complex queries that may include Boolean expressions and proximity operators) are used to control a speech recognizer. The control can include one or more of application of a time specification that is determined from the putative hits for selecting an interval of the speech source to which to apply the speech recognizer; application of a grammar specification determined from the putative hits that is used by the speech recognizer, and application of a specification of a lattice or pruning specification that is used by the recognizer to limit or guide the recognizer in recognition of the speech source.
    Type: Application
    Filed: August 1, 2008
    Publication date: February 5, 2009
    Applicant: Nexidia Inc.
    Inventor: Jon A. Arrowood
  • Patent number: 7478046
    Abstract: A speech recognition apparatus that reduces transmission time and costs. A terminal-side apparatus (100) includes a speech detection portion (101) for detecting a speech interval in the input data, a waveform compression portion (102) for compressing the waveform data in the detected speech interval, and a waveform transmission portion (103) for transmitting the compressed waveform data. A server-side apparatus (200) includes a waveform reception portion (201) for receiving the waveform data transmitted from the terminal-side apparatus, a waveform decompression portion (202) for decompressing the received waveform data, an analyzing portion (203) for analyzing the decompressed waveform data, and a recognizing portion (204) for performing recognition processing to produce a recognition result.
    Type: Grant
    Filed: June 20, 2002
    Date of Patent: January 13, 2009
    Assignee: NEC Corporation
    Inventors: Eiko Yamada, Hiroshi Hagane, Kazunaga Yoshida
  • Publication number: 20080262843
    Abstract: An apparatus and method for recognizing paraphrases of uttered phrases, such as place names. At least one keyword contained in a speech utterance is recognized. Then, the keyword(s) contained in the speech utterance are re-recognized using a phrase including the keyword(s). Based on both recognition results, it is determined whether a paraphrase could have been uttered. If a paraphrase could have been uttered, a phrase corresponding to the paraphrase is determined as a result of speech recognition of the speech utterance.
    Type: Application
    Filed: October 22, 2007
    Publication date: October 23, 2008
    Applicant: NISSAN MOTOR CO., LTD.
    Inventors: Keiko Katsuragawa, Minoru Tomikashi, Takeshi Ono, Daisuke Saitoh, Eiji Tozuka
  • Patent number: 7433819
    Abstract: A computer based method and related system, computer program product and device includes receiving audio input associated with a user reading a sequence of words and determining an approximate amount of time corresponding to an absence of input associated with an assessed word, after receiving audio input associated with a preceding word in the sequence of words, or since the start of the audio buffer or file. The method can also include displaying a visual indication or generating an audio intervention based on the determined amount of time corresponding to an absence of input.
    Type: Grant
    Filed: September 10, 2004
    Date of Patent: October 7, 2008
    Assignee: Scientific Learning Corporation
    Inventors: Marilyn Jager Adams, Valerie L. Beattie
  • Publication number: 20080177542
    Abstract: A voice recognition program makes a computer execute a start-determining-word recognizing step that checks whether a start determining word for sentence recognition has been input by voice, an end-determining-word recognizing step that checks whether an end determining word for the sentence recognition has been input by voice after the start determining word, and a sentence recognizing step that recognizes by voice the intermediate sentence between the start determining word and the end determining word once both words are judged to have been input in the two preceding steps.
    Type: Application
    Filed: March 11, 2005
    Publication date: July 24, 2008
    Applicant: GIFU SERVICE CORPORATION
    Inventor: Hideo Yamamoto
  • Publication number: 20080177543
    Abstract: Training wording data indicating the wording of each word in training text, training speech data indicating the speech characteristics of each word, and training boundary data indicating whether each word in the training speech is a boundary of a prosodic phrase are stored. After candidates for boundary data are input, a first likelihood that the prosodic-phrase boundaries of the words in the input text agree with one of the input boundary data candidates is calculated, and a second likelihood is calculated. Thereafter, the boundary data candidate maximizing the product of the first and second likelihoods is searched out from among the input candidates, and the result of the search is output.
    Type: Application
    Filed: November 27, 2007
    Publication date: July 24, 2008
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana, Gakuto Kurata
  • Patent number: 7366667
    Abstract: A method and device for the recognition of words and pauses in a voice signal. Words (Wi) spoken in a row and the pauses (Ti) between them are combined into a word group as soon as one of the pauses (Ti) exceeds a limit value (TG). Stored references (Rj) are allocated to the voice signal of the word group, and the result of the allocation is indicated after the limit value (TG) has been exceeded. To this end, parameters corresponding to the moments of the transitions between voice and non-voice ranges are determined from the voice signal, and the limit value (TG) is then changed in dependence on these parameters.
    Type: Grant
    Filed: December 21, 2001
    Date of Patent: April 29, 2008
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Stefan Dobler
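    Illustrative sketch: a minimal Python sketch of the grouping rule in the abstract above: words and their following pauses are accumulated into a word group until a pause exceeds the limit TG. The adaptation of TG from the observed pauses is an assumed rule, not the parameter update defined in the patent.
      def group_words(words_with_pauses, tg_init=0.8, alpha=0.3):
          # words_with_pauses: list of (word, following_pause_seconds).
          tg = tg_init
          groups, current, seen_pauses = [], [], []
          for word, pause in words_with_pauses:
              current.append(word)
              seen_pauses.append(pause)
              # adapt the limit value toward a multiple of the mean observed pause
              tg = (1 - alpha) * tg + alpha * 2.0 * (sum(seen_pauses) / len(seen_pauses))
              if pause > tg:
                  groups.append(current)    # close the word group on a long pause
                  current = []
          if current:
              groups.append(current)
          return groups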
  • Patent number: 7319960
    Abstract: A speech recognition system uses a phoneme counter to determine the length of a word to be recognized. The result is used to split a lexicon into one or more sub-lexicons containing only words which have the same or similar length to that of the word to be recognized, so restricting the search space significantly. In another aspect, a phoneme counter is used to estimate the number of phonemes in a word so that a transition bias can be calculated. This bias is applied to the transition probabilities between phoneme models in an HNN based recognizer to improve recognition performance for relatively short or long words.
    Type: Grant
    Filed: December 19, 2001
    Date of Patent: January 15, 2008
    Assignee: Nokia Corporation
    Inventors: Soren Riis, Konstantinos Koumpis
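    Illustrative sketch: the search-space restriction described above can be pictured as bucketing the lexicon by phoneme count and then keeping only words near the estimated length. A short Python sketch, with the tolerance as an assumed parameter:
      from collections import defaultdict

      def build_sublexicons(lexicon):
          # lexicon: dict mapping word -> phoneme sequence; bucket words by length.
          buckets = defaultdict(list)
          for word, phones in lexicon.items():
              buckets[len(phones)].append(word)
          return buckets

      def candidate_words(buckets, estimated_length, tolerance=1):
          # Keep only words whose phoneme count is within +/- tolerance of the
          # phoneme counter's estimate, shrinking the search space.
          candidates = []
          for length in range(estimated_length - tolerance, estimated_length + tolerance + 1):
              candidates.extend(buckets.get(length, []))
          return candidates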
  • Patent number: 7319956
    Abstract: A speech reference enrollment method involves requesting a user speak a word; detecting a first utterance; requesting the user speak the word; detecting a second utterance; determining a first similarity between the first utterance and the second utterance; when the first similarity is less than a predetermined similarity, requesting the user speak the word; detecting a third utterance; determining a second similarity between the first utterance and the third utterance; and when the second similarity is greater than or equal to the predetermined similarity, creating a reference.
    Type: Grant
    Filed: March 23, 2001
    Date of Patent: January 15, 2008
    Assignee: SBC Properties, L.P.
    Inventor: Robert Wesley Bossemeyer, Jr.
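    Illustrative sketch: the enrollment flow above maps directly onto a small control loop. In the Python sketch below, prompt_and_capture is an assumed callback that prompts the user and returns a feature vector, and cosine similarity with a fixed threshold stands in for whatever similarity measure the patent uses.
      import numpy as np

      def similarity(a, b):
          # Illustrative similarity: cosine similarity of fixed-length feature vectors.
          a, b = np.asarray(a, float), np.asarray(b, float)
          return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10))

      def enroll(prompt_and_capture, threshold=0.8):
          # Returns a reference template, or None if no pair of utterances matched.
          first = prompt_and_capture()
          second = prompt_and_capture()
          if similarity(first, second) >= threshold:
              return np.mean([first, second], axis=0)   # create the reference
          third = prompt_and_capture()                  # ask the user once more
          if similarity(first, third) >= threshold:
              return np.mean([first, third], axis=0)
          return None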
  • Patent number: 7313526
    Abstract: The present invention relates to speech recognition using selectable recognition modes. This includes innovations such as: large vocabulary speech recognition programming that supplies recognized words to an external program as they are recognized, and allows a user to select between large vocabulary recognition of an utterance with and without language context from the prior utterance independently of the state of the external program; allowing a user to select between continuous and discrete speech recognition that use substantially the same vocabulary; allowing a user to select between continuous and discrete large-vocabulary speech recognition modes; allowing a user to select between at least two different alphabetic entry speech recognition modes; and allowing a user to select from among four or more of the following recognition modes when creating text: a large-vocabulary mode, an alphabetic entry mode, a number entry mode, and a punctuation entry mode.
    Type: Grant
    Filed: September 24, 2004
    Date of Patent: December 25, 2007
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Daniel L. Roth, Jordan R. Cohen, David F. Johnston, Manfred G. Grabherr
  • Patent number: 7295752
    Abstract: One aspect of the invention is directed to a system and method for video cataloging. The video is cataloged according to predefined or user definable metadata. The metadata is used to index and then retrieve encoded video.
    Type: Grant
    Filed: February 5, 2002
    Date of Patent: November 13, 2007
    Assignee: Virage, Inc.
    Inventors: Ramesh Jain, Charles Fuller, Mojgan Monika Gorkani, Bradley Horowitz, Richard D. Humphrey, Michael J. Portuesi, Chiao-fe Shu, Arun Hampapur, Amarnath Gupta, Jeffrey Bach
  • Patent number: 7289961
    Abstract: Data are embedded in an audio signal for watermarking, steganography, or other purposes. The audio signal is divided into time frames. In each time frame, the relative phases of one or more frequency bands are shifted to represent the data to be embedded. In one embodiment, two frequency bands are selected according to a pseudo-random sequence, and their relative phase is shifted. In another embodiment, the phases of one or more overtones relative to the fundamental tone are quantized.
    Type: Grant
    Filed: June 18, 2004
    Date of Patent: October 30, 2007
    Assignee: University of Rochester
    Inventors: Mark F. Bocko, Zeljko Ignjatovic
  • Patent number: 7277853
    Abstract: According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
    Type: Grant
    Filed: September 5, 2001
    Date of Patent: October 2, 2007
    Assignee: Mindspeed Technologies, Inc.
    Inventors: Sahar E. Bou-Ghazale, Ayman O. Asadi, Khaled Assaleh
  • Patent number: 7266497
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: January 14, 2003
    Date of Patent: September 4, 2007
    Assignee: AT&T Corp.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Publication number: 20070192103
    Abstract: The invention provides a conversational speech analyzer that determines whether utterances in a meeting attract interest or concern. Frames are computed from sound signals obtained from a microphone and a sensor, sensor signals are cut out for each frame, and the correlation between the sensor signals in each frame is used to calculate an interest level that represents the audience's concern with the utterances, from which the meeting is analyzed.
    Type: Application
    Filed: February 14, 2007
    Publication date: August 16, 2007
    Inventors: Nobuo Sato, Yasunari Obuchi
  • Patent number: 7240002
    Abstract: The present invention provides a speech recognition apparatus having high speech recognition performance and capable of performing speech recognition highly efficiently. A matching unit 14 calculates the scores of words selected by a preliminary word selector 13 and determines a candidate for the speech recognition result on the basis of the calculated scores. A control unit 11 produces word connection relationships among the words included in a word series employed as a candidate for the speech recognition result and stores them in a word connection information storage unit 16. A reevaluation unit 15 corrects the word connection relationships one by one. On the basis of the corrected word connection relationships, the control unit 11 determines the speech recognition result. A word connection managing unit 21 limits the times at which a boundary between words represented by the word connection relationships is allowed to be located.
    Type: Grant
    Filed: November 7, 2001
    Date of Patent: July 3, 2007
    Assignee: Sony Corporation
    Inventors: Katsuki Minamino, Yasuharu Asano, Hiroaki Ogawa, Helmut Lucke
  • Patent number: 7231346
    Abstract: A speech section detection apparatus capable of reliably detecting a speech section even for a word containing a glottal stop sound or for a word containing a succession of “s” column sounds or “h” column sounds (sounds in the third column or the sixth column in the Japanese Goju-on Zu syllabary table). A speech signal detected by a microphone is amplified by a line amplifier and converted by an analog/digital converter into a digital signal, which is then stored in a memory. The stored speech signal is fetched into a pitch detector, where the speech pitch is extracted by processing the speech signal in the time domain. A gate signal generator controls the gate signal based on the speech pitch, and a speech section signal generator controls a speech section signal based on the gate signal. A word can be extracted by segmenting the speech signal stored in the memory in accordance with the speech section signal.
    Type: Grant
    Filed: March 26, 2003
    Date of Patent: June 12, 2007
    Assignee: Fujitsu Ten Limited
    Inventors: Toshitaka Yamato, Hideki Kitao, Shinichi Iwamoto, Osamu Iwata, Masataka Nakamura, Yoshinao Oomoto
  • Patent number: 7225130
    Abstract: The present invention relates to: speech recognition using selectable recognition modes; using choice lists in large-vocabulary speech recognition; enabling users to select word transformations; speech recognition that automatically turns recognition off in one or more specified ways; phone key control of large-vocabulary speech recognition; speech recognition using phone key alphabetic filtering and spelling; speech recognition that enables a user to perform re-utterance recognition; the combination of speech recognition and text-to-speech (TTS) generation; the combination of speech recognition with handwriting and/or character recognition; and the combination of large-vocabulary speech recognition with audio recording and playback.
    Type: Grant
    Filed: September 6, 2002
    Date of Patent: May 29, 2007
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Daniel L. Roth, Jordan R. Cohen, David F. Johnston, Manfred G. Grabherr
  • Patent number: 7194409
    Abstract: A method and system for allowing a user to interface to an interactive voice response system via natural language commands. The system plays a prompt that initiates user interaction. In certain embodiments, the system detects initial user speech, wherein the initial user speech begins during the prompt or during a silence after the prompt. Then, the system determines whether the user speech restarts (second user speech) within a predetermined time period, wherein the predetermined time period is dependent upon whether the initial user speech began during the prompt or during the silence. If the user speech does restart, then the system uses the second user speech for recognition purposes. If the user speech does not restart, then the system uses the initial user speech for recognition purposes.
    Type: Grant
    Filed: November 30, 2001
    Date of Patent: March 20, 2007
    Inventors: Bruce Balentine, Rex Stringham, Ralph Melaragno, Justin Munroe
  • Patent number: 7177810
    Abstract: A method and apparatus for finding endpoints in speech by utilizing information contained in speech prosody. Prosody denotes the way speakers modulate the timing, pitch and loudness of phones, words, and phrases to convey certain aspects of meaning; informally, prosody includes what is perceived as the “rhythm” and “melody” of speech. Because speakers use prosody to convey units of speech to listeners, the method and apparatus performs endpoint detection by extracting and interpreting the relevant prosodic properties of speech.
    Type: Grant
    Filed: April 10, 2001
    Date of Patent: February 13, 2007
    Assignee: SRI International
    Inventors: Elizabeth Shriberg, Harry Bratt, Mustafa K. Sonmez
  • Patent number: 7146318
    Abstract: A method for detecting pauses in speech signals is disclosed in which the frequency spectrum is divided into two or more sub-bands. Samples of the signals in the sub-bands are stored at intervals, the energy levels of the sub-bands are determined from the stored samples, a power threshold value (thr) is determined, and the energy levels of the sub-bands are compared with the power threshold value (thr). A sub-band minimum and a detection time limit are set so that, in a noisy situation, a speech pause can be verified by checking whether each detected pause persists for the duration of the detection time limit and whether the pause is detected in at least the minimum number of sub-bands.
    Type: Grant
    Filed: May 6, 2004
    Date of Patent: December 5, 2006
    Assignee: Nokia Corporation
    Inventors: Kari Laurila, Juha Häkkinen, Ramalingam Hariharan
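    Illustrative sketch: a Python/NumPy sketch of the sub-band pause check described above: per-frame sub-band energies are compared against a power threshold, and a pause is verified only if enough sub-bands stay quiet for the whole detection time limit. The band count, thresholds, and frame counts are assumptions.
      import numpy as np

      def subband_energies(frame, n_bands=4):
          # Split the power spectrum of one frame into n_bands and sum each band.
          spec = np.abs(np.fft.rfft(np.asarray(frame, float))) ** 2
          return np.array([band.sum() for band in np.array_split(spec, n_bands)])

      def detect_pause(frames, thr, min_bands=3, min_frames=10):
          # Declare a pause once energy stays below thr in at least min_bands
          # sub-bands for min_frames consecutive frames (the detection time limit).
          run = 0
          for frame in frames:
              quiet_bands = int(np.sum(subband_energies(frame) < thr))
              run = run + 1 if quiet_bands >= min_bands else 0
              if run >= min_frames:
                  return True
          return False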
  • Patent number: 6996525
    Abstract: A method for selecting a speech recognizer from a number of speech recognizers in a speech recognition system. The speech recognition system receives an audio stream from an application and derives enabling information. The speech recognition system then enables at least some of the speech recognizers and receives their results. It derives selection information and uses it to select the best speech recognizer and its results and returns those results back to the application.
    Type: Grant
    Filed: June 15, 2001
    Date of Patent: February 7, 2006
    Assignee: Intel Corporation
    Inventors: Steven M. Bennett, Andrew V. Anderson
  • Patent number: 6993481
    Abstract: According to the invention, a method for detecting speech activity in a signal is disclosed. In one step, a plurality of features is extracted from the signal. An active speech probability density function (PDF) of the plurality of features is modeled, and an inactive speech PDF of the plurality of features is modeled. The active and inactive speech PDFs are adapted to respond to changes in the signal over time. The signal is given a probability-based classification based, at least in part, on the plurality of features. Speech in the signal is distinguished based, at least in part, upon the probability-based classification.
    Type: Grant
    Filed: December 4, 2001
    Date of Patent: January 31, 2006
    Assignee: Global IP Sound AB
    Inventors: Jan K. Skoglund, Jan T. Linden
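    Illustrative sketch: the classification step above can be pictured as a likelihood comparison between an active-speech model and an inactive-speech model, with the winning model adapted toward each new frame. The Python sketch below uses single diagonal Gaussians as stand-ins for the PDFs and an assumed adaptation rate; it is not the patent's estimator.
      import numpy as np

      class TwoModelVAD:
          def __init__(self, speech_mean, noise_mean, rate=0.05):
              # Two diagonal-Gaussian models, typically seeded from labelled data.
              self.mu = {True: np.asarray(speech_mean, float),
                         False: np.asarray(noise_mean, float)}
              self.var = {True: np.ones_like(self.mu[True]),
                          False: np.ones_like(self.mu[False])}
              self.rate = rate

          def _log_lik(self, x, speech):
              mu, var = self.mu[speech], self.var[speech]
              return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var))

          def classify(self, x):
              x = np.asarray(x, float)
              is_speech = self._log_lik(x, True) > self._log_lik(x, False)
              r = self.rate
              # adapt the winning model toward the new observation
              self.mu[is_speech] = (1 - r) * self.mu[is_speech] + r * x
              self.var[is_speech] = ((1 - r) * self.var[is_speech]
                                     + r * (x - self.mu[is_speech]) ** 2)
              return bool(is_speech)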
  • Patent number: 6947891
    Abstract: A speech recognition system that is robust to external noise and suitable for real-world use includes an A/D converter that converts analog voice signals into digital signals. An FIR filtering section employs powers-of-two conversion to filter the digital signals converted by the A/D converter into a number of channels. A characteristic extraction section immediately extracts speech characteristics with strong noise resistance from the output signals of the FIR filtering section without using additional memory. A word boundary detection section determines the start-point and end-point information of a voice signal on the basis of the characteristics extracted by the characteristic extraction section.
    Type: Grant
    Filed: January 22, 2001
    Date of Patent: September 20, 2005
    Assignee: Korea Advanced Institute of Science & Technology
    Inventors: Soo Young Lee, Chang Min Kim
  • Patent number: 6947892
    Abstract: A method and arrangement for speech recognition wherein a volume distance is determined between recognized words and the pauses lying between them. When the volume distance of a word is lower than a predetermined threshold, the word is evaluated as being incorrectly recognized, such that errors caused by unwanted noises are avoided.
    Type: Grant
    Filed: August 18, 2000
    Date of Patent: September 20, 2005
    Assignee: Siemens Aktiengesellschaft
    Inventors: Josef Bauer, Jochen Junkawitsch
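    Illustrative sketch: the volume-distance test above reduces to comparing the level of a recognized word with the level of its neighbouring pauses. A short Python/NumPy sketch, with a dB measure and threshold chosen only for illustration:
      import numpy as np

      def db_level(samples):
          return 10.0 * np.log10(np.mean(np.asarray(samples, float) ** 2) + 1e-12)

      def accept_word(word_samples, pause_before, pause_after, min_distance_db=10.0):
          # Keep a recognized word only if it is clearly louder than the
          # surrounding pauses; otherwise treat it as a noise-induced error.
          pause_level = max(db_level(pause_before), db_level(pause_after))
          volume_distance = db_level(word_samples) - pause_level
          return volume_distance >= min_distance_db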
  • Patent number: 6934685
    Abstract: The present invention provides a speech recognition device for toys comprising: a storage means for measuring the duration of a combination of two or more continuous words or expressions and the duration of the pause or pauses between them, and storing the measured values in advance; a control means for measuring the duration of a word or expression spoken by a speaker, comparing the measured value with the value stored in the storage means, and recognizing the speaker's word or expression when the result of the comparison falls within a predetermined tolerance; and an output means for outputting the result of the recognition.
    Type: Grant
    Filed: April 21, 2000
    Date of Patent: August 23, 2005
    Assignee: Toytec Corporation
    Inventor: Takashi Ichikawa
  • Patent number: 6928407
    Abstract: A system and associated method automatically discover salient segments in a speech transcript and focus on the segmentation of an audio/video source into topically cohesive segments based on Automatic Speech Recognition (ASR) transcriptions. Word n-grams are extracted from the speech transcript using a three-phase segmentation algorithm that combines boundary-based and content-based methods: a boundary-based method, a rate-of-arrival-of-features method, and a content-based method. In the first two segmentation passes, the temporal proximity and the rate of arrival of features are analyzed to compute an initial segmentation. In the third segmentation pass, changes in the set of content-bearing words used by adjacent segments are detected in order to validate the initial segments for merging and thereby prevent over-segmentation.
    Type: Grant
    Filed: March 29, 2002
    Date of Patent: August 9, 2005
    Assignee: International Business Machines Corporation
    Inventors: Dulce Beatriz Ponceleon, Savitha Srinivasan
  • Patent number: 6928404
    Abstract: Systems and methods are provided for generating a language component vocabulary VC for a speech recognition system having a language vocabulary V of a plurality of word forms. One method includes partitioning the language vocabulary V into subsets of word forms based on the frequencies of occurrence of the respective word forms; in at least one of the subsets, splitting word forms having frequencies less than a threshold to generate word form components; and generating a language component vocabulary VC including word forms and word form components. The resulting language component vocabulary, which includes word forms and word components, is used to generate a language model that can be efficiently implemented for real-time automatic speech recognition of languages with large vocabularies.
    Type: Grant
    Filed: March 17, 1999
    Date of Patent: August 9, 2005
    Assignee: International Business Machines Corporation
    Inventors: Ponani Gopalakrishnan, Dimitri Kanevsky, Michael Daniel Monkowski, Jan Sedivy
  • Patent number: 6892190
    Abstract: Disclosed herein is a machine translation system which can automatically switch from one or more dictionaries to more appropriate dictionaries for translating a first language to a second language. As a dictionary constitution, a base dictionary and domain dictionaries can be provided. The domain dictionary can be divided into a compound word dictionary that includes triggers for switching dictionaries and a compound word dictionary that does not include triggers for switching the dictionaries. When a compound word included in the compound word dictionary that includes triggers for switching the dictionaries is detected during source text analysis, a priority of the concerned domain dictionary can be set higher than that of the base dictionary. Moreover, the domain dictionary can be subdivided into a main domain dictionary and a sub-domain dictionary.
    Type: Grant
    Filed: July 19, 2001
    Date of Patent: May 10, 2005
    Assignee: International Business Machines Corporation
    Inventors: Hiromi Hatori, Tomohiro Miyahira
  • Patent number: 6889187
    Abstract: A method and apparatus for detecting and transmitting voice signals in a packet voice network system. The method and apparatus make use of a voice activity detection (VAD) unit at a transmitter, for determining if an input signal contains active audio information or passive audio information, where the input signal includes a plurality of frames. For one or more frames of the input signal containing active audio information, the VAD computes a hangover time period. This computation includes determining whether the hangover time period has a fixed duration or a variable duration on the basis of characteristics of the active audio information contained in the one or more frames. When the VAD detects a frame containing passive audio information subsequent to the one or more frames containing active audio information, the input signal is suppressed after the expiry of the computed hangover time period from the detection of the passive audio information.
    Type: Grant
    Filed: December 26, 2001
    Date of Patent: May 3, 2005
    Assignee: Nortel Networks Limited
    Inventor: Shude Zhang
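    Illustrative sketch: the hangover behaviour above can be sketched as a per-frame transmit/suppress decision that keeps the channel open for a hangover period after active frames. The fixed-versus-variable rule below (fixed hangover for longer talk spurts, proportional for short bursts) and all durations are assumptions, not the computation claimed in the patent.
      def hangover_frames(active_run_length, short_burst=5, fixed=8, variable_scale=2):
          # Assumed rule: long spurts get a fixed hangover, short bursts a
          # hangover proportional to their length so they are not clipped.
          if active_run_length >= short_burst:
              return fixed
          return variable_scale * active_run_length

      def transmit_decisions(frame_is_active):
          # Yield True (transmit) or False (suppress) per frame, applying the
          # hangover after each active region before suppression begins.
          run, hold = 0, 0
          for active in frame_is_active:
              if active:
                  run += 1
                  hold = hangover_frames(run)
                  yield True
              elif hold > 0:
                  hold -= 1
                  yield True
              else:
                  run = 0
                  yield False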
  • Patent number: 6873953
    Abstract: A method and apparatus are provided for performing prosody based endpoint detection of speech in a speech recognition system. Input speech represents an utterance, which has an intonation pattern. An end-of-utterance condition is identified based on prosodic parameters of the utterance, such as the intonation pattern and the duration of the final syllable of the utterance, as well as non-prosodic parameters, such as the log energy of the speech.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: March 29, 2005
    Assignee: Nuance Communications
    Inventor: Matthew Lennig
  • Patent number: 6862713
    Abstract: A method for presenting to an end-user the intermediate matching results of a keyword search in an indexed list of information. The method comprises the steps of: coupling to a search engine a graphical user interface for accepting keyword search terms for searching the indexed list of information with the search engine; receiving one or more keyword search terms with one or more separation characters therebetween; performing a keyword search with the keyword search terms received whenever a separation character is received; and presenting to the end-user the number of documents matching the keyword search terms, together with a graphical menu item on a display. In accordance with another embodiment of the present invention, an information processing system and a computer readable storage medium carry out the above method.
    Type: Grant
    Filed: August 31, 1999
    Date of Patent: March 1, 2005
    Assignee: International Business Machines Corporation
    Inventors: Reiner Kraft, W. Scott Spangler
  • Publication number: 20040176953
    Abstract: There is disclosed an interactive voice response system for prompting a user with feedback during speech recognition. A user who speaks too slowly or too quickly may speak even more slowly or quickly in response to an error in speech recognition. The present system aims to give the user specific feedback on the speed of speaking. The method can include: acquiring an utterance from a user; recognising a string of words from the utterance; acquiring for each word the ratio of actual duration of delivery to ideal duration; calculating an average ratio for all the words wherein the average ratio is an indication of the speed of the delivery of the utterance; and prompting the user as to the speed of delivery of the utterance according to the average ratio.
    Type: Application
    Filed: September 12, 2003
    Publication date: September 9, 2004
    Applicant: International Business Machines Corporation
    Inventors: Wendy-Ann Coyle, Stephen James Haskey
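    Illustrative sketch: the averaging step above is simple arithmetic: average the per-word ratio of actual to ideal duration and map the result to a prompt. In the Python sketch below, the slow/fast thresholds and the prompt wording are assumptions.
      def speed_feedback(word_durations, slow_thr=1.3, fast_thr=0.7):
          # word_durations: list of (actual_duration_s, ideal_duration_s) per word.
          ratios = [actual / ideal for actual, ideal in word_durations if ideal > 0]
          if not ratios:
              return "No words were recognised."
          avg_ratio = sum(ratios) / len(ratios)
          if avg_ratio > slow_thr:
              return "You are speaking too slowly; please speak a little faster."
          if avg_ratio < fast_thr:
              return "You are speaking too quickly; please speak a little slower."
          return "Your speaking rate is fine."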
  • Patent number: 6782363
    Abstract: A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.
    Type: Grant
    Filed: May 4, 2001
    Date of Patent: August 24, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Chin-Hui Lee, Qi P. Li, Jinsong Zheng, Qiru Zhou
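    Illustrative sketch: the general shape described above is a smoothing filter over frame energy feeding a small state machine with three states (silence, possible speech, in speech). The one-pole filter, thresholds, and confirmation count in this Python sketch are assumptions and not the filter design evaluated in the patent.
      import numpy as np

      def smoothed_energy(frames, alpha=0.2):
          # One-pole smoothing filter over per-frame log energy (illustrative filter).
          out, acc = [], None
          for f in frames:
              e = np.log(np.mean(np.asarray(f, float) ** 2) + 1e-12)
              acc = e if acc is None else (1 - alpha) * acc + alpha * e
              out.append(acc)
          return out

      def detect_endpoints(frames, on_thr=-8.0, off_thr=-10.0, min_on=3):
          # Three-state machine: SILENCE -> MAYBE -> SPEECH; returns the
          # (start, end) frame indices of the first speech segment, or None.
          state, count, start = 'SILENCE', 0, None
          for i, e in enumerate(smoothed_energy(frames)):
              if state == 'SILENCE' and e > on_thr:
                  state, count, start = 'MAYBE', 1, i
              elif state == 'MAYBE':
                  count = count + 1 if e > on_thr else 0
                  if count == 0:
                      state = 'SILENCE'
                  elif count >= min_on:
                      state = 'SPEECH'
              elif state == 'SPEECH' and e < off_thr:
                  return (start, i)
          return (start, len(frames)) if state == 'SPEECH' else None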
  • Patent number: 6741962
    Abstract: A speech recognition system for recognizing an input voice of a narrow frequency band. The speech recognition system includes: a frequency band converting unit for converting the input voice of the narrow frequency band into a pseudo voice of a wide frequency band which covers an entirety of the narrow frequency band and which is wider than the narrow frequency band.
    Type: Grant
    Filed: March 7, 2002
    Date of Patent: May 25, 2004
    Assignee: NEC Corporation
    Inventor: Kenichi Iso
  • Patent number: 6718302
    Abstract: A method for utilizing validity constraints in a speech endpoint detector comprises a validity manager that may utilize a pulse width module to validate utterances that include a plurality of energy pulses during a certain time period. The validity manager also may utilize a minimum power module to ensure that speech energy below a pre-determined level is not classified as a valid utterance. In addition, the validity manager may use a duration module to ensure that valid utterances fall within a specified duration. Finally, the validity manager may utilize a short-utterance minimum power module to specifically distinguish an utterance of short duration from background noise based on the energy level of the short utterance.
    Type: Grant
    Filed: January 12, 2000
    Date of Patent: April 6, 2004
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Duanpei Wu, Miyuki Tanaka, Ruxin Chen, Lex Olorenshaw
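    Illustrative sketch: the four validity checks named above (pulse width/count, minimum power, duration window, stricter power for short utterances) compose naturally into a single predicate. All numeric limits in this Python sketch are assumed values for illustration.
      def is_valid_utterance(pulses, total_power, duration_s,
                             min_pulse_count=1, min_power=1e-4,
                             min_dur=0.2, max_dur=10.0,
                             short_dur=0.5, short_min_power=5e-4):
          # pulses: list of (start_s, end_s) energy pulses in the candidate utterance.
          if len(pulses) < min_pulse_count:          # pulse module: enough energy pulses
              return False
          if total_power < min_power:                # minimum power module
              return False
          if not (min_dur <= duration_s <= max_dur): # duration module
              return False
          if duration_s < short_dur and total_power < short_min_power:
              return False                           # short-utterance minimum power module
          return True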
  • Patent number: 6711536
    Abstract: An apparatus is provided for detecting the presence of speech within an input speech signal. Speech is detected by treating the average frame energy of an input speech signal as a sampled signal and looking for modulations within the sampled signal that are characteristic of speech.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: March 23, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: David Llewellyn Rees
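    Illustrative sketch: treating the average frame energy as a sampled signal and looking for syllable-rate modulation can be approximated with an FFT over the energy track. The 2-8 Hz band and the energy-ratio threshold in this Python/NumPy sketch are assumptions, not figures from the patent.
      import numpy as np

      def speech_present(frames, frame_rate_hz=100.0, band=(2.0, 8.0), ratio_thr=0.4):
          # Average frame energy as a sampled signal; test how much of its
          # (non-DC) spectrum falls in the syllable-rate modulation band.
          energy = np.array([np.mean(np.asarray(f, float) ** 2) for f in frames])
          energy = energy - energy.mean()            # remove DC before the FFT
          spec = np.abs(np.fft.rfft(energy)) ** 2
          freqs = np.fft.rfftfreq(len(energy), d=1.0 / frame_rate_hz)
          in_band = spec[(freqs >= band[0]) & (freqs <= band[1])].sum()
          total = spec[freqs > 0].sum() + 1e-12
          return bool(in_band / total > ratio_thr)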
  • Patent number: 6629073
    Abstract: A speech recognition method and system utilize an acoustic model that is capable of providing probabilities for both a large acoustic unit and an acoustic sub-unit. Each of these probabilities describes the likelihood of a set of feature vectors from a series of feature vectors representing a speech signal. The large acoustic unit is formed from a plurality of acoustic sub-units. At least one sub-unit probability and at least one large unit probability from the acoustic model are used by a decoder to generate a score for a sequence of hypothesized words. When combined, the acoustic sub-units associated with all of the sub-unit probabilities used to determine the score span fewer than all of the feature vectors in the series of feature vectors. An overlapping decoding technique is also provided.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: September 30, 2003
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Kuansan Wang
  • Patent number: 6606599
    Abstract: According to the present invention, a method for integrating processes with a multi-faceted human-centered interface is provided. The interface implements a hands-free, voice-driven environment to control processes and applications. A natural language model is used to parse voice-initiated commands and data, and to route those voice-initiated inputs to the required applications or processes. The use of an intelligent context-based parser allows the system to determine what processes are required to complete a task initiated using natural language. A single-window environment provides an interface that is comfortable to the user by preventing distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its output to it. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.
    Type: Grant
    Filed: March 12, 2001
    Date of Patent: August 12, 2003
    Assignee: Interactive Speech Technologies, LLC
    Inventors: Richard Grant, Peter McGregor
  • Patent number: RE38649
    Abstract: Speech recognition technology has matured to the point that the most likely speech recognition result is available before an energy-based termination of speech has been made. The present invention uses these rapidly available speech recognition results to provide intelligent barge-in for voice-response systems; to count words so that sub-sequences can be output, allowing tasks related to the entire word sequence to be parallelized and/or pipelined; and to count words to provide rapid, speech-recognition-based termination of speech processing and output of the recognized word sequence.
    Type: Grant
    Filed: July 13, 2001
    Date of Patent: November 9, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Anand Rangaswamy Setlur, Rafid Antoon Sukkar