Patents Examined by Donald L. Storm

Device for speech recognition with dictionary updating

Patent number: 6732074

Abstract: A standard dictionary; a feature extracting unit which extracts features from an input speech; a matching unit which performs matching between the features of the input speech extracted by the feature extracting unit and the standard dictionary; a result outputting unit which outputs a matching result in the matching unit; and a dictionary updating portion which updates the standard dictionary are provided. The standard dictionary is built initially as a dictionary to be used for recognizing speeches produced by any independent speaker; and the dictionary updating unit updates the standard dictionary so as to provide a dictionary to be used for recognizing speeches produced by a dependent speaker based on the result of matching between the features extracted from the input speech and the standard dictionary.

Type: Grant

Filed: January 27, 2000

Date of Patent: May 4, 2004

Assignee: Ricoh Company, Ltd.

Inventor: Masaru Kuroda
Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition

Patent number: 6728673

Abstract: A video retrieval data generation apparatus includes an extractor that is configured to extract a characteristic pattern from a voice signal synchronous with a video signal. The video retrieval data generation apparatus also includes an index generator that is configured to set the voice signal for a voice period as a processing target. The index generator is further configured to prepare standard voice patterns of a subword corresponding to a plurality of subwords, detect, for each subword, a characteristic pattern similar to a standard voice pattern at each of the voice periods, and generate, for each subword, an index containing time synchronization information corresponding to a position where the similar characteristic pattern is detected. The video retrieval data generation apparatus also includes a multiplexer that is configured to multiplex video signals, voice signals and indexes to output in a data stream format.

Type: Grant

Filed: May 9, 2003

Date of Patent: April 27, 2004

Assignee: Matsushita Electric Industrial Co., LTD

Inventors: Hiroshi Furuyama, Hitoshi Yashio, Ikuo Inoue, Mitsuru Endo, Masakatsu Hoshimi
Method and system for speech reconstruction from speech recognition features, pitch and voicing with resampled basis functions providing reconstruction of the spectral envelope

Patent number: 6725190

Abstract: A speech reconstruction method and system for converting a series of binned spectra or functions thereof such as the Mel Frequency Cepstra Coefficients (MFCC), of an original digitized speech signal, into a reconstructed speech signal, where each binned spectrum has a respective pitch value and voicing decision. The binned spectra are derived from the original digitized speech signal at successive instances by multiplying each estimate of the spectral envelope by a predetermined set of frequency domain window functions and computing the integrals thereof. At each respective time instance, harmonic frequencies and weights are generated according to the respective pitch value and voicing decision. Basis functions having bounded supports on the frequency axis are each sampled at all said harmonic frequencies, which are within its support and multiplied by respective harmonic weights.

Type: Grant

Filed: November 2, 1999

Date of Patent: April 20, 2004

Assignee: International Business Machines Corporation

Inventors: Dan Chazan, Gilad Cohen, Ron Hoory
Method for determining subscriber loop make-up by subtracting calculated signals

Patent number: 6724859

Abstract: A method for determining the make-up of a subscriber loop via improved time-domain reflectometry techniques by analyzing the echo responses generated by transmittal of pulses onto the subscriber loop. In the method, discontinuities along a loop are identified sequentially and in a step-by-step fashion by comparing the measured waveform to suitable waveforms generated on the basis of a hypothesized topology. Once the generated waveform that best matches the measured data has been found and a discontinuity identified, the waveform generated for the loop topology identified so far is subtracted from the measured data to produce a compensated waveform, which is more suitable for detection and location of the next echo.
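The match-then-subtract step in this abstract can be illustrated with a minimal sketch. It assumes equal-length sampled waveforms and a least-squares match; the function names and hypothesis labels are hypothetical and are not taken from the patent.

```python
def squared_error(a, b):
    """Sum of squared sample differences between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def subtract_best_match(measured, candidates):
    """Pick the simulated waveform closest to the measurement and subtract it.

    `candidates` maps a hypothesis label (hypothetical names) to the waveform
    simulated for that hypothesized topology.
    """
    label = min(candidates, key=lambda k: squared_error(measured, candidates[k]))
    compensated = [m - s for m, s in zip(measured, candidates[label])]
    return label, compensated


measured = [0.0, 1.0, 0.5, 0.2, 0.0]
candidates = {
    "gauge_change": [0.0, 1.0, 0.5, 0.0, 0.0],
    "bridged_tap": [0.0, 0.2, 0.9, 0.0, 0.0],
}
label, compensated = subtract_best_match(measured, candidates)
```

The residual left in `compensated` is what a later pass would inspect for the next echo.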

Type: Grant

Filed: September 29, 2000

Date of Patent: April 20, 2004

Assignee: Telcordia Technologies, Inc.

Inventor: Stefano Galli
Apparatus and method for discriminating between voice and data by using a frequency estimate representing both a central frequency and an energy of an input signal

Patent number: 6718297

Abstract: The present invention classifies an input signal as either voice or data with reduced energy consumption. The present invention includes a frequency estimator and an energy estimator for processing an input signal and a classification unit connected to both the frequency and energy estimators for classifying the input signal. The frequency estimator includes a delay and difference integrator. In operation, the delay receives the input signal and generates a delayed input signal and the difference integrator receives the delayed and input signals and generates a frequency estimate value representing both the estimated central frequency of the input signal and the estimated energy of the input signal. The energy estimator generates an estimate of the energy level of the input signal. The classification unit classifies the input as either voice or data based on a comparison of the frequency and energy estimate values and a data threshold value.
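As a rough illustration of why a delay-and-difference value tracks both frequency and energy, the sketch below sums the absolute difference between a signal and a one-sample-delayed copy. The one-sample delay, the 8 kHz rate, and all names are assumptions made for the example, not details from the patent.

```python
import math


def difference_integral(samples, delay=1):
    """Sum of |x[n] - x[n - delay]| over the buffer."""
    return sum(abs(samples[n] - samples[n - delay])
               for n in range(delay, len(samples)))


def sine(freq_hz, amplitude=1.0, rate_hz=8000, count=800):
    """Generate a test tone (illustrative helper)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(count)]


low = difference_integral(sine(300))
high = difference_integral(sine(1800))
loud = difference_integral(sine(300, amplitude=2.0))
```

At equal amplitude the higher tone yields the larger value, and doubling the amplitude doubles the value, so a single number of this kind reflects both properties.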

Type: Grant

Filed: February 15, 2000

Date of Patent: April 6, 2004

Assignee: The Boeing Company

Inventors: Joseph Peebles Pride, III, Edward James Carroll, Cheryl Jean Franklin
Method for utilizing validity constraints in a speech endpoint detector

Patent number: 6718302

Abstract: A method for utilizing validity constraints in a speech endpoint detector comprises a validity manager that may utilize a pulse width module to validate utterances that include a plurality of energy pulses during a certain time period. The validity manager also may utilize a minimum power module to ensure that speech energy below a pre-determined level is not classified as a valid utterance. In addition, the validity manager may use a duration module to ensure that valid utterances fall within a specified duration. Finally, the validity manager may utilize a short-utterance minimum power module to specifically distinguish an utterance of short duration from background noise based on the energy level of the short utterance.
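A minimal sketch of three of the checks named above (minimum power, duration, and short-utterance minimum power) might look as follows. The pulse width check is omitted, and every threshold value and name here is a hypothetical placeholder rather than a figure from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityLimits:
    # All values are illustrative assumptions.
    min_power: float = 0.01        # floor for any utterance
    min_duration_s: float = 0.1    # shortest accepted utterance
    max_duration_s: float = 10.0   # longest accepted utterance
    short_duration_s: float = 0.3  # at or below this, the utterance is "short"
    short_min_power: float = 0.05  # stricter floor for short utterances


def is_valid_utterance(frame_powers, frame_s, limits=ValidityLimits()):
    """Apply duration and power constraints to a candidate utterance."""
    if not frame_powers:
        return False
    duration = len(frame_powers) * frame_s
    mean_power = sum(frame_powers) / len(frame_powers)
    if not (limits.min_duration_s <= duration <= limits.max_duration_s):
        return False
    if mean_power < limits.min_power:
        return False
    # Short bursts must be louder to be told apart from background noise.
    if duration <= limits.short_duration_s and mean_power < limits.short_min_power:
        return False
    return True
```

For example, with 10 ms frames, a quiet 0.2 s burst is rejected while an equally quiet 0.5 s utterance passes the same checks.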

Type: Grant

Filed: January 12, 2000

Date of Patent: April 6, 2004

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Duanpei Wu, Miyuki Tanaka, Ruxin Chen, Lex Olorenshaw
Specifying a tree structure for speech recognizers using correlation between regression classes

Patent number: 6718305

Abstract: Disclosed is a method for use by a speech recognizer. The method includes determining a regression class tree structure for the speech recognizer, wherein the tree structure includes, representing word subunits or regression classes, as tree leaves, combining the word subunits to form tree nodes using a distance measure for the word subunits in the acoustic space, and combining regression classes to a regression class that lies closer to a tree root of the tree structure using a correlation measure, and wherein at least two of regression classes having the largest correlation parameter are combined to a new regression class that is used in the formation of the regression tree structure, instead of the two combined regression classes, to determine a regression class representing the tree root.

Type: Grant

Filed: March 17, 2000

Date of Patent: April 6, 2004

Assignee: Koninklijke Philips Electronics N.V.

Inventor: Reinhold Häb-Umbach
Media presentation system controlled by voice to text commands

Patent number: 6718308

Abstract: A system and method for searching, assembling, and manipulating a variety of multi-media using voice converted to text commands. Digital images, movies, audio, or text is verbally searched and retrieved from a variety of video and audio databases using a combination of directional commands and a means for juxtaposing and assembling search results. The desired media is then placed onto a platform means for manipulating and editing the media files. Any retrieved media files and/or images can be manipulated and assembled on-screen using commands such as “zoom” or “move left” by having corners and borders read by the grid of the platform means. The image(s) are also capable of being stacked, or overlay one another to define re-proportioned backgrounds. The image(s) from the platform means are displayed without the grid using an image platter as a means of providing a preliminary view of the presentation prior to projection.

Type: Grant

Filed: July 7, 2000

Date of Patent: April 6, 2004

Inventor: Daniel L. Nolting
User interface for translating natural language inquiries into database queries and data presentations

Patent number: 6701294

Abstract: A natural language-based interface data presentation system, for example, an information visualization system interface, is realized by employing so-called open-ended natural language inquiries to the interface that translates them into database queries and a set of information to be provided to a user. More specifically, a natural language inquiry is translated to database queries by determining if any complete database queries can be formulated based on the natural language inquiry and, if so, specifying which complete database queries are to be made. In accordance with one aspect of the invention, knowledge of the information visualization presentation is advantageously employed in the interface to guide a user in response to the user's natural language inquiries.

Type: Grant

Filed: January 19, 2000

Date of Patent: March 2, 2004

Assignee: Lucent Technologies, Inc.

Inventors: Thomas J. Ball, Kenneth Charles Cox, Rebecca Elizabeth Grinter, Stacie Lynn Hibino, Lalita Jategaonkar Jagadeesan, David Alejandro Mantilla
System and method of mu-law or A-law compression of bark amplitudes for speech recognition

Patent number: 6694294

Abstract: A method and system that improves voice recognition by improving the voice recognizer of a voice recognition system. Mu-law compression of bark amplitudes is used to reduce the effect of additive noise and thus improve the accuracy of the voice recognition system. A-law compression of bark amplitudes is used to improve the accuracy of the voice recognizer. Both mu-law compression and mu-law expansion can be used in the voice recognizer to improve the accuracy of the voice recognizer. Both A-law compression and A-law expansion can be used in the voice recognizer to improve the accuracy of the voice recognizer.
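For reference, the standard mu-law companding formula can be applied to a vector of band amplitudes as in the sketch below. It assumes amplitudes are normalized by their peak and uses mu = 255, the common telephony value; the abstract does not state the parameters the patent uses, and the function names are illustrative.

```python
import math


def mu_law_compress(x, mu=255.0):
    """Standard mu-law compression of a value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def compress_bark_amplitudes(amplitudes, mu=255.0):
    """Normalize by the peak (an assumption), then compress each amplitude."""
    if not amplitudes:
        return []
    peak = max(abs(a) for a in amplitudes) or 1.0
    return [mu_law_compress(a / peak, mu) for a in amplitudes]
```

Compression boosts small amplitudes relative to large ones, which is the property the abstract credits with reducing the effect of additive noise.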

Type: Grant

Filed: October 31, 2000

Date of Patent: February 17, 2004

Assignee: Qualcomm Incorporated

Inventor: Harinath Garudadri
Method and system for low bit rate speech coding with speech recognition features and pitch providing reconstruction of the spectral envelope

Patent number: 6678655

Abstract: A method for encoding a digitized speech signal so as to generate data capable of being decoded as speech. A digitized speech signal is first converted to a series of feature vectors using, for example, known Mel-frequency Cepstral coefficients (MFCC) techniques. At successive instances of time a respective pitch value of the digitized speech signal is computed, and successive acoustic vectors each containing the respective pitch value and feature vector are compressed so as to derive therefrom a bit stream. A suitable decoder reverses the operation so as to extract the feature vectors and pitch values, thus allowing speech reproduction and playback. In addition, speech recognition is possible using the decompressed feature vectors, with no impairment of the recognition accuracy and no computational overhead.

Type: Grant

Filed: November 12, 2002

Date of Patent: January 13, 2004

Assignee: International Business Machines Corporation

Inventors: Ron Hoory, Dan Chazan, Ezra Silvera, Meir Zibulski
Method and system of audio highlighting during audio edit functions

Patent number: 6678661

Abstract: A method for highlighting a desired portion in an audio sequence for use in a visual display challenged environment. The method includes storing the audio sequence in memory. Next, the user selects a desired portion of the audio sequence and the selected portion is distinguished from the remainder of the audio sequence by automatically varying an audio characteristic of the selected portion during playback, without permanently altering the selected portion. In a related embodiment, the audio characteristic that is varied is pitch of the selected portion.

Type: Grant

Filed: February 11, 2000

Date of Patent: January 13, 2004

Assignee: International Business Machines Corporation

Inventors: Gordon James Smith, George Willard Van Leeuwen
Speech recognition system including manner discrimination

Patent number: 6671668

Abstract: A speech recognition system is trained to be sensitive not only to the actual spoken text, but also to the manner in which the text is spoken, for example, whether something is said confidently, or hesitatingly. In the preferred embodiment, this is achieved by using a Hidden Markov Model (HMM) as the recognition engine, and training the HMM to recognise different styles of input. This approach finds particular application in the telephony voice processing environment, where short caller responses need to be recognised, and the system can then react in a fashion appropriate to the tone or manner in which the caller has spoken.

Type: Grant

Filed: December 20, 2002

Date of Patent: December 30, 2003

Assignee: International Business Machines Corporation

Inventor: Robert Harris
Method for transforming HMMs for speaker-independent recognition in a noisy environment

Patent number: 6658385

Abstract: An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low noise environment R to provide rich information on co-articulation and speaker variation and a smaller database in a more noisy target environment T. A set H of HMMs is trained with data provided in the low noise environment R and the utterances in the noisy environment T are transcribed phonetically using set H of HMMs. The transcribed segments are grouped into a set of Classes C. For each subclass c of Classes C, the transformation Φc is found to maximize the likelihood of the utterances in T, given H. The HMMs are transformed and steps repeated until likelihood stabilizes.

Type: Grant

Filed: February 10, 2000

Date of Patent: December 2, 2003

Assignee: Texas Instruments Incorporated

Inventors: Yifan Gong, John J. Godfrey
Decoding circuit and reproduction apparatus which mutes audio after header parameter changes

Patent number: 6631352

Abstract: A decoding circuit, for receiving a bit stream including an encoded audio signal and header information used for decoding the encoded audio signal, and decoding the encoded audio signal based on the header information, includes a header analysis section for outputting at least one decoding parameter obtained from the header information and decoding parameter change information indicating whether or not the at least one decoding parameter has been changed; a signal processing section for decoding the encoded audio signal, based on the at least one decoding parameter, into a decoded signal and outputting the decoded signal; an automatic mute processing section for executing automatic mute on the decoded signal after the at least one decoding parameter is changed; and an output section for outputting the decoded signal output from the automatic mute processing section.

Type: Grant

Filed: January 3, 2000

Date of Patent: October 7, 2003

Assignee: Matushita Electric Industrial Co. Ltd.

Inventors: Takeshi Fujita, Takashi Katayama, Masahiro Sueyoshi, Shuji Miyasaka, Masaharu Matsumoto, Akihisa Kawamura, Kazutaka Abe, Kousuke Nishio
Apparatus and method of coding a mono signal and stereo information

Patent number: 6629078

Abstract: A method of coding a time-discrete stereo signal, the stereo signal having a first and a second channel, permits scalable stereo coding. At first, a mono signal is formed from the stereo signal, which is then coded, whereupon the coded mono signal is transmitted to a bit stream. Thereafter, the coded mono signal is decoded again, whereupon stereo information is formed on the basis of the coded/decoded mono signal and the first and second channels, with such stereo information being coded and being also written into the bit stream in order to obtain a bit stream comprising a complete coded monolayer as well as a layer with coded stereo information.

Type: Grant

Filed: December 13, 1999

Date of Patent: September 30, 2003

Assignee: Fraunhofer-Gesellschaft zur Forderung der angewandten Forschung e.V.

Inventors: Bernhard Grill, Bodo Teichmann, Karlheinz Brandenburg
Portable acoustic interface for remote access to automatic speech/speaker recognition server

Patent number: 6615171

Abstract: A portable speech signal preprocessing (SSP) device having a microphone for receiving spoken speech and background noise, a digital signal processor (DSP) for processing the received noise into feature vectors, a coupler for coupling to a communication device and for transmission over a communication channel. An automatic speech/speaker recognition (ASSR) server receives over the communication channel the preprocessed speech data and recognizes the spoken speech/speaker. A system having the portable SSP device and the ASSR server can be used to remotely activate, reset, or change PIN codes in smartcards, magnetic cards, or electronic money cards.

Type: Grant

Filed: August 13, 1999

Date of Patent: September 2, 2003

Assignee: International Business Machines Corporation

Inventors: Dimitri Kanevsky, Stephane Herman Maes, Peter S Poon, Carl Prochilo
Method and apparatus for retrieving a video and audio scene using an index generated by speech recognition

Patent number: 6611803

Abstract: A video retrieval apparatus includes a retrieval data generator that is configured to extract a characteristic pattern from a voice signal synchronous with a video signal to generate an index for video retrieval. The video retrieval apparatus also includes a retrieval processor that is configured to input a key word from a retriever and collate the key word with the index to retrieve a desired video. The retrieval data generator includes a multiplexor that is configured to multiplex video signals, voice signals and indexes to output in data stream format. The retrieval processor includes a demultiplexor that is configured to demultiplex the multiplexed data stream into the video signals, the voice signals and the indexes. A video reproduction apparatus may collate a visual pattern of the key word visual pattern data of the video signal at the time a person vocalizes a sound as the index for retrieval.

Type: Grant

Filed: August 14, 2000

Date of Patent: August 26, 2003

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Hiroshi Furuyama, Hitoshi Yashio, Ikuo Inoue, Mitsuru Endo, Masakatsu Hoshimi
Speech enhancement with gain limitations based on speech activity

Patent number: 6604071

Abstract: An apparatus and method for data processing that improves estimation of spectral parameters of speech data and reduces algorithmic delay in a data coding operation. Estimation of spectral parameters is improved by adaptively adjusting a gain function used to enhance data based on whether the data contains information speech and noise or noise only. A determination is made concerning whether the speech signal to be processed represents articulated speech or a speech pause and a gain is formed for application to the speech signal. The lowest value the gain may assume (i.e., its lower limit) is determined based on whether the speech signal is known to represent articulated speech or not. The lower limit of the gain during periods of speech activity is constrained to be lower than the lower limit of the gain during speech pause. Also, the gain that is applied to a data frame of the speech signal is adaptively limited based on limited a priori signal-to-noise (SNR) values.
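The activity-dependent lower limit on the gain can be sketched in a few lines. The floor values and names below are placeholders chosen for illustration; the only property taken from the abstract is that the floor during speech activity is lower than the floor during a speech pause.

```python
def limit_gain(gain, speech_active, floor_speech=0.1, floor_pause=0.3):
    """Clamp an enhancement gain from below, using a lower floor during speech.

    floor_speech and floor_pause are hypothetical values.
    """
    if floor_speech >= floor_pause:
        raise ValueError("speech floor must be lower than pause floor")
    floor = floor_speech if speech_active else floor_pause
    return max(gain, floor)
```

With these placeholder floors, a raw gain of 0.05 becomes 0.1 in a speech frame and 0.3 in a pause frame, while a gain of 0.8 passes through unchanged in either case.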

Type: Grant

Filed: February 8, 2000

Date of Patent: August 5, 2003

Assignee: AT&T Corp.

Inventors: Richard Vandervoort Cox, Rainer Martin
Method and apparatus for determining speech coding parameters

Patent number: 6587817

Abstract: A method which comprises forming a first noise reduction frame (18) containing speech samples, which is windowed by a first window function. For the windowed frame, noise reduction is performed for producing a second noise reduction frame (19; 45). A speech coding frame (44) to be formed comprises noise-reduced samples of at least two successive second noise reduction frames (45, 46), partly summed with one another. On the basis of said speech coding frame (44), a set of speech coding parameters pj are determined. A lookahead part (42) of the speech coding frame is at least partly formed of a first slope (41), the first slope (10, 41) comprising a set of most recent noise-reduced samples of the second noise reduction frame, not summed with the samples of any other second noise reduction frame. The method reduces the delay caused by speech coding and noise reduction.

Type: Grant

Filed: January 7, 2000

Date of Patent: July 1, 2003

Assignee: Nokia Mobile Phones Ltd.

Inventors: Antti Vähätalo, Erkki Paajanen
