Patents by Inventor Masafumi Nishimura

Masafumi Nishimura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and apparatus for natural spoken language speech recognition with word prediction

Patent number: 8000966

Abstract: A word prediction method and apparatus improves precision and accuracy. For the prediction of a sixth word “?”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then, “donyu”, which is the sixth word from “sho-senkyoku no”, is predicted. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only by “sho-senkyoku no”.
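The core idea in this abstract is to predict the next word only from the partial analysis trees that modify it. It can be illustrated with a toy count-based predictor. This is a sketch under stated assumptions, not the patented model: the training pairs are made up, and the `modifies` predicate is hypothetical. Here it is a toy rule that a tree ending in the particle "no" modifies the next word, standing in for whatever modification-relationship predictor the patent uses.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count next words per context of modifying partial trees.

    examples: iterable of (context, next_word), where context is a tuple of
    the partial analysis trees that modify next_word.
    """
    table = defaultdict(Counter)
    for context, word in examples:
        table[tuple(context)][word] += 1
    return table


def predict(table, partial_trees, modifies):
    """Predict the next word from only the partial trees that modify it."""
    # Drop trees judged not to modify the next word (e.g. "sara-ni").
    context = tuple(tree for tree in partial_trees if modifies(tree))
    counts = table.get(context)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy data and modification rule, for illustration only.
table = train([
    (("sho-senkyoku no",), "donyu"),
    (("sho-senkyoku no",), "donyu"),
    (("sho-senkyoku no",), "seido"),
])
print(predict(table, ["sara-ni", "sho-senkyoku no"], lambda t: t.endswith(" no")))
```

Because "sara-ni" is filtered out before the lookup, the prediction of "donyu" is conditioned on "sho-senkyoku no" alone, which is the behavior the abstract prefers.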

Type: Grant

Filed: March 10, 2008

Date of Patent: August 16, 2011

Assignee: Nuance Communications, Inc.

Inventors: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh

TARGET VOICE EXTRACTION METHOD, APPARATUS AND PROGRAM PRODUCT

Publication number: 20110131044

Abstract: An apparatus, program product and method are provided for separating a target voice from a plurality of other voices having different directions of arrival. The method comprises the steps of disposing a first and a second voice input device at a predetermined distance from one another; upon receipt of voice signals at said devices, calculating discrete Fourier transforms of the signals; calculating a CSP (cross-power spectrum phase) coefficient by superposing multiple frequency-bin components based on the correlation of the two received spectra; and then calculating a weighted CSP coefficient from said two discrete Fourier-transformed speech signals. The target voice received by said devices is separated from the other voice signals in the spectrum by using the calculated weighted CSP coefficient.
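The CSP coefficient at the core of this abstract is the inverse DFT of the phase-only cross spectrum of two channels. Its peak gives the inter-microphone delay, and hence the direction of arrival. A minimal NumPy sketch follows. The optional `weights` argument only marks where a weighting would enter. The abstract does not specify how the weighted CSP coefficient is computed, so no particular weighting is implied, and the separation step itself is not shown.

```python
import numpy as np


def csp(x1, x2, weights=None):
    """Cross-power spectrum phase (CSP) coefficients of two equal-length frames.

    Peaks at lag d when x2 is (circularly) x1 delayed by d samples.
    weights: optional per-bin weights (hypothetical), length len(x1)//2 + 1.
    """
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    cross = np.conj(X1) * X2
    # Keep only the phase of each frequency bin (magnitude normalized to 1).
    phase = cross / np.maximum(np.abs(cross), 1e-12)
    if weights is not None:
        phase = phase * weights
    # Superpose the frequency bins back into the lag domain.
    return np.fft.irfft(phase, n=len(x1))


rng = np.random.default_rng(0)
x1 = rng.standard_normal(256)
x2 = np.roll(x1, 3)  # second microphone hears the signal 3 samples later
coeffs = csp(x1, x2)
print(int(np.argmax(coeffs)))  # 3
```

Normalizing away the magnitude is what makes the peak sharp: every frequency bin votes equally for the delay, regardless of how loud it is.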

Type: Application

Filed: November 29, 2010

Publication date: June 2, 2011

Applicant: International Business Machines Corporation

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

System and method for supporting text-to-speech

Patent number: 7921014

Abstract: A system for generating high-quality synthesized text-to-speech includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes inputted speech and then generates first learning data in which the wordings of phrases are associated with their readings. The frequency data generating unit generates, based on the first learning data, frequency data indicating the appearance frequencies of both the wordings and the readings of phrases. The setting unit sets the generated frequency data for a language processing unit so that the speech output by text-to-speech approximates the inputted speech. Furthermore, the language processing unit generates, from the wording of a text, the reading corresponding to that wording on the basis of the appearance frequencies.
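A toy version of the frequency data and the reading choice makes this concrete. The sketch below assumes the learning data has already been produced as (wording, reading) pairs. It uses the simplest possible rule, picking the most frequent reading; the patent's language processing unit may weigh the frequencies differently. The function names and example data are hypothetical.

```python
from collections import Counter, defaultdict


def build_frequency_data(learning_data):
    """Count how often each (wording, reading) pair appears.

    learning_data: iterable of (wording, reading) pairs, e.g. obtained by
    recognizing input speech.
    """
    freq = defaultdict(Counter)
    for wording, reading in learning_data:
        freq[wording][reading] += 1
    return freq


def choose_reading(freq, wording, default=None):
    """Return the most frequent reading recorded for a wording."""
    readings = freq.get(wording)
    if not readings:
        return default
    return readings.most_common(1)[0][0]


# Hypothetical learning data: the speaker said "nippon" more often than "nihon".
freq = build_frequency_data([("日本", "nippon"), ("日本", "nippon"), ("日本", "nihon")])
print(choose_reading(freq, "日本"))  # nippon
```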

Type: Grant

Filed: July 9, 2007

Date of Patent: April 5, 2011

Assignee: Nuance Communications, Inc.

Inventors: Gakuto Kurata, Toru Nagano, Masafumi Nishimura, Ryuki Tachibana

Signal enhancement via noise reduction for speech recognition

Patent number: 7895038

Abstract: Speech enhancement techniques for extemporaneous noise without a noise interval and unknown extemporaneous noise are provided with a method of signal enhancement including subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In signal enhancement, a database of a signal model concerning the target signal expressing a given feature by a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.
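The control loop described here chooses the filter so that the spectral-subtraction output is most likely under a statistical model of the target signal. It can be sketched in a much simplified form, with several stand-ins that are assumptions rather than the patent's filter or model:

- the adaptive filter is reduced to a single scalar gain on the reference spectrum;
- the gain is chosen by grid search rather than adaptive updates;
- the signal model is a single diagonal Gaussian over log power.

```python
import numpy as np


def spectral_subtract(X, R, gain, floor=1e-3):
    """Subtract a scaled reference power spectrum, with a spectral floor."""
    return np.maximum(X - gain * R, floor * X)


def log_likelihood(Y, mean, var):
    """Log-likelihood of log power spectrum Y under a diagonal Gaussian model."""
    z = np.log(Y)
    return -0.5 * np.sum((z - mean) ** 2 / var + np.log(2 * np.pi * var))


def choose_gain(X, R, mean, var, candidates):
    """Pick the filter gain whose output best fits the target-signal model."""
    return max(candidates,
               key=lambda g: log_likelihood(spectral_subtract(X, R, g), mean, var))


S = np.array([4.0, 1.0, 2.0])   # clean target power (hypothetical)
R = np.array([1.0, 2.0, 1.0])   # reference (noise) power
X = S + 0.5 * R                 # observed input
g = choose_gain(X, R, np.log(S), np.full(3, 0.1), [0.0, 0.25, 0.5, 1.0])
print(g)  # 0.5
```

Over-subtraction (gain 1.0 here) drives a bin down to the floor, which the model scores as very unlikely. That is how a likelihood criterion can steer the filter without a noise-only interval.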

Type: Grant

Filed: May 26, 2008

Date of Patent: February 22, 2011

Assignee: International Business Machines Corporation

Inventors: Masafumi Nishimura, Tetsuya Takiguchi

Method for processing speech signal data with reverberation filtering

Patent number: 7856353

Abstract: Method for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. Each speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal are determined. L filter coefficients W(k) (k=1, 2, . . . , L), respectively corresponding to the L frames immediately preceding frame T, are computed such that they minimize a function that is a linear combination of a sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.
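One reading of this cost function can be written with P(T, ω) as the power at frame T and band ω. The residual power in a reverberation frame is P(T, ω) − Σₖ W(k)·P(T−k, ω), which ideally is zero once the reverberation is removed. The subtracted power in a speech frame is Σₖ W(k)·P(T−k, ω), which should stay small so that speech is not damaged. The cost is then the sum of squared residuals over the reverberation segment plus λ times the sum of squared subtracted powers over the speech segment. Both the interpretation and the weight λ (`lam`) are assumptions. Because the cost is quadratic in W, its minimizer solves a small linear system, as in this NumPy sketch:

```python
import numpy as np


def reverb_filter_coeffs(P, reverb_frames, speech_frames, L, lam=1.0):
    """Least-squares filter coefficients W(1..L) for reverberation subtraction.

    P: power spectrogram, shape (frames, bins), indexed P[T, omega].
    Minimizes   sum over reverb frames of (P[T] - sum_k W(k) P[T-k])**2
              + lam * sum over speech frames of (sum_k W(k) P[T-k])**2.
    All frame indices must be >= L.
    """
    def past_rows(frames):
        # One row [P(T-1, w), ..., P(T-L, w)] per (frame, frequency bin).
        rows = [P[T - L:T, w][::-1] for T in frames for w in range(P.shape[1])]
        return np.array(rows, dtype=float).reshape(-1, L)

    A_r = past_rows(reverb_frames)
    y_r = np.array([P[T, w] for T in reverb_frames for w in range(P.shape[1])],
                   dtype=float)
    A_s = past_rows(speech_frames)
    # The cost is quadratic in W, so the minimizer solves the normal equations.
    lhs = A_r.T @ A_r + lam * A_s.T @ A_s
    rhs = A_r.T @ y_r
    return np.linalg.solve(lhs, rhs)


# A reverberant tail that decays by half each frame (toy data).
P = np.array([[8.0, 4.0], [4.0, 2.0], [2.0, 1.0], [1.0, 0.5]])
print(reverb_filter_coeffs(P, [1, 2, 3], [], L=1, lam=0.0))  # [0.5]
```

Raising `lam` or adding speech frames shrinks the coefficients. This is the trade-off the linear combination encodes: removing reverberation versus preserving direct speech.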

Type: Grant

Filed: August 7, 2007

Date of Patent: December 21, 2010

Assignee: Nuance Communications, Inc.

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Apparatus, method, and program for supporting speech interface design

Patent number: 7747443

Abstract: For design of a speech interface accepting speech control options, speech samples are stored on a computer-readable medium. A similarity calculating unit calculates an indication of similarity between first and second sets of the speech samples, the first set of speech samples being associated with a first speech control option and the second set of speech samples being associated with a second speech control option. A display unit displays the similarity indication. In another aspect, word vectors are generated for the respective speech sample sets, indicating frequencies of occurrence of respective words in the respective speech sample sets. The similarity calculating unit calculates the similarity indication responsive to the word vectors of the respective speech sample sets. In another aspect, a perplexity indication is calculated for respective speech sample sets responsive to language models for the respective speech sample sets.
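One way to compute a similarity indication from word vectors is cosine similarity. The abstract does not name a measure, so cosine is an assumption here, as are the example utterances. Utterance sets for two control options that score close to 1 would be easy to confuse, which is what the display is meant to flag.

```python
import math
from collections import Counter


def word_vector(samples):
    """Word-frequency vector for one set of speech samples (transcripts)."""
    return Counter(word for sample in samples for word in sample.split())


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse word-frequency vectors."""
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


lights_on = word_vector(["turn on the light", "switch the light on"])
lights_off = word_vector(["turn off the light", "switch the light off"])
music = word_vector(["play some music"])
print(round(cosine_similarity(lights_on, lights_off), 3))  # 0.714
print(cosine_similarity(lights_on, music))                 # 0.0
```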

Type: Grant

Filed: July 3, 2007

Date of Patent: June 29, 2010

Assignee: Nuance Communications, Inc.

Inventors: Osamu Ichikawa, Gakuto Kurata, Masafumi Nishimura

Apparatus, method, and program for supporting speech interface design

Patent number: 7729921

Abstract: For design of a speech interface accepting speech control options, speech samples are stored on a computer-readable medium. A similarity calculating unit calculates an indication of similarity between first and second sets of the speech samples, the first set of speech samples being associated with a first speech control option and the second set of speech samples being associated with a second speech control option. A display unit displays the similarity indication. In another aspect, word vectors are generated for the respective speech sample sets, indicating frequencies of occurrence of respective words in the respective speech sample sets. The similarity calculating unit calculates the similarity indication responsive to the word vectors of the respective speech sample sets. In another aspect, a perplexity indication is calculated for respective speech sample sets responsive to language models for the respective speech sample sets.

Type: Grant

Filed: July 31, 2008

Date of Patent: June 1, 2010

Assignee: Nuance Communications, Inc.

Inventors: Osamu Ichikawa, Gakuto Kurata, Masafumi Nishimura

STOCHASTIC PHONEME AND ACCENT GENERATION USING ACCENT CLASS

Publication number: 20100125459

Abstract: Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models: a word n-gram model and an accent class n-gram model. For each model, a list of all possible words is generated for each word in the input. Each word in each list is given a score, based on that model, reflecting the probability that the word is the correct word in the sequence. The two lists are combined, and the two scores for each word are combined. A set of word sequences is then generated; each sequence comprises a unique combination of an attribute and an associated word for each word in the input. The combined scores of the words in each sequence are combined, and the sequence with the highest score is selected and presented to a user.
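The score-combination and selection steps can be sketched as follows. The sketch assumes each candidate already carries a probability from the word n-gram model and one from the accent class n-gram model. Computing those from real n-gram models, where they would depend on the preceding words, is left out. The weighted log-linear combination and the example candidates are assumptions; the abstract says only that the scores are combined.

```python
import itertools
import math


def best_sequence(candidates, weight=0.5):
    """Pick the highest-scoring (word, accent class) sequence.

    candidates: one list per input position of tuples
        (word, accent_class, p_word_model, p_accent_model).
    Scores are combined as a weighted sum of log probabilities.
    """
    def combined(cand):
        _, _, p_word, p_accent = cand
        return weight * math.log(p_word) + (1 - weight) * math.log(p_accent)

    # Exhaustive enumeration; a practical system would use Viterbi or beam search.
    best = max(itertools.product(*candidates),
               key=lambda seq: sum(combined(c) for c in seq))
    return [(word, accent) for word, accent, _, _ in best]


candidates = [
    [("hashi", "A1", 0.6, 0.2), ("hashi", "A0", 0.4, 0.8)],
    [("wo", "A0", 0.9, 0.9)],
]
print(best_sequence(candidates))  # [('hashi', 'A0'), ('wo', 'A0')]
```

With `weight=1.0` only the word model counts and the first candidate wins instead. The accent class model is what tips the choice in the default setting.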

Type: Application

Filed: July 1, 2009

Publication date: May 20, 2010

Applicant: Nuance Communications, Inc.

Inventors: Nobuyasu Itoh, Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana

Speech recognition apparatus, speech recognition apparatus and program thereof

Patent number: 7720679

Abstract: Provided are a method for canceling background noise from sound sources other than the sound source in the target direction, in order to realize highly accurate speech recognition, and a system using the method. Given the directional characteristics of a microphone array, the angular power distribution for any of various possible sound source directions can be approximated by a sum of coefficient multiples of (a) base-form angular power distributions of a target sound source, measured beforehand for each base-form angle using a base-form sound, and (b) a base-form power distribution of non-directional background sound. Using this approximation, a noise suppression part extracts only the component from the target sound source direction. In addition, when the target sound source direction is unknown, a sound source localization part selects, from the base-form angular power distributions of the various sound source directions, the distribution that minimizes the approximation residual and takes its direction as the target sound source direction.
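The approximation and the localization rule can be sketched with ordinary least squares. The sketch makes three simplifying assumptions:

- the model uses one target-direction basis plus the background basis, although the abstract allows a sum of several coefficient multiples;
- the coefficients are not constrained to be non-negative;
- the bases are toy arrays rather than measured distributions.

```python
import numpy as np


def localize_and_extract(observed, target_bases, background):
    """Fit observed ~ a * target_bases[d] + b * background for each direction d.

    observed: power at each array angle, shape (n_angles,).
    target_bases: base-form angular power distributions, shape (n_dirs, n_angles).
    background: base-form distribution of non-directional sound, shape (n_angles,).
    Returns the direction with the smallest residual and the target component.
    """
    best = None
    for d, basis in enumerate(target_bases):
        A = np.column_stack([basis, background])
        coef, *_ = np.linalg.lstsq(A, observed, rcond=None)
        residual = float(np.sum((A @ coef - observed) ** 2))
        if best is None or residual < best[0]:
            best = (residual, d, coef)
    _, d, coef = best
    # Keep only the target-direction term; the background term is discarded.
    return d, coef[0] * target_bases[d]


bases = np.array([[1.0, 0.5, 0.0, 0.0, 0.0],
                  [0.0, 0.5, 1.0, 0.5, 0.0],
                  [0.0, 0.0, 0.0, 0.5, 1.0]])
background = np.ones(5)
observed = 2.0 * bases[1] + 0.5 * background
d, target = localize_and_extract(observed, bases, background)
print(d)  # 1
```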

Type: Grant

Filed: September 24, 2008

Date of Patent: May 18, 2010

Assignee: Nuance Communications, Inc.

Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura

System and Method for Extracting a Specific Situation From a Conversation

Publication number: 20100114575

Abstract: A system, method, and computer readable article of manufacture for extracting a specific situation in a conversation. The system includes: an acquisition unit for acquiring speech voice data of speakers in the conversation; a specific expression detection unit for detecting the speech voice data of a specific expression from speech voice data of a specific speaker in the conversation; and a specific situation extraction unit for extracting, from the speech voice data of the speakers in the conversation, a portion of the speech voice data that forms a speech pattern that includes the speech voice data of the specific expression detected by the specific expression detection unit.

Type: Application

Filed: October 9, 2009

Publication date: May 6, 2010

Applicant: International Business Machines Corporation

Inventors: Nobuyasu Itoh, Gakuto Kurata, Masafumi Nishimura

Speech recognition system and program thereof

Patent number: 7660717

Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.
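The abstract does not say how the speech HMM and noise HMM are synthesized. A common way to build composite states is the log-add approximation used in parallel model combination. Speech and noise powers add in the linear domain, so composite log-spectral means are log(exp(μ_speech) + g·exp(μ_noise)). The sketch below applies that assumption to state means only. Variances and the frame-by-frame matching are omitted, and `gain` is a hypothetical noise-level factor.

```python
import numpy as np


def compose_means(speech_means, noise_means, gain=1.0):
    """Composite state means for every (speech state, noise state) pair.

    speech_means: shape (S, D), noise_means: shape (N, D), both log power.
    Returns shape (S * N, D); the pair (s, n) is at row s * N + n.
    """
    s = speech_means[:, None, :]                 # (S, 1, D)
    n = np.log(gain) + noise_means[None, :, :]   # (1, N, D)
    # Powers add in the linear domain: log(exp(s) + exp(n)).
    return np.logaddexp(s, n).reshape(-1, speech_means.shape[1])


speech = np.log([[1.0, 4.0]])
noise = np.log([[1.0, 1.0], [3.0, 4.0]])
print(np.exp(compose_means(speech, noise)))  # [[2, 5], [4, 8]] up to rounding
```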

Type: Grant

Filed: January 9, 2008

Date of Patent: February 9, 2010

Assignee: Nuance Communications, Inc.

Inventors: Tetsuya Takiguchi, Masafumi Nishimura

ANNOTATING PHONEMES AND ACCENTS FOR TEXT-TO-SPEECH SYSTEM

Publication number: 20100030561

Abstract: A system that outputs phonemes and accents of texts. The system has a storage section storing a first corpus in which spellings, phonemes, and accents of a text input beforehand are recorded separately for individual segmentations of the words that are contained in the text. A text for which phonemes and accents are to be output is acquired and the first corpus is searched to retrieve at least one set of spellings that match the spellings in the text from among sets of contiguous spellings. Then, the combination of a phoneme and an accent that has a higher probability of occurrence in the first corpus than a predetermined reference probability is selected as the phonemes and accent of the text.
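The selection rule can be sketched as a corpus lookup followed by a probability threshold. The sketch is simplified to a lookup of a single segment's spelling, whereas the abstract searches sets of contiguous spellings. The corpus records and the reference probability are hypothetical.

```python
from collections import Counter


def annotate(corpus, spelling, reference_prob=0.5):
    """Choose phonemes and accent for a spelling from an annotated corpus.

    corpus: list of (spelling, phonemes, accent) records.
    Returns the most frequent (phonemes, accent) pair for the spelling if its
    relative frequency exceeds reference_prob, else None.
    """
    counts = Counter((ph, acc) for sp, ph, acc in corpus if sp == spelling)
    total = sum(counts.values())
    if total == 0:
        return None
    (phonemes, accent), count = counts.most_common(1)[0]
    return (phonemes, accent) if count / total > reference_prob else None


corpus = [("今日", "kyo:", 1), ("今日", "kyo:", 1), ("今日", "konnichi", 0)]
print(annotate(corpus, "今日"))  # ('kyo:', 1)
```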

Type: Application

Filed: August 3, 2009

Publication date: February 4, 2010

Applicant: Nuance Communications, Inc.

Inventors: Shinsuke Mori, Toru Nagano, Masafumi Nishimura

METHOD AND SYSTEM FOR POSITION DETECTION OF A SOUND SOURCE

Publication number: 20100008516

Abstract: A position detection method, system, and computer readable article of manufacture tangibly embodying computer readable instructions for executing the method for detecting the position of a sound source using at least two microphones. The method includes the steps of: emitting a reproduced sound from the sound source; observing the reproduced sound and an observed sound at the microphones; converting the reproduced sound and the observed sound into electrical signals; transforming the signals of the reproduced sound and of the observed sound into frequency spectra by a frequency spectrum transformer apparatus; calculating Crosspower Spectrum Phase (CSP) coefficients of the frequency spectra of the signals by a CSP coefficient calculator apparatus; and calculating distances between the position of the sound source and the positions of the microphones based on the calculated CSP coefficients by a distance calculating apparatus, thereby detecting the position of the sound source.
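Because the reproduced sound is known, the CSP peak between it and one microphone's observation gives the propagation delay, and the delay times the speed of sound gives the distance. The NumPy sketch below rests on several assumptions:

- playback latency is zero and the delay is an exact circular shift, with no multipath;
- the speed of sound is 343 m/s;
- `source_distance` is a hypothetical name.

With distances to two or more microphones the position could then be triangulated, which is not shown.

```python
import numpy as np


def csp(reference, observed):
    """CSP coefficients; peak lag = delay of observed relative to reference."""
    R = np.fft.rfft(reference)
    O = np.fft.rfft(observed)
    cross = np.conj(R) * O
    return np.fft.irfft(cross / np.maximum(np.abs(cross), 1e-12), n=len(reference))


def source_distance(reproduced, observed, fs, speed_of_sound=343.0):
    """Distance from the source to one microphone, from the CSP peak delay."""
    delay_samples = int(np.argmax(csp(reproduced, observed)))
    return speed_of_sound * delay_samples / fs


fs = 16000
rng = np.random.default_rng(1)
reproduced = rng.standard_normal(1024)
observed = np.roll(reproduced, 47)  # sound arrives 47 samples later
print(source_distance(reproduced, observed, fs))  # about 1.008 (metres)
```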

Type: Application

Filed: July 6, 2009

Publication date: January 14, 2010

Applicant: International Business Machines Corporation

Inventors: Osamu Ichikawa, Tohru Nagano, Masafumi Nishimura

SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, COMPUTER-EXECUTABLE PROGRAM FOR CAUSING COMPUTER TO EXECUTE RECOGNITION METHOD, AND STORAGE MEDIUM

Publication number: 20090306977

Abstract: A speech recognition device and method, implemented with a computer, for recognizing speech. The device includes: a storage location for storing a feature quantity acquired from a speech signal for each frame; storage portions for storing acoustic model data and language model data; an echo speech component for generating echo speech model data from a speech signal acquired prior to the speech signal being processed at the current time point and using the echo speech model data to generate adapted acoustic model data; and a processing component for utilizing the feature quantity, the adapted acoustic model data, and the language model data to provide a speech recognition result for the speech signal.

Type: Application

Filed: June 2, 2009

Publication date: December 10, 2009

Applicant: Nuance Communications, Inc.

Inventors: Tetsuya Takiguchi, Masafumi Nishimura

Voice recording system, recording device, voice analysis device, voice recording method and program

Patent number: 7599836

Abstract: A method is provided for identifying the speaker of each voice in recordings made by a plurality of speakers, using a simple system configuration, together with a system that uses the method. The system includes: microphones provided individually for each of the speakers; a voice processing unit that gives a unique characteristic to each pair of two-channel voice signals recorded with each of the microphones, by executing a different kind of voice processing on each pair, and that mixes the voice signals for each channel; and an analysis unit that performs an analysis according to the unique characteristics given to the voice signals of the respective microphones by the voice processing unit, and that identifies the speaker for each speech segment of the voice signals.

Type: Grant

Filed: May 25, 2005

Date of Patent: October 6, 2009

Assignee: Nuance Communications, Inc.

Inventors: Osamu Ichikawa, Masafumi Nishimura, Tetsuya Takiguchi

Method for processing speech signal data and finding a filter coefficient

Patent number: 7590526

Abstract: Method and computing apparatus for processing speech signal data. A speech signal is divided into frames. Each frame is characterized by a frame number T representing a unique interval of time. Each speech signal is characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal are determined. L filter coefficients W(k) (k=1, 2, . . . , L), respectively corresponding to the L frames immediately preceding frame T, are computed such that they minimize a function that is a linear combination of a sum of squares of a residual speech power in the reverberation segment and a sum of squares of a subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.

Type: Grant

Filed: August 7, 2007

Date of Patent: September 15, 2009

Assignee: Nuance Communications, Inc.

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

SYSTEM, METHOD, AND PROGRAM PRODUCT FOR PROCESSING VOICE DATA IN A CONVERSATION BETWEEN TWO PERSONS

Publication number: 20090228268

Abstract: A system, method, and program product for processing voice data in a conversation between two persons to determine characteristic conversation patterns. The system includes: a variation calculator for calculating a variation of a speech ratio of a first speaker and a variation calculator for calculating a variation of a speech ratio of a second speaker; a difference calculator for calculating a difference data string; a smoother for generating a smoothed difference data string; and a presenter for presenting the difference between the variation of the speech ratio of the first speaker and the variation of the speech ratio of the second speaker. The method includes: calculating variations of the speech ratios of a first speaker and a second speaker; calculating a difference data string; generating a smoothed difference data string; and grouping them according to their patterns.
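The pipeline from speech ratios to a smoothed difference string can be sketched in plain Python. Everything below beyond that outline is an assumption:

- per-frame activity labels for each speaker are given; the abstract does not say how they are obtained;
- the speech ratio is taken over fixed, non-overlapping windows, and the variation is the change from one window to the next;
- the smoother is a simple moving average;
- the final grouping by pattern is not shown.

```python
def speech_ratios(activity, window):
    """Fraction of frames with speech in each consecutive, non-overlapping window."""
    return [sum(activity[i:i + window]) / window
            for i in range(0, len(activity) - window + 1, window)]


def variations(ratios):
    """Change in speech ratio from one window to the next."""
    return [b - a for a, b in zip(ratios, ratios[1:])]


def moving_average(values, k):
    """Simple k-point moving average used as the smoother."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def difference_profile(activity_a, activity_b, window, k):
    """Smoothed difference between the two speakers' speech-ratio variations."""
    va = variations(speech_ratios(activity_a, window))
    vb = variations(speech_ratios(activity_b, window))
    diff = [a - b for a, b in zip(va, vb)]
    return moving_average(diff, k)


# Speaker A talks first and gradually hands the floor to speaker B.
a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(difference_profile(a, b, window=4, k=2))  # [-1.0]
```

A persistently negative profile marks a stretch where the first speaker yields to the second; strings with similar shapes are what would be grouped together.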

Type: Application

Filed: March 6, 2009

Publication date: September 10, 2009

Inventors: Gakuto Kurata, Masafumi Nishimura

VOICE ACTIVITY DETECTION SYSTEM, METHOD, AND PROGRAM PRODUCT

Publication number: 20090222258

Abstract: A voice activity detection method in a low SNR environment. The voice activity detection is performed by extracting a long-term spectrum variation component and a harmonic structure as feature vectors from a speech signal and increasing the difference in feature vectors between speech and non-speech (i) using the long-term spectrum variation component feature or (ii) using a long-term spectrum variation component extraction and a harmonic structure feature extraction. The correct rate and accuracy rate of the voice activity detection are improved over conventional methods by using a long-term spectrum variation component having a window length longer than the average phoneme duration of an utterance in the speech signal. The voice activity detection system and method provide speech processing, automatic speech recognition, and speech output capable of very accurate voice activity detection.
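The abstract does not define the long-term spectrum variation component precisely. One plausible reading is the variance of each frequency bin's log power over a window longer than a phoneme, averaged over bins. Under that reading, stationary noise scores near zero while speech, which keeps changing from phoneme to phoneme, scores high. The NumPy sketch below implements only that assumed feature and a hypothetical fixed threshold. The harmonic-structure feature is omitted.

```python
import numpy as np


def long_term_variation(log_power, window):
    """Per-frame long-term spectral variation feature.

    log_power: log power spectrogram, shape (frames, bins).
    window: number of frames, chosen longer than an average phoneme.
    For each frame, takes the variance of each bin over the surrounding
    window and averages it over bins.
    """
    frames = log_power.shape[0]
    half = window // 2
    feature = np.zeros(frames)
    for t in range(frames):
        segment = log_power[max(0, t - half):min(frames, t + half + 1)]
        feature[t] = segment.var(axis=0).mean()
    return feature


def detect_voice(log_power, window, threshold):
    """Mark frames whose variation exceeds a (hypothetical) threshold."""
    return long_term_variation(log_power, window) > threshold


stationary = np.zeros((50, 16))                            # steady background
fluctuating = np.random.default_rng(2).normal(size=(50, 16))
print(detect_voice(stationary, 11, 0.1).any())   # False
print(detect_voice(fluctuating, 11, 0.1).all())  # True
```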

Type: Application

Filed: February 27, 2009

Publication date: September 3, 2009

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

SYSTEM, METHOD AND PROGRAM FOR SPEECH PROCESSING

Publication number: 20090210224

Abstract: The present invention relates to a system, method and program for speech recognition. In an embodiment of the invention a method for processing a speech signal consists of receiving a power spectrum of a speech signal and generating a log power spectrum signal of the power spectrum. The method further consists of performing discrete cosine transformation on the log power spectrum signal and cutting off cepstrum upper and lower terms of the discrete cosine transformed signal. The method further consists of performing inverse discrete cosine transformation on the signal from which the cepstrum upper and lower terms are cut off. The method further consists of converting the inverse discrete cosine transformed signal so as to bring the signal back to a power spectrum domain and filtering the power spectrum of the speech signal by using, as a filter, the signal which is brought back to the power spectrum domain.
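The steps in this abstract are concrete enough to sketch directly: log power spectrum, DCT, cut the low and high cepstral terms, inverse DCT, exponentiate back to the power domain, and use the result as a filter. Assumptions:

- the DCT is an orthonormal DCT-II, whose inverse is its transpose;
- the cut-off indices are parameters with hypothetical values;
- the filter is applied by bin-wise multiplication with no normalization.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)
    return C


def cepstral_filter(power, low, high):
    """Filter a power spectrum with a filter built from its own mid cepstrum.

    Cepstrum terms with index < low or >= high are cut off.
    Returns (filtered power spectrum, filter).
    """
    C = dct_matrix(len(power))
    cepstrum = C @ np.log(power)      # DCT of the log power spectrum
    cepstrum[:low] = 0.0              # cut lower terms (spectral envelope)
    cepstrum[high:] = 0.0             # cut upper terms
    filt = np.exp(C.T @ cepstrum)     # inverse DCT, back to the power domain
    return power * filt, filt


flat = np.full(64, 2.0)
_, f = cepstral_filter(flat, low=1, high=20)
print(np.allclose(f, 1.0))  # True: a flat spectrum has no structure to emphasize
```

Keeping only the middle cepstral band retains periodic ripple across frequency, such as the harmonic structure of voiced speech, while discarding the smooth envelope and fine-grained noise. The filter therefore boosts bins on the harmonics relative to those between them.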

Type: Application

Filed: August 28, 2008

Publication date: August 20, 2009

Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura

Signal enhancement via noise reduction for speech recognition

Patent number: 7533015

Abstract: Provides speech enhancement techniques for extemporaneous noise without a noise interval and unknown extemporaneous noise. Signal enhancement includes: subtracting a given reference signal from an input signal containing a target signal and a noise signal by spectral subtraction; applying an adaptive filter to the reference signal; and controlling a filter coefficient of the adaptive filter in order to reduce components of the noise signal in the input signal. In signal enhancement, a database of a signal model concerning the target signal expressing a given feature by a given statistical model is provided, and the filter coefficient is controlled based on the likelihood of the signal model with respect to an output signal from the spectral subtraction means.

Type: Grant

Filed: February 28, 2005

Date of Patent: May 12, 2009

Assignee: International Business Machines Corporation

Inventors: Tetsuya Takiguchi, Masafumi Nishimura