Patents by Inventor Masafumi Nishimura

Masafumi Nishimura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20090076815
    Abstract: Provided are a method for canceling background noise from sound sources other than a target-direction sound source, in order to realize highly accurate speech recognition, and a system using the same. Owing to the directional characteristics of a microphone array, the angle power distribution for any of the possible sound source directions can be approximated by a sum of coefficient multiples of base-form angle power distributions of the target sound source, measured beforehand for each base-form angle using a base-form sound, together with the power distribution of a non-directional background sound; a noise suppression part exploits this to extract only the component in the target sound source direction. In addition, when the target sound source direction is unknown, a sound source localization part estimates it by selecting, from the base-form angle power distributions of the various sound source directions, the one that minimizes the approximation residual.
    Type: Application
    Filed: September 24, 2008
    Publication date: March 19, 2009
    Applicant: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
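The decomposition described above can be sketched numerically. In this illustrative example (the angular profiles, candidate grid, and mixing weights are invented, not taken from the patent), each candidate direction's base-form distribution plus a non-directional background term is fit to the observed angle power distribution by least squares, and the direction minimizing the approximation residual is selected:

```python
import numpy as np

# Hypothetical setup: angle power distributions sampled at 36 angles.
angles = np.linspace(0.0, np.pi, 36)

def base_form(direction):
    """Base-form angle power distribution of a directional source
    (a narrow lobe here, standing in for profiles measured beforehand)."""
    return np.exp(-((angles - direction) ** 2) / 0.02)

background = np.ones_like(angles)      # non-directional background sound

def localize(observed, candidates):
    """Select the candidate direction whose base-form distribution, plus a
    background term, minimizes the approximation residual."""
    best_dir, best_res, best_coef = None, np.inf, None
    for d in candidates:
        A = np.column_stack([base_form(d), background])
        coef, *_ = np.linalg.lstsq(A, observed, rcond=None)
        res = np.linalg.norm(A @ coef - observed)
        if res < best_res:
            best_dir, best_res, best_coef = d, res, coef
    return best_dir, best_coef

# Simulated observation: target at 60 degrees plus background noise.
true_dir = np.deg2rad(60)
observed = 2.0 * base_form(true_dir) + 0.5 * background
est_dir, coef = localize(observed, np.deg2rad([0, 30, 60, 90, 120, 150]))
target_component = coef[0] * base_form(est_dir)   # noise-suppressed output
print(f"estimated direction: {np.rad2deg(est_dir):.0f} degrees")
```

The fitted coefficient on the winning base-form distribution then gives the extracted target-direction component, as in the noise suppression part.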
  • Publication number: 20090070115
    Abstract: It is an objective of the present invention to provide waveform-concatenation speech synthesis with high sound quality, exploiting its advantages when a large quantity of speech segments is available, while still providing accurate accents in other cases. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search comprising a speech segment search and a prosody modification value search. In the preferred embodiment of the present invention, accurate accents are secured by evaluating the consistency of the prosody using a statistical model of prosody variations (the slope of the fundamental frequency) in both paths, the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for.
    Type: Application
    Filed: August 15, 2008
    Publication date: March 12, 2009
    Applicant: International Business Machines Corporation
    Inventors: Ryuki Tachibana, Masafumi Nishimura
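The prosody modification value search can be sketched as a small quadratic problem. Assuming (for illustration only, the patent's actual cost is not specified here) that the modified prosody cost is the squared size of the modification values plus the squared deviation of the resulting F0 slopes from the statistical model's predictions, the minimizing sequence has a closed-form solution:

```python
import numpy as np

# Hypothetical cost: lam * ||m||^2 + ||D(f0 + m) - mu||^2, where D takes
# successive differences (the F0 slope) and mu is the slope sequence
# predicted by the statistical model. Minimizing over m gives the
# modification value sequence.
def modification_values(f0, target_slope, lam=1.0):
    T = len(f0)
    D = np.eye(T, k=1)[:-1] - np.eye(T)[:-1]      # (T-1) x T difference matrix
    A = lam * np.eye(T) + D.T @ D                  # normal equations
    b = D.T @ (target_slope - D @ f0)
    return np.linalg.solve(A, b)

f0 = np.array([200.0, 210.0, 190.0, 180.0])        # segment F0 values (Hz)
mu = np.array([-5.0, -5.0, -5.0])                  # model-predicted slopes
m = modification_values(f0, mu, lam=0.1)
print(np.diff(f0 + m))    # slopes are pulled toward the model's -5 Hz/frame
```

A small `lam` favors matching the model's slopes; a large `lam` keeps the selected segments' original prosody, mirroring the trade-off between accent accuracy and sound quality.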
  • Publication number: 20090043570
    Abstract: A method for processing speech signal data. A speech signal is divided into frames, each identified by a frame number T representing a unique interval of time and characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal are determined. L filter coefficients W(k) (k=1, 2, . . . , L), corresponding respectively to the L frames immediately preceding frame T, are computed so as to minimize a function that is a linear combination of a sum of squares of the residual speech power in the reverberation segment and a sum of squares of the subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.
    Type: Application
    Filed: August 7, 2007
    Publication date: February 12, 2009
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
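A minimal numerical sketch of the coefficient computation, with invented frame powers and segment labels: the two sums of squares are stacked into one least-squares system, with the speech-segment rows scaled by the combination weight, and solved for W(k):

```python
import numpy as np

# Invented per-frame powers for one frequency band, with frames labeled as
# speech or reverberation. The function to minimize is
#   sum_{T in reverb} (X(T) - sum_k W(k) X(T-k))^2
#   + alpha * sum_{T in speech} (sum_k W(k) X(T-k))^2,
# which is an ordinary least-squares problem in W.
rng = np.random.default_rng(0)
L = 3                                   # number of filter coefficients
power = rng.random(40) + 0.1
speech = range(L, 20)
reverb = range(20, 40)
alpha = 0.5                             # weight of the speech-segment term

rows, targets = [], []
for T in reverb:                        # drive the residual power to zero
    rows.append([power[T - k] for k in range(1, L + 1)])
    targets.append(power[T])
for T in speech:                        # keep subtracted speech power small
    rows.append([np.sqrt(alpha) * power[T - k] for k in range(1, L + 1)])
    targets.append(0.0)
W, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
print(W)                                # the L dereverberation coefficients
```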
  • Patent number: 7480612
    Abstract: A word predicting method for use with voice recognition on a computer includes the steps of: specifying the sentence structure of the history up to the word immediately before the word to be predicted; referring to a context tree, stored in an arboreal context-tree storage section, whose nodes hold information about possible sentence structures and the probability of appearance of words with respect to those structures; and predicting words based on the context tree and the specified sentence structure of the history.
    Type: Grant
    Filed: August 22, 2002
    Date of Patent: January 20, 2009
    Assignee: International Business Machines Corporation
    Inventors: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh
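The context-tree lookup can be sketched with a toy tree (all structure labels and probabilities here are hypothetical): the predictor walks down the tree along the sentence-structure history and reads the word probabilities at the deepest matching node:

```python
# Hypothetical context tree: each node stores word appearance probabilities,
# and children key on one more element of the sentence-structure history.
context_tree = {
    "words": {"the": 0.4, "a": 0.3, ".": 0.3},
    "children": {
        "VP": {"words": {"quickly": 0.6, ".": 0.4}, "children": {}},
        "NP": {"words": {"runs": 0.7, "is": 0.3},
               "children": {
                   "DET": {"words": {"runs": 0.9, "is": 0.1}, "children": {}},
               }},
    },
}

def predict(history):
    """Walk the tree along the structure history, most recent label first,
    and predict the likeliest word at the deepest matching node."""
    node = context_tree
    for label in reversed(history):
        child = node["children"].get(label)
        if child is None:
            break
        node = child
    return max(node["words"], key=node["words"].get)

print(predict(["DET", "NP"]))   # → runs
```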
  • Patent number: 7478041
    Abstract: Provided are a method for canceling background noise from sound sources other than a target-direction sound source, in order to realize highly accurate speech recognition, and a system using the same. Owing to the directional characteristics of a microphone array, the angle power distribution for any of the possible sound source directions can be approximated by a sum of coefficient multiples of base-form angle power distributions of the target sound source, measured beforehand for each base-form angle using a base-form sound, together with the power distribution of a non-directional background sound; a noise suppression part exploits this to extract only the component in the target sound source direction. In addition, when the target sound source direction is unknown, a sound source localization part estimates it by selecting, from the base-form angle power distributions of the various sound source directions, the one that minimizes the approximation residual.
    Type: Grant
    Filed: March 12, 2003
    Date of Patent: January 13, 2009
    Assignee: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
  • Publication number: 20080306742
    Abstract: For the design of a speech interface accepting speech control options, speech samples are stored on a computer-readable medium. A similarity calculating unit calculates an indication of the similarity between first and second sets of the speech samples, the first set being associated with a first speech control option and the second set with a second speech control option. A display unit displays the similarity indication. In another aspect, word vectors are generated for the respective speech sample sets, indicating the frequencies of occurrence of respective words in each set. The similarity calculating unit calculates the similarity indication responsive to the word vectors of the respective sets. In another aspect, a perplexity indication is calculated for the respective speech sample sets responsive to language models for those sets.
    Type: Application
    Filed: July 31, 2008
    Publication date: December 11, 2008
    Applicant: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Gakuto Kurata, Masafumi Nishimura
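The word-vector aspect can be sketched directly: count word occurrences in each speech sample set and compare the two count vectors by cosine similarity (the transcripts below are invented; a high value would warn that two control options are easily confused):

```python
from collections import Counter
import math

# Build a word-frequency vector for a set of speech samples (transcripts).
def word_vector(samples):
    return Counter(word for s in samples for word in s.split())

# Cosine similarity between two word-frequency vectors.
def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

option_a = ["play the next song", "play next track"]
option_b = ["skip to the next song", "next track please"]
sim = cosine(word_vector(option_a), word_vector(option_b))
print(f"similarity: {sim:.2f}")
```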
  • Publication number: 20080294432
    Abstract: Provided are speech enhancement techniques that are effective even for extemporaneous noise lacking a noise-only interval, and for unknown extemporaneous noise. An example of a signal enhancement device includes: spectral subtraction means for subtracting a given reference signal from an input signal containing a target signal and a noise signal; an adaptive filter applied to the reference signal; and coefficient control means for controlling a filter coefficient of the adaptive filter so as to reduce components of the noise signal in the input signal. The device also includes a database of signal models for the target signal, each expressing a given feature by means of a given statistical model, and the filter coefficient is controlled based on the likelihood of the signal model with respect to the output signal of the spectral subtraction means.
    Type: Application
    Filed: May 26, 2008
    Publication date: November 27, 2008
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
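A toy sketch of likelihood-driven coefficient control, reducing the adaptive filter to a single gain and the signal model to one Gaussian over the log power spectrum (all values invented): the gain applied to the reference before subtraction is chosen to maximize the model likelihood of the subtracted output:

```python
import numpy as np

# Hypothetical Gaussian model of the target signal's log power spectrum.
target = np.array([4.0, 3.0, 2.0, 1.0])          # model mean (log power)
sigma = 0.5                                       # model standard deviation

rng = np.random.default_rng(1)
clean = np.exp(target)                            # target power spectrum
noise_ref = rng.random(4) + 0.5                   # reference (noise) spectrum
observed = clean + 2.0 * noise_ref                # noisy input, true gain 2.0

def log_likelihood(g):
    """Likelihood of the spectral-subtraction output under the model."""
    subtracted = np.maximum(observed - g * noise_ref, 1e-6)   # flooring
    diff = np.log(subtracted) - target
    return -0.5 * np.sum((diff / sigma) ** 2)

# Coefficient control: pick the filter gain maximizing the likelihood.
grid = np.linspace(0.0, 4.0, 81)
best_g = max(grid, key=log_likelihood)
print(f"selected gain: {best_g:.2f}")   # ≈ 2.0, removing the noise component
```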
  • Publication number: 20080270131
    Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting target speech by removing noise. In an embodiment of the invention, target speech is extracted from two input speeches, obtained through at least two speech input devices installed at different places in a space, by applying a spectrum subtraction process that uses a noise power spectrum (Uω) estimated from one or both of the two input power spectra (Xω(T)) and an arbitrary subtraction constant (α), yielding a subtracted power spectrum (Yω(T)). The invention further applies a gain control based on the two speech input devices to the subtracted power spectrum to obtain a gain-controlled power spectrum (Dω(T)), and then applies a flooring process to the gain-controlled power spectrum on the basis of an arbitrary flooring factor (β) to obtain a power spectrum for speech recognition (Zω(T)).
    Type: Application
    Filed: April 18, 2008
    Publication date: October 30, 2008
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
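The three stages can be sketched on toy spectra (the noise estimator and the gain rule below are simplified stand-ins chosen for illustration, not the patented estimators):

```python
import numpy as np

# Stand-in parameters: subtraction constant and flooring factor.
alpha, beta = 1.0, 0.1

X1 = np.array([5.0, 4.0, 3.0, 2.0])      # power spectrum, mic 1
X2 = np.array([4.0, 4.5, 2.5, 2.0])      # power spectrum, mic 2
U = np.minimum(X1, X2) * 0.5             # crude noise power estimate

Y = (X1 + X2) / 2 - alpha * U            # 1) spectral subtraction
gain = X1 / (X1 + X2)                    # 2) gain control from both channels
D = gain * Y
Z = np.maximum(D, beta * U)              # 3) flooring against beta * U
print(Z)                                 # power spectrum for recognition
```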
  • Publication number: 20080221890
    Abstract: Techniques for acquiring, from an input text and an input speech, sets each consisting of a character string and its pronunciation that should be recognized as a word. A system according to the present invention: selects, from the input text, plural candidate character strings that are candidates to be recognized as words; generates plural pronunciation candidates for the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs those combinations, each consisting of one candidate character string and one pronunciation candidate, that are contained in the recognition data.
    Type: Application
    Filed: March 6, 2008
    Publication date: September 11, 2008
    Inventors: Gakuto Kurata, Shinsuke Mori, Masafumi Nishimura
  • Publication number: 20080221873
    Abstract: A word prediction apparatus and method that improve prediction accuracy, and a speech recognition method and apparatus therefor, are provided. For the prediction of the sixth word, “donyu”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then “donyu” is predicted as the sixth word from “sho-senkyoku no”. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only from “sho-senkyoku no”.
    Type: Application
    Filed: March 10, 2008
    Publication date: September 11, 2008
    Applicant: International Business Machines Corporation
    Inventors: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh
  • Publication number: 20080221872
    Abstract: A word prediction method and apparatus improve prediction accuracy. For the prediction of the sixth word, “donyu”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then “donyu” is predicted as the sixth word from “sho-senkyoku no”. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only from “sho-senkyoku no”.
    Type: Application
    Filed: March 10, 2008
    Publication date: September 11, 2008
    Inventors: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh
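The selective use of partial analysis trees can be sketched with a toy probability table (all numbers are hypothetical): only the trees predicted to modify the next word contribute to its prediction:

```python
# P(tree modifies the next word) and P(next word | modifying head phrase),
# both invented for illustration.
modifies_next = {"sara-ni": 0.1, "sho-senkyoku no": 0.9}
word_given_head = {
    "sho-senkyoku no": {"donyu": 0.8, "haishi": 0.2},
    "sara-ni": {"kuwaete": 0.6, "mata": 0.4},
}

def predict_next(trees, threshold=0.5):
    """Predict the next word from the partial analysis trees judged to
    modify it; non-modifying trees are ignored, as in the example where
    'sara-ni' carries no useful information for predicting 'donyu'."""
    heads = [t for t in trees if modifies_next[t] >= threshold]
    dist = word_given_head[heads[-1]]
    return max(dist, key=dist.get)

print(predict_next(["sara-ni", "sho-senkyoku no"]))   # → donyu
```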
  • Publication number: 20080183473
    Abstract: A synthetic speech system includes: a phoneme segment storage section for storing multiple pieces of phoneme segment data; a synthesis section for generating voice data from text by reading phoneme segment data representing the pronunciation of an inputted text from the phoneme segment storage section and connecting the pieces to one another; a computing section for computing a score indicating the unnaturalness of the voice data representing the synthetic speech of the text; a paraphrase storage section for storing multiple paraphrases of multiple first phrases; a replacement section for searching the text for the first phrases and replacing them with appropriate paraphrases; and a judgment section for outputting the generated voice data on condition that the computed score is smaller than a reference value, and otherwise inputting the replaced text to the synthesis section to cause it to generate voice data again.
    Type: Application
    Filed: January 30, 2008
    Publication date: July 31, 2008
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
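The judgment loop can be sketched as control flow (the synthesis and scoring functions below are placeholder stubs, not the patented components; a real system would score concatenation unnaturalness of the voice data):

```python
# Hypothetical paraphrase table: rare wording -> plainer wording.
paraphrases = {"utilize": "use", "commence": "begin"}

def synthesize(text):
    """Placeholder for the synthesis section's voice data output."""
    return f"<audio:{text}>"

def unnaturalness(text):
    """Placeholder score: pretend rare words make synthesis unnatural."""
    return sum(word in paraphrases for word in text.split())

def speak(text, reference=0.5):
    score = unnaturalness(text)
    if score < reference:                     # natural enough: output as-is
        return synthesize(text)
    for rare, plain in paraphrases.items():   # else paraphrase and retry
        text = text.replace(rare, plain)
    return synthesize(text)

print(speak("commence the test"))   # → <audio:begin the test>
```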
  • Publication number: 20080183472
    Abstract: Speech recognition is performed by matching, for each speech frame of an inputted speech, a characteristic quantity of the inputted speech against a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM.
    Type: Application
    Filed: January 9, 2008
    Publication date: July 31, 2008
    Applicant: International Business Machines Corporation
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
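The synthesis of the composite model can be sketched for the state means using a common model-combination approximation (illustrative values; the patent's exact combination scheme may differ): the speech-state and noise-state log-spectral means are added in the linear power domain:

```python
import numpy as np

# Hypothetical log power spectral means of one speech HMM state and one
# noise HMM state.
speech_mean = np.array([2.0, 1.0, 0.5])
noise_mean = np.array([0.5, 0.5, 0.5])

# Combine additively in the linear power domain, then return to log domain;
# the result parameterizes the corresponding composite HMM state.
composite_mean = np.log(np.exp(speech_mean) + np.exp(noise_mean))
print(composite_mean)   # lies above both component means
```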
  • Publication number: 20080177543
    Abstract: Training wording data indicating the wording of each word in a training text, training speech data indicating characteristics of the speech of each word, and training boundary data indicating whether each word in the training speech lies at a boundary of a prosodic phrase are stored. After candidates for boundary data are inputted, a first likelihood that the prosodic-phrase boundaries of the words in the inputted text agree with each inputted boundary data candidate is calculated, and a second likelihood is calculated. Thereafter, the boundary data candidate maximizing the product of the first and second likelihoods is searched out from among the inputted candidates, and the result of the search is outputted.
    Type: Application
    Filed: November 27, 2007
    Publication date: July 24, 2008
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana, Gakuto Kurata
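The final search step reduces to an argmax over candidates (the candidate boundary sequences and likelihood values below are invented for illustration):

```python
# Candidate boundary data: per-word boundary flags mapped to the pair of
# likelihoods computed for that candidate (hypothetical values).
candidates = {
    (0, 1, 0): (0.6, 0.5),
    (0, 0, 1): (0.7, 0.3),
    (1, 0, 0): (0.2, 0.9),
}

# Search out the candidate maximizing the product of the two likelihoods.
best = max(candidates, key=lambda c: candidates[c][0] * candidates[c][1])
print(best)   # → (0, 1, 0): product 0.30 beats 0.21 and 0.18
```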
  • Patent number: 7403896
    Abstract: Speech recognition is performed by matching, for each speech frame of an inputted speech, a characteristic quantity of the inputted speech against a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM.
    Type: Grant
    Filed: March 14, 2003
    Date of Patent: July 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 7359852
    Abstract: A word prediction method that improves prediction accuracy, and a speech recognition method and an apparatus therefor, are provided. For the prediction of the sixth word, “donyu”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then “donyu” is predicted as the sixth word from “sho-senkyoku no”. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only from “sho-senkyoku no”.
    Type: Grant
    Filed: July 11, 2001
    Date of Patent: April 15, 2008
    Assignee: International Business Machines Corporation
    Inventors: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh
  • Publication number: 20080059157
    Abstract: A method and computing apparatus for processing speech signal data. A speech signal is divided into frames, each identified by a frame number T representing a unique interval of time and characterized by a power spectrum with respect to frame T and frequency band ω. A speech segment and a reverberation segment of the speech signal are determined. L filter coefficients W(k) (k=1, 2, . . . , L), corresponding respectively to the L frames immediately preceding frame T, are computed so as to minimize a function that is a linear combination of a sum of squares of the residual speech power in the reverberation segment and a sum of squares of the subtracted speech power in the speech segment. The computed L filter coefficients are stored within storage media of the computing apparatus.
    Type: Application
    Filed: August 7, 2007
    Publication date: March 6, 2008
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
  • Publication number: 20080046247
    Abstract: A system for generating high-quality synthesized text-to-speech includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes inputted speech and generates first learning data in which wordings of phrases are associated with their readings. The frequency data generating unit generates, based on the first learning data, frequency data indicating the appearance frequencies of both wordings and readings of phrases. The setting unit sets the generated frequency data for a language processing unit so that the outputted text-to-speech approximates the inputted speech. The language processing unit then generates, from the wording of a text, the reading corresponding to that wording, on the basis of the appearance frequencies.
    Type: Application
    Filed: July 9, 2007
    Publication date: February 21, 2008
    Inventors: Gakuto Kurata, Toru Nagano, Masafumi Nishimura, Ryuki Tachibana
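The frequency data can be sketched as counts of (wording, reading) pairs taken from recognized speech (the pairs below are invented), which the language processing step can then consult to pick a reading:

```python
from collections import Counter

# Hypothetical (wording, reading) pairs from the speech recognizer.
recognized = [
    ("今日", "kyou"), ("今日", "kyou"), ("今日", "konnichi"),
    ("市場", "shijou"), ("市場", "ichiba"), ("市場", "shijou"),
]
frequency = Counter(recognized)          # the frequency data

def reading_for(wording):
    """Choose the most frequent reading observed for a wording."""
    pairs = [(p, n) for p, n in frequency.items() if p[0] == wording]
    return max(pairs, key=lambda x: x[1])[0][1]

print(reading_for("市場"))   # → shijou (appears twice vs once)
```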
  • Publication number: 20080040119
    Abstract: For design of a speech interface accepting speech control options, speech samples are stored on a computer-readable medium. A similarity calculating unit calculates a certain indication of similarity of first and second sets of ones of the speech samples, the first set of speech samples being associated with a first speech control option and the second set of speech samples being associated with a second speech control option. A display unit displays the similarity indication. In another aspect, word vectors are generated for the respective speech sample sets, indicating frequencies of occurrence of respective words in the respective speech sample sets. The similarity calculating unit calculates the similarity indication responsive to the word vectors of the respective speech sample sets. In another aspect, a perplexity indication is calculated for respective speech sample sets responsive to language models for the respective speech sample sets.
    Type: Application
    Filed: July 3, 2007
    Publication date: February 14, 2008
    Inventors: Osamu Ichikawa, Gakuto Kurata, Masafumi Nishimura
  • Publication number: 20070016422
    Abstract: A system that outputs the phonemes and accents of texts. The system has a storage section storing a first corpus in which the spellings, phonemes, and accents of text inputted beforehand are recorded separately for each segmentation of the words contained in the text. A text whose phonemes and accents are to be output is acquired, and the first corpus is searched to retrieve, from among sets of contiguous spellings, at least one set that matches the spellings in the text. Then, the combination of phonemes and an accent whose probability of occurrence in the first corpus exceeds a predetermined reference probability is selected as the phonemes and accent of the text.
    Type: Application
    Filed: July 12, 2006
    Publication date: January 18, 2007
    Inventors: Shinsuke Mori, Toru Nagano, Masafumi Nishimura
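The selection step can be sketched with a toy corpus (entries hypothetical): among the (phoneme, accent) combinations recorded for a matched spelling, choose the one whose relative frequency exceeds the reference probability:

```python
from collections import Counter

# Hypothetical first corpus: (spelling, phonemes, accent) records.
corpus = [
    ("hashi", "ha-shi", "HL"), ("hashi", "ha-shi", "HL"),
    ("hashi", "ha-shi", "LH"),
]

def phoneme_accent(spelling, reference=0.5):
    """Return the most frequent (phonemes, accent) combination for the
    spelling if its probability of occurrence exceeds the reference."""
    counts = Counter((p, a) for s, p, a in corpus if s == spelling)
    total = sum(counts.values())
    (phonemes, accent), n = counts.most_common(1)[0]
    return (phonemes, accent) if n / total > reference else None

print(phoneme_accent("hashi"))   # → ('ha-shi', 'HL'), probability 2/3
```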