Patents by Inventor Masafumi Nishimura

Masafumi Nishimura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8712770
    Abstract: The present invention relates to a method, preprocessor, speech recognition system, and program product for extracting a target speech by removing noise. In an embodiment of the invention, the target speech is extracted from two input speech signals obtained through at least two speech input devices installed at different places in a space. A spectral subtraction process is applied, using a noise power spectrum estimated by one or both of the two speech input devices together with an arbitrary subtraction constant, to obtain a subtracted power spectrum. A gain control based on the two speech input devices is then applied to the subtracted power spectrum to obtain a gain-controlled power spectrum. Finally, a flooring process based on an arbitrary flooring factor is applied to the gain-controlled power spectrum to obtain a power spectrum for speech recognition.
    Type: Grant
    Filed: April 18, 2008
    Date of Patent: April 29, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
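
The subtract-then-floor pipeline this abstract outlines can be sketched in a few lines. This is a minimal illustration, not the patented method: the gain-control stage is omitted, and the parameter names `alpha` and `beta` merely stand in for the abstract's subtraction constant and flooring factor.

```python
def spectral_subtract(power, noise, alpha=1.0, beta=0.1):
    """Spectral subtraction with flooring, per the abstract's outline.

    power: observed power spectrum (one float per frequency bin)
    noise: estimated noise power spectrum
    alpha: subtraction constant; beta: flooring factor.
    (Names are illustrative; the gain-control step is omitted.)
    """
    out = []
    for p, n in zip(power, noise):
        y = p - alpha * n      # subtract the scaled noise estimate
        floor = beta * p       # flooring threshold for this bin
        out.append(y if y > floor else floor)
    return out
```

With `beta = 0.1`, a bin whose subtraction result goes negative is clamped to 10% of its observed power rather than to zero, a common way to avoid the artifacts of hard clamping.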
  • Patent number: 8566084
    Abstract: A speech signal processing system which outputs a speech feature divides an input speech signal into frames so that each pair of consecutive frames has a frame shift length equal to at least one period of the speech signal and an overlap equal to at least a predetermined length, applies a discrete Fourier transform to each of the frames, calculates a CSP coefficient for each pair, searches a predetermined search range in which the speech wave lags by at least one period to obtain the maximum value of the CSP coefficient for the pair, and generates time-series data of the maximum CSP coefficient values arranged in the order in which the frames appear. A method and a computer readable article of manufacture for implementing the same are also provided.
    Type: Grant
    Filed: June 1, 2011
    Date of Patent: October 22, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Osamu Ichikawa, Masafumi Nishimura
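
As a rough sketch of the CSP (Crosspower Spectrum Phase) computation the abstract relies on: transform two frames, normalize their cross-power spectrum to unit magnitude, and inverse-transform to get one coefficient per lag. A naive pure-Python DFT keeps the example self-contained; the patent's frame-splitting and lag-search-range details are not reproduced here.

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform (for illustration only)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def csp(frame_a, frame_b):
    """CSP coefficients for two equal-length frames: the inverse DFT of the
    phase (unit-magnitude) of their cross-power spectrum, indexed by lag."""
    A, B = dft(frame_a), dft(frame_b)
    cross = [a * b.conjugate() for a, b in zip(A, B)]
    phase = [c / abs(c) if abs(c) > 0 else 0j for c in cross]
    N = len(phase)
    return [sum(phase[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]
```

The feature in the abstract is then the maximum CSP value found within the search range, collected frame pair by frame pair into a time series. For identical frames the CSP peaks at lag 0 with value 1.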
  • Publication number: 20130268275
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search comprising a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody using a statistical model of prosody variations (the slope of the fundamental frequency) in both paths, the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that makes the likelihood of the absolute values or variations of the prosody under the statistical model as high as possible with minimal modification values.
    Type: Application
    Filed: December 31, 2012
    Publication date: October 10, 2013
    Inventors: Ryuki Tachibana, Masafumi Nishimura
  • Patent number: 8468016
    Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: a first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is the difference of the spectrum between consecutive frames for the frequency bin; and a first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum, where the average spectrum is an average of the spectra over all frames of the overall speech for the frequency bin, and where an output of the first normalization module is defined as a first delta feature.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: June 18, 2013
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
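
The first delta feature the abstract describes (frame-to-frame spectrum difference, normalized per frequency bin by the utterance-wide average spectrum) can be sketched directly. This is a simplified illustration: the abstract allows dividing by a *function* of the average spectrum, and here that function is assumed to be the identity.

```python
def delta_features(spectra):
    """First delta feature per the abstract's outline.

    spectra: list of frames, each a list of per-bin power values.
    Returns one delta-feature vector per consecutive frame pair,
    each bin normalized by that bin's average over all frames.
    """
    n_frames, n_bins = len(spectra), len(spectra[0])
    # utterance-wide average spectrum, one value per frequency bin
    avg = [sum(frame[b] for frame in spectra) / n_frames for b in range(n_bins)]
    deltas = []
    for t in range(1, n_frames):
        deltas.append([(spectra[t][b] - spectra[t - 1][b]) / avg[b]
                       for b in range(n_bins)])
    return deltas
```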
  • Patent number: 8370149
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search comprising a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody using a statistical model of prosody variations (the slope of the fundamental frequency) in both paths, the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that makes the likelihood of the absolute values or variations of the prosody under the statistical model as high as possible with minimal modification values.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: February 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Ryuki Tachibana, Masafumi Nishimura
  • Publication number: 20130006991
    Abstract: An information processing apparatus determines a weight of each physical feature for hierarchical clustering by acquiring training data of multiple pieces of content in triplets with label information indicating a pair specified by a user as having a highest degree of similarity among three contents of the triplet and executing hierarchical clustering using a feature vector of each piece of content of the training data and the weight of each feature to determine the hierarchical structure of the training data. The information processing apparatus updates the weight of each feature so that the degree of agreement between a pair combined first as being the same clusters among three contents of the triplet in a determined hierarchical structure and a pair indicated by label information corresponding to the triplet increases.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 3, 2013
    Inventors: Toru Nagano, Masafumi Nishimura, Takashima Ryoichi, Ryuki Tachibana
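
The weight-learning loop the abstract describes can be illustrated on a single triplet: find which pair would merge first under the current weighted distance, and if that disagrees with the user's labeled pair, adjust the feature weights toward agreement. The update rule below is an illustrative perceptron-style nudge, not the patent's actual rule, and `lr` is an assumed learning-rate parameter.

```python
def weighted_dist(x, y, w):
    """Weighted squared Euclidean distance between feature vectors."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))

def closest_pair(triplet, w):
    """Index pair that would merge first under the weighted distance --
    a stand-in for the first merge of the hierarchical clustering."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    return min(pairs, key=lambda p: weighted_dist(triplet[p[0]], triplet[p[1]], w))

def update_weights(triplet, labeled_pair, w, lr=0.1):
    """If the predicted first-merged pair disagrees with the user's label,
    shift weight toward features on which the labeled pair is closer."""
    pred = closest_pair(triplet, w)
    if pred == labeled_pair:
        return w
    i, j = labeled_pair
    p, q = pred
    return [max(0.0, wk - lr * ((triplet[i][k] - triplet[j][k]) ** 2
                                - (triplet[p][k] - triplet[q][k]) ** 2))
            for k, wk in enumerate(w)]
```

Repeating this over many labeled triplets increases the degree of agreement between the clustering's first merges and the user-specified most-similar pairs, which is the objective the abstract states.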
  • Publication number: 20120330657
    Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: a first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is the difference of the spectrum between consecutive frames for the frequency bin; and a first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum, where the average spectrum is an average of the spectra over all frames of the overall speech for the frequency bin, and where an output of the first normalization module is defined as a first delta feature.
    Type: Application
    Filed: September 6, 2012
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
  • Publication number: 20120330957
    Abstract: An information processing apparatus determines a weight of each physical feature for hierarchical clustering by acquiring training data of multiple pieces of content in triplets with label information indicating a pair specified by a user as having a highest degree of similarity among three contents of the triplet and executing hierarchical clustering using a feature vector of each piece of content of the training data and the weight of each feature to determine the hierarchical structure of the training data. The information processing apparatus updates the weight of each feature so that the degree of agreement between a pair combined first as being the same clusters among three contents of the triplet in a determined hierarchical structure and a pair indicated by label information corresponding to the triplet increases.
    Type: Application
    Filed: September 6, 2012
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventors: Toru Nagano, Masafumi Nishimura, Takashima Ryoichi, Ryuki Tachibana
  • Publication number: 20120316880
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: August 22, 2012
    Publication date: December 13, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
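
The key-phrase step this abstract describes, scoring frequent words by how much their prosodic feature values fluctuate, can be sketched with variance as the fluctuation measure. This is an assumption for illustration: the abstract does not specify the degree-of-fluctuation statistic, and `min_count` stands in for its frequency threshold.

```python
def key_phrases(word_features, min_count=3):
    """Rank words by the fluctuation (here: variance) of their prosodic
    feature values, keeping only words occurring at least min_count times.

    word_features: dict mapping word -> list of prosodic feature values,
                   one value per occurrence of the word in the speech data.
    Returns words sorted from most to least fluctuating.
    """
    scores = {}
    for word, values in word_features.items():
        if len(values) >= min_count:            # frequency threshold
            mean = sum(values) / len(values)
            scores[word] = sum((v - mean) ** 2 for v in values) / len(values)
    return sorted(scores, key=scores.get, reverse=True)
```

The intuition from the abstract: a frequent word whose pitch or energy varies widely across occurrences is carrying information not explicit in the words themselves, so it is a key-phrase candidate.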
  • Publication number: 20120197644
    Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 2, 2012
    Applicant: International Business Machines Corporation
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
  • Publication number: 20120185243
    Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: a first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frames, where the delta spectrum is the difference of the spectrum between consecutive frames for the frequency bin; and a first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum, where the average spectrum is an average of the spectra over all frames of the overall speech for the frequency bin, and where an output of the first normalization module is defined as a first delta feature.
    Type: Application
    Filed: July 10, 2010
    Publication date: July 19, 2012
    Applicant: International Business Machines Corp.
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
  • Patent number: 8165317
    Abstract: A position detection method, system, and computer readable article of manufacture tangibly embodying computer readable instructions for executing the method for detecting the position of a sound source using at least two microphones. The method includes the steps of: emitting a reproduced sound from the sound source; observing the reproduced sound and an observed sound at the microphones; converting the reproduced sound and the observed sound into electrical signals; transforming the signals of the reproduced sound and of the observed sound into frequency spectra by a frequency spectrum transformer apparatus; calculating Crosspower Spectrum Phase (CSP) coefficients of the frequency spectra of the signals by a CSP coefficient calculator apparatus; and calculating distances between the position of the sound source and the positions of the microphones based on the calculated CSP coefficients by a distance calculating apparatus, thereby detecting the position of the sound source.
    Type: Grant
    Filed: July 6, 2009
    Date of Patent: April 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Tohru Nagano, Masafumi Nishimura
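
The final step this abstract describes, turning CSP coefficients into a distance, reduces to picking the lag at the CSP peak and converting it to a propagation delay. A minimal sketch, assuming the peak lag directly equals the source-to-microphone delay in samples (the patent's multi-microphone geometry is not reproduced):

```python
def lag_to_distance(csp_coeffs, sample_rate, speed_of_sound=343.0):
    """Distance estimate from the CSP peak between the reproduced and
    observed signals.

    csp_coeffs: CSP coefficient per candidate lag (in samples)
    sample_rate: sampling rate in Hz
    speed_of_sound: meters per second (approx. 343 m/s in air at 20 C)
    """
    # the lag with the largest CSP coefficient is the estimated delay
    lag = max(range(len(csp_coeffs)), key=lambda n: csp_coeffs[n])
    return lag / sample_rate * speed_of_sound
```

With distances from two or more microphones, the source position follows by intersecting the corresponding circles, which is the detection the abstract claims.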
  • Patent number: 8165874
    Abstract: A system, method, and program product for processing voice data in a conversation between two persons to determine characteristic conversation patterns. The system includes: a variation calculator for calculating a variation of a speech ratio of a first speaker and a variation calculator for calculating a variation of a speech ratio of a second speaker; a difference calculator for calculating a difference data string; a smoother for generating a smoothed difference data string; and a presenter for presenting the difference between the variation of the speech ratio of the first speaker and the variation of the speech ratio of the second speaker. The method includes: calculating a variation of a speech ratio of each of a first speaker and a second speaker; calculating a difference data string; generating a smoothed difference data string; and grouping the smoothed difference data strings according to their patterns.
    Type: Grant
    Filed: March 6, 2009
    Date of Patent: April 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Gakuto Kurata, Masafumi Nishimura
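
The difference-then-smooth steps in this abstract can be sketched as follows. The smoother is assumed to be a simple moving average; the patent does not specify its form, and `window` is an illustrative parameter.

```python
def speech_ratio_difference(ratios_a, ratios_b, window=3):
    """Difference of two speakers' speech-ratio series, smoothed with a
    moving average (an assumed smoother, not necessarily the patent's).

    ratios_a, ratios_b: per-interval speech ratios for each speaker.
    Returns the smoothed difference data string.
    """
    diff = [a - b for a, b in zip(ratios_a, ratios_b)]
    half = window // 2
    smoothed = []
    for i in range(len(diff)):
        segment = diff[max(0, i - half): i + half + 1]  # clipped at the edges
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

The resulting smoothed strings are what the method then groups by pattern to surface characteristic conversation shapes.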
  • Patent number: 8150687
    Abstract: An example embodiment of the invention includes a speech recognition processing unit for specifying speech segments in speech data, recognizing the speech in each of the speech segments, and associating a character string of the obtained recognition data with the speech data for each speech segment based on information on the time of the speech, and an output control unit for displaying/outputting the text prepared by sorting the recognition data in each speech segment. In some embodiments, the system further includes a text editing unit for editing the prepared text, and a speech correspondence estimation unit for associating a character string in the edited text with the speech data by using a technique of dynamic programming.
    Type: Grant
    Filed: November 30, 2004
    Date of Patent: April 3, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Shinsuke Mori, Nobuyasu Itoh, Masafumi Nishimura
  • Patent number: 8150693
    Abstract: A word prediction apparatus and method that improve prediction accuracy, and a speech recognition method and an apparatus therefor, are provided. For the prediction of a sixth word “donyu”, a partial analysis tree having a modification relationship with the sixth word is predicted. “sara-ni sho-senkyoku no” has two partial analysis trees, “sara-ni” and “sho-senkyoku no”. It is predicted that “sara-ni” does not have a modification relationship with the sixth word, and that “sho-senkyoku no” does. Then “donyu” is predicted as the sixth word following “sho-senkyoku no”. In this example, since “sara-ni” is not useful information for the prediction of “donyu”, it is preferable that “donyu” be predicted only from “sho-senkyoku no”.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: April 3, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh
  • Publication number: 20120059654
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text with a target F0 pattern of the same learning text by associating their peaks and troughs. For each of the points on the target F0 pattern, the learning apparatus obtains shift amounts in the time-axis direction and in the frequency-axis direction from the corresponding point on the source F0 pattern with reference to the result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Application
    Filed: March 16, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Masafumi Nishimura, Ryuki Tachibana
  • Publication number: 20110301945
    Abstract: A speech signal processing system which outputs a speech feature divides an input speech signal into frames so that each pair of consecutive frames has a frame shift length equal to at least one period of the speech signal and an overlap equal to at least a predetermined length, applies a discrete Fourier transform to each of the frames, calculates a CSP coefficient for each pair, searches a predetermined search range in which the speech wave lags by at least one period to obtain the maximum value of the CSP coefficient for the pair, and generates time-series data of the maximum CSP coefficient values arranged in the order in which the frames appear. A method and a computer readable article of manufacture for implementing the same are also provided.
    Type: Application
    Filed: June 1, 2011
    Publication date: December 8, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Osamu Ichikawa, Masafumi Nishimura
  • Patent number: 8065149
    Abstract: Techniques for acquiring, from an input text and an input speech, a set of a character string and a pronunciation thereof which should be recognized as a word. A system according to the present invention: selects, from an input text, plural candidate character strings which are candidates to be recognized as a word; generates plural pronunciation candidates of the selected candidate character strings; generates frequency data by combining data in which the generated pronunciation candidates are respectively associated with the character strings; generates recognition data in which character strings respectively indicating plural words contained in the input speech are associated with pronunciations; and selects and outputs a combination contained in the recognition data, out of combinations each consisting of one of the candidate character strings and one of the pronunciation candidates.
    Type: Grant
    Filed: March 6, 2008
    Date of Patent: November 22, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Gakuto Kurata, Shinsuke Mori, Masafumi Nishimura
  • Patent number: 8024184
    Abstract: A speech recognition device and method configured to include a computer, for recognizing speech, including: a storage location for storing a feature quantity acquired from a speech signal for each frame; storage portions for storing acoustic model data and language model data; an echo speech component for generating echo speech model data from a speech signal acquired prior to a speech signal to be processed at the current time point and using the echo speech model data to generate adapted acoustic model data; and a processing component for utilizing the feature quantity, the adapted acoustic model data, and the language model data to provide a speech recognition result of the speech signal.
    Type: Grant
    Filed: June 2, 2009
    Date of Patent: September 20, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 8015011
    Abstract: A synthetic speech system includes a phoneme segment storage section for storing multiple phoneme segment data pieces; a synthesis section for generating voice data from text by reading phoneme segment data pieces representing the pronunciation of an inputted text from the phoneme segment storage section and connecting the phoneme segment data pieces to each other; a computing section for computing a score indicating the unnaturalness of the voice data representing the synthetic speech of the text; a paraphrase storage section for storing multiple paraphrases of the multiple first phrases; a replacement section for searching the text for the first phrases and replacing them with appropriate paraphrases; and a judgment section for outputting the generated voice data on condition that the computed score is smaller than a reference value, and for inputting the text after the replacement to the synthesis section to cause the synthesis section to further generate voice data for the text.
    Type: Grant
    Filed: January 30, 2008
    Date of Patent: September 6, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana