Patents by Inventor Sadaoki Furui

Sadaoki Furui has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7552049
    Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1), the mean value of speech cepstral is subtracted from the generated, noise-added speech (step S2), a Gaussian distribution model of each piece of noise-added speech is created (step S3), the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4) to obtain a clustering result. An optimum model is selected (step S7) and linear transformation is performed to provide a maximized likelihood (step S8). Because noise-added speech is consistently used both in clustering and model learning, clustering for many types of noise data and an accurate estimation of a speech model sequence can be achieved.
    Type: Grant
    Filed: March 10, 2004
    Date of Patent: June 23, 2009
    Assignees: NTT DoCoMo, Inc., Sadaoki Furui
    Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
  • Patent number: 7424426
    Abstract: An object of the present invention is to facilitate dealing with noisy speech with varying SNR and save calculation costs by generating a speech model with a single-tree-structure and using the model for speech recognition. Every piece of noise data stored in a noise database is used under every SNR condition to calculate the distance between all noise models with the SNR conditions and the noise-added speech is clustered. Based on the result of the clustering, a single-tree-structure model space into which the noise and SNR are integrated is generated (steps S1 to S5). At a noise extraction step (step S6), inputted noisy speech to be recognized is analyzed to extract a feature parameter string and the likelihoods of HMMs are compared one another to select an optimum model from the tree-structure noisy speech model space (step S7). Linear transformation is applied to the selected noisy speech model space so that the likelihood is maximized (step S8).
    Type: Grant
    Filed: August 18, 2004
    Date of Patent: September 9, 2008
    Assignees: Sadaoki Furui, NTT DoCoMo, Inc.
    Inventors: Sadaoki Furui, Zhipeng Zhang, Tsutomu Horikoshi, Toshiaki Sugimura
  • Publication number: 20050080623
    Abstract: An object of the present invention is to facilitate dealing with noisy speech with varying SNR and save calculation costs by generating a speech model with a single-tree-structure and using the model for speech recognition. Every piece of noise data stored in a noise database is used under every SNR condition to calculate the distance between all noise models with the SNR conditions and the noise-added speech is clustered. Based on the result of the clustering, a single-tree-structure model space into which the noise and SNR are integrated is generated (steps S1 to S5). At a noise extraction step (step S6), inputted noisy speech to be recognized is analyzed to extract a feature parameter string and the likelihoods of HMMs are compared one another to select an optimum model from the tree-structure noisy speech model space (step S7). Linear transformation is applied to the selected noisy speech model space so that the likelihood is maximized (step S8).
    Type: Application
    Filed: August 18, 2004
    Publication date: April 14, 2005
    Applicant: NTT DOCOMO, INC.
    Inventors: Sadaoki Furui, Zhipeng Zhang, Tsutomu Horikoshi, Toshiaki Sugimura
  • Publication number: 20040204937
    Abstract: An object of the present invention is to enable optimal clustering for many types of noise data and to improve the accuracy of estimation of a speech model sequence of input speech. Noise is added to speech in accordance with noise-to-signal ratio conditions to generate noise-added speech (step S1), the mean value of speech cepstral is subtracted from the generated, noise-added speech (step S2), a Gaussian distribution model of each piece of noise-added speech is created (step S3), the likelihoods of the pieces of noise-added speech are calculated to generate a likelihood matrix (step S4) to obtain a clustering result. An optimum model is selected (step S7) and linear transformation is performed to provide a maximized likelihood (step S8). Because noise-added speech is consistently used both in clustering and model learning, clustering for many types of noise data and an accurate estimation of a speech model sequence can be achieved.
    Type: Application
    Filed: March 10, 2004
    Publication date: October 14, 2004
    Applicant: NTT DoCoMo, Inc.
    Inventors: Zhipeng Zhang, Kiyotaka Otsuji, Toshiaki Sugimura, Sadaoki Furui
  • Patent number: 5835890
    Abstract: In a speaker adaptation method for speech models, input speech is transformed to a feature parameter sequence like a cepstral sequence, and N model sequences of maximum likelihood for the feature parameter sequence are extracted from speaker-independent speech HMMs by an N-best hypothesis extraction method. The extracted model sequences are each provisionally adapted to maximize its likelihood for the feature parameter sequence of the input speech while changing the HMM parameters of each sequence, and that one of the provisionally adapted model sequences which has the maximum likelihood for the feature parameter sequence of the input speech is selected and speech models of the selected sequence are provided as adapted HMMs of the speaker to be recognized.
    Type: Grant
    Filed: April 9, 1997
    Date of Patent: November 10, 1998
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Tomoko Matsui, Sadaoki Furui
  • Patent number: 5721808
    Abstract: Noise-resistant speech HMMs are composed by: recording noise in the environment of utterance (S1); preparing HMMs of the noise (S2); transforming the output probability distribution of each of the noise HMMs and speech HMMs prepared from speech unaffected by noise and multiplicative distortion to a linear spectral domain (S31); multiplying the speech HMM distribution in the linear spectral domain by a multiplicative distortion W that is an unknown variable (S321); convoluting the multiplied value and the noise HMM distribution in the linear spectral domain (S322); inversely transforming the convoluted value to the original domain of the speech HMM (S33) to compose incomplete noise-resistant speech HMMs each containing multiplicative distortion as an unknown variable (S.sub.
    Type: Grant
    Filed: March 4, 1996
    Date of Patent: February 24, 1998
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Yasuhiro Minami, Tomoko Matsui, Sadaoki Furui
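
Illustrative sketch: the abstracts for patent 7552049 and publication 20040204937 describe adding noise to speech under ratio conditions (step S1), subtracting the cepstral mean (step S2), creating a Gaussian model for each piece of noise-added speech (step S3), and building a likelihood matrix (step S4). The Python below is a minimal sketch of those four steps under stated assumptions, not the patented method. All function names are hypothetical, the inputs to steps S2 to S4 are assumed to be cepstral frames that have already been extracted, each model is assumed to be a single diagonal-covariance Gaussian, and the ratio condition is assumed to be a signal-to-noise ratio in decibels.

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Step S1 (assumed form): scale the noise so that speech power divided
    by noise power matches snr_db, then add it to the speech samples."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def subtract_mean(frames):
    """Step S2: subtract the per-dimension mean over all frames."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

def fit_gaussian(frames):
    """Step S3 (assumed form): maximum-likelihood diagonal Gaussian."""
    dim = len(frames[0])
    n = len(frames)
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Floor the variance so the log-likelihood stays finite.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under one model."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def likelihood_matrix(conditions):
    """Step S4: entry [i][j] scores the data of condition i under the
    Gaussian fitted to condition j."""
    normalized = [subtract_mean(c) for c in conditions]
    models = [fit_gaussian(c) for c in normalized]
    return [[avg_log_likelihood(c, m) for m in models] for c in normalized]
```

A clustering procedure could then group conditions whose rows in the matrix are similar. The abstracts do not state the clustering rule, so none is shown here.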