Patents by Inventor Osamu Ichikawa

Osamu Ichikawa has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190220747
    Abstract: A technique is provided for training a neural network that includes an input layer, one or more hidden layers, and an output layer; the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least one pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets, and pre-training of the neural network is then performed.
    Type: Application
    Filed: April 9, 2019
    Publication date: July 18, 2019
    Inventors: Takashi Fukuda, Osamu Ichikawa
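As a rough illustration of the decomposition step in publication 20190220747, a pre-trained layer's weight matrix can be split into two stacked matrices, for example via SVD, so the network gains a hidden layer while initially computing the same linear map. This is only a sketch: it ignores the activation function between the new layers, and the patent does not specify SVD as the decomposition.

```python
import numpy as np

def decompose_layer(W, b):
    """Split one pre-trained layer's weights W (out x in) into two stacked
    layers via SVD, so that W2 @ (W1 @ x + b1) + b2 == W @ x + b."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.diag(np.sqrt(s))
    W1, b1 = root @ Vt, np.zeros(Vt.shape[0])   # new lower hidden layer
    W2, b2 = U @ root, b                        # new upper hidden layer
    return (W1, b1), (W2, b2)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))   # a pre-trained 6-in / 4-out hidden layer
b = rng.standard_normal(4)
(W1, b1), (W2, b2) = decompose_layer(W, b)
x = rng.standard_normal(6)
```

Per the abstract, the enlarged network would then go through further pre-training rather than being used as-is.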
  • Patent number: 10346125
    Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitizing audio signals having limited frequency bands at a sampling frequency greater than twice the maximum frequency component of the audio signal, and detecting a clipping event of the audio signals based on spectral magnitudes in a bandwidth at or above the limited frequency band. The sampling frequency may be greater than or equal to three times the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether a sum or average of the spectral magnitudes in the bandwidth at or above the limited frequency band exceeds a predetermined threshold.
    Type: Grant
    Filed: August 18, 2015
    Date of Patent: July 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa
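The idea behind patent 10346125 can be sketched as follows: an oversampled, band-limited signal leaves the spectrum above its band limit nearly empty, and hard clipping injects harmonics into that empty region. The window, frame length, and threshold below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def detect_clipping(frame, fs, band_limit, threshold):
    """Flag a frame as clipped when the summed spectral magnitude above
    the signal's band limit exceeds a threshold: clipping generates
    harmonics that leak into the otherwise-empty upper band."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return spec[freqs >= band_limit].sum() > threshold

# A 1 kHz tone, band-limited below 2 kHz, sampled at 16 kHz (>> 2 x 2 kHz)
fs, band_limit = 16000, 2000.0
t = np.arange(1024) / fs
clean = 0.5 * np.sin(2 * np.pi * 1000 * t)
clipped = np.clip(4.0 * clean, -1.0, 1.0)   # hard clipping adds harmonics
```

The clean frame produces almost no out-of-band energy, while the clipped one does, which is what the per-frame sum-versus-threshold test in the abstract exploits.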
  • Publication number: 20190206394
    Abstract: Acoustic change is detected by a method including preparing a first Gaussian Mixture Model (GMM) trained with first audio data of first speech sound from a speaker at a first distance from an audio interface and a second GMM generated from the first GMM using second audio data of second speech sound from the speaker at a second distance from the audio interface; calculating a first output of the first GMM and a second output of the second GMM by inputting obtained third audio data into the first GMM and the second GMM; and transmitting a notification in response to determining at least that a difference between the first output and the second output exceeds a threshold. Each Gaussian distribution of the second GMM has a mean obtained by shifting a mean of a corresponding Gaussian distribution of the first GMM by a common channel bias.
    Type: Application
    Filed: January 3, 2018
    Publication date: July 4, 2019
    Inventors: Osamu Ichikawa, Gakuto Kurata, Takashi Fukuda
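A minimal sketch of the two-GMM comparison in publication 20190206394, using diagonal-covariance GMMs: the second GMM is the first with every component mean shifted by a common channel bias, and a notification fires when the difference between the two average log-likelihoods on incoming frames exceeds a threshold. The component means, bias, and threshold here are made-up values for illustration.

```python
import numpy as np

def avg_loglik(x, weights, means, covs):
    """Average per-frame log-likelihood of frames x (T x D) under a
    diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None, :, :]                   # T x K x D
    comp = np.log(weights) - 0.5 * (diff**2 / covs
                                    + np.log(2 * np.pi * covs)).sum(-1)
    m = comp.max(axis=1, keepdims=True)                        # logsumexp
    return float(np.mean(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

weights = np.full(3, 1.0 / 3)
means1 = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])  # first GMM
covs = np.full((3, 2), 0.25)
bias = np.array([3.0, 3.0])        # common channel bias
means2 = means1 + bias             # second GMM: every mean shifted by bias

# Frames from a speaker at the second distance (matches the shifted GMM)
rng = np.random.default_rng(1)
x = means2[rng.integers(3, size=200)] + 0.5 * rng.standard_normal((200, 2))
out1 = avg_loglik(x, weights, means1, covs)
out2 = avg_loglik(x, weights, means2, covs)
notify = (out2 - out1) > 1.0       # threshold 1.0: change detected
```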
  • Publication number: 20190080684
    Abstract: A computer-implemented method for processing a speech signal includes: identifying speech segments in an input speech signal; calculating an upper variance and a lower variance, the upper variance being the variance of upper spectra larger than a criterion among the speech spectra corresponding to frames in the speech segments, and the lower variance being the variance of lower spectra smaller than the criterion; determining whether the input speech signal is a special input speech signal using the difference between the upper variance and the lower variance; and performing speech recognition of an input speech signal that has been determined to be a special input speech signal, using a special acoustic model for the special input speech signal.
    Type: Application
    Filed: September 14, 2017
    Publication date: March 14, 2019
    Inventors: Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Bhuvana Ramabhadran
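The upper/lower-variance test in publication 20190080684 might look like the sketch below. The choice of the overall mean as the criterion, the 0.5 decision threshold, and the synthetic spectra are all assumptions for illustration.

```python
import numpy as np

def upper_lower_variance_gap(speech_spectra, criterion=None):
    """Variance of spectral values above a criterion minus the variance
    of values below it. Using the overall mean as the criterion is an
    assumed choice; the abstract only says 'a criterion'."""
    flat = np.asarray(speech_spectra).ravel()
    if criterion is None:
        criterion = flat.mean()
    return flat[flat > criterion].var() - flat[flat < criterion].var()

rng = np.random.default_rng(0)
skewed = rng.exponential(1.0, size=(100, 64))      # special-style spectra
symmetric = rng.normal(0.0, 1.0, size=(100, 64))   # ordinary spectra
gap_special = upper_lower_variance_gap(skewed)
gap_ordinary = upper_lower_variance_gap(symmetric)
is_special = gap_special > 0.5     # route to the special acoustic model
```

A skewed spectral distribution spreads its upper values more than its lower ones, so the gap separates the two signal types even though both variances alone might look similar.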
  • Patent number: 10217456
    Abstract: A method and system for generating training data for a target domain using speech data of a source domain. The training data generation method includes: reading out a Gaussian mixture model (GMM) of the target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM of the target domain, a set of source domain speech data received as input to a set of target domain speech data on the basis of a channel characteristic of the target domain speech data; and adding noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data.
    Type: Grant
    Filed: April 14, 2014
    Date of Patent: February 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Steven J Rennie
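One heavily simplified reading of the mapping step in patent 10217456: estimate the channel offset between the target GMM's overall mean and the source data's mean, shift the source features by it, and add target-domain noise. The real method maps by referring to the GMM; this global-mean version is only a stand-in, and all values below are invented.

```python
import numpy as np

def make_pseudo_target(source, tgt_means, tgt_weights, tgt_noise, rng):
    """Shift source features by the offset between the target GMM's
    overall mean and the source mean (a crude stand-in for the patent's
    GMM-based mapping), then add sampled target-domain noise frames."""
    channel = tgt_weights @ tgt_means - source.mean(axis=0)
    mapped = source + channel
    picks = rng.integers(len(tgt_noise), size=len(mapped))
    return mapped + tgt_noise[picks]

rng = np.random.default_rng(0)
source = rng.normal(5.0, 1.0, size=(500, 4))       # source-domain features
tgt_means = np.array([[0.0, 0, 0, 0], [2.0, 2, 2, 2]])
tgt_weights = np.array([0.5, 0.5])                 # target GMM parameters
noise = 0.1 * rng.standard_normal((1000, 4))       # target-domain noise
pseudo = make_pseudo_target(source, tgt_means, tgt_weights, noise, rng)
```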
  • Publication number: 20190012594
    Abstract: A technique is provided for training a neural network that includes an input layer, one or more hidden layers, and an output layer; the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least one pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets, and pre-training of the neural network is then performed.
    Type: Application
    Filed: July 5, 2017
    Publication date: January 10, 2019
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20180350347
    Abstract: A method, computer system, and computer program product for generating a plurality of voice data having a particular speaking style are provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.
    Type: Application
    Filed: May 31, 2017
    Publication date: December 6, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
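The processing chain in publication 20180350347 (attenuate low and high frequency components, then reduce power at the beginning and end) can be sketched as below. The cutoff frequencies, the 0.1 attenuation factor, and the 5% fade length are illustrative assumptions.

```python
import numpy as np

def stylize_voice(x, fs, low_cut=300.0, high_cut=3400.0, edge=0.05):
    """Attenuate spectral components below low_cut and above high_cut
    (cutoffs and the 0.1 factor are assumed values), then fade the first
    and last `edge` fraction of samples to reduce edge power."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < low_cut) | (freqs > high_cut)] *= 0.1
    y = np.fft.irfft(spec, n=len(x))
    n = int(edge * len(x))
    ramp = np.linspace(0.0, 1.0, n)
    y[:n] *= ramp                  # reduce power at the beginning...
    y[-n:] *= ramp[::-1]           # ...and at the end
    return y

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
y = stylize_voice(x, fs)
spec_y = np.abs(np.fft.rfft(y))    # bin k is k Hz for this 1 s signal
```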
  • Publication number: 20180350348
    Abstract: A method, computer system, and computer program product for generating a plurality of voice data having a particular speaking style are provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.
    Type: Application
    Filed: December 28, 2017
    Publication date: December 6, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
  • Publication number: 20180277104
    Abstract: A computer-implemented method is provided. The computer-implemented method is performed by a speech recognition system having at least a processor. The method includes estimating sound identification information from a neural network having periodic indications and components of a frequency spectrum of an audio signal data inputted thereto. The method further includes performing a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information. The neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes. The method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.
    Type: Application
    Filed: May 30, 2018
    Publication date: September 27, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
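The isolation step in publication 20180277104, setting the weights between the first nodes and the periodic-indication input nodes to 0 at initialization, can be sketched as follows; the layer sizes are arbitrary example values.

```python
import numpy as np

def first_layer_with_isolation(n_spec, n_periodic, n_first, n_second, rng):
    """First fully-connected layer over the input [spectrum ; periodic
    indications]. The 'first nodes' start isolated from the periodic
    indications: those weights are initialized to 0."""
    W = 0.1 * rng.standard_normal((n_first + n_second, n_spec + n_periodic))
    W[:n_first, n_spec:] = 0.0     # first nodes see only the spectrum
    return W

rng = np.random.default_rng(0)
n_spec, n_first = 40, 64
W = first_layer_with_isolation(n_spec, 10, n_first, 16, rng)
spectrum = rng.standard_normal(n_spec)
x1 = np.concatenate([spectrum, rng.standard_normal(10)])
x2 = np.concatenate([spectrum, rng.standard_normal(10)])
```

Because the zeroed block cuts every path from the periodic inputs to the first nodes, those activations depend only on the spectrum at the start of training; gradient updates can later relax the isolation.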
  • Publication number: 20180247641
    Abstract: A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of an audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
    Type: Application
    Filed: February 24, 2017
    Publication date: August 30, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
  • Patent number: 10062378
    Abstract: A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of an audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
    Type: Grant
    Filed: February 24, 2017
    Date of Patent: August 28, 2018
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
  • Patent number: 9818428
    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric, wherein the aliasing metric indicates which frequency bands are susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
    Type: Grant
    Filed: February 23, 2017
    Date of Patent: November 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa
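A plausible form of the aliasing metric in patent 9818428 for a single microphone pair: the inter-microphone phase difference can wrap above roughly c / (2d) for spacing d, so frequency bins above that limit are flagged as susceptible to spatial aliasing. The patent's exact definition may differ; this is the textbook spatial-sampling bound.

```python
import numpy as np

def aliasing_metric(freqs, mic_distance, c=343.0):
    """Per-bin flag: 1.0 where the frequency is susceptible to spatial
    aliasing for a microphone pair with the given spacing. Above
    c / (2 * d) the inter-mic phase difference becomes ambiguous."""
    return (freqs >= c / (2.0 * mic_distance)).astype(float)

freqs = np.fft.rfftfreq(512, d=1.0 / 16000)          # bins up to 8 kHz
metric = aliasing_metric(freqs, mic_distance=0.05)   # 5 cm spacing
# Aliasing limit for 5 cm spacing: 343 / 0.1 = 3430 Hz
```

Feeding this flag to the probability model alongside the enhanced signals lets it discount direction-of-arrival cues in the bands where they are unreliable.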
  • Publication number: 20170278509
    Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon, and determining whether or not all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
    Type: Application
    Filed: June 13, 2017
    Publication date: September 28, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
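The word-graph check in publication 20170278509 might be sketched as follows, with the graph reduced to adjacency sets over consecutive words of each recognition hypothesis (a simplification of a real word graph or lattice; the example sentences are invented).

```python
from collections import defaultdict

def build_word_graph(recognized_texts):
    """Word graph as adjacency sets over consecutive words of each
    recognition hypothesis, with sentence-boundary sentinels."""
    graph = defaultdict(set)
    for text in recognized_texts:
        words = ["<s>"] + text.split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    return graph

def sentence_in_graph(graph, sentence):
    """True when every consecutive word pair of the test sentence is an
    edge, i.e. the sentence survives as a path through the hypotheses."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return all(b in graph[a] for a, b in zip(words, words[1:]))

# Hypotheses from recognizing noisy variants of one test sentence
hyps = ["call the main office", "call a main office", "call the main off is"]
g = build_word_graph(hyps)
```

A lexicon word whose test sentence never survives as a path is a candidate for a bad pronunciation entry, which is the signal the method is after.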
  • Publication number: 20170278524
    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric, wherein the aliasing metric indicates which frequency bands are susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
    Type: Application
    Filed: February 23, 2017
    Publication date: September 28, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20170243113
    Abstract: A method, performed by a computing device, for training a neural network having a plurality of filters for extracting local features is disclosed. The computing device calculates a plurality of projection parameter sets by analyzing one or more training data. The projection parameter sets define a projection of each training datum into a new space, and each projection parameter set has the same size as the filters in the neural network. At least part of the projection parameter sets is set as initial parameters of at least part of the filters in the neural network for training.
    Type: Application
    Filed: February 24, 2016
    Publication date: August 24, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa
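The abstract of publication 20170243113 leaves the analysis step unspecified; PCA over local patches is one plausible instantiation (an assumption, not the patent's stated method), with each principal direction, the same size as a filter, used as that filter's initial weights.

```python
import numpy as np

def pca_filter_init(patches, n_filters):
    """Compute projection parameter sets by PCA over training patches and
    return the top n_filters directions, each the size of one filter."""
    X = patches - patches.mean(axis=0)          # center the training data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_filters]                       # row k initializes filter k

rng = np.random.default_rng(0)
patches = rng.standard_normal((1000, 25))       # flattened 5x5 local patches
filters = pca_filter_init(patches, n_filters=8)
```

Initializing filters with data-derived projections rather than random noise gives the network a first layer that already responds to the dominant local structure of the training data.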
  • Patent number: 9734821
    Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon, and determining whether or not all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
    Type: Grant
    Filed: June 30, 2015
    Date of Patent: August 15, 2017
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
  • Patent number: 9640197
    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric, wherein the aliasing metric indicates which frequency bands are susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: May 2, 2017
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20170052758
    Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitizing audio signals having limited frequency bands at a sampling frequency greater than twice the maximum frequency component of the audio signal, and detecting a clipping event of the audio signals based on spectral magnitudes in a bandwidth at or above the limited frequency band. The sampling frequency may be greater than or equal to three times the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether a sum or average of the spectral magnitudes in the bandwidth at or above the limited frequency band exceeds a predetermined threshold.
    Type: Application
    Filed: August 18, 2015
    Publication date: February 23, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20170004823
    Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon, and determining whether or not all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
    Type: Application
    Filed: June 30, 2015
    Publication date: January 5, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
  • Patent number: 9238436
    Abstract: A joint portion is provided on an upper case of a mirror unit tilting mechanism so as to protrude therefrom. A fitting portion is formed on the pivot plate side so as to correspond to the joint portion. The joint portion is formed into a hollow hemisphere shape. A support shaft is provided upright at a center portion of an outer wall, and a spherical portion is formed in the outer wall. The fitting portion has a hollow hemisphere dome shape and includes a spherical side wall portion and a ceiling portion which has a wave shape in cross section. The fitting portion is lightly press-fitted to the joint portion through a one-touch insertion operation. The fitting portion and the joint portion are swingably connected to each other in a state in which the curved surfaces of the spherical portion and the side wall portion are held in contact with each other.
    Type: Grant
    Filed: September 29, 2010
    Date of Patent: January 19, 2016
    Assignee: MITSUBA CORPORATION
    Inventors: Masaru Chino, Osamu Ichikawa, Yukinori Suto, Yoshitaka Kaneko