Patents by Inventor Osamu Ichikawa
Osamu Ichikawa has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20190220747
Abstract: A technique for training a neural network including an input layer, one or more hidden layers, and an output layer, in which the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least one pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets. Pre-training for the neural network is then performed.
Type: Application
Filed: April 9, 2019
Publication date: July 18, 2019
Inventors: Takashi Fukuda, Osamu Ichikawa
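The abstract does not fix how a parameter set is decomposed; one natural reading is a low-rank factorization of a pre-trained layer's weight matrix into two new matrices, turning one hidden layer into two while approximately preserving its linear map (the nonlinearity inserted between the new layers is ignored here). A minimal sketch with NumPy, where `decompose_layer` and the rank choice are illustrative assumptions, not the patent's method:

```python
import numpy as np

def decompose_layer(W, rank):
    # Split one weight matrix into two new parameter sets via SVD, so a
    # single hidden layer becomes two stacked layers of the given rank.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = np.diag(np.sqrt(s[:rank])) @ Vt[:rank]    # rank x m (lower new layer)
    W2 = U[:, :rank] @ np.diag(np.sqrt(s[:rank]))  # n x rank (upper new layer)
    return W1, W2

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))       # a pre-trained layer's parameter set
W1, W2 = decompose_layer(W, rank=64)
# At full rank the two new layers reproduce the original map exactly:
print(np.allclose(W2 @ W1, W))
```

After the split, the deeper network would be pre-trained again as the abstract describes; a smaller `rank` trades fidelity for fewer parameters.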
-
Patent number: 10346125
Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitizing audio signals having limited frequency bands at a sampling frequency more than twice the maximum frequency component of the audio signal, and detecting a clipping event of the audio signals based on the spectral magnitudes in a bandwidth greater than or equal to the limited frequency band. The sampling frequency may be at least three times the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether the sum or average of the spectral magnitudes in the bandwidth greater than or equal to the limited frequency band is larger than a predetermined threshold.
Type: Grant
Filed: August 18, 2015
Date of Patent: July 9, 2019
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Osamu Ichikawa
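The idea is that a band-limited signal should carry almost no energy above its band limit, so energy appearing there after clipping betrays the distortion. A sketch of the per-frame check; the band limit, frame length, and threshold are illustrative values, not ones from the patent:

```python
import numpy as np

fs = 48000          # sampling rate, here 3x the 16 kHz Nyquist the band implies
band_limit = 8000   # the signal is assumed band-limited to 8 kHz
t = np.arange(1024) / fs

def out_of_band_energy(frame):
    # Average spectral magnitude above the limited frequency band.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    return spec[freqs >= band_limit].mean()

clean = 0.5 * np.sin(2 * np.pi * 1000 * t)                     # within the band
clipped = np.clip(2.0 * np.sin(2 * np.pi * 1000 * t), -1, 1)   # hard-clipped

threshold = 0.05  # illustrative; would be tuned on real data
print(out_of_band_energy(clean) > threshold)    # False
print(out_of_band_energy(clipped) > threshold)  # True
```

Clipping flattens the waveform into a square-like shape whose odd harmonics extend past 8 kHz, which is why the second frame trips the threshold.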
-
Publication number: 20190206394
Abstract: Acoustic change is detected by a method including: preparing a first Gaussian Mixture Model (GMM) trained with first audio data of speech from a speaker at a first distance from an audio interface, and a second GMM generated from the first GMM using second audio data of speech from the speaker at a second distance from the audio interface; calculating a first output of the first GMM and a second output of the second GMM by inputting newly obtained third audio data into both GMMs; and transmitting a notification in response to determining that the difference between the first output and the second output exceeds a threshold. Each Gaussian distribution of the second GMM has a mean obtained by shifting the mean of the corresponding Gaussian distribution of the first GMM by a common channel bias.
Type: Application
Filed: January 3, 2018
Publication date: July 4, 2019
Inventors: Osamu Ichikawa, Gakuto Kurata, Takashi Fukuda
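A 1-D toy version of this comparison, assuming diagonal Gaussians and an arbitrary bias value (the real features, dimensionality, and bias estimation are not specified in the abstract): the second GMM is the first with every mean shifted by a common channel bias, and incoming frames are scored under both.

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    # Average per-frame log-likelihood under a 1-D GMM (log-sum-exp over
    # components for numerical stability).
    comp = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)
    m = comp.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

# First GMM: trained on speech at the original distance (toy parameters).
w = np.array([0.5, 0.5]); mu1 = np.array([-2.0, 2.0]); var = np.array([1.0, 1.0])
# Second GMM: same model with every mean shifted by a common channel bias.
bias = 3.0
mu2 = mu1 + bias

rng = np.random.default_rng(1)
near = rng.normal(mu1[rng.integers(0, 2, 500)], 1.0)  # frames like distance 1
far = rng.normal(mu2[rng.integers(0, 2, 500)], 1.0)   # frames like distance 2

threshold = 1.0
for frames in (near, far):
    diff = gmm_loglik(frames, w, mu1, var) - gmm_loglik(frames, w, mu2, var)
    print("changed" if diff < -threshold else "unchanged")
```

When the second GMM explains the incoming frames much better than the first, the likelihood gap crosses the threshold and a notification would be sent.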
-
Publication number: 20190080684
Abstract: A computer-implemented method for processing a speech signal includes: identifying speech segments in an input speech signal; calculating an upper variance and a lower variance, where the upper variance is the variance of the spectra larger than a criterion among the speech spectra of the frames in the speech segments, and the lower variance is the variance of the spectra smaller than the criterion; determining whether the input speech signal is a special input speech signal using the difference between the upper variance and the lower variance; and performing speech recognition of an input speech signal determined to be special, using a special acoustic model for such signals.
Type: Application
Filed: September 14, 2017
Publication date: March 14, 2019
Inventors: Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Bhuvana Ramabhadran
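A compact sketch of the upper/lower-variance statistic. The criterion, the synthetic "special" signal, and the threshold are all assumptions for illustration; the abstract does not define them.

```python
import numpy as np

def upper_lower_variance_gap(log_spectra):
    # log_spectra: (frames, bins) log-power spectra of the speech segments.
    crit = log_spectra.mean()                 # one plausible criterion
    upper = log_spectra[log_spectra > crit]   # spectra above the criterion
    lower = log_spectra[log_spectra <= crit]  # spectra at or below it
    return upper.var() - lower.var()

rng = np.random.default_rng(2)
normal = rng.normal(0.0, 1.0, (200, 64))  # ordinary speech: symmetric spread
# A "special" signal whose loud bins fluctuate far more than its quiet bins:
special = np.where(rng.random((200, 64)) < 0.5,
                   rng.normal(3.0, 3.0, (200, 64)),
                   rng.normal(-3.0, 0.3, (200, 64)))

threshold = 1.0
print(upper_lower_variance_gap(normal) > threshold)   # False
print(upper_lower_variance_gap(special) > threshold)  # True
```

When the gap exceeds the threshold, the signal would be routed to the special acoustic model instead of the default one.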
-
Patent number: 10217456
Abstract: A method and system for generating training data for a target domain using speech data of a source domain. The method includes: reading out a Gaussian mixture model (GMM) of the target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM, a set of source domain speech data received as input to a set of target domain speech data on the basis of a channel characteristic of the target domain speech data; and adding noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data.
Type: Grant
Filed: April 14, 2014
Date of Patent: February 26, 2019
Assignee: International Business Machines Corporation
Inventors: Osamu Ichikawa, Steven J. Rennie
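One deliberately simple stand-in for the mapping step (the patent's exact mapping rule is not given in the abstract): replace each source-domain frame by its posterior-weighted mix of target-GMM means, then add target-domain noise to produce the pseudo data. All parameters here are toy values.

```python
import numpy as np

def soft_map(x, weights, means, variances):
    # Soft assignment of each 1-D source-domain frame to the target-domain
    # GMM: the frame becomes the posterior-weighted mix of target means.
    logp = (np.log(weights) - 0.5 * np.log(2 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)
    post = np.exp(logp - logp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    return post @ means

# Target-domain GMM trained on clean target speech (toy 1-D parameters).
w = np.array([0.3, 0.7]); mu = np.array([-1.0, 4.0]); var = np.array([0.5, 0.5])

rng = np.random.default_rng(3)
source = rng.normal(0.0, 2.0, 1000)            # source-domain features
mapped = soft_map(source, w, mu, var)          # channel-mapped features
pseudo = mapped + rng.normal(0.0, 0.1, 1000)   # plus target-domain noise

# Each mapped frame is a convex mix of target means, so it stays in range:
print(float(mapped.min()) >= mu.min() and float(mapped.max()) <= mu.max())
```

The resulting `pseudo` set plays the role of the pseudo target domain speech data used for training.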
-
Publication number: 20190012594
Abstract: A technique for training a neural network including an input layer, one or more hidden layers, and an output layer, in which the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least one pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets. Pre-training for the neural network is then performed.
Type: Application
Filed: July 5, 2017
Publication date: January 10, 2019
Inventors: Takashi Fukuda, Osamu Ichikawa
-
Publication number: 20180350347
Abstract: A method, computer system, and computer program product for generating voice data having a particular speaking style. The invention may include: preparing a plurality of original voice data corresponding to at least one word or phrase; attenuating a low frequency component and a high frequency component in the prepared original voice data; reducing power at the beginning and end of the prepared original voice data; and storing the resultant voice data obtained after the attenuating and the reducing.
Type: Application
Filed: May 31, 2017
Publication date: December 6, 2018
Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
-
Publication number: 20180350348
Abstract: A method, computer system, and computer program product for generating voice data having a particular speaking style. The invention may include: preparing a plurality of original voice data corresponding to at least one word or phrase; attenuating a low frequency component and a high frequency component in the prepared original voice data; reducing power at the beginning and end of the prepared original voice data; and storing the resultant voice data obtained after the attenuating and the reducing.
Type: Application
Filed: December 28, 2017
Publication date: December 6, 2018
Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
-
Publication number: 20180277104
Abstract: A computer-implemented method performed by a speech recognition system having at least a processor. The method includes estimating sound identification information from a neural network to which periodic indications and components of a frequency spectrum of audio signal data are input, and performing a speech recognition operation that decodes the audio signal data into a textual representation based on the estimated sound identification information. The neural network includes a plurality of fully connected layers whose first layer includes a plurality of first nodes and a plurality of second nodes. The method further includes training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer, setting the weights between the first nodes and the input nodes corresponding to the periodic indications to 0.
Type: Application
Filed: May 30, 2018
Publication date: September 27, 2018
Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
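The initialization trick above can be shown with a plain weight matrix: zero the block connecting the periodic-indication inputs to the first node group, so those nodes initially see only the spectrum. All sizes are made-up toy values.

```python
import numpy as np

n_spec, n_per = 40, 10       # spectrum inputs and periodic-indication inputs
n_first, n_second = 64, 16   # the first and second node groups in layer 1

rng = np.random.default_rng(4)
W = rng.standard_normal((n_first + n_second, n_spec + n_per)) * 0.1

# Initial isolation per the abstract: weights from the periodic-indication
# inputs to the first nodes start at 0; the second nodes keep theirs.
W[:n_first, n_spec:] = 0.0

x = np.concatenate([rng.standard_normal(n_spec),   # frequency-spectrum inputs
                    rng.standard_normal(n_per)])   # periodic indications
h = np.maximum(W @ x, 0.0)                         # ReLU first layer

# First-node activations are unchanged when the periodic inputs are zeroed,
# confirming the two input groups start out isolated:
x_no_per = np.concatenate([x[:n_spec], np.zeros(n_per)])
print(np.allclose(np.maximum(W @ x_no_per, 0.0)[:n_first], h[:n_first]))
```

During subsequent training those zeroed weights would be free to grow, so the isolation is only an initial condition, not a permanent constraint.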
-
Publication number: 20180247641
Abstract: A computer-implemented method and an apparatus are provided. The method includes: obtaining, by a processor, a frequency spectrum of audio signal data; extracting periodic indications from the frequency spectrum; inputting the periodic indications and components of the frequency spectrum into a neural network; and estimating sound identification information from the neural network.
Type: Application
Filed: February 24, 2017
Publication date: August 30, 2018
Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
-
Patent number: 10062378
Abstract: A computer-implemented method and an apparatus are provided. The method includes: obtaining, by a processor, a frequency spectrum of audio signal data; extracting periodic indications from the frequency spectrum; inputting the periodic indications and components of the frequency spectrum into a neural network; and estimating sound identification information from the neural network.
Type: Grant
Filed: February 24, 2017
Date of Patent: August 28, 2018
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
-
Patent number: 9818428
Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One method includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric that indicates which frequency bands are susceptible to spatial aliasing; enhancing the speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
Type: Grant
Filed: February 23, 2017
Date of Patent: November 14, 2017
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Osamu Ichikawa
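The abstract does not define the aliasing metric; a common criterion is that for a microphone pair with spacing d, frequencies above c / (2d) can produce an inter-microphone phase difference beyond pi, making the direction of arrival ambiguous. A sketch under that assumption:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def aliasing_metric(freqs, spacing):
    # 1 where a frequency band is susceptible to spatial aliasing for a
    # microphone pair with the given spacing (meters): above c / (2d) the
    # phase difference between the mics can wrap and directions alias.
    return (freqs > SPEED_OF_SOUND / (2.0 * spacing)).astype(float)

freqs = np.fft.rfftfreq(512, 1 / 16000)        # analysis bands up to 8 kHz
metric = aliasing_metric(freqs, spacing=0.05)  # 5 cm pair: alias above 3.43 kHz
print(round(float(metric.mean()), 2))  # → 0.57
```

A probability model consuming this metric could then discount direction-of-arrival evidence in the flagged bands rather than trusting it uniformly across the spectrum.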
-
Publication number: 20170278509
Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes: obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon; obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing those variations of speech data; constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
Type: Application
Filed: June 13, 2017
Publication date: September 28, 2017
Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
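A toy version of the word-graph check: build edges between consecutive words across all recognized variants of a sentence, then test whether the sentence survives as a path. The graph construction here is a simplification of a real recognition lattice, and the example hypotheses are invented.

```python
from collections import defaultdict

def build_word_graph(hypotheses):
    # Edges between consecutive words across all recognized variants.
    edges = defaultdict(set)
    for hyp in hypotheses:
        words = hyp.split()
        for a, b in zip(words, words[1:]):
            edges[a].add(b)
    return edges

def covered(sentence, edges):
    # True when every consecutive word pair of the test sentence appears as
    # an edge, i.e. the sentence is a path through the word graph.
    words = sentence.split()
    return all(b in edges[a] for a, b in zip(words, words[1:]))

# Recognized texts for speech variants of one test sentence (toy data):
hyps = ["please call the office", "please call the oyster",
        "lease call the office"]
graph = build_word_graph(hyps)
print(covered("please call the office", graph))   # True
print(covered("please call this office", graph))  # False: "this" never appears
```

Words of a test sentence that never appear on a path are candidates for lexicon entries whose pronunciations the recognizer cannot actually match.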
-
Publication number: 20170278524
Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One method includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric that indicates which frequency bands are susceptible to spatial aliasing; enhancing the speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
Type: Application
Filed: February 23, 2017
Publication date: September 28, 2017
Inventors: Takashi Fukuda, Osamu Ichikawa
-
Publication number: 20170243113
Abstract: A method for training a neural network having a plurality of filters for extracting local features, performed by a computing device. The computing device calculates a plurality of projection parameter sets by analyzing one or more training data. The projection parameter sets define a projection of each training datum into a new space, and each projection parameter set has the same size as the filters in the neural network. At least part of the projection parameter sets is set as the initial parameters of at least part of the filters in the neural network for training.
Type: Application
Filed: February 24, 2016
Publication date: August 24, 2017
Inventors: Takashi Fukuda, Osamu Ichikawa
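One plausible reading of "projection parameter sets" is the principal directions of the training patches: each direction has the same size as a filter and projects data into a new space, so the top directions can seed part of the filter bank. A sketch under that assumption, with invented sizes:

```python
import numpy as np

def projection_filters(patches, n_filters):
    # Top principal directions of the (flattened) training patches, computed
    # via SVD of the centered patch matrix; each row is filter-sized.
    centered = patches - patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:n_filters]

rng = np.random.default_rng(5)
patches = rng.standard_normal((500, 25))      # 500 flattened 5x5 patches
filters = rng.standard_normal((8, 25)) * 0.1  # randomly initialized bank
init = projection_filters(patches, 4)
filters[:4] = init                            # seed part of the bank only

# SVD rows are orthonormal, so the seeded filters are unit-norm projections:
print(np.allclose(np.linalg.norm(filters[:4], axis=1), 1.0))
```

The remaining filters keep their random initialization, matching the abstract's "at least part of" phrasing; normal training then updates all of them.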
-
Patent number: 9734821
Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes: obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon; obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing those variations of speech data; constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
Type: Grant
Filed: June 30, 2015
Date of Patent: August 15, 2017
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
-
Patent number: 9640197
Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One method includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric that indicates which frequency bands are susceptible to spatial aliasing; enhancing the speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
Type: Grant
Filed: March 22, 2016
Date of Patent: May 2, 2017
Assignee: International Business Machines Corporation
Inventors: Takashi Fukuda, Osamu Ichikawa
-
Publication number: 20170052758
Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitizing audio signals having limited frequency bands at a sampling frequency more than twice the maximum frequency component of the audio signal, and detecting a clipping event of the audio signals based on the spectral magnitudes in a bandwidth greater than or equal to the limited frequency band. The sampling frequency may be at least three times the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether the sum or average of the spectral magnitudes in the bandwidth greater than or equal to the limited frequency band is larger than a predetermined threshold.
Type: Application
Filed: August 18, 2015
Publication date: February 23, 2017
Inventors: Takashi Fukuda, Osamu Ichikawa
-
Publication number: 20170004823
Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system. The method includes: obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon; obtaining variations of speech data corresponding to each test sentence, and obtaining a plurality of texts by recognizing those variations of speech data; constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon; and determining whether all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
Type: Application
Filed: June 30, 2015
Publication date: January 5, 2017
Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
-
Patent number: 9238436
Abstract: A joint portion is provided on an upper case of a mirror unit tilting mechanism so as to protrude therefrom. A fitting portion is formed on the pivot plate side to correspond to the joint portion. The joint portion is formed into a hollow hemisphere shape; a support shaft is provided upright at the center of its outer wall, and a spherical portion is formed in the outer wall. The fitting portion has a hollow hemispherical dome shape and includes a spherical side wall portion and a ceiling portion with a wave-shaped cross section. The fitting portion is lightly press-fitted onto the joint portion in a one-touch insertion operation. The fitting portion and the joint portion are swingably connected to each other with the curved surfaces of the spherical portion and the side wall portion held in contact.
Type: Grant
Filed: September 29, 2010
Date of Patent: January 19, 2016
Assignee: MITSUBA CORPORATION
Inventors: Masaru Chino, Osamu Ichikawa, Yukinori Suto, Yoshitaka Kaneko