Patents by Inventor Osamu Ichikawa

Osamu Ichikawa has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190220747
    Abstract: A technique is provided for training a neural network that includes an input layer, one or more hidden layers, and an output layer; the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least one pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets, and pre-training of the neural network is then performed.
    Type: Application
    Filed: April 9, 2019
    Publication date: July 18, 2019
    Inventors: Takashi Fukuda, Osamu Ichikawa
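As a rough illustration of the decomposition step in publication 20190220747, a pre-trained layer's weight matrix can be split into two stacked matrices, for example via SVD, so the network gains a hidden layer while initially computing the same linear map. This is only a sketch: it ignores the activation function between the new layers, and the patent does not specify SVD as the decomposition.

```python
import numpy as np

def decompose_layer(W, b):
    """Split one pre-trained layer's weights W (out x in) into two stacked
    layers via SVD, so that W2 @ (W1 @ x + b1) + b2 == W @ x + b."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    root = np.diag(np.sqrt(s))
    W1, b1 = root @ Vt, np.zeros(Vt.shape[0])   # new lower hidden layer
    W2, b2 = U @ root, b                        # new upper hidden layer
    return (W1, b1), (W2, b2)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))   # a pre-trained 6-in / 4-out hidden layer
b = rng.standard_normal(4)
(W1, b1), (W2, b2) = decompose_layer(W, b)
x = rng.standard_normal(6)
```

Per the abstract, the enlarged network would then go through further pre-training rather than being used as-is.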
  • Patent number: 10346125
    Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitizing audio signals having limited frequency bands at a sampling frequency greater than twice the maximum frequency component of the audio signal, and detecting a clipping event of the audio signals based on spectral magnitudes in a bandwidth at or above the limited frequency band. The sampling frequency may be greater than or equal to three times the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether a sum or average of the spectral magnitudes in the bandwidth at or above the limited frequency band exceeds a predetermined threshold.
    Type: Grant
    Filed: August 18, 2015
    Date of Patent: July 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa
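The idea behind patent 10346125 can be sketched as follows: an oversampled, band-limited signal leaves the spectrum above its band limit nearly empty, and hard clipping injects harmonics into that empty region. The window, frame length, and threshold below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def detect_clipping(frame, fs, band_limit, threshold):
    """Flag a frame as clipped when the summed spectral magnitude above
    the signal's band limit exceeds a threshold: clipping generates
    harmonics that leak into the otherwise-empty upper band."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return spec[freqs >= band_limit].sum() > threshold

# A 1 kHz tone, band-limited below 2 kHz, sampled at 16 kHz (>> 2 x 2 kHz)
fs, band_limit = 16000, 2000.0
t = np.arange(1024) / fs
clean = 0.5 * np.sin(2 * np.pi * 1000 * t)
clipped = np.clip(4.0 * clean, -1.0, 1.0)   # hard clipping adds harmonics
```

The clean frame produces almost no out-of-band energy, while the clipped one does, which is what the per-frame sum-versus-threshold test in the abstract exploits.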
  • Publication number: 20190206394
    Abstract: Acoustic change is detected by a method including preparing a first Gaussian Mixture Model (GMM) trained with first audio data of first speech sound from a speaker at a first distance from an audio interface and a second GMM generated from the first GMM using second audio data of second speech sound from the speaker at a second distance from the audio interface; calculating a first output of the first GMM and a second output of the second GMM by inputting obtained third audio data into the first GMM and the second GMM; and transmitting a notification in response to determining at least that a difference between the first output and the second output exceeds a threshold. Each Gaussian distribution of the second GMM has a mean obtained by shifting a mean of a corresponding Gaussian distribution of the first GMM by a common channel bias.
    Type: Application
    Filed: January 3, 2018
    Publication date: July 4, 2019
    Inventors: Osamu Ichikawa, Gakuto Kurata, Takashi Fukuda
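A minimal sketch of the two-GMM comparison in publication 20190206394, using diagonal-covariance GMMs: the second GMM is the first with every component mean shifted by a common channel bias, and a notification fires when the difference between the two average log-likelihoods on incoming frames exceeds a threshold. The component means, bias, and threshold here are made-up values for illustration.

```python
import numpy as np

def avg_loglik(x, weights, means, covs):
    """Average per-frame log-likelihood of frames x (T x D) under a
    diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None, :, :]                   # T x K x D
    comp = np.log(weights) - 0.5 * (diff**2 / covs
                                    + np.log(2 * np.pi * covs)).sum(-1)
    m = comp.max(axis=1, keepdims=True)                        # logsumexp
    return float(np.mean(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

weights = np.full(3, 1.0 / 3)
means1 = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])  # first GMM
covs = np.full((3, 2), 0.25)
bias = np.array([3.0, 3.0])        # common channel bias
means2 = means1 + bias             # second GMM: every mean shifted by bias

# Frames from a speaker at the second distance (matches the shifted GMM)
rng = np.random.default_rng(1)
x = means2[rng.integers(3, size=200)] + 0.5 * rng.standard_normal((200, 2))
out1 = avg_loglik(x, weights, means1, covs)
out2 = avg_loglik(x, weights, means2, covs)
notify = (out2 - out1) > 1.0       # threshold 1.0: change detected
```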
  • Publication number: 20190080684
    Abstract: A computer-implemented method for processing a speech signal includes: identifying speech segments in an input speech signal; calculating an upper variance and a lower variance, the upper variance being the variance of upper spectra larger than a criterion among the speech spectra corresponding to frames in the speech segments, and the lower variance being the variance of lower spectra smaller than the criterion; determining whether the input speech signal is a special input speech signal using the difference between the upper variance and the lower variance; and performing speech recognition of an input speech signal that has been determined to be a special input speech signal, using a special acoustic model for the special input speech signal.
    Type: Application
    Filed: September 14, 2017
    Publication date: March 14, 2019
    Inventors: Osamu Ichikawa, Takashi Fukuda, Gakuto Kurata, Bhuvana Ramabhadran
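The upper/lower-variance test in publication 20190080684 might look like the sketch below. The choice of the overall mean as the criterion, the 0.5 decision threshold, and the synthetic spectra are all assumptions for illustration.

```python
import numpy as np

def upper_lower_variance_gap(speech_spectra, criterion=None):
    """Variance of spectral values above a criterion minus the variance
    of values below it. Using the overall mean as the criterion is an
    assumed choice; the abstract only says 'a criterion'."""
    flat = np.asarray(speech_spectra).ravel()
    if criterion is None:
        criterion = flat.mean()
    return flat[flat > criterion].var() - flat[flat < criterion].var()

rng = np.random.default_rng(0)
skewed = rng.exponential(1.0, size=(100, 64))      # special-style spectra
symmetric = rng.normal(0.0, 1.0, size=(100, 64))   # ordinary spectra
gap_special = upper_lower_variance_gap(skewed)
gap_ordinary = upper_lower_variance_gap(symmetric)
is_special = gap_special > 0.5     # route to the special acoustic model
```

A skewed spectral distribution spreads its upper values more than its lower ones, so the gap separates the two signal types even though both variances alone might look similar.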
  • Patent number: 10217456
    Abstract: A method and system for generating training data for a target domain using speech data of a source domain. The training data generation method includes: reading out a Gaussian mixture model (GMM) of the target domain trained with a clean speech data set of the target domain; mapping, by referring to the GMM of the target domain, a set of source domain speech data received as input to a set of target domain speech data on the basis of a channel characteristic of the target domain speech data; and adding noise of the target domain to the mapped set of source domain speech data to output a set of pseudo target domain speech data.
    Type: Grant
    Filed: April 14, 2014
    Date of Patent: February 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Osamu Ichikawa, Steven J Rennie
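One heavily simplified reading of the mapping step in patent 10217456: estimate the channel offset between the target GMM's overall mean and the source data's mean, shift the source features by it, and add target-domain noise. The real method maps by referring to the GMM; this global-mean version is only a stand-in, and all values below are invented.

```python
import numpy as np

def make_pseudo_target(source, tgt_means, tgt_weights, tgt_noise, rng):
    """Shift source features by the offset between the target GMM's
    overall mean and the source mean (a crude stand-in for the patent's
    GMM-based mapping), then add sampled target-domain noise frames."""
    channel = tgt_weights @ tgt_means - source.mean(axis=0)
    mapped = source + channel
    picks = rng.integers(len(tgt_noise), size=len(mapped))
    return mapped + tgt_noise[picks]

rng = np.random.default_rng(0)
source = rng.normal(5.0, 1.0, size=(500, 4))       # source-domain features
tgt_means = np.array([[0.0, 0, 0, 0], [2.0, 2, 2, 2]])
tgt_weights = np.array([0.5, 0.5])                 # target GMM parameters
noise = 0.1 * rng.standard_normal((1000, 4))       # target-domain noise
pseudo = make_pseudo_target(source, tgt_means, tgt_weights, noise, rng)
```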
  • Publication number: 20190012594
    Abstract: A technique is provided for training a neural network that includes an input layer, one or more hidden layers, and an output layer; the trained neural network can be used to perform a task such as speech recognition. In the technique, a base of the neural network having at least one pre-trained hidden layer is prepared. A parameter set associated with one pre-trained hidden layer in the neural network is decomposed into a plurality of new parameter sets. The number of hidden layers in the neural network is increased by using the plurality of new parameter sets, and pre-training of the neural network is then performed.
    Type: Application
    Filed: July 5, 2017
    Publication date: January 10, 2019
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20180350347
    Abstract: A method, computer system, and computer program product for generating a plurality of voice data having a particular speaking style are provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.
    Type: Application
    Filed: May 31, 2017
    Publication date: December 6, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
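The processing chain in publication 20180350347 (attenuate low and high frequency components, then reduce power at the beginning and end) can be sketched as below. The cutoff frequencies, the 0.1 attenuation factor, and the 5% fade length are illustrative assumptions.

```python
import numpy as np

def stylize_voice(x, fs, low_cut=300.0, high_cut=3400.0, edge=0.05):
    """Attenuate spectral components below low_cut and above high_cut
    (cutoffs and the 0.1 factor are assumed values), then fade the first
    and last `edge` fraction of samples to reduce edge power."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < low_cut) | (freqs > high_cut)] *= 0.1
    y = np.fft.irfft(spec, n=len(x))
    n = int(edge * len(x))
    ramp = np.linspace(0.0, 1.0, n)
    y[:n] *= ramp                  # reduce power at the beginning...
    y[-n:] *= ramp[::-1]           # ...and at the end
    return y

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
y = stylize_voice(x, fs)
spec_y = np.abs(np.fft.rfft(y))    # bin k is k Hz for this 1 s signal
```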
  • Publication number: 20180350348
    Abstract: A method, computer system, and computer program product for generating a plurality of voice data having a particular speaking style are provided. The present invention may include preparing a plurality of original voice data corresponding to at least one word or at least one phrase. The present invention may also include attenuating a low frequency component and a high frequency component in the prepared plurality of original voice data. The present invention may then include reducing power at a beginning and an end of the prepared plurality of original voice data. The present invention may further include storing a plurality of resultant voice data obtained after the attenuating and the reducing.
    Type: Application
    Filed: December 28, 2017
    Publication date: December 6, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Masayuki Suzuki
  • Publication number: 20180277104
    Abstract: A computer-implemented method is provided. The computer-implemented method is performed by a speech recognition system having at least a processor. The method includes estimating sound identification information from a neural network having periodic indications and components of a frequency spectrum of an audio signal data inputted thereto. The method further includes performing a speech recognition operation on the audio signal data to decode the audio signal data into a textual representation based on the estimated sound identification information. The neural network includes a plurality of fully-connected network layers having a first layer that includes a plurality of first nodes and a plurality of second nodes. The method further comprises training the neural network by initially isolating the periodic indications from the components of the frequency spectrum in the first layer by setting weights between the first nodes and a plurality of input nodes corresponding to the periodic indications to 0.
    Type: Application
    Filed: May 30, 2018
    Publication date: September 27, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
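The isolation step in publication 20180277104, setting the weights between the first nodes and the periodic-indication input nodes to 0 at initialization, can be sketched as follows; the layer sizes are arbitrary example values.

```python
import numpy as np

def first_layer_with_isolation(n_spec, n_periodic, n_first, n_second, rng):
    """First fully-connected layer over the input [spectrum ; periodic
    indications]. The 'first nodes' start isolated from the periodic
    indications: those weights are initialized to 0."""
    W = 0.1 * rng.standard_normal((n_first + n_second, n_spec + n_periodic))
    W[:n_first, n_spec:] = 0.0     # first nodes see only the spectrum
    return W

rng = np.random.default_rng(0)
n_spec, n_first = 40, 64
W = first_layer_with_isolation(n_spec, 10, n_first, 16, rng)
spectrum = rng.standard_normal(n_spec)
x1 = np.concatenate([spectrum, rng.standard_normal(10)])
x2 = np.concatenate([spectrum, rng.standard_normal(10)])
```

Because the zeroed block cuts every path from the periodic inputs to the first nodes, those activations depend only on the spectrum at the start of training; gradient updates can later relax the isolation.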
  • Publication number: 20180247641
    Abstract: A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of an audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
    Type: Application
    Filed: February 24, 2017
    Publication date: August 30, 2018
    Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
  • Patent number: 10062378
    Abstract: A computer-implemented method and an apparatus are provided. The method includes obtaining, by a processor, a frequency spectrum of an audio signal data. The method further includes extracting, by the processor, periodic indications from the frequency spectrum. The method also includes inputting, by the processor, the periodic indications and components of the frequency spectrum into a neural network. The method additionally includes estimating, by the processor, sound identification information from the neural network.
    Type: Grant
    Filed: February 24, 2017
    Date of Patent: August 28, 2018
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Bhuvana Ramabhadran
  • Patent number: 9818428
    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric, wherein the aliasing metric indicates which frequency bands are susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
    Type: Grant
    Filed: February 23, 2017
    Date of Patent: November 14, 2017
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa
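A plausible form of the aliasing metric in patent 9818428 for a single microphone pair: the inter-microphone phase difference can wrap above roughly c / (2d) for spacing d, so frequency bins above that limit are flagged as susceptible to spatial aliasing. The patent's exact definition may differ; this is the textbook spatial-sampling bound.

```python
import numpy as np

def aliasing_metric(freqs, mic_distance, c=343.0):
    """Per-bin flag: 1.0 where the frequency is susceptible to spatial
    aliasing for a microphone pair with the given spacing. Above
    c / (2 * d) the inter-mic phase difference becomes ambiguous."""
    return (freqs >= c / (2.0 * mic_distance)).astype(float)

freqs = np.fft.rfftfreq(512, d=1.0 / 16000)          # bins up to 8 kHz
metric = aliasing_metric(freqs, mic_distance=0.05)   # 5 cm spacing
# Aliasing limit for 5 cm spacing: 343 / 0.1 = 3430 Hz
```

Feeding this flag to the probability model alongside the enhanced signals lets it discount direction-of-arrival cues in the bands where they are unreliable.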
  • Publication number: 20170278509
    Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon, and determining whether or not all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
    Type: Application
    Filed: June 13, 2017
    Publication date: September 28, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
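The word-graph check in publication 20170278509 might be sketched as follows, with the graph reduced to adjacency sets over consecutive words of each recognition hypothesis (a simplification of a real word graph or lattice; the example sentences are invented).

```python
from collections import defaultdict

def build_word_graph(recognized_texts):
    """Word graph as adjacency sets over consecutive words of each
    recognition hypothesis, with sentence-boundary sentinels."""
    graph = defaultdict(set)
    for text in recognized_texts:
        words = ["<s>"] + text.split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    return graph

def sentence_in_graph(graph, sentence):
    """True when every consecutive word pair of the test sentence is an
    edge, i.e. the sentence survives as a path through the hypotheses."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    return all(b in graph[a] for a, b in zip(words, words[1:]))

# Hypotheses from recognizing noisy variants of one test sentence
hyps = ["call the main office", "call a main office", "call the main off is"]
g = build_word_graph(hyps)
```

A lexicon word whose test sentence never survives as a path is a candidate for a bad pronunciation entry, which is the signal the method is after.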
  • Publication number: 20170278524
    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric, wherein the aliasing metric indicates which frequency bands are susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
    Type: Application
    Filed: February 23, 2017
    Publication date: September 28, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20170243113
    Abstract: A method, performed by a computing device, for training a neural network having a plurality of filters for extracting local features is disclosed. The computing device calculates a plurality of projection parameter sets by analyzing one or more training data. The projection parameter sets define a projection of each training datum into a new space, and each projection parameter set has the same size as the filters in the neural network. At least part of the projection parameter sets is set as initial parameters of at least part of the filters in the neural network for training.
    Type: Application
    Filed: February 24, 2016
    Publication date: August 24, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa
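The abstract of publication 20170243113 leaves the analysis step unspecified; PCA over local patches is one plausible instantiation (an assumption, not the patent's stated method), with each principal direction, the same size as a filter, used as that filter's initial weights.

```python
import numpy as np

def pca_filter_init(patches, n_filters):
    """Compute projection parameter sets by PCA over training patches and
    return the top n_filters directions, each the size of one filter."""
    X = patches - patches.mean(axis=0)          # center the training data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_filters]                       # row k initializes filter k

rng = np.random.default_rng(0)
patches = rng.standard_normal((1000, 25))       # flattened 5x5 local patches
filters = pca_filter_init(patches, n_filters=8)
```

Initializing filters with data-derived projections rather than random noise gives the network a first layer that already responds to the dominant local structure of the training data.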
  • Patent number: 9734821
    Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon, and determining whether or not all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
    Type: Grant
    Filed: June 30, 2015
    Date of Patent: August 15, 2017
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
  • Patent number: 9640197
    Abstract: Methods and systems are provided for separating a target speech from a plurality of other speeches having different directions of arrival. One of the methods includes: obtaining speech signals from speech input devices disposed at predetermined distances from one another; calculating a direction of arrival of the target speeches and directions of arrival of the other speeches for each of at least one pair of speech input devices; calculating an aliasing metric, wherein the aliasing metric indicates which frequency bands are susceptible to spatial aliasing; enhancing speech signals arriving from the direction of arrival of the target speeches, based on the speech signals and that direction of arrival, to generate enhanced speech signals; reading a probability model; and inputting the enhanced speech signals and the aliasing metric to the probability model to output the target speeches.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: May 2, 2017
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20170052758
    Abstract: A method, a system, and a computer program product detect a clipping event in audio signals. The method includes digitizing audio signals having limited frequency bands at a sampling frequency greater than twice the maximum frequency component of the audio signal, and detecting a clipping event of the audio signals based on spectral magnitudes in a bandwidth at or above the limited frequency band. The sampling frequency may be greater than or equal to three times the maximum frequency component of the audio signal. The detection of a clipping event may include determining, for each frame, whether a sum or average of the spectral magnitudes in the bandwidth at or above the limited frequency band exceeds a predetermined threshold.
    Type: Application
    Filed: August 18, 2015
    Publication date: February 23, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa
  • Publication number: 20170004823
    Abstract: A method for testing words defined in a pronunciation lexicon used in an automatic speech recognition (ASR) system is provided. The method includes obtaining test sentences that can be accepted by a language model used in the ASR system, where the test sentences cover the words defined in the pronunciation lexicon. The method further includes obtaining variations of speech data corresponding to each test sentence and obtaining a plurality of texts by recognizing the variations of speech data. The method also includes constructing a word graph from the plurality of texts for each test sentence, where each word in the word graph corresponds to a word defined in the pronunciation lexicon, and determining whether or not all or part of the words in a test sentence are present in a path of the word graph derived from that test sentence.
    Type: Application
    Filed: June 30, 2015
    Publication date: January 5, 2017
    Inventors: Takashi Fukuda, Osamu Ichikawa, Futoshi Iwama
  • Patent number: 9238436
    Abstract: A joint portion is provided on an upper case of a mirror unit tilting mechanism so as to protrude therefrom. A fitting portion is formed on the pivot plate side so as to correspond to the joint portion. The joint portion is formed into a hollow hemisphere shape. A support shaft is provided upright at a center portion of an outer wall, and a spherical portion is formed in the outer wall. The fitting portion has a hollow hemisphere dome shape and includes a spherical side wall portion and a ceiling portion which has a wave shape in cross section. The fitting portion is lightly press-fitted to the joint portion through a one-touch insertion operation. The fitting portion and the joint portion are swingably connected to each other in a state in which the curved surfaces of the spherical portion and the side wall portion are held in contact with each other.
    Type: Grant
    Filed: September 29, 2010
    Date of Patent: January 19, 2016
    Assignee: MITSUBA CORPORATION
    Inventors: Masaru Chino, Osamu Ichikawa, Yukinori Suto, Yoshitaka Kaneko