Patents by Inventor Masatsune Tamura
Masatsune Tamura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9280967
Abstract: According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction unit, and an utterance style estimation unit. The model storage unit is configured to store a model trained on a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of each sentence of the document to be read.
Type: Grant
Filed: September 14, 2011
Date of Patent: March 8, 2016
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kosei Fume, Masaru Suzuki, Masahiro Morita, Kentaro Tachibana, Kouichirou Mori, Yuji Shimizu, Takehiko Kagoshima, Masatsune Tamura, Tomohiro Yamasaki
-
Publication number: 20160012035
Abstract: According to an embodiment, a device includes a table creator, an estimator, and a dictionary creator. The table creator is configured to create a table based on similarity between distributions of nodes of speech synthesis dictionaries of a specific speaker in respective first and second languages. The estimator is configured to estimate a matrix to transform the speech synthesis dictionary of the specific speaker in the first language to a speech synthesis dictionary of a target speaker in the first language, based on speech and a recorded text of the target speaker in the first language and the speech synthesis dictionary of the specific speaker in the first language. The dictionary creator is configured to create a speech synthesis dictionary of the target speaker in the second language, based on the table, the matrix, and the speech synthesis dictionary of the specific speaker in the second language.
Type: Application
Filed: July 9, 2015
Publication date: January 14, 2016
Inventors: Kentaro Tachibana, Masatsune Tamura, Yamato Ohtani
-
Publication number: 20150325232
Abstract: According to an embodiment, a speech synthesizer includes a source generator, a phase modulator, and a vocal tract filter unit. The source generator generates a source signal by using a fundamental frequency sequence and a pulse signal. The phase modulator modulates, with respect to the source signal generated by the source generator, a phase of the pulse signal at each pitch mark based on audio watermarking information. The vocal tract filter unit generates a speech signal by using a spectrum parameter sequence with respect to the source signal in which the phase of the pulse signal is modulated by the phase modulator.
Type: Application
Filed: July 16, 2015
Publication date: November 12, 2015
Applicant: Kabushiki Kaisha Toshiba
Inventors: Kentaro Tachibana, Takehiko Kagoshima, Masatsune Tamura, Masahiro Morita
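The abstract above pairs a pulse source with phase modulation keyed to audio watermarking information. The patent does not disclose the exact modulation or detection rule, so the sketch below is only one illustrative reading: `embed_bit_in_pulse`, `detect_bit`, and the ±π/2 phase rotation are all assumptions, not the claimed method.

```python
import numpy as np

def embed_bit_in_pulse(pulse, bit, shift=np.pi / 2):
    """Rotate the phase spectrum of one pitch-mark pulse to carry a watermark bit.

    The Fourier phase is rotated by +shift for bit 1 and -shift for bit 0; the
    magnitude spectrum (and hence the later vocal tract filtering) is untouched.
    """
    spec = np.fft.rfft(pulse)
    spec = spec * np.exp(1j * (shift if bit else -shift))
    # Force the DC and Nyquist bins real so the inverse transform stays real-valued.
    spec[0] = spec[0].real
    spec[-1] = spec[-1].real
    return np.fft.irfft(spec, n=len(pulse))

def detect_bit(original, received):
    """Read the bit back from the sign of the magnitude-weighted phase shift."""
    corr = np.sum(np.conj(np.fft.rfft(original)) * np.fft.rfft(received))
    return int(corr.imag > 0)

# One pitch period of a crude glottal-pulse stand-in.
pulse = np.hanning(64) * np.sin(np.linspace(0, np.pi, 64))
marked = embed_bit_in_pulse(pulse, 1)
```

Because only the phase is altered, the watermark is inaudible in the magnitude spectrum yet detectable by comparing against an unmarked pulse.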
-
Patent number: 9135910
Abstract: According to an embodiment, a speech synthesis device includes a first storage, a second storage, a first generator, a second generator, a third generator, and a fourth generator. The first storage is configured to store therein first information obtained from a target uttered voice. The second storage is configured to store therein second information obtained from an arbitrary uttered voice. The first generator is configured to generate third information by converting the second information so as to be close to a target voice quality or prosody. The second generator is configured to generate an information set including the first information and the third information. The third generator is configured to generate fourth information used to generate a synthesized speech, based on the information set. The fourth generator is configured to generate the synthesized speech corresponding to input text using the fourth information.
Type: Grant
Filed: February 12, 2013
Date of Patent: September 15, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Masahiro Morita
-
Patent number: 9110887
Abstract: According to one embodiment, a speech synthesis apparatus includes a language analyzer, statistical model storage, model selector, parameter generator, basis model storage, and filter processor. The language analyzer analyzes text data and outputs language information data that represents linguistic information of the text data. The statistical model storage stores statistical models prepared by statistically modeling acoustic information included in speech. The model selector selects a statistical model from the models based on the language information data. The parameter generator generates speech parameter sequences using the statistical model selected by the model selector. The basis model storage stores a basis model including basis vectors, each of which expresses speech information for a limited frequency range. The filter processor outputs synthetic speech by executing filter processing of the speech parameter sequences and the basis model.
Type: Grant
Filed: December 26, 2012
Date of Patent: August 18, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Yamato Ohtani, Masatsune Tamura, Masahiro Morita
-
Patent number: 9058807
Abstract: According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. An extraction unit extracts, for each pitch mark, the n band noise signals while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals, and superimposes the mixed sound source signal based on the pitch marks. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
Type: Grant
Filed: March 18, 2011
Date of Patent: June 16, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
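The core band-mixing step can be sketched as a weighted sum of pre-filtered band signals. This is a minimal illustration, assuming band noise intensities in [0, 1] and already-extracted band signals; the function name and weighting convention are illustrative, not from the patent.

```python
import numpy as np

def mixed_excitation(band_noise, band_pulse, noise_intensity):
    """Mix n band noise signals and n band pulse signals into one source signal.

    band_noise, band_pulse: arrays of shape (n, T), one band-limited signal per row.
    noise_intensity: length-n weights in [0, 1]; 1 makes a band pure noise,
    0 makes it a pure pulse, following the band-noise-intensity idea above.
    """
    w = np.asarray(noise_intensity, dtype=float)[:, None]
    # Scale each band's noise up and its pulse down, then sum across bands.
    return np.sum(w * band_noise + (1.0 - w) * band_pulse, axis=0)

rng = np.random.default_rng(0)
n_bands, T = 4, 200
band_noise = rng.standard_normal((n_bands, T)) * 0.1
band_pulse = np.zeros((n_bands, T))
band_pulse[:, ::50] = 1.0          # pulses at every pitch mark (period 50 samples)
source = mixed_excitation(band_noise, band_pulse, [0.9, 0.6, 0.3, 0.0])
```

Low noise intensities keep low bands pulse-dominated (voiced) while high intensities make high bands noisy, which is the usual motivation for mixed excitation.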
-
Publication number: 20150081306
Abstract: According to an embodiment, a prosody editing device includes an approximate contour generator, a setter, a display controller, an operation receiver, and an updater. The approximate contour generator approximates a contour representing a time series of prosody information with a parametric curve including a control point to generate an approximate contour. The setter sets, on the approximate contour, an operation point corresponding to the control point. The display controller displays, on a display device, an operation screen including the approximate contour on which the operation point is shown. The operation receiver receives an operation to move an operation point arbitrarily selected on the operation screen. The updater calculates a position of the control point from the movement amount of the operation point and updates the approximate contour.
Type: Application
Filed: September 2, 2014
Publication date: March 19, 2015
Inventors: Kouichirou Mori, Yu Nasu, Masatsune Tamura, Masahiro Morita
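The edit loop described above can be sketched with a quadratic Bézier curve standing in for the parametric curve, simplified so the operation point coincides directly with a control point (the patent leaves the curve family and the point mapping unspecified; every name here is hypothetical).

```python
import numpy as np

def bezier_contour(control_points, num=100):
    """Evaluate a quadratic Bezier curve approximating an F0 contour.

    control_points: three (time, value) pairs; returns an array of shape (num, 2).
    """
    p = np.asarray(control_points, dtype=float)
    t = np.linspace(0.0, 1.0, num)[:, None]
    return (1 - t) ** 2 * p[0] + 2 * (1 - t) * t * p[1] + t ** 2 * p[2]

# Approximate a rise-fall pitch contour, then "drag" the middle operation
# point: moving it updates the control point and regenerates the curve.
ctrl = [(0.0, 120.0), (0.5, 180.0), (1.0, 110.0)]
contour = bezier_contour(ctrl)
ctrl[1] = (0.5, 200.0)             # user drags the mid operation point upward
updated = bezier_contour(ctrl)
```

Editing a handful of control points instead of every frame of the prosody time series is exactly what makes this kind of contour editor tractable.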
-
Publication number: 20140257816
Abstract: According to an embodiment, a speech synthesis dictionary modification device includes an extracting unit, a display unit, an acquiring unit, a modification unit, and an updating unit. The extracting unit extracts synthesis information containing a feature sequence of a synthetic speech from the synthetic speech generated by using a speech synthesis dictionary containing probability distributions of speech features. The display unit displays an image prompting the user to modify a probability distribution contained in the speech synthesis dictionary, on the basis of the synthesis information extracted by the extracting unit. The acquiring unit acquires an instruction to modify the probability distribution contained in the speech synthesis dictionary. The modification unit modifies the probability distribution contained in the speech synthesis dictionary according to the instruction.
Type: Application
Filed: January 31, 2014
Publication date: September 11, 2014
Applicant: Kabushiki Kaisha Toshiba
Inventors: Ryo Morinaka, Masatsune Tamura, Masahiro Morita
-
Publication number: 20130262087
Abstract: According to one embodiment, a speech synthesis apparatus includes a language analyzer, statistical model storage, model selector, parameter generator, basis model storage, and filter processor. The language analyzer analyzes text data and outputs language information data that represents linguistic information of the text data. The statistical model storage stores statistical models prepared by statistically modeling acoustic information included in speech. The model selector selects a statistical model from the models based on the language information data. The parameter generator generates speech parameter sequences using the statistical model selected by the model selector. The basis model storage stores a basis model including basis vectors, each of which expresses speech information for a limited frequency range. The filter processor outputs synthetic speech by executing filter processing of the speech parameter sequences and the basis model.
Type: Application
Filed: December 26, 2012
Publication date: October 3, 2013
Applicant: Kabushiki Kaisha Toshiba
Inventors: Yamato Ohtani, Masatsune Tamura, Masahiro Morita
-
Patent number: 8438033
Abstract: A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech, and stores, in a voice conversion rule memory, a voice conversion rule for converting the voice quality of source speech into that of the target speech. The apparatus extracts a source speech spectral parameter from input source speech, converts the extracted parameter into a first conversion spectral parameter by using the voice conversion rule, selects a target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory, generates an aperiodic component spectral parameter from the selected target speech spectral parameter, mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter to obtain a second conversion spectral parameter, and generates a speech waveform from the second conversion spectral parameter.
Type: Grant
Filed: July 20, 2009
Date of Patent: May 7, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
-
Publication number: 20120323569
Abstract: According to one embodiment, a speech processing apparatus includes a histogram calculation unit, a cumulative frequency calculation unit, and a filter production unit. The histogram calculation unit is configured to calculate a first histogram from a first speech feature extracted from speech data, and to calculate a second histogram from a second speech feature different from the first speech feature. The cumulative frequency calculation unit is configured to calculate a first cumulative frequency by accumulating the frequencies of the first histogram, and to calculate a second cumulative frequency by accumulating the frequencies of the second histogram. The filter production unit is configured to produce a filter having a characteristic that brings the second cumulative frequency close to the first cumulative frequency.
Type: Application
Filed: March 15, 2012
Publication date: December 20, 2012
Applicant: Kabushiki Kaisha Toshiba
Inventors: Yamato Ohtani, Masatsune Tamura, Masahiro Morita
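Matching one cumulative histogram to another is quantile (CDF) matching. The sketch below illustrates that reading of the abstract; the bin count, helper names, and NumPy interpolation details are assumptions rather than anything the application specifies.

```python
import numpy as np

def cdf_matching_map(source_feature, target_feature, n_bins=64):
    """Build a mapping that pushes the source feature's distribution toward
    the target's by aligning the two cumulative histograms."""
    lo = min(source_feature.min(), target_feature.min())
    hi = max(source_feature.max(), target_feature.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    src_hist, _ = np.histogram(source_feature, bins=edges)
    tgt_hist, _ = np.histogram(target_feature, bins=edges)
    src_cdf = np.cumsum(src_hist) / src_hist.sum()
    tgt_cdf = np.cumsum(tgt_hist) / tgt_hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])

    def mapper(x):
        # source value -> its quantile -> target value at the same quantile
        q = np.interp(x, centers, src_cdf)
        return np.interp(q, tgt_cdf, centers)

    return mapper

rng = np.random.default_rng(1)
src = rng.normal(0.0, 1.0, 5000)    # e.g. a spectral feature of one voice
tgt = rng.normal(3.0, 0.5, 5000)    # the feature whose distribution we match
mapped = cdf_matching_map(src, tgt)(src)
```

After mapping, the source samples take on roughly the target's mean and spread, which is the property a filter "bringing one cumulative frequency close to the other" would provide.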
-
Patent number: 8321208
Abstract: An information extraction unit extracts L-dimensional spectral envelope information from each frame of speech data by discrete Fourier transform. The spectral envelope information is represented by L points. A basis storage unit stores N bases (L > N > 1). Each basis has its maximum at a different peak frequency within a frequency band of the L-dimensional spectral domain, and its value is zero for frequencies outside that band along the frequency axis. Two frequency bands whose peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes the distortion, over the L points of the spectral envelope information, between the spectral envelope information and a linear combination of the bases by changing the coefficient of each basis, and sets the distortion-minimizing coefficients as the spectral envelope parameter of the spectral envelope information.
Type: Grant
Filed: December 3, 2008
Date of Patent: November 27, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Katsumi Tsuchiya, Takehiko Kagoshima
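The decomposition described above can be sketched with overlapping triangular bases and an ordinary least-squares fit. The patent fixes neither the basis shape nor the optimizer, so the triangles, the even peak spacing, and the helper names below are all assumptions.

```python
import numpy as np

def triangular_bases(n_bases, n_points):
    """N overlapping triangular bases on an L-point frequency axis.

    Each basis peaks at a different frequency, is zero outside its band, and
    bands with adjacent peaks overlap by half their width, as in the abstract.
    """
    peaks = np.linspace(0, n_points - 1, n_bases)
    half = peaks[1] - peaks[0]                  # half-width = peak spacing
    freq = np.arange(n_points)
    bases = np.zeros((n_bases, n_points))
    for i, p in enumerate(peaks):
        bases[i] = np.clip(1.0 - np.abs(freq - p) / half, 0.0, None)
    return bases

L, N = 64, 9
bases = triangular_bases(N, L)
# A smooth spectral-envelope-like target curve over the L points.
envelope = 1.0 + 0.5 * np.cos(np.linspace(0, 2 * np.pi, L))
# Least squares: coefficients minimizing || envelope - coeffs @ bases ||.
coeffs, *_ = np.linalg.lstsq(bases.T, envelope, rcond=None)
recon = coeffs @ bases
```

The N coefficients then serve as a compact spectral envelope parameter: 9 numbers reconstruct the 64-point envelope closely because the overlapping bases tile the frequency axis.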
-
Publication number: 20120239390
Abstract: According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction unit, and an utterance style estimation unit. The model storage unit is configured to store a model trained on a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of each sentence of the document to be read.
Type: Application
Filed: September 14, 2011
Publication date: September 20, 2012
Applicant: Kabushiki Kaisha Toshiba
Inventors: Kosei Fume, Masaru Suzuki, Masahiro Morita, Kentaro Tachibana, Kouichirou Mori, Yuji Shimizu, Takehiko Kagoshima, Masatsune Tamura, Tomohiro Yamasaki
-
Patent number: 8175881
Abstract: A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has prosodic features accordant with or similar to those of the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from the formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment.
Type: Grant
Filed: August 14, 2008
Date of Patent: May 8, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventors: Ryo Morinaka, Masatsune Tamura, Takehiko Kagoshima
-
Publication number: 20120053933
Abstract: According to one embodiment, a first storage unit stores n band noise signals obtained by applying n band-pass filters to a noise signal. A second storage unit stores n band pulse signals. A parameter input unit inputs a fundamental frequency, n band noise intensities, and a spectrum parameter. An extraction unit extracts, for each pitch mark, the n band noise signals while shifting. An amplitude control unit changes amplitudes of the extracted band noise signals and band pulse signals in accordance with the band noise intensities. A generation unit generates a mixed sound source signal by adding the n band noise signals and the n band pulse signals, and superimposes the mixed sound source signal based on the pitch marks. A vocal tract filter unit generates a speech waveform by applying a vocal tract filter using the spectrum parameter to the generated mixed sound source signal.
Type: Application
Filed: March 18, 2011
Publication date: March 1, 2012
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
-
Patent number: 8010362
Abstract: A voice conversion rule and a rule selection parameter are stored. The voice conversion rule converts a spectral parameter vector of a source speaker to a spectral parameter vector of a target speaker. The rule selection parameter represents the spectral parameter vector of the source speaker. A first voice conversion rule for the start time and a second voice conversion rule for the end time of a speech unit of the source speaker are selected by the spectral parameter vectors of the start time and the end time. An interpolation coefficient corresponding to the spectral parameter vector of each time in the speech unit is calculated from the first voice conversion rule and the second voice conversion rule. A third voice conversion rule corresponding to the spectral parameter vector of each time in the speech unit is calculated by interpolating the first voice conversion rule and the second voice conversion rule with the interpolation coefficient.
Type: Grant
Filed: January 22, 2008
Date of Patent: August 30, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Takehiko Kagoshima
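One concrete reading of the interpolation step treats each rule as a linear transform (a matrix W and bias b) and lets the coefficient run linearly over the unit's frames. Both of those choices, and all names below, are assumptions made for illustration.

```python
import numpy as np

def interpolated_rule(rule_start, rule_end, alpha):
    """Blend the start-time and end-time conversion rules with coefficient alpha.

    Each rule is a pair (W, b) applied to a spectral parameter vector as W @ x + b;
    alpha runs from 0 at the unit's start to 1 at its end.
    """
    W = (1.0 - alpha) * rule_start[0] + alpha * rule_end[0]
    b = (1.0 - alpha) * rule_start[1] + alpha * rule_end[1]
    return W, b

def convert_unit(frames, rule_start, rule_end):
    """Convert every frame of a speech unit with the time-varying rule."""
    out = []
    for t, x in enumerate(frames):
        alpha = t / max(len(frames) - 1, 1)
        W, b = interpolated_rule(rule_start, rule_end, alpha)
        out.append(W @ x + b)
    return np.array(out)

dim = 3
rule_a = (np.eye(dim) * 1.2, np.zeros(dim))       # rule selected at unit start
rule_b = (np.eye(dim) * 0.8, np.ones(dim))        # rule selected at unit end
frames = np.ones((5, dim))
converted = convert_unit(frames, rule_a, rule_b)
```

Interpolating the rule rather than switching abruptly avoids discontinuities in the converted spectrum at unit boundaries, which is the motivation the abstract implies.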
-
Publication number: 20100049522
Abstract: A voice conversion apparatus stores, in a parameter memory, target speech spectral parameters of target speech, and stores, in a voice conversion rule memory, a voice conversion rule for converting the voice quality of source speech into that of the target speech. The apparatus extracts a source speech spectral parameter from input source speech, converts the extracted parameter into a first conversion spectral parameter by using the voice conversion rule, selects a target speech spectral parameter similar to the first conversion spectral parameter from the parameter memory, generates an aperiodic component spectral parameter from the selected target speech spectral parameter, mixes a periodic component spectral parameter included in the first conversion spectral parameter with the aperiodic component spectral parameter to obtain a second conversion spectral parameter, and generates a speech waveform from the second conversion spectral parameter.
Type: Application
Filed: July 20, 2009
Publication date: February 25, 2010
Inventors: Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima
-
Patent number: 7630896
Abstract: A speech synthesis system in a preferred embodiment includes a speech unit storage section, a phonetic environment storage section, a phonetic sequence/prosodic information input section, a plural-speech-unit selection section, a fused-speech-unit sequence generation section, and a fused-speech-unit modification/concatenation section. A fused speech unit is generated by fusing a plurality of selected speech units in the fused-speech-unit sequence generation section. In that section, the average power information is calculated for the selected M speech units, N speech units are fused together, and the power information of the fused speech unit is corrected to equal the average power information of the M speech units.
Type: Grant
Filed: September 23, 2005
Date of Patent: December 8, 2009
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Gou Hirabayashi, Takehiko Kagoshima
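The power correction described above reduces to a single scaling. A minimal sketch, assuming "power" means mean squared amplitude and that fusion is simple averaging (both assumptions; the patent uses formant-level fusion elsewhere in this listing):

```python
import numpy as np

def equalize_power(fused_unit, selected_units):
    """Scale a fused speech unit so its power equals the average power
    of the M originally selected units."""
    target_power = np.mean([np.mean(u ** 2) for u in selected_units])
    fused_power = np.mean(fused_unit ** 2)
    return fused_unit * np.sqrt(target_power / fused_power)

rng = np.random.default_rng(2)
units = [rng.standard_normal(80) * s for s in (0.5, 1.0, 1.5)]  # M = 3 units
fused = np.mean(units, axis=0)     # N units fused by simple averaging
corrected = equalize_power(fused, units)
```

Averaging uncorrelated units lowers the waveform's energy, so without this correction fused units would sound quieter than the units they came from; the scaling restores the original loudness.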
-
Patent number: 7580839
Abstract: A speech processing apparatus according to an embodiment of the invention includes a conversion-source-speaker speech-unit database, a voice-conversion-rule-learning-data generating means, and a voice-conversion-rule learning means, with which it makes voice conversion rules. The voice-conversion-rule-learning-data generating means includes a conversion-target-speaker speech-unit extracting means, an attribute-information generating means, a conversion-source-speaker speech-unit database, and a conversion-source-speaker speech-unit selection means.
Type: Grant
Filed: September 19, 2006
Date of Patent: August 25, 2009
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Takehiko Kagoshima
-
Publication number: 20090144053
Abstract: An information extraction unit extracts L-dimensional spectral envelope information from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L > N > 1). Each basis has its maximum at a different peak frequency within a frequency band of the L-dimensional spectral domain, and its value is zero for frequencies outside that band along the frequency axis. Two frequency bands whose peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes the distortion between the spectral envelope information and a linear combination of the bases by changing the coefficient of each basis, and sets the distortion-minimizing coefficients as the spectral envelope parameter of the spectral envelope information.
Type: Application
Filed: December 3, 2008
Publication date: June 4, 2009
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masatsune Tamura, Katsumi Tsuchiya, Takehiko Kagoshima