Patents by Inventor Masami Akamine

Masami Akamine has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11417319
    Abstract: According to one embodiment, a dialogue system includes a setting apparatus and a processing apparatus. The setting apparatus sets in advance a plurality of words that are in impossible combination relationships with each other. The processing apparatus acquires speech of a user and, when a speech recognition result of an object included in the speech contains a combination of words from that plurality, outputs a notification to the user that processing of the object cannot be carried out.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: August 16, 2022
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takami Yoshida, Kenji Iwata, Yuka Kobayashi, Masami Akamine
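The claimed check can be sketched as a lookup against preconfigured incompatible word pairs. The pair set, word lists, and function names below are illustrative assumptions, not taken from the patent:

```python
# Hypothetical sketch: word pairs in "impossible combination" relationships
# are configured in advance; a recognized object is rejected when it
# contains any such pair.
from itertools import combinations

IMPOSSIBLE_PAIRS = {  # illustrative pairs, not from the patent
    frozenset({"deposit", "withdrawal"}),
    frozenset({"open", "close"}),
}

def process_object(recognized_words):
    """Return a notification if the recognized object mixes incompatible words."""
    for a, b in combinations(set(recognized_words), 2):
        if frozenset({a, b}) in IMPOSSIBLE_PAIRS:
            return "Cannot process: the requested words cannot be combined."
    return "OK"
```

The frozenset representation makes the pair check order-independent, matching the symmetric "relationships with each other" wording of the abstract.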
  • Patent number: 11270683
    Abstract: According to one embodiment, an interactive system includes the following units. The knowledge reference unit refers to question-answering knowledge, based on a result of analyzing an input sentence, to acquire a candidate for an answer to the input sentence. The unknown keyword detection unit detects an unknown keyword from the input sentence. The related keyword estimation unit, in response to the detection of the unknown keyword, acquires from predetermined keywords one or more candidates for a related keyword whose meaning is close to the unknown keyword. The response generation unit generates a response to the input sentence based on the one or more candidates for the related keyword when the unknown keyword is detected.
    Type: Grant
    Filed: August 30, 2019
    Date of Patent: March 8, 2022
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kenji Iwata, Hiroshi Fujimura, Yuka Kobayashi, Takami Yoshida, Masami Akamine
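One plausible reading of the related-keyword estimation step is a nearest-neighbor search over word embeddings. The toy vectors, keyword list, and function names below are illustrative assumptions:

```python
import math

# Toy 2-dim embeddings; a real system would use trained word vectors.
EMBEDDINGS = {
    "latte": [0.9, 0.1], "coffee": [0.8, 0.2], "tea": [0.2, 0.9],
}
KNOWN_KEYWORDS = ["coffee", "tea"]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def related_keywords(unknown, k=1):
    """Rank known keywords by closeness of meaning to the unknown keyword."""
    scored = sorted(KNOWN_KEYWORDS,
                    key=lambda w: cosine(EMBEDDINGS[unknown], EMBEDDINGS[w]),
                    reverse=True)
    return scored[:k]
```

With these placeholder vectors, an unrecognized "latte" would surface "coffee" as the closest predetermined keyword, which the response generator could then use.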
  • Patent number: 10847151
    Abstract: According to an embodiment, a dialogue system includes a satisfaction estimator, a dialogue state estimator, and a behavior determiner. The satisfaction estimator estimates the satisfaction of a user based on speech input from the user. The dialogue state estimator estimates a dialogue state with the user based on the speech input and the estimated satisfaction. The behavior determiner determines a behavior towards the user based on the estimated dialogue state.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: November 24, 2020
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Takami Yoshida
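The three-stage pipeline above (satisfaction, then state, then behavior) can be sketched as chained functions. Every heuristic, threshold, and name below is a placeholder assumption; the patent does not specify these details:

```python
def estimate_satisfaction(speech):
    # Placeholder heuristic: a real estimator would use acoustic and
    # lexical cues; here a repetition request signals low satisfaction.
    return 0.2 if "again" in speech else 0.8

def estimate_dialogue_state(speech, satisfaction):
    # The claimed state estimator conditions on the satisfaction estimate.
    return {"utterance": speech, "satisfaction": satisfaction}

def determine_behavior(state):
    # Switch to a recovery strategy when estimated satisfaction is low.
    return "apologize_and_rephrase" if state["satisfaction"] < 0.5 else "proceed"

def dialogue_turn(speech):
    """One turn: satisfaction -> dialogue state -> behavior."""
    sat = estimate_satisfaction(speech)
    return determine_behavior(estimate_dialogue_state(speech, sat))
```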
  • Publication number: 20200143792
    Abstract: According to one embodiment, an interactive system includes the following units. The knowledge reference unit refers to question-answering knowledge, based on a result of analyzing an input sentence, to acquire a candidate for an answer to the input sentence. The unknown keyword detection unit detects an unknown keyword from the input sentence. The related keyword estimation unit, in response to the detection of the unknown keyword, acquires from predetermined keywords one or more candidates for a related keyword whose meaning is close to the unknown keyword. The response generation unit generates a response to the input sentence based on the one or more candidates for the related keyword when the unknown keyword is detected.
    Type: Application
    Filed: August 30, 2019
    Publication date: May 7, 2020
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kenji Iwata, Hiroshi Fujimura, Yuka Kobayashi, Takami Yoshida, Masami Akamine
  • Publication number: 20190139537
    Abstract: According to an embodiment, a dialogue system includes a satisfaction estimator, a dialogue state estimator, and a behavior determiner. The satisfaction estimator estimates the satisfaction of a user based on speech input from the user. The dialogue state estimator estimates a dialogue state with the user based on the speech input and the estimated satisfaction. The behavior determiner determines a behavior towards the user based on the estimated dialogue state.
    Type: Application
    Filed: February 20, 2018
    Publication date: May 9, 2019
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Takami Yoshida
  • Publication number: 20190088252
    Abstract: According to one embodiment, a dialogue system includes a setting apparatus and a processing apparatus. The setting apparatus sets in advance a plurality of words that are in impossible combination relationships with each other. The processing apparatus acquires speech of a user and, when a speech recognition result of an object included in the speech contains a combination of words from that plurality, outputs a notification to the user that processing of the object cannot be carried out.
    Type: Application
    Filed: February 20, 2018
    Publication date: March 21, 2019
    Inventors: Takami Yoshida, Kenji Iwata, Yuka Kobayashi, Masami Akamine
  • Patent number: 9454963
    Abstract: A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: September 27, 2016
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre-Martinez, Vincent Ping Leung Wan, Kean Kheong Chin, Mark John Francis Gales, Katherine Mary Knill, Masami Akamine, Byung Ha Chung
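The "weighted sum of parameters of the same type using voice characteristic dependent weighting" can be illustrated with a toy cluster-adaptive model, where each voice characteristic carries one interpolation weight per cluster. All values and names below are made-up placeholders:

```python
# Hypothetical sketch of cluster-adaptive interpolation: a distribution's
# mean is a weighted sum of per-cluster means, with weights that depend on
# the selected voice characteristic.
cluster_means = {  # toy 2-dim mean vectors for three clusters
    "c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.5, 0.5],
}
voice_weights = {  # one weight per cluster for each voice characteristic
    "calm":  {"c1": 0.7, "c2": 0.2, "c3": 0.1},
    "angry": {"c1": 0.1, "c2": 0.6, "c3": 0.3},
}

def interpolated_mean(voice):
    """Mean vector for a voice characteristic as a weighted sum of cluster means."""
    w = voice_weights[voice]
    dims = len(next(iter(cluster_means.values())))
    return [sum(w[c] * m[d] for c, m in cluster_means.items())
            for d in range(dims)]
```

Selecting a different voice characteristic changes only the weight vector, not the cluster parameters, which is what lets one model simulate many voice characteristics.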
  • Patent number: 9269347
    Abstract: A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute, including: inputting text; dividing the inputted text into a sequence of acoustic units; selecting a speaker for the inputted text; selecting a speaker attribute for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model; and outputting the sequence of speech vectors as audio with the selected speaker voice and a selected speaker attribute. The acoustic model includes a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, which parameters do not overlap. The selecting a speaker voice includes selecting parameters from the first set of parameters and the selecting the speaker attribute includes selecting the parameters from the second set of parameters.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: February 23, 2016
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre-Martinez, Vincent Ping Leung Wan, Kean Kheong Chin, Mark John Francis Gales, Katherine Mary Knill, Masami Akamine
  • Publication number: 20130262109
    Abstract: A text-to-speech method for simulating a plurality of different voice characteristics includes dividing inputted text into a sequence of acoustic units; selecting voice characteristics for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model having a plurality of model parameters provided in clusters each having at least one sub-cluster and describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio with the selected voice characteristics. A parameter of a predetermined type of each probability distribution is expressed as a weighted sum of parameters of the same type using voice characteristic dependent weighting. In converting the sequence of acoustic units to a sequence of speech vectors, the voice characteristic dependent weights for the selected voice characteristics are retrieved for each cluster such that there is one weight per sub-cluster.
    Type: Application
    Filed: March 13, 2013
    Publication date: October 3, 2013
    Inventors: Javier Latorre-Martinez, Vincent Ping Leung Wan, Kean Kheong Chin, Mark John Francis Gales, Katherine Mary Knill, Masami Akamine, Byung Ha Chung
  • Publication number: 20130262119
    Abstract: A text-to-speech method configured to output speech having a selected speaker voice and a selected speaker attribute, including: inputting text; dividing the inputted text into a sequence of acoustic units; selecting a speaker for the inputted text; selecting a speaker attribute for the inputted text; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model; and outputting the sequence of speech vectors as audio with the selected speaker voice and a selected speaker attribute. The acoustic model includes a first set of parameters relating to speaker voice and a second set of parameters relating to speaker attributes, which parameters do not overlap. The selecting a speaker voice includes selecting parameters from the first set of parameters and the selecting the speaker attribute includes selecting the parameters from the second set of parameters.
    Type: Application
    Filed: March 15, 2013
    Publication date: October 3, 2013
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre-Martinez, Vincent Ping Leung Wan, Kean Kheong Chin, Mark John Francis Gales, Katherine Mary Knill, Masami Akamine
  • Patent number: 8494856
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Grant
    Filed: October 12, 2011
    Date of Patent: July 23, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 8407053
    Abstract: A speech processing apparatus, including a segmenting unit to divide a fundamental frequency signal of a speech signal corresponding to an input text into pitch segments, based on an alignment between samples of at least one given linguistic level included in the input text and the speech signal. Character strings of the input text are divided into the samples based on each linguistic level. A parameterizing unit generates a parametric representation of the pitch segments using a predetermined invertible operator and generates a group of first parameters in correspondence with each linguistic level. A descriptor generating unit generates, for each linguistic level, a descriptor that includes a set of features describing each sample in the input text and a model learning unit classifies the first parameters of each linguistic level of all speech signals in a memory into clusters based on the descriptor corresponding to the linguistic level.
    Type: Grant
    Filed: March 17, 2009
    Date of Patent: March 26, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 8380500
    Abstract: A spectrum calculating unit calculates a spectrum for each frame by performing a frequency analysis on an acoustic signal. An estimating unit estimates a noise spectrum. An energy calculating unit calculates an energy characteristic amount. An entropy calculating unit calculates a normalized spectral entropy value. A generating unit generates a characteristic vector based on the energy characteristic amounts and the normalized spectral entropy values calculated for a plurality of frames. A likelihood calculating unit calculates a speech likelihood value of the target frame corresponding to the characteristic vector. If the speech likelihood value is larger than a threshold value, a judging unit judges the target frame to be a speech frame.
    Type: Grant
    Filed: September 22, 2008
    Date of Patent: February 19, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Koichi Yamamoto, Masami Akamine
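Normalized spectral entropy, the key feature in this abstract, treats the power spectrum as a probability distribution: flat (noise-like) spectra score near 1, peaked (speech-like) spectra score lower. The thresholds and function names below are illustrative assumptions:

```python
import math

def normalized_spectral_entropy(spectrum):
    """Entropy of the power spectrum viewed as a probability distribution,
    normalized to [0, 1] by dividing by log(number of bins)."""
    total = sum(spectrum)
    probs = [s / total for s in spectrum]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(spectrum))

def is_speech_frame(spectrum, energy, energy_thr=1.0, entropy_thr=0.9):
    # Speech tends to have structured (low-entropy) spectra and high energy.
    # A real system stacks both features over several frames into a
    # characteristic vector and applies a learned likelihood model.
    return energy > energy_thr and normalized_spectral_entropy(spectrum) < entropy_thr
```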
  • Patent number: 8370139
    Abstract: A noise-environment storing unit stores a compensation vector for compensating a feature vector of a speech. A feature-vector extracting unit extracts the feature vector of the speech in each of a plurality of frames. A noise-environment-series estimating unit estimates a noise-environment series based on a feature-vector series and a degree of similarity. A calculating unit obtains a compensation vector corresponding to each noise environment in the estimated noise-environment series based on the compensation vectors in the noise-environment storing unit. A compensating unit compensates the extracted feature vector of the speech based on the obtained compensation vector.
    Type: Grant
    Filed: March 19, 2007
    Date of Patent: February 5, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masami Akamine, Takashi Masuko, Daniel Barreda, Remco Teunen
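The per-environment compensation step can be sketched as a lookup-and-add over frames. The environment labels and vectors below are illustrative assumptions:

```python
# Hypothetical sketch: a compensation vector is looked up for the estimated
# noise environment of each frame and added to that frame's feature vector.
COMPENSATION = {"car": [0.5, -0.2], "office": [0.1, 0.3]}  # toy values

def compensate(features, env_series):
    """Add the environment-specific compensation vector to each frame."""
    return [[f + c for f, c in zip(frame, COMPENSATION[env])]
            for frame, env in zip(features, env_series)]
```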
  • Publication number: 20120089402
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Application
    Filed: October 12, 2011
    Publication date: April 12, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20120065961
    Abstract: According to one embodiment, a speech model generating apparatus includes a spectrum analyzer, a chunker, a parameterizer, a clustering unit, and a model training unit. The spectrum analyzer acquires a speech signal corresponding to text information and calculates a set of spectral coefficients. The chunker acquires boundary information indicating a beginning and an end of linguistic units and chunks the speech signal into linguistic units. The parameterizer calculates a set of spectral trajectory parameters for a trajectory of the spectral trajectory parameters of the linguistic unit on the basis of the spectral coefficients. The clustering unit clusters the spectral trajectory parameters calculated for each of the linguistic units into clusters on the basis of linguistic information. The model training unit obtains a trained spectral trajectory model indicating a characteristic of a cluster based on the spectral trajectory parameters belonging to the same cluster.
    Type: Application
    Filed: September 21, 2011
    Publication date: March 15, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 8078462
    Abstract: A transformation-parameter calculating unit calculates a first model parameter, a parameter of a speaker model, that maximizes a first likelihood for a clean feature, and calculates a transformation parameter that maximizes the first likelihood. The transformation parameter transforms, for each of the speakers, a distribution of the clean feature corresponding to the identification information of the speaker into a distribution represented by the speaker model of the first model parameter. A model-parameter calculating unit transforms a noisy feature corresponding to identification information for each of the speakers by using the transformation parameter, and calculates a second model parameter, a parameter of the speaker model, that maximizes a second likelihood for the transformed noisy feature.
    Type: Grant
    Filed: October 2, 2008
    Date of Patent: December 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Yusuke Shinohara, Masami Akamine
  • Patent number: 8046225
    Abstract: Normalization parameters are generated at a normalization-parameter generating unit by calculating the mean values and the standard deviations of an initial prosody pattern and a prosody pattern of a training sentence of a speech corpus. Then, the variance range or variance width of the initial prosody pattern is normalized at the prosody-pattern normalizing unit in accordance with the normalization parameters. As a result, a prosody pattern similar to speech of human beings and improved in naturalness can be generated with a small amount of calculation.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: October 25, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takashi Masuko, Masami Akamine
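The normalization described above rescales the initial pattern so that its mean and standard deviation match those measured on training speech. A minimal sketch, with function and variable names assumed for illustration:

```python
import statistics

def normalize_prosody(initial_pattern, corpus_pattern):
    """Rescale the initial prosody pattern so its mean and standard
    deviation match those of a training sentence from the speech corpus."""
    mu_i = statistics.mean(initial_pattern)
    sd_i = statistics.pstdev(initial_pattern)
    mu_c = statistics.mean(corpus_pattern)
    sd_c = statistics.pstdev(corpus_pattern)
    # z-score against the initial statistics, then map onto the corpus statistics
    return [(x - mu_i) / sd_i * sd_c + mu_c for x in initial_pattern]
```

Only two statistics per pattern are needed, which is consistent with the abstract's claim of improved naturalness at a small computational cost.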
  • Publication number: 20100169094
    Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating which phoneme or word is represented by a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-by-phoneme or word-by-word basis, and the decision trees being configured to answer questions relating to the feature value and output likelihoods for the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker using speaker adaptation data vocalized by the speaker of an input speech.
    Type: Application
    Filed: September 17, 2009
    Publication date: July 1, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
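The decision-tree state model can be sketched as a tree that answers yes/no questions about the feature value and emits a likelihood at its leaf; adaptation would re-estimate the leaf values from the speaker's data. The tree, question, and values below are toy assumptions:

```python
# Hypothetical sketch: internal nodes ask questions about the feature
# vector; leaves hold the likelihood of one HMM state.
def state_likelihood(feature, tree):
    """Walk the tree by answering its questions; return the leaf likelihood."""
    node = tree
    while isinstance(node, dict):
        node = node["yes"] if node["question"](feature) else node["no"]
    return node

# A one-question toy tree for a single HMM state.
tree = {"question": lambda f: f["energy"] > 0.5,
        "yes": 0.9, "no": 0.1}
```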
  • Publication number: 20100076759
    Abstract: A noisy vector is extracted from a noisy speech, which is a clean speech on which noise is superimposed. A noise parameter of the noise is estimated from the noisy vector. A prior distribution parameter of a clean vector of the clean speech is stored in advance. A joint Gaussian distribution parameter between the clean vector and the noisy vector is calculated by the unscented transformation from the noise parameter and the prior distribution parameter. A posterior distribution parameter of the clean vector is calculated from the noisy vector using the joint Gaussian distribution parameter. By comparing the posterior distribution parameter with a previously stored standard pattern of each word, a word sequence of the noisy speech is output.
    Type: Application
    Filed: September 8, 2009
    Publication date: March 25, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yusuke Shinohara, Masami Akamine
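The unscented transformation propagates a Gaussian through a nonlinearity via deterministically chosen sigma points rather than linearization. A minimal one-dimensional sketch (function name and the kappa choice are assumptions for illustration; the patent applies the idea to joint clean/noisy distributions):

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a 1-D Gaussian (mean, var) through nonlinearity f using
    sigma points; return the transformed mean and variance."""
    n = 1  # dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2 * (n + kappa))
    weights = [w0, wi, wi]
    ys = [f(x) for x in points]
    m = sum(w * y for w, y in zip(weights, ys))
    v = sum(w * (y - m) ** 2 for w, y in zip(weights, ys))
    return m, v
```

For a linear function the transform is exact, which is a standard sanity check; its value in this setting is that it also captures nonlinear noise-mixing functions to second order.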