Patents by Inventor Ryuki Tachibana
Ryuki Tachibana has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160210964
Abstract: A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string.
Type: Application
Filed: March 28, 2016
Publication date: July 21, 2016
Inventors: Gakuto Kurata, Masafumi Nishimura, Ryuki Tachibana
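The selection step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the reading score is computed or how it is combined with the speech recognition score, so two assumptions are made here: the reading score is the sum of the recognition scores of all candidates sharing a reading, and the output is the best-scoring candidate within the best-scoring reading. The function name `select_candidate` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def select_candidate(candidates):
    """Pick one candidate from (word_string, reading, asr_score) tuples.

    Assumption: the reading score is the sum of the recognition scores of
    every candidate with that reading; the abstract does not fix the formula.
    """
    reading_scores = defaultdict(float)
    for _, reading, asr_score in candidates:
        reading_scores[reading] += asr_score

    # Assumption: prefer the reading with the highest reading score, then the
    # candidate with the highest recognition score within that reading.
    return max(candidates, key=lambda c: (reading_scores[c[1]], c[2]))


if __name__ == "__main__":
    # Two spellings share the reading "hashi"; together they outweigh "kashi".
    nbest = [
        ("hashi-A", "hashi", 0.30),
        ("hashi-B", "hashi", 0.25),
        ("kashi", "kashi", 0.40),
    ]
    print(select_candidate(nbest))
```

In this toy list the single best recognition score belongs to "kashi", but the pooled reading score favors "hashi", which is the kind of case the abstract appears to target.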
-
Patent number: 9384730
Abstract: A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string.
Type: Grant
Filed: April 14, 2014
Date of Patent: July 5, 2016
Assignee: International Business Machines Corporation
Inventors: Gakuto Kurata, Masafumi Nishimura, Ryuki Tachibana
-
Publication number: 20160086599
Abstract: A construction method for a speech recognition model, in which a computer system includes: a step of acquiring alignment between speech of each of a plurality of speakers and a transcript of the speaker; a step of joining transcripts of the respective ones of the plurality of speakers along a time axis, creating a transcript of speech of mixed speakers obtained from synthesized speech of the speakers, and replacing predetermined transcribed portions of the plurality of speakers overlapping on the time axis with a unit which represents a simultaneous speech segment; and a step of constructing at least one of an acoustic model and a language model which make up a speech recognition model, based on the transcript of the speech of the mixed speakers.
Type: Application
Filed: September 23, 2015
Publication date: March 24, 2016
Inventors: Gakuto Kurata, Toru Nagano, Masayuki Suzuki, Ryuki Tachibana
-
Patent number: 9275631
Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
Type: Grant
Filed: December 31, 2012
Date of Patent: March 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Ryuki Tachibana, Masafumi Nishimura
-
Patent number: 8972407
Abstract: An information processing apparatus determines a weight of each physical feature for hierarchical clustering by acquiring training data of multiple pieces of content in triplets with label information indicating a pair specified by a user as having a highest degree of similarity among three contents of the triplet and executing hierarchical clustering using a feature vector of each piece of content of the training data and the weight of each feature to determine the hierarchical structure of the training data. The information processing apparatus updates the weight of each feature so that the degree of agreement between a pair combined first as being the same clusters among three contents of the triplet in a determined hierarchical structure and a pair indicated by label information corresponding to the triplet increases.
Type: Grant
Filed: September 6, 2012
Date of Patent: March 3, 2015
Assignee: International Business Machines Corporation
Inventors: Toru Nagano, Masafumi Nishimura, Takashima Ryoichi, Ryuki Tachibana
-
Patent number: 8918396
Abstract: An information processing apparatus determines a weight of each physical feature for hierarchical clustering by acquiring training data of multiple pieces of content in triplets with label information indicating a pair specified by a user as having a highest degree of similarity among three contents of the triplet and executing hierarchical clustering using a feature vector of each piece of content of the training data and the weight of each feature to determine the hierarchical structure of the training data. The information processing apparatus updates the weight of each feature so that the degree of agreement between a pair combined first as being the same clusters among three contents of the triplet in a determined hierarchical structure and a pair indicated by label information corresponding to the triplet increases.
Type: Grant
Filed: June 28, 2012
Date of Patent: December 23, 2014
Assignee: International Business Machines Corporation
Inventors: Toru Nagano, Masafumi Nishimura, Takashima Ryoichi, Ryuki Tachibana
-
Publication number: 20140358533
Abstract: A reading accuracy-improving system includes: a reading conversion unit for retrieving a plurality of candidate word strings from speech recognition results to determine the reading of each candidate word string; a reading score calculating unit for determining the speech recognition score for each of one or more candidate word strings with the same reading to determine a reading score; and a candidate word string selection unit for selecting a candidate to output from the plurality of candidate word strings on the basis of the reading score and speech recognition score corresponding to each candidate word string.
Type: Application
Filed: April 14, 2014
Publication date: December 4, 2014
Applicant: International Business Machines Corporation
Inventors: Gakuto Kurata, Masafumi Nishimura, Ryuki Tachibana
-
Patent number: 8744853
Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
Type: Grant
Filed: March 16, 2010
Date of Patent: June 3, 2014
Assignee: International Business Machines Corporation
Inventors: Masafumi Nishimura, Ryuki Tachibana
-
Publication number: 20130268275
Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
Type: Application
Filed: December 31, 2012
Publication date: October 10, 2013
Inventors: Ryuki Tachibana, Masafumi Nishimura
-
Patent number: 8370149
Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
Type: Grant
Filed: August 15, 2008
Date of Patent: February 5, 2013
Assignee: Nuance Communications, Inc.
Inventors: Ryuki Tachibana, Masafumi Nishimura
-
Publication number: 20130006991
Abstract: An information processing apparatus determines a weight of each physical feature for hierarchical clustering by acquiring training data of multiple pieces of content in triplets with label information indicating a pair specified by a user as having a highest degree of similarity among three contents of the triplet and executing hierarchical clustering using a feature vector of each piece of content of the training data and the weight of each feature to determine the hierarchical structure of the training data. The information processing apparatus updates the weight of each feature so that the degree of agreement between a pair combined first as being the same clusters among three contents of the triplet in a determined hierarchical structure and a pair indicated by label information corresponding to the triplet increases.
Type: Application
Filed: June 28, 2012
Publication date: January 3, 2013
Inventors: Toru Nagano, Masafumi Nishimura, Takashima Ryoichi, Ryuki Tachibana
-
Publication number: 20120330957
Abstract: An information processing apparatus determines a weight of each physical feature for hierarchical clustering by acquiring training data of multiple pieces of content in triplets with label information indicating a pair specified by a user as having a highest degree of similarity among three contents of the triplet and executing hierarchical clustering using a feature vector of each piece of content of the training data and the weight of each feature to determine the hierarchical structure of the training data. The information processing apparatus updates the weight of each feature so that the degree of agreement between a pair combined first as being the same clusters among three contents of the triplet in a determined hierarchical structure and a pair indicated by label information corresponding to the triplet increases.
Type: Application
Filed: September 6, 2012
Publication date: December 27, 2012
Applicant: International Business Machines Corporation
Inventors: Toru Nagano, Masafumi Nishimura, Takashima Ryoichi, Ryuki Tachibana
-
Publication number: 20120316880
Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
Type: Application
Filed: August 22, 2012
Publication date: December 13, 2012
Applicant: International Business Machines Corporation
Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
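The last three steps of the method in this abstract (frequency counting, fluctuation of prosodic feature values for high-frequency words, key phrase selection) can be sketched as follows. This is only an illustration under stated assumptions: each word occurrence is assumed to carry a single numeric prosodic feature (for example a mean F0 in Hz), the degree of fluctuation is taken to be the population variance, and the key phrase is taken to be the high-frequency word with the largest variance. The abstract specifies none of these choices, and the names `key_phrase` and `min_count` are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def key_phrase(observations, min_count=3):
    """Return the frequent word whose prosodic feature fluctuates most.

    observations: iterable of (word, prosodic_feature_value) pairs, one per
    occurrence of a recognized word in the speech data.
    min_count: hypothetical frequency threshold for a "high frequency" word.
    """
    values = defaultdict(list)
    for word, feature in observations:
        values[word].append(feature)

    # Assumption: variance stands in for the "degree of fluctuation".
    fluctuation = {
        word: pvariance(vals)
        for word, vals in values.items()
        if len(vals) >= min_count
    }
    if not fluctuation:
        return None
    return max(fluctuation, key=fluctuation.get)


if __name__ == "__main__":
    data = [
        ("yes", 100), ("yes", 101), ("yes", 99),
        ("refund", 90), ("refund", 140), ("refund", 110),
        ("rare", 300),  # below the frequency threshold, so ignored
    ]
    print(key_phrase(data))
```

Pause detection and word recognition, the earlier steps in the abstract, are outside the scope of this sketch and would be supplied by a speech recognizer.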
-
Publication number: 20120197644
Abstract: An information processing apparatus, information processing method, and computer readable non-transitory storage medium for analyzing words reflecting information that is not explicitly recognized verbally. An information processing method includes the steps of: extracting speech data and sound data used for recognizing phonemes included in the speech data as words; identifying a section surrounded by pauses within a speech spectrum of the speech data; performing sound analysis on the identified section to identify a word in the section; generating prosodic feature values for the words; acquiring frequencies of occurrence of the word within the speech data; calculating a degree of fluctuation within the speech data for the prosodic feature values of high frequency words where the high frequency words are any words whose frequency of occurrence meets a threshold; and determining a key phrase based on the degree of fluctuation.
Type: Application
Filed: January 30, 2012
Publication date: August 2, 2012
Applicant: International Business Machines Corporation
Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
-
Publication number: 20120059654
Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
Type: Application
Filed: March 16, 2010
Publication date: March 8, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Masafumi Nishimura, Ryuki Tachibana
-
Patent number: 8055505
Abstract: Digital watermark detection apparatus including detection units which calculate detected values of watermark signals by use of keys for PCM data of channels of audio content, a plurality of units which add the detected values corresponding to each of the channels and each of the keys for each possible combination of the respective channels and the respective keys, and a unit which selects and outputs one adding result from the respective adding results by the plurality of detected value adding units. Moreover, it includes units which accumulate the detected values in accumulation cycles different from one another to restore messages embedded as digital watermarks from the accumulated detected values, and perform boundary detection of the audio contents to detect the audio contents in which the digital watermarks are embedded, and a detection result output unit which synthesizes and outputs respective processing results by the message restoration units.
Type: Grant
Filed: June 17, 2008
Date of Patent: November 8, 2011
Assignee: International Business Machines Corporation
Inventors: Ryuki Tachibana, Norishige Morimoto
-
Patent number: 8015011
Abstract: A synthetic speech system includes a phoneme segment storage section for storing multiple phoneme segment data pieces; a synthesis section for generating voice data from text by reading phoneme segment data pieces representing the pronunciation of an inputted text from the phoneme segment storage section and connecting the phoneme segment data pieces to each other; a computing section for computing a score indicating the unnaturalness of the voice data representing the synthetic speech of the text; a paraphrase storage section for storing multiple paraphrases of the multiple first phrases; a replacement section for searching the text and replacing with appropriate paraphrases; and a judgment section for outputting generated voice data on condition that the computed score is smaller than a reference value and for inputting the text after the replacement to the synthesis section to cause the synthesis section to further generate voice data for the text.
Type: Grant
Filed: January 30, 2008
Date of Patent: September 6, 2011
Assignee: Nuance Communications, Inc.
Inventors: Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
-
Patent number: 7921014
Abstract: A system for generating high-quality synthesized text-to-speech includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes inputted speech, and then generates first learning data in which wordings of phrases are associated with readings thereof. The frequency data generating unit generates, based on the first learning data, frequency data indicating appearance frequencies of both wordings and readings of phrases. The setting unit sets the thus generated frequency data for a language processing unit in order to approximate outputted speech of text-to-speech to the inputted speech. Furthermore, the language processing unit generates, from a wording of text, a reading corresponding to the wording, on the basis of the appearance frequencies.
Type: Grant
Filed: July 9, 2007
Date of Patent: April 5, 2011
Assignee: Nuance Communications, Inc.
Inventors: Gakuto Kurata, Toru Nagano, Masafumi Nishimura, Ryuki Tachibana
-
Patent number: 7797542
Abstract: An apparatus 10 for generating watermark signals to be embedded as a digital watermark in real-time contents includes: input means 12 for inputting the real-time contents; an input buffer 14 for storing the real-time contents; generation means for generating watermark signals corresponding to predicted intensities of the real-time contents from divided real-time contents; and an output buffer 18 for storing the generated watermark signals to be outputted. The generation means is configured by including prediction means 16 for predicting intensities of the watermark signals; control means 20 for controlling embedding by use of a message to be embedded as the digital watermark in the divided real-time contents; and means 22 for generating the watermark signals to be outputted.
Type: Grant
Filed: July 28, 2009
Date of Patent: September 14, 2010
Assignee: International Business Machines Corporation
Inventors: Ryuki Tachibana, Ryo Subihara
-
Publication number: 20100125459
Abstract: Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent class n-gram model. A list of all possible words for each word in the input is generated for each model. Each word in each list for each model is given a score based on the probability that the word is the correct word in the sequence, based on the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words is generated. Each sequence of words comprises a unique combination of an attribute and associated word for each word in the input. The combined score of each word in the sequence of words is combined. A sequence of words having the highest score is selected and presented to a user.
Type: Application
Filed: July 1, 2009
Publication date: May 20, 2010
Applicant: Nuance Communications, Inc.
Inventors: Nobuyasu Itoh, Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
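The score combination and sequence selection described in this abstract can be illustrated with a small sketch. Several simplifications are assumed and are not taken from the application: each candidate already carries one probability from a word model and one from an accent class model, the two are combined by multiplication, the sequence score is the product of the per-word combined scores, and all sequences are enumerated exhaustively. A real n-gram system would condition each score on neighboring words; this sketch treats positions independently. The name `best_sequence` and the tuple layout are hypothetical.

```python
from itertools import product
from math import prod


def best_sequence(candidates_per_position):
    """Choose the highest-scoring (word, attribute) sequence.

    candidates_per_position: one list per input word, each holding
    (word, attribute, word_model_score, accent_model_score) tuples.
    Assumption: scores are probabilities and are combined by multiplication.
    """
    best, best_score = None, float("-inf")
    # Enumerate every unique combination of one candidate per position.
    for sequence in product(*candidates_per_position):
        score = prod(w * a for _, _, w, a in sequence)
        if score > best_score:
            best, best_score = sequence, score
    return [(word, attr) for word, attr, _, _ in best], best_score


if __name__ == "__main__":
    positions = [
        [("hashi", "LH", 0.6, 0.5), ("hashi", "HL", 0.6, 0.7)],
        [("ga", "L", 0.9, 0.8)],
    ]
    print(best_sequence(positions))
```

Exhaustive enumeration grows exponentially with input length, so a practical implementation would more likely use a dynamic programming search such as Viterbi; the brute-force loop is kept here only because it mirrors the abstract's wording most directly.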