Patents by Inventor Yasuo Okutani
Yasuo Okutani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7039588
Abstract: Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme, are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An N-best determination unit obtains the N best paths that minimize the distortion using the A* search algorithm, and a registration unit determination unit selects synthesis units to be registered in a synthesis unit inventory on the basis of the N best paths, in order of frequency of occurrence, and registers them in the inventory.
Type: Grant
Filed: August 30, 2004
Date of Patent: May 2, 2006
Assignee: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Yasuhiro Komori
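The selection scheme this abstract describes can be sketched compactly. The following is a minimal illustration, not the patented implementation: it enumerates all unit sequences through a small candidate lattice (exhaustive search stands in for the A* algorithm, which gives the same N best paths on a lattice this small), scores each path as a weighted sum of per-unit modification distortions and pairwise concatenation distortions, and then ranks units by how often they occur across the N best paths. All names (`nbest_paths`, `mod_cost`, the unit labels) are illustrative assumptions.

```python
from itertools import product
from collections import Counter

def nbest_paths(candidates, mod_cost, concat_cost, w_mod=1.0, w_concat=1.0, n=3):
    """Return the n lowest-total-distortion unit sequences through the lattice.

    candidates: list of candidate-unit lists, one per position.
    mod_cost: per-unit modification distortion.
    concat_cost: distortion for each (previous, next) unit pair.
    Exhaustive enumeration stands in for the A* search of the abstract.
    """
    scored = []
    for path in product(*candidates):
        total = sum(w_mod * mod_cost[u] for u in path)
        total += sum(w_concat * concat_cost[(a, b)] for a, b in zip(path, path[1:]))
        scored.append((total, path))
    scored.sort(key=lambda t: t[0])
    return scored[:n]

def units_by_frequency(paths):
    """Rank units by frequency of occurrence across the N best paths."""
    counts = Counter(u for _, path in paths for u in path)
    return [u for u, _ in counts.most_common()]
```

For a two-phoneme lattice with two candidates each, `nbest_paths` returns the two cheapest paths and `units_by_frequency` then orders units by how many of those paths contain them, mirroring the abstract's frequency-based registration step.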
-
Publication number: 20060085194
Abstract: Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme, are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An N-best determination unit obtains the N best paths that minimize the distortion using the A* search algorithm, and a registration unit determination unit selects synthesis units to be registered in a synthesis unit inventory on the basis of the N best paths, in order of frequency of occurrence, and registers them in the inventory.
Type: Application
Filed: December 7, 2005
Publication date: April 20, 2006
Applicant: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Yasuhiro Komori
-
Patent number: 7031919
Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
Type: Grant
Filed: August 30, 1999
Date of Patent: April 18, 2006
Assignee: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Masayuki Yamada
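The penalty-and-sort step can be illustrated with a small sketch. This is an assumption-laden reading of the abstract, not the patented method: here "typical" phoneme data is taken to mean the candidate closest to the set's mean power and mean duration, with the power and duration penalties as weighted absolute deviations. The field names and weights are invented for illustration.

```python
def select_typical(candidates, w_pow=1.0, w_dur=1.0):
    """Pick the most 'typical' candidate from a retrieved set.

    Each candidate is penalized by its weighted deviation from the set's
    mean power and mean duration (a guess at the abstract's power and
    phoneme-duration penalties); the lowest-penalty candidate wins.
    """
    mean_pow = sum(c["power"] for c in candidates) / len(candidates)
    mean_dur = sum(c["dur"] for c in candidates) / len(candidates)

    def penalty(c):
        return w_pow * abs(c["power"] - mean_pow) + w_dur * abs(c["dur"] - mean_dur)

    # Sorting by penalty and taking the head is the abstract's sort-then-select.
    return min(candidates, key=penalty)
```

Given a retrieved set where two candidates cluster together and one is an outlier in both power and duration, the clustered candidate nearest the mean is selected.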
-
Publication number: 20060047508
Abstract: A speech processing apparatus and method for restricting use of generated speech data for purposes other than a particular purpose. An adder adds predetermined audio data, within the audio-frequency band but outside a predetermined frequency band, to the input speech data. A band limiting filter limits the speech data to which the predetermined audio data has been added by the adder to the predetermined frequency band. A communication device transmits the speech data which has been limited to the predetermined frequency band by the band limiting filter.
Type: Application
Filed: August 24, 2005
Publication date: March 2, 2006
Inventor: Yasuo Okutani
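The two stages, mixing in a marker outside the kept band and then band-limiting, can be sketched in a few lines. This is a toy illustration under loose assumptions: a 15 kHz sine stands in for the "predetermined audio data", and a crude moving-average low-pass stands in for the band-limiting filter (a real implementation would use a proper filter design); all rates and amplitudes are invented.

```python
import math

def add_marker(samples, rate, marker_hz=15000.0, amp=0.05):
    """Mix a marker tone, chosen outside the band that will be kept,
    into the speech samples (the abstract's adder stage)."""
    return [s + amp * math.sin(2 * math.pi * marker_hz * i / rate)
            for i, s in enumerate(samples)]

def band_limit(samples, taps=8):
    """Crude moving-average low-pass standing in for the band-limiting
    filter: averaging over `taps` samples strongly attenuates
    frequencies well above rate/taps, removing the marker tone."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - taps + 1): i + 1]
        out.append(sum(window) / len(window))
    return out
```

Feeding silence through `add_marker` yields a pure marker tone; passing that through `band_limit` suppresses it, which is the sense in which transmitted speech is confined to the predetermined band.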
-
Publication number: 20060031072
Abstract: An electronic dictionary apparatus and its control method are provided. A database contains entry words and advanced phonetic information corresponding to each entry word. A dictionary search section searches the database using an entry word specified by a user as a search key and acquires the advanced phonetic information corresponding to the entry word. A display section displays simple phonetic information generated based on the acquired advanced phonetic information. A speech output section performs speech synthesis based on the acquired advanced phonetic information and outputs the synthesized speech.
Type: Application
Filed: August 4, 2005
Publication date: February 9, 2006
Inventors: Yasuo Okutani, Michio Aizawa
-
Patent number: 6980955
Abstract: Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme, are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An N-best determination unit obtains the N best paths that minimize the distortion using the A* search algorithm, and a registration unit determination unit selects synthesis units to be registered in a synthesis unit inventory on the basis of the N best paths, in order of frequency of occurrence, and registers them in the inventory.
Type: Grant
Filed: March 28, 2001
Date of Patent: December 27, 2005
Assignee: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Yasuhiro Komori
-
Publication number: 20050209855
Abstract: A speech segment search unit searches a speech database for speech segments that satisfy a phonetic environment, and an HMM learning unit computes the HMMs of phonemes on the basis of the search result. A segment recognition unit performs segment recognition of speech segments on the basis of the computed HMMs of the phonemes, and when the phoneme of the segment recognition result is equal to the phoneme of the source speech segment, that speech segment is registered in a segment dictionary.
Type: Application
Filed: May 11, 2005
Publication date: September 22, 2005
Applicant: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Yasuhiro Komori, Toshiaki Fukada
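The self-consistency check at the heart of this abstract, registering a segment only when a recognizer trained on the data relabels it with the phoneme it was cut out as, can be sketched without a full HMM. In this illustration a nearest-centroid classifier stands in for the trained phoneme HMMs, and the segment/feature structures are invented for the example.

```python
def train_centroids(segments):
    """Per-phoneme mean feature vector (stand-in for HMM training)."""
    sums, counts = {}, {}
    for seg in segments:
        p = seg["phoneme"]
        counts[p] = counts.get(p, 0) + 1
        acc = sums.setdefault(p, [0.0] * len(seg["features"]))
        for j, v in enumerate(seg["features"]):
            acc[j] += v
    return {p: [v / counts[p] for v in acc] for p, acc in sums.items()}

def classify_nearest(centroids, feats):
    """Label features with the phoneme of the nearest centroid
    (stand-in for HMM segment recognition)."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[p]))
    return min(centroids, key=dist)

def build_dictionary(segments, classify):
    """Register a segment only when the recognizer's label agrees with
    the segment's own phoneme label, per the abstract."""
    return [seg for seg in segments if classify(seg["features"]) == seg["phoneme"]]
```

A mislabeled or atypical segment whose features sit closer to another phoneme's centroid is recognized as that other phoneme and therefore excluded from the dictionary.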
-
Publication number: 20050191036
Abstract: In cases where at least one item of sound information has been associated with at least one image, at least one desired item of sound information is selected and the sound information is played back in a prescribed order. Accordingly, in an information processing apparatus, a playback sequence decision unit (103) reads in image data, as well as sound data that has been assigned within the image data, from an image/sound data storage unit (107), generates a still image in which the positions at which sound data has been recorded are denoted on the image, and displays the generated still image on an image display unit (106). A sound data specifying unit (102) searches the image/sound data storage unit (107) for sound data that has been associated with the interior of an image area specified by an input from a user. When applicable sound data is found to exist, the playback sequence decision unit (103) decides the order in which the applicable sound data is to be played back.
Type: Application
Filed: February 7, 2005
Publication date: September 1, 2005
Applicant: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Yasuhiro Komori
-
Publication number: 20050027532
Abstract: Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme, are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An N-best determination unit obtains the N best paths that minimize the distortion using the A* search algorithm, and a registration unit determination unit selects synthesis units to be registered in a synthesis unit inventory on the basis of the N best paths, in order of frequency of occurrence, and registers them in the inventory.
Type: Application
Filed: August 30, 2004
Publication date: February 3, 2005
Applicant: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Yasuhiro Komori
-
Publication number: 20040093209
Abstract: A data input device for inputting numeric data by voice includes a range prediction part, a history holding part, a speech recognition part, a recognition result holding part, a comparison part, a presentation part, and a result storing part. The range prediction part estimates a range of a value expected to be input on the basis of meter-reading history data held in the history holding part. The speech recognition part recognizes speech representing a meter reading and stores the recognition result in the recognition result holding part. The comparison part determines whether or not the meter reading for this month represented by the data stored in the recognition result holding part is within the prediction range. If the meter reading for this month is within the prediction range, the presentation part presents the recognition result to a user, and the speech recognition result is stored in the result storing part.
Type: Application
Filed: October 20, 2003
Publication date: May 13, 2004
Applicant: Canon Kabushiki Kaisha
Inventor: Yasuo Okutani
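The range-prediction-plus-comparison flow is simple enough to sketch directly. The abstract does not say how the range is derived, so this illustration assumes one plausible scheme: take the mean monthly increment from the reading history and accept a recognized value only if it falls within a margin around last month's reading plus that increment. The function names and the margin are assumptions.

```python
def predict_range(history, margin=0.5):
    """Predict the plausible interval for this month's meter reading
    from the mean monthly increment in past readings (one guess at the
    abstract's range prediction part)."""
    increments = [b - a for a, b in zip(history, history[1:])]
    mean = sum(increments) / len(increments)
    return history[-1] + mean * (1 - margin), history[-1] + mean * (1 + margin)

def accept_reading(history, recognized, margin=0.5):
    """Accept the speech-recognized value only if it lies in the
    predicted range; otherwise it would be flagged for re-entry."""
    lo, hi = predict_range(history, margin)
    return lo <= recognized <= hi
```

With readings of 100, 110, 121, 130 the increments average 10, so the predicted range is 135 to 145; a recognized 138 is accepted and a misrecognized 190 is rejected.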
-
Publication number: 20040088165
Abstract: Even when a copying machine with a voice guidance function is used, the problem of wastefully copying the wrong documents, or documents with missing pages, remains unsolved for visually impaired persons. To this end, a document image is read, character strings on the read document image are recognized, a character string indicating the contents of the document is chosen from the recognized character strings, the chosen character string is converted into speech, and synthetic speech is output.
Type: Application
Filed: July 28, 2003
Publication date: May 6, 2004
Applicant: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Tetsuo Kosaka
-
Publication number: 20030158735
Abstract: With this invention, an information processing apparatus which has an audio data playback function and a text-to-speech synthesis function allows the user to input an instruction with fewer operations and provides a fast-forward/fast-reverse function suited to speech synthesis. During speech synthesis, an instruction input by a button operation is supplied to a speech synthesis unit. When playback of audio data is underway, but speech synthesis is inactive, an instruction input by a button operation is supplied to an audio data playback unit. In a fast-forward mode, an abstract is read aloud or the head parts of sentences are read aloud. In a fast-reverse mode, the head parts of sentences are read aloud. Also, given tones are generated in correspondence with skipped parts.
Type: Application
Filed: February 11, 2003
Publication date: August 21, 2003
Applicant: Canon Kabushiki Kaisha
Inventors: Masayuki Yamada, Katsuhiko Kawasaki, Toshiaki Fukada, Yasuo Okutani
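The fast-forward behavior, reading only the head of each sentence and marking the skipped remainder with a tone, reduces to simple text processing before synthesis. This sketch is an assumed reading of the abstract: sentences split on periods, a fixed word count as the "head part", and a `<tone>` placeholder where a real system would emit the tone.

```python
def fast_forward_text(text, head_words=3):
    """Return, per sentence, only the first head_words words for reading
    aloud; a '<tone>' placeholder marks each skipped remainder, per the
    abstract's tones for skipped parts."""
    out = []
    for sentence in (s.strip() for s in text.split(".") if s.strip()):
        words = sentence.split()
        head = " ".join(words[:head_words])
        out.append(head + (" <tone>" if len(words) > head_words else ""))
    return out
```

Short sentences pass through whole, so no tone is emitted for them; only sentences that were actually truncated get the skip marker.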
-
Publication number: 20030125949
Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
Type: Application
Filed: August 30, 1999
Publication date: July 3, 2003
Inventors: Yasuo Okutani, Masayuki Yamada
-
Publication number: 20020051955
Abstract: A speech segment search unit searches a speech database for speech segments that satisfy a phonetic environment, and an HMM learning unit computes the HMMs of phonemes on the basis of the search result. A segment recognition unit performs segment recognition of speech segments on the basis of the computed HMMs of the phonemes, and when the phoneme of the segment recognition result is equal to the phoneme of the source speech segment, that speech segment is registered in a segment dictionary.
Type: Application
Filed: March 29, 2001
Publication date: May 2, 2002
Inventors: Yasuo Okutani, Yasuhiro Komori, Toshiaki Fukada
-
Publication number: 20010047259
Abstract: Input text data undergoes language analysis to generate prosody, and a speech database is searched for a synthesis unit on the basis of the prosody. A modification distortion of the found synthesis unit, and concatenation distortions upon connecting that synthesis unit to those in the preceding phoneme, are computed, and a distortion determination unit weights the modification and concatenation distortions to determine the total distortion. An N-best determination unit obtains the N best paths that minimize the distortion using the A* search algorithm, and a registration unit determination unit selects synthesis units to be registered in a synthesis unit inventory on the basis of the N best paths, in order of frequency of occurrence, and registers them in the inventory.
Type: Application
Filed: March 28, 2001
Publication date: November 29, 2001
Inventors: Yasuo Okutani, Yasuhiro Komori
-
Publication number: 20010032079
Abstract: An object of the present invention is to suppress degradation of the quality in speech synthesis by selecting synthesis units so as to minimize a distortion caused by concatenation distortions and modification distortions. For that purpose, speech synthesis is performed by extracting a plurality of synthesis units corresponding to a phoneme environment from a synthesis-unit holding unit for holding a plurality of synthesis units so as to correspond to a predetermined prosody environment, calculating a distortion of each of the plurality of extracted synthesis units, obtaining a minimum distortion within a predetermined interval determined based on the prosody environment, selecting a series of synthesis units providing a minimum-distortion path, and modifying and concatenating the synthesis units.
Type: Application
Filed: March 28, 2001
Publication date: October 18, 2001
Inventors: Yasuo Okutani, Yasuhiro Komori
-
Patent number: 6021388
Abstract: A speech synthesis apparatus for outputting synthesized speech on the basis of a parameter sequence of a speech waveform includes a parameter generation unit which generates a parameter sequence for speech synthesis on the basis of a character sequence input by a character sequence input unit, and stores the generated parameter sequence in a parameter storage unit. A waveform generation unit is also provided that generates pitch waveforms, each for one pitch period, on the basis of synthesis parameters and pitch scales included in the parameter sequence, and generates a speech waveform by connecting the generated pitch waveforms in accordance with frame lengths set by a frame length setting unit.
Type: Grant
Filed: December 19, 1997
Date of Patent: February 1, 2000
Assignee: Canon Kabushiki Kaisha
Inventors: Mitsuru Otsuka, Yasunori Ohora, Takashi Aso, Yasuo Okutani
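The connection of one-pitch-period waveforms into fixed-length frames can be sketched in miniature. This is an illustrative toy, not the patented synthesizer: a single sine cycle stands in for the pitch waveform a real system would derive from the synthesis parameters, and each frame simply repeats or truncates that cycle to fill its frame length.

```python
import math

def pitch_waveform(pitch_period, amp=1.0):
    """One pitch period of a stand-in waveform (a single sine cycle);
    a real synthesizer would generate this from synthesis parameters
    and pitch scales."""
    return [amp * math.sin(2 * math.pi * i / pitch_period)
            for i in range(pitch_period)]

def synthesize(pitch_periods, frame_len):
    """Concatenate one-pitch-period waveforms, tiling or truncating each
    so that every frame occupies exactly frame_len samples, mirroring the
    abstract's connection of pitch waveforms per set frame lengths."""
    out = []
    for period in pitch_periods:
        w = pitch_waveform(period)
        frame = (w * (frame_len // period + 1))[:frame_len]
        out.extend(frame)
    return out
```

Varying the pitch period per frame while holding the frame length fixed changes the perceived pitch without changing timing, which is the point of generating waveforms per pitch period and connecting them by frame.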