Time Element Patents (Class 704/267)
-
Patent number: 12175963
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
Type: Grant
Filed: November 30, 2023
Date of Patent: December 24, 2024
Assignee: Google LLC
Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen
-
Patent number: 12118979
Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
Type: Grant
Filed: July 3, 2023
Date of Patent: October 15, 2024
Assignee: Telepathy Labs, Inc.
Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
-
Patent number: 12046227
Abstract: A method for generating frame values using a key frame network includes receiving a text utterance having at least one phoneme, and for each respective phoneme of the at least one phoneme, predicting, using a predictive model, a fixed quantity of key frames. Each respective key frame of the fixed quantity of key frames includes a representation of a component of the respective phoneme. The method also includes generating, using the fixed quantity of key frames, a plurality of frame values. Here, each respective frame value of the plurality of frame values is representative of a fixed-duration of audio.
Type: Grant
Filed: April 19, 2022
Date of Patent: July 23, 2024
Assignee: Google LLC
Inventors: Tom Marius Kenter, Tobias Alexander Hawker, Robert Clark
-
Patent number: 11848002
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
Type: Grant
Filed: July 19, 2022
Date of Patent: December 19, 2023
Assignee: Google LLC
Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen
-
Patent number: 11741942
Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
Type: Grant
Filed: August 3, 2022
Date of Patent: August 29, 2023
Assignee: Telepathy Labs, Inc.
Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
-
Patent number: 11302301
Abstract: A method, computer program, and computer system is provided for synthesizing speech at one or more speeds. A context associated with one or more phonemes corresponding to a speaking voice is encoded, and the one or more phonemes are aligned to one or more target acoustic frames based on the encoded context. One or more mel-spectrogram features are recursively generated from the aligned phonemes and target acoustic frames, and a voice sample corresponding to the speaking voice is synthesized using the generated mel-spectrogram features.
Type: Grant
Filed: March 3, 2020
Date of Patent: April 12, 2022
Assignee: TENCENT AMERICA LLC
Inventors: Chengzhu Yu, Dong Yu
-
Patent number: 11243254
Abstract: A method for operating a test apparatus including a plurality of shared resources is shown, wherein the plurality of shared resources can be used in different instruments. The method includes blocking a first set of resource blockers when a first instrument, which requires a first subset of the shared resources, is to be executed. Furthermore, the method tries to block a second set of resource blockers, when a second instrument, which requires a second subset of the shared resources, is to be executed. Therefore, the first set of resource blockers is different from the second set of resource blockers and a plurality of resource blockers are assigned to a shared resource, which is involved in a conflicting combination of instruments and in a non-conflicting combination of instruments.
Type: Grant
Filed: September 27, 2017
Date of Patent: February 8, 2022
Assignee: ADVANTEST CORPORATION
Inventor: Wolfgang Horn
-
Patent number: 10832031
Abstract: A first set of signals corresponding to a first signal modality (such as the direction of a gaze) during a time interval is collected from an individual. A second set of signals corresponding to a different signal modality (such as hand-pointing gestures made by the individual) is also collected. In response to a command, where the command does not identify a particular object to which the command is directed, the first and second set of signals is used to identify candidate objects of interest, and an operation associated with a selected object from the candidates is performed.
Type: Grant
Filed: August 14, 2017
Date of Patent: November 10, 2020
Assignee: Apple Inc.
Inventors: Wolf Kienzle, Douglas A. Bowman
-
Patent number: 10810313
Abstract: A system and method for preserving the privacy of data while processing the data in a cloud. The system comprises a computer program application and a client encryption key. The system is operable to encrypt the computer program application and data using the client encryption key; upload the encrypted computer program application and encrypted data in the cloud; enable the computer platform to undertake processing of the encrypted data in the cloud using the encrypted computer program application; output encrypted processing results; and enable decryption of the encrypted processing results using the client encryption key.
Type: Grant
Filed: October 3, 2016
Date of Patent: October 20, 2020
Inventors: Nigel Henry Cannings, Gerard Chollet, Cornelius Glackin, Muttukrishnan Rajarajan
-
Patent number: 10685644
Abstract: There is disclosed a method of generating a text-to-speech (TTS) training set for training a Machine Learning Algorithm (MLA) for generating machine-spoken utterances. The method is executable by a server. The method includes generating a synthetic word based on merging separate phonemes from each of two words of a corpus of pre-recorded utterances, the merging being done using the common phoneme as a merging anchor, the merging resulting in at least two synthetic words. The synthetic words and assessor labels are used to train a classifier to predict a quality parameter associated with a new synthetic phonemes-based word, the quality parameter being representative of whether the new synthetic phonemes-based word is naturally sounding (based on acoustic features of generated synthetic words utterances). The classifier is then used to generate training objects for the MLA and to use the MLA to process the corpus of pre-recorded utterances into their respective vectors.
Type: Grant
Filed: July 4, 2018
Date of Patent: June 16, 2020
Assignee: YANDEX EUROPE AG
Inventors: Vladimir Vladimirovich Kirichenko, Petr Vladislavovich Luferenko
-
Patent number: 10650810
Abstract: Systems and methods of determining phonetic relationships are provided. For instance, data indicative of an input text phrase input by a user can be received. An audio output corresponding to a spoken rendering of the input text phrase can be determined. A text transcription of the audio output of the input text phrase can be determined. The text transcription can be a textual representation of the audio output. The text transcription can be compared against a plurality of test phrases to identify a match between the text transcription and at least one test phrase.
Type: Grant
Filed: September 29, 2017
Date of Patent: May 12, 2020
Assignee: GOOGLE LLC
Inventors: Nikhil Chandru Rao, Saisuresh Krishnakumaran
-
Patent number: 10504502
Abstract: A sound control device includes: a detection unit that detects a first operation on an operator and a second operation on the operator, the second operation being performed after the first operation; and a control unit that causes output of a second sound to be started, in response to the second operation being detected. The control unit causes output of a first sound to be started before causing the output of the second sound to be started, in response to the first operation being detected.
Type: Grant
Filed: September 20, 2017
Date of Patent: December 10, 2019
Assignee: YAMAHA CORPORATION
Inventors: Keizo Hamano, Yoshitomo Ota, Kazuki Kashiwase
-
Patent number: 10469623
Abstract: A system and method for multi-language phrase identification within spoken interaction audio capable of adjusting for regional pronunciation (accents), cadence differences, and homologs. In this system, a spoken interaction audio data store supplies spoken audio data such as contact center call recordings to be analyzed for a specific phrase or set of phrases. Phrases are entered as natural language text and converted to the phonemes representative of the phrase audio using the invention's language packs and stored in a data store. Spoken interaction and phrase audio are converted to a digital format allowing comparison using multiple characteristics. Phrase matches are stored for subsequent post analysis display and analytics generation.
Type: Grant
Filed: November 27, 2018
Date of Patent: November 5, 2019
Assignee: ZOOM International a.s.
Inventor: Vaclav Slovacek
-
Patent number: 10395638
Abstract: An apparatus and a computer program product for merging incoming alerts for accessibility are described. Two input alerts intended for presentation by a screen reader are received. If the two input alerts have arrived within a specified time interval, the two input alerts are combined into an output alert. The output alert is sent to a screen reader for presentation.
Type: Grant
Filed: July 8, 2017
Date of Patent: August 27, 2019
Assignee: International Business Machines Corporation
Inventors: Stephen A Boxwell, Kyle M Brake, Keith G Frost, Stanley J Vernier
-
Patent number: 10373595
Abstract: A musical sound generation device including a first memory having a plurality of waveform data, a second memory which stores waveform data read out from the first memory, and a control processor which controls such that, when a sound emission instruction is provided and specified waveform data is in the second memory, the waveform data is read out by the sound source processor, or controls such that, when a sound emission instruction is provided and specified waveform data is not in the second memory, the specified waveform data is transferred from the first memory to the second memory and read out by the sound source processor, in which the control processor controls such that waveform data satisfying a set condition is not subjected to a waveform data change by the transfer and waveform data not satisfying the set condition is subjected to the waveform data change by the transfer.
Type: Grant
Filed: March 2, 2018
Date of Patent: August 6, 2019
Assignee: CASIO COMPUTER CO., LTD.
Inventors: Hiroki Sato, Hajime Kawashima
-
Patent number: 10205587
Abstract: Provided is a wireless communication system which enables wireless communication in which crosstalk due to multiple access is canceled while using a large number of inexpensive wireless terminals. In order to generate an interference component, an analysis data sequence is generated by applying a Hilbert transform, by a Hilbert transform, to a subcarrier data sequence obtained by extracting a target subcarrier wave component from a finite length data sequence, while a carrier phase difference is estimated by using the regression analysis by a carrier wave phase estimation unit. After rotation calculation configured to return the analysis data sequence by the carrier phase difference is performed, conversion into an angle is performed. Further, a multiplication by a desired odd number of multiplication is performed, and then an inverse Hilbert transform is applied.
Type: Grant
Filed: April 24, 2017
Date of Patent: February 12, 2019
Assignee: KYOWA ELECTRONIC INSTRUMENTS CO., LTD.
Inventors: Jin Mitsugi, Yuki Igarashi, Haruhisa Ichikawa, Yuusuke Kawakita, Kiyoshi Egawa
-
Patent number: 10134385
Abstract: Systems and methods are provided for associating a phonetic pronunciation with a name by receiving the name, mapping the name to a plurality of monosyllabic components that are combinable to construct the phonetic pronunciation of the name, receiving a user input to select one or more of the plurality, and combining the selected one or more of the plurality of monosyllabic components to construct the phonetic pronunciation of the name.
Type: Grant
Filed: March 2, 2012
Date of Patent: November 20, 2018
Assignee: Apple Inc.
Inventor: Devang K. Naik
-
Patent number: 10019982
Abstract: A speech simulation system adapted for a user to communicate with others. The system has at least one sensor to sense controlled and coordinated body movement. The system has a computer processor connected to the at least one sensor. The system has a database memory connected to the computer processor. The system has software programming to operate the computer processor. The system has a feedback device connected to the computer processor and directed to the user. The system has an outward audio output device connected to the computer processor to provide sound and a speaker connected to the outward audio output device.
Type: Grant
Filed: October 18, 2016
Date of Patent: July 10, 2018
Inventor: Mary Elizabeth McCulloch
-
Patent number: 10019069
Abstract: A vehicular display input apparatus includes a gesture detection unit, a determiner, and a controller. The gesture detection unit detects a gesture made by a hand of the driver. The determiner determines whether a visual line of the driver is directed within a visual line detection area, which is preliminarily defined to include at least partial display region. The controller switches to one of operations listed in an operation menu, which is to be correlated with the gesture, according to a determination result of the determiner. The determination result indicates whether the visual line is directed within the visual line detection area.
Type: Grant
Filed: March 12, 2015
Date of Patent: July 10, 2018
Assignee: DENSO CORPORATION
Inventor: Youichi Naruse
-
Patent number: 10008216
Abstract: Method and apparatus for reducing a size of databases required for recorded speech data.
Type: Grant
Filed: April 15, 2014
Date of Patent: June 26, 2018
Assignee: SPEECH MORPHING SYSTEMS, INC.
Inventors: Fathy Yassa, Benjamin Reaves, Steve Pearson
-
Patent number: 9924282
Abstract: A system for improving synchronization of an acoustic signal to a video display includes a hearing aid comprising a hearing loss processor configured for signal processing in accordance with a hearing loss of a user of the hearing aid, the hearing aid being configured for receiving a first audio signal for synchronous presentation to the user viewing the video display, the hearing aid being configured for generating a first acoustic signal to be presented to the user of the hearing aid, the first acoustic signal comprising at least a first part being generated in response to the first audio signal. The system also includes a delay unit configured for applying a delay, such that synchronization of the at least first part of the first acoustic signal to the video display is improved.
Type: Grant
Filed: January 20, 2012
Date of Patent: March 20, 2018
Assignee: GN RESOUND A/S
Inventors: Søren C. Ell, Jesper L. Nielsen
-
Patent number: 9900115
Abstract: Systems and methods of voice annunciation of signal strength, quality of service, and sensor status for wireless devices are provided. Some methods can include determining a signal strength or range of a radio, determining quality of service events and statistics for a wireless device, or determining a status of a sensor and then verbally annunciating information or instructions relating to the determined signal strength or range, the determined quality of service events and statistics, or the determined sensor status.
Type: Grant
Filed: February 20, 2015
Date of Patent: February 20, 2018
Assignee: HONEYWELL INTERNATIONAL INC.
Inventors: Timothy A. Rauworth, Douglas L. Hoeferle, Robert J. Selepa, Pardeep Verma
-
Patent number: 9640172
Abstract: A sound synthesizing apparatus includes a waveform storing section which stores a plurality of unit waveforms extracted from different positions, on a time axis, of a sound waveform indicating a voiced sound, and a waveform generating section which generates, for each of a first processing period and a second processing period, a synthesized waveform by arranging the plurality of unit waveforms on the time axis, wherein the second processing period is an immediately succeeding processing period after the first processing period.
Type: Grant
Filed: August 30, 2012
Date of Patent: May 2, 2017
Assignee: Yamaha Corporation
Inventor: Hiraku Kayama
-
Patent number: 9552806
Abstract: A sound synthesizing apparatus includes a processor coupled to a memory. The processor is configured to execute computer-executable units comprising: an information acquirer adapted to acquire synthesis information which specifies a duration and an utterance content for each unit sound; a prolongation setter adapted to set whether prolongation is permitted or inhibited for each of a plurality of phonemes corresponding to the utterance content of the each unit sound; and a sound synthesizer adapted to generate a synthesized sound corresponding to the synthesis information by connecting a plurality of sound fragments corresponding to the utterance content of the each unit sound. The sound synthesizer prolongs a sound fragment corresponding to the phoneme the prolongation of which is permitted in accordance with the duration of the unit sound.
Type: Grant
Filed: February 26, 2013
Date of Patent: January 24, 2017
Assignee: Yamaha Corporation
Inventors: Hiraku Kayama, Motoki Ogasawara
-
Patent number: 9484013
Abstract: A speech simulation system adapted for a user to communicate with others. The system has at least one sensor to sense controlled and coordinated body movement. The system has a computer processor connected to the at least one sensor. The system has a database memory connected to the computer processor. The system has software programming to operate the computer processor. The system has a feedback device connected to the computer processor and directed to the user. The system has an outward audio output device connected to the computer processor to provide sound and a speaker connected to the outward audio output device.
Type: Grant
Filed: February 19, 2013
Date of Patent: November 1, 2016
Inventor: Mary Elizabeth McCulloch
-
Patent number: 9368126
Abstract: A method, system and computer readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing an assessment result, where at least one of the steps is carried out using a computer device.
Type: Grant
Filed: April 29, 2011
Date of Patent: June 14, 2016
Assignee: Nuance Communications, Inc.
Inventors: Yong Qin, Qin Shi, Zhiwei Shuang, Shi Lei Zhang
-
Patent number: 9218798
Abstract: Provided is a voice assist device in an electronic musical instrument in which tone selection or a sound setting corresponding to a key is performed in advance by pressing the operation button 1 while pressing one of the keys in the keyboard 2, including a changed state recognizing unit 3 that recognizes from a pressed key a changed state of tone selection or a sound setting determined corresponding to the key in advance, a setting item name storing unit 4 that stores a setting item name of the tone selection or sound setting as voice data, and a sound emitting unit 5 that emits a setting item name corresponding to the changed state, and the changed state recognizing unit 3 includes a voice assist recognizing unit 6 that detects a depression for a preset time or more of the operation button 1 prior to a depression of the key.
Type: Grant
Filed: August 5, 2015
Date of Patent: December 22, 2015
Assignee: KAWAI MUSICAL INSTRUMENTS MANUFACTURING CO., LTD.
Inventors: Takuya Satoh, Kohtaro Ilimura, Sachie Ilimura
-
Patent number: 9135911
Abstract: A system, method and program for acquiring from an input text a character string set and generating the pronunciation thereof which should be recognized as a word is disclosed.
Type: Grant
Filed: September 26, 2014
Date of Patent: September 15, 2015
Assignees: NEXGEN FLIGHT LLC, DOINITA DIANE SERBAN
Inventors: Doinita Serban, Bhupat Raigaga
-
Patent number: 8983842
Abstract: There is provided a speech processing apparatus including: a data obtaining unit which obtains music progression data defining a property of one or more time points or one or more time periods along progression of music; a determining unit which determines an output time point at which a speech is to be output during reproducing the music by utilizing the music progression data obtained by the data obtaining unit; and an audio output unit which outputs the speech at the output time point determined by the determining unit during reproducing the music.
Type: Grant
Filed: August 12, 2010
Date of Patent: March 17, 2015
Assignee: Sony Corporation
Inventors: Tetsuo Ikeda, Ken Miyashita, Tatsushi Nashida
-
Patent number: 8977551
Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
Type: Grant
Filed: October 27, 2011
Date of Patent: March 10, 2015
Assignee: Goertek Inc.
Inventors: Fengliang Wu, Zhenhua Wu
-
Patent number: 8977552
Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
Type: Grant
Filed: May 28, 2014
Date of Patent: March 10, 2015
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair D. Conkie, Ann K. Syrdal
-
Patent number: 8977550
Abstract: Part units of speech information are arranged in a predetermined order to generate a sentence unit of a speech information set. To each of a plurality of speech part units of the speech information, an attribute of “interrupt possible after reproduction” with which reproduction of priority interrupt information can be started after the speech part unit of the speech information is reproduced or another attribute of “interrupt impossible after reproduction” with which reproduction of the priority interrupt information cannot be started even after the speech part unit of the speech information is reproduced is set. When the priority interrupt information having a higher priority rank than the speech information set being currently reproduced is inputted, if the attribute of the speech information being reproduced at the point in time is “interrupt impossible after reproduction,” then the priority interrupt information is reproduced after the speech information is reproduced.
Type: Grant
Filed: May 6, 2011
Date of Patent: March 10, 2015
Assignee: Honda Motor Co., Ltd.
Inventor: Tokujiro Kizaki
-
Publication number: 20150025892
Abstract: A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual.
Type: Application
Filed: March 6, 2013
Publication date: January 22, 2015
Applicant: Agency for Science, Technology and Research
Inventors: Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong
-
Patent number: 8909538
Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
Type: Grant
Filed: November 11, 2013
Date of Patent: December 9, 2014
Assignee: Verizon Patent and Licensing Inc.
Inventor: James Mark Kondziela
-
Patent number: 8898057
Abstract: Disclosed is an encoding apparatus that can efficiently encode a signal that is a broad or extra-broad band signal or the like, thereby improving the quality of a decoded signal. This encoding apparatus includes a band establishing unit (301) that generates, based on the characteristic of the input signal, band establishment information to be used for dividing the band of the input signal to establish a first band part of lower frequency side and a second band part of higher frequency side; a lower frequency encoding unit (302) for encoding, based on the band establishment information, the input signal of the first band part to generate encoded lower frequency part information; and a higher frequency encoding unit (303) for encoding, based on the band establishment information, the input signal of the second band part to generate encoded higher frequency part information.
Type: Grant
Filed: October 22, 2010
Date of Patent: November 25, 2014
Assignee: Panasonic Intellectual Property Corporation of America
Inventor: Tomofumi Yamanashi
-
Patent number: 8888494
Abstract: One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user.
Type: Grant
Filed: June 27, 2011
Date of Patent: November 18, 2014
Inventor: Randall Lee Threewits
-
Patent number: 8868431
Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
Type: Grant
Filed: February 5, 2010
Date of Patent: October 21, 2014
Assignee: Mitsubishi Electric Corporation
Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
-
Patent number: 8862477
Abstract: A method and a processing device for managing an interactive speech recognition system are provided. Whether a voice input relates to expected input, at least partially, of any one of a group of menus different from a current menu is determined. If the voice input relates to the expected input, at least partially, of any one of the group of menus different from the current menu, skipping to the one of the group of menus is performed. The group of menus different from the current menu includes menus at multiple hierarchical levels.
Type: Grant
Filed: June 3, 2013
Date of Patent: October 14, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Hary E. Blanchard
-
Patent number: 8856008
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: September 18, 2013
Date of Patent: October 7, 2014
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8838441
Abstract: A representation of an audio signal having a first, a second and a third frame is derived by estimating first warp information for the first and second frames and second warp information for the second and third frames, the warp information describing pitch information of the audio signal. First or second spectral coefficients for first and second frames or second and third frames are derived using first or second warp information and a first or second weighted representation of the first and second frames or second and third frames, the first or second weighted representation derived by applying a first or second window function to the first and second frames or second and third frames, wherein the first or second window function depends on the first or second warp information. The representation of the audio signal is generated including the first and the second spectral coefficients.
Type: Grant
Filed: February 14, 2013
Date of Patent: September 16, 2014
Assignee: Dolby International AB
Inventor: Lars Villemoes
-
Publication number: 20140236586Abstract: A method and apparatus that modifies static media, such as music files being played to a user of the device, upon the generation or receipt of an alert, notification or message, so that information in the alert, notification or message can be incorporated into the media files then communicated to the user. In a further embodiment, a user's response to the communicated information can be sensed using one or more sensors and transducers so as to provide feedback to the device, and then optionally to a node in a system.Type: ApplicationFiled: February 18, 2013Publication date: August 21, 2014Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)Inventor: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
-
Patent number: 8775185Abstract: A method for converting text into speech with a speech sample library is provided. The method comprises converting an input text to a sequence of triphones; determining musical parameters of each phoneme in the sequence of triphones; detecting, in the speech sample library, speech segments having at least the determined musical parameters; and concatenating the detected speech segments.Type: GrantFiled: November 27, 2012Date of Patent: July 8, 2014Inventors: Gershon Silbert, Andres Hakim
-
Publication number: 20140180696Abstract: A high quality speech is reproduced with a small data amount in speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal.Type: ApplicationFiled: February 25, 2014Publication date: June 26, 2014Applicant: BlackBerry LimitedInventor: Tadashi YAMAURA
-
Patent number: 8751237Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal, and starts or stops outputting the text-to-speech item based on a request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about a program currently selected by the user's operation of the remote controller with each item of information about the previous program selected just before the user's operation. If an item of the currently selected program information is the same as the corresponding item of the operation-prior program information, and text-to-speech processing has already been completed for the item after the last change in the item, the sound control section (114) stops outputting the item to the sound generation section (115).Type: GrantFiled: February 23, 2011Date of Patent: June 10, 2014Assignee: Panasonic CorporationInventor: Koumei Kubota
-
Patent number: 8744841Abstract: An adaptive time/frequency-based encoding mode determination apparatus including a time domain feature extraction unit to generate a time domain feature by analysis of a time domain signal of an input audio signal, a frequency domain feature extraction unit to generate a frequency domain feature corresponding to each frequency band generated by division of a frequency domain corresponding to a frame of the input audio signal into a plurality of frequency domains, by analysis of a frequency domain signal of the input audio signal, and a mode determination unit to determine any one of a time-based encoding mode and a frequency-based encoding mode, with respect to each frequency band, by use of the time domain feature and the frequency domain feature.Type: GrantFiled: September 21, 2006Date of Patent: June 3, 2014Assignee: SAMSUNG Electronics Co., Ltd.Inventors: Eun Mi Oh, Ki Hyun Choo, Jung-Hoe Kim, Chang Yong Son
-
Patent number: 8744851Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.Type: GrantFiled: August 13, 2013Date of Patent: June 3, 2014Assignee: AT&T Intellectual Property II, L.P.Inventors: Alistair Conkie, Ann K Syrdal
-
Patent number: 8731933Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.Type: GrantFiled: April 10, 2013Date of Patent: May 20, 2014Assignee: Kabushiki Kaisha ToshibaInventor: Takehiko Kagoshima
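The buffered waveform access described in this abstract can be sketched as a read-planning step. This is only an illustration under stated assumptions: the function name `plan_reads`, the `(offset, length)` representation of a speech unit waveform, and the strict adjacency test for a "continuous region" are hypothetical and not taken from the patent.

```python
def plan_reads(units, buffer_size):
    """Group (offset, length) unit waveforms into single-access reads.

    Consecutive units are merged into one read while they are adjacent
    in the waveform memory and their combined size fits in the buffer.
    Assumption: a "continuous region" means byte-adjacent units.
    """
    reads = []
    for offset, length in units:
        if reads:
            start, size = reads[-1]
            adjacent = offset == start + size
            if adjacent and size + length <= buffer_size:
                reads[-1] = (start, size + length)  # extend the last read
                continue
        reads.append((offset, length))  # start a new read
    return reads


# Four selected units; the first two are adjacent and fit a 256-byte buffer.
print(plan_reads([(0, 100), (100, 50), (150, 200), (900, 40)], 256))
```

In this example the first two units are fetched in one access, the third would overflow the buffer and so starts a new read, and the fourth lies elsewhere in memory and is read separately.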
-
Patent number: 8719030Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal from that representation. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming them into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.Type: GrantFiled: December 3, 2012Date of Patent: May 6, 2014Inventor: Chengjun Julian Chen
-
Patent number: 8706493Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce synthesized speech.Type: GrantFiled: July 11, 2011Date of Patent: April 22, 2014Assignee: Industrial Technology Research InstituteInventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
-
Patent number: 8706497Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.Type: GrantFiled: October 22, 2010Date of Patent: April 22, 2014Assignee: Mitsubishi Electric CorporationInventors: Satoru Furuta, Hirohisa Tasaki
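The selection step performed by the distortion evaluation unit can be sketched as a minimum-distortion search over candidate signals. This is a simplified illustration, not the patented method: the function name `select_wideband` is hypothetical, signals are plain lists of samples, and the sum of squared sample differences is an assumed distortion measure that the abstract does not specify.

```python
def select_wideband(candidates, upsampled_narrowband):
    """Return the candidate wide-band signal closest to the reference.

    Assumption: waveform distortion is the sum of squared sample
    differences against the up-sampled narrow-band signal.
    """
    def distortion(candidate):
        return sum((c - n) ** 2 for c, n in zip(candidate, upsampled_narrowband))

    return min(candidates, key=distortion)


# Three hypothetical synthesized candidates and an up-sampled reference.
candidates = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]]
reference = [1.0, 2.0, 3.4]
print(select_wideband(candidates, reference))
```

In the system described above, the frequency component outside the narrow band would then be extracted from the selected signal and combined with the up-sampled narrow-band signal; that filtering stage is omitted here.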