Synthesis Patents (Class 704/258)
-
Patent number: 9026445
Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts only the samples needed to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
Type: Grant
Filed: March 20, 2013
Date of Patent: May 5, 2015
Assignee: Nuance Communications, Inc.
Inventors: Terry Wade Niemeyer, Liliana Orozco
-
Publication number: 20150120303
Abstract: According to an embodiment, a sentence set generating device includes an importance degree storage, a frequency storage, a calculator, and a selector. The importance degree storage is configured to store a degree of importance for each of a plurality of acoustic units. The frequency storage is configured to store a frequency of appearance for each of the acoustic units in a second sentence set. The calculator is configured to calculate a score for a first sentence included in a first sentence set, from a degree of rarity corresponding to the frequency of appearance of each acoustic unit in the first sentence and from the degree of importance of each acoustic unit. The selector is configured to select, from the sentences included in the first sentence set, a sentence having a higher score than the other sentences, and add the selected sentence to the second sentence set.
Type: Application
Filed: September 12, 2014
Publication date: April 30, 2015
Inventor: Yusuke Shinohara
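The scoring scheme in this abstract can be read as a greedy corpus-design loop: repeatedly pick the sentence whose acoustic units are both important and still rare in the selected set. The sketch below is an illustrative interpretation, not the patented implementation; the exact score form (importance divided by one plus the unit's accumulated frequency, standing in for "rarity") and all names are assumptions.

```python
def select_sentences(candidates, importance, num_to_select):
    """Greedily build a second sentence set from a first (candidate) set.

    candidates: sentences represented as lists of acoustic units.
    importance: assumed per-unit importance degrees.
    """
    freq = {}          # appearance count of each unit in the selected set
    selected = []
    pool = list(candidates)
    for _ in range(num_to_select):
        def score(sentence):
            # rarity decays as a unit accumulates in the selected set
            return sum(importance.get(u, 0.0) / (1 + freq.get(u, 0))
                       for u in sentence)
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
        for u in best:
            freq[u] = freq.get(u, 0) + 1
    return selected

# toy example: sentences as lists of phoneme-pair "units" (invented data)
cands = [["ka", "sa"], ["ka", "ta"], ["na", "ha"]]
imp = {"ka": 1.0, "sa": 0.5, "ta": 0.5, "na": 2.0, "ha": 2.0}
print(select_sentences(cands, imp, 2))
```

Because selected units lose rarity, the loop naturally favors coverage of units it has not yet collected.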
-
Patent number: 9020821
Abstract: An acquisition unit analyzes a text, and acquires phonemic and prosodic information. An editing unit edits a part of the phonemic and prosodic information. A speech synthesis unit converts the phonemic and prosodic information before editing the part to a first speech waveform, and converts the phonemic and prosodic information after editing the part to a second speech waveform. A period calculation unit calculates a contrast period corresponding to the part in the first speech waveform and the second speech waveform. A speech generation unit generates an output waveform by connecting a first partial waveform and a second partial waveform. The first partial waveform contains the contrast period of the first speech waveform. The second partial waveform contains the contrast period of the second speech waveform.
Type: Grant
Filed: September 19, 2011
Date of Patent: April 28, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventor: Osamu Nishiyama
-
Publication number: 20150112686
Abstract: A time difference calculation unit calculates a time difference between its own terminal and another terminal, based on the time at which output of the first sound from the audio output module is started, the time at which input of a sound corresponding to the audio data to the audio input module is started, a time indicated by the first information, and a time indicated by the second information.
Type: Application
Filed: September 26, 2014
Publication date: April 23, 2015
Applicant: OLYMPUS CORPORATION
Inventor: Ryuichi Kiyoshige
-
Patent number: 9015032
Abstract: Embodiments of the present invention provide a system, method, and program product to deliver an announcement to people, such as a public announcement. A computer receives input representative of audio from one or more people speaking in one or more natural languages. The computer processes the input to identify the languages being spoken, and identifies a relative proportion of each of the identified languages. Using these proportions, the computer determines one or more languages in which to deliver the announcement. The computer then causes the announcement to be delivered in the determined languages. In other embodiments, the computer can also determine an order in which to deliver the announcement. Further, the computer can transmit the announcement in the determined languages and order for delivery in aural or visual form.
Type: Grant
Filed: November 28, 2011
Date of Patent: April 21, 2015
Assignee: International Business Machines Corporation
Inventors: Sheri G. Daye, Peeyush Jaiswal, Aleksas J. Vitenas
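The language-proportion step lends itself to a small sketch: utterance-level language identifications are counted, and the announcement order follows their relative proportions. This is an assumed reading of the abstract; the function name and the `top_n` cutoff are invented here.

```python
from collections import Counter

def announcement_languages(detected_utterances, top_n=2):
    """Rank languages heard in the audience by relative proportion and
    return the order in which to deliver the announcement.

    detected_utterances: per-utterance language IDs from a language
    identifier (assumed to exist upstream).
    """
    counts = Counter(detected_utterances)
    total = sum(counts.values())
    ranked = sorted(counts, key=lambda lang: counts[lang], reverse=True)
    proportions = {lang: counts[lang] / total for lang in ranked}
    return ranked[:top_n], proportions

langs, props = announcement_languages(
    ["en", "en", "es", "en", "fr", "es"])
print(langs)   # the two most common languages, most common first
```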
-
Patent number: 9009051
Abstract: According to one embodiment, a reading aloud support apparatus includes a reception unit, a first extraction unit, a second extraction unit, an acquisition unit, a generation unit, and a presentation unit. The reception unit is configured to receive an instruction. The first extraction unit is configured to extract, as a partial document, a part of a document which corresponds to a range of words. The second extraction unit is configured to perform morphological analysis and to extract words as candidate words. The acquisition unit is configured to acquire attribute information items related to the candidate words. The generation unit is configured to perform weighting relating to a value corresponding to a distance and to determine each of the candidate words to be preferentially presented, to generate a presentation order. The presentation unit is configured to present the candidate words and the attribute information items in accordance with the presentation order.
Type: Grant
Filed: March 22, 2011
Date of Patent: April 14, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kosei Fume, Masaru Suzuki, Yuji Shimizu, Tatsuya Izuha
-
Publication number: 20150100318
Abstract: A method for decoding a speech signal is described. The method includes obtaining a packet. The method also includes obtaining a previous lag value. The method further includes limiting the previous lag value if the previous lag value is greater than a maximum lag threshold. The method additionally includes disallowing an adjustment to a number of synthesized peaks if a combination of the number of synthesized peaks and an estimated number of peaks is not valid.
Type: Application
Filed: October 4, 2013
Publication date: April 9, 2015
Applicant: QUALCOMM Incorporated
Inventors: Venkatraman Rajagopalan, Venkatesh Krishnan, Alok K. Gupta
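The two guards described here (limiting the lag, and disallowing a peak-count adjustment when the combination is not valid) might look roughly like the following. All thresholds and the validity criterion are invented for illustration; the abstract does not specify them.

```python
MAX_LAG = 120  # maximum lag threshold in samples (illustrative value)

def limit_lag(previous_lag, max_lag=MAX_LAG):
    """Clamp the previous lag value if it exceeds the maximum lag threshold."""
    return min(previous_lag, max_lag)

def allow_peak_adjustment(num_synth_peaks, est_peaks, frame_len, min_spacing):
    """Assumed validity check: disallow the adjustment when the combined
    peak count could not fit in the frame at the minimum pitch spacing."""
    return (num_synth_peaks + est_peaks) * min_spacing <= frame_len

print(limit_lag(150))                        # clamped to 120
print(allow_peak_adjustment(3, 2, 160, 40))  # 5 * 40 = 200 > 160 -> False
```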
-
Publication number: 20150097706
Abstract: A micro-computer based aircraft system that creates aural messages based upon system-detected threats (e.g., low oil pressure). The messages are unique to the make and model of aircraft. Speech recognition allows the pilot to request aircraft-specific, customized aurally-delivered checklists and to respond via a challenge and response protocol. This permits a hands-free, timely, complete and prudent response to the threat or hazardous situation, while allowing the pilot the relative freedom to do what is paramount: first, fly the airplane (with minimum distraction).
Type: Application
Filed: October 9, 2013
Publication date: April 9, 2015
Applicant: CMX Avionics, LLC
Inventors: Warren F. Perger, William G. Abbatt
-
Patent number: 9002703
Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
Type: Grant
Filed: September 28, 2011
Date of Patent: April 7, 2015
Assignee: Amazon Technologies, Inc.
Inventor: Jay A. Crosley
-
Patent number: 9002711
Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
Type: Grant
Filed: December 16, 2010
Date of Patent: April 7, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Ryo Morinaka, Takehiko Kagoshima
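Once formants have been put in correspondence by the mapping step, interpolation at a desired ratio is straightforward. A minimal sketch, assuming plain linear interpolation of already-mapped formant parameters (window functions are omitted, and the data layout is invented):

```python
def interpolate_formants(speaker_a, speaker_b, ratio):
    """Linearly interpolate mapped formant parameters of two speakers.

    speaker_a, speaker_b: lists of dicts with 'freq', 'phase', 'power',
    assumed already mapped one-to-one by the cost function.
    ratio: 0.0 -> pure speaker A, 1.0 -> pure speaker B.
    """
    blended = []
    for fa, fb in zip(speaker_a, speaker_b):
        blended.append({
            key: (1 - ratio) * fa[key] + ratio * fb[key]
            for key in ("freq", "phase", "power")
        })
    return blended

a = [{"freq": 500.0, "phase": 0.0, "power": 1.0}]
b = [{"freq": 700.0, "phase": 0.2, "power": 0.6}]
print(interpolate_formants(a, b, 0.5))
```

Sweeping `ratio` from 0 to 1 morphs the synthetic voice smoothly from one speaker toward the other.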
-
Patent number: 8996387
Abstract: For clearing transaction data selected for a processing, there is generated in a portable data carrier (1) a transaction acoustic signal (003; 103; 203) (S007; S107; S207) upon whose acoustic reproduction by an end device (10) at least transaction data selected for the processing are reproduced superimposed acoustically with a melody specific to a user of the data carrier (1) (S009; S109; S209). The generated transaction acoustic signal (003; 103; 203) is electronically transferred to an end device (10) (S108; S208), which processes the selected transaction data (S011; S121; S216) only when the user of the data carrier (1) confirms vis-à-vis the end device (10) an at least partial match both of the acoustically reproduced melody with the user-specific melody and of the acoustically reproduced transaction data with the selected transaction data (S010; S110, S116; S210).
Type: Grant
Filed: September 8, 2009
Date of Patent: March 31, 2015
Assignee: Giesecke & Devrient GmbH
Inventors: Thomas Stocker, Michael Baldischweiler
-
Patent number: 8996377
Abstract: A text-to-speech (TTS) engine combines recorded speech with synthesized speech from a TTS synthesizer based on text input. The TTS engine receives the text input and identifies the domain for the speech (e.g. navigation, dialing, . . . ). The identified domain is used in selecting domain specific speech recordings (e.g. pre-recorded static phrases such as "turn left", "turn right" . . . ) from the input text. The speech recordings are obtained based on the static phrases for the domain that are identified from the input text. The TTS engine blends the static phrases with the TTS output to smooth the acoustic trajectory of the input text. The prosody of the static phrases is used to create similar prosody in the TTS output.
Type: Grant
Filed: July 12, 2012
Date of Patent: March 31, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Sheng Zhao, Peng Wang, Difei Gao, Yijian Wu, Binggong Ding, Shenghua Ye, Max Leung
-
Publication number: 20150088520
Abstract: A candidate voice segment sequence generator 1 generates candidate voice segment sequences 102 for an input language information sequence 101 by using DB voice segments 105 in a voice segment database 4. An output voice segment sequence determinator 2 calculates a degree of match between the input language information sequence 101 and each of the candidate voice segment sequences 102 by using a parameter 107 showing a value according to a cooccurrence criterion 106 for cooccurrence between the input language information sequence 101 and a sound parameter showing the attribute of each of a plurality of candidate voice segments in each of the candidate voice segment sequences 102, and determines an output voice segment sequence 103 on the basis of the degree of match.
Type: Application
Filed: February 21, 2014
Publication date: March 26, 2015
Applicant: Mitsubishi Electric Corporation
Inventors: Takahiro OTSUKA, Keigo KAWASHIMA, Satoru FURUTA, Tadashi YAMAURA
-
Publication number: 20150088521
Abstract: A speech server includes: a speech terminal-specifying information management unit configured to manage speech terminal-specifying information; a reception unit configured to receive, from an external server, (i) the speech terminal-specifying information or user-specifying information and (ii) speech information indicative of speech content to be outputted as speech; and a speech instruction unit configured to instruct a speech terminal specified by the speech terminal-specifying information to output the speech content as speech.
Type: Application
Filed: September 25, 2014
Publication date: March 26, 2015
Applicant: SHARP KABUSHIKI KAISHA
Inventors: Masahiro CHIBA, Kazunori SHIBATA
-
Patent number: 8990087
Abstract: A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for each word using the digital content. Audio or speech is played for each word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.
Type: Grant
Filed: September 30, 2008
Date of Patent: March 24, 2015
Assignee: Amazon Technologies, Inc.
Inventors: John Lattyak, John T. Kim, Robert Wai-Chi Chu, Laurent An Minh Nguyen
-
Patent number: 8990089
Abstract: A speech output is generated from a text input written in a first language and containing inclusions in a second language. Words in the native language are pronounced with a native pronunciation and words in the foreign language are pronounced with a proficient foreign pronunciation. Language dependent phoneme symbols generated for words of the second language are replaced with language dependent phoneme symbols of the first language, where said replacing includes the steps of assigning to each language dependent phoneme symbol of the second language a language independent target phoneme symbol, mapping to each one language independent target phoneme symbol a language independent substitute phoneme symbol assignable to a language dependent substitute phoneme symbol of the first language, and substituting the language dependent phoneme symbols of the second language by the language dependent substitute phoneme symbols of the first language.
Type: Grant
Filed: November 19, 2012
Date of Patent: March 24, 2015
Assignee: Nuance Communications, Inc.
Inventors: Johan Wouters, Christof Traber, David Hagstrand, Alexis Wilpert, Jürgen Keller, Igor Nozhov
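The substitution this abstract describes (language-dependent L2 symbol, then a language-independent target symbol, then an L1 substitute) can be sketched with toy mapping tables. The symbols and table contents below are invented examples, not the patent's actual inventories or mappings.

```python
# Hypothetical two-step mapping: an L2 (e.g. English) phoneme is first
# assigned a language-independent target symbol, which is then mapped to
# a substitute phoneme available in the L1 (e.g. German) inventory.
L2_TO_INDEPENDENT = {"EN:th": "X-SAMPA:T", "EN:w": "X-SAMPA:w"}
INDEPENDENT_TO_L1 = {"X-SAMPA:T": "DE:s", "X-SAMPA:w": "DE:v"}

def substitute_phonemes(l2_phonemes):
    """Replace L2-dependent phoneme symbols with L1 substitutes via a
    language-independent intermediate symbol set; symbols with no
    mapping pass through unchanged."""
    out = []
    for p in l2_phonemes:
        target = L2_TO_INDEPENDENT.get(p, p)               # step 1: assign target
        out.append(INDEPENDENT_TO_L1.get(target, target))  # step 2: map to L1
    return out

print(substitute_phonemes(["EN:th", "EN:w"]))
```

The intermediate symbol set decouples the two languages, so a new language pair only needs its own two tables rather than a full cross-product of mappings.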
-
Patent number: 8983841
Abstract: A network communication node includes an audio outputter that outputs an audible representation of data to be provided to a requester. The network communication node also includes a processor that determines a categorization of the data to be provided to the requester and that varies a pause between segments of the audible representation of the data in accordance with the categorization of the data to be provided to the requester.
Type: Grant
Filed: July 15, 2008
Date of Patent: March 17, 2015
Assignee: AT&T Intellectual Property, I, L.P.
Inventors: Gregory Pulz, Steven Lewis, Charles Rajnai
-
Patent number: 8983835
Abstract: An electronic device includes a voice processing unit, a wireless communication unit, and a combining unit. The voice processing unit receives speech signals. The wireless communication unit sends the speech signals to a server. The server converts the speech signals into a text message. The wireless communication unit receives the text message from the server. The combining unit combines the text message and the speech signals into a combined message. The wireless communication unit further sends the combined message to a recipient. A related server is also provided.
Type: Grant
Filed: June 30, 2011
Date of Patent: March 17, 2015
Assignees: Fu Tai Hua Industry (Shenzhen) Co., Ltd, Hon Hai Precision Industry Co., Ltd.
Inventors: Shih-Fang Wong, Tsung-Jen Chuang, Bo Zhang
-
Patent number: 8977550
Abstract: Part units of speech information are arranged in a predetermined order to generate a sentence unit of a speech information set. To each of a plurality of speech part units of the speech information is assigned either an attribute of "interrupt possible after reproduction," with which reproduction of priority interrupt information can be started after the speech part unit is reproduced, or an attribute of "interrupt impossible after reproduction," with which reproduction of the priority interrupt information cannot be started even after the speech part unit is reproduced. When priority interrupt information having a higher priority rank than the speech information set currently being reproduced is inputted, if the attribute of the speech information being reproduced at that point in time is "interrupt impossible after reproduction," then the priority interrupt information is reproduced after the speech information is reproduced.
Type: Grant
Filed: May 6, 2011
Date of Patent: March 10, 2015
Assignee: Honda Motor Co., Ltd.
Inventor: Tokujiro Kizaki
-
Patent number: 8977551
Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
Type: Grant
Filed: October 27, 2011
Date of Patent: March 10, 2015
Assignee: Goertek Inc.
Inventors: Fengliang Wu, Zhenhua Wu
-
Patent number: 8977552
Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
Type: Grant
Filed: May 28, 2014
Date of Patent: March 10, 2015
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair D. Conkie, Ann K. Syrdal
-
Patent number: 8972248
Abstract: A band broadening apparatus includes a processor configured to analyze a fundamental frequency based on an input signal bandlimited to a first band, generate a signal that includes a second band different from the first band based on the input signal, control a frequency response of the second band based on the fundamental frequency, reflect the frequency response of the second band on the signal that includes the second band and generate a frequency-response-adjusted signal that includes the second band, and synthesize the input signal and the frequency-response-adjusted signal.
Type: Grant
Filed: September 14, 2012
Date of Patent: March 3, 2015
Assignee: Fujitsu Limited
Inventors: Takeshi Otani, Taro Togawa, Masanao Suzuki, Shusaku Ito
-
Patent number: 8972265
Abstract: A content customization service is disclosed. The content customization service may identify one or more speakers in an item of content, and map one or more portions of the item of content to a speaker. A speaker may also be mapped to a voice. In one embodiment, the content customization service obtains portions of audio content synchronized to the mapped portions of the item of content. Each portion of audio content may be associated with a voice to which the speaker of the portion of the item of content is mapped. These portions of audio content may be combined to produce a combined item of audio content with multiple voices.
Type: Grant
Filed: June 18, 2012
Date of Patent: March 3, 2015
Assignee: Audible, Inc.
Inventor: Kevin S. Lester
-
Patent number: 8965773
Abstract: A method is provided for hierarchical coding of a digital audio signal comprising, for a current frame of the input signal: a core coding, delivering a scalar quantization index for each sample of the current frame, and at least one enhancement coding, delivering indices of scalar quantization for each coded sample of an enhancement signal. The enhancement coding comprises a step of obtaining a filter for shaping the coding noise, which is used to determine a target signal; the indices of scalar quantization of said enhancement signal are determined by minimizing the error between a set of possible values of scalar quantization and said target signal. The coding method can also comprise a shaping of the coding noise for the core bitrate coding. A coder implementing the coding method is also provided.
Type: Grant
Filed: November 17, 2009
Date of Patent: February 24, 2015
Assignee: Orange
Inventors: Balazs Kovesi, Stéphane Ragot, Alain Le Guyader
-
Patent number: 8965768
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
Type: Grant
Filed: August 6, 2010
Date of Patent: February 24, 2015
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
-
Patent number: 8965764
Abstract: Disclosed are an electronic apparatus and a voice recognition method for the same. The voice recognition method for the electronic apparatus includes: receiving an input voice of a user; determining characteristics of the user; and recognizing the input voice based on the determined characteristics of the user.
Type: Grant
Filed: January 7, 2010
Date of Patent: February 24, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hee-seob Ryu, Seung-kwon Park, Jong-ho Lea, Jong-hyuk Jang
-
Patent number: 8965767
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
Type: Grant
Filed: May 20, 2014
Date of Patent: February 24, 2015
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Ann K. Syrdal
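The per-category policy can be pictured as a lookup that routes each phoneme to one of the constituent voices of the combined database. A minimal sketch with invented names, a hypothetical policy, and string stand-ins for voice units:

```python
# Hypothetical policy: for each phonetic category, take units from one
# of the two constituent voices of the combined database.
POLICY = {"vowel": "voice_a", "consonant": "voice_b"}

def select_units(combined_db, phoneme_sequence, policy=POLICY):
    """Pick, for each phoneme, a unit from the voice that the policy
    designates for that phoneme's phonetic category."""
    selected = []
    for phoneme, category in phoneme_sequence:
        voice = policy[category]
        selected.append(combined_db[voice][phoneme])
    return selected

db = {
    "voice_a": {"a": "a_unit_A", "t": "t_unit_A"},
    "voice_b": {"a": "a_unit_B", "t": "t_unit_B"},
}
print(select_units(db, [("t", "consonant"), ("a", "vowel")]))
```

Because selection works directly on stored units, no parameterization of either voice is needed, matching the abstract's claim.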
-
Patent number: 8965769
Abstract: According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit and a presentation unit. The acquisition unit acquires a feature amount for respective tags, each of the tags being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for respective character strings, a variance of feature amounts of the tags which are assigned to the character string in a markup text. The detection unit detects a first character string assigned a first tag having the variance not less than a first threshold value as a first candidate including the tag to be corrected. The presentation unit presents the first candidate.
Type: Grant
Filed: September 24, 2012
Date of Patent: February 24, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kouichirou Mori, Masahiro Morita
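The variance test in this abstract is a plain statistical check: for each character string, compute the variance of the feature amounts of the tags assigned to it across the markup text, and flag strings whose variance reaches a threshold. An illustrative sketch; the data layout, feature values, and threshold are assumptions.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def detect_candidates(markup, threshold):
    """markup: {character_string: [feature amounts of the tags assigned
    to that string across the markup text]}. Strings whose tag feature
    amounts vary by at least `threshold` are flagged as candidates whose
    tags may need correction."""
    return [s for s, feats in markup.items()
            if len(feats) > 1 and variance(feats) >= threshold]

doc = {"hello": [0.9, 0.1, 0.8],   # inconsistent tagging -> candidate
       "world": [0.5, 0.5, 0.5]}   # consistent tagging -> ignored
print(detect_candidates(doc, 0.05))
```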
-
Patent number: 8959022
Abstract: A method for determining a relatedness between a query video and a database video is provided. A processor extracts an audio stream from the query video to produce a query audio stream, extracts an audio stream from the database video to produce a database audio stream, produces a first-sized snippet from the query audio stream, and produces a first-sized snippet from the database audio stream. An estimation is made of a first most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the query audio stream. An estimation is made of a second most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the database audio stream. A similarity is measured between the first sequence and the second sequence, producing a score of relatedness between the two snippets. Finally, a relatedness is determined between the query video and a database video.
Type: Grant
Filed: November 19, 2012
Date of Patent: February 17, 2015
Assignee: Motorola Solutions, Inc.
Inventors: Yang M. Cheng, Dusan Macho
-
Patent number: 8959021
Abstract: Features are disclosed for providing a consistent interface for local and distributed text to speech (TTS) systems. Some portions of the TTS system, such as voices and TTS engine components, may be installed on a client device, and some may be present on a remote system accessible via a network link. Determinations can be made regarding which TTS system components to implement on the client device and which to implement on the remote server. The consistent interface facilitates connecting to or otherwise employing the TTS system through use of the same methods and techniques regardless of which TTS system configuration is implemented.
Type: Grant
Filed: December 19, 2012
Date of Patent: February 17, 2015
Assignee: IVONA Software Sp. z.o.o.
Inventors: Michal T. Kaszczuk, Lukasz M. Osowski
-
Patent number: 8954328
Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters, at least some of the characters having multiple associated moods, for use in document narration.
Type: Grant
Filed: January 14, 2010
Date of Patent: February 10, 2015
Assignee: K-NFB Reading Technology, Inc.
Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Patent number: 8954335
Abstract: Appropriate processing results or appropriate apparatuses can be selected with a control device that selects the most probable speech recognition result by using speech recognition scores received with speech recognition results from two or more speech recognition apparatuses; sends the selected speech recognition result to two or more translation apparatuses respectively; selects the most probable translation result by using translation scores received with translation results from the two or more translation apparatuses; sends the selected translation result to two or more speech synthesis apparatuses respectively; receives a speech synthesis processing result including a speech synthesis result and a speech synthesis score from each of the two or more speech synthesis apparatuses; selects the most probable speech synthesis result by using the scores; and sends the selected speech synthesis result to a second terminal apparatus.
Type: Grant
Filed: March 3, 2010
Date of Patent: February 10, 2015
Assignee: National Institute of Information and Communications Technology
Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
-
Patent number: 8949123
Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
Type: Grant
Filed: April 11, 2012
Date of Patent: February 3, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Aditi Garg, Kasthuri Jayachand Yadlapalli
-
Patent number: 8949128
Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
Type: Grant
Filed: February 12, 2010
Date of Patent: February 3, 2015
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Corinne Bos-Plachez, Martine Marguerite Staessen
-
Patent number: 8942983
Abstract: The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
Type: Grant
Filed: November 23, 2011
Date of Patent: January 27, 2015
Assignee: Speech Technology Centre, Limited
Inventor: Mikhail Vasilievich Khitrov
-
Publication number: 20150025892
Abstract: A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual.
Type: Application
Filed: March 6, 2013
Publication date: January 22, 2015
Applicant: Agency for Science, Technology and Research
Inventors: Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong
-
Patent number: 8930200
Abstract: A vector joint encoding/decoding method and a vector joint encoder/decoder are provided. More than two vectors are jointly encoded, and an encoding index of at least one vector is split and then combined between different vectors, so that the encoding idle spaces of different vectors can be recombined, which saves encoding bits. Because an encoding index of a vector is split into shorter split indexes before recombination, the bit width required of the operating parts in encoding/decoding calculation is also reduced.
Type: Grant
Filed: July 24, 2013
Date of Patent: January 6, 2015
Assignee: Huawei Technologies Co., Ltd
Inventors: Fuwei Ma, Dejun Zhang, Lei Miao, Fengyan Qi
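The bit-saving argument can be illustrated with the standard product-code idea: packing indices whose ranges are not powers of two into one joint index needs fewer bits than coding each index separately. This generic sketch is not the patent's specific split-and-recombine scheme.

```python
import math

def joint_encode(indices, ranges):
    """Pack several vector indices into a single joint index
    (mixed-radix positional encoding)."""
    joint = 0
    for idx, rng in zip(indices, ranges):
        joint = joint * rng + idx
    return joint

def joint_decode(joint, ranges):
    """Recover the individual indices from the joint index."""
    out = []
    for rng in reversed(ranges):
        out.append(joint % rng)
        joint //= rng
    return list(reversed(out))

ranges = [5, 6]                      # separate coding: 3 + 3 = 6 bits
print(math.ceil(math.log2(5 * 6)))   # joint coding: only 5 bits
print(joint_decode(joint_encode([3, 4], ranges), ranges))
```

The saving comes from the "idle space" of each range: a range of 5 wastes 3 of the 8 codewords a 3-bit field can hold, and joint coding reclaims that waste across vectors.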
-
Publication number: 20150006180
Abstract: A process and system for enhancing and customizing movie theatre sound includes receiving an input audio signal and enhancing the voice audio in two or more harmonic and dynamic ranges by re-synthesizing the audio into a full-range PCM wave. The enhancement processes the input audio in parallel through a low-pass filter with dynamic offset, an envelope-controlled bandpass filter, and a high-pass filter, adds an amount of dynamically synthesized sub-bass, and combines the four treated audio signals with the original audio in a summing mixer.
Type: Application
Filed: February 21, 2014
Publication date: January 1, 2015
Applicant: Max Sound Corporation
Inventor: Lloyd Trammell
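The four-path structure described above can be sketched with toy first-order filters: low-pass, band-pass, high-pass, and a synthesized sub-bass path processed in parallel, then summed with the original signal. The filter designs, coefficients, and gains are assumptions for illustration; the patent does not disclose them at this level.

```python
def one_pole_lp(x, a=0.2):
    """Simple one-pole low-pass filter over a list of samples."""
    y, state = [], 0.0
    for s in x:
        state += a * (s - state)
        y.append(state)
    return y

def high_pass(x, a=0.2):
    # complement of the low-pass: original minus smoothed signal
    return [s - l for s, l in zip(x, one_pole_lp(x, a))]

def band_pass(x):
    # crude band-pass: low-pass followed by high-pass
    return high_pass(one_pole_lp(x, 0.5), 0.1)

def sub_bass(x):
    # heavily smoothed, attenuated copy standing in for synthesized sub-bass
    return [0.3 * s for s in one_pole_lp(x, 0.05)]

def enhance(x):
    paths = [one_pole_lp(x), band_pass(x), high_pass(x), sub_bass(x)]
    # summing mixer: the four treated paths plus the original audio
    return [sum(vals) + orig for *vals, orig in zip(*paths, x)]

out = enhance([0.0, 1.0, 0.5, -0.5, 0.0])
print(len(out))  # 5 — output has the same length as the input
```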
-
Patent number: 8924217
Abstract: A communication converter is described for converting among speech signals and textual information, permitting communication between telephone users and textual instant communications users.
Type: Grant
Filed: April 28, 2011
Date of Patent: December 30, 2014
Assignee: Verizon Patent and Licensing Inc.
Inventors: Richard G. Moore, Gregory L. Mumford, Duraisamy Gunasekar
-
Contextual conversion platform for generating prioritized replacement text for spoken content output
Patent number: 8918323
Abstract: A contextual conversion platform, and method for converting text to speech, are described that can convert the content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which a spoken content input can be generated. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
Type: Grant
Filed: March 15, 2013
Date of Patent: December 23, 2014
Inventor: Daniel Ben-Ezri
-
Patent number: 8918322
Abstract: A personalized text-to-speech (pTTS) system provides a method for converting text data to speech data utilizing a pTTS template representing the voice characteristics of an individual. A memory stores executable program code that converts text data to speech data. Text data represents a textual message directed to a system user, and speech data represents a spoken form of text data having the characteristics of an individual's voice. A processor executes the program code, and a storage device stores a pTTS template and may store speech data. The pTTS system can be used to provide various services that provide immediate spoken presentation of the speech data converted from text data and/or combine stored speech data with generated speech data for spoken presentation.
Type: Grant
Filed: June 20, 2007
Date of Patent: December 23, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Edmund Gale Acker, Frederick Murray Burg
-
Patent number: 8918313
Abstract: A method of selectively performing signal processing in a first mode and in a second mode. In the first mode, a noise cancel signal having a signal characteristic to cancel an external noise component is generated based on a voice signal supplied from a microphone, and an input digital audio signal and the noise cancel signal are combined into a voice signal to be output through a speaker. In the second mode, a sound process for vocal voice is performed on a voice signal supplied from a microphone, a vocal voice component is canceled from a digital audio signal of input music to generate a karaoke signal, and the karaoke signal and the vocal signal are combined into a voice signal to be output through a speaker. The first mode corresponds to an audio replay operation accompanied by noise cancellation, and the second mode corresponds to a karaoke operation.
Type: Grant
Filed: May 16, 2012
Date of Patent: December 23, 2014
Assignee: Sony Corporation
Inventors: Kazunobu Ookuri, Kohei Asada, Yasunobu Murata
-
Patent number: 8917876
Abstract: SPL monitoring systems are provided. An SPL monitoring system includes an audio transducer configured to receive sound pressure, a logic circuit that calculates a safe time duration over which a user can receive the current sound pressure values, and an indicator element that produces a notification when an indicator level occurs. An SPL monitoring information system includes a database that stores data such as a list of earpiece devices and associated instrument response functions. The logic circuit compares a request with the data in the database, retrieves a subset of the data, and sends it to an output control unit. The output control unit sends the subset of data to a sending unit.
Type: Grant
Filed: June 14, 2007
Date of Patent: December 23, 2014
Assignee: Personics Holdings, LLC
Inventor: Steven W. Goldstein
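The "safe time duration" such a logic circuit computes can be sketched with the standard NIOSH occupational exposure model (85 dBA criterion for 8 hours, 3 dB exchange rate). Using this particular formula is an assumption for illustration; the patent does not specify the exposure model.

```python
def safe_minutes(spl_dba, criterion=85.0, exchange=3.0, base_minutes=480.0):
    """Allowed daily exposure in minutes at a constant sound pressure level.

    Every `exchange` dB above the criterion level halves the allowed time.
    """
    return base_minutes / (2 ** ((spl_dba - criterion) / exchange))

print(safe_minutes(85.0))   # 480.0 -> a full 8-hour day
print(safe_minutes(88.0))   # 240.0 -> 3 dB louder halves the time
print(safe_minutes(100.0))  # 15.0  -> loud venue: 15 minutes
```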
-
Patent number: 8914290
Abstract: Method and apparatus that dynamically adjusts operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
Type: Grant
Filed: May 18, 2012
Date of Patent: December 16, 2014
Assignee: Vocollect, Inc.
Inventors: James Hendrickson, Debra Drylie Scott, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
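The adjustment loop described can be sketched as a mapping from a measured environmental condition, here ambient noise in dB, to engine parameters. The parameter names, thresholds, and values are illustrative assumptions, not the patent's actual settings.

```python
def tts_params_for_noise(noise_db):
    """Slow the speech rate and raise the volume as ambient noise rises,
    on the assumption that slower, louder speech is more intelligible."""
    if noise_db < 60:
        return {"rate_wpm": 180, "volume": 0.7}   # quiet room: normal delivery
    if noise_db < 80:
        return {"rate_wpm": 160, "volume": 0.9}   # moderate noise
    return {"rate_wpm": 140, "volume": 1.0}       # very noisy: slowest, loudest

print(tts_params_for_noise(75))  # {'rate_wpm': 160, 'volume': 0.9}
```

A real system would re-sample the environment periodically and push the new parameters to the engine before each utterance.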
-
Patent number: 8914291
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant
Filed: September 24, 2013
Date of Patent: December 16, 2014
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8909538
Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing, in combination with at least one user prompt seeking a voice response, an enhanced keyword prompt intended to help the user select a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words a user can speak in reply to the user prompt, but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than that used for the user prompt. In some cases, the keyword prompts are words different from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
Type: Grant
Filed: November 11, 2013
Date of Patent: December 9, 2014
Assignee: Verizon Patent and Licensing Inc.
Inventor: James Mark Kondziela
-
Publication number: 20140350940
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset: it can include a word boundary feature, a cluster feature in which initial consonant clusters and some word boundaries are marked with diacritics, a function word feature that marks units as originating from a function word or a content word, and/or a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
Type: Application
Filed: August 7, 2014
Publication date: November 27, 2014
Inventors: Alistair D. Conkie, Mark Beutnagel, Yeon-Jun Kim, Ann K. Syrdal
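The modified preselection step can be sketched as follows: candidate units are gathered from both inventories, with supplemental units distinguished by an extra feature (here, a word-boundary marking), and the existing phoneset serves as a fallback when no supplemental unit matches. The unit records and feature names are invented for illustration.

```python
# Hypothetical unit inventories: the supplemental phoneset is a variation of
# the existing one, carrying an extra word-boundary feature.
EXISTING = [{"phone": "t", "boundary": None, "id": 1},
            {"phone": "k", "boundary": None, "id": 2}]
SUPPLEMENTAL = [{"phone": "t", "boundary": "word_final", "id": 3}]

def preselect(phone, want_boundary=None):
    """Preselect candidate units from both phonesets for one target phone."""
    pool = EXISTING + SUPPLEMENTAL
    exact = [u for u in pool
             if u["phone"] == phone and u["boundary"] == want_boundary]
    # fall back to any unit of the right phone when the feature has no match
    return exact or [u for u in pool if u["phone"] == phone]

print([u["id"] for u in preselect("t", "word_final")])  # [3]
print([u["id"] for u in preselect("t")])                # [1]
```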
-
Patent number: 8898066
Abstract: A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least one controllable accent weighting parameter to select a transformation combination and find a first and a second acoustic-prosodic model. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model according to the controllable accent weighting parameter, processes all transformations in the transformation combination, and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into L1-accented L2 speech.
Type: Grant
Filed: August 25, 2011
Date of Patent: November 25, 2014
Assignee: Industrial Technology Research Institute
Inventors: Jen-Yu Li, Jia-Jang Tu, Chih-Chung Kuo
-
Patent number: 8892442
Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, the information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller ID, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment assigns the notification an importance level and makes repeated notification attempts if it is of high importance.
Type: Grant
Filed: February 17, 2014
Date of Patent: November 18, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Horst J. Schroeter
-
Patent number: 8892230
Abstract: A multicore system 2 includes a main system program 610 that operates on a first processor core 61 and stores synthesized audio data, which is mixed audio data, to a buffer for DMA transfer 63; a standby system program 620 that operates on a second processor core 62; and an audio output unit 64 that sequentially stores the synthesized audio data transferred from the buffer for DMA transfer 63 and plays the stored synthesized audio data. When the amount of synthesized audio data stored in the buffer for DMA transfer 63 has not reached a predetermined amount determined according to the amount of synthesized audio data stored in the audio output unit 64, the standby system program 620 takes over and executes the mixing and storage of the synthesized audio data otherwise executed by the main system program 610.
Type: Grant
Filed: August 4, 2010
Date of Patent: November 18, 2014
Assignee: NEC Corporation
Inventor: Kentaro Sasagawa
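The takeover condition can be sketched as a threshold check: the standby core steps in when the DMA buffer holds less data than a threshold derived from how much the audio output unit already has queued. The buffer sizes, frame size, and threshold rule below are invented for illustration; the patent leaves the "predetermined amount" unspecified.

```python
def standby_should_take_over(dma_buffered, output_buffered, frame=4096):
    """Return True when the standby core should take over mixing.

    The threshold grows with the amount already queued at the output unit,
    so the DMA buffer must stay at least one frame ahead of playback.
    """
    threshold = output_buffered + frame
    return dma_buffered < threshold

# Main core has fallen behind: DMA buffer below threshold, standby takes over.
print(standby_should_take_over(dma_buffered=2048, output_buffered=0))     # True
# Plenty buffered ahead of the output unit: main core keeps mixing.
print(standby_should_take_over(dma_buffered=8192, output_buffered=1024))  # False
```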