Synthesis Patents (Class 704/258)
  • Patent number: 9026445
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts only the samples needed to synthesize the words in the text message and delivers them to the receiving client device, where a client-side concatenative text-to-speech engine uses them to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: March 20, 2013
    Date of Patent: May 5, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Publication number: 20150120303
    Abstract: According to an embodiment, a sentence set generating device includes an importance degree storage, a frequency storage, a calculator, and a selector. The importance degree storage is configured to store therein a degree of importance of each of a plurality of acoustic units. The frequency storage is configured to store therein a frequency of appearance of each of the acoustic units in a second sentence set. The calculator is configured to calculate a score of a first sentence included in a first sentence set, from a degree of rarity corresponding to the frequency of appearance of each acoustic unit in the first sentence and from a degree of importance of each acoustic unit. The selector is configured to, from sentences included in the first sentence set, select a sentence having a score higher than other sentences, and add the selected sentence to the second sentence set.
    Type: Application
    Filed: September 12, 2014
    Publication date: April 30, 2015
    Inventor: Yusuke Shinohara
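The scoring scheme in the abstract above amounts to a greedy corpus-selection loop. A minimal sketch in Python, assuming a simple 1/(1+frequency) rarity function and illustrative importance values (neither is specified by the publication):

```python
def rarity(freq):
    # Rarer acoustic units (lower frequency in the selected set) score higher.
    return 1.0 / (1 + freq)

def select_sentences(candidates, importance, rounds):
    # Greedily move the highest-scoring sentence from the candidate (first)
    # set into the selected (second) set, then update unit frequencies so
    # already-covered units become less valuable in later rounds.
    freq = {}
    selected = []
    pool = [list(s) for s in candidates]
    for _ in range(min(rounds, len(pool))):
        best = max(pool, key=lambda s: sum(
            importance.get(u, 0.0) * rarity(freq.get(u, 0)) for u in s))
        pool.remove(best)
        selected.append(best)
        for u in best:
            freq[u] = freq.get(u, 0) + 1
    return selected

importance = {"a": 1.0, "b": 2.0, "c": 0.5}   # per-unit importance degrees
candidates = [["a", "b"], ["b", "b"], ["c"]]  # sentences as acoustic-unit lists
print(select_sentences(candidates, importance, 2))
```

Note how the second round no longer favors the "b"-heavy sentence, because "b" has already been covered and its rarity has dropped.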
  • Patent number: 9020821
    Abstract: An acquisition unit analyzes a text, and acquires phonemic and prosodic information. An editing unit edits a part of the phonemic and prosodic information. A speech synthesis unit converts the phonemic and prosodic information before editing the part to a first speech waveform, and converts the phonemic and prosodic information after editing the part to a second speech waveform. A period calculation unit calculates a contrast period corresponding to the part in the first speech waveform and the second speech waveform. A speech generation unit generates an output waveform by connecting a first partial waveform and a second partial waveform. The first partial waveform contains the contrast period of the first speech waveform. The second partial waveform contains the contrast period of the second speech waveform.
    Type: Grant
    Filed: September 19, 2011
    Date of Patent: April 28, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Osamu Nishiyama
  • Publication number: 20150112686
    Abstract: A time difference calculation unit calculates a time difference between its own terminal and another terminal, based on the time at which output of the first sound from the audio output module is started, the time at which input of a sound corresponding to the audio data to the audio input module is started, the time indicated by the first information, and the time indicated by the second information.
    Type: Application
    Filed: September 26, 2014
    Publication date: April 23, 2015
    Applicant: OLYMPUS CORPORATION
    Inventor: Ryuichi Kiyoshige
  • Patent number: 9015032
    Abstract: Embodiments of the present invention provide a system, method, and program product to deliver an announcement to people, such as a public announcement. A computer receives input representative of audio from one or more people speaking in one or more natural languages. The computer processes the input to identify the languages being spoken, and identifies a relative proportion of each of the identified languages. Using these proportions, the computer determines one or more languages in which to deliver the announcement. The computer then causes the announcement to be delivered in the determined languages. In other embodiments, the computer can also determine an order in which to deliver the announcement. Further, the computer can transmit the announcement in the determined languages and order for delivery in aural or visual form.
    Type: Grant
    Filed: November 28, 2011
    Date of Patent: April 21, 2015
    Assignee: International Business Machines Corporation
    Inventors: Sheri G. Daye, Peeyush Jaiswal, Aleksas J. Vitenas
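The proportion-based language choice above can be sketched as follows; the 20% cutoff and the language labels are assumptions for illustration, not values from the patent:

```python
from collections import Counter

def announcement_languages(detected, threshold=0.2):
    # Keep languages whose share of the detected speech meets an assumed
    # threshold, ordered by decreasing proportion (this ordering also gives
    # the delivery order mentioned in the abstract).
    counts = Counter(detected)
    total = sum(counts.values())
    return [lang for lang, n in counts.most_common() if n / total >= threshold]

# Six English utterances, three Spanish, one German detected in the crowd:
print(announcement_languages(["en"] * 6 + ["es"] * 3 + ["de"]))
```

German falls below the 20% cutoff, so the announcement would go out in English, then Spanish.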
  • Patent number: 9009051
    Abstract: According to one embodiment, a reading aloud support apparatus includes a reception unit, a first extraction unit, a second extraction unit, an acquisition unit, a generation unit, and a presentation unit. The reception unit is configured to receive an instruction. The first extraction unit is configured to extract, as a partial document, a part of a document which corresponds to a range of words. The second extraction unit is configured to perform morphological analysis and to extract words as candidate words. The acquisition unit is configured to acquire attribute information items related to the candidate words. The generation unit is configured to perform weighting relating to a value corresponding to a distance and to determine each of the candidate words to be preferentially presented, to generate a presentation order. The presentation unit is configured to present the candidate words and the attribute information items in accordance with the presentation order.
    Type: Grant
    Filed: March 22, 2011
    Date of Patent: April 14, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kosei Fume, Masaru Suzuki, Yuji Shimizu, Tatsuya Izuha
  • Publication number: 20150100318
    Abstract: A method for decoding a speech signal is described. The method includes obtaining a packet. The method also includes obtaining a previous lag value. The method further includes limiting the previous lag value if the previous lag value is greater than a maximum lag threshold. The method additionally includes disallowing an adjustment to a number of synthesized peaks if a combination of the number of synthesized peaks and an estimated number of peaks is not valid.
    Type: Application
    Filed: October 4, 2013
    Publication date: April 9, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Venkatraman Rajagopalan, Venkatesh Krishnan, Alok K. Gupta
  • Publication number: 20150097706
    Abstract: A micro-computer based aircraft system that creates aural messages based upon system-detected threats (e.g., low oil pressure). The messages are unique to the make and model of aircraft. Speech recognition allows the pilot to request aircraft-specific, customized aurally-delivered checklists and to respond via a challenge and response protocol. This permits a hands-free, timely, complete and prudent response to the threat or hazardous situation, while allowing the pilot the relative freedom to do what is paramount: first, fly the airplane (with minimum distraction).
    Type: Application
    Filed: October 9, 2013
    Publication date: April 9, 2015
    Applicant: CMX Avionics, LLC
    Inventors: Warren F. Perger, William G. Abbatt
  • Patent number: 9002703
    Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Jay A. Crosley
  • Patent number: 9002711
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Grant
    Filed: December 16, 2010
    Date of Patent: April 7, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Ryo Morinaka, Takehiko Kagoshima
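The interpolation step in the abstract above reduces to a weighted average of corresponding formant parameters. A minimal sketch, assuming the formants have already been put in correspondence by the mapping unit, and showing only frequencies and powers (phases and window functions would be blended the same way); all values are illustrative:

```python
def interpolate_speakers(speakers, ratios):
    # Interpolate corresponding formant frequencies and powers of several
    # speakers at the desired ratios (ratios are assumed to sum to 1).
    n = len(speakers[0]["freq"])
    return {
        "freq": [sum(r * s["freq"][i] for s, r in zip(speakers, ratios))
                 for i in range(n)],
        "power": [sum(r * s["power"][i] for s, r in zip(speakers, ratios))
                  for i in range(n)],
    }

a = {"freq": [700.0, 1200.0], "power": [1.0, 0.5]}   # speaker A's formants
b = {"freq": [500.0, 1000.0], "power": [0.8, 0.7]}   # speaker B's formants
print(interpolate_speakers([a, b], [0.5, 0.5]))
```

An equal-ratio blend lands each interpolated formant halfway between the two speakers, yielding a voice "between" them.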
  • Patent number: 8996387
    Abstract: To clear transaction data selected for processing, a transaction acoustic signal (003; 103; 203) is generated in a portable data carrier (1) (S007; S107; S207); upon its acoustic reproduction by an end device (10), at least the transaction data selected for processing are reproduced acoustically, superimposed with a melody specific to a user of the data carrier (1) (S009; S109; S209). The generated transaction acoustic signal (003; 103; 203) is electronically transferred to an end device (10) (S108; S208), which processes the selected transaction data (S011; S121; S216) only when the user of the data carrier (1) confirms to the end device (10) an at least partial match both of the acoustically reproduced melody with the user-specific melody and of the acoustically reproduced transaction data with the selected transaction data (S010; S110, S116; S210).
    Type: Grant
    Filed: September 8, 2009
    Date of Patent: March 31, 2015
    Assignee: Giesecke & Devrient GmbH
    Inventors: Thomas Stocker, Michael Baldischweiler
  • Patent number: 8996377
    Abstract: A text-to-speech (TTS) engine combines recorded speech with synthesized speech from a TTS synthesizer based on text input. The TTS engine receives the text input and identifies the domain for the speech (e.g. navigation, dialing, ...). The identified domain is used in selecting domain specific speech recordings (e.g. pre-recorded static phrases such as "turn left", "turn right", ...) from the input text. The speech recordings are obtained based on the static phrases for the domain that are identified from the input text. The TTS engine blends the static phrases with the TTS output to smooth the acoustic trajectory of the input text. The prosody of the static phrases is used to create similar prosody in the TTS output.
    Type: Grant
    Filed: July 12, 2012
    Date of Patent: March 31, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sheng Zhao, Peng Wang, Difei Gao, Yijian Wu, Binggong Ding, Shenghua Ye, Max Leung
  • Publication number: 20150088520
    Abstract: A candidate voice segment sequence generator 1 generates candidate voice segment sequences 102 for an input language information sequence 101 by using DB voice segments 105 in a voice segment database 4. An output voice segment sequence determinator 2 calculates a degree of match between the input language information sequence 101 and each of the candidate voice segment sequences 102 by using a parameter 107 showing a value according to a cooccurrence criterion 106 for cooccurrence between the input language information sequence 101 and a sound parameter showing the attribute of each of a plurality of candidate voice segments in each of the candidate voice segment sequences 102, and determines an output voice segment sequence 103 on the basis of the degree of match.
    Type: Application
    Filed: February 21, 2014
    Publication date: March 26, 2015
    Applicant: Mitsubishi Electric Corporation
    Inventors: Takahiro OTSUKA, Keigo KAWASHIMA, Satoru FURUTA, Tadashi YAMAURA
  • Publication number: 20150088521
    Abstract: A speech server includes: a speech terminal-specifying information management unit configured to manage speech terminal-specifying information; a reception unit configured to receive, from an external server, (i) the speech terminal-specifying information or user-specifying information and (ii) speech information indicative of speech content to be outputted as speech; and a speech instruction unit configured to instruct a speech terminal specified by the speech terminal-specifying information to output the speech content as speech.
    Type: Application
    Filed: September 25, 2014
    Publication date: March 26, 2015
    Applicant: SHARP KABUSHIKI KAISHA
    Inventors: Masahiro CHIBA, Kazunori SHIBATA
  • Patent number: 8990087
    Abstract: A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for each word using the digital content. Audio or speech is played for each word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: March 24, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: John Lattyak, John T. Kim, Robert Wai-Chi Chu, Laurent An Minh Nguyen
  • Patent number: 8990089
    Abstract: A speech output is generated from a text input written in a first language and containing inclusions in a second language. Words in the native language are pronounced with a native pronunciation and words in the foreign language are pronounced with a proficient foreign pronunciation. Language dependent phoneme symbols generated for words of the second language are replaced with language dependent phoneme symbols of the first language, where said replacing includes the steps of assigning to each language dependent phoneme symbol of the second language a language independent target phoneme symbol, mapping each language independent target phoneme symbol to a language independent substitute phoneme symbol assignable to a language dependent substitute phoneme symbol of the first language, and substituting the language dependent phoneme symbols of the second language with the language dependent substitute phoneme symbols of the first language.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: March 24, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Johan Wouters, Christof Traber, David Hagstrand, Alexis Wilpert, Jürgen Keller, Igor Nozhov
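The three-stage substitution described above can be sketched as chained lookups. The tiny phoneme inventories below are hypothetical stand-ins (English inclusions in German text), not the patent's actual tables:

```python
# Stage 1: second-language (English) phoneme -> language-independent target.
L2_TO_TARGET = {"TH": "ind_theta", "W": "ind_w"}
# Stage 2: target -> language-independent substitute the first language can realize.
TARGET_TO_SUBSTITUTE = {"ind_theta": "ind_s", "ind_w": "ind_v"}
# Stage 3: language-independent substitute -> first-language (German) phoneme.
SUBSTITUTE_TO_L1 = {"ind_s": "s", "ind_v": "v"}

def substitute_phonemes(l2_phonemes):
    # Route each foreign phoneme through the language-independent layer to
    # the closest phoneme the native voice can actually produce.
    return [SUBSTITUTE_TO_L1[TARGET_TO_SUBSTITUTE[L2_TO_TARGET[p]]]
            for p in l2_phonemes]

print(substitute_phonemes(["W", "TH"]))  # → ['v', 's']
```

The intermediate language-independent layer is what lets one substitution table serve many language pairs.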
  • Patent number: 8983841
    Abstract: A network communication node includes an audio outputter that outputs an audible representation of data to be provided to a requester. The network communication node also includes a processor that determines a categorization of the data to be provided to the requester and that varies a pause between segments of the audible representation of the data in accordance with the categorization of the data to be provided to the requester.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: March 17, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Gregory Pulz, Steven Lewis, Charles Rajnai
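The pause-variation idea above can be sketched as a category-to-pause lookup applied between segments of the audible output; the categories and durations are assumed for illustration, not taken from the patent:

```python
# Assumed pause lengths in seconds per data categorization.
PAUSE_BY_CATEGORY = {"phone_number": 0.6, "account_balance": 0.45, "sentence": 0.3}

def build_playback_plan(segments, category):
    # Interleave spoken segments with a pause whose length depends on the
    # categorization of the data being read out to the requester.
    pause = PAUSE_BY_CATEGORY.get(category, 0.3)
    plan = []
    for i, seg in enumerate(segments):
        plan.append(("say", seg))
        if i < len(segments) - 1:
            plan.append(("pause", pause))
    return plan

print(build_playback_plan(["555", "0199"], "phone_number"))
```

A phone number gets longer pauses than running prose, giving the listener time to write each group down.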
  • Patent number: 8983835
    Abstract: An electronic device includes a voice processing unit, a wireless communication unit, and a combining unit. The voice processing unit receives speech signals. The wireless communication unit sends the speech signals to a server. The server converts the speech signals into a text message. The wireless communication unit receives the text message from the server. The combining unit combines the text message and the speech signals into a combined message. The wireless communication unit further sends the combined message to a recipient. A related server is also provided.
    Type: Grant
    Filed: June 30, 2011
    Date of Patent: March 17, 2015
    Assignees: Fu Tai Hua Industry (Shenzhen) Co., Ltd, Hon Hai Precision Industry Co., Ltd.
    Inventors: Shih-Fang Wong, Tsung-Jen Chuang, Bo Zhang
  • Patent number: 8977550
    Abstract: Part units of speech information are arranged in a predetermined order to generate a sentence unit of a speech information set. Each of a plurality of speech part units of the speech information is given either an attribute of "interrupt possible after reproduction," with which reproduction of priority interrupt information can be started after that speech part unit is reproduced, or an attribute of "interrupt impossible after reproduction," with which reproduction of the priority interrupt information cannot be started even after that speech part unit is reproduced. When priority interrupt information having a higher priority rank than the speech information set being currently reproduced is inputted, if the attribute of the speech information being reproduced at that point in time is "interrupt impossible after reproduction," then the priority interrupt information is reproduced after the speech information is reproduced.
    Type: Grant
    Filed: May 6, 2011
    Date of Patent: March 10, 2015
    Assignee: Honda Motor Co., Ltd.
    Inventor: Tokujiro Kizaki
  • Patent number: 8977551
    Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
    Type: Grant
    Filed: October 27, 2011
    Date of Patent: March 10, 2015
    Assignee: Goertek Inc.
    Inventors: Fengliang Wu, Zhenhua Wu
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8972248
    Abstract: A band broadening apparatus includes a processor configured to analyze a fundamental frequency based on an input signal bandlimited to a first band, generate a signal that includes a second band different from the first band based on the input signal, control a frequency response of the second band based on the fundamental frequency, reflect the frequency response of the second band on the signal that includes the second band and generate a frequency-response-adjusted signal that includes the second band, and synthesize the input signal and the frequency-response-adjusted signal.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: March 3, 2015
    Assignee: Fujitsu Limited
    Inventors: Takeshi Otani, Taro Togawa, Masanao Suzuki, Shusaku Ito
  • Patent number: 8972265
    Abstract: A content customization service is disclosed. The content customization service may identify one or more speakers in an item of content, and map one or more portions of the item of content to a speaker. A speaker may also be mapped to a voice. In one embodiment, the content customization service obtains portions of audio content synchronized to the mapped portions of the item of content. Each portion of audio content may be associated with a voice to which the speaker of the portion of the item of content is mapped. These portions of audio content may be combined to produce a combined item of audio content with multiple voices.
    Type: Grant
    Filed: June 18, 2012
    Date of Patent: March 3, 2015
    Assignee: Audible, Inc.
    Inventor: Kevin S. Lester
  • Patent number: 8965773
    Abstract: A method is provided for hierarchical coding of a digital audio signal comprising, for a current frame of the input signal: a core coding, delivering a scalar quantization index for each sample of the current frame, and at least one enhancement coding delivering indices of scalar quantization for each coded sample of an enhancement signal. The enhancement coding comprises a step of obtaining a filter for shaping the coding noise, which is used to determine a target signal; the indices of scalar quantization of said enhancement signal are determined by minimizing the error between a set of possible values of scalar quantization and said target signal. The coding method can also comprise a shaping of the coding noise for the core bitrate coding. A coder implementing the coding method is also provided.
    Type: Grant
    Filed: November 17, 2009
    Date of Patent: February 24, 2015
    Assignee: Orange
    Inventors: Balazs Kovesi, Stéphane Ragot, Alain Le Guyader
  • Patent number: 8965768
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for detecting and correcting abnormal stress patterns in unit-selection speech synthesis. A system practicing the method detects incorrect stress patterns in selected acoustic units representing speech to be synthesized, and corrects the incorrect stress patterns in the selected acoustic units to yield corrected stress patterns. The system can further synthesize speech based on the corrected stress patterns. In one aspect, the system also classifies the incorrect stress patterns using a machine learning algorithm such as a classification and regression tree, adaptive boosting, support vector machine, and maximum entropy. In this way a text-to-speech unit selection speech synthesizer can produce more natural sounding speech with suitable stress patterns regardless of the stress of units in a unit selection database.
    Type: Grant
    Filed: August 6, 2010
    Date of Patent: February 24, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun Kim, Mark Charles Beutnagel, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8965764
    Abstract: Disclosed are an electronic apparatus and a voice recognition method for the same. The voice recognition method for the electronic apparatus includes: receiving an input voice of a user; determining characteristics of the user; and recognizing the input voice based on the determined characteristics of the user.
    Type: Grant
    Filed: January 7, 2010
    Date of Patent: February 24, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hee-seob Ryu, Seung-kwon Park, Jong-ho Lea, Jong-hyuk Jang
  • Patent number: 8965767
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: February 24, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
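The policy-driven selection described above can be sketched as a per-category routing table over the combined database; the categories, voices, and unit names below are illustrative assumptions:

```python
# Policy: for each phonetic category, which voice database to draw units from.
POLICY = {"vowel": "voice_a", "nasal": "voice_b"}

def select_units(targets, combined_db, policy, default="voice_a"):
    # For each target (phoneme, phonetic category), pick a unit from the
    # voice the policy prescribes for that category; no parameterization of
    # either voice is needed.
    out = []
    for phoneme, category in targets:
        voice = policy.get(category, default)
        out.append(combined_db[voice][phoneme])
    return out

db = {
    "voice_a": {"aa": "unit_a_aa", "n": "unit_a_n"},
    "voice_b": {"aa": "unit_b_aa", "n": "unit_b_n"},
}
print(select_units([("aa", "vowel"), ("n", "nasal")], db, POLICY))
```

The resulting unit sequence mixes the two source voices category by category, which is the essence of the combined-database approach.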
  • Patent number: 8965769
    Abstract: According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit and a presentation unit. The acquisition unit acquires a feature amount for respective tags, each of the tags being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for respective character strings, a variance of feature amounts of the tags which are assigned to the character string in a markup text. The detection unit detects a first character string assigned a first tag having the variance not less than a first threshold value as a first candidate including the tag to be corrected. The presentation unit presents the first candidate.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: February 24, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kouichirou Mori, Masahiro Morita
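The variance test in the abstract above can be sketched directly with the standard library; the feature amounts and threshold below are illustrative, not values from the patent:

```python
from statistics import pvariance

def correction_candidates(markup, threshold):
    # markup: (character string, tag feature amount) pairs collected from a
    # markup text. A string whose tag feature amounts vary widely across
    # occurrences is flagged as a candidate whose tag may need correcting.
    by_string = {}
    for s, feat in markup:
        by_string.setdefault(s, []).append(feat)
    return [s for s, feats in by_string.items()
            if len(feats) > 1 and pvariance(feats) >= threshold]

marked = [("hello", 1.0), ("hello", 5.0), ("world", 2.0), ("world", 2.1)]
print(correction_candidates(marked, 1.0))
```

"hello" is tagged very differently in its two occurrences (variance 4.0), so it is presented for review; "world" is tagged consistently and passes.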
  • Patent number: 8959022
    Abstract: A method for determining a relatedness between a query video and a database video is provided. A processor extracts an audio stream from the query video to produce a query audio stream, extracts an audio stream from the database video to produce a database audio stream, produces a first-sized snippet from the query audio stream, and produces a first-sized snippet from the database audio stream. An estimation is made of a first most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the query audio stream. An estimation is made of a second most probable sequence of latent evidence probability vectors generating the first-sized audio snippet of the database audio stream. A similarity is measured between the first sequence and the second sequence, producing a score of relatedness between the two snippets. Finally, a relatedness is determined between the query video and the database video.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: February 17, 2015
    Assignee: Motorola Solutions, Inc.
    Inventors: Yang M. Cheng, Dusan Macho
  • Patent number: 8959021
    Abstract: Features are disclosed for providing a consistent interface for local and distributed text to speech (TTS) systems. Some portions of the TTS system, such as voices and TTS engine components, may be installed on a client device, and some may be present on a remote system accessible via a network link. Determinations can be made regarding which TTS system components to implement on the client device and which to implement on the remote server. The consistent interface facilitates connecting to or otherwise employing the TTS system through use of the same methods and techniques regardless of which TTS system configuration is implemented.
    Type: Grant
    Filed: December 19, 2012
    Date of Patent: February 17, 2015
    Assignee: IVONA Software Sp. z.o.o.
    Inventors: Michal T. Kaszczuk, Lukasz M. Osowski
  • Patent number: 8954328
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters at least some of the characters having multiple associated moods for use in document narration.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: February 10, 2015
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8954335
    Abstract: Appropriate processing results or appropriate apparatuses can be selected with a control device that selects the most probable speech recognition result by using speech recognition scores received with speech recognition results from two or more speech recognition apparatuses; sends the selected speech recognition result to two or more translation apparatuses respectively; selects the most probable translation result by using translation scores received with translation results from the two or more translation apparatuses; sends the selected translation result to two or more speech synthesis apparatuses respectively; receives a speech synthesis processing result including a speech synthesis result and a speech synthesis score from each of the two or more speech synthesis apparatuses; selects the most probable speech synthesis result by using the scores; and sends the selected speech synthesis result to a second terminal apparatus.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: February 10, 2015
    Assignee: National Institute of Information and Communications Technology
    Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
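The control device's select-and-forward behavior described above can be sketched as an argmax over (result, score) pairs at each stage; the stand-in translate and synthesize functions below are assumptions for illustration, in place of querying real remote apparatuses:

```python
def best(results):
    # results: (output, score) pairs returned by competing apparatuses;
    # keep the most probable output.
    return max(results, key=lambda r: r[1])[0]

def cascade(recognition_results, translate, synthesize):
    # Select the most probable result at each stage and forward it to the
    # next group of apparatuses; `translate` and `synthesize` each return a
    # list of (output, score) pairs.
    text = best(recognition_results)
    translated = best(translate(text))
    return best(synthesize(translated))

asr = [("hello world", 0.9), ("hollow word", 0.4)]
fake_translate = lambda t: [("bonjour le monde", 0.8), ("salut monde", 0.5)]
fake_synth = lambda t: [("wav:" + t, 0.7)]
print(cascade(asr, fake_translate, fake_synth))  # → wav:bonjour le monde
```

Each stage is free to use different vendors' apparatuses, since only the scored outputs cross the boundary.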
  • Patent number: 8949123
    Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: February 3, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Aditi Garg, Kasthuri Jayachand Yadlapalli
  • Patent number: 8949128
    Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
    Type: Grant
    Filed: February 12, 2010
    Date of Patent: February 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Corinne Bos-Plachez, Martine Marguerite Staessen
  • Patent number: 8942983
    Abstract: The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
    Type: Grant
    Filed: November 23, 2011
    Date of Patent: January 27, 2015
    Assignee: Speech Technology Centre, Limited
    Inventor: Mikhail Vasilievich Khitrov
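The selection step this abstract describes, matching target speech sounds to database sounds by physical parameters, can be sketched as a nearest-neighbor search. A minimal illustration, assuming the parameters are a pitch (f0) and a duration value; the field names and mini-database are hypothetical, not from the patent:

```python
import math

def nearest_unit(target, database):
    """Return the database unit whose physical parameters (e.g. f0 in Hz,
    duration in ms) are closest to the intonation-derived target,
    by Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda unit: dist(unit["params"], target))

# Hypothetical mini-database of recorded variants of one speech sound.
db = [
    {"id": "a_low",  "params": (110.0, 80.0)},   # (f0 Hz, duration ms)
    {"id": "a_mid",  "params": (160.0, 95.0)},
    {"id": "a_high", "params": (220.0, 120.0)},  # e.g. question intonation
]

# A rising-intonation target selects the high-pitched variant.
best = nearest_unit((210.0, 115.0), db)
print(best["id"])  # -> a_high
```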
  • Publication number: 20150025892
    Abstract: A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual.
    Type: Application
    Filed: March 6, 2013
    Publication date: January 22, 2015
    Applicant: Agency for Science, Technology and Research
    Inventors: Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong
  • Patent number: 8930200
    Abstract: A vector joint encoding/decoding method and a vector joint encoder/decoder are provided. More than two vectors are jointly encoded, and the encoding index of at least one vector is split and then recombined across different vectors, so that the idle encoding spaces of different vectors can be pooled, which saves encoding bits. Because an encoding index is split into shorter indexes before recombination, the required bit width of the operating parts in encoding/decoding calculation is also reduced.
    Type: Grant
    Filed: July 24, 2013
    Date of Patent: January 6, 2015
    Assignee: Huawei Technologies Co., Ltd
    Inventors: Fuwei Ma, Dejun Zhang, Lei Miao, Fengyan Qi
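The bit saving from joint encoding follows from simple arithmetic: packing two indices from codebooks of sizes N1 and N2 into one joint index needs ceil(log2(N1·N2)) bits, which can be less than the ceil(log2 N1) + ceil(log2 N2) bits of encoding them separately. A toy sketch of this idea (the patent's actual index-splitting scheme is more involved):

```python
import math

def joint_encode(i1, i2, n2):
    """Pack two codebook indices into one joint index: i1 * n2 + i2."""
    return i1 * n2 + i2

def joint_decode(idx, n2):
    """Recover the two indices from the joint index."""
    return divmod(idx, n2)

N1, N2 = 5, 5
separate_bits = math.ceil(math.log2(N1)) + math.ceil(math.log2(N2))  # 3 + 3 = 6
joint_bits = math.ceil(math.log2(N1 * N2))                           # ceil(log2 25) = 5

idx = joint_encode(3, 4, N2)
assert joint_decode(idx, N2) == (3, 4)
print(separate_bits, joint_bits)  # 6 5: one bit saved per vector pair
```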
  • Publication number: 20150006180
    Abstract: A process and system for enhancing and customizing movie theatre sound includes receiving an input audio signal and enhancing the voice audio input in two or more harmonic and dynamic ranges by re-synthesizing the audio into a full-range PCM wave. The enhancement includes processing the input audio in parallel through a low pass filter with dynamic offset, an envelope controlled bandpass filter, and a high pass filter, adding an amount of dynamic synthesized sub bass to the audio, and combining the four treated audio signals with the original audio in a summing mixer.
    Type: Application
    Filed: February 21, 2014
    Publication date: January 1, 2015
    Applicant: Max Sound Corporation
    Inventor: Lloyd Trammell
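The parallel-branch structure described above, separate low-pass, band-pass, and high-pass paths recombined with the original in a summing mixer, can be sketched with one-pole filters. The coefficients and mix weights below are illustrative, and the dynamic offset, envelope control, and synthesized sub-bass of the abstract are omitted:

```python
def one_pole_lowpass(signal, alpha):
    """Simple one-pole low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out

def enhance(signal):
    """Sum of parallel branches, loosely mirroring the abstract's
    low-pass, band-pass, and high-pass paths plus the original audio."""
    low = one_pole_lowpass(signal, 0.1)
    wide = one_pole_lowpass(signal, 0.5)
    band = [w - l for w, l in zip(wide, low)]     # crude band-pass
    high = [x - w for x, w in zip(signal, wide)]  # crude high-pass
    # Summing mixer: weighted recombination with the untouched input.
    return [0.4 * l + 0.3 * b + 0.3 * h + x
            for l, b, h, x in zip(low, band, high, signal)]

y = enhance([0.0, 1.0, 0.5, -0.5, -1.0, 0.0])
print(len(y))  # same length as the input
```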
  • Patent number: 8924217
    Abstract: A communication converter is described for converting among speech signals and textual information, permitting communication between telephone users and textual instant communications users.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: December 30, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Richard G. Moore, Gregory L. Mumford, Duraisamy Gunasekar
  • Patent number: 8918323
    Abstract: A contextual conversion platform, and a method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which a spoken content input can be generated. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: December 23, 2014
    Inventor: Daniel Ben-Ezri
  • Patent number: 8918322
    Abstract: A personalized text-to-speech (pTTS) system provides a method for converting text data to speech data utilizing a pTTS template representing the voice characteristics of an individual. A memory stores executable program code that converts text data to speech data. Text data represents a textual message directed to a system user and speech data represents a spoken form of text data having the characteristics of an individual's voice. A processor executes the program code, and a storage device stores a pTTS template and may store speech data. The pTTS system can be used to provide various services that provide immediate spoken presentation of the speech data converted from text data and/or combine stored speech data with generated speech data for spoken presentation.
    Type: Grant
    Filed: June 20, 2007
    Date of Patent: December 23, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Edmund Gale Acker, Frederick Murray Burg
  • Patent number: 8918313
    Abstract: A method of selectively performing signal processing in a first mode and in a second mode. In the first mode, a noise cancel signal having a signal characteristic to cancel an external noise component is generated based on a voice signal supplied from a microphone, and an input digital audio signal and the noise cancel signal are combined into a voice signal to be output through a speaker. In the second mode, a sound process for vocal voice is performed on a voice signal supplied from a microphone, a vocal voice component is canceled from a digital audio signal of input music to generate a karaoke signal, and the karaoke signal and the vocal signal are combined into a voice signal to be output through a speaker. The first mode corresponds to an audio replay operation accompanied by noise cancel, and the second mode corresponds to a karaoke operation.
    Type: Grant
    Filed: May 16, 2012
    Date of Patent: December 23, 2014
    Assignee: Sony Corporation
    Inventors: Kazunobu Ookuri, Kohei Asada, Yasunobu Murata
  • Patent number: 8917876
    Abstract: SPL monitoring systems are provided. An SPL monitoring system includes an audio transducer configured to receive sound pressure, a logic circuit that calculates a safe time duration over which a user can receive current sound pressure values, and an indicator element that produces a notification when an indicator level occurs. An SPL monitoring information system includes a database that stores data such as a list of earpiece devices and associated instrument response functions. The logic circuit compares a request with the data in the database, retrieves a subset of the data, and sends it to an output control unit. The output control unit sends the subset of data to a sending unit.
    Type: Grant
    Filed: June 14, 2007
    Date of Patent: December 23, 2014
    Assignee: Personics Holdings, LLC.
    Inventor: Steven W. Goldstein
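The abstract does not specify how the logic circuit computes the safe time duration, but a standard occupational-safety rule, the NIOSH exchange-rate formula, illustrates the idea: permissible exposure time starts at 8 hours at a criterion level and halves for every 3 dB increase in sound pressure level. A sketch under that assumption:

```python
def safe_hours(spl_db, criterion_db=85.0, exchange_rate_db=3.0):
    """NIOSH-style permissible exposure time: 8 hours at the criterion
    level, halved for every `exchange_rate_db` dB increase in SPL.
    (The patent does not state its formula; this is a standard rule
    used here purely for illustration.)"""
    return 8.0 / (2.0 ** ((spl_db - criterion_db) / exchange_rate_db))

print(safe_hours(85))   # 8.0 hours
print(safe_hours(88))   # 4.0 hours
print(safe_hours(100))  # 0.25 hours (15 minutes)
```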
  • Patent number: 8914290
    Abstract: Method and apparatus that dynamically adjusts operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
    Type: Grant
    Filed: May 18, 2012
    Date of Patent: December 16, 2014
    Assignee: Vocollect, Inc.
    Inventors: James Hendrickson, Debra Drylie Scott, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
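A minimal sketch of the environment-driven adjustment this abstract describes, assuming ambient noise level is the environmental condition being monitored; the parameter names, thresholds, and step sizes are illustrative, not taken from the patent:

```python
def adjust_tts_params(noise_db, params=None):
    """Raise volume and slow the speaking rate as ambient noise grows,
    within clamped bounds, to keep synthesized speech intelligible.
    All names and numbers here are hypothetical."""
    params = dict(params or {"volume": 0.5, "rate": 1.0})
    if noise_db > 70:    # loud environment: louder and slower
        params["volume"] = min(1.0, params["volume"] + 0.3)
        params["rate"] = max(0.8, params["rate"] - 0.2)
    elif noise_db < 40:  # quiet environment: ease volume back down
        params["volume"] = max(0.3, params["volume"] - 0.1)
    return params

print(adjust_tts_params(80))  # louder, slower speech in a noisy setting
```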
  • Patent number: 8914291
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: December 16, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8909538
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: November 11, 2013
    Date of Patent: December 9, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: James Mark Kondziela
  • Publication number: 20140350940
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
    Type: Application
    Filed: August 7, 2014
    Publication date: November 27, 2014
    Inventors: Alistair D. CONKIE, Mark BEUTNAGEL, Yeon-Jun KIM, Ann K. SYRDAL
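A rough sketch of how a preselection pass might be modified by a supplemental feature such as a word-boundary mark: candidates must match the target on the extra feature as well as on phone identity. All field names are hypothetical:

```python
def preselect_units(target, units, use_supplemental=True):
    """Keep candidate units matching the target phone; when the
    supplemental phoneset is enabled, additionally require a match on
    the word-boundary feature. A simplification of the patent's
    modified preselection process."""
    chosen = []
    for u in units:
        if u["phone"] != target["phone"]:
            continue
        if not use_supplemental or u["word_boundary"] == target["word_boundary"]:
            chosen.append(u)
    return chosen

units = [
    {"phone": "t", "word_boundary": True},
    {"phone": "t", "word_boundary": False},
    {"phone": "d", "word_boundary": True},
]
print(len(preselect_units({"phone": "t", "word_boundary": True}, units)))  # 1
```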
  • Patent number: 8898066
    Abstract: A multi-lingual text-to-speech system and method process a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtain a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least one controllable accent weighting parameter to select a transformation combination and find a second and a first acoustic-prosodic model. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model according to the at least one controllable accent weighting parameter, processes all transformations in the transformation combination, and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into L1-accented L2 speech.
    Type: Grant
    Filed: August 25, 2011
    Date of Patent: November 25, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Jen-Yu Li, Jia-Jang Tu, Chih-Chung Kuo
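The mergence of two acoustic-prosodic models under an accent weighting parameter can be illustrated, in grossly simplified form, as linear interpolation of per-parameter statistics; the parameter names and values below are hypothetical, and the patent's actual mergence operates on full model sequences:

```python
def merge_models(l1_model, l2_model, accent_weight):
    """Interpolate each model parameter, with accent_weight in [0, 1]
    pulling the merged model toward the L1-accented statistics."""
    return {k: accent_weight * l1_model[k] + (1 - accent_weight) * l2_model[k]
            for k in l1_model}

l1 = {"mean_f0": 180.0, "phone_dur": 90.0}  # hypothetical L1 model statistics
l2 = {"mean_f0": 140.0, "phone_dur": 70.0}  # hypothetical L2 model statistics
print(merge_models(l1, l2, 0.25))  # {'mean_f0': 150.0, 'phone_dur': 75.0}
```

Setting the weight to 0 yields pure L2 statistics and 1 yields pure L1, matching the idea of a controllable degree of accent.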
  • Patent number: 8892442
    Abstract: Disclosed herein are systems, methods, and computer readable-media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. In another embodiment, the notification is assigned an importance level, and notification attempts are repeated if it is of high importance.
    Type: Grant
    Filed: February 17, 2014
    Date of Patent: November 18, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst J. Schroeter
  • Patent number: 8892230
    Abstract: A multicore system 2 includes a main system program 610 that operates on a first processor core 61 and stores synthesized audio data, which is mixed audio data, in a buffer for DMA transfer 63; a standby system program 620 that operates on a second processor core 62; and an audio output unit 64 that sequentially stores the synthesized audio data transferred from the buffer for DMA transfer 63 and plays the stored synthesized audio data. When the amount of synthesized audio data stored in the buffer for DMA transfer 63 has not reached a predetermined amount of data, determined according to the amount of synthesized audio data stored in the audio output unit 64, the standby system program 620 takes over and executes the mixing and storage of the synthesized audio data otherwise executed by the main system program 610.
    Type: Grant
    Filed: August 4, 2010
    Date of Patent: November 18, 2014
    Assignee: NEC Corporation
    Inventor: Kentaro Sasagawa
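The takeover condition described above, buffer fill below a required amount that shrinks as the audio output unit's own backlog grows, can be sketched as a simple threshold test; the buffer sizes and threshold are illustrative, not from the patent:

```python
def should_standby_take_over(buffer_fill, playback_backlog, full_threshold=4096):
    """Return True when the DMA-transfer buffer holds less than the
    predetermined amount, where the requirement is relaxed by however
    much audio the output unit already has queued for playback."""
    required = max(0, full_threshold - playback_backlog)
    return buffer_fill < required

print(should_standby_take_over(1000, 512))   # True: buffer is running dry
print(should_standby_take_over(4000, 2048))  # False: enough audio buffered
```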