Vocal Tract Model Patents (Class 704/261)
- Patent number: 11842720
  Abstract: An audio processing system and a method thereof generate a synthesis model that can input an audio signal to generate feature data that can be used by a signal generator to generate a modified audio signal. Specifically, a pre-trained synthesis model is first generated using training audio data. Thereafter, a re-trained synthesis model is established by additionally training the pre-trained synthesis model. Based on a received instruction to modify at least one of sounding conditions of an audio signal to be processed, feature data is generated by inputting additional condition data into the re-trained synthesis model. The signal generator generates the modified audio signal from the generated feature data.
  Type: Grant
  Filed: May 3, 2021
  Date of Patent: December 12, 2023
  Assignee: YAMAHA CORPORATION
  Inventor: Ryunosuke Daido
- Patent number: 11514924
  Abstract: In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original speech that is intercepted.
  Type: Grant
  Filed: February 21, 2020
  Date of Patent: November 29, 2022
  Assignee: International Business Machines Corporation
  Inventors: Samuel Osebe, Charles Muchiri Wachira, Komminist Weldemariam, Celia Cintas
- Patent number: 11450307
  Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
  Type: Grant
  Filed: March 27, 2019
  Date of Patent: September 20, 2022
  Assignee: TELEPATHY LABS, INC.
  Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
- Patent number: 11348569
  Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
  Type: Grant
  Filed: April 7, 2020
  Date of Patent: May 31, 2022
  Assignee: KABUSHIKI KAISHA TOSHIBA
  Inventors: Masatsune Tamura, Masahiro Morita
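The group delay step in the Toshiba abstract above has a standard signal-processing reading: group delay is the negative derivative of the unwrapped phase spectrum with respect to frequency, and a band parameter can be taken as its average over a frequency band. The sketch below shows only that reading; the frame length, band edges, and simple averaging are assumptions for illustration, not the patented method.

```python
import numpy as np

def band_group_delay(frame, sr, band=(0.0, 4000.0)):
    """Illustrative band group delay: negative derivative of the
    unwrapped phase spectrum, averaged over one frequency band."""
    spectrum = np.fft.rfft(frame)
    phase = np.unwrap(np.angle(spectrum))          # first phase spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    group_delay = -np.gradient(phase, freqs)       # group delay spectrum
    mask = (freqs >= band[0]) & (freqs < band[1])
    return group_delay[mask].mean()                # band group delay parameter

frame = np.hanning(512) * np.random.randn(512)     # stand-in speech frame
print(band_group_delay(frame, sr=16000))
```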
- Patent number: 11210058
  Abstract: A sound system for providing independently variable audio outputs is disclosed herein. The sound system may include a display device, an audio system, and a transmitter. The display device may receive an audio signal and transmit the audio signal to the audio system and the transmitter. The audio system may condition the audio signal based on different settings provided by users. The transmitter may wirelessly transmit conditioned audio signals to one or more audio devices.
  Type: Grant
  Filed: September 30, 2020
  Date of Patent: December 28, 2021
  Assignee: TV Ears, Inc.
  Inventor: George Joseph Dennis
- Patent number: 11137601
  Abstract: Systems and methods according to present principles allow social distancing within themed attractions such as haunted attractions in order to allow the enjoyment of the same in various circumstances. These circumstances include times of pandemic, for customers that are afraid to congregate in large groups, for customers that desire to control aspects of the experience, and so on.
  Type: Grant
  Filed: January 13, 2021
  Date of Patent: October 5, 2021
  Inventor: Mark D. Wieczorek
- Patent number: 11094313
  Abstract: An electronic device for adjusting a speech output rate (speech rate) of speech output data.
  Type: Grant
  Filed: June 18, 2019
  Date of Patent: August 17, 2021
  Assignee: SAMSUNG ELECTRONICS CO., LTD.
  Inventor: Piotr Marcinkiewicz
- Patent number: 10986418
  Abstract: Methods and systems are described herein for improving audio for hearing impaired content consumers. An example method may comprise determining a content asset. Closed caption data associated with the content asset may be determined. At least a portion of the closed caption data may be determined based on a user setting associated with a hearing impairment. Compensating audio comprising a frequency translation associated with at least the portion of the closed caption data may be generated. The content asset may be caused to be output with audio content comprising the compensating audio and the original audio.
  Type: Grant
  Filed: May 17, 2019
  Date of Patent: April 20, 2021
  Assignee: Comcast Cable Communications, LLC
  Inventor: Jeff Calkins
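The "frequency translation" in the Comcast abstract can be illustrated with a toy single-FFT shift that moves spectral content into a lower band, where it may be easier to hear. This is only a rough sketch of the general idea; the shift amount, the whole-signal FFT, and the absence of framing or loudness compensation are simplifications, not the claimed compensating-audio method.

```python
import numpy as np

def translate_down(audio, sr, shift_hz=1000.0):
    """Crude frequency translation: slide spectral content down by a fixed
    offset so high-frequency cues land in a lower band (single-FFT toy)."""
    spec = np.fft.rfft(audio)
    bins = int(round(shift_hz * len(audio) / sr))
    shifted = np.zeros_like(spec)
    shifted[:len(spec) - bins] = spec[bins:]      # bins above shift_hz move downward
    return np.fft.irfft(shifted, n=len(audio))

sr = 16000
tone = np.sin(2 * np.pi * 5000 * np.arange(sr) / sr)   # 5 kHz stand-in content
lowered = translate_down(tone, sr)                      # now centred near 4 kHz
```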
- Patent number: 10909978
  Abstract: Technologies for secure storage of utterances are disclosed. A computing device captures audio of a human making a verbal utterance. The utterance is provided to a speech-to-text (STT) service that translates the utterance to text. The STT service can also identify various speaker-specific attributes in the utterance. The text and attributes are provided to a text-to-speech (TTS) service that creates speech from the text and a subset of the attributes. The speech is stored in a data store that is less secure than that required for storing the original utterance. The original utterance can then be discarded. The STT service can also translate the speech generated by the TTS service to text. The text generated by the STT service from the speech and the text generated by the STT service from the original utterance are then compared. If the text does not match, the original utterance can be retained.
  Type: Grant
  Filed: June 28, 2017
  Date of Patent: February 2, 2021
  Assignee: Amazon Technologies, Inc.
  Inventors: William Frederick Hingle Kruse, Peter Turk, Panagiotis Thomas
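The Amazon abstract describes a round trip: transcribe the utterance, synthesize speech from the text plus a reduced attribute set, re-transcribe the synthetic speech, and retain the original only if the two transcripts disagree. A minimal sketch of that control flow follows; the injected stt/tts callables and the attribute names are placeholders, not the services or fields named in the patent.

```python
def round_trip_store(utterance_audio, stt, tts, keep_attrs=("gender", "pace")):
    """Illustrative round-trip check: STT -> TTS -> STT, then compare transcripts.
    `stt` returns (text, attributes); `tts` returns audio. Both are injected."""
    text, attrs = stt(utterance_audio)
    reduced = {k: v for k, v in attrs.items() if k in keep_attrs}
    synthetic = tts(text, reduced)               # what goes into the less-secure store
    check_text, _ = stt(synthetic)
    retain_original = (check_text.strip().lower() != text.strip().lower())
    return synthetic, retain_original            # keep the original only on mismatch

# Trivial stand-ins so the sketch runs end to end.
fake_stt = lambda audio: ("turn on the lights", {"gender": "f", "pace": 1.0, "accent": "x"})
fake_tts = lambda text, attrs: b"synthetic-" + text.encode()
print(round_trip_store(b"raw-audio", fake_stt, fake_tts))
```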
- Patent number: 10902060
  Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
  Type: Grant
  Filed: April 15, 2019
  Date of Patent: January 26, 2021
  Assignee: International Business Machines Corporation
  Inventors: Robert J. Connolly, Michael J. Hudson
- Patent number: 10896678
  Abstract: Typical graphical user interfaces and predefined data fields limit the interaction between a person and a computing system. An oral communication device and a data enablement platform are provided for ingesting oral conversational data from people, and using machine learning to provide intelligence. At the front end, an oral conversational bot, or chatbot, interacts with a user. On the backend, the data enablement platform has a computing architecture that ingests data from various external data sources as well as data from internal applications and databases. These data and algorithms are applied to surface new data, identify trends, provide recommendations, infer new understanding, predict actions and events, and automatically act on this computed information. The chatbot then provides audio data that reflects the information computed by the data enablement platform. The system and the devices, for example, are adaptable to various industries.
  Type: Grant
  Filed: August 10, 2018
  Date of Patent: January 19, 2021
  Assignee: FACET LABS, LLC
  Inventors: Stuart Ogawa, Lindsay Alexander Sparks, Koichi Nishimura, Wilfred P. So
- Patent number: 10878802
  Abstract: A speech processing apparatus includes a specifier, and a modulator. The specifier specifies any one or more of one or more speeches included in speeches to be output, as an emphasis part based on an attribute of the speech. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
  Type: Grant
  Filed: August 28, 2017
  Date of Patent: December 29, 2020
  Assignee: Kabushiki Kaisha Toshiba
  Inventor: Masahiro Yamamoto
- Patent number: 10803852
  Abstract: A speech processing apparatus includes a specifier, a determiner, and a modulator. The specifier specifies an emphasis part of speech to be output. The determiner determines, from among a plurality of output units, a first output unit and a second output unit for outputting speech for emphasizing the emphasis part. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
  Type: Grant
  Filed: August 28, 2017
  Date of Patent: October 13, 2020
  Assignee: Kabushiki Kaisha Toshiba
  Inventor: Masahiro Yamamoto
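The two Toshiba entries above modulate the emphasized span so that pitch or phase differs between the two output units. The sketch below produces the phase variant in the simplest possible way, by inverting the sign of the emphasized samples in the second channel; sign inversion is just one way to obtain a phase difference and is not necessarily how the patents modulate the part.

```python
import numpy as np

def modulate_emphasis(mono, start, end):
    """Produce two output channels whose emphasized span differs in phase.
    Sign inversion (a 180-degree phase shift) is used purely for illustration."""
    first = mono.copy()
    second = mono.copy()
    second[start:end] *= -1.0        # phase of the emphasis part differs between channels
    return first, second

t = np.linspace(0, 1, 16000, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t)            # stand-in for the speech to be output
left, right = modulate_emphasis(speech, 4000, 8000)
```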
- Patent number: 10650800
  Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
  Type: Grant
  Filed: February 16, 2018
  Date of Patent: May 12, 2020
  Assignee: KABUSHIKI KAISHA TOSHIBA
  Inventors: Masatsune Tamura, Masahiro Morita
- Patent number: 10652676
  Abstract: A hearing aid (10, 11) has a memory (123) for storing personal settings for alleviating a hearing loss for the hearing aid user. A user account is created from an Internet enabled computer device (17) on a remote server (25), and the user account includes the personal settings for alleviating a hearing loss for the hearing aid user and personal information. A wireless connection is set up between the hearing aid (10, 11) and the personal communication device (13), and the personal communication device (13) is identified as a gateway to the Internet for said hearing aid. The user grants access rights to a third party to modify data in a sub-set of the user account stored on the server (25).
  Type: Grant
  Filed: November 20, 2014
  Date of Patent: May 12, 2020
  Assignee: Widex A/S
  Inventors: Soren Erik Westermann, Svend Vitting Andersen, Anders Westergaard, Niels Erik Boelskift Maretti
- Patent number: 10535350
  Abstract: A method for controlling a plurality of environmental factors that trigger a negative emotional state is provided. The method may include analyzing a plurality of user data when a user experiences a plurality of various environmental factors. The method may also include determining an emotional state experienced by the user when each of the plurality of various environmental factors is present based on the plurality of user data. The method may include receiving a plurality of calendar information associated with a user account. The method may also include identifying an upcoming event based on the plurality of calendar information. The method may include identifying that an environmental factor within the plurality of various environmental factors is present at the upcoming event. The method may also include, in response to determining the environmental factor causes the user to experience a negative emotional state, executing an accommodation method based on the environmental factor.
  Type: Grant
  Filed: April 15, 2019
  Date of Patent: January 14, 2020
  Assignee: International Business Machines Corporation
  Inventors: Paul R. Bastide, Matthew E. Broomhall, Robert E. Loredo, Fang Lu
- Patent number: 10296655
  Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
  Type: Grant
  Filed: June 24, 2016
  Date of Patent: May 21, 2019
  Assignee: International Business Machines Corporation
  Inventors: Robert J. Connolly, Michael J. Hudson
- Patent number: 10008198
  Abstract: A method of segmenting an input speech signal into a plurality of frames for speech recognition is disclosed. The method includes extracting a low frequency signal from the speech signal, and segmenting the speech signal into a plurality of time-intervals according to a plurality of instantaneous phase-sections of the low frequency signal.
  Type: Grant
  Filed: December 30, 2013
  Date of Patent: June 26, 2018
  Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
  Inventors: Kwang-Hyun Cho, Byeongwook Lee, Sung Hoon Jung
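The KAIST method segments speech at boundaries defined by the instantaneous phase of a low-frequency component. One common way to realize that idea is to low-pass the signal, take the phase of its analytic signal via the Hilbert transform, and cut wherever the phase enters a new section; the cutoff, filter order, and number of phase sections below are assumptions for illustration, not values from the patent.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def phase_segments(speech, sr, cutoff=20.0, n_sections=4):
    """Illustrative segmentation: low-pass the signal, take the instantaneous
    phase of its analytic signal, and cut wherever the phase enters a new
    one of `n_sections` equal phase sections."""
    sos = butter(4, cutoff / (sr / 2), btype="low", output="sos")
    low = sosfiltfilt(sos, speech)                     # low-frequency component
    phase = np.angle(hilbert(low))                     # instantaneous phase in (-pi, pi]
    section = np.floor((phase + np.pi) / (2 * np.pi / n_sections)).astype(int)
    boundaries = np.flatnonzero(np.diff(section) != 0) + 1
    return np.split(speech, boundaries)                # variable-length frames

sr = 16000
speech = np.random.randn(sr)                           # stand-in speech signal
print(len(phase_segments(speech, sr)))
```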
- Patent number: 9875735
  Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata in a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
  Type: Grant
  Filed: January 27, 2017
  Date of Patent: January 23, 2018
  Assignee: AT&T Intellectual Property I, L.P.
  Inventors: Linda Roberts, Hong Thi Nguyen, Horst J Schroeter
- Patent number: 9824695
  Abstract: Embodiments herein include receiving a request to modify an audio characteristic associated with a first user for a voice communication system. One or more suggested modified audio characteristics may be provided for the first user, based on, at least in part, one or more audio preferences established by another user. An input of one or more modified audio characteristics may be received for the first user for the voice communication system. A user-specific audio preference may be associated with the first user for voice communications on the voice communication system, the user-specific audio preference including the one or more modified audio characteristics.
  Type: Grant
  Filed: June 18, 2012
  Date of Patent: November 21, 2017
  Assignee: International Business Machines Corporation
  Inventors: Ruthie D. Lyle, Patrick Joseph O'Sullivan, Lin Sun
- Patent number: 9558734
  Abstract: A voice recipient may request a text-to-speech (TTS) voice that corresponds to an age or age range. An existing TTS voice or existing voice data may be used to create a TTS voice corresponding to the requested age by encoding the voice data to voice parameter values, transforming the voice parameter values using a voice-aging model, synthesizing voice data using the transformed parameter values, and then creating a TTS voice using the transformed voice data. The voice-aging model may model how one or more voice parameters of a voice change with age and may be created from voice data stored in a voice bank.
  Type: Grant
  Filed: April 26, 2016
  Date of Patent: January 31, 2017
  Assignee: VOCALID, INC.
  Inventors: Rupal Patel, Geoffrey Seth Meltzner
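The VocaliD abstract transforms voice parameter values with a voice-aging model before re-synthesis. The sketch below shows only that parameter-transformation step, using an assumed linear per-year drift; the parameter names and drift values are invented for illustration and are not the patent's model, which is learned from a voice bank.

```python
# Assumed per-year drift for each parameter; the values are placeholders only.
DRIFT_PER_YEAR = {"f0_hz": -0.5, "breathiness": 0.01}

def age_voice_params(params, source_age, target_age):
    """Apply an assumed linear per-year drift to each voice parameter value."""
    years = target_age - source_age
    return {k: v + DRIFT_PER_YEAR.get(k, 0.0) * years for k, v in params.items()}

young = {"f0_hz": 210.0, "breathiness": 0.2, "speaking_rate": 1.0}
print(age_voice_params(young, source_age=25, target_age=60))
```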
- Patent number: 9472199
  Abstract: The present invention relates to a method and apparatus for processing a voice signal, and the voice signal encoding method according to the present invention comprises the steps of: generating transform coefficients of sine wave components forming an input voice signal by transforming the sine wave components; determining transform coefficients to be encoded from the generated transform coefficients; and transmitting indication information indicating the determined transform coefficients, wherein the indication information may include position information, magnitude information, and sign information of the transform coefficients.
  Type: Grant
  Filed: September 28, 2012
  Date of Patent: October 18, 2016
  Assignee: LG Electronics Inc.
  Inventors: Younghan Lee, Gyuhyeok Jeong, Ingyu Kang, Hyejeong Jeon, Lagyoung Kim
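The LG abstract encodes selected transform coefficients through their position, magnitude, and sign. A minimal sketch of that selection step follows: keep the K largest-magnitude coefficients of a transform and emit the three fields. The DCT and the value of K are assumptions, not the codec described in the patent.

```python
import numpy as np
from scipy.fft import dct

def code_coefficients(frame, k=8):
    """Keep the k largest-magnitude transform coefficients and describe them
    by position, magnitude, and sign (the DCT and k are illustrative choices)."""
    coeffs = dct(frame, norm="ortho")
    positions = np.argsort(np.abs(coeffs))[-k:]
    return {
        "position": positions.tolist(),
        "magnitude": np.abs(coeffs[positions]).tolist(),
        "sign": np.sign(coeffs[positions]).astype(int).tolist(),
    }

frame = np.sin(2 * np.pi * 150 * np.arange(256) / 8000)   # stand-in sine wave component
print(code_coefficients(frame))
```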
- Patent number: 9240194
  Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
  Type: Grant
  Filed: April 29, 2013
  Date of Patent: January 19, 2016
  Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
  Inventors: Takahiro Kamai, Yoshifumi Hirose
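The Panasonic abstract combines vocal tract shape information from two vowels to obtain converted shape information. The simplest reading of "combine" is a weighted interpolation of two shape vectors, sketched below; the vector representation (for example, area-function values) and the mixing ratio are assumptions, not details taken from the patent.

```python
import numpy as np

def combine_vocal_tract(shape_a, shape_b, ratio=0.6):
    """Weighted combination of two vocal tract shape vectors.
    The vector meaning and the ratio are illustrative choices only."""
    shape_a, shape_b = np.asarray(shape_a), np.asarray(shape_b)
    return ratio * shape_a + (1.0 - ratio) * shape_b

vowel_a = np.array([1.2, 0.9, 0.7, 1.5, 2.0])   # stand-in shape of vowel /a/
vowel_i = np.array([2.1, 1.6, 0.4, 0.3, 0.8])   # stand-in shape of vowel /i/
print(combine_vocal_tract(vowel_a, vowel_i))
```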
- Patent number: 9224406
  Abstract: Candidate frequencies per unit segment of an audio signal are identified. First processing section identifies an estimated train that is a time series of candidate frequencies, each selected for a different one of the segments, arranged over a plurality of the unit segments and that has a high likelihood of corresponding to a time series of fundamental frequencies of a target component. Second processing section identifies a state train of states, each indicative of one of sound-generating and non-sound-generating states of the target component in a different one of the segments, arranged over the unit segments. Frequency information which designates, as a fundamental frequency of the target component, a candidate frequency corresponding to the unit segment in the estimated train is generated for each unit segment corresponding to the sound-generating state. Frequency information indicative of no sound generation is generated for each unit segment corresponding to the non-sound-generating state.
  Type: Grant
  Filed: October 28, 2011
  Date of Patent: December 29, 2015
  Assignee: Yamaha Corporation
  Inventors: Jordi Bonada, Jordi Janer, Ricard Marxer, Yasuyuki Umeyama, Kazunobu Kondo, Francisco Garcia
- Patent number: 9058816
  Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
  Type: Grant
  Filed: August 23, 2010
  Date of Patent: June 16, 2015
  Assignee: RMIT University
  Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
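The RMIT abstract spells out one feature-extraction path: compute spectral amplitudes, zero those below a threshold, and take the area under the result inside each of several sub-bands. The sketch below follows that description; the relative threshold, band count, and use of a plain sum as the "area" are illustrative choices rather than the patent's exact parameters.

```python
import numpy as np

def thresholded_subband_areas(signal, threshold_ratio=0.1, n_bands=8):
    """Zero out spectral amplitudes below a threshold, then measure the
    remaining area inside each of n_bands equal-width sub-bands."""
    amps = np.abs(np.fft.rfft(signal))
    amps[amps < threshold_ratio * amps.max()] = 0.0     # thresholding step
    bands = np.array_split(amps, n_bands)               # equal-width sub-bands
    return np.array([band.sum() for band in bands])     # area under each band

glottal = np.random.randn(1024)                         # stand-in glottal waveform
print(thresholded_subband_areas(glottal))
```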
- Patent number: 9002703
  Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
  Type: Grant
  Filed: September 28, 2011
  Date of Patent: April 7, 2015
  Assignee: Amazon Technologies, Inc.
  Inventor: Jay A. Crosley
- Patent number: 8977552
  Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
  Type: Grant
  Filed: May 28, 2014
  Date of Patent: March 10, 2015
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Alistair D. Conkie, Ann K. Syrdal
- Patent number: 8977555
  Abstract: Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech ("TTS") presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition ("ASR") modules and/or natural language understanding ("NLU") modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
  Type: Grant
  Filed: December 20, 2012
  Date of Patent: March 10, 2015
  Assignee: Amazon Technologies, Inc.
  Inventors: Fred Torok, Frédéric Johan Georges Deramat, Vikram Kumar Gundeti
- Patent number: 8949129
  Abstract: A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting a prototype feature of the set of prototype features providing a smallest relative difference.
  Type: Grant
  Filed: August 12, 2013
  Date of Patent: February 3, 2015
  Assignee: Ambient Corporation
  Inventors: Michael Callahan, Thomas Coleman
- Patent number: 8942983
  Abstract: The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
  Type: Grant
  Filed: November 23, 2011
  Date of Patent: January 27, 2015
  Assignee: Speech Technology Centre, Limited
  Inventor: Mikhail Vasilievich Khitrov
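The Speech Technology Centre abstract searches a speech database for sounds "most similar in terms of the physical parameters" to the target. A nearest-neighbour lookup over a small parameter table illustrates that step; the Euclidean distance and the assumed columns (duration, mean F0, energy) are not taken from the patent.

```python
import numpy as np

def nearest_unit(target, database):
    """Return the index of the database unit whose physical parameters
    (rows of `database`) are closest to `target` in Euclidean distance."""
    database = np.asarray(database, dtype=float)
    distances = np.linalg.norm(database - np.asarray(target, dtype=float), axis=1)
    return int(np.argmin(distances))

# Columns are assumed to be [duration_ms, mean_f0_hz, energy_db].
units = [[90, 120, -18], [110, 180, -12], [70, 95, -20]]
print(nearest_unit([100, 170, -13], units))   # -> 1, the closest stored sound
```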
- Publication number: 20140379350
  Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata in a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
  Type: Application
  Filed: September 9, 2014
  Publication date: December 25, 2014
  Inventors: Linda ROBERTS, Hong Thi NGUYEN, Horst J. SCHROETER
- Publication number: 20140365068
  Abstract: The present invention is directed to a system and method for personalizing a voice user interface on an electronic device. Voice recordings are made into an electronic device or a computerized system using software installed onto the device, where a user is prompted to record various dialogues and commands. The recording is then converted into voice data packages, and uploaded onto the electronic device. In this way, users can replace the computerized or preloaded voice in a voice user interface of an electronic device with their own voice or a voice of others. In one embodiment, the electronic device comprises a mobile phone or a tablet computer. In other embodiments, the electronic device comprises a vehicle communication system and a navigation device. The system and method of the present invention enable the user to personalize the voice user interface for each electronic device operated by the user.
  Type: Application
  Filed: June 5, 2014
  Publication date: December 11, 2014
  Inventors: Melvin Burns, Wanda L. Burns
- Patent number: 8898055
  Abstract: A voice quality conversion device including: a target vowel vocal tract information hold unit holding target vowel vocal tract information of each vowel indicating target voice quality; a vowel conversion unit (i) receiving vocal tract information with phoneme boundary information of the speech including information of phonemes and phoneme durations, (ii) approximating a temporal change of vocal tract information of a vowel in the vocal tract information with phoneme boundary information applying a first function, (iii) approximating a temporal change of vocal tract information of the same vowel held in the target vowel vocal tract information hold unit applying a second function, (iv) calculating a third function by combining the first function with the second function, and (v) converting the vocal tract information of the vowel applying the third function; and a synthesis unit synthesizing a speech using the converted information.
  Type: Grant
  Filed: May 8, 2008
  Date of Patent: November 25, 2014
  Assignee: Panasonic Intellectual Property Corporation of America
  Inventors: Yoshifumi Hirose, Takahiro Kamai, Yumiko Kato
- Patent number: 8892442
  Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level and repeated attempts at notification if it is of high importance.
  Type: Grant
  Filed: February 17, 2014
  Date of Patent: November 18, 2014
  Assignee: AT&T Intellectual Property I, L.P.
  Inventor: Horst J. Schroeter
- Patent number: 8868431
  Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered. It also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition, to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
  Type: Grant
  Filed: February 5, 2010
  Date of Patent: October 21, 2014
  Assignee: Mitsubishi Electric Corporation
  Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
- Patent number: 8862472
  Abstract: The present invention is related to a method for coding excitation signal of a target speech comprising the steps of: extracting from a set of training normalized residual frames, a set of relevant normalized residual frames, said training residual frames being extracted from a training speech, synchronized on Glottal Closure Instant (GCI), pitch and energy normalized; determining the target excitation signal of the target speech; dividing said target excitation signal into GCI synchronized target frames; determining the local pitch and energy of the GCI synchronized target frames; normalizing the GCI synchronized target frames in both energy and pitch, to obtain target normalized residual frames; determining coefficients of linear combination of said extracted set of relevant normalized residual frames to build synthetic normalized residual frames close to each target normalized residual frames; wherein the coding parameters for each target residual frames comprise the determined coefficients.
  Type: Grant
  Filed: March 30, 2010
  Date of Patent: October 14, 2014
  Assignees: Universite de Mons, Acapela Group S.A.
  Inventors: Geoffrey Wilfart, Thomas Drugman, Thierry Dutoit
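The Mons/Acapela abstract determines coefficients of a linear combination of relevant normalized residual frames that approximates each target frame. Ordinary least squares is one way to obtain such coefficients, sketched below; the patent does not necessarily use this solver, and the random "frames" are stand-ins for GCI-synchronized, pitch- and energy-normalized residuals.

```python
import numpy as np

def combination_coefficients(dictionary, target):
    """Least-squares coefficients so that dictionary @ coeffs approximates target.
    `dictionary` holds one normalized residual frame per column; ordinary least
    squares is an illustrative choice, not the patent's exact method."""
    coeffs, *_ = np.linalg.lstsq(dictionary, target, rcond=None)
    return coeffs

frame_len, n_atoms = 160, 6
dictionary = np.random.randn(frame_len, n_atoms)     # relevant normalized residual frames
target = np.random.randn(frame_len)                  # one GCI-synchronized target frame
coeffs = combination_coefficients(dictionary, target)
print(np.allclose(dictionary @ coeffs, target))      # generally False: it is an approximation
```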
- Patent number: 8856008
  Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
  Type: Grant
  Filed: September 18, 2013
  Date of Patent: October 7, 2014
  Assignee: Morphism LLC
  Inventor: James H. Stephens, Jr.
- Publication number: 20140278432
  Abstract: Various embodiments provide a method and apparatus for providing a silent speech solution which allows the user to speak over an electronic medium such as a cell phone without making any noise. In particular, measuring the shape of the vocal tract allows creation of synthesized speech without requiring noise produced by the vocal cords.
  Type: Application
  Filed: March 14, 2013
  Publication date: September 18, 2014
  Inventor: Dale D. Harman
- Publication number: 20140278433
  Abstract: A voice synthesis device includes a sequence data generation unit configured to generate sequence data including a plurality of kinds of parameters for controlling vocalization of a voice to be synthesized based on music information and lyrics information, an output unit configured to output a singing voice based on the sequence data, and a processing content information acquisition unit configured to acquire a plurality of pieces of processing content information, each associated with a piece of preset singing manner information. Each piece of the content information indicates contents of edit processing for all or part of the parameters. The sequence data generation unit generates a plurality of pieces of sequence data, and the sequence data are obtained by editing all or part of the parameters included in the sequence data, based on the content information associated with one of the pieces of singing manner information specified by a user.
  Type: Application
  Filed: March 5, 2014
  Publication date: September 18, 2014
  Applicant: Yamaha Corporation
  Inventor: Tatsuya IRIYAMA
- Publication number: 20140207463
  Abstract: An audio signal method of the present disclosure includes: inputting a plurality of variables including at least a first variable indicating an opening degree of a throat, which interiorly includes a vocal cord, with respect to a vocal cord model configured to output a second variable indicating an opening degree of the vocal cord according to reception of input of the plurality of variables, the first variable being greater than the second variable; and generating an audio signal in which a level of a non-integer harmonic sound is changed, by controlling the second variable.
  Type: Application
  Filed: January 17, 2014
  Publication date: July 24, 2014
  Applicant: PANASONIC CORPORATION
  Inventor: Masahiro NAKANISHI
- Patent number: 8775176
  Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
  Type: Grant
  Filed: August 26, 2013
  Date of Patent: July 8, 2014
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Mazin Gilbert, Stephan Kanthak
- Patent number: 8751239
  Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
  Type: Grant
  Filed: October 4, 2007
  Date of Patent: June 10, 2014
  Assignee: Core Wireless Licensing, S.a.r.l.
  Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
- Patent number: 8744851
  Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
  Type: Grant
  Filed: August 13, 2013
  Date of Patent: June 3, 2014
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Alistair Conkie, Ann K Syrdal
- Patent number: 8719030
  Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then Laguerre functions are used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
  Type: Grant
  Filed: December 3, 2012
  Date of Patent: May 6, 2014
  Inventor: Chengjun Julian Chen
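The timbre-vector abstract reconstructs a phase spectrum from an amplitude spectrum using Kramers-Kronig relations, which for a minimum-phase signal can be computed through the real cepstrum. The sketch below shows only that amplitude-to-phase step (the Laguerre analysis and frame segmentation are omitted); the spectral floor and frame length are arbitrary, and this is a generic minimum-phase construction rather than the patent's exact procedure.

```python
import numpy as np

def phase_from_amplitude(amplitude):
    """Minimum-phase spectrum implied by an amplitude spectrum, computed with
    the cepstral form of the Kramers-Kronig (Hilbert) relation. `amplitude`
    is a full-length symmetric magnitude spectrum; the floor avoids log(0)."""
    log_mag = np.log(np.maximum(amplitude, 1e-12))
    cepstrum = np.fft.ifft(log_mag).real
    n = len(cepstrum)
    window = np.zeros(n)
    window[0] = 1.0
    window[1:(n + 1) // 2] = 2.0                  # fold the anti-causal part onto the causal part
    if n % 2 == 0:
        window[n // 2] = 1.0
    min_phase_log = np.fft.fft(cepstrum * window)
    return np.imag(min_phase_log)                 # phase spectrum paired with `amplitude`

amp = np.abs(np.fft.fft(np.random.randn(256)))    # stand-in amplitude spectrum
phase = phase_from_amplitude(amp)
```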
- Patent number: 8706488
  Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
  Type: Grant
  Filed: February 27, 2013
  Date of Patent: April 22, 2014
  Assignee: Nuance Communications, Inc.
  Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
- Patent number: 8706489
  Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
  Type: Grant
  Filed: August 8, 2006
  Date of Patent: April 22, 2014
  Assignee: Delta Electronics Inc.
  Inventors: Jia-lin Shen, Chien-Chou Hung
- Publication number: 20140108015
  Abstract: A voice converting apparatus and a voice converting method are provided. The method of converting a voice using a voice converting apparatus includes receiving a voice from a counterpart, analyzing the voice and determining whether the voice is abnormal, converting the voice into a normal voice by adjusting a harmonic signal of the voice in response to determining that the voice is abnormal, and transmitting the normal voice.
  Type: Application
  Filed: October 11, 2013
  Publication date: April 17, 2014
  Applicant: SAMSUNG ELECTRONICS CO., LTD.
  Inventors: Jong-youb RYU, Yoon-jae LEE, Seoung-hun KIM, Young-tae KIM
- Patent number: 8694319
  Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
  Type: Grant
  Filed: November 3, 2005
  Date of Patent: April 8, 2014
  Assignee: International Business Machines Corporation
  Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
- Patent number: 8655662
  Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level and repeated attempts at notification if it is of high importance.
  Type: Grant
  Filed: November 29, 2012
  Date of Patent: February 18, 2014
  Assignee: AT&T Intellectual Property I, L.P.
  Inventor: Horst Schroeter
- Patent number: 8650035
  Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
  Type: Grant
  Filed: November 18, 2005
  Date of Patent: February 11, 2014
  Assignee: Verizon Laboratories Inc.
  Inventor: Adrian E. Conway