Vocal Tract Model Patents (Class 704/261)
- Patent number: 11842720
  Abstract: An audio processing system and a method thereof generate a synthesis model that can input an audio signal to generate feature data that can be used by a signal generator to generate a modified audio signal. Specifically, a pre-trained synthesis model is first generated using training audio data. Thereafter, a re-trained synthesis model is established by additionally training the pre-trained synthesis model. Based on a received instruction to modify at least one of sounding conditions of an audio signal to be processed, feature data is generated by inputting additional condition data into the re-trained synthesis model. The signal generator generates the modified audio signal from the generated feature data.
  Type: Grant
  Filed: May 3, 2021
  Date of Patent: December 12, 2023
  Assignee: YAMAHA CORPORATION
  Inventor: Ryunosuke Daido
- Patent number: 11514924
  Abstract: In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original speech that is intercepted.
  Type: Grant
  Filed: February 21, 2020
  Date of Patent: November 29, 2022
  Assignee: International Business Machines Corporation
  Inventors: Samuel Osebe, Charles Muchiri Wachira, Komminist Weldemariam, Celia Cintas
- Patent number: 11450307
  Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
  Type: Grant
  Filed: March 27, 2019
  Date of Patent: September 20, 2022
  Assignee: TELEPATHY LABS, INC.
  Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
- Patent number: 11348569
  Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
  Type: Grant
  Filed: April 7, 2020
  Date of Patent: May 31, 2022
  Assignee: KABUSHIKI KAISHA TOSHIBA
  Inventors: Masatsune Tamura, Masahiro Morita
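The group delay step in the Toshiba abstract above has a standard signal-processing reading: group delay is the negative derivative of the unwrapped phase spectrum with respect to frequency, and a band parameter can be taken as its average over a frequency band. The sketch below shows only that reading; the frame length, band edges, and simple averaging are assumptions for illustration, not the patented method.

```python
import numpy as np

def band_group_delay(frame, sr, band=(0.0, 4000.0)):
    """Illustrative band group delay: negative derivative of the
    unwrapped phase spectrum, averaged over one frequency band."""
    spectrum = np.fft.rfft(frame)
    phase = np.unwrap(np.angle(spectrum))          # first phase spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    group_delay = -np.gradient(phase, freqs)       # group delay spectrum
    mask = (freqs >= band[0]) & (freqs < band[1])
    return group_delay[mask].mean()                # band group delay parameter

frame = np.hanning(512) * np.random.randn(512)     # stand-in speech frame
print(band_group_delay(frame, sr=16000))
```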
- Patent number: 11210058
  Abstract: A sound system for providing independently variable audio outputs is disclosed herein. The sound system may include a display device, an audio system, and a transmitter. The display device may receive an audio signal and transmit the audio signal to the audio system and the transmitter. The audio system may condition the audio signal based on different settings provided by users. The transmitter may wirelessly transmit conditioned audio signals to one or more audio devices.
  Type: Grant
  Filed: September 30, 2020
  Date of Patent: December 28, 2021
  Assignee: TV Ears, Inc.
  Inventor: George Joseph Dennis
- Patent number: 11137601
  Abstract: Systems and methods according to present principles allow social distancing within themed attractions such as haunted attractions in order to allow the enjoyment of the same in various circumstances. These circumstances include times of pandemic, for customers that are afraid to congregate in large groups, for customers that desire to control aspects of the experience, and so on.
  Type: Grant
  Filed: January 13, 2021
  Date of Patent: October 5, 2021
  Inventor: Mark D. Wieczorek
- Patent number: 11094313
  Abstract: An electronic device for adjusting a speech output rate (speech rate) of speech output data.
  Type: Grant
  Filed: June 18, 2019
  Date of Patent: August 17, 2021
  Assignee: SAMSUNG ELECTRONICS CO., LTD.
  Inventor: Piotr Marcinkiewicz
- Patent number: 10986418
  Abstract: Methods and systems are described herein for improving audio for hearing impaired content consumers. An example method may comprise determining a content asset. Closed caption data associated with the content asset may be determined. At least a portion of the closed caption data may be determined based on a user setting associated with a hearing impairment. Compensating audio comprising a frequency translation associated with at least the portion of the closed caption data may be generated. The content asset may be caused to be output with audio content comprising the compensating audio and the original audio.
  Type: Grant
  Filed: May 17, 2019
  Date of Patent: April 20, 2021
  Assignee: Comcast Cable Communications, LLC
  Inventor: Jeff Calkins
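The "frequency translation" in the Comcast abstract can be illustrated with a toy single-FFT shift that moves spectral content into a lower band, where it may be easier to hear. This is only a rough sketch of the general idea; the shift amount, the whole-signal FFT, and the absence of framing or loudness compensation are simplifications, not the claimed compensating-audio method.

```python
import numpy as np

def translate_down(audio, sr, shift_hz=1000.0):
    """Crude frequency translation: slide spectral content down by a fixed
    offset so high-frequency cues land in a lower band (single-FFT toy)."""
    spec = np.fft.rfft(audio)
    bins = int(round(shift_hz * len(audio) / sr))
    shifted = np.zeros_like(spec)
    shifted[:len(spec) - bins] = spec[bins:]      # bins above shift_hz move downward
    return np.fft.irfft(shifted, n=len(audio))

sr = 16000
tone = np.sin(2 * np.pi * 5000 * np.arange(sr) / sr)   # 5 kHz stand-in content
lowered = translate_down(tone, sr)                      # now centred near 4 kHz
```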
- Patent number: 10909978
  Abstract: Technologies for secure storage of utterances are disclosed. A computing device captures audio of a human making a verbal utterance. The utterance is provided to a speech-to-text (STT) service that translates the utterance to text. The STT service can also identify various speaker-specific attributes in the utterance. The text and attributes are provided to a text-to-speech (TTS) service that creates speech from the text and a subset of the attributes. The speech is stored in a data store that is less secure than that required for storing the original utterance. The original utterance can then be discarded. The STT service can also translate the speech generated by the TTS service to text. The text generated by the STT service from the speech and the text generated by the STT service from the original utterance are then compared. If the text does not match, the original utterance can be retained.
  Type: Grant
  Filed: June 28, 2017
  Date of Patent: February 2, 2021
  Assignee: Amazon Technologies, Inc.
  Inventors: William Frederick Hingle Kruse, Peter Turk, Panagiotis Thomas
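The Amazon abstract describes a round trip: transcribe the utterance, synthesize speech from the text plus a reduced attribute set, re-transcribe the synthetic speech, and retain the original only if the two transcripts disagree. A minimal sketch of that control flow follows; the injected stt/tts callables and the attribute names are placeholders, not the services or fields named in the patent.

```python
def round_trip_store(utterance_audio, stt, tts, keep_attrs=("gender", "pace")):
    """Illustrative round-trip check: STT -> TTS -> STT, then compare transcripts.
    `stt` returns (text, attributes); `tts` returns audio. Both are injected."""
    text, attrs = stt(utterance_audio)
    reduced = {k: v for k, v in attrs.items() if k in keep_attrs}
    synthetic = tts(text, reduced)               # what goes into the less-secure store
    check_text, _ = stt(synthetic)
    retain_original = (check_text.strip().lower() != text.strip().lower())
    return synthetic, retain_original            # keep the original only on mismatch

# Trivial stand-ins so the sketch runs end to end.
fake_stt = lambda audio: ("turn on the lights", {"gender": "f", "pace": 1.0, "accent": "x"})
fake_tts = lambda text, attrs: b"synthetic-" + text.encode()
print(round_trip_store(b"raw-audio", fake_stt, fake_tts))
```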
- Patent number: 10902060
  Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
  Type: Grant
  Filed: April 15, 2019
  Date of Patent: January 26, 2021
  Assignee: International Business Machines Corporation
  Inventors: Robert J. Connolly, Michael J. Hudson
- Patent number: 10896678
  Abstract: Typical graphical user interfaces and predefined data fields limit the interaction between a person and a computing system. An oral communication device and a data enablement platform are provided for ingesting oral conversational data from people, and using machine learning to provide intelligence. At the front end, an oral conversational bot, or chatbot, interacts with a user. On the backend, the data enablement platform has a computing architecture that ingests data from various external data sources as well as data from internal applications and databases. These data and algorithms are applied to surface new data, identify trends, provide recommendations, infer new understanding, predict actions and events, and automatically act on this computed information. The chatbot then provides audio data that reflects the information computed by the data enablement platform. The system and the devices, for example, are adaptable to various industries.
  Type: Grant
  Filed: August 10, 2018
  Date of Patent: January 19, 2021
  Assignee: FACET LABS, LLC
  Inventors: Stuart Ogawa, Lindsay Alexander Sparks, Koichi Nishimura, Wilfred P. So
- Patent number: 10878802
  Abstract: A speech processing apparatus includes a specifier, and a modulator. The specifier specifies any one or more of one or more speeches included in speeches to be output, as an emphasis part based on an attribute of the speech. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
  Type: Grant
  Filed: August 28, 2017
  Date of Patent: December 29, 2020
  Assignee: Kabushiki Kaisha Toshiba
  Inventor: Masahiro Yamamoto
- Patent number: 10803852
  Abstract: A speech processing apparatus includes a specifier, a determiner, and a modulator. The specifier specifies an emphasis part of speech to be output. The determiner determines, from among a plurality of output units, a first output unit and a second output unit for outputting speech for emphasizing the emphasis part. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
  Type: Grant
  Filed: August 28, 2017
  Date of Patent: October 13, 2020
  Assignee: Kabushiki Kaisha Toshiba
  Inventor: Masahiro Yamamoto
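The two Toshiba entries above modulate the emphasized span so that pitch or phase differs between the two output units. The sketch below produces the phase variant in the simplest possible way, by inverting the sign of the emphasized samples in the second channel; sign inversion is just one way to obtain a phase difference and is not necessarily how the patents modulate the part.

```python
import numpy as np

def modulate_emphasis(mono, start, end):
    """Produce two output channels whose emphasized span differs in phase.
    Sign inversion (a 180-degree phase shift) is used purely for illustration."""
    first = mono.copy()
    second = mono.copy()
    second[start:end] *= -1.0        # phase of the emphasis part differs between channels
    return first, second

t = np.linspace(0, 1, 16000, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t)            # stand-in for the speech to be output
left, right = modulate_emphasis(speech, 4000, 8000)
```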
- Patent number: 10650800
  Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
  Type: Grant
  Filed: February 16, 2018
  Date of Patent: May 12, 2020
  Assignee: KABUSHIKI KAISHA TOSHIBA
  Inventors: Masatsune Tamura, Masahiro Morita
- Patent number: 10652676
  Abstract: A hearing aid (10, 11) has a memory (123) for storing personal settings for alleviating a hearing loss for the hearing aid user. A user account is created from an Internet enabled computer device (17) on a remote server (25), and the user account includes the personal settings for alleviating a hearing loss for the hearing aid user and personal information. A wireless connection is set up between the hearing aid (10, 11) and the personal communication device (13), and the personal communication device (13) is identified as a gateway to the Internet for said hearing aid. The user grants access rights to a third party to modify data in a sub-set of the user account stored on the server (25).
  Type: Grant
  Filed: November 20, 2014
  Date of Patent: May 12, 2020
  Assignee: Widex A/S
  Inventors: Soren Erik Westermann, Svend Vitting Andersen, Anders Westergaard, Niels Erik Boelskift Maretti
- Patent number: 10535350
  Abstract: A method for controlling a plurality of environmental factors that trigger a negative emotional state is provided. The method may include analyzing a plurality of user data when a user experiences a plurality of various environmental factors. The method may also include determining an emotional state experienced by the user when each of the plurality of various environmental factors is present based on the plurality of user data. The method may include receiving a plurality of calendar information associated with a user account. The method may also include identifying an upcoming event based on the plurality of calendar information. The method may include identifying that an environmental factor within the plurality of various environmental factors is present at the upcoming event. The method may also include, in response to determining the environmental factor causes the user to experience a negative emotional state, executing an accommodation method based on the environmental factor.
  Type: Grant
  Filed: April 15, 2019
  Date of Patent: January 14, 2020
  Assignee: International Business Machines Corporation
  Inventors: Paul R. Bastide, Matthew E. Broomhall, Robert E. Loredo, Fang Lu
- Patent number: 10296655
  Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
  Type: Grant
  Filed: June 24, 2016
  Date of Patent: May 21, 2019
  Assignee: International Business Machines Corporation
  Inventors: Robert J. Connolly, Michael J. Hudson
- Patent number: 10008198
  Abstract: A method of segmenting an input speech signal into a plurality of frames for speech recognition is disclosed. The method includes extracting a low frequency signal from the speech signal, and segmenting the speech signal into a plurality of time-intervals according to a plurality of instantaneous phase-sections of the low frequency signal.
  Type: Grant
  Filed: December 30, 2013
  Date of Patent: June 26, 2018
  Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
  Inventors: Kwang-Hyun Cho, Byeongwook Lee, Sung Hoon Jung
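The KAIST method segments speech at boundaries defined by the instantaneous phase of a low-frequency component. One common way to realize that idea is to low-pass the signal, take the phase of its analytic signal via the Hilbert transform, and cut wherever the phase enters a new section; the cutoff, filter order, and number of phase sections below are assumptions for illustration, not values from the patent.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def phase_segments(speech, sr, cutoff=20.0, n_sections=4):
    """Illustrative segmentation: low-pass the signal, take the instantaneous
    phase of its analytic signal, and cut wherever the phase enters a new
    one of `n_sections` equal phase sections."""
    sos = butter(4, cutoff / (sr / 2), btype="low", output="sos")
    low = sosfiltfilt(sos, speech)                     # low-frequency component
    phase = np.angle(hilbert(low))                     # instantaneous phase in (-pi, pi]
    section = np.floor((phase + np.pi) / (2 * np.pi / n_sections)).astype(int)
    boundaries = np.flatnonzero(np.diff(section) != 0) + 1
    return np.split(speech, boundaries)                # variable-length frames

sr = 16000
speech = np.random.randn(sr)                           # stand-in speech signal
print(len(phase_segments(speech, sr)))
```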
- Patent number: 9875735
  Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata in a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
  Type: Grant
  Filed: January 27, 2017
  Date of Patent: January 23, 2018
  Assignee: AT&T Intellectual Property I, L.P.
  Inventors: Linda Roberts, Hong Thi Nguyen, Horst J Schroeter
- Patent number: 9824695
  Abstract: Embodiments herein include receiving a request to modify an audio characteristic associated with a first user for a voice communication system. One or more suggested modified audio characteristics may be provided for the first user, based on, at least in part, one or more audio preferences established by another user. An input of one or more modified audio characteristics may be received for the first user for the voice communication system. A user-specific audio preference may be associated with the first user for voice communications on the voice communication system, the user-specific audio preference including the one or more modified audio characteristics.
  Type: Grant
  Filed: June 18, 2012
  Date of Patent: November 21, 2017
  Assignee: International Business Machines Corporation
  Inventors: Ruthie D. Lyle, Patrick Joseph O'Sullivan, Lin Sun
- Patent number: 9558734
  Abstract: A voice recipient may request a text-to-speech (TTS) voice that corresponds to an age or age range. An existing TTS voice or existing voice data may be used to create a TTS voice corresponding to the requested age by encoding the voice data to voice parameter values, transforming the voice parameter values using a voice-aging model, synthesizing voice data using the transformed parameter values, and then creating a TTS voice using the transformed voice data. The voice-aging model may model how one or more voice parameters of a voice change with age and may be created from voice data stored in a voice bank.
  Type: Grant
  Filed: April 26, 2016
  Date of Patent: January 31, 2017
  Assignee: VOCALID, INC.
  Inventors: Rupal Patel, Geoffrey Seth Meltzner
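The VocaliD abstract transforms voice parameter values with a voice-aging model before re-synthesis. The sketch below shows only that parameter-transformation step, using an assumed linear per-year drift; the parameter names and drift values are invented for illustration and are not the patent's model, which is learned from a voice bank.

```python
# Assumed per-year drift for each parameter; the values are placeholders only.
DRIFT_PER_YEAR = {"f0_hz": -0.5, "breathiness": 0.01}

def age_voice_params(params, source_age, target_age):
    """Apply an assumed linear per-year drift to each voice parameter value."""
    years = target_age - source_age
    return {k: v + DRIFT_PER_YEAR.get(k, 0.0) * years for k, v in params.items()}

young = {"f0_hz": 210.0, "breathiness": 0.2, "speaking_rate": 1.0}
print(age_voice_params(young, source_age=25, target_age=60))
```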
- Patent number: 9472199
  Abstract: The present invention relates to a method and apparatus for processing a voice signal, and the voice signal encoding method according to the present invention comprises the steps of: generating transform coefficients of sine wave components forming an input voice signal by transforming the sine wave components; determining transform coefficients to be encoded from the generated transform coefficients; and transmitting indication information indicating the determined transform coefficients, wherein the indication information may include position information, magnitude information, and sign information of the transform coefficients.
  Type: Grant
  Filed: September 28, 2012
  Date of Patent: October 18, 2016
  Assignee: LG Electronics Inc.
  Inventors: Younghan Lee, Gyuhyeok Jeong, Ingyu Kang, Hyejeong Jeon, Lagyoung Kim
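The LG abstract encodes selected transform coefficients through their position, magnitude, and sign. A minimal sketch of that selection step follows: keep the K largest-magnitude coefficients of a transform and emit the three fields. The DCT and the value of K are assumptions, not the codec described in the patent.

```python
import numpy as np
from scipy.fft import dct

def code_coefficients(frame, k=8):
    """Keep the k largest-magnitude transform coefficients and describe them
    by position, magnitude, and sign (the DCT and k are illustrative choices)."""
    coeffs = dct(frame, norm="ortho")
    positions = np.argsort(np.abs(coeffs))[-k:]
    return {
        "position": positions.tolist(),
        "magnitude": np.abs(coeffs[positions]).tolist(),
        "sign": np.sign(coeffs[positions]).astype(int).tolist(),
    }

frame = np.sin(2 * np.pi * 150 * np.arange(256) / 8000)   # stand-in sine wave component
print(code_coefficients(frame))
```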
- Patent number: 9240194
  Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
  Type: Grant
  Filed: April 29, 2013
  Date of Patent: January 19, 2016
  Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
  Inventors: Takahiro Kamai, Yoshifumi Hirose
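The Panasonic abstract combines vocal tract shape information from two vowels to obtain converted shape information. The simplest reading of "combine" is a weighted interpolation of two shape vectors, sketched below; the vector representation (for example, area-function values) and the mixing ratio are assumptions, not details taken from the patent.

```python
import numpy as np

def combine_vocal_tract(shape_a, shape_b, ratio=0.6):
    """Weighted combination of two vocal tract shape vectors.
    The vector meaning and the ratio are illustrative choices only."""
    shape_a, shape_b = np.asarray(shape_a), np.asarray(shape_b)
    return ratio * shape_a + (1.0 - ratio) * shape_b

vowel_a = np.array([1.2, 0.9, 0.7, 1.5, 2.0])   # stand-in shape of vowel /a/
vowel_i = np.array([2.1, 1.6, 0.4, 0.3, 0.8])   # stand-in shape of vowel /i/
print(combine_vocal_tract(vowel_a, vowel_i))
```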
- Patent number: 9224406
  Abstract: Candidate frequencies per unit segment of an audio signal are identified. First processing section identifies an estimated train that is a time series of candidate frequencies, each selected for a different one of the segments, arranged over a plurality of the unit segments and that has a high likelihood of corresponding to a time series of fundamental frequencies of a target component. Second processing section identifies a state train of states, each indicative of one of sound-generating and non-sound-generating states of the target component in a different one of the segments, arranged over the unit segments. Frequency information which designates, as a fundamental frequency of the target component, a candidate frequency corresponding to the unit segment in the estimated train is generated for each unit segment corresponding to the sound-generating state. Frequency information indicative of no sound generation is generated for each unit segment corresponding to the non-sound-generating state.
  Type: Grant
  Filed: October 28, 2011
  Date of Patent: December 29, 2015
  Assignee: Yamaha Corporation
  Inventors: Jordi Bonada, Jordi Janer, Ricard Marxer, Yasuyuki Umeyama, Kazunobu Kondo, Francisco Garcia
- Patent number: 9058816
  Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
  Type: Grant
  Filed: August 23, 2010
  Date of Patent: June 16, 2015
  Assignee: RMIT University
  Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
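The RMIT abstract spells out one feature-extraction path: compute spectral amplitudes, zero those below a threshold, and take the area under the result inside each of several sub-bands. The sketch below follows that description; the relative threshold, band count, and use of a plain sum as the "area" are illustrative choices rather than the patent's exact parameters.

```python
import numpy as np

def thresholded_subband_areas(signal, threshold_ratio=0.1, n_bands=8):
    """Zero out spectral amplitudes below a threshold, then measure the
    remaining area inside each of n_bands equal-width sub-bands."""
    amps = np.abs(np.fft.rfft(signal))
    amps[amps < threshold_ratio * amps.max()] = 0.0     # thresholding step
    bands = np.array_split(amps, n_bands)               # equal-width sub-bands
    return np.array([band.sum() for band in bands])     # area under each band

glottal = np.random.randn(1024)                         # stand-in glottal waveform
print(thresholded_subband_areas(glottal))
```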
- Patent number: 9002703
  Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may then be combined to produce an audio reading of at least a portion of the text-based work.
  Type: Grant
  Filed: September 28, 2011
  Date of Patent: April 7, 2015
  Assignee: Amazon Technologies, Inc.
  Inventor: Jay A. Crosley
- Patent number: 8977552
  Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
  Type: Grant
  Filed: May 28, 2014
  Date of Patent: March 10, 2015
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Alistair D. Conkie, Ann K. Syrdal
- Patent number: 8977555
  Abstract: Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech ("TTS") presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition ("ASR") modules and/or natural language understanding ("NLU") modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
  Type: Grant
  Filed: December 20, 2012
  Date of Patent: March 10, 2015
  Assignee: Amazon Technologies, Inc.
  Inventors: Fred Torok, Frédéric Johan Georges Deramat, Vikram Kumar Gundeti
- Patent number: 8949129
  Abstract: A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting a prototype feature of the set of prototype features providing a smallest relative difference.
  Type: Grant
  Filed: August 12, 2013
  Date of Patent: February 3, 2015
  Assignee: Ambient Corporation
  Inventors: Michael Callahan, Thomas Coleman
- Patent number: 8942983
  Abstract: The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
  Type: Grant
  Filed: November 23, 2011
  Date of Patent: January 27, 2015
  Assignee: Speech Technology Centre, Limited
  Inventor: Mikhail Vasilievich Khitrov
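The Speech Technology Centre abstract searches a speech database for sounds "most similar in terms of the physical parameters" to the target. A nearest-neighbour lookup over a small parameter table illustrates that step; the Euclidean distance and the assumed columns (duration, mean F0, energy) are not taken from the patent.

```python
import numpy as np

def nearest_unit(target, database):
    """Return the index of the database unit whose physical parameters
    (rows of `database`) are closest to `target` in Euclidean distance."""
    database = np.asarray(database, dtype=float)
    distances = np.linalg.norm(database - np.asarray(target, dtype=float), axis=1)
    return int(np.argmin(distances))

# Columns are assumed to be [duration_ms, mean_f0_hz, energy_db].
units = [[90, 120, -18], [110, 180, -12], [70, 95, -20]]
print(nearest_unit([100, 170, -13], units))   # -> 1, the closest stored sound
```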
- Publication number: 20140379350
  Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata in a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
  Type: Application
  Filed: September 9, 2014
  Publication date: December 25, 2014
  Inventors: Linda ROBERTS, Hong Thi NGUYEN, Horst J. SCHROETER
- Publication number: 20140365068
  Abstract: The present invention is directed to a system and method for personalizing a voice user interface on an electronic device. Voice recordings are made into an electronic device or a computerized system using software installed onto the device, where a user is prompted to record various dialogues and commands. The recording is then converted into voice data packages, and uploaded onto the electronic device. In this way, users can replace the computerized or preloaded voice in a voice user interface of an electronic device with their own voice or a voice of others. In one embodiment, the electronic device comprises a mobile phone or a tablet computer. In other embodiments, the electronic device comprises a vehicle communication system and a navigation device. The system and method of the present invention enable the user to personalize the voice user interface for each electronic device operated by the user.
  Type: Application
  Filed: June 5, 2014
  Publication date: December 11, 2014
  Inventors: Melvin Burns, Wanda L. Burns
- Patent number: 8898055
  Abstract: A voice quality conversion device including: a target vowel vocal tract information hold unit holding target vowel vocal tract information of each vowel indicating target voice quality; a vowel conversion unit (i) receiving vocal tract information with phoneme boundary information of the speech including information of phonemes and phoneme durations, (ii) approximating a temporal change of vocal tract information of a vowel in the vocal tract information with phoneme boundary information applying a first function, (iii) approximating a temporal change of vocal tract information of the same vowel held in the target vowel vocal tract information hold unit applying a second function, (iv) calculating a third function by combining the first function with the second function, and (v) converting the vocal tract information of the vowel applying the third function; and a synthesis unit synthesizing a speech using the converted information.
  Type: Grant
  Filed: May 8, 2008
  Date of Patent: November 25, 2014
  Assignee: Panasonic Intellectual Property Corporation of America
  Inventors: Yoshifumi Hirose, Takahiro Kamai, Yumiko Kato
- Patent number: 8892442
  Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level and repeated attempts at notification if it is of high importance.
  Type: Grant
  Filed: February 17, 2014
  Date of Patent: November 18, 2014
  Assignee: AT&T Intellectual Property I, L.P.
  Inventor: Horst J. Schroeter
- Patent number: 8868431
  Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered. It also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition, to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
  Type: Grant
  Filed: February 5, 2010
  Date of Patent: October 21, 2014
  Assignee: Mitsubishi Electric Corporation
  Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
- Patent number: 8862472
  Abstract: The present invention is related to a method for coding excitation signal of a target speech comprising the steps of: extracting from a set of training normalized residual frames, a set of relevant normalized residual frames, said training residual frames being extracted from a training speech, synchronized on Glottal Closure Instant (GCI), pitch and energy normalized; determining the target excitation signal of the target speech; dividing said target excitation signal into GCI synchronized target frames; determining the local pitch and energy of the GCI synchronized target frames; normalizing the GCI synchronized target frames in both energy and pitch, to obtain target normalized residual frames; determining coefficients of linear combination of said extracted set of relevant normalized residual frames to build synthetic normalized residual frames close to each target normalized residual frames; wherein the coding parameters for each target residual frames comprise the determined coefficients.
  Type: Grant
  Filed: March 30, 2010
  Date of Patent: October 14, 2014
  Assignees: Universite de Mons, Acapela Group S.A.
  Inventors: Geoffrey Wilfart, Thomas Drugman, Thierry Dutoit
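The Mons/Acapela abstract determines coefficients of a linear combination of relevant normalized residual frames that approximates each target frame. Ordinary least squares is one way to obtain such coefficients, sketched below; the patent does not necessarily use this solver, and the random "frames" are stand-ins for GCI-synchronized, pitch- and energy-normalized residuals.

```python
import numpy as np

def combination_coefficients(dictionary, target):
    """Least-squares coefficients so that dictionary @ coeffs approximates target.
    `dictionary` holds one normalized residual frame per column; ordinary least
    squares is an illustrative choice, not the patent's exact method."""
    coeffs, *_ = np.linalg.lstsq(dictionary, target, rcond=None)
    return coeffs

frame_len, n_atoms = 160, 6
dictionary = np.random.randn(frame_len, n_atoms)     # relevant normalized residual frames
target = np.random.randn(frame_len)                  # one GCI-synchronized target frame
coeffs = combination_coefficients(dictionary, target)
print(np.allclose(dictionary @ coeffs, target))      # generally False: it is an approximation
```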
- Patent number: 8856008
  Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
  Type: Grant
  Filed: September 18, 2013
  Date of Patent: October 7, 2014
  Assignee: Morphism LLC
  Inventor: James H. Stephens, Jr.
- Publication number: 20140278432
  Abstract: Various embodiments provide a method and apparatus for providing a silent speech solution which allows the user to speak over an electronic medium such as a cell phone without making any noise. In particular, measuring the shape of the vocal tract allows creation of synthesized speech without requiring noise produced by the vocal cords.
  Type: Application
  Filed: March 14, 2013
  Publication date: September 18, 2014
  Inventor: Dale D. Harman
- Publication number: 20140278433
  Abstract: A voice synthesis device includes a sequence data generation unit configured to generate sequence data including a plurality of kinds of parameters for controlling vocalization of a voice to be synthesized based on music information and lyrics information, an output unit configured to output a singing voice based on the sequence data, and a processing content information acquisition unit configured to acquire a plurality of pieces of processing content information, each associated with a piece of preset singing manner information. Each piece of the content information indicates contents of edit processing for all or part of the parameters. The sequence data generation unit generates a plurality of pieces of sequence data, and the sequence data are obtained by editing all or part of the parameters included in the sequence data, based on the content information associated with one of the pieces of singing manner information specified by a user.
  Type: Application
  Filed: March 5, 2014
  Publication date: September 18, 2014
  Applicant: Yamaha Corporation
  Inventor: Tatsuya IRIYAMA
- Publication number: 20140207463
  Abstract: An audio signal method of the present disclosure includes: inputting a plurality of variables including at least a first variable indicating an opening degree of a throat, which interiorly includes a vocal cord, with respect to a vocal cord model configured to output a second variable indicating an opening degree of the vocal cord according to reception of input of the plurality of variables, the first variable being greater than the second variable; and generating an audio signal in which a level of a non-integer harmonic sound is changed, by controlling the second variable.
  Type: Application
  Filed: January 17, 2014
  Publication date: July 24, 2014
  Applicant: PANASONIC CORPORATION
  Inventor: Masahiro NAKANISHI
- Patent number: 8775176
  Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
  Type: Grant
  Filed: August 26, 2013
  Date of Patent: July 8, 2014
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Mazin Gilbert, Stephan Kanthak
- Patent number: 8751239
  Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
  Type: Grant
  Filed: October 4, 2007
  Date of Patent: June 10, 2014
  Assignee: Core Wireless Licensing, S.a.r.l.
  Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
- Patent number: 8744851
  Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
  Type: Grant
  Filed: August 13, 2013
  Date of Patent: June 3, 2014
  Assignee: AT&T Intellectual Property II, L.P.
  Inventors: Alistair Conkie, Ann K Syrdal
- Patent number: 8719030
  Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then Laguerre functions are used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
  Type: Grant
  Filed: December 3, 2012
  Date of Patent: May 6, 2014
  Inventor: Chengjun Julian Chen
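The timbre-vector abstract reconstructs a phase spectrum from an amplitude spectrum using Kramers-Kronig relations, which for a minimum-phase signal can be computed through the real cepstrum. The sketch below shows only that amplitude-to-phase step (the Laguerre analysis and frame segmentation are omitted); the spectral floor and frame length are arbitrary, and this is a generic minimum-phase construction rather than the patent's exact procedure.

```python
import numpy as np

def phase_from_amplitude(amplitude):
    """Minimum-phase spectrum implied by an amplitude spectrum, computed with
    the cepstral form of the Kramers-Kronig (Hilbert) relation. `amplitude`
    is a full-length symmetric magnitude spectrum; the floor avoids log(0)."""
    log_mag = np.log(np.maximum(amplitude, 1e-12))
    cepstrum = np.fft.ifft(log_mag).real
    n = len(cepstrum)
    window = np.zeros(n)
    window[0] = 1.0
    window[1:(n + 1) // 2] = 2.0                  # fold the anti-causal part onto the causal part
    if n % 2 == 0:
        window[n // 2] = 1.0
    min_phase_log = np.fft.fft(cepstrum * window)
    return np.imag(min_phase_log)                 # phase spectrum paired with `amplitude`

amp = np.abs(np.fft.fft(np.random.randn(256)))    # stand-in amplitude spectrum
phase = phase_from_amplitude(amp)
```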
- Patent number: 8706488
  Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
  Type: Grant
  Filed: February 27, 2013
  Date of Patent: April 22, 2014
  Assignee: Nuance Communications, Inc.
  Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
- Patent number: 8706489
  Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
  Type: Grant
  Filed: August 8, 2006
  Date of Patent: April 22, 2014
  Assignee: Delta Electronics Inc.
  Inventors: Jia-lin Shen, Chien-Chou Hung
- Publication number: 20140108015
  Abstract: A voice converting apparatus and a voice converting method are provided. The method of converting a voice using a voice converting apparatus includes receiving a voice from a counterpart, analyzing the voice and determining whether the voice is abnormal, converting the voice into a normal voice by adjusting a harmonic signal of the voice in response to determining that the voice is abnormal, and transmitting the normal voice.
  Type: Application
  Filed: October 11, 2013
  Publication date: April 17, 2014
  Applicant: SAMSUNG ELECTRONICS CO., LTD.
  Inventors: Jong-youb RYU, Yoon-jae LEE, Seoung-hun KIM, Young-tae KIM
- Patent number: 8694319
  Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
  Type: Grant
  Filed: November 3, 2005
  Date of Patent: April 8, 2014
  Assignee: International Business Machines Corporation
  Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
- Patent number: 8655662
  Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level and repeated attempts at notification if it is of high importance.
  Type: Grant
  Filed: November 29, 2012
  Date of Patent: February 18, 2014
  Assignee: AT&T Intellectual Property I, L.P.
  Inventor: Horst Schroeter
- Patent number: 8650035
  Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
  Type: Grant
  Filed: November 18, 2005
  Date of Patent: February 11, 2014
  Assignee: Verizon Laboratories Inc.
  Inventor: Adrian E. Conway