Vocal Tract Model Patents (Class 704/261)
  • Patent number: 11842720
    Abstract: An audio processing system and a method thereof generate a synthesis model that can input an audio signal to generate feature data that can be used by a signal generator to generate a modified audio signal. Specifically, a pre-trained synthesis model is first generated using training audio data. Thereafter, a re-trained synthesis model is established by additionally training the pre-trained synthesis model. Based on a received instruction to modify at least one of sounding conditions of an audio signal to be processed, feature data is generated by inputting additional condition data into the re-trained synthesis model. The signal generator generates the modified audio signal from the generated feature data.
    Type: Grant
    Filed: May 3, 2021
    Date of Patent: December 12, 2023
    Assignee: YAMAHA CORPORATION
    Inventor: Ryunosuke Daido
  • Patent number: 11514924
    Abstract: In an aspect, during a presentation of a presentation material, viewers of the presentation material can be monitored. Based on the monitoring, new content can be determined for insertion into the presentation material. The new content can be automatically inserted into the presentation material in real time. In another aspect, during the presentation, a presenter of the presentation material can be monitored. The presenter's speech can be intercepted and analyzed to detect a level of confidence. Based on the detected level of confidence, the presenter's speech can be adjusted and the adjusted speech can be played back automatically, for example, in lieu of the presenter's original speech that is intercepted.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: November 29, 2022
    Assignee: International Business Machines Corporation
    Inventors: Samuel Osebe, Charles Muchiri Wachira, Komminist Weldemariam, Celia Cintas
  • Patent number: 11450307
    Abstract: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: September 20, 2022
    Assignee: TELEPATHY LABS, INC.
    Inventors: Piero Perucci, Martin Reber, Vijeta Avijeet
  • Patent number: 11348569
    Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: May 31, 2022
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
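    Illustration (not part of the patent record): the abstract above derives a group delay spectrum from a phase spectrum and then summarizes it per frequency band. A minimal sketch of that general idea follows, assuming group delay is taken as the negative phase difference between adjacent DFT bins and that a band parameter is a plain average over the bins in the band. The function names, the naive DFT, and the band layout are hypothetical choices for illustration, not the patented method.

```python
import cmath
import math


def dft_half(frame):
    """Naive DFT for bins 0..N/2 (adequate for short illustrative frames)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def group_delay_spectrum(frame):
    """Group delay in samples: negative phase difference between adjacent bins.

    The phase of spec[k+1] * conj(spec[k]) is the wrapped phase difference,
    so no separate phase-unwrapping step is needed.
    """
    n = len(frame)
    spec = dft_half(frame)
    return [
        -cmath.phase(spec[k + 1] * spec[k].conjugate()) * n / (2 * math.pi)
        for k in range(len(spec) - 1)
    ]


def band_group_delay(gd, bands):
    """Average the group delay over each (lo, hi) bin range (hi exclusive)."""
    return [sum(gd[lo:hi]) / (hi - lo) for lo, hi in bands]


# An impulse delayed by 3 samples has a constant group delay of 3 samples.
frame = [0.0] * 16
frame[3] = 1.0
gd = group_delay_spectrum(frame)
band_params = band_group_delay(gd, [(0, 4), (4, 8)])
```

    A delayed impulse is used as input because its group delay is known in closed form, which makes the sketch easy to check by hand.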
  • Patent number: 11210058
    Abstract: A sound system for providing independently variable audio outputs is disclosed herein. The sound system may include a display device, an audio system, and a transmitter. The display device may receive an audio signal and transmit the audio signal to the audio system and the transmitter. The audio system may condition the audio signal based on different settings provided by users. The transmitter may wirelessly transmit conditioned audio signals to one or more audio devices.
    Type: Grant
    Filed: September 30, 2020
    Date of Patent: December 28, 2021
    Assignee: TV Ears, Inc.
    Inventor: George Joseph Dennis
  • Patent number: 11137601
    Abstract: Systems and methods according to present principles allow social distancing within themed attractions such as haunted attractions in order to allow the enjoyment of the same in various circumstances. These circumstances include times of pandemic, for customers that are afraid to congregate in large groups, for customers that desire to control aspects of the experience, and so on.
    Type: Grant
    Filed: January 13, 2021
    Date of Patent: October 5, 2021
    Inventor: Mark D. Wieczorek
  • Patent number: 11094313
    Abstract: An electronic device for adjusting a speech output rate (speech rate) of speech output data.
    Type: Grant
    Filed: June 18, 2019
    Date of Patent: August 17, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Piotr Marcinkiewicz
  • Patent number: 10986418
    Abstract: Methods and systems are described herein for improving audio for hearing impaired content consumers. An example method may comprise determining a content asset. Closed caption data associated with the content asset may be determined. At least a portion of the closed caption data may be determined based on a user setting associated with a hearing impairment. Compensating audio comprising a frequency translation associated with at least the portion of the closed caption data may be generated. The content asset may be caused to be output with audio content comprising the compensating audio and the original audio.
    Type: Grant
    Filed: May 17, 2019
    Date of Patent: April 20, 2021
    Assignee: Comcast Cable Communications, LLC
    Inventor: Jeff Calkins
  • Patent number: 10909978
    Abstract: Technologies for secure storage of utterances are disclosed. A computing device captures audio of a human making a verbal utterance. The utterance is provided to a speech-to-text (STT) service that translates the utterance to text. The STT service can also identify various speaker-specific attributes in the utterance. The text and attributes are provided to a text-to-speech (TTS) service that creates speech from the text and a subset of the attributes. The speech is stored in a data store that is less secure than that required for storing the original utterance. The original utterance can then be discarded. The STT service can also translate the speech generated by the TTS service to text. The text generated by the STT service from the speech and the text generated by the STT service from the original utterance are then compared. If the text does not match, the original utterance can be retained.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: February 2, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: William Frederick Hingle Kruse, Peter Turk, Panagiotis Thomas
  • Patent number: 10902060
    Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Robert J. Connolly, Michael J. Hudson
  • Patent number: 10896678
    Abstract: Typical graphical user interfaces and predefined data fields limit the interaction between a person and a computing system. An oral communication device and a data enablement platform are provided for ingesting oral conversational data from people, and using machine learning to provide intelligence. At the front end, an oral conversational bot, or chatbot, interacts with a user. On the backend, the data enablement platform has a computing architecture that ingests data from various external data sources as well as data from internal applications and databases. These data and algorithms are applied to surface new data, identify trends, provide recommendations, infer new understanding, predict actions and events, and automatically act on this computed information. The chatbot then provides audio data that reflects the information computed by the data enablement platform. The system and the devices, for example, are adaptable to various industries.
    Type: Grant
    Filed: August 10, 2018
    Date of Patent: January 19, 2021
    Assignee: FACET LABS, LLC
    Inventors: Stuart Ogawa, Lindsay Alexander Sparks, Koichi Nishimura, Wilfred P. So
  • Patent number: 10878802
    Abstract: A speech processing apparatus includes a specifier, and a modulator. The specifier specifies any one or more of one or more speeches included in speeches to be output, as an emphasis part based on an attribute of the speech. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: December 29, 2020
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masahiro Yamamoto
  • Patent number: 10803852
    Abstract: A speech processing apparatus includes a specifier, a determiner, and a modulator. The specifier specifies an emphasis part of speech to be output. The determiner determines, from among a plurality of output units, a first output unit and a second output unit for outputting speech for emphasizing the emphasis part. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: October 13, 2020
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masahiro Yamamoto
  • Patent number: 10650800
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Grant
    Filed: February 16, 2018
    Date of Patent: May 12, 2020
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 10652676
    Abstract: A hearing aid (10, 11) has a memory (123) for storing personal settings for alleviating a hearing loss for the hearing aid user. A user account is created from an Internet enabled computer device (17) on a remote server (25), and the user account includes the personal settings for alleviating a hearing loss for the hearing aid user and personal information. A wireless connection is set up between the hearing aid (10, 11) and the personal communication device (13), and the personal communication device (13) is identified as a gateway to the Internet for said hearing aid. The user grants access rights to a third party to modify data in a sub-set of the user account stored on the server (25).
    Type: Grant
    Filed: November 20, 2014
    Date of Patent: May 12, 2020
    Assignee: Widex A/S
    Inventors: Soren Erik Westermann, Svend Vitting Andersen, Anders Westergaard, Niels Erik Boelskift Maretti
  • Patent number: 10535350
    Abstract: A method for controlling a plurality of environmental factors that trigger a negative emotional state is provided. The method may include analyzing a plurality of user data when a user experiences a plurality of various environmental factors. The method may also include determining an emotional state experienced by the user when each of the plurality of various environmental factors is present based on the plurality of user data. The method may include receiving a plurality of calendar information associated with a user account. The method may also include identifying an upcoming event based on the plurality of calendar information. The method may include identifying that an environmental factor within the plurality of various environmental factors is present at the upcoming event. The method may also include, in response to determining that the environmental factor causes the user to experience a negative emotional state, executing an accommodation method based on the environmental factor.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: January 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Paul R. Bastide, Matthew E. Broomhall, Robert E. Loredo, Fang Lu
  • Patent number: 10296655
    Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: May 21, 2019
    Assignee: International Business Machines Corporation
    Inventors: Robert J. Connolly, Michael J. Hudson
  • Patent number: 10008198
    Abstract: A method of segmenting an input speech signal into a plurality of frames for speech recognition is disclosed. The method includes extracting a low frequency signal from the speech signal, and segmenting the speech signal into a plurality of time-intervals according to a plurality of instantaneous phase-sections of the low frequency signal.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: June 26, 2018
    Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Kwang-Hyun Cho, Byeongwook Lee, Sung Hoon Jung
  • Patent number: 9875735
    Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata into a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: January 23, 2018
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Linda Roberts, Hong Thi Nguyen, Horst J Schroeter
  • Patent number: 9824695
    Abstract: Embodiments herein include receiving a request to modify an audio characteristic associated with a first user for a voice communication system. One or more suggested modified audio characteristics may be provided for the first user, based on, at least in part, one or more audio preferences established by another user. An input of one or more modified audio characteristics may be received for the first user for the voice communication system. A user-specific audio preference may be associated with the first user for voice communications on the voice communication system, the user-specific audio preference including the one or more modified audio characteristics.
    Type: Grant
    Filed: June 18, 2012
    Date of Patent: November 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Ruthie D. Lyle, Patrick Joseph O'Sullivan, Lin Sun
  • Patent number: 9558734
    Abstract: A voice recipient may request a text-to-speech (TTS) voice that corresponds to an age or age range. An existing TTS voice or existing voice data may be used to create a TTS voice corresponding to the requested age by encoding the voice data to voice parameter values, transforming the voice parameter values using a voice-aging model, synthesizing voice data using the transformed parameter values, and then creating a TTS voice using the transformed voice data. The voice-aging model may model how one or more voice parameters of a voice change with age and may be created from voice data stored in a voice bank.
    Type: Grant
    Filed: April 26, 2016
    Date of Patent: January 31, 2017
    Assignee: VOCALID, INC.
    Inventors: Rupal Patel, Geoffrey Seth Meltzner
  • Patent number: 9472199
    Abstract: The present invention relates to a method and apparatus for processing a voice signal, and the voice signal encoding method according to the present invention comprises the steps of: generating transform coefficients of sine wave components forming an input voice signal by transforming the sine wave components; determining transform coefficients to be encoded from the generated transform coefficients; and transmitting indication information indicating the determined transform coefficients, wherein the indication information may include position information, magnitude information, and sign information of the transform coefficients.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: October 18, 2016
    Assignee: LG Electronics Inc.
    Inventors: Younghan Lee, Gyuhyeok Jeong, Ingyu Kang, Hyejeong Jeon, Lagyoung Kim
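    Illustration (not part of the patent record): the abstract above describes choosing which transform coefficients to encode and signalling them by position, magnitude, and sign. A minimal sketch under stated assumptions follows: the transform coefficients are taken as already computed, and the selection rule is simply the k largest magnitudes. Both `select_coefficients` and `rebuild` are hypothetical helper names, and the selection rule is an assumption rather than the rule claimed in the patent.

```python
def select_coefficients(coeffs, k):
    """Pick the k largest-magnitude coefficients and describe each one as
    (position, magnitude, sign), ordered by position."""
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    return [(i, abs(coeffs[i]), 1 if coeffs[i] >= 0 else -1) for i in sorted(top)]


def rebuild(indication, length):
    """Decoder side: place the signalled coefficients, zero elsewhere."""
    out = [0.0] * length
    for pos, mag, sign in indication:
        out[pos] = sign * mag
    return out


coeffs = [0.1, -2.0, 0.5, 3.0, -0.2]
indication = select_coefficients(coeffs, 2)
decoded = rebuild(indication, len(coeffs))
```

    Splitting each coefficient into position, magnitude, and sign mirrors the three kinds of indication information named in the abstract; how each field is quantized or entropy-coded is left out of the sketch.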
  • Patent number: 9240194
    Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
    Type: Grant
    Filed: April 29, 2013
    Date of Patent: January 19, 2016
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventors: Takahiro Kamai, Yoshifumi Hirose
  • Patent number: 9224406
    Abstract: Candidate frequencies per unit segment of an audio signal are identified. A first processing section identifies an estimated train that is a time series of candidate frequencies, each selected for a different one of the segments, arranged over a plurality of the unit segments and that has a high likelihood of corresponding to a time series of fundamental frequencies of a target component. A second processing section identifies a state train of states, each indicative of one of sound-generating and non-sound-generating states of the target component in a different one of the segments, arranged over the unit segments. Frequency information which designates, as a fundamental frequency of the target component, a candidate frequency corresponding to the unit segment in the estimated train is generated for each unit segment corresponding to the sound-generating state. Frequency information indicative of no sound generation is generated for each unit segment corresponding to the non-sound-generating state.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: December 29, 2015
    Assignee: Yamaha Corporation
    Inventors: Jordi Bonada, Jordi Janer, Ricard Marxer, Yasuyuki Umeyama, Kazunobu Kondo, Francisco Garcia
  • Patent number: 9058816
    Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
    Type: Grant
    Filed: August 23, 2010
    Date of Patent: June 16, 2015
    Assignee: RMIT University
    Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
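    Illustration (not part of the patent record): the feature-extraction step in the abstract above (zero the spectral amplitudes below a threshold, then take the area under the thresholded amplitudes in each sub band) can be sketched as follows. The rectangle-rule area, the unit bin width, the band boundaries, and the function names are assumptions made for illustration only.

```python
def threshold_spectrum(amplitudes, threshold):
    """Set every spectral amplitude below the threshold to zero."""
    return [a if a >= threshold else 0.0 for a in amplitudes]


def subband_areas(amplitudes, bands, bin_width=1.0):
    """Approximate the area under the amplitudes in each (lo, hi) bin range
    (hi exclusive) with a rectangle rule."""
    return [sum(amplitudes[lo:hi]) * bin_width for lo, hi in bands]


spectrum = [0.1, 0.5, 0.9, 0.2, 0.7, 0.05]
thresholded = threshold_spectrum(spectrum, 0.3)
areas = subband_areas(thresholded, [(0, 3), (3, 6)])
```

    The resulting per-band areas would then feed whatever classifier the diagnostic class model uses; that stage is not sketched here.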
  • Patent number: 9002703
    Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may be then combined to produce an audio reading of at least a portion of the text-based work.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Jay A. Crosley
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8977555
    Abstract: Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: March 10, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Fred Torok, Frédéric Johan Georges Deramat, Vikram Kumar Gundeti
  • Patent number: 8949129
    Abstract: A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting a prototype feature of the set of prototype features providing a smallest relative difference.
    Type: Grant
    Filed: August 12, 2013
    Date of Patent: February 3, 2015
    Assignee: Ambient Corporation
    Inventors: Michael Callahan, Thomas Coleman
  • Patent number: 8942983
    Abstract: The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
    Type: Grant
    Filed: November 23, 2011
    Date of Patent: January 27, 2015
    Assignee: Speech Technology Centre, Limited
    Inventor: Mikhail Vasilievich Khitrov
  • Publication number: 20140379350
    Abstract: Disclosed herein are systems, methods, and computer-readable media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, outputting speech simultaneously with the primary media content, outputting speech during gaps in the primary media content, translating metadata into a foreign language, and tailoring voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
    Type: Application
    Filed: September 9, 2014
    Publication date: December 25, 2014
    Inventors: Linda ROBERTS, Hong Thi NGUYEN, Horst J. SCHROETER
  • Publication number: 20140365068
    Abstract: The present invention is directed to a system and method for personalizing a voice user interface on an electronic device. Voice recordings are made into an electronic device or a computerized system using software installed onto the device, where a user is prompted to record various dialogues and commands. The recording is then converted into voice data packages, and uploaded onto the electronic device. In this way, users can replace the computerized or preloaded voice in a voice user interface of an electronic device with their own voice or a voice of others. In one embodiment, the electronic device comprises a mobile phone or a tablet computer. In other embodiments, the electronic device comprises a vehicle communication system and a navigation device. The system and method of the present invention enable the user to personalize the voice user interface for each electronic device operated by the user.
    Type: Application
    Filed: June 5, 2014
    Publication date: December 11, 2014
    Inventors: Melvin Burns, Wanda L. Burns
  • Patent number: 8898055
    Abstract: A voice quality conversion device including: a target vowel vocal tract information hold unit holding target vowel vocal tract information of each vowel indicating target voice quality; a vowel conversion unit (i) receiving vocal tract information with phoneme boundary information of the speech including information of phonemes and phoneme durations, (ii) approximating a temporal change of vocal tract information of a vowel in the vocal tract information with phoneme boundary information applying a first function, (iii) approximating a temporal change of vocal tract information of the same vowel held in the target vowel vocal tract information hold unit applying a second function, (iv) calculating a third function by combining the first function with the second function, and (v) converting the vocal tract information of the vowel applying the third function; and a synthesis unit synthesizing a speech using the converted information.
    Type: Grant
    Filed: May 8, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yoshifumi Hirose, Takahiro Kamai, Yumiko Kato
  • Patent number: 8892442
    Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level and repeat attempts at notification if it is of high importance.
    Type: Grant
    Filed: February 17, 2014
    Date of Patent: November 18, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst J. Schroeter
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8862472
    Abstract: The present invention relates to a method for coding an excitation signal of a target speech, comprising the steps of: extracting, from a set of training normalized residual frames, a set of relevant normalized residual frames, said training residual frames being extracted from a training speech, synchronized on Glottal Closure Instant (GCI), and pitch and energy normalized; determining the target excitation signal of the target speech; dividing said target excitation signal into GCI synchronized target frames; determining the local pitch and energy of the GCI synchronized target frames; normalizing the GCI synchronized target frames in both energy and pitch, to obtain target normalized residual frames; determining coefficients of a linear combination of said extracted set of relevant normalized residual frames to build synthetic normalized residual frames close to each target normalized residual frame; wherein the coding parameters for each target residual frame comprise the determined coefficients.
    Type: Grant
    Filed: March 30, 2010
    Date of Patent: October 14, 2014
    Assignees: Universite de Mons, Acapela Group S.A.
    Inventors: Geoffrey Wilfart, Thomas Drugman, Thierry Dutoit
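The final coding step in the abstract above (finding coefficients of a linear combination of reference frames that approximates a target frame) can be sketched as follows. This is a minimal illustration that assumes the relevant normalized residual frames form an orthonormal basis, in which case the coefficients reduce to dot products; the abstract does not state how the relevant frames are chosen or how the coefficients are computed. Pitch normalization is omitted, and every name and value below is hypothetical.

```python
import math


def normalize_energy(frame):
    """Scale a frame to unit energy (sum of squares equals 1)."""
    energy = math.sqrt(sum(x * x for x in frame))
    if energy == 0:
        raise ValueError("frame has zero energy")
    return [x / energy for x in frame]


def encode(target, basis):
    """Coefficients of the target on an orthonormal basis (plain dot products)."""
    return [sum(t * b for t, b in zip(target, vec)) for vec in basis]


def decode(coeffs, basis):
    """Rebuild a synthetic frame as a linear combination of the basis frames."""
    length = len(basis[0])
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis)) for i in range(length)]


# Hypothetical orthonormal basis of two 4-sample frames (assumption, not from the patent).
h = 0.5
basis = [[h, h, h, h], [h, h, -h, -h]]

# Hypothetical target frame, energy normalized before coding.
target = normalize_energy([3.0, 3.0, 1.0, 1.0])
coeffs = encode(target, basis)      # the coding parameters for this frame
synthetic = decode(coeffs, basis)   # synthetic frame rebuilt from the coefficients
```

With a non-orthogonal set of reference frames, the dot products would need to be replaced by a least-squares solve.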
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Publication number: 20140278432
    Abstract: Various embodiments provide a method and apparatus for providing a silent speech solution which allows the user to speak over an electronic medium such as a cell phone without making any noise. In particular, measuring the shape of the vocal tract allows creation of synthesized speech without requiring noise produced by the vocal cords.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventor: Dale D. Harman
  • Publication number: 20140278433
    Abstract: A voice synthesis device includes a sequence data generation unit configured to generate sequence data including a plurality of kinds of parameters for controlling vocalization of a voice to be synthesized based on music information and lyrics information, an output unit configured to output a singing voice based on the sequence data, and a processing content information acquisition unit configured to acquire a plurality of pieces of processing content information, each associated with a piece of preset singing manner information. Each piece of processing content information indicates contents of edit processing for all or part of the parameters. The sequence data generation unit generates a plurality of pieces of sequence data, and the sequence data are obtained by editing all or part of the parameters included in the sequence data, based on the content information associated with one of the pieces of singing manner information specified by a user.
    Type: Application
    Filed: March 5, 2014
    Publication date: September 18, 2014
    Applicant: Yamaha Corporation
    Inventor: Tatsuya IRIYAMA
  • Publication number: 20140207463
    Abstract: An audio signal method of the present disclosure includes: inputting a plurality of variables including at least a first variable indicating an opening degree of a throat, which interiorly includes a vocal cord, with respect to a vocal cord model configured to output a second variable indicating an opening degree of the vocal cord according to reception of input of the plurality of variables, the first variable being greater than the second variable; and generating an audio signal in which a level of a non-integer harmonic sound is changed, by controlling the second variable.
    Type: Application
    Filed: January 17, 2014
    Publication date: July 24, 2014
    Applicant: PANASONIC CORPORATION
    Inventor: Masahiro NAKANISHI
  • Patent number: 8775176
    Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: July 8, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Stephan Kanthak
  • Patent number: 8751239
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: June 10, 2014
    Assignee: Core Wireless Licensing, S.a.r.l.
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
  • Patent number: 8719030
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming them into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: May 6, 2014
    Inventor: Chengjun Julian Chen
  • Patent number: 8706488
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 22, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
  • Patent number: 8706489
    Abstract: A system and method for selecting audio contents by using speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the speech input and the audio contents within a desired period, and a recognizing module performing speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching the speech input.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: April 22, 2014
    Assignee: Delta Electronics Inc.
    Inventors: Jia-lin Shen, Chien-Chou Hung
  • Publication number: 20140108015
    Abstract: A voice converting apparatus and a voice converting method are provided. The method of converting a voice using a voice converting apparatus includes receiving a voice from a counterpart, analyzing the voice and determining whether the voice is abnormal, converting the voice into a normal voice by adjusting a harmonic signal of the voice in response to determining that the voice is abnormal, and transmitting the normal voice.
    Type: Application
    Filed: October 11, 2013
    Publication date: April 17, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jong-youb RYU, Yoon-jae LEE, Seoung-hun KIM, Young-tae KIM
  • Patent number: 8694319
    Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
  • Patent number: 8655662
    Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller ID, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. In another embodiment, the notification is assigned an importance level, and notification attempts are repeated if the importance is high.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: February 18, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst Schroeter
  • Patent number: 8650035
    Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: February 11, 2014
    Assignee: Verizon Laboratories Inc.
    Inventor: Adrian E. Conway