Vocal Tract Model Patents (Class 704/261)
  • Patent number: 10803852
    Abstract: A speech processing apparatus includes a specifier, a determiner, and a modulator. The specifier specifies an emphasis part of speech to be output. The determiner determines, from among a plurality of output units, a first output unit and a second output unit for outputting speech for emphasizing the emphasis part. The modulator modulates the emphasis part of at least one of first speech to be output to the first output unit and second speech to be output to the second output unit such that at least one of a pitch and a phase is different between the emphasis part of the first speech and the emphasis part of the second speech.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: October 13, 2020
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Masahiro Yamamoto
  • Patent number: 10652676
    Abstract: A hearing aid (10, 11) has a memory (123) for storing personal settings for alleviating a hearing loss for the hearing aid user. A user account is created from an Internet-enabled computer device (17) on a remote server (25), and the user account includes the personal settings for alleviating a hearing loss for the hearing aid user and personal information. A wireless connection is set up between the hearing aid (10, 11) and a personal communication device (13), and the personal communication device (13) is identified as a gateway to the Internet for said hearing aid. The user grants access rights to a third party to modify data in a sub-set of the user account stored on the server (25).
    Type: Grant
    Filed: November 20, 2014
    Date of Patent: May 12, 2020
    Assignee: Widex A/S
    Inventors: Soren Erik Westermann, Svend Vitting Andersen, Anders Westergaard, Niels Erik Boelskift Maretti
  • Patent number: 10650800
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Grant
    Filed: February 16, 2018
    Date of Patent: May 12, 2020
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 10535350
    Abstract: A method for controlling a plurality of environmental factors that trigger a negative emotional state is provided. The method may include analyzing a plurality of user data when a user experiences a plurality of various environmental factors. The method may also include determining an emotional state experienced by the user when each of the plurality of various environmental factors is present based on the plurality of user data. The method may include receiving a plurality of calendar information associated with a user account. The method may also include identifying an upcoming event based on the plurality of calendar information. The method may include identifying an environmental factor within the plurality of various environmental factors is present at the upcoming event. The method may also include, in response to determining the environmental factor causes the user to experience a negative emotional state, executing an accommodation method based on the environmental factor.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: January 14, 2020
    Assignee: International Business Machines Corporation
    Inventors: Paul R. Bastide, Matthew E. Broomhall, Robert E. Loredo, Fang Lu
  • Patent number: 10296655
    Abstract: A computer-implemented method includes receiving, from a first network application, a first unbounded list of objects of a first type and a second unbounded list of objects of a second type, wherein the second type is distinct from the first type, and producing a third unbounded list of objects of a third type, wherein the third type is distinct from both the first type and the second type. The computer-implemented method further includes providing the third unbounded list to a second network application. A corresponding computer program product and computer system are also disclosed.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: May 21, 2019
    Assignee: International Business Machines Corporation
    Inventors: Robert J. Connolly, Michael J. Hudson
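The transformation this abstract describes — two typed, unbounded input streams combined into a third, differently typed unbounded stream — can be sketched with lazy generators. The names and the pair-producing combination below are illustrative assumptions, not from the patent:

```python
from itertools import count
from typing import Iterable, Iterator, Tuple

def merge_unbounded(firsts: Iterable[int],
                    seconds: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Lazily combine two unbounded streams of distinct types into a
    third, distinct type (here, pairs). zip is lazy, so neither input
    stream is ever materialized in full."""
    return zip(firsts, seconds)

# Both inputs are conceptually unbounded generators.
naturals = count(1)                          # objects of the first type
labels = (f"item-{n}" for n in count(1))     # objects of the second type
merged = merge_unbounded(naturals, labels)   # objects of the third type
first_three = [next(merged) for _ in range(3)]
```

Laziness is the key design point: because the lists are unbounded, the third list must be produced on demand rather than built eagerly.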
  • Patent number: 10008198
    Abstract: A method of segmenting an input speech signal into a plurality of frames for speech recognition is disclosed. The method includes extracting a low-frequency signal from the speech signal, and segmenting the speech signal into a plurality of time-intervals according to a plurality of instantaneous phase-sections of the low-frequency signal.
    Type: Grant
    Filed: December 30, 2013
    Date of Patent: June 26, 2018
    Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Kwang-Hyun Cho, Byeongwook Lee, Sung Hoon Jung
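The two steps the abstract names — extract a low-frequency signal, then cut the speech wherever its instantaneous phase crosses into a new phase-section — can be sketched as follows. The cutoff frequency, the number of phase sections, and the FFT-based filtering are all illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def segment_by_phase(signal, fs, cutoff_hz=8.0, n_sections=4):
    # Crude low-pass filter: zero out FFT bins above the cutoff frequency.
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[freqs > cutoff_hz] = 0.0
    low = np.fft.irfft(spec, n=len(signal))

    # Analytic signal via the frequency-domain Hilbert transform.
    n = len(low)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    analytic = np.fft.ifft(np.fft.fft(low) * h)
    phase = np.angle(analytic)               # instantaneous phase in (-pi, pi]

    # Map each sample's phase to one of n_sections equal phase-sections;
    # a segment boundary falls wherever the section index changes.
    sections = np.floor((phase + np.pi) / (2 * np.pi / n_sections)).astype(int)
    sections = np.clip(sections, 0, n_sections - 1)
    return np.flatnonzero(np.diff(sections)) + 1

fs = 1000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 4 * t)   # a 4 Hz tone stands in for the low band
cuts = segment_by_phase(sig, fs)
```

The resulting boundaries track the phase of the slow component rather than a fixed frame length, which is the point of the method.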
  • Patent number: 9875735
    Abstract: Disclosed herein are systems, methods, and computer readable-media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, output speech simultaneously with the primary media content, output speech during gaps in the primary media content, translate metadata in foreign language, tailor voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
    Type: Grant
    Filed: January 27, 2017
    Date of Patent: January 23, 2018
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Linda Roberts, Hong Thi Nguyen, Horst J Schroeter
  • Patent number: 9824695
    Abstract: Embodiments herein include receiving a request to modify an audio characteristic associated with a first user for a voice communication system. One or more suggested modified audio characteristics may be provided for the first user, based on, at least in part, one or more audio preferences established by another user. An input of one or more modified audio characteristics may be received for the first user for the voice communication system. A user-specific audio preference may be associated with the first user for voice communications on the voice communication system, the user-specific audio preference including the one or more modified audio characteristics.
    Type: Grant
    Filed: June 18, 2012
    Date of Patent: November 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Ruthie D. Lyle, Patrick Joseph O'Sullivan, Lin Sun
  • Patent number: 9558734
    Abstract: A voice recipient may request a text-to-speech (TTS) voice that corresponds to an age or age range. An existing TTS voice or existing voice data may be used to create a TTS voice corresponding to the requested age by encoding the voice data to voice parameter values, transforming the voice parameter values using a voice-aging model, synthesizing voice data using the transformed parameter values, and then creating a TTS voice using the transformed voice data. The voice-aging model may model how one or more voice parameters of a voice change with age and may be created from voice data stored in a voice bank.
    Type: Grant
    Filed: April 26, 2016
    Date of Patent: January 31, 2017
    Assignee: VOCALID, INC.
    Inventors: Rupal Patel, Geoffrey Seth Meltzner
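The "transforming the voice parameter values using a voice-aging model" step can be illustrated with a toy model. The parameter names and per-year rates below are invented for illustration; the patent's actual model would be fit from voice data stored in a voice bank:

```python
def age_voice_params(params, source_age, target_age):
    """Toy voice-aging model: shift a few voice parameters linearly per
    year of age difference. Rates are illustrative assumptions only."""
    years = target_age - source_age
    aged = dict(params)
    aged["f0_hz"] = params["f0_hz"] * (1.0 - 0.002 * years)          # pitch drifts down
    aged["speech_rate"] = params["speech_rate"] * (1.0 - 0.004 * years)  # speech slows
    aged["jitter"] = params["jitter"] + 0.0005 * years               # more irregularity
    return aged

adult = {"f0_hz": 210.0, "speech_rate": 4.5, "jitter": 0.01}
older = age_voice_params(adult, source_age=30, target_age=70)
```

In the full pipeline described by the abstract, the transformed parameters would then be fed back through the synthesizer to produce aged voice data for building the requested TTS voice.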
  • Patent number: 9472199
    Abstract: The present invention relates to a method and apparatus for processing a voice signal, and the voice signal encoding method according to the present invention comprises the steps of: generating transform coefficients of sine wave components forming an input voice signal by transforming the sine wave components; determining transform coefficients to be encoded from the generated transform coefficients; and transmitting indication information indicating the determined transform coefficients, wherein the indication information may include position information, magnitude information, and sign information of the transform coefficients.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: October 18, 2016
    Assignee: LG Electronics Inc.
    Inventors: Younghan Lee, Gyuhyeok Jeong, Ingyu Kang, Hyejeong Jeon, Lagyoung Kim
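The indication information the abstract enumerates — position, magnitude, and sign of the transform coefficients chosen for encoding — can be sketched as a keep-the-largest-k scheme. The top-k selection rule is an assumption for illustration; the patent leaves the determination step open:

```python
import numpy as np

def encode_top_coeffs(coeffs, k=4):
    """Keep the k largest-magnitude transform coefficients and emit each
    as (position, magnitude, sign) indication information."""
    positions = np.argsort(np.abs(coeffs))[::-1][:k]
    positions = np.sort(positions)   # canonical transmission order
    return [(int(p), float(abs(coeffs[p])), int(np.sign(coeffs[p])))
            for p in positions]

def decode_top_coeffs(entries, length):
    """Reconstruct the coefficient vector from the indication information."""
    out = np.zeros(length)
    for pos, mag, sign in entries:
        out[pos] = sign * mag
    return out

c = np.array([0.1, -3.0, 0.05, 2.0, -0.2, 1.5])
enc = encode_top_coeffs(c, k=3)
dec = decode_top_coeffs(enc, len(c))
```

Splitting magnitude from sign mirrors how the abstract lists the three fields separately; small coefficients are simply dropped at the decoder.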
  • Patent number: 9240194
    Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
    Type: Grant
    Filed: April 29, 2013
    Date of Patent: January 19, 2016
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventors: Takahiro Kamai, Yoshifumi Hirose
  • Patent number: 9224406
    Abstract: Candidate frequencies per unit segment of an audio signal are identified. First processing section identifies an estimated train that is a time series of candidate frequencies, each selected for a different one of the segments, arranged over a plurality of the unit segments and that has a high likelihood of corresponding to a time series of fundamental frequencies of a target component. Second processing section identifies a state train of states, each indicative of one of sound-generating and non-sound-generating states of the target component in a different one of the segments, arranged over the unit segments. Frequency information which designates, as a fundamental frequency of the target component, a candidate frequency corresponding to the unit segment in the estimated train is generated for each unit segment corresponding to the sound-generating state. Frequency information indicative of no sound generation is generated for each unit segment corresponding to the non-sound-generating state.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: December 29, 2015
    Assignee: Yamaha Corporation
    Inventors: Jordi Bonada, Jordi Janer, Ricard Marxer, Yasuyuki Umeyama, Kazunobu Kondo, Francisco Garcia
  • Patent number: 9058816
    Abstract: Mental state of a person is classified in an automated manner by analysing natural speech of the person. A glottal waveform is extracted from a natural speech signal. Pre-determined parameters defining at least one diagnostic class of a class model are retrieved, the parameters determined from selected training glottal waveform features. The selected glottal waveform features are extracted from the signal. Current mental state of the person is classified by comparing extracted glottal waveform features with the parameters and class model. Feature extraction from a glottal waveform or other natural speech signal may involve determining spectral amplitudes of the signal, setting spectral amplitudes below a pre-defined threshold to zero and, for each of a plurality of sub bands, determining an area under the thresholded spectral amplitudes, and deriving signal feature parameters from the determined areas in accordance with a diagnostic class model.
    Type: Grant
    Filed: August 23, 2010
    Date of Patent: June 16, 2015
    Assignee: RMIT University
    Inventors: Margaret Lech, Nicholas Brian Allen, Ian Shaw Burnett, Ling He
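The feature-extraction pipeline in the second half of the abstract — threshold the spectral amplitudes to zero, then take the area under the thresholded spectrum in each sub-band — can be sketched directly. The threshold ratio and band count are illustrative assumptions:

```python
import numpy as np

def subband_area_features(frame, n_subbands=4, threshold_ratio=0.3):
    """Threshold spectral amplitudes, then sum (take the area under)
    the surviving amplitudes inside each equal-width sub-band."""
    amps = np.abs(np.fft.rfft(frame))
    threshold = threshold_ratio * amps.max()
    amps = np.where(amps < threshold, 0.0, amps)   # zero out weak bins
    bands = np.array_split(amps, n_subbands)       # equal-width sub-bands
    return np.array([band.sum() for band in bands])

# A pure tone concentrates all its area in one sub-band.
frame = np.sin(2 * np.pi * 50 * np.arange(256) / 256)
feats = subband_area_features(frame)
```

In the patent's setting these per-band areas, computed on the glottal waveform, would be compared against the parameters of the pre-trained diagnostic class model.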
  • Patent number: 9002703
    Abstract: The community-based generation of audio narrations for a text-based work leverages collaboration of a community of people to provide human-voiced audio readings. During the community-based generation, a collection of audio recordings for the text-based work may be collected from multiple human readers in a community. An audio recording for each section in the text-based work may be selected from the collection of audio recordings. The selected audio recordings may be then combined to produce an audio reading of at least a portion of the text-based work.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: April 7, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Jay A. Crosley
  • Patent number: 8977555
    Abstract: Features are disclosed for generating markers for elements or other portions of an audio presentation so that a speech processing system may determine which portion of the audio presentation a user utterance refers to. For example, an utterance may include a pronoun with no explicit antecedent. The marker may be used to associate the utterance with the corresponding content portion for processing. The markers can be provided to a client device with a text-to-speech (“TTS”) presentation. The markers may then be provided to a speech processing system along with a user utterance captured by the client device. The speech processing system, which may include automatic speech recognition (“ASR”) modules and/or natural language understanding (“NLU”) modules, can generate hints based on the marker. The hints can be provided to the ASR and/or NLU modules in order to aid in processing the meaning or intent of a user utterance.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: March 10, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Fred Torok, Frédéric Johan Georges Deramat, Vikram Kumar Gundeti
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8949129
    Abstract: A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting a prototype feature of the set of prototype features providing a smallest relative difference.
    Type: Grant
    Filed: August 12, 2013
    Date of Patent: February 3, 2015
    Assignee: Ambient Corporation
    Inventors: Michael Callahan, Thomas Coleman
  • Patent number: 8942983
    Abstract: The present invention relates to a method of text-based speech synthesis, wherein at least one portion of a text is specified; the intonation of each portion is determined; target speech sounds are associated with each portion; physical parameters of the target speech sounds are determined; the speech sounds most similar in terms of the physical parameters to the target speech sounds are found in a speech database; and speech is synthesized as a sequence of the found speech sounds. The physical parameters of said target speech sounds are determined in accordance with the determined intonation. The present method, when used in a speech synthesizer, allows improved quality of synthesized speech due to precise reproduction of intonation.
    Type: Grant
    Filed: November 23, 2011
    Date of Patent: January 27, 2015
    Assignee: Speech Technology Centre, Limited
    Inventor: Mikhail Vasilievich Khitrov
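The "most similar in terms of the physical parameters" search reduces to a nearest-neighbour lookup over the speech database. The parameter set (pitch, duration, energy) and the Euclidean distance below are assumptions for illustration; the patent does not fix either:

```python
def nearest_unit(target, database):
    """Pick the database speech sound whose physical parameters are
    closest to the intonation-derived target, by Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(database, key=lambda unit: dist(unit["params"], target))

# Hypothetical database entries: (pitch Hz, duration s, energy).
db = [
    {"phone": "a", "params": (120.0, 0.08, 0.7)},
    {"phone": "a", "params": (180.0, 0.10, 0.9)},
    {"phone": "a", "params": (140.0, 0.09, 0.8)},
]
best = nearest_unit((135.0, 0.09, 0.8), db)
```

Synthesis then concatenates the winning units in text order, which is the standard unit-selection design the abstract builds on.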
  • Publication number: 20140379350
    Abstract: Disclosed herein are systems, methods, and computer readable-media for providing an automatic synthetically generated voice describing media content, the method comprising receiving one or more pieces of metadata for a primary media content, selecting at least one piece of metadata for output, and outputting the at least one piece of metadata as synthetically generated speech with the primary media content. Other aspects of the invention involve alternative output, output speech simultaneously with the primary media content, output speech during gaps in the primary media content, translate metadata in foreign language, tailor voice, accent, and language to match the metadata and/or primary media content. A user may control output via a user interface or output may be customized based on preferences in a user profile.
    Type: Application
    Filed: September 9, 2014
    Publication date: December 25, 2014
    Inventors: Linda ROBERTS, Hong Thi NGUYEN, Horst J. SCHROETER
  • Publication number: 20140365068
    Abstract: The present invention is directed to a system and method for personalizing a voice user interface on an electronic device. Voice recordings are made into an electronic device or a computerized system using a software installed onto the device, where a user is prompted to record various dialogues and commands. The recording is then converted into voice data packages, and uploaded onto the electronic device. In this way, users can replace the computerized or preloaded voice in a voice user interface of an electronic device with their own voice or a voice of others. In one embodiment, the electronic device comprises a mobile phone or a tablet computer. In other embodiments, the electronic device comprises a vehicle communication system and a navigation device. The system and method of the present invention enables the user to personalize the voice user interface for each electronic device operated by the user.
    Type: Application
    Filed: June 5, 2014
    Publication date: December 11, 2014
    Inventors: Melvin Burns, Wanda L. Burns
  • Patent number: 8898055
    Abstract: A voice quality conversion device including: a target vowel vocal tract information hold unit holding target vowel vocal tract information of each vowel indicating target voice quality; a vowel conversion unit (i) receiving vocal tract information with phoneme boundary information of the speech including information of phonemes and phoneme durations, (ii) approximating a temporal change of vocal tract information of a vowel in the vocal tract information with phoneme boundary information applying a first function, (iii) approximating a temporal change of vocal tract information of the same vowel held in the target vowel vocal tract information hold unit applying a second function, (iv) calculating a third function by combining the first function with the second function, and (v) converting the vocal tract information of the vowel applying the third function; and a synthesis unit synthesizing a speech using the converted information.
    Type: Grant
    Filed: May 8, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yoshifumi Hirose, Takahiro Kamai, Yumiko Kato
  • Patent number: 8892442
    Abstract: Disclosed herein are systems, methods, and computer readable-media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves notification assigned an importance level and repeat attempts at notification if it is of high importance.
    Type: Grant
    Filed: February 17, 2014
    Date of Patent: November 18, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst J. Schroeter
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8862472
    Abstract: The present invention is related to a method for coding the excitation signal of a target speech comprising the steps of: extracting, from a set of training normalized residual frames, a set of relevant normalized residual frames, said training residual frames being extracted from a training speech, synchronized on Glottal Closure Instant (GCI), and pitch and energy normalized; determining the target excitation signal of the target speech; dividing said target excitation signal into GCI-synchronized target frames; determining the local pitch and energy of the GCI-synchronized target frames; normalizing the GCI-synchronized target frames in both energy and pitch, to obtain target normalized residual frames; determining coefficients of a linear combination of said extracted set of relevant normalized residual frames to build synthetic normalized residual frames close to each target normalized residual frame; wherein the coding parameters for each target residual frame comprise the determined coefficients.
    Type: Grant
    Filed: March 30, 2010
    Date of Patent: October 14, 2014
    Assignees: Universite de Mons, Acapela Group S.A.
    Inventors: Geoffrey Wilfart, Thomas Drugman, Thierry Dutoit
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Publication number: 20140278433
    Abstract: A voice synthesis device includes a sequence data generation unit configured to generate sequence data including a plurality of kinds of parameters for controlling vocalization of a voice to be synthesized based on music information and lyrics information, an output unit configured to output a singing voice based on the sequence data, and a processing content information acquisition unit configured to acquire a plurality of processing content information, associated with each of pieces of preset singing manner information. Each of the content information indicates contents of edit processing for all or part of the parameters. The sequence data generation unit generates a plurality of pieces of sequence data, and the sequence data are obtained by editing the all or part of the parameters included in the sequence data, based on the content information associated with one of the pieces of singing manner information specified by a user.
    Type: Application
    Filed: March 5, 2014
    Publication date: September 18, 2014
    Applicant: Yamaha Corporation
    Inventor: Tatsuya IRIYAMA
  • Publication number: 20140278432
    Abstract: Various embodiments provide a method and apparatus for providing a silent speech solution which allows the user to speak over an electronic medium such as a cell phone without making any noise. In particular, measuring the shape of the vocal tract allows creation of synthesized speech without requiring noise produced by the vocal cords.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Inventor: Dale D. Harman
  • Publication number: 20140207463
    Abstract: An audio signal generation method of the present disclosure includes: inputting a plurality of variables including at least a first variable indicating an opening degree of a throat, which interiorly includes a vocal cord, with respect to a vocal cord model configured to output a second variable indicating an opening degree of the vocal cord according to reception of input of the plurality of variables, the first variable being greater than the second variable; and generating an audio signal in which a level of a non-integer harmonic sound is changed, by controlling the second variable.
    Type: Application
    Filed: January 17, 2014
    Publication date: July 24, 2014
    Applicant: PANASONIC CORPORATION
    Inventor: Masahiro NAKANISHI
  • Patent number: 8775176
    Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
    Type: Grant
    Filed: August 26, 2013
    Date of Patent: July 8, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Stephan Kanthak
  • Patent number: 8751239
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: June 10, 2014
    Assignee: Core Wireless Licensing, S.a.r.l.
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
  • Patent number: 8719030
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: May 6, 2014
    Inventor: Chengjun Julian Chen
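The analysis half of the pipeline — amplitude spectrum via Fourier analysis, then a Laguerre expansion into a coefficient vector — can be sketched as below. The patent uses Laguerre functions; this sketch substitutes Laguerre polynomials with their e^{-x} orthogonality weight, and the mapping of frequency bins onto the x-axis is an assumption for illustration:

```python
import numpy as np

def laguerre(n, x):
    """Laguerre polynomial L_n(x) via the three-term recurrence
    (k+1) L_{k+1} = (2k+1-x) L_k - k L_{k-1}."""
    l_prev, l_cur = np.ones_like(x), 1.0 - x
    if n == 0:
        return l_prev
    for k in range(1, n):
        l_prev, l_cur = l_cur, ((2 * k + 1 - x) * l_cur - k * l_prev) / (k + 1)
    return l_cur

def timbre_vector(frame, n_coeffs=8):
    """Project a frame's amplitude spectrum onto the first n_coeffs
    Laguerre polynomials (trapezoid-free Riemann sum, e^{-x} weight)."""
    amps = np.abs(np.fft.rfft(frame))
    x = np.linspace(0.0, 10.0, len(amps))    # assumed bin-to-x mapping
    dx = x[1] - x[0]
    weight = np.exp(-x)                      # Laguerre orthogonality weight
    return np.array([np.sum(amps * laguerre(n, x) * weight) * dx
                     for n in range(n_coeffs)])

frame = np.sin(2 * np.pi * 10 * np.arange(128) / 128)
tv = timbre_vector(frame)
```

The fixed-length coefficient vector is what makes the representation easy to interpolate and manipulate before resynthesis.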
  • Patent number: 8706488
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 22, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
  • Patent number: 8706489
    Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: April 22, 2014
    Assignee: Delta Electronics Inc.
    Inventors: Jia-lin Shen, Chien-Chou Hung
  • Publication number: 20140108015
    Abstract: A voice converting apparatus and a voice converting method are provided. The method of converting a voice using a voice converting apparatus includes receiving a voice from a counterpart, analyzing the voice to determine whether the voice is abnormal, converting the voice into a normal voice by adjusting a harmonic signal of the voice in response to determining that the voice is abnormal, and transmitting the normal voice.
    Type: Application
    Filed: October 11, 2013
    Publication date: April 17, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jong-youb RYU, Yoon-jae LEE, Seoung-hun KIM, Young-tae KIM
  • Patent number: 8694319
    Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
  • Patent number: 8655662
    Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification that is assigned an importance level, with repeated notification attempts if it is of high importance.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: February 18, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst Schroeter
  • Patent number: 8650035
    Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: February 11, 2014
    Assignee: Verizon Laboratories Inc.
    Inventor: Adrian E. Conway
  • Patent number: 8639511
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on the bearing of the movable unit, which is controlled by the driving control unit, relative to the body unit.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: January 28, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
  • Publication number: 20140012584
    Abstract: There is provided a prosody generator that generates prosody information for implementing highly natural speech synthesis without unnecessarily collecting large quantities of learning data. A data dividing means 81 divides the data space of a learning database, as an assembly of learning data indicative of the feature quantities of speech waveforms, into subspaces. A density information extracting means 82 extracts density information indicative of the density state, in terms of information quantity, of the learning data in each of the subspaces divided by the data dividing means 81. A prosody information generating method selecting means 83 selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics.
    Type: Application
    Filed: May 10, 2012
    Publication date: January 9, 2014
    Applicant: NEC Corporation
    Inventors: Yasuyuki Mitsui, Reishi Kondo, Masanori Kato
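The selection logic described in the abstract above can be sketched in a few lines. This is a hedged illustration only: the per-subspace count as the density measure and the `min_count` threshold are assumptions, not values from the patent.

```python
def choose_prosody_method(subspace_counts, min_count=50):
    """Sketch: for each learning-data subspace, choose a statistical
    prosody model where the data is dense enough, otherwise fall back
    to heuristic rules."""
    return ["statistical" if n >= min_count else "rule-based"
            for n in subspace_counts]
```

For example, a subspace with 100 samples would use the statistical technique, while one with only 10 would use the heuristic rules.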
  • Patent number: 8612228
    Abstract: A section corresponding to a given duration is sampled from sound data that indicates the voice of a player collected by a microphone, and a vocal tract cross-sectional area function of the sampled section is calculated. The vertical dimension of the mouth is calculated from a throat-side average cross-sectional area of the vocal tract cross-sectional area function, and the area of the mouth is calculated from a mouth-side average cross-sectional area. The transverse dimension of the mouth is calculated from the area of the mouth and the vertical dimension of the mouth.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: December 17, 2013
    Assignee: Namco Bandai Games Inc.
    Inventor: Hiroyuki Hiraishi
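The geometry in the abstract above (mouth height from the throat-side average area, mouth area from the mouth-side average, width from area divided by height) can be sketched as follows. The split point, the linear mapping constant `k_v`, and the rectangular-opening assumption are all illustrative choices, not details from the patent.

```python
import numpy as np

def mouth_dimensions(area_fn, k_v=1.0):
    """Sketch: derive mouth opening dimensions from a vocal tract
    cross-sectional area function, ordered from glottis (index 0)
    to lips (last index)."""
    n = len(area_fn)
    throat_avg = np.mean(area_fn[: n // 2])   # throat-side average cross-sectional area
    mouth_avg = np.mean(area_fn[n // 2 :])    # mouth-side average cross-sectional area
    vertical = k_v * throat_avg               # vertical mouth dimension (assumed linear map)
    mouth_area = mouth_avg                    # area of the mouth opening
    transverse = mouth_area / vertical        # width, treating the opening as a rectangle
    return vertical, transverse
```

In the game setting described, such dimensions would drive the lip shape of an on-screen character from the player's voice.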
  • Patent number: 8565458
    Abstract: A media player includes a processor configured to receive media content from a content source and to process the media content to produce an audio signal. The media player further includes a transmitter coupled to the processor and configured to transmit the audio signal to a hearing aid through a communication channel.
    Type: Grant
    Filed: March 3, 2011
    Date of Patent: October 22, 2013
    Assignee: Audiotoniq, Inc.
    Inventors: Andrew L. Eisenberg, David Matthew Landry
  • Patent number: 8560315
    Abstract: A conference support device includes an image receiving portion that receives captured images from conference terminals, a voice receiving portion that receives, from one of the conference terminals, a voice that is generated by a first participant, a first storage portion that stores the captured images and the voice, a voice recognition portion that recognizes the voice, a text data creation portion that creates text data that express the words that are included in the voice, an addressee specification portion that specifies a second participant, whom the voice is addressing, an image creation portion that creates a display image that is configured from the captured images and in which the text data are associated with the first participant and a specified image is associated with at least one of the first participant and the second participant, and a transmission portion that transmits the display image to the conference terminals.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: October 15, 2013
    Assignee: Brother Kogyo Kabushiki Kaisha
    Inventor: Mizuho Yasoshima
  • Patent number: 8554541
    Abstract: A virtual pet system includes: a virtual pet client, adapted to receive a sentence in natural language and send the sentence to a Q&A server; the Q&A server, adapted to receive the sentence, process the sentence through natural language comprehension, generate an answer in natural language based on a result of natural language comprehension and reasoning knowledge, and send the answer in natural language to the virtual pet client. A method for virtual pet chatting includes: receiving a sentence in natural language, performing natural language comprehension for the sentence, and generating an answer in natural language based on a result of natural language comprehension and reasoning knowledge.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: October 8, 2013
    Assignee: Tencent Technology (Shenzhen) Company Ltd.
    Inventors: Haisong Yang, Zhiyuan Liu, Yunfeng Liu, Rongling Yu
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8529473
    Abstract: A method and apparatus are provided for processing a set of communicated signals associated with a set of muscles, such as the muscles near the larynx of the person, or any other muscles the person uses to achieve a desired response. The method includes the steps of attaching a single integrated sensor, for example, near the throat of the person proximate to the larynx and detecting an electrical signal through the sensor. The method further includes the steps of extracting features from the detected electrical signal and continuously transforming them into speech sounds without the need for further modulation. The method also includes comparing the extracted features to a set of prototype features and selecting the prototype feature of the set that provides the smallest relative difference.
    Type: Grant
    Filed: July 27, 2012
    Date of Patent: September 10, 2013
    Inventors: Michael Callahan, Thomas Coleman
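The final matching step described above (select the prototype with the smallest relative difference from the extracted features) is a nearest-prototype classification. A minimal sketch, assuming a norm-based relative-difference metric (the patent does not specify one):

```python
import numpy as np

def nearest_prototype(feature, prototypes):
    """Sketch: return the index of the prototype feature vector with the
    smallest relative difference from the extracted feature vector."""
    diffs = [np.linalg.norm(feature - p) / (np.linalg.norm(p) + 1e-12)
             for p in prototypes]
    return int(np.argmin(diffs))
```

The selected prototype would then map directly to a speech sound, enabling the continuous transformation the abstract describes.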
  • Patent number: 8532995
    Abstract: A method, system and machine-readable medium are provided. Speech input is received at a speech recognition component and recognized output is produced. A common dialog cue from the received speech input or input from a second source is recognized. An action is performed corresponding to the recognized common dialog cue. The performed action includes sending a communication from the speech recognition component to the speech generation component while bypassing a dialog component.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: September 10, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent J. Goffin, Sarangarajan Parthasarathy
  • Patent number: 8521510
    Abstract: A system, method and computer readable medium that provides an automated web transcription service is disclosed. The method may include receiving input speech from a user using a communications network, recognizing the received input speech, understanding the recognized speech, transcribing the understood speech to text, storing the transcribed text in a database, receiving a request via a web page to display the transcribed text, retrieving transcribed text from the database, and displaying the transcribed text to the requester using the web page.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 27, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mazin Gilbert, Stephan Kanthak
  • Patent number: 8510113
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K. Syrdal
  • Patent number: 8510112
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann Syrdal