Synchronization of Speech with Image or Synthesis of the Lips Movement from Speech, e.g., for "Talking Heads" (EPO) Patents (Class 704/E21.02)
  • Patent number: 8965011
    Abstract: A method of attenuating an input signal to obtain an output signal is described. The method comprises receiving the input signal, attenuating the input signal with a gain factor to obtain the output signal, applying a filter having a frequency response with a frequency-dependent filter gain to at least one of a copy of the input signal and a copy of the output signal to obtain a filtered signal, the frequency-dependent filter gain being arranged to emphasize frequencies within a number N of predetermined frequency ranges, N>1; wherein the filter comprises a sequence of N sub-filters, each one of the N sub-filters having a frequency response adapted to emphasize frequencies within a corresponding one of the N predetermined frequency ranges; determining a signal strength of the filtered signal, and determining the gain factor from at least the signal strength.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: February 24, 2015
    Assignee: Dialog Semiconductor B.V.
    Inventor: Michiel Andre Helsloot
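The gain-factor computation described in this abstract can be sketched compactly. The following is a minimal frequency-domain stand-in, not the patent's actual filter structure: the N sub-filters are modeled as spectral weights over N predetermined ranges, "signal strength" is taken as RMS, and the emphasis factor and target level are assumed for illustration.

```python
import numpy as np

def band_emphasis_gain(x, fs, bands, target_rms=0.1):
    """Derive an attenuation gain factor from the strength of a
    band-emphasized copy of the input (sketch, not the patent's filter)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    weight = np.ones_like(freqs)
    for lo, hi in bands:                             # one "sub-filter" per range
        weight[(freqs >= lo) & (freqs < hi)] *= 4.0  # emphasis factor (assumed)
    filtered = np.fft.irfft(X * weight, n=len(x))
    strength = np.sqrt(np.mean(filtered ** 2))       # signal strength (RMS)
    gain = target_rms / max(strength, 1e-9)          # gain factor from strength
    return float(np.clip(gain, 0.0, 1.0))            # attenuation only

# usage: a 1 kHz tone falls inside the second emphasized band
fs = 8000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)
g = band_emphasis_gain(x, fs, bands=[(300, 600), (900, 1200)])
y = g * x   # attenuated output signal
```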
  • Patent number: 8818131
    Abstract: Three dimensional models corresponding to a target image and a reference image are selected based on a set of feature points defining facial features in the target image and the reference image. The set of feature points defining the facial features in the target image and the reference image are associated with corresponding 3-dimensional models. A 3D motion flow between the 3-dimensional models is computed. The 3D motion flow is projected onto a 2D image plane to create a 2D optical field flow. The target image and the reference image are warped using the 2D optical field flow. A selected feature from the reference image is copied to the target image.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: August 26, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Jue Wang, Elya Shechtman, Lubomir D. Bourdev, Fei Yang
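The projection step above (3D motion flow onto a 2D image plane) reduces, under an assumed weak-perspective camera (orthographic projection plus a scale factor), to dropping the depth component of each per-vertex motion vector. The resulting 2D flow field is what would then drive the warp; the model fitting and warping themselves are not reproduced here.

```python
import numpy as np

def project_flow_2d(points_src_3d, points_dst_3d, scale=100.0):
    """Project the 3D motion flow between two fitted face models onto
    the 2D image plane (weak-perspective: drop z, apply scale)."""
    flow_3d = points_dst_3d - points_src_3d   # per-vertex 3D motion flow
    flow_2d = scale * flow_3d[:, :2]          # orthographic projection to 2D
    return flow_2d

# usage: two corresponding vertices moving in x and in y/z
src = np.array([[0.0, 0.0, 1.0], [0.1, 0.2, 1.1]])
dst = np.array([[0.05, 0.0, 1.0], [0.1, 0.25, 1.2]])
uv = project_flow_2d(src, dst)   # z motion of the second vertex is discarded
```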
  • Patent number: 8438035
    Abstract: When voice-transmission-signals are missing, a repetition-section calculating unit sets a plurality of repetition sections of different lengths, each determined to be similar to the voice-transmission-signals preceding the missing signal. The repetition sections are determined with respect to stationary voice-transmission-signals, selected from the previously input voice-transmission-signals and stored in a normal signal storage unit. A controller generates a concealment signal using the repetition sections.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: May 7, 2013
    Assignee: Fujitsu Limited
    Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
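The concealment idea above can be sketched as follows. The similarity measure (normalized correlation against the immediately preceding samples) and the candidate section lengths are assumptions for illustration; the patent's stationarity selection is not reproduced.

```python
import numpy as np

def conceal(history, frame_len, candidate_lens=(40, 60, 80)):
    """Fill a missing frame by repeating the candidate section of the
    signal history that best matches the samples just before it."""
    best_len, best_score = candidate_lens[0], -np.inf
    for L in candidate_lens:
        if len(history) < 2 * L:
            continue
        section = history[-L:]
        prev = history[-2 * L:-L]          # samples just before the section
        num = float(np.dot(section, prev))
        den = float(np.linalg.norm(section) * np.linalg.norm(prev)) or 1.0
        if num / den > best_score:         # normalized correlation (assumed)
            best_len, best_score = L, num / den
    section = history[-best_len:]
    reps = -(-frame_len // best_len)       # ceiling division
    return np.tile(section, reps)[:frame_len]
```

For a periodic (voiced-like) history, the section whose length matches the pitch period scores highest, so the concealment frame continues the waveform smoothly.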
  • Publication number: 20120284029
    Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.
    Type: Application
    Filed: May 2, 2011
    Publication date: November 8, 2012
    Inventors: Lijuan Wang, Frank Soong
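The last stage of the pipeline above, matching predicted visual feature vectors against the image library, can be sketched as a nearest-neighbour lookup. Euclidean distance is an assumption here; the statistical model that generates the trajectory is not reproduced.

```python
import numpy as np

def select_images(visual_traj, library_feats):
    """For each visual feature vector in the generated trajectory,
    return the index of the closest image in the library."""
    idx = []
    for v in visual_traj:
        d = np.linalg.norm(library_feats - v, axis=1)  # distance to each image
        idx.append(int(np.argmin(d)))
    return idx
```

Concatenating the library images at the returned indices yields the synchronized image sequence.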
  • Publication number: 20120203555
    Abstract: An electronic device configured for encoding a watermarked signal is described. The electronic device includes modeler circuitry. The modeler circuitry determines parameters based on a first signal and a first-pass coded signal. The electronic device also includes coder circuitry coupled to the modeler circuitry. The coder circuitry performs a first-pass coding on a second signal to obtain the first-pass coded signal and performs a second-pass coding based on the parameters to obtain a watermarked signal.
    Type: Application
    Filed: October 18, 2011
    Publication date: August 9, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Stephane Pierre Villette, Daniel J. Sinder
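The two-pass structure above can be illustrated with a toy scalar quantizer (this is not the patent's codec): pass one codes the signal, the modeler compares the original against the first-pass coded signal to pick the samples that can carry a watermark bit with the least added error, and pass two re-codes those samples with the bit forced into the least significant level.

```python
def embed_watermark(samples, bits):
    """Toy two-pass watermark embedding over an 8-level quantizer."""
    # pass 1: plain first-pass coding (8-level quantization, assumed)
    coded = [max(0, min(7, round(s * 7))) for s in samples]
    # modeler: per-sample quantization error from the first pass
    err = [abs(s * 7 - c) for s, c in zip(samples, coded)]
    # parameters: the len(bits) samples with the smallest error carry bits
    carriers = sorted(range(len(samples)), key=lambda i: err[i])[:len(bits)]
    # pass 2: force the LSB of each carrier codeword to the bit value
    out = list(coded)
    for i, b in zip(sorted(carriers), bits):
        out[i] = (out[i] & ~1) | b
    return out
```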
  • Publication number: 20120203556
    Abstract: A method for decoding a signal on an electronic device is described. The method includes receiving a signal. The method also includes extracting a bitstream from the signal. The method further includes performing watermark error checking on the bitstream for multiple frames. The method additionally includes determining whether watermark data is detected based on the watermark error checking. The method also includes decoding the bitstream to obtain a decoded second signal if the watermark data is not detected.
    Type: Application
    Filed: October 18, 2011
    Publication date: August 9, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Stephane Pierre Villette, Daniel J. Sinder
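The decoder-side logic above, checking multiple frames before trusting the watermark, can be sketched like this. The bit-extraction rule (LSB), the per-frame check, and the run length are all assumptions standing in for the patent's watermark error checking.

```python
def detect_watermark(frames, window=3):
    """Declare watermark data present only when the per-frame check
    passes for `window` consecutive frames; otherwise the caller
    decodes the bitstream as a plain (non-watermarked) signal."""
    bits = [f & 1 for f in frames]       # extracted bitstream (assumed: LSB)
    run = 0
    for b in bits:
        run = run + 1 if b == 1 else 0   # toy per-frame error check
        if run >= window:
            return True                  # watermark data detected
    return False                         # decode as a plain second signal
```

Requiring several consecutive passing frames keeps a random non-watermarked bitstream from triggering a false detection.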
  • Publication number: 20120130720
    Abstract: An information providing device takes an image of a predetermined area and obtains the taken image in the form of image data, while externally obtaining voice data representing speech. The information providing device obtains text in a preset language corresponding to the speech in the form of text data, based on the obtained voice data, generates a composite image including the taken image and the text in the form of composite image data, based on the image data and the text data, and outputs the composite image data.
    Type: Application
    Filed: November 14, 2011
    Publication date: May 24, 2012
    Inventor: Yasushi Suda
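The device's data flow (image in, speech in, captioned composite out) can be sketched as a small pipeline. The `recognize` and `translate` callables are caller-supplied stand-ins for the device's speech recognizer and language conversion; they are not named in the publication.

```python
def compose_caption(image_data, voice_data, recognize, translate, lang="en"):
    """Obtain text in a preset language from the voice data, then
    combine it with the taken image into composite image data."""
    text = translate(recognize(voice_data), lang)   # voice data -> text data
    return {"image": image_data, "caption": text, "lang": lang}

# usage with trivial stand-in recognizer/translator
result = compose_caption(
    b"raw-image-bytes", b"raw-audio-bytes",
    recognize=lambda v: "hola",
    translate=lambda t, l: "hello" if t == "hola" else t,
)
```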
  • Publication number: 20100082345
    Abstract: An “Animation Synthesizer” uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
    Type: Application
    Filed: September 26, 2008
    Publication date: April 1, 2010
    Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong
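The trajectory-selection step above can be sketched with a plain lookup table standing in for the trained probabilistic model (HMM/ANN), plus a simple one-pole smoother so adjacent poses blend; both the table and the smoothing constant are assumptions.

```python
import numpy as np

def synthesize_trajectory(phonemes, model, smooth=0.5):
    """Look up a mean motion vector per phoneme and low-pass the
    sequence into a continuous animation trajectory."""
    raw = np.array([model[p] for p in phonemes], dtype=float)
    traj = [raw[0]]
    for v in raw[1:]:
        traj.append(smooth * traj[-1] + (1 - smooth) * v)  # blend poses
    return np.array(traj)
```

The same scheme extends per body part (face, head, hands) by keeping one table and one trajectory per part.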
  • Publication number: 20090138270
    Abstract: The provision of speech therapy to a learner (76) entails receiving a speech signal (156) from the learner (76) at a computing system (24). The speech signal (156) corresponds to an utterance (116) made by the learner (76). A set of parameters (166) is ascertained from the speech signal (156). The parameters (166) represent a contact pattern (52) between a tongue and palate of the learner (76) during the utterance (116). For each parameter in the set of parameters (166), a deviation measure (188) is calculated relative to a corresponding parameter from a set of normative parameters (138) characterizing an ideal pronunciation of the utterance (116). An accuracy score (56) for the utterance (116), relative to its ideal pronunciation, is generated from the deviation measure (188). The accuracy score (56) is provided to the learner (76) to visualize accuracy of the utterance (116) relative to its ideal pronunciation.
    Type: Application
    Filed: November 26, 2007
    Publication date: May 28, 2009
    Inventors: Samuel G. Fletcher, Dah-Jye Lee, Jared Darrell Turpin
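The scoring step above can be sketched numerically: a per-parameter deviation from the normative contact pattern, normalized by a tolerance, then mapped to a 0-100 accuracy score. The tolerance normalization and the linear mapping are assumptions, not the patent's formula.

```python
import numpy as np

def accuracy_score(params, norm_params, tol):
    """Mean tolerance-normalized deviation from the normative
    parameters, mapped linearly onto a 0-100 accuracy score."""
    dev = np.abs(np.asarray(params, float) - np.asarray(norm_params, float))
    dev = dev / np.asarray(tol, float)              # deviation measure per parameter
    return float(max(0.0, 100.0 * (1.0 - dev.mean())))
```

A perfect match scores 100; each full tolerance of average deviation removes 100 points, floored at 0.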
  • Publication number: 20080221904
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Application
    Filed: May 19, 2008
    Publication date: September 11, 2008
    Applicant: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
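The "at least three concatenated phonemes" lookup above can be sketched as a sliding triple window over the phoneme string, with a stored mouth-image sequence per triple. The silence padding and the fallback to the middle phoneme alone are assumptions for illustration.

```python
def frames_for_text(phonemes, viseme_db):
    """Concatenate stored image sequences, one per phoneme triple,
    into an animated mouth sequence for the input stimulus."""
    seq = []
    padded = ["sil"] + phonemes + ["sil"]      # pad ends with silence (assumed)
    for i in range(len(phonemes)):
        tri = tuple(padded[i:i + 3])           # three concatenated phonemes
        # fall back to the middle phoneme when the triple is unseen (assumed)
        seq.extend(viseme_db.get(tri, viseme_db.get(tri[1], [])))
    return seq
```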
  • Publication number: 20080177530
    Abstract: Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving from a user speech; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function.
    Type: Application
    Filed: April 3, 2008
    Publication date: July 24, 2008
    Inventors: Charles W. Cross, Michael C. Hollinger, Igor R. Jablokov, Benjamin D. Lewis, Hillary A. Pike, Daniel M. Smith, David W. Wintermute, Michael A. Zaitzeff
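The handler flow above (semantic interpretation in, additional processing function out, then a visual update) can be sketched as a dispatch table; all names here are illustrative rather than taken from the publication.

```python
def global_update_handler(interpretation, handlers, state):
    """Identify the additional processing function for a semantic
    interpretation, execute it, then flag the visual element and
    state table for update."""
    fn = handlers.get(interpretation)   # additional function, if any
    if fn is not None:
        state = fn(state)               # execute the additional function
    state["visual_dirty"] = True        # visual element needs re-rendering
    return state

# usage: a speech interpretation that fills in a form field
state = global_update_handler(
    "pick_city",
    {"pick_city": lambda s: {**s, "city": "Paris"}},
    {},
)
```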
  • Publication number: 20080154605
    Abstract: The present invention discloses a solution that dynamically adapts quality settings of a real-time speech synthesis system based upon load, which results in a proportional change in consumed resources. For example, when the quantity of available CPU cycles is low, the quality of speech can be automatically lowered. When the quantity of available CPU cycles is high, the quality of speech can be automatically increased. Accordingly, the solution discloses an adaptive speech synthesis system that provides the highest possible quality of speech in a real-time environment experiencing rapid changes in request volume and/or complexity.
    Type: Application
    Filed: December 21, 2006
    Publication date: June 26, 2008
    Inventor: Kenneth H. Morgan
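The adaptation rule above amounts to a threshold table from CPU headroom to a quality level; the thresholds and level names below are illustrative, not from the publication.

```python
def pick_quality(cpu_free_pct, levels=(("high", 60), ("medium", 30), ("low", 0))):
    """Choose the highest synthesis quality whose CPU-headroom
    threshold is met (thresholds assumed for illustration)."""
    for name, min_free in levels:       # ordered best-first
        if cpu_free_pct >= min_free:
            return name
    return levels[-1][0]                # worst case: lowest quality
```

Re-evaluating this on every request lets quality track rapid swings in request volume or complexity.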