Patents by Inventor Hans Peter Graf

Hans Peter Graf has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7260539
    Abstract: Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing, in the client cache, sentences that relate to bridging delays in a dialog, storing sentence templates to be used in dialogs, generating a talking head response to a user inquiry from the client device, and determining whether any of the stored sentences or templates in the client cache relate to the talking head response. If they do, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least a part of the talking head response, and transmitting any portion of the talking head response not stored in the client cache to the client device so that a complete talking head response can be rendered.
    Type: Grant
    Filed: April 25, 2003
    Date of Patent: August 21, 2007
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Joern Ostermann
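
As a rough illustration of the cache-then-fetch strategy described in 7260539 above, the sketch below checks a client-side store of canned sentences and fill-in templates before asking the server for the remainder of a response. All names (ClientCache, compose_response, fetch_from_server, the "{slot}" template syntax) are hypothetical; the patent does not specify an API.

```python
# Hypothetical sketch of the client-cache lookup in 7260539.
# Stored sentences cover bridging phrases ("One moment, please...");
# templates have variable slots filled at render time.

class ClientCache:
    def __init__(self):
        self.sentences = {}   # sentence text -> prerendered audio/visual data
        self.templates = {}   # template text containing "{slot}" -> data

    def store_sentence(self, text, av_data):
        self.sentences[text] = av_data

    def store_template(self, template, av_data):
        self.templates[template] = av_data


def compose_response(cache, response_text, fetch_from_server):
    """Render from the cache where possible; fetch only the missing part."""
    if response_text in cache.sentences:
        return [cache.sentences[response_text]]          # fully cached
    for template, av in cache.templates.items():
        prefix = template.split("{slot}")[0]
        if response_text.startswith(prefix):
            # The cached template renders the fixed part; the server
            # supplies only the variable portion, reducing delay.
            slot_text = response_text[len(prefix):]
            return [av, fetch_from_server(slot_text)]
    return [fetch_from_server(response_text)]            # nothing cached
```
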
  • Patent number: 7231099
    Abstract: A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation as well as Lambertian and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. The object can be, for example, the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model representative of the head of the user is generated. Synthetic light is applied to a position on the model to form an illuminated model.
    Type: Grant
    Filed: August 31, 2005
    Date of Patent: June 12, 2007
    Assignee: AT&T
    Inventors: Andrea Basso, Eric Cosatto, David Crawford Gibbon, Hans Peter Graf, Shan Liu
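
The abstract of 7231099 mentions a virtual illumination equation combining light attenuation with Lambertian and specular reflection. A minimal numeric sketch of such an equation follows; this is a standard Phong-style model, not necessarily the patent's exact formulation, and all coefficient names are illustrative.

```python
import numpy as np

def virtual_illumination(normal, light_pos, view_dir, point,
                         k_d=0.7, k_s=0.3, shininess=16.0,
                         a=1.0, b=0.1, c=0.01):
    """Phong-style intensity at a surface point: distance attenuation
    times (Lambertian diffuse + specular) terms. Illustrative only."""
    to_light = light_pos - point
    d = np.linalg.norm(to_light)
    L = to_light / d
    N = normal / np.linalg.norm(normal)
    V = view_dir / np.linalg.norm(view_dir)

    attenuation = 1.0 / (a + b * d + c * d * d)   # light falloff with distance
    diffuse = k_d * max(np.dot(N, L), 0.0)        # Lambertian term
    R = 2.0 * np.dot(N, L) * N - L                # mirror reflection of L about N
    specular = k_s * max(np.dot(R, V), 0.0) ** shininess
    return attenuation * (diffuse + specular)

# Example: a synthetic light slightly above and in front of a face point.
print(virtual_illumination(np.array([0.0, 0.0, 1.0]),
                           np.array([0.5, 1.0, 2.0]),
                           np.array([0.0, 0.0, 1.0]),
                           np.array([0.0, 0.0, 0.0])))
```
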
  • Patent number: 7209882
    Abstract: A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system utilizes a database of n-phones as the smallest selectable unit, wherein n is larger than 1 and is preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, a coarticulation parameter, and the speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to obtain the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and searches the video frame lattice to construct the video sequence by finding the optimal path through the lattice according to the minimum of the sum of the target cost and the joint cost over the sequence.
    Type: Grant
    Filed: May 10, 2002
    Date of Patent: April 24, 2007
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Fu Jie Huang
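
The search that the abstract of 7209882 describes, a lattice of candidate frames scored by a per-frame target cost plus a joint cost between adjacent frames and minimized over the sequence, is a classic dynamic-programming (Viterbi) problem. A generic sketch, with hypothetical cost functions standing in for the patent's phonetic-distance and smoothness measures:

```python
def best_path(lattice, target_cost, joint_cost):
    """Viterbi search over a frame lattice.
    lattice[t] is the list of candidate frames for target position t;
    target_cost and joint_cost are caller-supplied (hypothetical here).
    Returns the candidate sequence minimizing
    sum(target costs) + sum(joint costs between neighbors)."""
    # cost[t][i]: best cumulative cost ending in candidate i at position t
    cost = [[target_cost(0, c) for c in lattice[0]]]
    back = [[None] * len(lattice[0])]
    for t in range(1, len(lattice)):
        cost.append([])
        back.append([])
        for cand in lattice[t]:
            tc = target_cost(t, cand)
            best_j = min(range(len(lattice[t - 1])),
                         key=lambda j: cost[t - 1][j]
                         + joint_cost(lattice[t - 1][j], cand))
            cost[t].append(cost[t - 1][best_j]
                           + joint_cost(lattice[t - 1][best_j], cand) + tc)
            back[t].append(best_j)
    # Trace back from the cheapest final candidate.
    i = min(range(len(lattice[-1])), key=lambda k: cost[-1][k])
    path = [i]
    for t in range(len(lattice) - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    path.reverse()
    return [lattice[t][i] for t, i in enumerate(path)]
```
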
  • Patent number: 7177811
    Abstract: A method is provided for customizing a multi-media message created by a sender for a recipient, in which the multi-media message includes an animated entity audibly presenting speech converted from text by the sender. At least one image is received from the sender, and each image is associated with a tag. The sender is presented with options to insert the tag associated with one of the images into the sender text.
    Type: Grant
    Filed: March 6, 2006
    Date of Patent: February 13, 2007
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Barbara Buda, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Yann Andre LeCun
  • Patent number: 7136818
    Abstract: A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.
    Type: Grant
    Filed: June 17, 2002
    Date of Patent: November 14, 2006
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Volker Franz Strom
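
A toy version of the selection step in 7136818 (and its listening-side counterpart, 7076430 below): match the prosody of the speech, for example pitch and energy contours, against stored patterns, each linked to an agent movement. The feature set and database layout here are assumptions, not the patent's.

```python
import numpy as np

# Hypothetical speaking database: each entry pairs a prosody contour
# (pitch, energy per frame) with an agent movement to play back.
speaking_db = [
    (np.array([[220, 0.8], [260, 0.9], [300, 1.0]]), "raise_eyebrows"),
    (np.array([[200, 0.5], [190, 0.4], [180, 0.3]]), "nod_slowly"),
    (np.array([[250, 1.0], [250, 1.0], [250, 1.0]]), "hold_gaze"),
]

def select_movement(prosody, db):
    """Pick the stored pattern whose contour is closest (Euclidean
    distance) to the analyzed prosody of the speech."""
    dists = [np.linalg.norm(prosody - pattern) for pattern, _ in db]
    return db[int(np.argmin(dists))][1]

# Rising pitch and energy -> the 'raise_eyebrows' pattern matches best.
print(select_movement(np.array([[230, 0.7], [270, 0.9], [310, 1.0]]),
                      speaking_db))
```
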
  • Patent number: 7117155
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: October 1, 2003
    Date of Patent: October 3, 2006
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
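
At a very high level, the synthesis loop outlined in 7117155 (and in the related 6662161 and 6112177 below) reduces to: look up coarticulation parameters for each context in the input phoneme sequence, select matching image samples from an animation library, and concatenate them with the audio. A sketch under that reading, with fabricated table contents:

```python
# Hypothetical libraries; a real system would index many mouth shapes
# per multiphone and store the associated sound alongside.
coarticulation_lib = {          # phoneme context -> mouth-shape parameters
    ("sil", "h", "e"): {"open": 0.2, "round": 0.0},
    ("h", "e", "l"):   {"open": 0.6, "round": 0.1},
    ("e", "l", "o"):   {"open": 0.4, "round": 0.7},
}
animation_lib = {               # quantized parameters -> image sample id
    (0.2, 0.0): "frame_closed.png",
    (0.6, 0.1): "frame_open.png",
    (0.4, 0.7): "frame_round.png",
}

def synthesize(phonemes):
    """Walk the phoneme string in overlapping triphone windows, recall
    parameters, and concatenate the matching image samples."""
    frames = []
    padded = ["sil"] + phonemes + ["sil"]
    for i in range(len(phonemes)):
        ctx = tuple(padded[i:i + 3])
        params = coarticulation_lib.get(ctx)
        if params:
            key = (params["open"], params["round"])
            frames.append(animation_lib[key])
    return frames

# -> three sample frames; ('l', 'o', 'sil') has no entry in this toy table.
print(synthesize(["h", "e", "l", "o"]))
```
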
  • Patent number: 7076430
    Abstract: A system and method of controlling the movement of a virtual agent while the agent is listening to a human user during a conversation is disclosed. The method comprises receiving speech data from the user, performing a prosodic analysis of the speech data and controlling the virtual agent movement according to the prosodic analysis.
    Type: Grant
    Filed: June 17, 2002
    Date of Patent: July 11, 2006
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Volker Franz Strom
  • Patent number: 7035803
    Abstract: A system and method of providing sender customization of multi-media messages through the use of inserted images or video. The images or video may be sender-created or predefined and available to the sender via a web server. The method relates to customizing a multi-media message created by a sender for a recipient, the multi-media message having an animated entity audibly presenting speech converted from text created by the sender. The method comprises receiving at least one image from the sender, associating each image with a tag, presenting the sender with options to insert the tag associated with one of the images into the sender text, and, after the sender inserts such a tag, delivering the multi-media message with the image presented as background to the animated entity according to the position of the associated tag in the sender text.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: April 25, 2006
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Barbara Buda, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Yann Andre LeCun
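
The tag mechanism in 7035803 (and in its continuation, 7177811 above) essentially splits the sender's text at tag positions and switches the background image while the animated entity speaks each segment. A hypothetical sketch; the `<image:NAME>` tag syntax is invented for illustration:

```python
import re

def split_by_image_tags(text):
    """Split sender text at <image:NAME> tags. Returns (background, segment)
    pairs: each segment is spoken over the background selected by the most
    recent tag (None before any tag). The tag syntax is invented here."""
    segments = []
    background = None
    pos = 0
    for m in re.finditer(r"<image:(\w+)>", text):
        chunk = text[pos:m.start()].strip()
        if chunk:
            segments.append((background, chunk))
        background = m.group(1)       # tag switches the background image
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((background, tail))
    return segments

msg = "Hi Bob! <image:beach> Wish you were here. <image:sunset> See you soon."
print(split_by_image_tags(msg))
# [(None, 'Hi Bob!'), ('beach', 'Wish you were here.'),
#  ('sunset', 'See you soon.')]
```
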
  • Patent number: 6990452
    Abstract: A system and method of providing sender customization of multi-media messages through the use of emoticons is disclosed. The sender inserts the emoticons into a text message. As an animated face audibly delivers the text, each emoticon associated with the message is started a predetermined period of time, or number of words, before the emoticon's position in the message text and is completed a predetermined length of time, or number of words, after that position. The sender may insert emoticons through the use of emoticon buttons, which are icons available for choosing. Upon the sender's selection of an emoticon, an icon representing the emoticon is inserted into the text at the position of the cursor. Once an emoticon is chosen, the sender may also choose its amplitude, and the increased or decreased amplitude is displayed in the icon inserted into the message text.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: January 24, 2006
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Yann Andre LeCun
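
The timing rule in 6990452, starting the facial gesture a fixed number of words before the emoticon's position and finishing a fixed number of words after it, can be expressed as a small scheduling function. The word offsets and tag format below are illustrative, not the patent's values.

```python
def schedule_emoticons(words, emoticon_positions, lead=2, lag=2):
    """Map each emoticon to the word indices where its animation should
    start and end: `lead` words before and `lag` words after its position,
    clamped to the message boundaries. Offsets are illustrative."""
    schedule = []
    last = len(words) - 1
    for pos, emoticon in emoticon_positions:
        start = max(0, pos - lead)
        end = min(last, pos + lag)
        schedule.append((emoticon, start, end))
    return schedule

words = "I just heard the great news about your promotion".split()
# A smile emoticon inserted after word index 5 ("news").
print(schedule_emoticons(words, [(5, ":-)")]))
# [(':-)', 3, 7)] -> the smile spans words 3..7 around the insertion point
```
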
  • Patent number: 6980697
    Abstract: A method of improving the lighting conditions of a real scene or video sequence. Digitally generated light is added to a scene for video conferencing over telecommunication networks. A virtual illumination equation takes into account light attenuation as well as Lambertian and specular reflection. An image of an object is captured, and a virtual light source illuminates the object within the image. The object can be, for example, the head of the user. The position of the head of the user is dynamically tracked so that a three-dimensional model representative of the head of the user is generated. Synthetic light is applied to a position on the model to form an illuminated model.
    Type: Grant
    Filed: January 25, 2002
    Date of Patent: December 27, 2005
    Assignee: AT&T Corp.
    Inventors: Andrea Basso, Eric Cosatto, David Crawford Gibbon, Hans Peter Graf, Shan Liu
  • Patent number: 6963839
    Abstract: A method for customizing a voice in a multi-media message created by a sender for a recipient is disclosed. The multi-media message comprises a text message from the sender to be delivered by an animated entity. The method comprises presenting an option to the sender to insert voice emoticons into the text message associated with parameters of a voice used by the animated entity to deliver the text message. The message is then delivered wherein the voice of the animated entity is modified throughout the message according to the voice emoticons. The voice emoticons may relate to features such as voice stress, volume, pauses, emotion, yelling, or whispering. After the sender inserts various voice emoticons into the text of the message, the animated entity delivers the multi-media message giving effect to each voice emoticon in the text. A volume or intensity of the voice emoticons may be given effect by repeating the tags.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: November 8, 2005
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Mehmet Reha Civanlar, Hans Peter Graf, Thomas M. Isaacson
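
A hedged sketch of the delivery step of 6963839: each voice emoticon adjusts the text-to-speech parameters for the following words, and repeating a tag scales its effect, as the abstract's last sentence suggests. The tag names and parameter deltas are invented.

```python
# Invented mapping from voice emoticons to prosody adjustments.
VOICE_TAGS = {
    "<loud>":    {"volume": +0.2},
    "<whisper>": {"volume": -0.4},
    "<stress>":  {"pitch": +0.15},
    "<pause>":   {"pause_ms": +300},
}

def apply_voice_emoticons(tokens):
    """Fold voice-emoticon tags into per-word prosody settings.
    Repeated tags accumulate, giving a stronger effect."""
    state = {"volume": 0.0, "pitch": 0.0, "pause_ms": 0}
    plan = []
    for tok in tokens:
        if tok in VOICE_TAGS:
            for key, delta in VOICE_TAGS[tok].items():
                state[key] = state.get(key, 0) + delta   # repetition intensifies
        else:
            plan.append((tok, dict(state)))
    return plan

msg = "please <whisper> <whisper> keep this quiet".split()
for word, prosody in apply_voice_emoticons(msg):
    print(word, prosody)
```
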
  • Publication number: 20040215460
    Abstract: Methods and apparatus for rendering a talking head on a client device are disclosed. The client device has a client cache capable of storing audio/visual data associated with rendering the talking head. The method comprises storing, in the client cache, sentences that relate to bridging delays in a dialog, storing sentence templates to be used in dialogs, generating a talking head response to a user inquiry from the client device, and determining whether any of the stored sentences or templates in the client cache relate to the talking head response. If they do, the method comprises instructing the client device to use the appropriate stored sentence or template from the client cache to render at least a part of the talking head response, and transmitting any portion of the talking head response not stored in the client cache to the client device so that a complete talking head response can be rendered.
    Type: Application
    Filed: April 25, 2003
    Publication date: October 28, 2004
    Inventors: Eric Cosatto, Hans Peter Graf, Joern Ostermann
  • Publication number: 20040064321
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Application
    Filed: October 1, 2003
    Publication date: April 1, 2004
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 6662161
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: September 7, 1999
    Date of Patent: December 9, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 6654018
    Abstract: A system and method for generating photo-realistic talking-head animation from a text input utilizes an audio-visual unit selection process. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. The unit selection process utilizes the acoustic data to determine the target costs for the candidate images and utilizes the visual data to determine the concatenation costs. The image database is prepared in a hierarchical fashion, including high-level features (such as a full 3D modeling of the head, geometric size and position of elements) and pixel-based, low-level features (such as a PCA-based metric for labeling the various feature bitmaps).
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: November 25, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Gerasimos Potamianos, Juergen Schroeter
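
The "PCA-based metric" that the abstract of 6654018 mentions for labeling feature bitmaps is, in the usual formulation, a Euclidean distance between images projected onto the leading principal components of the database. A generic sketch of that formulation, not the patent's exact procedure:

```python
import numpy as np

def fit_pca(bitmaps, n_components=8):
    """Learn a PCA basis from flattened mouth bitmaps (one image per row)."""
    X = bitmaps.reshape(len(bitmaps), -1).astype(float)
    mean = X.mean(axis=0)
    # SVD of the centered data gives the principal directions in Vt.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_distance(img_a, img_b, mean, basis):
    """Distance between two bitmaps in the low-dimensional eigenspace."""
    a = basis @ (img_a.ravel().astype(float) - mean)
    b = basis @ (img_b.ravel().astype(float) - mean)
    return float(np.linalg.norm(a - b))

# Toy usage with random 16x16 stand-ins for mouth bitmaps.
rng = np.random.default_rng(0)
db = rng.random((50, 16, 16))
mean, basis = fit_pca(db)
print(pca_distance(db[0], db[1], mean, basis))
```
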
  • Patent number: 6504546
    Abstract: A method for modeling three-dimensional objects to create photo-realistic animations using a data-driven approach. The three-dimensional object is defined by a set of separate three-dimensional planes, each plane enclosing an area of the object that undergoes visual changes during animation. Recorded video is used to create bitmap data to populate a database for each three-dimensional plane. The video is analyzed in terms of both rigid movements (changes in pose) and plastic deformation (changes in expression) to create the bitmaps. The modeling is particularly well-suited for animations of a human face, where an audio track generated by a text-to-speech synthesizer can be added to the animation to create a photo-realistic “talking head”.
    Type: Grant
    Filed: February 8, 2000
    Date of Patent: January 7, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf
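
A minimal illustration of the rigid side of 6504546: each facial region is a textured 3-D plane whose corners are rotated and translated with the head pose, then projected to the image. The plane geometry, camera, and pose values below are made up, and the plastic-deformation side (selecting a bitmap per expression) would index a database not shown here.

```python
import numpy as np

def rot_y(theta):
    """Rotation matrix for a head turn about the vertical axis (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project_plane(corners, R, t, focal=500.0):
    """Apply the rigid pose (R, t) to a plane's 3-D corners and project
    them with a pinhole camera; returns the 2-D quad onto which the
    plane's bitmap would be warped. Values here are illustrative."""
    pts = corners @ R.T + t                  # rigid movement (pose change)
    return np.stack([focal * pts[:, 0] / pts[:, 2],
                     focal * pts[:, 1] / pts[:, 2]], axis=1)

# A mouth plane (4 corners in head-centered coordinates), head turned 15 deg.
mouth = np.array([[-1.0, -0.5, 0.0], [1.0, -0.5, 0.0],
                  [1.0, 0.5, 0.0], [-1.0, 0.5, 0.0]])
quad = project_plane(mouth, rot_y(np.radians(15)), t=np.array([0.0, 0.0, 10.0]))
print(quad)
```
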
  • Patent number: 6118887
    Abstract: A method for tracking heads and faces is disclosed wherein a variety of different representation models can be used to define individual heads and facial features in a multi-channel capable tracking algorithm. The representation models generated by the channels during a sequence of frames are ultimately combined into a representation comprising a highly robust and accurate tracked output. In a preferred embodiment, the method conducts an initial overview procedure to establish the optimal tracking strategy to be used in light of the particular characteristics of the tracking application.
    Type: Grant
    Filed: October 10, 1997
    Date of Patent: September 12, 2000
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Gerasimos Potamianos
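
A schematic of the combination step in 6118887: several tracking channels (for example color, motion, and shape) each propose a head bounding box with a confidence, and the outputs are fused into one robust estimate. The confidence-weighted average below is one plausible fusion rule, not necessarily the patent's.

```python
def fuse_channels(estimates):
    """Fuse per-channel head boxes (x, y, w, h) by confidence weighting.
    estimates: list of (box, confidence) pairs from the tracking channels.
    A simple plausible rule; the patent's combination may differ."""
    total = sum(conf for _, conf in estimates)
    fused = [0.0, 0.0, 0.0, 0.0]
    for box, conf in estimates:
        for i in range(4):
            fused[i] += conf / total * box[i]
    return tuple(fused)

# Color, motion, and shape channels roughly agree; shape is least confident.
channels = [((100, 80, 40, 50), 0.9),   # skin-color channel
            ((104, 78, 42, 52), 0.7),   # frame-difference (motion) channel
            ((96, 84, 38, 48), 0.3)]    # shape/template channel
print(fuse_channels(channels))
```
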
  • Patent number: 6112177
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: November 7, 1997
    Date of Patent: August 29, 2000
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 6028960
    Abstract: A face feature analysis which begins by generating multiple face feature candidates, e.g., eye and nose positions, using an isolated-frame face analysis. Then, a nostril tracking window is defined around a nose candidate, and tests are applied to the pixels therein, based on percentages of skin-color area pixels and nostril area pixels, to determine whether the nose candidate represents an actual nose. Once actual nostrils are identified, the size, separation, and contiguity of the nostrils are determined by projecting the nostril pixels within the nostril tracking window. A mouth window is defined around the mouth region, and mouth detail analysis is then applied to the pixels within the mouth window to identify inner-mouth and teeth pixels and therefrom generate an inner-mouth contour. The nostril position and inner-mouth contour are used to generate a synthetic model head.
    Type: Grant
    Filed: September 20, 1996
    Date of Patent: February 22, 2000
    Assignee: Lucent Technologies Inc.
    Inventors: Hans Peter Graf, Eric David Petajan
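
The nostril test in 6028960 is a pair of pixel-percentage thresholds inside the tracking window. The sketch below classifies pixels crudely by brightness (nostrils dark, skin mid-range); the thresholds and fractions are invented stand-ins for the patent's skin-color and nostril-area criteria.

```python
import numpy as np

def looks_like_nose(window, skin_min=80, skin_max=200, nostril_max=40,
                    min_skin_frac=0.5, min_nostril_frac=0.02):
    """Accept a nose candidate if the tracking window contains enough
    skin-tone pixels and a small but nonzero fraction of dark (nostril)
    pixels. Grayscale stand-in for the patent's color-based tests."""
    w = np.asarray(window, dtype=float)
    skin_frac = np.mean((w >= skin_min) & (w <= skin_max))
    nostril_frac = np.mean(w <= nostril_max)
    return skin_frac >= min_skin_frac and nostril_frac >= min_nostril_frac

# Toy window: mostly skin-toned with two dark nostril blobs.
win = np.full((20, 20), 150.0)
win[8:12, 5:8] = 20.0
win[8:12, 12:15] = 20.0
print(looks_like_nose(win))   # True
```
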
  • Patent number: 5995119
    Abstract: A method for generating photo-realistic characters wherein one or more pictures of an individual are decomposed into a plurality of parameterized facial parts. The facial parts are stored in memory. To create animated frames, the individual facial parts are recalled from memory in a defined manner and overlaid onto a base face to form a whole face, which, in turn, may be overlaid onto a background image to form an animated frame.
    Type: Grant
    Filed: June 6, 1997
    Date of Patent: November 30, 1999
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf
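
Finally, the compositing step of 5995119, recalling parameterized facial parts and overlaying them onto a base face and background, is straightforward image overlay. A toy sketch with invented part names and positions:

```python
import numpy as np

def overlay(dst, src, mask, top_left):
    """Paste `src` onto `dst` where `mask` is nonzero (simple overlay)."""
    y, x = top_left
    h, w = src.shape[:2]
    region = dst[y:y + h, x:x + w]
    region[mask > 0] = src[mask > 0]
    return dst

# Hypothetical library: part name -> (bitmap, mask, position on base face).
h, w = 8, 12
part = (np.full((h, w), 200, np.uint8), np.ones((h, w), np.uint8), (30, 20))
parts_lib = {"mouth_open": part}

background = np.zeros((100, 100), np.uint8)
base_face = np.full((60, 60), 120, np.uint8)

# Base face onto the background, then the recalled part onto the face.
frame = overlay(background, base_face, np.ones_like(base_face), (20, 20))
bitmap, mask, pos = parts_lib["mouth_open"]
# Part positions are relative to where the base face sits on the background.
frame = overlay(frame, bitmap, mask, (20 + pos[0], 20 + pos[1]))
print(frame.shape)
```
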