Synchronization of Speech with Image or Synthesis of the Lips Movement from Speech, e.g., for "Talking Heads," etc. (EPO) Patents (Class 704/E21.02)
  • Patent number: 12243552
    Abstract: Eyewear having a speech-to-moving-lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to the speech and utterances on a mask of the viewed person. A database having text-to-moving-lips information is utilized to translate the speech and generate the moving lips in near real time with little latency. This translation gives deaf and hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear while that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
    Type: Grant
    Filed: April 2, 2024
    Date of Patent: March 4, 2025
    Assignee: Snap Inc.
    Inventor: Kathleen Worthington McMahon
  • Patent number: 12230288
    Abstract: Systems and methods for audio processing are described. An audio processing system receives audio content that includes a voice sample. The audio processing system analyzes the voice sample to identify a sound type in the voice sample. The sound type corresponds to pronunciation of at least one specified character in the voice sample. The audio processing system generates a filtered voice sample at least in part by filtering the voice sample to modify the sound type. The audio processing system outputs the filtered voice sample.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: February 18, 2025
    Assignees: SONY INTERACTIVE ENTERTAINMENT LLC, SONY INTERACTIVE ENTERTAINMENT INC.
    Inventors: Jin Zhang, Celeste Bean, Sepideh Karimi, Sudha Krishnamurthy
  • Patent number: 12223936
    Abstract: A method for synthesizing a video includes: acquiring audio data and dotting data corresponding to the audio data, the dotting data including a beat time point and a beat value corresponding to the beat time point of the audio data; acquiring a plurality of material images from a local source; and synthesizing, based on the dotting data, the plurality of material images and the audio data to acquire a synthesized video, a switching time point of each of the material images in the synthesized video being the beat time point of the audio data.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: February 11, 2025
    Assignee: Guangzhou Kugou Computer Technology Co., Ltd.
    Inventors: Han Wu, Wentao Li
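
The preceding entry (Patent 12223936) switches material images exactly at the beat time points of the audio. A minimal sketch of that scheduling step, assuming beat time points are already available as a list of seconds; the function and field names are illustrative, not from the patent:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    image_path: str   # material image shown during this segment
    start: float      # seconds into the audio track
    end: float        # seconds into the audio track


def schedule_segments(image_paths: List[str],
                      beat_times: List[float],
                      audio_duration: float) -> List[Segment]:
    """Assign each material image to one inter-beat interval, so every
    switch in the synthesized video falls on a beat time point."""
    # Boundaries are the beats plus the start and end of the track.
    boundaries = [0.0] + sorted(t for t in beat_times if 0.0 < t < audio_duration)
    boundaries.append(audio_duration)

    segments = []
    for i, (start, end) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        # Cycle through the material images if there are more beats than images.
        segments.append(Segment(image_paths[i % len(image_paths)], start, end))
    return segments


if __name__ == "__main__":
    plan = schedule_segments(["a.jpg", "b.jpg", "c.jpg"],
                             beat_times=[1.5, 3.0, 4.5], audio_duration=6.0)
    for seg in plan:
        print(f"{seg.image_path}: {seg.start:.1f}s - {seg.end:.1f}s")
```
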
  • Patent number: 12154548
    Abstract: A processor-implemented method for generating a lip-sync of a face to a target speech of a live session, in one or more languages, in sync and with improved visual quality, using a machine learning model and a pre-trained lip-sync model is provided. The method includes (i) determining a visual representation of the face and an audio representation, the visual representation including crops of the face at a first timestamp; (ii) modifying the crops of the face to obtain masked crops; (iii) obtaining a reference frame from the visual representation at a second timestamp; (iv) combining the masked crops at the first timestamp with the reference frame to obtain lower-half crops; (v) training the machine learning model by providing historical lower-half crops and historical audio representations as training data; (vi) generating lip-synced frames for the face to the target speech; and (vii) generating in-sync lip-synced frames with the pre-trained lip-sync model.
    Type: Grant
    Filed: January 1, 2022
    Date of Patent: November 26, 2024
    Assignee: INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY, HYDERABAD
    Inventors: C. V. Jawahar, Rudrabha Mukhopadhyay, K R Prajwal, Vinay Namboodiri
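
The entry above (Patent 12154548) combines masked face crops at one timestamp with an unmasked reference frame from another timestamp before they are fed to the model. A rough numpy sketch of that input-preparation step, under the assumption that the mask covers the lower half of each crop; array shapes and channel layout are illustrative:

```python
import numpy as np


def mask_lower_half(crop: np.ndarray) -> np.ndarray:
    """Zero out the lower half of a face crop (H, W, C) so the model
    must reconstruct the mouth region from audio."""
    masked = crop.copy()
    h = crop.shape[0]
    masked[h // 2:, :, :] = 0
    return masked


def build_model_input(crop_t: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Concatenate the masked crop at the current timestamp with an
    unmasked reference frame along the channel axis as conditioning input."""
    masked = mask_lower_half(crop_t)
    return np.concatenate([masked, reference], axis=-1)  # (H, W, 2C)


if __name__ == "__main__":
    crop = np.random.randint(0, 255, (96, 96, 3), dtype=np.uint8)
    ref = np.random.randint(0, 255, (96, 96, 3), dtype=np.uint8)
    print(build_model_input(crop, ref).shape)  # (96, 96, 6)
```
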
  • Patent number: 12112417
    Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving experience of interaction between the user and the animation character.
    Type: Grant
    Filed: December 13, 2022
    Date of Patent: October 8, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
  • Patent number: 12105928
    Abstract: Technologies for selectively augmenting communications transmitted by a communication device include a communication device configured to acquire new user environment information relating to the environment of the user if such new user environment information becomes available. The communication device is further configured to create one or more user environment indicators based on the new user environment information, to display the one or more created user environment indicators via a display of the communication device and include the created user environment indicator in a communication to be transmitted by the communication device if the created user environment indicator is selected for inclusion in the communication.
    Type: Grant
    Filed: February 10, 2022
    Date of Patent: October 1, 2024
    Assignee: Tahoe Research, Ltd.
    Inventors: Glen J. Anderson, Jose K. Sia, Jr., Wendy March
  • Patent number: 12094047
    Abstract: An animated emoticon generation method, a computer-readable storage medium, and a computer device are provided. The method includes: displaying an emoticon input panel on a chat page; detecting whether a video shooting event is triggered in the emoticon input panel; acquiring video data in response to detecting the video shooting event; obtaining an edit operation for the video data; processing video frames in the video data according to the edit operation to synthesize an animated emoticon; and adding an emoticon thumbnail corresponding to the animated emoticon to the emoticon input panel, the emoticon thumbnail displaying the animated emoticon to be used as a message on the chat page based on a user selecting the emoticon thumbnail in the emoticon input panel.
    Type: Grant
    Filed: March 23, 2023
    Date of Patent: September 17, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD
    Inventors: Dong Huang, Tian Yi Liang, Jia Wen Zhong, Jun Jie Zhou, Jin Jiang, Ying Qi, Si Kun Liang
  • Patent number: 12045639
    Abstract: Embodiments of the present disclosure may include a system providing visual assistants with artificial intelligence, including an artificial intelligence large language model (LLM) engine coupled to a computer system.
    Type: Grant
    Filed: August 23, 2023
    Date of Patent: July 23, 2024
    Assignee: BitHuman Inc
    Inventors: Yun Fu, Steve Gu
  • Patent number: 12039997
    Abstract: Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
    Type: Grant
    Filed: March 7, 2023
    Date of Patent: July 16, 2024
    Assignee: LEXIA LEARNING SYSTEMS LLC
    Inventor: Carl Adrian Woffenden
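
The entry above (Patent 12039997) schedules each viseme so that it aligns with its corresponding phoneme. A small sketch of that alignment step, assuming phoneme timings have already been obtained (e.g. from a forced aligner) and using an illustrative, non-exhaustive phoneme-to-viseme table:

```python
from typing import List, Tuple

# Illustrative (not exhaustive) phoneme-to-viseme mapping.
PHONEME_TO_VISEME = {
    "AA": "aa", "AE": "aa", "AH": "aa",
    "B": "p", "P": "p", "M": "p",
    "F": "f", "V": "f",
    "IY": "i", "IH": "i",
    "UW": "u", "OW": "o",
    "SIL": "rest",
}


def visemes_from_phonemes(phoneme_track: List[Tuple[str, float, float, float]]
                          ) -> List[Tuple[str, float, float, float]]:
    """Convert (phoneme, start, end, loudness) tuples into
    (viseme, start, end, intensity) tuples, so each viseme is scheduled
    to start exactly when its phoneme starts."""
    schedule = []
    for phoneme, start, end, loudness in phoneme_track:
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")
        intensity = 0.0 if viseme == "rest" else min(1.0, loudness)
        schedule.append((viseme, start, end, intensity))
    return schedule


if __name__ == "__main__":
    track = [("SIL", 0.0, 0.2, 0.0), ("B", 0.2, 0.3, 0.8), ("AA", 0.3, 0.5, 0.9)]
    for row in visemes_from_phonemes(track):
        print(row)
```
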
  • Patent number: 12033258
    Abstract: A conversation augmentation system can automatically augment a conversation with content items based on natural language from the conversation. The conversation augmentation system can select content items to add to the conversation based on determined user “intents” generated using machine learning models. The conversation augmentation system can generate intents for natural language from various sources, such as video chats, audio conversations, textual conversations, virtual reality environments, etc. The conversation augmentation system can identify constraints for mapping the intents to content items or context signals for selecting appropriate content items. In various implementations, the conversation augmentation system can add selected content items to a storyline the conversation describes or can augment a platform in which an unstructured conversation is occurring.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: July 9, 2024
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Maheen Sohail, Hyunbin Park, Ruoni Wang, Vincent Charles Cheung
  • Patent number: 12019067
    Abstract: A computer-implemented method of using interferometry to detect mass changes of objects in a solution includes obtaining a time series of images using interferometry, and performing background correction on each image by classifying pixels of the image as background pixels or object pixels, fitting only the background pixels of the image to a function to generate a background fitted function, and subtracting the background fitted function from the image to generate a background corrected image. The method includes performing segmentation on the background corrected images to resolve boundaries of one or more objects, performing motion tracking on the objects to track changes in position of the objects, determining respective masses of the motion tracked objects and determining, for each image in the time series, an aggregate mass based on the respective masses to determine whether the aggregate mass of the motion tracked objects is increasing or decreasing.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: June 25, 2024
    Assignee: NantBio, Inc.
    Inventors: Kayvan Niazi, Krsto Sbutega
  • Patent number: 12014453
    Abstract: A method for animating a graphical object by an electronic device is provided. The method includes receiving, by the electronic device, the graphical object having at least one predefined portion to animate. The method includes receiving, by the electronic device, an audio to obtain spectral frequencies of the audio. The method includes determining, by the electronic device, at least one of an intensity of the spectral frequencies and at least one range of the spectral frequencies. The method includes generating, by the electronic device, at least one motion on the at least one predefined portion of the graphical object based on the at least one of the intensity of the spectral frequencies and the at least one range of the spectral frequencies.
    Type: Grant
    Filed: March 30, 2022
    Date of Patent: June 18, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ramasamy Kannan, Vishakha S. R., Sagar Aggarwal, Lokesh Rayasandra Boregowda
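
The entry above (Patent 12014453) derives motion for a predefined portion of a graphical object from the intensity and range of spectral frequencies in the audio. A minimal numpy sketch of that mapping; the band edges and scaling rule are illustrative assumptions:

```python
import numpy as np


def band_intensity(samples: np.ndarray, sample_rate: int,
                   low_hz: float, high_hz: float) -> float:
    """Mean spectral magnitude of the audio chunk inside [low_hz, high_hz)."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    band = (freqs >= low_hz) & (freqs < high_hz)
    return float(spectrum[band].mean()) if band.any() else 0.0


def motion_amplitude(samples: np.ndarray, sample_rate: int) -> float:
    """Map low-frequency energy (an assumed bass range) to a 0..1 motion
    amplitude for the animated portion of the graphical object."""
    intensity = band_intensity(samples, sample_rate, 20.0, 250.0)
    reference = band_intensity(samples, sample_rate, 20.0, sample_rate / 2) + 1e-9
    return min(1.0, intensity / reference)


if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 0.05, int(sr * 0.05), endpoint=False)
    chunk = 0.8 * np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
    print(round(motion_amplitude(chunk, sr), 3))
```
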
  • Patent number: 12002487
    Abstract: Provided are an information processing apparatus, an information processing method, and a program that make it possible to assign more natural motions reflecting the emotions of a character. The information processing apparatus includes a control unit configured to perform the processing of: determining an emotion on the basis of a result of utterance sentence analysis performed on an utterance sentence of a character included in a scenario; selecting, depending on the content of the utterance sentence and the emotion determined, a motion of the character that is synchronized with the utterance sentence; adjusting a character movement speed based on the selected motion and an intimacy between the character and a user; and adding, to the scenario, a description for adjusting presentation of the selected motion to match a voice output timing of the utterance sentence.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: June 4, 2024
    Assignee: SONY GROUP CORPORATION
    Inventor: Hideo Nagasaka
  • Patent number: 12002138
    Abstract: Embodiments of this application disclose a speech-driven animation method and apparatus based on artificial intelligence (AI). The method includes obtaining a first speech, the first speech comprising a plurality of speech frames; determining linguistics information corresponding to a speech frame in the first speech, the linguistics information being used for identifying a distribution possibility that the speech frame in the first speech pertains to phonemes; determining an expression parameter corresponding to the speech frame in the first speech according to the linguistics information; and enabling, according to the expression parameter, an animation character to make an expression corresponding to the first speech.
    Type: Grant
    Filed: October 8, 2021
    Date of Patent: June 4, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Shiyin Kang, Deyi Tuo, Kuongchi Lei, Tianxiao Fu, Huirong Huang, Dan Su
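
The entry above (Patent 12002138) maps, per speech frame, a distribution over phonemes (the "linguistics information") to an expression parameter. A toy sketch of that mapping as a probability-weighted blend of per-phoneme expression templates; the templates, phoneme set, and parameter meanings are illustrative assumptions:

```python
import numpy as np

# Illustrative per-phoneme expression parameters (e.g. jaw_open, lip_pucker).
PHONEMES = ["aa", "p", "u", "sil"]
PHONEME_EXPRESSIONS = np.array([
    [0.9, 0.1],   # "aa": jaw wide open
    [0.0, 0.2],   # "p":  lips closed
    [0.2, 0.9],   # "u":  lips puckered
    [0.0, 0.0],   # "sil": neutral
])


def expression_from_posteriors(posteriors: np.ndarray) -> np.ndarray:
    """Blend expression templates by the frame's phoneme probabilities.

    posteriors: array of shape (len(PHONEMES),) summing to 1.
    Returns an expression parameter vector of shape (2,).
    """
    posteriors = posteriors / posteriors.sum()
    return posteriors @ PHONEME_EXPRESSIONS


if __name__ == "__main__":
    frame_posterior = np.array([0.7, 0.1, 0.1, 0.1])  # mostly "aa"
    print(expression_from_posteriors(frame_posterior))  # ~[0.65, 0.18]
```
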
  • Patent number: 11972516
    Abstract: A device for generating a speech video according to an embodiment has one or more processor and a memory storing one or more programs executable by the one or more processors, and the device includes a video part generator configured to receive a person background image of a person and generate a video part of a speech video of the person; and an audio part generator configured to receive text, generate an audio part of the speech video of the person, and provide speech-related information occurring during the generation of the audio part to the video part generator.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: April 30, 2024
    Assignee: DEEPBRAIN AI INC.
    Inventors: Gyeongsu Chae, Guembuel Hwang, Sungwoo Park, Seyoung Jang
  • Patent number: 11967336
    Abstract: A computing device according to an embodiment is provided with one or more processors and a memory storing one or more programs executed by the one or more processors. The computing device includes a standby state video generating module that generates a standby state video in which a person in a video is in a standby state, a speech state video generating module that generates a speech state video in which a person in a video is in a speech state based on a source of speech content, and a video reproducing module that reproduces the standby state video and generates a synthesized speech video by synthesizing the standby state video being reproduced and the speech state video.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 23, 2024
    Assignee: DEEPBRAIN AI INC.
    Inventor: Doohyun Kim
  • Patent number: 11955135
    Abstract: Eyewear having a speech-to-moving-lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to the speech and utterances on a mask of the viewed person. A database having text-to-moving-lips information is utilized to translate the speech and generate the moving lips in near real time with little latency. This translation gives deaf and hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear while that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
    Type: Grant
    Filed: August 23, 2021
    Date of Patent: April 9, 2024
    Assignee: Snap Inc.
    Inventor: Kathleen Worthington McMahon
  • Patent number: 11934636
    Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
    Type: Grant
    Filed: March 27, 2023
    Date of Patent: March 19, 2024
    Assignee: SNAP INC.
    Inventor: Jesse Chand
  • Patent number: 11934461
    Abstract: A method uses natural language for visual analysis of a dataset. A data visualization application displays a data visualization, at a computer, based on a dataset retrieved from a database using a set of one or more queries. A user specifies a natural language command related to the displayed data visualization, and the computer extracts an analytic phrase from the natural language command. The computer computes semantic relatedness between the analytic phrase and numeric data fields in the dataset. The computer identifies numeric data fields having the highest semantic relatedness to the analytic phrase, and also selects a relevant numerical function. The numerical function compares data values in the numeric data fields to a threshold value. The computer retrieves an updated dataset that filters the identified numeric data fields according to the numerical function. The computer then displays an updated data visualization using the updated dataset.
    Type: Grant
    Filed: June 8, 2021
    Date of Patent: March 19, 2024
    Assignee: Tableau Software, Inc.
    Inventors: Vidya R. Setlur, Sarah E. Battersby, Melanie K. Tory, Richard C. Gossweiler, III, Angel Xuan Chang, Isaac James Dykeman, Md Enamul Hoque Prince
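
The entry above (Patent 11934461) scores semantic relatedness between an analytic phrase and numeric data fields, then filters the most related field against a threshold. A toy sketch of that flow using token overlap as a crude stand-in for the relatedness measure; the field names, data, and scoring are illustrative:

```python
from typing import Dict, List


def relatedness(phrase: str, field_name: str) -> float:
    """Crude stand-in for semantic relatedness: token overlap ratio."""
    p_tokens = set(phrase.lower().split())
    f_tokens = set(field_name.lower().replace("_", " ").split())
    return len(p_tokens & f_tokens) / max(1, len(f_tokens))


def filter_by_phrase(rows: List[Dict], numeric_fields: List[str],
                     phrase: str, threshold: float) -> List[Dict]:
    """Pick the numeric field most related to the phrase and keep rows
    whose value for that field exceeds the threshold."""
    best_field = max(numeric_fields, key=lambda f: relatedness(phrase, f))
    return [row for row in rows if row[best_field] > threshold]


if __name__ == "__main__":
    data = [{"sales total": 120, "profit": 30},
            {"sales total": 80, "profit": 55}]
    print(filter_by_phrase(data, ["sales total", "profit"],
                           "show high sales total", threshold=100))
```
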
  • Patent number: 11922543
    Abstract: A method, performed by a coloring apparatus, of coloring a sketch image includes adding a color pointer on the sketch image, according to an input of a user; determining an object related to a point where the color pointer is located, from among objects configuring the sketch image; and generating a colored image by coloring the determined object, based on a color of the color pointer.
    Type: Grant
    Filed: January 18, 2022
    Date of Patent: March 5, 2024
    Assignee: NAVER WEBTOON LTD.
    Inventors: Jun Hyun Park, Yu Ra Shin, Du Yong Lee, Joo Young Moon
  • Patent number: 11908478
    Abstract: A method for generating speech includes uploading a reference set of features that were extracted from sensed movements of one or more target regions of skin on faces of one or more reference human subjects in response to words articulated by the subjects and without contacting the one or more target regions. A test set of features is extracted from the sensed movements of at least one of the target regions of skin on a face of a test subject in response to words articulated silently by the test subject and without contacting the one or more target regions. The extracted test set of features is compared to the reference set of features, and, based on the comparison, a speech output is generated that includes the articulated words of the test subject.
    Type: Grant
    Filed: March 7, 2023
    Date of Patent: February 20, 2024
    Assignee: Q (Cue) Ltd.
    Inventors: Aviad Maizels, Avi Barliya, Yonatan Wexler
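
The entry above (Patent 11908478) compares a test set of skin-movement features against a reference set to recover silently articulated words. A toy sketch of that comparison as nearest-neighbour matching of per-word feature vectors; the feature extraction itself is out of scope, and the words and vectors are illustrative:

```python
import numpy as np

# Illustrative reference set: word -> averaged skin-movement feature vector.
REFERENCE_FEATURES = {
    "hello": np.array([0.9, 0.1, 0.4]),
    "yes":   np.array([0.2, 0.8, 0.3]),
    "no":    np.array([0.1, 0.2, 0.9]),
}


def match_word(test_features: np.ndarray) -> str:
    """Return the reference word whose feature vector is closest to the
    test subject's extracted features (Euclidean distance)."""
    return min(REFERENCE_FEATURES,
               key=lambda w: np.linalg.norm(REFERENCE_FEATURES[w] - test_features))


def generate_speech_output(test_sequence: list) -> str:
    """Map a sequence of test feature vectors to words and join them."""
    return " ".join(match_word(f) for f in test_sequence)


if __name__ == "__main__":
    silent_utterance = [np.array([0.85, 0.15, 0.35]), np.array([0.15, 0.75, 0.25])]
    print(generate_speech_output(silent_utterance))  # "hello yes"
```
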
  • Patent number: 11860925
    Abstract: In some examples, human centered computing based digital persona generation may include generating, for a digital persona that is to be generated for a target person, synthetic video files and synthetic audio files that are combined to generate synthetic media files. The digital persona may be generated based on a synthetic media file. An inquiry may be received from a user of the generated digital persona. Another synthetic media file may be used by the digital persona to respond to the inquiry. A real-time emotion of the user may be analyzed based on a text sentiment associated with the inquiry, and a voice sentiment and a facial expression associated with the user. Based on the real-time emotion of the user, a further synthetic media file may be utilized by the digital persona to continue or modify a conversation between the generated digital persona and the user.
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: January 2, 2024
    Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
    Inventors: Nisha Ramachandra, Manish Ahuja, Raghotham M Rao, Neville Dubash, Sanjay Podder, Rekha M. Menon
  • Patent number: 11830120
    Abstract: A computing device according to an embodiment includes one or more processors, a memory storing one or more programs executed by the one or more processors, a standby state image generating module configured to generate a standby state image in which a person is in a standby state and to generate a back-motion image set including a plurality of back-motion images at a preset frame interval from the standby state image, for image interpolation with a preset reference frame of the standby state image, a speech state image generating module configured to generate a speech state image in which a person is in a speech state based on a source of speech content, and an image playback module configured to generate a synthetic speech image by combining the standby state image and the speech state image while playing the standby state image.
    Type: Grant
    Filed: July 9, 2021
    Date of Patent: November 28, 2023
    Assignee: DEEPBRAIN AI INC.
    Inventor: Doo Hyun Kim
  • Patent number: 11830119
    Abstract: Various implementations disclosed herein include devices, systems, and methods for modifying an environment based on sound. In some implementations, a device includes one or more processors, a display and a non-transitory memory. In some implementations, a method includes displaying a computer graphics environment that includes an object. In some implementations, the method includes detecting, via an audio sensor, a sound from a physical environment of the device. In some implementations, the sound is associated with one or more audio characteristics. In some implementations, the method includes modifying a visual property of the object based on the one or more audio characteristics of the sound.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: November 28, 2023
    Assignee: APPLE INC.
    Inventors: Ronald Vernon Ong Siy, Scott Rick Jones, John Brady Parell, Christopher Harlan Paul
  • Patent number: 11817127
    Abstract: The present disclosure provides a video dubbing method, an apparatus, a device, and a storage medium. The method includes: when receiving an audio recording start trigger operation for a first time point of a target video and starting from a video picture corresponding to the first time point, playing the target video based on a timeline and receiving audio data based on the timeline; and when receiving an audio recording end trigger operation for a second time point, generating an audio recording file. The audio recording file has a linkage relationship with a timeline of a video clip taking the video picture corresponding to the first time point as a starting frame and taking a video picture corresponding to the second time point as an ending frame.
    Type: Grant
    Filed: August 10, 2022
    Date of Patent: November 14, 2023
    Assignee: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
    Inventors: Yan Zeng, Chen Zhao, Qifan Zheng, Pingfei Fu
  • Patent number: 11783524
    Abstract: A method for providing visual sequences using one or more images, comprising: receiving one or more person images showing at least one face; receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command; processing the message to extract or receive audio data related to the voice of the person, and facial movement data related to the expression to be carried on the face of the person; processing the image(s), the audio data, and the facial movement data; and generating an animation of the person enacting the message. The emotional and movement command is a GUI- or multimedia-based instruction to invoke the generation of facial expressions and/or body part movements.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: October 10, 2023
    Inventor: Nitin Vats
  • Patent number: 11776577
    Abstract: A 3D camera tracking and live compositing system includes software and hardware integration and allows users to create, in conjunction with existing programs, live composite video. A video camera, a tracking sensor, an encoder, a composite monitor, and a software engine and plugin receive video and data from, and integrate it with, existing programs to generate real-time composite video. The composite feed can be viewed and manipulated by users while filming. Features include 3D masking, depth layering, teleporting, axis locking, motion scaling, and freeze tracking. A storyboarding archive can be used to quickly load scenes with the location, lighting setups, lens profiles, and other settings associated with a saved photo. The video camera's movements can be recorded with video to be later applied to other 3D digital assets in post-production. The system also allows users to load scenes based on a 3D data set created with LIDAR.
    Type: Grant
    Filed: September 22, 2020
    Date of Patent: October 3, 2023
    Assignee: Mean Cat Entertainment LLC
    Inventors: Donnie Ocean, Michael Batty, John Hoehl
  • Patent number: 11776188
    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
    Type: Grant
    Filed: August 15, 2022
    Date of Patent: October 3, 2023
    Assignee: ADOBE INC.
    Inventors: Dingzeyu Li, Yang Zhou, Jose Ignacio Echevarria Vallespi, Elya Shechtman
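
This entry (Patent 11776188) feeds successive windows of audio, together with template 3D facial landmarks extracted from the input image, to a network that predicts per-frame 3D landmarks driving the animation. A structural sketch of that sliding-window loop with a stubbed-out predictor; the window sizes and the energy-based stub are illustrative assumptions, not the patented network:

```python
import numpy as np


def sliding_windows(audio: np.ndarray, window: int, hop: int):
    """Yield successive, possibly overlapping, windows of audio samples."""
    for start in range(0, max(1, len(audio) - window + 1), hop):
        yield audio[start:start + window]


def predict_displacement(audio_window: np.ndarray, num_landmarks: int) -> np.ndarray:
    """Stub for the learned model: scales a fixed direction by the window's
    energy, to show where a real landmark predictor would plug in."""
    energy = float(np.sqrt(np.mean(audio_window ** 2)))
    return np.tile([0.0, -energy, 0.0], (num_landmarks, 1))  # (L, 3)


def animate(template_landmarks: np.ndarray, audio: np.ndarray,
            window: int = 800, hop: int = 400) -> np.ndarray:
    """Produce one 3D landmark set per audio window by offsetting the
    template landmarks extracted from the still image."""
    frames = [template_landmarks + predict_displacement(w, len(template_landmarks))
              for w in sliding_windows(audio, window, hop)]
    return np.stack(frames)  # (num_frames, L, 3)


if __name__ == "__main__":
    template = np.zeros((68, 3))                        # 68 template 3D landmarks
    speech = np.random.randn(16000).astype(np.float32)  # 1 s of audio at 16 kHz
    print(animate(template, speech).shape)
```
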
  • Patent number: 11763797
    Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 19, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
  • Patent number: 11715495
    Abstract: A computer-implemented method of processing video data comprising a sequence of image frames. The method includes isolating an instance of an object within the sequence of image frames, generating a modified instance of the object using a machine learning model, and modifying the video data to smoothly transition between at least part of the isolated instance of the object and a corresponding at least part of the modified instance of the object over a subsequence of the sequence of image frames.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: August 1, 2023
    Assignee: Flawless Holdings Limited
    Inventors: Scott Mann, Pablo Garrido, Hyeongwoo Kim, Sean Danischevsky, Robert Hall, Gary Myles Scullion
  • Patent number: 11712623
    Abstract: Provided is an information processing program that is executed at a terminal device that executes effect rendering for outputting video and audio, the information processing program causing the execution of: a second obtaining unit (74) that obtains skip point information indicating skip points for the effect rendering and skip arrival point information indicating a skip arrival point for the effect rendering; and an effect-rendering control unit (77) that controls the effect rendering by skipping the video data to a predetermined point on the basis of an accepted skip operation to resume the output of the video from that point, and in the case where the timing of accepting the skip operation does not coincide with the skip point, by waiting until the skip point after that timing and then skipping to a specific skip arrival point associated with that skip point, on the basis of the skip operation, to resume the output of the audio from that specific skip arrival point.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 1, 2023
    Assignee: CYGAMES, INC.
    Inventors: Michihiro Sato, Takamichi Yashiki, Soichiro Tamura
  • Patent number: 11698954
    Abstract: A user device, such as a smartphone or laptop, may be password (passphrase) protected. The user device may combine biometric input analysis, such as facial recognition, with viseme analysis to authenticate a user attempting to use a password (passphrase) to access the user device. Secure authentication methods and systems are described that account for variations in how, based on the user's emotion (e.g., mood, temperament, unique pronunciation, etc.), a password (passphrase) may be presented to the user device.
    Type: Grant
    Filed: May 7, 2021
    Date of Patent: July 11, 2023
    Assignee: Comcast Cable Communications, LLC
    Inventor: Fei Wan
  • Patent number: 11699455
    Abstract: Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
    Type: Grant
    Filed: September 4, 2020
    Date of Patent: July 11, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Zoe Adams, Pete Klein, Derick Deller, Bradley Michael Richards, Anirudh Ranganath
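
The entry above (Patent 11699455) associates visemes with audio using a Viterbi algorithm. A compact, generic Viterbi sketch over viseme states and per-frame emission scores; the probabilities here are illustrative placeholders, not values from the patent:

```python
import numpy as np


def viterbi(log_emissions: np.ndarray, log_transitions: np.ndarray) -> list:
    """Most likely state sequence.

    log_emissions:   (num_frames, num_states) log P(frame | state)
    log_transitions: (num_states, num_states) log P(next | current)
    """
    num_frames, num_states = log_emissions.shape
    score = np.full((num_frames, num_states), -np.inf)
    back = np.zeros((num_frames, num_states), dtype=int)

    score[0] = log_emissions[0]  # uniform prior folded into frame 0
    for t in range(1, num_frames):
        for s in range(num_states):
            candidates = score[t - 1] + log_transitions[:, s]
            back[t, s] = int(np.argmax(candidates))
            score[t, s] = candidates[back[t, s]] + log_emissions[t, s]

    path = [int(np.argmax(score[-1]))]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]


if __name__ == "__main__":
    visemes = ["rest", "aa", "p"]
    emissions = np.log(np.array([[0.8, 0.1, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.2, 0.6, 0.2],
                                 [0.7, 0.2, 0.1]]))
    # Slight preference for staying in the same viseme between frames.
    transitions = np.log(np.full((3, 3), 0.25) + np.eye(3) * 0.25)
    print([visemes[i] for i in viterbi(emissions, transitions)])
```
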
  • Patent number: 11610354
    Abstract: The present invention relates to a joint automatic audio-visual driven facial animation system that, in some example embodiments, includes a full-scale, state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) system with a strong language model for speech recognition, and obtains phoneme alignment from the word lattice.
    Type: Grant
    Filed: June 16, 2021
    Date of Patent: March 21, 2023
    Assignee: Snap Inc.
    Inventors: Chen Cao, Xin Chen, Wei Chu, Zehao Xue
  • Patent number: 11610111
    Abstract: Apparatuses and methods for real-time spectrum-driven embedded wireless networking through deep learning are provided. Radio frequency, optical, or acoustic communication apparatus include a programmable logic system having a front-end configuration core, a learning core, and a learning actuation core. The learning core includes a deep learning neural network that receives and processes input in-phase/quadrature (I/Q) input samples through the neural network layers to extract RF, optical, or acoustic spectrum information. A processing system having a learning controller module controls operations of the learning core and the learning actuation core. The processing system and the programmable logic system are operable to configure one or more communication and networking parameters for transmission via the transceiver in response to extracted spectrum information.
    Type: Grant
    Filed: October 3, 2019
    Date of Patent: March 21, 2023
    Assignee: Northeastern University
    Inventors: Francesco Restuccia, Tommaso Melodia
  • Patent number: 11600290
    Abstract: Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: March 7, 2023
    Assignee: LEXIA LEARNING SYSTEMS LLC
    Inventor: Carl Adrian Woffenden
  • Patent number: 11587278
    Abstract: Embodiments described herein provide an approach for animating a character face of an artificial character based on facial poses performed by a live actor. Geometric characteristics of the facial surface corresponding to each facial pose performed by the live actor may be learned by a machine learning system, which in turn builds a mesh of a facial rig of an array of controllable elements applicable to a character face of an artificial character.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: February 21, 2023
    Assignee: UNITY TECHNOLOGIES SF
    Inventors: Wan-duo Kurt Ma, Muhammad Ghifary
  • Patent number: 11553215
    Abstract: Techniques are described for providing alternative media content to a client device along with primary media content.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: January 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Alexandria Way-Wun Kravis, Joshua Danovitz, Brandon Scott Love, Felicia Yue, Jeromey Russell Goetz, Lars Christian Ulness
  • Patent number: 11551394
    Abstract: Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio for any unknown face and cannot be easily generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ substantially from the distribution the network has seen during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech-driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from a canonical face to a person-specific face. A second GAN-based texture generator network is conditioned on person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible using meta-learning to adapt to an unknown subject's traits and face orientation during inference.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: January 10, 2023
    Assignee: TATA CONSULTANCY SERVICES LIMITED
    Inventors: Sandika Biswas, Dipanjan Das, Sanjana Sinha, Brojeshwar Bhowmick
  • Patent number: 11423907
    Abstract: The application provides a virtual object image display method and apparatus, an electronic device, and a storage medium; it relates to the field of artificial intelligence, in particular to computer vision and deep learning, and may be applied to virtual object dialogue scenarios. The specific implementation scheme includes: segmenting acquired voice to obtain voice segments; predicting lip shape sequence information for the voice segments; searching for a corresponding lip shape image sequence based on the lip shape sequence information; performing lip fusion between the lip shape image sequence and a virtual object baseplate to obtain a virtual object image; and displaying the virtual object image. The application improves the ability to obtain a virtual object image.
    Type: Grant
    Filed: March 17, 2021
    Date of Patent: August 23, 2022
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Tianshu Hu, Mingming Ma, Tonghui Li, Zhibin Hong
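
The entry above (Patent 11423907) predicts a lip-shape sequence per voice segment, looks up a matching lip-image sequence, and fuses the lips onto a virtual-object baseplate. A toy numpy sketch of the lookup-and-fuse stage; the lip-shape keys, library, paste region, and plain overwrite (in place of real blending) are illustrative:

```python
import numpy as np

# Illustrative library: lip-shape label -> small lip image (H, W, 3).
LIP_LIBRARY = {
    "closed": np.zeros((20, 40, 3), dtype=np.uint8),
    "open":   np.full((20, 40, 3), 200, dtype=np.uint8),
}


def fuse_lips(baseplate: np.ndarray, lip_image: np.ndarray,
              top: int, left: int) -> np.ndarray:
    """Paste a lip image onto a copy of the virtual-object baseplate
    at the mouth region (a simple overwrite stands in for blending)."""
    frame = baseplate.copy()
    h, w, _ = lip_image.shape
    frame[top:top + h, left:left + w] = lip_image
    return frame


def render_sequence(baseplate: np.ndarray, lip_shape_sequence: list,
                    mouth_top: int = 120, mouth_left: int = 80) -> np.ndarray:
    """Look up a lip image for each predicted lip shape and fuse it onto
    the baseplate to produce one output frame per shape."""
    frames = [fuse_lips(baseplate, LIP_LIBRARY[s], mouth_top, mouth_left)
              for s in lip_shape_sequence]
    return np.stack(frames)


if __name__ == "__main__":
    base = np.full((200, 200, 3), 128, dtype=np.uint8)
    video = render_sequence(base, ["closed", "open", "open", "closed"])
    print(video.shape)  # (4, 200, 200, 3)
```
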
  • Patent number: 11417041
    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
    Type: Grant
    Filed: February 12, 2020
    Date of Patent: August 16, 2022
    Assignee: Adobe Inc.
    Inventors: Dingzeyu Li, Yang Zhou, Jose Ignacio Echevarria Vallespi, Elya Shechtman
  • Patent number: 11406896
    Abstract: In one embodiment, a technique includes receiving a first story fragment from a storyteller interface of a first computing system. The technique further includes displaying the first story fragment on an audience interface of a second computing system. The technique also includes detecting a trigger on the audience interface of the second computing system. In the technique, the trigger corresponds to the first story fragment. The technique also includes identifying a special effect associated with the trigger. The technique further includes outputting the special effect and the first story fragment on at least one of the storyteller interface displayed on the first computing system and the audience interface displayed on the second computing system.
    Type: Grant
    Filed: October 5, 2018
    Date of Patent: August 9, 2022
    Assignee: Meta Platforms, Inc.
    Inventors: Vincent Charles Cheung, Baback Elmieh, Connie Yeewei Ho, Girish Patangay, Mark Alexander Walsh
  • Patent number: 8965011
    Abstract: A method of attenuating an input signal to obtain an output signal is described. The method comprises receiving the input signal, attenuating the input signal with a gain factor to obtain the output signal, applying a filter having a frequency response with a frequency-dependent filter gain to at least one of a copy of the input signal and a copy of the output signal to obtain a filtered signal, the frequency-dependent filter gain being arranged to emphasize frequencies within a number N of predetermined frequency ranges, N>1; wherein the filter comprises a sequence of N sub-filters, each one of the N sub-filters having a frequency response adapted to emphasize frequencies within a corresponding one of the N predetermined frequency ranges; determining a signal strength of the filtered signal, and determining the gain factor from at least the signal strength.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: February 24, 2015
    Assignee: Dialog Semiconductor B.V.
    Inventor: Michiel Andre Helsloot
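
The entry above (Patent 8965011) derives an attenuation gain from the strength of a signal filtered to emphasize N predetermined frequency ranges. A simplified numpy sketch of that measurement, using FFT-domain band emphasis in place of the patent's cascade of N sub-filters; the band edges, emphasis weight, and gain rule are illustrative assumptions:

```python
import numpy as np


def emphasize_bands(signal: np.ndarray, sample_rate: int,
                    bands: list, weight: float = 4.0) -> np.ndarray:
    """Return a filtered copy of the signal with the given frequency
    ranges (low_hz, high_hz) boosted by `weight`."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    gain = np.ones_like(freqs)
    for low, high in bands:
        gain[(freqs >= low) & (freqs < high)] = weight
    return np.fft.irfft(spectrum * gain, n=len(signal))


def attenuation_gain(signal: np.ndarray, sample_rate: int,
                     bands: list, target_rms: float = 0.1) -> float:
    """Derive a gain factor from the strength (RMS) of the band-emphasized
    signal: louder emphasized content leads to stronger attenuation."""
    filtered = emphasize_bands(signal, sample_rate, bands)
    strength = float(np.sqrt(np.mean(filtered ** 2))) + 1e-12
    return min(1.0, target_rms / strength)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 1000 * t)
    g = attenuation_gain(x, sr, bands=[(800, 1200), (2800, 3200)])
    print(round(g, 3))  # the output signal would then be g * x
```
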
  • Patent number: 8818131
    Abstract: Three dimensional models corresponding to a target image and a reference image are selected based on a set of feature points defining facial features in the target image and the reference image. The set of feature points defining the facial features in the target image and the reference image are associated with corresponding 3-dimensional models. A 3D motion flow between the 3-dimensional models is computed. The 3D motion flow is projected onto a 2D image plane to create a 2D optical field flow. The target image and the reference image are warped using the 2D optical field flow. A selected feature from the reference image is copied to the target image.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: August 26, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Jue Wang, Elya Shechtman, Lubomir D. Bourdev, Fei Yang
  • Patent number: 8438035
    Abstract: When there are missing voice-transmission-signals, a repetition-section calculating unit sets a plurality of repetition sections of different lengths that are determined to be similar to the voice-transmission-signals preceding the missing voice-transmission-signal, the repetition sections being determined with respect to stationary voice-transmission-signals stored in a normal signal storage unit, the stationary voice-transmission-signals being selected from the previously input voice-transmission-signals. A controller generates a concealment signal using the repetition sections.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: May 7, 2013
    Assignee: Fujitsu Limited
    Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
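
The entry above (Patent 8438035) conceals missing voice frames by repeating a stored section judged similar to the signal just before the gap. A toy numpy sketch that picks the best repetition section by normalized cross-correlation against the recent history; the candidate section lengths and the simple tiling are illustrative assumptions:

```python
import numpy as np


def best_repetition_section(history: np.ndarray, candidate_lengths: list) -> np.ndarray:
    """From the stored history, pick the past section most similar to the
    most recent samples, trying several section lengths."""
    best, best_score = history[-candidate_lengths[0]:], -np.inf
    for length in candidate_lengths:
        recent = history[-length:]
        # Compare `recent` against every earlier section of the same length.
        for start in range(0, len(history) - 2 * length):
            section = history[start:start + length]
            denom = np.linalg.norm(section) * np.linalg.norm(recent) + 1e-12
            score = float(np.dot(section, recent) / denom)
            if score > best_score:
                best, best_score = section, score
    return best


def conceal(history: np.ndarray, missing_samples: int,
            candidate_lengths=(80, 160, 240)) -> np.ndarray:
    """Build a concealment signal by tiling the chosen repetition section."""
    section = best_repetition_section(history, list(candidate_lengths))
    reps = int(np.ceil(missing_samples / len(section)))
    return np.tile(section, reps)[:missing_samples]


if __name__ == "__main__":
    t = np.arange(1600) / 8000.0
    stored = np.sin(2 * np.pi * 200 * t)                # stationary stored signal
    print(conceal(stored, missing_samples=160).shape)   # (160,)
```
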
  • Publication number: 20120284029
    Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.
    Type: Application
    Filed: May 2, 2011
    Publication date: November 8, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Lijuan Wang, Frank Soong
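
The publication above (20120284029) generates a trajectory of visual feature vectors from audio features and then retrieves the closest matching image sequence from an image library. A toy sketch of the retrieval stage using nearest-neighbour matching; the feature dimensionality and library are illustrative, and a real system would use the trained statistical model rather than random stand-ins:

```python
import numpy as np


def nearest_library_frames(visual_trajectory: np.ndarray,
                           library_features: np.ndarray) -> list:
    """For each predicted visual feature vector, return the index of the
    closest library image, giving a concatenated image sequence."""
    indices = []
    for target in visual_trajectory:
        distances = np.linalg.norm(library_features - target, axis=1)
        indices.append(int(np.argmin(distances)))
    return indices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    library = rng.normal(size=(500, 16))     # features of 500 stored mouth images
    trajectory = rng.normal(size=(25, 16))   # 25 predicted visual feature vectors
    frame_indices = nearest_library_frames(trajectory, library)
    print(len(frame_indices), frame_indices[:5])
```
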
  • Publication number: 20120203555
    Abstract: An electronic device configured for encoding a watermarked signal is described. The electronic device includes modeler circuitry. The modeler circuitry determines parameters based on a first signal and a first-pass coded signal. The electronic device also includes coder circuitry coupled to the modeler circuitry. The coder circuitry performs a first-pass coding on a second signal to obtain the first-pass coded signal and performs a second-pass coding based on the parameters to obtain a watermarked signal.
    Type: Application
    Filed: October 18, 2011
    Publication date: August 9, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Stephane Pierre Villette, Daniel J. Sinder
  • Publication number: 20120203556
    Abstract: A method for decoding a signal on an electronic device is described. The method includes receiving a signal. The method also includes extracting a bitstream from the signal. The method further includes performing watermark error checking on the bitstream for multiple frames. The method additionally includes determining whether watermark data is detected based on the watermark error checking. The method also includes decoding the bitstream to obtain a decoded second signal if the watermark data is not detected.
    Type: Application
    Filed: October 18, 2011
    Publication date: August 9, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Stephane Pierre Villette, Daniel J. Sinder
  • Publication number: 20120130720
    Abstract: An information providing device takes an image of a predetermined area and obtains the taken image in the form of image data, while externally obtaining voice data representing speech. The information providing device obtains text in a preset language corresponding to the speech in the form of text data, based on the obtained voice data, generates a composite image including the taken image and the text in the form of composite image data, based on the image data and the text data, and outputs the composite image data.
    Type: Application
    Filed: November 14, 2011
    Publication date: May 24, 2012
    Applicant: ELMO COMPANY LIMITED
    Inventor: Yasushi Suda
  • Publication number: 20100082345
    Abstract: An “Animation Synthesizer” uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
    Type: Application
    Filed: September 26, 2008
    Publication date: April 1, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong