Synchronization of Speech with Image or Synthesis of the Lips Movement from Speech, e.g., for "Talking Heads," etc. (EPO) Patents (Class 704/E21.02)
  • Patent number: 12243552
    Abstract: Eyewear having a speech-to-moving-lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to the speech and utterances on a mask of the viewed person. A database having text-to-moving-lips information is utilized to translate the speech and generate the moving lips in near real time with little latency. This translation gives deaf and hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear while that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
    Type: Grant
    Filed: April 2, 2024
    Date of Patent: March 4, 2025
    Assignee: Snap Inc.
    Inventor: Kathleen Worthington McMahon
  • Patent number: 12230288
    Abstract: Systems and methods for audio processing are described. An audio processing system receives audio content that includes a voice sample. The audio processing system analyzes the voice sample to identify a sound type in the voice sample. The sound type corresponds to pronunciation of at least one specified character in the voice sample. The audio processing system generates a filtered voice sample at least in part by filtering the voice sample to modify the sound type. The audio processing system outputs the filtered voice sample.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: February 18, 2025
    Assignees: SONY INTERACTIVE ENTERTAINMENT LLC, SONY INTERACTIVE ENTERTAINMENT INC.
    Inventors: Jin Zhang, Celeste Bean, Sepideh Karimi, Sudha Krishnamurthy
  • Patent number: 12223936
    Abstract: A method for synthesizing a video includes: acquiring audio data and dotting data corresponding to the audio data, the dotting data including a beat time point and a beat value corresponding to the beat time point of the audio data; acquiring a plurality of material images from a local source; and synthesizing, based on the dotting data, the plurality of material images and the audio data to acquire a synthesized video, a switching time point of each of the material images in the synthesized video being the beat time point of the audio data.
    Type: Grant
    Filed: November 22, 2019
    Date of Patent: February 11, 2025
    Assignee: Guangzhou Kugou Computer Technology Co., Ltd.
    Inventors: Han Wu, Wentao Li
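
The preceding entry (Patent 12223936) switches material images exactly at the beat time points of the audio. A minimal sketch of that scheduling step, assuming beat time points are already available as a list of seconds; the function and field names are illustrative, not from the patent:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    image_path: str   # material image shown during this segment
    start: float      # seconds into the audio track
    end: float        # seconds into the audio track


def schedule_segments(image_paths: List[str],
                      beat_times: List[float],
                      audio_duration: float) -> List[Segment]:
    """Assign each material image to one inter-beat interval, so every
    switch in the synthesized video falls on a beat time point."""
    # Boundaries are the beats plus the start and end of the track.
    boundaries = [0.0] + sorted(t for t in beat_times if 0.0 < t < audio_duration)
    boundaries.append(audio_duration)

    segments = []
    for i, (start, end) in enumerate(zip(boundaries[:-1], boundaries[1:])):
        # Cycle through the material images if there are more beats than images.
        segments.append(Segment(image_paths[i % len(image_paths)], start, end))
    return segments


if __name__ == "__main__":
    plan = schedule_segments(["a.jpg", "b.jpg", "c.jpg"],
                             beat_times=[1.5, 3.0, 4.5], audio_duration=6.0)
    for seg in plan:
        print(f"{seg.image_path}: {seg.start:.1f}s - {seg.end:.1f}s")
```
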
  • Patent number: 12154548
    Abstract: A processor-implemented method for generating a lip-sync of a face to a target speech of a live session, in one or more languages, in sync and with improved visual quality, using a machine learning model and a pre-trained lip-sync model is provided. The method includes (i) determining a visual representation of the face and an audio representation, the visual representation including crops of the face at a first timestamp; (ii) modifying the crops of the face to obtain masked crops; (iii) obtaining a reference frame from the visual representation at a second timestamp; (iv) combining the masked crops at the first timestamp with the reference frame to obtain lower-half crops; (v) training the machine learning model by providing historical lower-half crops and historical audio representations as training data; (vi) generating lip-synced frames for the face to the target speech; and (vii) generating in-sync lip-synced frames with the pre-trained lip-sync model.
    Type: Grant
    Filed: January 1, 2022
    Date of Patent: November 26, 2024
    Assignee: INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY, HYDERABAD
    Inventors: C. V. Jawahar, Rudrabha Mukhopadhyay, K R Prajwal, Vinay Namboodiri
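
The entry above (Patent 12154548) combines masked face crops at one timestamp with an unmasked reference frame from another timestamp before they are fed to the model. A rough numpy sketch of that input-preparation step, under the assumption that the mask covers the lower half of each crop; array shapes and channel layout are illustrative:

```python
import numpy as np


def mask_lower_half(crop: np.ndarray) -> np.ndarray:
    """Zero out the lower half of a face crop (H, W, C) so the model
    must reconstruct the mouth region from audio."""
    masked = crop.copy()
    h = crop.shape[0]
    masked[h // 2:, :, :] = 0
    return masked


def build_model_input(crop_t: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Concatenate the masked crop at the current timestamp with an
    unmasked reference frame along the channel axis as conditioning input."""
    masked = mask_lower_half(crop_t)
    return np.concatenate([masked, reference], axis=-1)  # (H, W, 2C)


if __name__ == "__main__":
    crop = np.random.randint(0, 255, (96, 96, 3), dtype=np.uint8)
    ref = np.random.randint(0, 255, (96, 96, 3), dtype=np.uint8)
    print(build_model_input(crop, ref).shape)  # (96, 96, 6)
```
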
  • Patent number: 12112417
    Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving experience of interaction between the user and the animation character.
    Type: Grant
    Filed: December 13, 2022
    Date of Patent: October 8, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
  • Patent number: 12105928
    Abstract: Technologies for selectively augmenting communications transmitted by a communication device include a communication device configured to acquire new user environment information relating to the environment of the user if such new user environment information becomes available. The communication device is further configured to create one or more user environment indicators based on the new user environment information, to display the one or more created user environment indicators via a display of the communication device and include the created user environment indicator in a communication to be transmitted by the communication device if the created user environment indicator is selected for inclusion in the communication.
    Type: Grant
    Filed: February 10, 2022
    Date of Patent: October 1, 2024
    Assignee: Tahoe Research, Ltd.
    Inventors: Glen J. Anderson, Jose K. Sia, Jr., Wendy March
  • Patent number: 12094047
    Abstract: An animated emoticon generation method, a computer-readable storage medium, and a computer device are provided. The method includes: displaying an emoticon input panel on a chat page; detecting whether a video shooting event is triggered in the emoticon input panel; acquiring video data in response to detecting the video shooting event; obtaining an edit operation for the video data; processing video frames in the video data according to the edit operation to synthesize an animated emoticon; and adding an emoticon thumbnail corresponding to the animated emoticon to the emoticon input panel, the emoticon thumbnail displaying the animated emoticon to be used as a message on the chat page based on a user selecting the emoticon thumbnail in the emoticon input panel.
    Type: Grant
    Filed: March 23, 2023
    Date of Patent: September 17, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD
    Inventors: Dong Huang, Tian Yi Liang, Jia Wen Zhong, Jun Jie Zhou, Jin Jiang, Ying Qi, Si Kun Liang
  • Patent number: 12045639
    Abstract: Embodiments of the present disclosure may include a system providing visual assistants with artificial intelligence, including an artificial intelligence large language model (LLM) engine coupled to a computer system.
    Type: Grant
    Filed: August 23, 2023
    Date of Patent: July 23, 2024
    Assignee: BitHuman Inc
    Inventors: Yun Fu, Steve Gu
  • Patent number: 12039997
    Abstract: Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
    Type: Grant
    Filed: March 7, 2023
    Date of Patent: July 16, 2024
    Assignee: LEXIA LEARNING SYSTEMS LLC
    Inventor: Carl Adrian Woffenden
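
The entry above (Patent 12039997) schedules each viseme so that it aligns with its corresponding phoneme. A small sketch of that alignment step, assuming phoneme timings have already been obtained (e.g. from a forced aligner) and using an illustrative, non-exhaustive phoneme-to-viseme table:

```python
from typing import List, Tuple

# Illustrative (not exhaustive) phoneme-to-viseme mapping.
PHONEME_TO_VISEME = {
    "AA": "aa", "AE": "aa", "AH": "aa",
    "B": "p", "P": "p", "M": "p",
    "F": "f", "V": "f",
    "IY": "i", "IH": "i",
    "UW": "u", "OW": "o",
    "SIL": "rest",
}


def visemes_from_phonemes(phoneme_track: List[Tuple[str, float, float, float]]
                          ) -> List[Tuple[str, float, float, float]]:
    """Convert (phoneme, start, end, loudness) tuples into
    (viseme, start, end, intensity) tuples, so each viseme is scheduled
    to start exactly when its phoneme starts."""
    schedule = []
    for phoneme, start, end, loudness in phoneme_track:
        viseme = PHONEME_TO_VISEME.get(phoneme, "rest")
        intensity = 0.0 if viseme == "rest" else min(1.0, loudness)
        schedule.append((viseme, start, end, intensity))
    return schedule


if __name__ == "__main__":
    track = [("SIL", 0.0, 0.2, 0.0), ("B", 0.2, 0.3, 0.8), ("AA", 0.3, 0.5, 0.9)]
    for row in visemes_from_phonemes(track):
        print(row)
```
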
  • Patent number: 12033258
    Abstract: A conversation augmentation system can automatically augment a conversation with content items based on natural language from the conversation. The conversation augmentation system can select content items to add to the conversation based on determined user “intents” generated using machine learning models. The conversation augmentation system can generate intents for natural language from various sources, such as video chats, audio conversations, textual conversations, virtual reality environments, etc. The conversation augmentation system can identify constraints for mapping the intents to content items or context signals for selecting appropriate content items. In various implementations, the conversation augmentation system can add selected content items to a storyline the conversation describes or can augment a platform in which an unstructured conversation is occurring.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: July 9, 2024
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Maheen Sohail, Hyunbin Park, Ruoni Wang, Vincent Charles Cheung
  • Patent number: 12019067
    Abstract: A computer-implemented method of using interferometry to detect mass changes of objects in a solution includes obtaining a time series of images using interferometry, and performing background correction on each image by classifying pixels of the image as background pixels or object pixels, fitting only the background pixels of the image to a function to generate a background fitted function, and subtracting the background fitted function from the image to generate a background corrected image. The method includes performing segmentation on the background corrected images to resolve boundaries of one or more objects, performing motion tracking on the objects to track changes in position of the objects, determining respective masses of the motion tracked objects and determining, for each image in the time series, an aggregate mass based on the respective masses to determine whether the aggregate mass of the motion tracked objects is increasing or decreasing.
    Type: Grant
    Filed: May 20, 2021
    Date of Patent: June 25, 2024
    Assignee: NantBio, Inc.
    Inventors: Kayvan Niazi, Krsto Sbutega
  • Patent number: 12014453
    Abstract: A method for animating a graphical object by an electronic device is provided. The method includes receiving, by the electronic device, the graphical object having at least one predefined portion to animate. The method includes receiving, by the electronic device, an audio to obtain spectral frequencies of the audio. The method includes determining, by the electronic device, at least one of an intensity of the spectral frequencies and at least one range of the spectral frequencies. The method includes generating, by the electronic device, at least one motion on the at least one predefined portion of the graphical object based on the at least one of the intensity of the spectral frequencies and the at least one range of the spectral frequencies.
    Type: Grant
    Filed: March 30, 2022
    Date of Patent: June 18, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ramasamy Kannan, Vishakha S. R., Sagar Aggarwal, Lokesh Rayasandra Boregowda
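
The entry above (Patent 12014453) derives motion for a predefined portion of a graphical object from the intensity and range of spectral frequencies in the audio. A minimal numpy sketch of that mapping; the band edges and scaling rule are illustrative assumptions:

```python
import numpy as np


def band_intensity(samples: np.ndarray, sample_rate: int,
                   low_hz: float, high_hz: float) -> float:
    """Mean spectral magnitude of the audio chunk inside [low_hz, high_hz)."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    band = (freqs >= low_hz) & (freqs < high_hz)
    return float(spectrum[band].mean()) if band.any() else 0.0


def motion_amplitude(samples: np.ndarray, sample_rate: int) -> float:
    """Map low-frequency energy (an assumed bass range) to a 0..1 motion
    amplitude for the animated portion of the graphical object."""
    intensity = band_intensity(samples, sample_rate, 20.0, 250.0)
    reference = band_intensity(samples, sample_rate, 20.0, sample_rate / 2) + 1e-9
    return min(1.0, intensity / reference)


if __name__ == "__main__":
    sr = 16000
    t = np.linspace(0, 0.05, int(sr * 0.05), endpoint=False)
    chunk = 0.8 * np.sin(2 * np.pi * 100 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
    print(round(motion_amplitude(chunk, sr), 3))
```
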
  • Patent number: 12002487
    Abstract: Provided are an information processing apparatus, an information processing method, and a program that make it possible to assign more natural motions reflecting the emotions of a character. The information processing apparatus includes a control unit configured to perform the processing of: determining an emotion on the basis of a result of utterance sentence analysis performed on an utterance sentence of a character included in a scenario; selecting, depending on the content of the utterance sentence and the emotion determined, a motion of the character that is synchronized with the utterance sentence; adjusting a character movement speed based on the selected motion and an intimacy between the character and a user; and adding, to the scenario, a description for adjusting presentation of the selected motion to match a voice output timing of the utterance sentence.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: June 4, 2024
    Assignee: SONY GROUP CORPORATION
    Inventor: Hideo Nagasaka
  • Patent number: 12002138
    Abstract: Embodiments of this application disclose a speech-driven animation method and apparatus based on artificial intelligence (AI). The method includes obtaining a first speech, the first speech comprising a plurality of speech frames; determining linguistics information corresponding to a speech frame in the first speech, the linguistics information being used for identifying a distribution possibility that the speech frame in the first speech pertains to phonemes; determining an expression parameter corresponding to the speech frame in the first speech according to the linguistics information; and enabling, according to the expression parameter, an animation character to make an expression corresponding to the first speech.
    Type: Grant
    Filed: October 8, 2021
    Date of Patent: June 4, 2024
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Shiyin Kang, Deyi Tuo, Kuongchi Lei, Tianxiao Fu, Huirong Huang, Dan Su
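
The entry above (Patent 12002138) maps, per speech frame, a distribution over phonemes (the "linguistics information") to an expression parameter. A toy sketch of that mapping as a probability-weighted blend of per-phoneme expression templates; the templates, phoneme set, and parameter meanings are illustrative assumptions:

```python
import numpy as np

# Illustrative per-phoneme expression parameters (e.g. jaw_open, lip_pucker).
PHONEMES = ["aa", "p", "u", "sil"]
PHONEME_EXPRESSIONS = np.array([
    [0.9, 0.1],   # "aa": jaw wide open
    [0.0, 0.2],   # "p":  lips closed
    [0.2, 0.9],   # "u":  lips puckered
    [0.0, 0.0],   # "sil": neutral
])


def expression_from_posteriors(posteriors: np.ndarray) -> np.ndarray:
    """Blend expression templates by the frame's phoneme probabilities.

    posteriors: array of shape (len(PHONEMES),) summing to 1.
    Returns an expression parameter vector of shape (2,).
    """
    posteriors = posteriors / posteriors.sum()
    return posteriors @ PHONEME_EXPRESSIONS


if __name__ == "__main__":
    frame_posterior = np.array([0.7, 0.1, 0.1, 0.1])  # mostly "aa"
    print(expression_from_posteriors(frame_posterior))  # ~[0.65, 0.18]
```
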
  • Patent number: 11972516
    Abstract: A device for generating a speech video according to an embodiment has one or more processor and a memory storing one or more programs executable by the one or more processors, and the device includes a video part generator configured to receive a person background image of a person and generate a video part of a speech video of the person; and an audio part generator configured to receive text, generate an audio part of the speech video of the person, and provide speech-related information occurring during the generation of the audio part to the video part generator.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: April 30, 2024
    Assignee: DEEPBRAIN AI INC.
    Inventors: Gyeongsu Chae, Guembuel Hwang, Sungwoo Park, Seyoung Jang
  • Patent number: 11967336
    Abstract: A computing device according to an embodiment is provided with one or more processors and a memory storing one or more programs executed by the one or more processors. The computing device includes a standby state video generating module that generates a standby state video in which a person in a video is in a standby state, a speech state video generating module that generates a speech state video in which a person in a video is in a speech state based on a source of speech content, and a video reproducing module that reproduces the standby state video and generates a synthesized speech video by synthesizing the standby state video being reproduced and the speech state video.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 23, 2024
    Assignee: DEEPBRAIN AI INC.
    Inventor: Doohyun Kim
  • Patent number: 11955135
    Abstract: Eyewear having a speech-to-moving-lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to the speech and utterances on a mask of the viewed person. A database having text-to-moving-lips information is utilized to translate the speech and generate the moving lips in near real time with little latency. This translation gives deaf and hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear while that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
    Type: Grant
    Filed: August 23, 2021
    Date of Patent: April 9, 2024
    Assignee: Snap Inc.
    Inventor: Kathleen Worthington McMahon
  • Patent number: 11934636
    Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
    Type: Grant
    Filed: March 27, 2023
    Date of Patent: March 19, 2024
    Assignee: SNAP INC.
    Inventor: Jesse Chand
  • Patent number: 11934461
    Abstract: A method uses natural language for visual analysis of a dataset. A data visualization application displays a data visualization, at a computer, based on a dataset retrieved from a database using a set of one or more queries. A user specifies a natural language command related to the displayed data visualization, and the computer extracts an analytic phrase from the natural language command. The computer computes semantic relatedness between the analytic phrase and numeric data fields in the dataset. The computer identifies numeric data fields having the highest semantic relatedness to the analytic phrase, and also selects a relevant numerical function. The numerical function compares data values in the numeric data fields to a threshold value. The computer retrieves an updated dataset that filters the identified numeric data fields according to the numerical function. The computer then displays an updated data visualization using the updated dataset.
    Type: Grant
    Filed: June 8, 2021
    Date of Patent: March 19, 2024
    Assignee: Tableau Software, Inc.
    Inventors: Vidya R. Setlur, Sarah E. Battersby, Melanie K. Tory, Richard C. Gossweiler, III, Angel Xuan Chang, Isaac James Dykeman, Md Enamul Hoque Prince
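
The entry above (Patent 11934461) scores semantic relatedness between an analytic phrase and numeric data fields, then filters the most related field against a threshold. A toy sketch of that flow using token overlap as a crude stand-in for the relatedness measure; the field names, data, and scoring are illustrative:

```python
from typing import Dict, List


def relatedness(phrase: str, field_name: str) -> float:
    """Crude stand-in for semantic relatedness: token overlap ratio."""
    p_tokens = set(phrase.lower().split())
    f_tokens = set(field_name.lower().replace("_", " ").split())
    return len(p_tokens & f_tokens) / max(1, len(f_tokens))


def filter_by_phrase(rows: List[Dict], numeric_fields: List[str],
                     phrase: str, threshold: float) -> List[Dict]:
    """Pick the numeric field most related to the phrase and keep rows
    whose value for that field exceeds the threshold."""
    best_field = max(numeric_fields, key=lambda f: relatedness(phrase, f))
    return [row for row in rows if row[best_field] > threshold]


if __name__ == "__main__":
    data = [{"sales total": 120, "profit": 30},
            {"sales total": 80, "profit": 55}]
    print(filter_by_phrase(data, ["sales total", "profit"],
                           "show high sales total", threshold=100))
```
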
  • Patent number: 11922543
    Abstract: A method, performed by a coloring apparatus, of coloring a sketch image includes adding a color pointer on the sketch image, according to an input of a user; determining an object related to a point where the color pointer is located, from among objects configuring the sketch image; and generating a colored image by coloring the determined object, based on a color of the color pointer.
    Type: Grant
    Filed: January 18, 2022
    Date of Patent: March 5, 2024
    Assignee: NAVER WEBTOON LTD.
    Inventors: Jun Hyun Park, Yu Ra Shin, Du Yong Lee, Joo Young Moon
  • Patent number: 11908478
    Abstract: A method for generating speech includes uploading a reference set of features that were extracted from sensed movements of one or more target regions of skin on faces of one or more reference human subjects in response to words articulated by the subjects and without contacting the one or more target regions. A test set of features is extracted from the sensed movements of at least one of the target regions of skin on a face of a test subject in response to words articulated silently by the test subject and without contacting the one or more target regions. The extracted test set of features is compared to the reference set of features, and, based on the comparison, a speech output is generated that includes the articulated words of the test subject.
    Type: Grant
    Filed: March 7, 2023
    Date of Patent: February 20, 2024
    Assignee: Q (Cue) Ltd.
    Inventors: Aviad Maizels, Avi Barliya, Yonatan Wexler
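
The entry above (Patent 11908478) compares a test set of skin-movement features against a reference set to recover silently articulated words. A toy sketch of that comparison as nearest-neighbour matching of per-word feature vectors; the feature extraction itself is out of scope, and the words and vectors are illustrative:

```python
import numpy as np

# Illustrative reference set: word -> averaged skin-movement feature vector.
REFERENCE_FEATURES = {
    "hello": np.array([0.9, 0.1, 0.4]),
    "yes":   np.array([0.2, 0.8, 0.3]),
    "no":    np.array([0.1, 0.2, 0.9]),
}


def match_word(test_features: np.ndarray) -> str:
    """Return the reference word whose feature vector is closest to the
    test subject's extracted features (Euclidean distance)."""
    return min(REFERENCE_FEATURES,
               key=lambda w: np.linalg.norm(REFERENCE_FEATURES[w] - test_features))


def generate_speech_output(test_sequence: list) -> str:
    """Map a sequence of test feature vectors to words and join them."""
    return " ".join(match_word(f) for f in test_sequence)


if __name__ == "__main__":
    silent_utterance = [np.array([0.85, 0.15, 0.35]), np.array([0.15, 0.75, 0.25])]
    print(generate_speech_output(silent_utterance))  # "hello yes"
```
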
  • Patent number: 11860925
    Abstract: In some examples, human centered computing based digital persona generation may include generating, for a digital persona that is to be generated for a target person, synthetic video files and synthetic audio files that are combined to generate synthetic media files. The digital persona may be generated based on a synthetic media file. An inquiry may be received from a user of the generated digital persona. Another synthetic media file may be used by the digital persona to respond to the inquiry. A real-time emotion of the user may be analyzed based on a text sentiment associated with the inquiry, and a voice sentiment and a facial expression associated with the user. Based on the real-time emotion of the user, a further synthetic media file may be utilized by the digital persona to continue or modify a conversation between the generated digital persona and the user.
    Type: Grant
    Filed: April 5, 2021
    Date of Patent: January 2, 2024
    Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
    Inventors: Nisha Ramachandra, Manish Ahuja, Raghotham M Rao, Neville Dubash, Sanjay Podder, Rekha M. Menon
  • Patent number: 11830120
    Abstract: A computing device according to an embodiment includes one or more processors, a memory storing one or more programs executed by the one or more processors, a standby state image generating module configured to generate a standby state image in which a person is in a standby state and to generate a back-motion image set including a plurality of back-motion images at a preset frame interval from the standby state image, for image interpolation with a preset reference frame of the standby state image, a speech state image generating module configured to generate a speech state image in which a person is in a speech state based on a source of speech content, and an image playback module configured to generate a synthetic speech image by combining the standby state image and the speech state image while playing the standby state image.
    Type: Grant
    Filed: July 9, 2021
    Date of Patent: November 28, 2023
    Assignee: DEEPBRAIN AI INC.
    Inventor: Doo Hyun Kim
  • Patent number: 11830119
    Abstract: Various implementations disclosed herein include devices, systems, and methods for modifying an environment based on sound. In some implementations, a device includes one or more processors, a display and a non-transitory memory. In some implementations, a method includes displaying a computer graphics environment that includes an object. In some implementations, the method includes detecting, via an audio sensor, a sound from a physical environment of the device. In some implementations, the sound is associated with one or more audio characteristics. In some implementations, the method includes modifying a visual property of the object based on the one or more audio characteristics of the sound.
    Type: Grant
    Filed: April 28, 2021
    Date of Patent: November 28, 2023
    Assignee: APPLE INC.
    Inventors: Ronald Vernon Ong Siy, Scott Rick Jones, John Brady Parell, Christopher Harlan Paul
  • Patent number: 11817127
    Abstract: The present disclosure provides a video dubbing method, an apparatus, a device, and a storage medium. The method includes: when receiving an audio recording start trigger operation for a first time point of a target video and starting from a video picture corresponding to the first time point, playing the target video based on a timeline and receiving audio data based on the timeline; and when receiving an audio recording end trigger operation for a second time point, generating an audio recording file. The audio recording file has a linkage relationship with a timeline of a video clip taking the video picture corresponding to the first time point as a starting frame and taking a video picture corresponding to the second time point as an ending frame.
    Type: Grant
    Filed: August 10, 2022
    Date of Patent: November 14, 2023
    Assignee: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
    Inventors: Yan Zeng, Chen Zhao, Qifan Zheng, Pingfei Fu
  • Patent number: 11783524
    Abstract: A method for providing visual sequences using one or more images, comprising: receiving one or more person images showing at least one face; receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command; processing the message to extract or receive audio data related to the voice of the person, and facial movement data related to the expression to be carried on the face of the person; processing the image(s), the audio data, and the facial movement data; and generating an animation of the person enacting the message. The emotional and movement command is a GUI- or multimedia-based instruction to invoke the generation of facial expressions and/or body part movements.
    Type: Grant
    Filed: February 10, 2017
    Date of Patent: October 10, 2023
    Inventor: Nitin Vats
  • Patent number: 11776577
    Abstract: A 3D camera tracking and live compositing system includes software and hardware integration and allows users to create, in conjunction with existing programs, live composite video. A video camera, a tracking sensor, an encoder, a composite monitor, and a software engine and plugin receive video and data from, and integrate it with, existing programs to generate real-time composite video. The composite feed can be viewed and manipulated by users while filming. Features include 3D masking, depth layering, teleporting, axis locking, motion scaling, and freeze tracking. A storyboarding archive can be used to quickly load scenes with the location, lighting setups, lens profiles, and other settings associated with a saved photo. The video camera's movements can be recorded with video to be later applied to other 3D digital assets in post-production. The system also allows users to load scenes based on a 3D data set created with LIDAR.
    Type: Grant
    Filed: September 22, 2020
    Date of Patent: October 3, 2023
    Assignee: Mean Cat Entertainment LLC
    Inventors: Donnie Ocean, Michael Batty, John Hoehl
  • Patent number: 11776188
    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
    Type: Grant
    Filed: August 15, 2022
    Date of Patent: October 3, 2023
    Assignee: ADOBE INC.
    Inventors: Dingzeyu Li, Yang Zhou, Jose Ignacio Echevarria Vallespi, Elya Shechtman
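
This entry (Patent 11776188) feeds successive windows of audio, together with template 3D facial landmarks extracted from the input image, to a network that predicts per-frame 3D landmarks driving the animation. A structural sketch of that sliding-window loop with a stubbed-out predictor; the window sizes and the energy-based stub are illustrative assumptions, not the patented network:

```python
import numpy as np


def sliding_windows(audio: np.ndarray, window: int, hop: int):
    """Yield successive, possibly overlapping, windows of audio samples."""
    for start in range(0, max(1, len(audio) - window + 1), hop):
        yield audio[start:start + window]


def predict_displacement(audio_window: np.ndarray, num_landmarks: int) -> np.ndarray:
    """Stub for the learned model: scales a fixed direction by the window's
    energy, to show where a real landmark predictor would plug in."""
    energy = float(np.sqrt(np.mean(audio_window ** 2)))
    return np.tile([0.0, -energy, 0.0], (num_landmarks, 1))  # (L, 3)


def animate(template_landmarks: np.ndarray, audio: np.ndarray,
            window: int = 800, hop: int = 400) -> np.ndarray:
    """Produce one 3D landmark set per audio window by offsetting the
    template landmarks extracted from the still image."""
    frames = [template_landmarks + predict_displacement(w, len(template_landmarks))
              for w in sliding_windows(audio, window, hop)]
    return np.stack(frames)  # (num_frames, L, 3)


if __name__ == "__main__":
    template = np.zeros((68, 3))                        # 68 template 3D landmarks
    speech = np.random.randn(16000).astype(np.float32)  # 1 s of audio at 16 kHz
    print(animate(template, speech).shape)
```
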
  • Patent number: 11763797
    Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 19, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
  • Patent number: 11715495
    Abstract: A computer-implemented method of processing video data comprising a sequence of image frames. The method includes isolating an instance of an object within the sequence of image frames, generating a modified instance of the object using a machine learning model, and modifying the video data to smoothly transition between at least part of the isolated instance of the object and a corresponding at least part of the modified instance of the object over a subsequence of the sequence of image frames.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: August 1, 2023
    Assignee: Flawless Holdings Limited
    Inventors: Scott Mann, Pablo Garrido, Hyeongwoo Kim, Sean Danischevsky, Robert Hall, Gary Myles Scullion
  • Patent number: 11712623
    Abstract: Provided is an information processing program that is executed at a terminal device that executes effect rendering for outputting video and audio, the information processing program causing the execution of: a second obtaining unit (74) that obtains skip point information indicating skip points for the effect rendering and skip arrival point information indicating a skip arrival point for the effect rendering; and an effect-rendering control unit (77) that controls the effect rendering by skipping the video data to a predetermined point on the basis of an accepted skip operation to resume the output of the video from that point, and in the case where the timing of accepting the skip operation does not coincide with the skip point, by waiting until the skip point after that timing and then skipping to a specific skip arrival point associated with that skip point, on the basis of the skip operation, to resume the output of the audio from that specific skip arrival point.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: August 1, 2023
    Assignee: CYGAMES, INC.
    Inventors: Michihiro Sato, Takamichi Yashiki, Soichiro Tamura
  • Patent number: 11698954
    Abstract: A user device, such as a smartphone or laptop, may be password (passphrase) protected. The user device may combine biometric input analysis, such as facial recognition, with viseme analysis to authenticate a user attempting to use a password (passphrase) to access the user device. Secure authentication methods and systems are described that account for variations in how, based on the user's emotion (e.g., mood, temperament, unique pronunciation, etc.), a password (passphrase) may be presented to the user device.
    Type: Grant
    Filed: May 7, 2021
    Date of Patent: July 11, 2023
    Assignee: Comcast Cable Communications, LLC
    Inventor: Fei Wan
  • Patent number: 11699455
    Abstract: Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
    Type: Grant
    Filed: September 4, 2020
    Date of Patent: July 11, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Zoe Adams, Pete Klein, Derick Deller, Bradley Michael Richards, Anirudh Ranganath
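
The entry above (Patent 11699455) associates visemes with audio using a Viterbi algorithm. A compact, generic Viterbi sketch over viseme states and per-frame emission scores; the probabilities here are illustrative placeholders, not values from the patent:

```python
import numpy as np


def viterbi(log_emissions: np.ndarray, log_transitions: np.ndarray) -> list:
    """Most likely state sequence.

    log_emissions:   (num_frames, num_states) log P(frame | state)
    log_transitions: (num_states, num_states) log P(next | current)
    """
    num_frames, num_states = log_emissions.shape
    score = np.full((num_frames, num_states), -np.inf)
    back = np.zeros((num_frames, num_states), dtype=int)

    score[0] = log_emissions[0]  # uniform prior folded into frame 0
    for t in range(1, num_frames):
        for s in range(num_states):
            candidates = score[t - 1] + log_transitions[:, s]
            back[t, s] = int(np.argmax(candidates))
            score[t, s] = candidates[back[t, s]] + log_emissions[t, s]

    path = [int(np.argmax(score[-1]))]
    for t in range(num_frames - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]


if __name__ == "__main__":
    visemes = ["rest", "aa", "p"]
    emissions = np.log(np.array([[0.8, 0.1, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.2, 0.6, 0.2],
                                 [0.7, 0.2, 0.1]]))
    # Slight preference for staying in the same viseme between frames.
    transitions = np.log(np.full((3, 3), 0.25) + np.eye(3) * 0.25)
    print([visemes[i] for i in viterbi(emissions, transitions)])
```
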
  • Patent number: 11610354
    Abstract: The present invention relates to a joint automatic audio-visual driven facial animation system that, in some example embodiments, includes a full-scale, state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) system with a strong language model for speech recognition, and obtains phoneme alignment from the word lattice.
    Type: Grant
    Filed: June 16, 2021
    Date of Patent: March 21, 2023
    Assignee: Snap Inc.
    Inventors: Chen Cao, Xin Chen, Wei Chu, Zehao Xue
  • Patent number: 11610111
    Abstract: Apparatuses and methods for real-time spectrum-driven embedded wireless networking through deep learning are provided. Radio frequency, optical, or acoustic communication apparatus include a programmable logic system having a front-end configuration core, a learning core, and a learning actuation core. The learning core includes a deep learning neural network that receives and processes input in-phase/quadrature (I/Q) input samples through the neural network layers to extract RF, optical, or acoustic spectrum information. A processing system having a learning controller module controls operations of the learning core and the learning actuation core. The processing system and the programmable logic system are operable to configure one or more communication and networking parameters for transmission via the transceiver in response to extracted spectrum information.
    Type: Grant
    Filed: October 3, 2019
    Date of Patent: March 21, 2023
    Assignee: Northeastern University
    Inventors: Francesco Restuccia, Tommaso Melodia
  • Patent number: 11600290
    Abstract: Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: March 7, 2023
    Assignee: LEXIA LEARNING SYSTEMS LLC
    Inventor: Carl Adrian Woffenden
  • Patent number: 11587278
    Abstract: Embodiments described herein provide an approach for animating a character face of an artificial character based on facial poses performed by a live actor. Geometric characteristics of the facial surface corresponding to each facial pose performed by the live actor may be learned by a machine learning system, which in turn builds a mesh of a facial rig of an array of controllable elements applicable to a character face of an artificial character.
    Type: Grant
    Filed: August 16, 2021
    Date of Patent: February 21, 2023
    Assignee: UNITY TECHNOLOGIES SF
    Inventors: Wan-duo Kurt Ma, Muhammad Ghifary
  • Patent number: 11553215
    Abstract: Techniques are described for providing alternative media content to a client device along with primary media content.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: January 10, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Alexandria Way-Wun Kravis, Joshua Danovitz, Brandon Scott Love, Felicia Yue, Jeromey Russell Goetz, Lars Christian Ulness
  • Patent number: 11551394
    Abstract: Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio for any unknown face and cannot be easily generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ substantially from the distribution the network has seen during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech-driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from a canonical face to a person-specific face. A second GAN-based texture generator network is conditioned on person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible using meta-learning to adapt to an unknown subject's traits and face orientation during inference.
    Type: Grant
    Filed: March 11, 2021
    Date of Patent: January 10, 2023
    Assignee: TATA CONSULTANCY SERVICES LIMITED
    Inventors: Sandika Biswas, Dipanjan Das, Sanjana Sinha, Brojeshwar Bhowmick
  • Patent number: 11423907
    Abstract: The application provides a virtual object image display method and apparatus, an electronic device, and a storage medium; it relates to the field of artificial intelligence, in particular to computer vision and deep learning, and may be applied to virtual object dialogue scenarios. The specific implementation scheme includes: segmenting acquired voice to obtain voice segments; predicting lip shape sequence information for the voice segments; searching for a corresponding lip shape image sequence based on the lip shape sequence information; performing lip fusion between the lip shape image sequence and a virtual object baseplate to obtain a virtual object image; and displaying the virtual object image. The application improves the ability to obtain a virtual object image.
    Type: Grant
    Filed: March 17, 2021
    Date of Patent: August 23, 2022
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Tianshu Hu, Mingming Ma, Tonghui Li, Zhibin Hong
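
The entry above (Patent 11423907) predicts a lip-shape sequence per voice segment, looks up a matching lip-image sequence, and fuses the lips onto a virtual-object baseplate. A toy numpy sketch of the lookup-and-fuse stage; the lip-shape keys, library, paste region, and plain overwrite (in place of real blending) are illustrative:

```python
import numpy as np

# Illustrative library: lip-shape label -> small lip image (H, W, 3).
LIP_LIBRARY = {
    "closed": np.zeros((20, 40, 3), dtype=np.uint8),
    "open":   np.full((20, 40, 3), 200, dtype=np.uint8),
}


def fuse_lips(baseplate: np.ndarray, lip_image: np.ndarray,
              top: int, left: int) -> np.ndarray:
    """Paste a lip image onto a copy of the virtual-object baseplate
    at the mouth region (a simple overwrite stands in for blending)."""
    frame = baseplate.copy()
    h, w, _ = lip_image.shape
    frame[top:top + h, left:left + w] = lip_image
    return frame


def render_sequence(baseplate: np.ndarray, lip_shape_sequence: list,
                    mouth_top: int = 120, mouth_left: int = 80) -> np.ndarray:
    """Look up a lip image for each predicted lip shape and fuse it onto
    the baseplate to produce one output frame per shape."""
    frames = [fuse_lips(baseplate, LIP_LIBRARY[s], mouth_top, mouth_left)
              for s in lip_shape_sequence]
    return np.stack(frames)


if __name__ == "__main__":
    base = np.full((200, 200, 3), 128, dtype=np.uint8)
    video = render_sequence(base, ["closed", "open", "open", "closed"])
    print(video.shape)  # (4, 200, 200, 3)
```
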
  • Patent number: 11417041
    Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
    Type: Grant
    Filed: February 12, 2020
    Date of Patent: August 16, 2022
    Assignee: Adobe Inc.
    Inventors: Dingzeyu Li, Yang Zhou, Jose Ignacio Echevarria Vallespi, Elya Shechtman
  • Patent number: 11406896
    Abstract: In one embodiment, a technique includes receiving a first story fragment from a storyteller interface of a first computing system. The technique further includes displaying the first story fragment on an audience interface of a second computing system. The technique also includes detecting a trigger on the audience interface of the second computing system. In the technique, the trigger corresponds to the first story fragment. The technique also includes identifying a special effect associated with the trigger. The technique further includes outputting the special effect and the first story fragment on at least one of the storyteller interface displayed on the first computing system and the audience interface displayed on the second computing system.
    Type: Grant
    Filed: October 5, 2018
    Date of Patent: August 9, 2022
    Assignee: Meta Platforms, Inc.
    Inventors: Vincent Charles Cheung, Baback Elmieh, Connie Yeewei Ho, Girish Patangay, Mark Alexander Walsh
  • Patent number: 8965011
    Abstract: A method of attenuating an input signal to obtain an output signal is described. The method comprises receiving the input signal, attenuating the input signal with a gain factor to obtain the output signal, applying a filter having a frequency response with a frequency-dependent filter gain to at least one of a copy of the input signal and a copy of the output signal to obtain a filtered signal, the frequency-dependent filter gain being arranged to emphasize frequencies within a number N of predetermined frequency ranges, N>1; wherein the filter comprises a sequence of N sub-filters, each one of the N sub-filters having a frequency response adapted to emphasize frequencies within a corresponding one of the N predetermined frequency ranges; determining a signal strength of the filtered signal, and determining the gain factor from at least the signal strength.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: February 24, 2015
    Assignee: Dialog Semiconductor B.V.
    Inventor: Michiel Andre Helsloot
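
The entry above (Patent 8965011) derives an attenuation gain from the strength of a signal filtered to emphasize N predetermined frequency ranges. A simplified numpy sketch of that measurement, using FFT-domain band emphasis in place of the patent's cascade of N sub-filters; the band edges, emphasis weight, and gain rule are illustrative assumptions:

```python
import numpy as np


def emphasize_bands(signal: np.ndarray, sample_rate: int,
                    bands: list, weight: float = 4.0) -> np.ndarray:
    """Return a filtered copy of the signal with the given frequency
    ranges (low_hz, high_hz) boosted by `weight`."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    gain = np.ones_like(freqs)
    for low, high in bands:
        gain[(freqs >= low) & (freqs < high)] = weight
    return np.fft.irfft(spectrum * gain, n=len(signal))


def attenuation_gain(signal: np.ndarray, sample_rate: int,
                     bands: list, target_rms: float = 0.1) -> float:
    """Derive a gain factor from the strength (RMS) of the band-emphasized
    signal: louder emphasized content leads to stronger attenuation."""
    filtered = emphasize_bands(signal, sample_rate, bands)
    strength = float(np.sqrt(np.mean(filtered ** 2))) + 1e-12
    return min(1.0, target_rms / strength)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    x = 0.5 * np.sin(2 * np.pi * 1000 * t)
    g = attenuation_gain(x, sr, bands=[(800, 1200), (2800, 3200)])
    print(round(g, 3))  # the output signal would then be g * x
```
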
  • Patent number: 8818131
    Abstract: Three dimensional models corresponding to a target image and a reference image are selected based on a set of feature points defining facial features in the target image and the reference image. The set of feature points defining the facial features in the target image and the reference image are associated with corresponding 3-dimensional models. A 3D motion flow between the 3-dimensional models is computed. The 3D motion flow is projected onto a 2D image plane to create a 2D optical field flow. The target image and the reference image are warped using the 2D optical field flow. A selected feature from the reference image is copied to the target image.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: August 26, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Jue Wang, Elya Shechtman, Lubomir D. Bourdev, Fei Yang
  • Patent number: 8438035
    Abstract: When there are missing voice-transmission-signals, a repetition-section calculating unit sets a plurality of repetition sections of different lengths that are determined to be similar to the voice-transmission-signals preceding the missing voice-transmission-signal, the repetition sections being determined with respect to stationary voice-transmission-signals stored in a normal signal storage unit, the stationary voice-transmission-signals being selected from the previously input voice-transmission-signals. A controller generates a concealment signal using the repetition sections.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: May 7, 2013
    Assignee: Fujitsu Limited
    Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
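
The entry above (Patent 8438035) conceals missing voice frames by repeating a stored section judged similar to the signal just before the gap. A toy numpy sketch that picks the best repetition section by normalized cross-correlation against the recent history; the candidate section lengths and the simple tiling are illustrative assumptions:

```python
import numpy as np


def best_repetition_section(history: np.ndarray, candidate_lengths: list) -> np.ndarray:
    """From the stored history, pick the past section most similar to the
    most recent samples, trying several section lengths."""
    best, best_score = history[-candidate_lengths[0]:], -np.inf
    for length in candidate_lengths:
        recent = history[-length:]
        # Compare `recent` against every earlier section of the same length.
        for start in range(0, len(history) - 2 * length):
            section = history[start:start + length]
            denom = np.linalg.norm(section) * np.linalg.norm(recent) + 1e-12
            score = float(np.dot(section, recent) / denom)
            if score > best_score:
                best, best_score = section, score
    return best


def conceal(history: np.ndarray, missing_samples: int,
            candidate_lengths=(80, 160, 240)) -> np.ndarray:
    """Build a concealment signal by tiling the chosen repetition section."""
    section = best_repetition_section(history, list(candidate_lengths))
    reps = int(np.ceil(missing_samples / len(section)))
    return np.tile(section, reps)[:missing_samples]


if __name__ == "__main__":
    t = np.arange(1600) / 8000.0
    stored = np.sin(2 * np.pi * 200 * t)                # stationary stored signal
    print(conceal(stored, missing_samples=160).shape)   # (160,)
```
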
  • Publication number: 20120284029
    Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.
    Type: Application
    Filed: May 2, 2011
    Publication date: November 8, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Lijuan Wang, Frank Soong
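
The publication above (20120284029) generates a trajectory of visual feature vectors from audio features and then retrieves the closest matching image sequence from an image library. A toy sketch of the retrieval stage using nearest-neighbour matching; the feature dimensionality and library are illustrative, and a real system would use the trained statistical model rather than random stand-ins:

```python
import numpy as np


def nearest_library_frames(visual_trajectory: np.ndarray,
                           library_features: np.ndarray) -> list:
    """For each predicted visual feature vector, return the index of the
    closest library image, giving a concatenated image sequence."""
    indices = []
    for target in visual_trajectory:
        distances = np.linalg.norm(library_features - target, axis=1)
        indices.append(int(np.argmin(distances)))
    return indices


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    library = rng.normal(size=(500, 16))     # features of 500 stored mouth images
    trajectory = rng.normal(size=(25, 16))   # 25 predicted visual feature vectors
    frame_indices = nearest_library_frames(trajectory, library)
    print(len(frame_indices), frame_indices[:5])
```
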
  • Publication number: 20120203555
    Abstract: An electronic device configured for encoding a watermarked signal is described. The electronic device includes modeler circuitry. The modeler circuitry determines parameters based on a first signal and a first-pass coded signal. The electronic device also includes coder circuitry coupled to the modeler circuitry. The coder circuitry performs a first-pass coding on a second signal to obtain the first-pass coded signal and performs a second-pass coding based on the parameters to obtain a watermarked signal.
    Type: Application
    Filed: October 18, 2011
    Publication date: August 9, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Stephane Pierre Villette, Daniel J. Sinder
  • Publication number: 20120203556
    Abstract: A method for decoding a signal on an electronic device is described. The method includes receiving a signal. The method also includes extracting a bitstream from the signal. The method further includes performing watermark error checking on the bitstream for multiple frames. The method additionally includes determining whether watermark data is detected based on the watermark error checking. The method also includes decoding the bitstream to obtain a decoded second signal if the watermark data is not detected.
    Type: Application
    Filed: October 18, 2011
    Publication date: August 9, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Stephane Pierre Villette, Daniel J. Sinder
  • Publication number: 20120130720
    Abstract: An information providing device takes an image of a predetermined area and obtains the taken image in the form of image data, while externally obtaining voice data representing speech. The information providing device obtains text in a preset language corresponding to the speech in the form of text data, based on the obtained voice data, generates a composite image including the taken image and the text in the form of composite image data, based on the image data and the text data, and outputs the composite image data.
    Type: Application
    Filed: November 14, 2011
    Publication date: May 24, 2012
    Applicant: ELMO COMPANY LIMITED
    Inventor: Yasushi Suda
  • Publication number: 20100082345
    Abstract: An “Animation Synthesizer” uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
    Type: Application
    Filed: September 26, 2008
    Publication date: April 1, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong