Synchronization of Speech with Image or Synthesis of the Lips Movement from Speech, e.g., for "Talking Heads," etc. (EPO) Patents (Class 704/E21.02)
-
Patent number: 12243552
Abstract: Eyewear having a speech to moving lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to the speech and utterances on a mask of the viewed person. A database having text to moving lips information is utilized to translate the speech and generate the moving lips in near-real time with little latency. This translation provides deaf/hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear when that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
Type: Grant
Filed: April 2, 2024
Date of Patent: March 4, 2025
Assignee: Snap Inc.
Inventor: Kathleen Worthington McMahon
-
Patent number: 12230288
Abstract: Systems and methods for audio processing are described. An audio processing system receives audio content that includes a voice sample. The audio processing system analyzes the voice sample to identify a sound type in the voice sample. The sound type corresponds to pronunciation of at least one specified character in the voice sample. The audio processing system generates a filtered voice sample at least in part by filtering the voice sample to modify the sound type. The audio processing system outputs the filtered voice sample.
Type: Grant
Filed: May 31, 2022
Date of Patent: February 18, 2025
Assignees: SONY INTERACTIVE ENTERTAINMENT LLC, SONY INTERACTIVE ENTERTAINMENT INC.
Inventors: Jin Zhang, Celeste Bean, Sepideh Karimi, Sudha Krishnamurthy
-
Patent number: 12223936
Abstract: A method for synthesizing a video includes: acquiring audio data and dotting data corresponding to the audio data, the dotting data including a beat time point and a beat value corresponding to the beat time point of the audio data; acquiring a plurality of material images from a local source; and synthesizing, based on the dotting data, the plurality of material images and the audio data to acquire a synthesized video, a switching time point of each of the material images in the synthesized video being the beat time point of the audio data.
Type: Grant
Filed: November 22, 2019
Date of Patent: February 11, 2025
Assignee: Guangzhou Kugou Computer Technology Co., Ltd.
Inventors: Han Wu, Wentao Li
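The core of this abstract is a mapping from dotting data (beat time points) to image switching points on the synthesized video's timeline. Below is a minimal sketch of that mapping only, assuming the dotting data is already available as (beat time, beat value) pairs and leaving out the actual video encoding; the function and variable names are illustrative, not from the patent.

```python
from typing import List, Tuple

def plan_beat_synced_cuts(
    beat_points: List[Tuple[float, int]],   # (beat time in seconds, beat value)
    image_paths: List[str],
    audio_duration: float,
) -> List[Tuple[str, float, float]]:
    """Assign each material image to a timeline segment whose switching
    point is a beat time point of the audio, as the abstract describes."""
    # Keep only as many switching points as there are images to switch to.
    switch_times = sorted(t for t, _ in beat_points)[: max(len(image_paths) - 1, 0)]
    boundaries = [0.0] + switch_times + [audio_duration]
    return [(img, start, end)
            for img, start, end in zip(image_paths, boundaries, boundaries[1:])]

if __name__ == "__main__":
    beats = [(1.5, 1), (3.0, 2), (4.5, 1)]
    images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
    for img, start, end in plan_beat_synced_cuts(beats, images, audio_duration=6.0):
        print(f"{img}: {start:.1f}s -> {end:.1f}s")
```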
-
Patent number: 12154548
Abstract: A processor-implemented method for generating a lip-sync for a face to a target speech of a live session to a speech in one or more languages in-sync with improved visual quality using a machine learning model and a pre-trained lip-sync model is provided. The method includes (i) determining a visual representation of the face and an audio representation, the visual representation includes crops of the face; (ii) modifying the crops of the face to obtain masked crops; (iii) obtaining a reference frame from the visual representation at a second timestamp; (iv) combining the masked crops at the first timestamp with the reference frame to obtain lower half crops; (v) training the machine learning model by providing historical lower half crops and historical audio representations as training data; (vi) generating lip-synced frames for the face to the target speech; and (vii) generating in-sync lip-synced frames by the pre-trained lip-sync model.
Type: Grant
Filed: January 1, 2022
Date of Patent: November 26, 2024
Assignee: INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY, HYDERABAD
Inventors: C. V. Jawahar, Rudrabha Mukhopadhyay, K R Prajwal, Vinay Namboodiri
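Steps (ii)-(iv) amount to hiding the mouth region of the current face crop and pairing it with a reference crop from another timestamp, so the model has to infer the lower half of the face from audio. A rough numpy sketch of that input preparation follows, under the assumptions that crops are same-sized RGB arrays and that "masking" means zeroing the lower half; the abstract does not fix those details.

```python
import numpy as np

def prepare_lipsync_input(crop_t: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Zero the lower half of the face crop at the current timestamp and
    concatenate it channel-wise with a reference crop from another timestamp."""
    assert crop_t.shape == reference.shape      # (H, W, 3)
    masked = crop_t.copy()
    h = masked.shape[0]
    masked[h // 2:, :, :] = 0                    # hide the mouth region
    return np.concatenate([masked, reference], axis=-1)  # (H, W, 6) model input

if __name__ == "__main__":
    crop = np.random.rand(96, 96, 3)
    ref = np.random.rand(96, 96, 3)
    print(prepare_lipsync_input(crop, ref).shape)  # (96, 96, 6)
```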
-
Patent number: 12112417
Abstract: This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving experience of interaction between the user and the animation character.
Type: Grant
Filed: December 13, 2022
Date of Patent: October 8, 2024
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Linchao Bao, Shiyin Kang, Sheng Wang, Xiangkai Lin, Xing Ji, Zhantu Zhu, Kuongchi Lei, Deyi Tuo, Peng Liu
-
Patent number: 12105928
Abstract: Technologies for selectively augmenting communications transmitted by a communication device include a communication device configured to acquire new user environment information relating to the environment of the user if such new user environment information becomes available. The communication device is further configured to create one or more user environment indicators based on the new user environment information, to display the one or more created user environment indicators via a display of the communication device, and to include the created user environment indicator in a communication to be transmitted by the communication device if the created user environment indicator is selected for inclusion in the communication.
Type: Grant
Filed: February 10, 2022
Date of Patent: October 1, 2024
Assignee: Tahoe Research, Ltd.
Inventors: Glen J. Anderson, Jose K. Sia, Jr., Wendy March
-
Patent number: 12094047
Abstract: An animated emoticon generation method, a computer-readable storage medium, and a computer device are provided. The method includes: displaying an emoticon input panel on a chat page; detecting whether a video shooting event is triggered in the emoticon input panel; acquiring video data in response to detecting the video shooting event; obtaining an edit operation for the video data; processing video frames in the video data according to the edit operation to synthesize an animated emoticon; and adding an emoticon thumbnail corresponding to the animated emoticon to the emoticon input panel, the emoticon thumbnail displaying the animated emoticon to be used as a message on the chat page based on a user selecting the emoticon thumbnail in the emoticon input panel.
Type: Grant
Filed: March 23, 2023
Date of Patent: September 17, 2024
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LTD
Inventors: Dong Huang, Tian Yi Liang, Jia Wen Zhong, Jun Jie Zhou, Jin Jiang, Ying Qi, Si Kun Liang
-
Patent number: 12045639
Abstract: Embodiments of the present disclosure may include a system providing visual assistants with artificial intelligence, including an artificial intelligence large language model engine (LLM) coupled to a computer system.
Type: Grant
Filed: August 23, 2023
Date of Patent: July 23, 2024
Assignee: BitHuman Inc
Inventors: Yun Fu, Steve Gu
-
Patent number: 12039997
Abstract: Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
Type: Grant
Filed: March 7, 2023
Date of Patent: July 16, 2024
Assignee: LEXIA LEARNING SYSTEMS LLC
Inventor: Carl Adrian Woffenden
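The key output here is a viseme/intensity pair scheduled against the phoneme it corresponds to. A minimal sketch of that scheduling step, assuming phoneme timings have already been obtained from a forced alignment of the transcription; the phoneme-to-viseme table and the loudness-based intensity stand-in below are illustrative assumptions, not the patent's mapping.

```python
# Illustrative phoneme-to-viseme table (a small subset for demonstration only).
PHONEME_TO_VISEME = {"AA": "open", "M": "closed", "F": "teeth_on_lip", "IY": "wide"}

def schedule_visemes(phoneme_timings, loudness_fn):
    """phoneme_timings: list of (phoneme, start_s, end_s) from forced alignment.
    loudness_fn: callable mapping a time in seconds to a 0..1 loudness estimate,
    used here as a stand-in for the viseme intensity."""
    schedule = []
    for phoneme, start, end in phoneme_timings:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        intensity = loudness_fn((start + end) / 2.0)
        schedule.append({"time": start, "viseme": viseme, "intensity": intensity})
    return schedule

if __name__ == "__main__":
    timings = [("M", 0.00, 0.08), ("AA", 0.08, 0.22), ("M", 0.22, 0.30)]
    print(schedule_visemes(timings, loudness_fn=lambda t: 0.8))
```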
-
Patent number: 12033258
Abstract: A conversation augmentation system can automatically augment a conversation with content items based on natural language from the conversation. The conversation augmentation system can select content items to add to the conversation based on determined user "intents" generated using machine learning models. The conversation augmentation system can generate intents for natural language from various sources, such as video chats, audio conversations, textual conversations, virtual reality environments, etc. The conversation augmentation system can identify constraints for mapping the intents to content items or context signals for selecting appropriate content items. In various implementations, the conversation augmentation system can add selected content items to a storyline the conversation describes or can augment a platform in which an unstructured conversation is occurring.
Type: Grant
Filed: June 5, 2020
Date of Patent: July 9, 2024
Assignee: Meta Platforms Technologies, LLC
Inventors: Maheen Sohail, Hyunbin Park, Ruoni Wang, Vincent Charles Cheung
-
Patent number: 12019067
Abstract: A computer-implemented method of using interferometry to detect mass changes of objects in a solution includes obtaining a time series of images using interferometry, and performing background correction on each image by classifying pixels of the image as background pixels or object pixels, fitting only the background pixels of the image to a function to generate a background fitted function, and subtracting the background fitted function from the image to generate a background corrected image. The method includes performing segmentation on the background corrected images to resolve boundaries of one or more objects, performing motion tracking on the objects to track changes in position of the objects, determining respective masses of the motion tracked objects and determining, for each image in the time series, an aggregate mass based on the respective masses to determine whether the aggregate mass of the motion tracked objects is increasing or decreasing.
Type: Grant
Filed: May 20, 2021
Date of Patent: June 25, 2024
Assignee: NantBio, Inc.
Inventors: Kayvan Niazi, Krsto Sbutega
-
Patent number: 12014453
Abstract: A method for animating a graphical object by an electronic device is provided. The method includes receiving, by the electronic device, the graphical object having at least one predefined portion to animate. The method includes receiving, by the electronic device, an audio to obtain spectral frequencies of the audio. The method includes determining, by the electronic device, at least one of an intensity of the spectral frequencies and at least one range of the spectral frequencies. The method includes generating, by the electronic device, at least one motion on the at least one predefined portion of the graphical object based on the at least one of the intensity of the spectral frequencies and the at least one range of the spectral frequencies.
Type: Grant
Filed: March 30, 2022
Date of Patent: June 18, 2024
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Ramasamy Kannan, Vishakha S. R., Sagar Aggarwal, Lokesh Rayasandra Boregowda
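In essence, the method measures how much spectral energy the audio has and in which frequency range, then drives motion on the tagged portion of the graphic from those two quantities. A small numpy sketch of that audio-to-motion mapping; the band edges and the amplitude formula are arbitrary choices for illustration, not values from the patent.

```python
import numpy as np

def motion_from_audio(frame: np.ndarray, sample_rate: int) -> dict:
    """Map one audio frame to a motion amplitude and a coarse frequency range."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    intensity = float(spectrum.sum())
    dominant = float(freqs[int(np.argmax(spectrum))])
    # Arbitrary illustrative mapping: louder -> larger motion, band label from pitch.
    band = "low" if dominant < 300 else "mid" if dominant < 2000 else "high"
    amplitude = min(1.0, intensity / (len(frame) * 10.0))
    return {"amplitude": amplitude, "band": band}

if __name__ == "__main__":
    sr = 16000
    t = np.arange(1024) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440 * t)
    print(motion_from_audio(tone, sr))
```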
-
Patent number: 12002487
Abstract: Provided are an information processing apparatus, an information processing method, and a program that make it possible to assign more natural motions reflecting the emotions of a character. [Solving Means] The information processing apparatus includes a control unit configured to perform the processing of: determining an emotion on the basis of a result of utterance sentence analysis performed on an utterance sentence of a character included in a scenario; selecting, depending on a content of the utterance sentence and the emotion determined, a motion of the character that is synchronized with the utterance sentence; adjusting a character movement speed based on the selected motion and an intimacy between the character and a user; and adding, to the scenario, a description for adjusting presentation of the selected motion to match a voice output timing of the utterance sentence.
Type: Grant
Filed: February 22, 2019
Date of Patent: June 4, 2024
Assignee: SONY GROUP CORPORATION
Inventor: Hideo Nagasaka
-
Patent number: 12002138
Abstract: Embodiments of this application disclose a speech-driven animation method and apparatus based on artificial intelligence (AI). The method includes obtaining a first speech, the first speech comprising a plurality of speech frames; determining linguistics information corresponding to a speech frame in the first speech, the linguistics information being used for identifying a distribution possibility that the speech frame in the first speech pertains to phonemes; determining an expression parameter corresponding to the speech frame in the first speech according to the linguistics information; and enabling, according to the expression parameter, an animation character to make an expression corresponding to the first speech.
Type: Grant
Filed: October 8, 2021
Date of Patent: June 4, 2024
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Shiyin Kang, Deyi Tuo, Kuongchi Lei, Tianxiao Fu, Huirong Huang, Dan Su
-
Patent number: 11972516
Abstract: A device for generating a speech video according to an embodiment has one or more processors and a memory storing one or more programs executable by the one or more processors, and the device includes a video part generator configured to receive a person background image of a person and generate a video part of a speech video of the person; and an audio part generator configured to receive text, generate an audio part of the speech video of the person, and provide speech-related information occurring during the generation of the audio part to the video part generator.
Type: Grant
Filed: June 19, 2020
Date of Patent: April 30, 2024
Assignee: DEEPBRAIN AI INC.
Inventors: Gyeongsu Chae, Guembuel Hwang, Sungwoo Park, Seyoung Jang
-
Patent number: 11967336
Abstract: A computing device according to an embodiment is provided with one or more processors and a memory storing one or more programs executed by the one or more processors. The computing device includes a standby state video generating module that generates a standby state video in which a person in a video is in a standby state, a speech state video generating module that generates a speech state video in which a person in a video is in a speech state based on a source of speech content, and a video reproducing module that reproduces the standby state video and generates a synthesized speech video by synthesizing the standby state video being reproduced and the speech state video.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 23, 2024
Assignee: DEEPBRAIN AI INC.
Inventor: Doohyun Kim
-
Patent number: 11955135
Abstract: Eyewear having a speech to moving lips algorithm that receives and translates speech and utterances of a person viewed through the eyewear, and then displays an overlay of moving lips corresponding to the speech and utterances on a mask of the viewed person. A database having text to moving lips information is utilized to translate the speech and generate the moving lips in near-real time with little latency. This translation provides deaf/hearing-impaired users the ability to understand and communicate with the person viewed through the eyewear when that person is wearing a mask. The translation may include automatic speech recognition (ASR) and natural language understanding (NLU) as a sound recognition engine.
Type: Grant
Filed: August 23, 2021
Date of Patent: April 9, 2024
Assignee: Snap Inc.
Inventor: Kathleen Worthington McMahon
-
Patent number: 11934636
Abstract: Disclosed are systems, methods, and computer-readable storage media to provide voice driven dynamic menus. One aspect disclosed is a method including receiving, by an electronic device, video data and audio data, displaying, by the electronic device, a video window, determining, by the electronic device, whether the audio data includes a voice signal, displaying, by the electronic device, a first menu in the video window in response to the audio data including a voice signal, displaying, by the electronic device, a second menu in the video window in response to a voice signal being absent from the audio data, receiving, by the electronic device, input from the displayed menu, and writing, by the electronic device, to an output device based on the received input.
Type: Grant
Filed: March 27, 2023
Date of Patent: March 19, 2024
Assignee: SNAP INC.
Inventor: Jesse Chand
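The decision point in this method is simply whether the captured audio contains a voice signal; which menu is displayed follows from that branch. A rough energy-threshold sketch of the branch is shown below; a real implementation would use a proper voice-activity detector, and the threshold value is an assumption.

```python
import numpy as np

VOICE_ENERGY_THRESHOLD = 0.01   # illustrative; would be tuned for the capture pipeline

def contains_voice(audio: np.ndarray) -> bool:
    """Very rough voice-activity check based on mean frame energy."""
    return float(np.mean(audio.astype(np.float64) ** 2)) > VOICE_ENERGY_THRESHOLD

def choose_menu(audio: np.ndarray) -> str:
    # First menu when a voice signal is present, second menu otherwise,
    # mirroring the branch described in the abstract.
    return "first_menu" if contains_voice(audio) else "second_menu"

if __name__ == "__main__":
    silence = np.zeros(16000)
    speech_like = 0.3 * np.random.randn(16000)
    print(choose_menu(silence), choose_menu(speech_like))
```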
-
Patent number: 11934461
Abstract: A method uses natural language for visual analysis of a dataset. A data visualization application displays a data visualization, at a computer, based on a dataset retrieved from a database using a set of one or more queries. A user specifies a natural language command related to the displayed data visualization, and the computer extracts an analytic phrase from the natural language command. The computer computes semantic relatedness between the analytic phrase and numeric data fields in the dataset. The computer identifies numeric data fields having highest semantic relatedness to the analytic phrase, and also selects a relevant numerical function. The numerical function compares data values in the numeric data fields to a threshold value. The computer retrieves an updated dataset that filters the identified numeric data fields according to the numeric function. The computer then displays an updated data visualization using the updated dataset.
Type: Grant
Filed: June 8, 2021
Date of Patent: March 19, 2024
Assignee: Tableau Software, Inc.
Inventors: Vidya R. Setlur, Sarah E. Battersby, Melanie K. Tory, Richard C. Gossweiler, III, Angel Xuan Chang, Isaac James Dykeman, Md Enamul Hoque Prince
-
Patent number: 11922543
Abstract: A method, performed by a coloring apparatus, of coloring a sketch image includes adding a color pointer on the sketch image, according to an input of a user; determining an object related to a point where the color pointer is located, from among objects configuring the sketch image; and generating a colored image by coloring the determined object, based on a color of the color pointer.
Type: Grant
Filed: January 18, 2022
Date of Patent: March 5, 2024
Assignee: NAVER WEBTOON LTD.
Inventors: Jun Hyun Park, Yu Ra Shin, Du Yong Lee, Joo Young Moon
-
Patent number: 11908478
Abstract: A method for generating speech includes uploading a reference set of features that were extracted from sensed movements of one or more target regions of skin on faces of one or more reference human subjects in response to words articulated by the subjects and without contacting the one or more target regions. A test set of features is extracted from the sensed movements of at least one of the target regions of skin on a face of a test subject in response to words articulated silently by the test subject and without contacting the one or more target regions. The extracted test set of features is compared to the reference set of features, and, based on the comparison, a speech output is generated that includes the articulated words of the test subject.
Type: Grant
Filed: March 7, 2023
Date of Patent: February 20, 2024
Assignee: Q (Cue) Ltd.
Inventors: Aviad Maizels, Avi Barliya, Yonatan Wexler
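The recognition step reduces to comparing a test feature vector extracted from skin movement against a library of reference feature vectors whose words are known. A nearest-neighbour sketch of that comparison follows; the distance metric, feature layout, and word-level granularity are assumptions the abstract does not fix.

```python
import numpy as np

def recognize_word(test_features: np.ndarray,
                   reference_features: np.ndarray,
                   reference_words: list) -> str:
    """Return the word whose reference feature vector is closest to the test vector."""
    distances = np.linalg.norm(reference_features - test_features, axis=1)
    return reference_words[int(np.argmin(distances))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = rng.normal(size=(3, 16))              # one feature vector per reference word
    words = ["yes", "no", "hello"]
    test = refs[2] + 0.05 * rng.normal(size=16)  # noisy copy of the "hello" features
    print(recognize_word(test, refs, words))     # -> hello
```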
-
Patent number: 11860925
Abstract: In some examples, human centered computing based digital persona generation may include generating, for a digital persona that is to be generated for a target person, synthetic video files and synthetic audio files that are combined to generate synthetic media files. The digital persona may be generated based on a synthetic media file. An inquiry may be received from a user of the generated digital persona. Another synthetic media file may be used by the digital persona to respond to the inquiry. A real-time emotion of the user may be analyzed based on a text sentiment associated with the inquiry, and a voice sentiment and a facial expression associated with the user. Based on the real-time emotion of the user, a further synthetic media file may be utilized by the digital persona to continue or modify a conversation between the generated digital persona and the user.
Type: Grant
Filed: April 5, 2021
Date of Patent: January 2, 2024
Assignee: ACCENTURE GLOBAL SOLUTIONS LIMITED
Inventors: Nisha Ramachandra, Manish Ahuja, Raghotham M Rao, Neville Dubash, Sanjay Podder, Rekha M. Menon
-
Patent number: 11830120
Abstract: A computing device according to an embodiment includes one or more processors, a memory storing one or more programs executed by the one or more processors, a standby state image generating module configured to generate a standby state image in which a person is in a standby state, and generate a back-motion image set including a plurality of back-motion images at a preset frame interval from the standby state image for image interpolation between a preset reference frame of the standby state image, a speech state image generating module configured to generate a speech state image in which a person is in a speech state based on a source of speech content, and an image playback module configured to generate a synthetic speech image by combining the standby state image and the speech state image while playing the standby state image.
Type: Grant
Filed: July 9, 2021
Date of Patent: November 28, 2023
Assignee: DEEPBRAIN AI INC.
Inventor: Doo Hyun Kim
-
Patent number: 11830119
Abstract: Various implementations disclosed herein include devices, systems, and methods for modifying an environment based on sound. In some implementations, a device includes one or more processors, a display and a non-transitory memory. In some implementations, a method includes displaying a computer graphics environment that includes an object. In some implementations, the method includes detecting, via an audio sensor, a sound from a physical environment of the device. In some implementations, the sound is associated with one or more audio characteristics. In some implementations, the method includes modifying a visual property of the object based on the one or more audio characteristics of the sound.
Type: Grant
Filed: April 28, 2021
Date of Patent: November 28, 2023
Assignee: APPLE INC.
Inventors: Ronald Vernon Ong Siy, Scott Rick Jones, John Brady Parell, Christopher Harlan Paul
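The modification step boils down to extracting a few audio characteristics from the microphone signal and mapping them to visual properties of the displayed object. A toy sketch of one such mapping follows; which characteristic drives which property is not fixed by the abstract, so the loudness-to-brightness and pitch-to-hue choices below are purely illustrative.

```python
import numpy as np

def visual_properties_from_sound(audio: np.ndarray, sample_rate: int) -> dict:
    """Map loudness to brightness and dominant frequency to hue, as one example."""
    loudness = float(np.sqrt(np.mean(audio ** 2)))
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    dominant = float(freqs[int(np.argmax(spectrum))])
    return {
        "brightness": min(1.0, loudness * 5.0),            # louder -> brighter
        "hue_degrees": (dominant % 1000) / 1000 * 360.0,    # pitch -> colour shift
    }

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    print(visual_properties_from_sound(0.2 * np.sin(2 * np.pi * 880 * t), sr))
```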
-
Patent number: 11817127
Abstract: The present disclosure provides a video dubbing method, an apparatus, a device, and a storage medium. The method includes: when receiving an audio recording start trigger operation for a first time point of a target video and starting from a video picture corresponding to the first time point, playing the target video based on a timeline and receiving audio data based on the timeline; and when receiving an audio recording end trigger operation for a second time point, generating an audio recording file. The audio recording file has a linkage relationship with a timeline of a video clip taking the video picture corresponding to the first time point as a starting frame and taking a video picture corresponding to the second time point as an ending frame.
Type: Grant
Filed: August 10, 2022
Date of Patent: November 14, 2023
Assignee: BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
Inventors: Yan Zeng, Chen Zhao, Qifan Zheng, Pingfei Fu
-
Patent number: 11783524
Abstract: A method for providing visual sequences using one or more images comprising: receiving one or more person images showing at least one face; receiving a message to be enacted by the person, wherein the message comprises at least a text or an emotional and movement command; processing the message to extract or receive audio data related to the voice of the person and facial movement data related to the expression to be carried on the face of the person; processing the image(s), the audio data, and the facial movement data; and generating an animation of the person enacting the message. The emotional and movement command is a GUI- or multimedia-based instruction to invoke the generation of facial expression(s) and/or body part movement(s).
Type: Grant
Filed: February 10, 2017
Date of Patent: October 10, 2023
Inventor: Nitin Vats
-
Patent number: 11776577
Abstract: A 3D camera tracking and live compositing system includes software and hardware integration and allows users to create, in conjunction with existing programs, live composite video. A video camera, a tracking sensor, an encoder, a composite monitor, and a software engine and plugin receive video and data from and integrate it with existing programs to generate real time composite video. The composite feed can be viewed and manipulated by users while filming. Features include 3D masking, depth layering, teleporting, axis locking, motion scaling, and freeze tracking. A storyboarding archive can be used to quickly load scenes with the location, lighting setups, lens profiles, and other settings associated with a saved photo. The video camera's movements can be recorded with video to be later applied to other 3D digital assets in post-production. The system also allows users to load scenes based on a 3D data set created with LIDAR.
Type: Grant
Filed: September 22, 2020
Date of Patent: October 3, 2023
Assignee: Mean Cat Entertainment LLC
Inventors: Donnie Ocean, Michael Batty, John Hoehl
-
Patent number: 11776188
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
Type: Grant
Filed: August 15, 2022
Date of Patent: October 3, 2023
Assignee: ADOBE INC.
Inventors: Dingzeyu Li, Yang Zhou, Jose Ignacio Echevarria Vallespi, Elya Shechtman
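Structurally, the animation loop feeds successive audio windows together with the template landmarks into a learned predictor and collects one landmark set per output frame. A skeleton of that loop is sketched below with a placeholder predictor standing in for the trained network; the window length, frame rate, and 68-landmark convention are assumptions for illustration.

```python
import numpy as np

def predict_landmarks(audio_window: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Placeholder for the trained network: perturb the template landmarks
    by an amount tied to the window's energy. A real model would be learned."""
    energy = float(np.sqrt(np.mean(audio_window ** 2)))
    offset = np.zeros_like(template)
    offset[48:, 1] = -energy * 0.05        # move the "mouth" landmarks with loudness
    return template + offset

def animate(audio: np.ndarray, template: np.ndarray,
            sample_rate: int = 16000, fps: int = 25) -> np.ndarray:
    """Slide a per-frame window over the audio and predict landmarks for each frame."""
    hop = sample_rate // fps
    frames = [predict_landmarks(audio[start:start + hop], template)
              for start in range(0, len(audio) - hop + 1, hop)]
    return np.stack(frames)                # (num_frames, 68, 3)

if __name__ == "__main__":
    template = np.random.rand(68, 3)       # template 3D landmarks from the input image
    audio = 0.3 * np.random.randn(16000)   # one second of audio
    print(animate(audio, template).shape)  # (25, 68, 3)
```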
-
Patent number: 11763797
Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
Type: Grant
Filed: June 23, 2020
Date of Patent: September 19, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
-
Patent number: 11715495
Abstract: A computer-implemented method of processing video data comprising a sequence of image frames. The method includes isolating an instance of an object within the sequence of image frames, generating a modified instance of the object using a machine learning model, and modifying the video data to smoothly transition between at least part of the isolated instance of the object and a corresponding at least part of the modified instance of the object over a subsequence of the sequence of image frames.
Type: Grant
Filed: December 23, 2021
Date of Patent: August 1, 2023
Assignee: Flawless Holdings Limited
Inventors: Scott Mann, Pablo Garrido, Hyeongwoo Kim, Sean Danischevsky, Robert Hall, Gary Myles Scullion
-
Patent number: 11712623
Abstract: Provided is an information processing program that is executed at a terminal device that executes effect rendering for outputting video and audio, the information processing program causing the execution of: a second obtaining unit (74) that obtains skip point information indicating skip points for the effect rendering and skip arrival point information indicating a skip arrival point for the effect rendering; and an effect-rendering control unit (77) that controls the effect rendering by skipping the video data to a predetermined point on the basis of an accepted skip operation to resume the output of the video from that point, and in the case where the timing of accepting the skip operation does not coincide with the skip point, by waiting until the skip point after that timing and then skipping to a specific skip arrival point associated with that skip point, on the basis of the skip operation, to resume the output of the audio from that specific skip arrival point.
Type: Grant
Filed: September 25, 2019
Date of Patent: August 1, 2023
Assignee: CYGAMES, INC.
Inventors: Michihiro Sato, Takamichi Yashiki, Soichiro Tamura
-
Patent number: 11698954
Abstract: A user device, such as a smartphone or laptop, may be password (passphrase) protected. The user device may combine biometric input analysis, such as facial recognition, with viseme analysis to authenticate a user attempting to use a password (passphrase) to access the user device. Secure authentication methods and systems are described that account for variations in how, based on the user's emotion (e.g., mood, temperament, unique pronunciation, etc.), a password (passphrase) may be presented to the user device.
Type: Grant
Filed: May 7, 2021
Date of Patent: July 11, 2023
Assignee: Comcast Cable Communications, LLC
Inventor: Fei Wan
-
Patent number: 11699455
Abstract: Systems and methods for viseme data generation are disclosed. Uncompressed audio data is generated and/or utilized to determine the beats per minute of the audio data. Visemes are associated with the audio data utilizing a Viterbi algorithm and the beats per minute. A time-stamped list of viseme data is generated that associates the visemes with the portions of the audio data that they correspond to. An animatronic toy and/or an animation is caused to lip sync using the viseme data while audio corresponding to the audio data is output.
Type: Grant
Filed: September 4, 2020
Date of Patent: July 11, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Zoe Adams, Pete Klein, Derick Deller, Bradley Michael Richards, Anirudh Ranganath
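The association step described here is, at heart, a Viterbi decode: given per-frame scores for each candidate viseme and a transition penalty that discourages jittery switching, pick the best viseme path over time. A compact sketch of such a decode follows; the scores and penalty are illustrative, and the patent additionally folds in the beats per minute, which is omitted here.

```python
import numpy as np

def viterbi_visemes(frame_scores: np.ndarray, switch_penalty: float = 1.0) -> list:
    """frame_scores: (num_frames, num_visemes) log-likelihood-style scores per frame.
    Returns the most likely viseme index for each frame."""
    n_frames, n_visemes = frame_scores.shape
    best = frame_scores[0].copy()
    back = np.zeros((n_frames, n_visemes), dtype=int)
    for t in range(1, n_frames):
        # Score of arriving at each viseme j from each previous viseme i.
        trans = best[:, None] - switch_penalty * (1 - np.eye(n_visemes))
        back[t] = np.argmax(trans, axis=0)
        best = trans[back[t], np.arange(n_visemes)] + frame_scores[t]
    # Backtrace the best path.
    path = [int(np.argmax(best))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    scores = rng.normal(size=(10, 4))   # 10 frames, 4 candidate visemes
    print(viterbi_visemes(scores))
```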
-
Patent number: 11610354
Abstract: The present invention relates to a joint automatic audio visual driven facial animation system that in some example embodiments includes a full scale state of the art Large Vocabulary Continuous Speech Recognition (LVCSR) with a strong language model for speech recognition and obtained phoneme alignment from the word lattice.
Type: Grant
Filed: June 16, 2021
Date of Patent: March 21, 2023
Assignee: Snap Inc.
Inventors: Chen Cao, Xin Chen, Wei Chu, Zehao Xue
-
Patent number: 11610111
Abstract: Apparatuses and methods for real-time spectrum-driven embedded wireless networking through deep learning are provided. Radio frequency, optical, or acoustic communication apparatus include a programmable logic system having a front-end configuration core, a learning core, and a learning actuation core. The learning core includes a deep learning neural network that receives and processes input in-phase/quadrature (I/Q) input samples through the neural network layers to extract RF, optical, or acoustic spectrum information. A processing system having a learning controller module controls operations of the learning core and the learning actuation core. The processing system and the programmable logic system are operable to configure one or more communication and networking parameters for transmission via the transceiver in response to extracted spectrum information.
Type: Grant
Filed: October 3, 2019
Date of Patent: March 21, 2023
Assignee: Northeastern University
Inventors: Francesco Restuccia, Tommaso Melodia
-
Patent number: 11600290
Abstract: Aspects of this disclosure provide techniques for generating a viseme and corresponding intensity pair. In some embodiments, the method includes generating, by a server, a viseme and corresponding intensity pair based at least on one of a clean vocal track or corresponding transcription. The method may include generating, by the server, a compressed audio file based at least on one of the viseme, the corresponding intensity, music, or visual offset. The method may further include generating, by the server or a client end application, a buffer of raw pulse-code modulated (PCM) data based on decoding at least a part of the compressed audio file, where the viseme is scheduled to align with a corresponding phoneme.
Type: Grant
Filed: September 9, 2020
Date of Patent: March 7, 2023
Assignee: LEXIA LEARNING SYSTEMS LLC
Inventor: Carl Adrian Woffenden
-
Patent number: 11587278
Abstract: Embodiments described herein provide an approach of animating a character face of an artificial character based on facial poses performed by a live actor. Geometric characteristics of the facial surface corresponding to each facial pose performed by the live actor may be learnt by a machine learning system, which in turn builds a mesh of a facial rig of an array of controllable elements applicable on a character face of an artificial character.
Type: Grant
Filed: August 16, 2021
Date of Patent: February 21, 2023
Assignee: UNITY TECHNOLOGIES SF
Inventors: Wan-duo Kurt Ma, Muhammad Ghifary
-
Patent number: 11553215
Abstract: Techniques are described for providing alternative media content to a client device along with primary media content.
Type: Grant
Filed: September 25, 2019
Date of Patent: January 10, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Alexandria Way-Wun Kravis, Joshua Danovitz, Brandon Scott Love, Felicia Yue, Jeromey Russell Goetz, Lars Christian Ulness
-
Audio-speech driven animated talking face generation using a cascaded generative adversarial network
Patent number: 11551394
Abstract: Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio on any unknown face and cannot be easily generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ considerably from the distribution the network has seen during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from a canonical face to a person-specific face. A second GAN based texture generator network is conditioned on person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible using meta learning to adapt to an unknown subject's traits and orientation of face during inference.
Type: Grant
Filed: March 11, 2021
Date of Patent: January 10, 2023
Assignee: TATA CONSULTANCY SERVICES LIMITED
Inventors: Sandika Biswas, Dipanjan Das, Sanjana Sinha, Brojeshwar Bhowmick
-
Patent number: 11423907
Abstract: The application provides a virtual object image display method and apparatus, an electronic device and a storage medium, relates to the field of artificial intelligence, in particular to the field of computer vision and deep learning, and may be applied to virtual object dialogue scenarios. The specific implementation scheme includes: segmenting acquired voice to obtain voice segments; predicting lip shape sequence information for the voice segments; searching for a corresponding lip shape image sequence based on the lip shape sequence information; performing lip fusion between the lip shape image sequence and a virtual object baseplate to obtain a virtual object image; and displaying the virtual object image. The application improves the ability to obtain the virtual object image.
Type: Grant
Filed: March 17, 2021
Date of Patent: August 23, 2022
Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
Inventors: Tianshu Hu, Mingming Ma, Tonghui Li, Zhibin Hong
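The pipeline in this abstract is: predict a lip-shape code per voice segment, look up a stored lip image for each code, and fuse it onto the virtual-object baseplate. A toy numpy sketch of the lookup-and-fuse stage is given below, assuming the lip library is a dict of same-sized RGB patches and the mouth region on the baseplate is a fixed box; both are assumptions, and the patent's fusion step is more sophisticated than a hard paste.

```python
import numpy as np

def fuse_lip_sequence(baseplate: np.ndarray, lip_library: dict,
                      lip_codes: list, mouth_box: tuple) -> list:
    """Paste the looked-up lip patch into the mouth region for every predicted code."""
    top, left, h, w = mouth_box
    frames = []
    for code in lip_codes:
        patch = lip_library[code]                  # (h, w, 3) lip image for this shape code
        frame = baseplate.copy()
        frame[top:top + h, left:left + w] = patch  # crude stand-in for lip fusion
        frames.append(frame)
    return frames

if __name__ == "__main__":
    base = np.zeros((256, 256, 3), dtype=np.uint8)
    library = {c: np.full((40, 64, 3), i * 60, dtype=np.uint8)
               for i, c in enumerate(["closed", "open", "round"])}
    frames = fuse_lip_sequence(base, library, ["closed", "open", "open", "round"],
                               mouth_box=(170, 96, 40, 64))
    print(len(frames), frames[0].shape)
```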
-
Patent number: 11417041
Abstract: Embodiments of the present invention provide systems, methods, and computer storage media for generating an animation of a talking head from an input audio signal of speech and a representation (such as a static image) of a head to animate. Generally, a neural network can learn to predict a set of 3D facial landmarks that can be used to drive the animation. In some embodiments, the neural network can learn to detect different speaking styles in the input speech and account for the different speaking styles when predicting the 3D facial landmarks. Generally, template 3D facial landmarks can be identified or extracted from the input image or other representation of the head, and the template 3D facial landmarks can be used with successive windows of audio from the input speech to predict 3D facial landmarks and generate a corresponding animation with plausible 3D effects.
Type: Grant
Filed: February 12, 2020
Date of Patent: August 16, 2022
Assignee: Adobe Inc.
Inventors: Dingzeyu Li, Yang Zhou, Jose Ignacio Echevarria Vallespi, Elya Shechtman
-
Patent number: 11406896
Abstract: In one embodiment, a technique includes receiving a first story fragment from a storyteller interface of a first computing system. The technique further includes displaying the first story fragment on an audience interface of a second computing system. The technique also includes detecting a trigger on the audience interface of the second computing system. In the technique, the trigger corresponds to the first story fragment. The technique also includes identifying a special effect associated with the trigger. The technique further includes outputting the special effect and the first story fragment on at least one of the storyteller interface displayed on the first computing system and the audience interface displayed on the second computing system.
Type: Grant
Filed: October 5, 2018
Date of Patent: August 9, 2022
Assignee: Meta Platforms, Inc.
Inventors: Vincent Charles Cheung, Baback Elmieh, Connie Yeewei Ho, Girish Patangay, Mark Alexander Walsh
-
Patent number: 8965011
Abstract: A method of attenuating an input signal to obtain an output signal is described. The method comprises receiving the input signal; attenuating the input signal with a gain factor to obtain the output signal; applying a filter having a frequency response with a frequency-dependent filter gain to at least one of a copy of the input signal and a copy of the output signal to obtain a filtered signal, the frequency-dependent filter gain being arranged to emphasize frequencies within a number N of predetermined frequency ranges, N>1, wherein the filter comprises a sequence of N sub-filters, each one of the N sub-filters having a frequency response adapted to emphasize frequencies within a corresponding one of the N predetermined frequency ranges; determining a signal strength of the filtered signal; and determining the gain factor from at least the signal strength.
Type: Grant
Filed: December 20, 2011
Date of Patent: February 24, 2015
Assignee: Dialog Semiconductor B.V.
Inventor: Michiel Andre Helsloot
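Read procedurally, the loop is: filter a copy of the signal so that N chosen frequency ranges are emphasized, measure the filtered signal's strength, derive the gain factor from that strength, and attenuate the input with it. A frequency-domain sketch of one pass follows; the band edges, emphasis gain, and gain law are all illustrative, and the per-band frequency-domain boost below only approximates the cascade of N time-domain sub-filters the patent describes.

```python
import numpy as np

def emphasize_bands(signal: np.ndarray, sample_rate: int,
                    bands: list, boost: float = 4.0) -> np.ndarray:
    """Boost the given (low_hz, high_hz) ranges; leave other frequencies at unity gain."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    gain = np.ones_like(freqs)
    for low, high in bands:                        # one emphasis region per sub-filter
        gain[(freqs >= low) & (freqs < high)] *= boost
    return np.fft.irfft(spectrum * gain, n=len(signal))

def attenuate(signal: np.ndarray, sample_rate: int, bands: list) -> np.ndarray:
    filtered = emphasize_bands(signal, sample_rate, bands)
    strength = float(np.sqrt(np.mean(filtered ** 2)))
    gain_factor = 1.0 / (1.0 + strength)           # illustrative gain law
    return gain_factor * signal

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 300 * t) + 0.2 * np.sin(2 * np.pi * 5000 * t)
    y = attenuate(x, sr, bands=[(200, 400), (4000, 6000)])  # N = 2 emphasized ranges
    print(round(float(np.max(np.abs(y))), 3))
```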
-
Patent number: 8818131
Abstract: Three dimensional models corresponding to a target image and a reference image are selected based on a set of feature points defining facial features in the target image and the reference image. The set of feature points defining the facial features in the target image and the reference image are associated with corresponding 3-dimensional models. A 3D motion flow between the 3-dimensional models is computed. The 3D motion flow is projected onto a 2D image plane to create a 2D optical field flow. The target image and the reference image are warped using the 2D optical field flow. A selected feature from the reference image is copied to the target image.
Type: Grant
Filed: November 24, 2010
Date of Patent: August 26, 2014
Assignee: Adobe Systems Incorporated
Inventors: Jue Wang, Elya Shechtman, Lubomir D. Bourdev, Fei Yang
-
Patent number: 8438035
Abstract: When there are missing voice-transmission-signals, a repetition-section calculating unit sets a plurality of repetition sections of different lengths that are determined to be similar to the voice-transmission-signals preceding the missing voice-transmission-signal, the repetition sections being determined with respect to stationary voice-transmission-signals stored in a normal signal storage unit, the stationary voice-transmission-signals being selected from the previously input voice-transmission-signals. A controller generates a concealment signal using the repetition sections.
Type: Grant
Filed: December 31, 2007
Date of Patent: May 7, 2013
Assignee: Fujitsu Limited
Inventors: Kaori Endo, Yasuji Ota, Chikako Matsumoto
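The concealment idea is to look back over the recently received stationary signal, find stretches of several candidate lengths whose tails resemble the samples just before the gap, and repeat the best one to fill the gap. A numpy sketch of picking a repetition section by normalized correlation follows; the candidate lengths and matching-window size are illustrative, not taken from the patent.

```python
import numpy as np

def best_repetition_section(history: np.ndarray, candidate_lengths, match_len: int = 80):
    """For each candidate length L, find the past section whose tail best matches
    the last `match_len` received samples; return the best section overall."""
    target = history[-match_len:]
    best_score, best_section = -np.inf, None
    for length in candidate_lengths:
        for start in range(0, len(history) - length - match_len):
            tail = history[start + length - match_len: start + length]
            score = float(np.dot(tail, target) /
                          (np.linalg.norm(tail) * np.linalg.norm(target) + 1e-9))
            if score > best_score:
                best_score, best_section = score, history[start: start + length]
    return best_section

def conceal(history: np.ndarray, missing_samples: int, candidate_lengths=(80, 160, 240)):
    """Fill a gap by repeating the best-matching past section."""
    section = best_repetition_section(history, candidate_lengths)
    reps = int(np.ceil(missing_samples / len(section)))
    return np.tile(section, reps)[:missing_samples]

if __name__ == "__main__":
    t = np.arange(1600)
    history = np.sin(2 * np.pi * t / 80)                 # stationary, period of 80 samples
    print(conceal(history, missing_samples=200).shape)   # (200,)
```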
-
Publication number: 20120284029
Abstract: Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.
Type: Application
Filed: May 2, 2011
Publication date: November 8, 2012
Applicant: MICROSOFT CORPORATION
Inventors: Lijuan Wang, Frank Soong
-
Publication number: 20120203555
Abstract: An electronic device configured for encoding a watermarked signal is described. The electronic device includes modeler circuitry. The modeler circuitry determines parameters based on a first signal and a first-pass coded signal. The electronic device also includes coder circuitry coupled to the modeler circuitry. The coder circuitry performs a first-pass coding on a second signal to obtain the first-pass coded signal and performs a second-pass coding based on the parameters to obtain a watermarked signal.
Type: Application
Filed: October 18, 2011
Publication date: August 9, 2012
Applicant: QUALCOMM Incorporated
Inventors: Stephane Pierre Villette, Daniel J. Sinder
-
Publication number: 20120203556
Abstract: A method for decoding a signal on an electronic device is described. The method includes receiving a signal. The method also includes extracting a bitstream from the signal. The method further includes performing watermark error checking on the bitstream for multiple frames. The method additionally includes determining whether watermark data is detected based on the watermark error checking. The method also includes decoding the bitstream to obtain a decoded second signal if the watermark data is not detected.
Type: Application
Filed: October 18, 2011
Publication date: August 9, 2012
Applicant: QUALCOMM Incorporated
Inventors: Stephane Pierre Villette, Daniel J. Sinder
-
Publication number: 20120130720
Abstract: An information providing device takes an image of a predetermined area and obtains the taken image in the form of image data, while externally obtaining voice data representing speech. The information providing device obtains text in a preset language corresponding to the speech in the form of text data, based on the obtained voice data, generates a composite image including the taken image and the text in the form of composite image data, based on the image data and the text data, and outputs the composite image data.
Type: Application
Filed: November 14, 2011
Publication date: May 24, 2012
Applicant: ELMO COMPANY LIMITED
Inventor: Yasushi Suda
-
Publication number: 20100082345
Abstract: An "Animation Synthesizer" uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
Type: Application
Filed: September 26, 2008
Publication date: April 1, 2010
Applicant: MICROSOFT CORPORATION
Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong