Patents by Inventor Huapeng Sima
Huapeng Sima has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12260481
Abstract: Disclosed are a method for generating a dynamic image based on audio, a device, and a storage medium, relating to the field of natural human-computer interactions. The method includes: obtaining a reference image and reference audio input by a user; determining a target head pose feature and a target expression coefficient feature based on the reference image and a trained generation network model, and adjusting the trained generation network model based on the target head pose feature and the target expression coefficient feature, to obtain a target generation network model; and processing a to-be-processed image based on the reference audio, the reference image, and the target generation network model, to obtain a target dynamic image. An image object in the to-be-processed image is the same as that in the reference image. In this way, a corresponding digital person can be obtained from a single picture of a target person.
Type: Grant
Filed: July 19, 2024
Date of Patent: March 25, 2025
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Maolin Zhang, Liyan Mao
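The pivotal step is adapting a trained generation network to two features predicted from a single reference image. A minimal PyTorch sketch of that conditioning step follows; the encoder, feature sizes, and module names are hypothetical illustrations, not the patented implementation.

```python
# Hypothetical sketch: predict head-pose and expression-coefficient
# features from one reference image, then form a conditioning vector.
import torch
import torch.nn as nn

class PoseExpressionEncoder(nn.Module):
    """Toy stand-in for the trained generation network's feature heads."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.pose_head = nn.Linear(16, 6)    # e.g. rotation + translation
        self.expr_head = nn.Linear(16, 64)   # expression coefficients

    def forward(self, image):
        h = self.backbone(image)
        return self.pose_head(h), self.expr_head(h)

encoder = PoseExpressionEncoder()
reference_image = torch.randn(1, 3, 256, 256)        # the user's single picture
target_pose, target_expr = encoder(reference_image)  # target features

# The generator would then be adjusted (conditioned) on these features so a
# dynamic image can be produced from the single reference picture.
conditioning = torch.cat([target_pose, target_expr], dim=-1)
print(conditioning.shape)  # torch.Size([1, 70])
```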
-
Patent number: 12254552
Abstract: This application provides an audio-driven three-dimensional facial animation model generation method and apparatus, and an electronic device.
Type: Grant
Filed: December 23, 2024
Date of Patent: March 18, 2025
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Zheng Liao
-
Publication number: 20250086826
Abstract: This application provides a digital person training method and system, and a digital person driving system. According to the method, human-body pose estimation data in training data is extracted, and the human-body pose estimation data is input into an optimized pose estimation network to obtain human-body pose optimization data. Generation losses of the position optimization data and the acceleration optimization data in the human-body pose optimization data are calculated based on a loss function of the optimized pose estimation network, so as to minimize the errors between the position and acceleration estimates and the real values. In this way, the optimized pose estimation network is driven to update its network parameters, yielding an optimal driving model based on the optimized pose estimation network.
Type: Application
Filed: August 19, 2024
Publication date: March 13, 2025
Inventors: Huapeng Sima, Hao Jiang, Hongwei Fan, Qixun Qu, Jiabin Li, Jintai Luan
-
Patent number: 12236635
Abstract: This application provides a digital person training method and system, and a digital person driving system. According to the method, human-body pose estimation data in training data is extracted, and the human-body pose estimation data is input into an optimized pose estimation network to obtain human-body pose optimization data. Generation losses of the position optimization data and the acceleration optimization data in the human-body pose optimization data are calculated based on a loss function of the optimized pose estimation network, so as to minimize the errors between the position and acceleration estimates and the real values. In this way, the optimized pose estimation network is driven to update its network parameters, yielding an optimal driving model based on the optimized pose estimation network.
Type: Grant
Filed: August 19, 2024
Date of Patent: February 25, 2025
Inventors: Huapeng Sima, Hao Jiang, Hongwei Fan, Qixun Qu, Jiabin Li, Jintai Luan
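The publication and grant above both hinge on a loss that penalizes position error and acceleration error jointly. Below is a minimal sketch of such a combined loss, assuming (batch, time, joints, 3) position tensors; the finite-difference acceleration and the loss weights are assumptions, not values from the filing.

```python
# Hypothetical sketch of a joint position/acceleration generation loss.
import torch
import torch.nn.functional as F

def acceleration(pos):
    # Second-order finite difference along the time axis (dim 1).
    return pos[:, 2:] - 2 * pos[:, 1:-1] + pos[:, :-2]

def pose_loss(pred_pos, true_pos, w_pos=1.0, w_acc=0.5):
    loss_pos = F.mse_loss(pred_pos, true_pos)        # position error
    loss_acc = F.mse_loss(acceleration(pred_pos),
                          acceleration(true_pos))    # acceleration error
    return w_pos * loss_pos + w_acc * loss_acc

pred = torch.randn(2, 10, 17, 3, requires_grad=True)
true = torch.randn(2, 10, 17, 3)
loss = pose_loss(pred, true)
loss.backward()  # drives the pose estimation network to update its parameters
```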
-
Patent number: 12223973
Abstract: Embodiments of the present application provide a speech conversion method and apparatus, a storage medium, and an electronic device. The method includes: acquiring a source speech to be converted and a target speech sample of a target speaker; recognizing a style category of the target speech sample, and extracting a target audio feature from the target speech sample according to the style category; extracting a source audio feature from the source speech; acquiring a first style feature of the target speech sample and determining a second style feature of the target speech sample according to the first style feature; fusing and mapping the source audio feature, the target audio feature, and the second style feature to obtain a joint encoding feature; and decoding the joint encoding feature to obtain a target speech feature, and converting the source speech based on the target speech feature to obtain a target speech.
Type: Grant
Filed: August 9, 2024
Date of Patent: February 11, 2025
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Ao Yao, Yiping Tang
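The fuse-and-map step can be pictured as a concatenation followed by a learned projection. A minimal sketch under that assumption; the feature dimensions and the single linear fusion layer are hypothetical.

```python
# Hypothetical sketch: fuse source, target, and style features into a
# joint encoding feature that a decoder would turn into target speech.
import torch
import torch.nn as nn

src_feat   = torch.randn(1, 100, 256)   # source audio feature (frames x dim)
tgt_feat   = torch.randn(1, 100, 256)   # target audio feature
style_feat = torch.randn(1, 100, 64)    # second style feature

fuse = nn.Linear(256 + 256 + 64, 512)   # fusion + mapping layer
joint = fuse(torch.cat([src_feat, tgt_feat, style_feat], dim=-1))
print(joint.shape)  # torch.Size([1, 100, 512]); decoded into target speech features
```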
-
Patent number: 12094046
Abstract: A digital human driving method and apparatus are provided, which relate to computer and image processing and can solve the problems of shaking, joint rotation malposition, and partial loss of a digital human during a driving process. The solution includes: capturing video data from multiple angles of view in a real three-dimensional space by multiple video capture devices; determining a first coordinate of a key point of the target human; determining a mapping relationship based on the first coordinate; calculating a second coordinate based on the mapping relationship and the first coordinate; processing the second coordinate according to a key point rotation model to obtain a rotation value of the virtual key point in the virtual three-dimensional space; and driving the digital human to move based on the rotation value of the virtual key point in the virtual three-dimensional space.
Type: Grant
Filed: January 23, 2024
Date of Patent: September 17, 2024
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Jintai Luan, Hongwei Fan, Jiabin Li, Hao Jiang, Qixun Qu
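In outline, a captured key point is mapped from the real space into the virtual space, and a rotation is derived from pairs of mapped key points. A minimal NumPy sketch under that reading; the affine mapping, the rest direction, and the example bone are hypothetical.

```python
# Hypothetical sketch: map a real-space key point to virtual space, then
# derive a bone rotation angle from two mapped key points.
import numpy as np

def map_to_virtual(p_real, scale, rotation, translation):
    """Second coordinate = mapping relationship applied to first coordinate."""
    return scale * rotation @ p_real + translation

def bone_rotation(parent, child, rest_dir=np.array([0.0, 1.0, 0.0])):
    """Angle (radians) between the bone's rest direction and its posed direction."""
    d = child - parent
    d = d / np.linalg.norm(d)
    return np.arccos(np.clip(d @ rest_dir, -1.0, 1.0))

R = np.eye(3)                                   # identity rotation for the demo
shoulder = map_to_virtual(np.array([0.1, 1.4, 0.2]), 1.0, R, np.zeros(3))
elbow    = map_to_virtual(np.array([0.1, 1.1, 0.2]), 1.0, R, np.zeros(3))
print(bone_rotation(shoulder, elbow))           # drives the virtual key point
```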
-
Patent number: 12056903
Abstract: Disclosed are a gated network-based generator, a generator training method, and a method for avoiding image coordinate adhesion. The generator processes, by using an image input layer, a to-be-processed image as an image sequence and inputs it to a feature encoding layer. Multiple feature encoding layers encode the image sequence by using a gated convolutional network to obtain an image code. Multiple image decoding layers then decode the image code by using an inverse gated convolution unit to obtain a target image sequence. Finally, an image output layer splices the target image sequence to obtain a target image. Character features in the resulting target image are more distinct, making details of the facial image of the generated digital human more vivid, thereby solving the problem of image coordinate adhesion in digital human images generated by existing generators using a generative adversarial network, and improving user experience.
Type: Grant
Filed: June 29, 2023
Date of Patent: August 6, 2024
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Maolin Zhang, Peiyu Wang
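The gated convolutional building block named in the abstract is commonly implemented as a feature branch modulated by a sigmoid gate; the decoder's inverse gated convolution unit would use transposed convolutions analogously. A minimal sketch of the encoding-side unit, with hypothetical channel sizes.

```python
# Hypothetical sketch of a gated convolution unit: the gate (in [0, 1])
# decides how much of each feature passes through.
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.feature = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.gate = nn.Conv2d(in_ch, out_ch, 3, padding=1)

    def forward(self, x):
        return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))

x = torch.randn(1, 3, 64, 64)
print(GatedConv2d(3, 16)(x).shape)  # torch.Size([1, 16, 64, 64])
```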
-
Patent number: 12051400
Abstract: This application provides a synthetic audio output method and apparatus, a storage medium, and an electronic device. The method includes: inputting input text and a specified target identity identifier into an audio output model; extracting an identity feature sequence of a target identity by an identity recognition model; extracting a phoneme feature sequence corresponding to the input text by an encoding layer of a speech synthesis model; superimposing the identity feature sequence of the target identity onto the phoneme feature sequence and inputting the result into a variable adapter of the speech synthesis model; after duration prediction and alignment, energy prediction, and pitch prediction are performed on the phoneme feature sequence by the variable adapter, outputting a target Mel-frequency spectrum feature corresponding to the input text through a decoding layer of the speech synthesis model; and inputting the target Mel-frequency spectrum feature into a vocoder to output synthetic audio.
Type: Grant
Filed: February 7, 2024
Date of Patent: July 30, 2024
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Haie Wu, Ao Yao, Da Jiang, Yiping Tang
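The superimposition of the identity feature sequence onto the phoneme feature sequence is, in essence, an element-wise addition of embeddings. A minimal sketch of that step; the embedding tables, vocabulary sizes, and dimensions are hypothetical stand-ins for the identity recognition model and encoding layer.

```python
# Hypothetical sketch: add an identity embedding to every phoneme feature
# before the variance-adapter stage of a speech synthesis model.
import torch
import torch.nn as nn

num_phonemes, num_speakers, dim = 70, 8, 256
phoneme_embed = nn.Embedding(num_phonemes, dim)   # encoding-layer stand-in
identity_embed = nn.Embedding(num_speakers, dim)  # identity-model stand-in

phonemes = torch.randint(0, num_phonemes, (1, 12))  # encoded input text
speaker = torch.tensor([3])                          # target identity identifier

# Superimpose (add) the identity feature onto every phoneme feature.
x = phoneme_embed(phonemes) + identity_embed(speaker).unsqueeze(1)

# The variable adapter would then predict duration, energy, and pitch from x,
# and a decoding layer would output a Mel-frequency spectrum for the vocoder.
print(x.shape)  # torch.Size([1, 12, 256])
```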
-
Publication number: 20240169592
Abstract: Disclosed are a gated network-based generator, a generator training method, and a method for avoiding image coordinate adhesion. The generator processes, by using an image input layer, a to-be-processed image as an image sequence and inputs it to a feature encoding layer. Multiple feature encoding layers encode the image sequence by using a gated convolutional network to obtain an image code. Multiple image decoding layers then decode the image code by using an inverse gated convolution unit to obtain a target image sequence. Finally, an image output layer splices the target image sequence to obtain a target image. Character features in the resulting target image are more distinct, making details of the facial image of the generated digital human more vivid, thereby solving the problem of image coordinate adhesion in digital human images generated by existing generators using a generative adversarial network, and improving user experience.
Type: Application
Filed: June 29, 2023
Publication date: May 23, 2024
Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng SIMA, Maolin ZHANG, Peiyu WANG
-
Patent number: 11928767
Abstract: Embodiments of the present disclosure provide a method for audio-driven character lip sync, a model for audio-driven character lip sync, and a training method therefor. A target dynamic image is obtained by acquiring a character image of a target character and speech for generating the target dynamic image, processing the character image and the speech into trainable image-audio data, and mixing the image-audio data with auxiliary data for training. When a large amount of sample data needs to be obtained for training in different scenarios, a video of another character speaking is used as an auxiliary video and processed to obtain the auxiliary data. The auxiliary data, which replaces non-general sample data, and other data are input into the model in a preset ratio for training. The auxiliary data may improve the training of the model's synthetic lip sync action, so that no parts unrelated to the synthetic lip sync action arise during the training process.
Type: Grant
Filed: June 21, 2023
Date of Patent: March 12, 2024
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Zheng Liao
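Mixing auxiliary data with other data "in a preset ratio" can be sketched as ratio-controlled batch sampling. A minimal example under that assumption; the 20% ratio, batch size, and placeholder pairs are hypothetical, not values from the patent.

```python
# Hypothetical sketch: draw training batches in which a preset fraction of
# image-audio pairs comes from auxiliary videos of other speakers.
import random

def mixed_batch(target_pairs, auxiliary_pairs, batch_size=32, aux_ratio=0.2):
    """Draw a batch in which roughly aux_ratio of samples are auxiliary."""
    n_aux = int(batch_size * aux_ratio)
    batch = random.sample(auxiliary_pairs, n_aux) \
          + random.sample(target_pairs, batch_size - n_aux)
    random.shuffle(batch)
    return batch

target_pairs = [("target_img_%d" % i, "target_audio_%d" % i) for i in range(100)]
aux_pairs = [("aux_img_%d" % i, "aux_audio_%d" % i) for i in range(100)]
print(len(mixed_batch(target_pairs, aux_pairs)))  # 32
```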
-
Publication number: 20240054711
Abstract: Embodiments of the present disclosure provide a method for audio-driven character lip sync, a model for audio-driven character lip sync, and a training method therefor. A target dynamic image is obtained by acquiring a character image of a target character and speech for generating the target dynamic image, processing the character image and the speech into trainable image-audio data, and mixing the image-audio data with auxiliary data for training. When a large amount of sample data needs to be obtained for training in different scenarios, a video of another character speaking is used as an auxiliary video and processed to obtain the auxiliary data. The auxiliary data, which replaces non-general sample data, and other data are input into the model in a preset ratio for training. The auxiliary data may improve the training of the model's synthetic lip sync action, so that no parts unrelated to the synthetic lip sync action arise during the training process.
Type: Application
Filed: June 21, 2023
Publication date: February 15, 2024
Inventors: Huapeng Sima, Zheng Liao
-
Publication number: 20240054811
Abstract: Embodiments of this disclosure provide a mouth shape correction model, and model training and application methods. The model includes a mouth feature extraction module, a key point extraction module, a first video module, a second video module, and a discriminator. The training method includes: based on a first original video and a second original video, extracting corresponding features by using the various modules in the model to train the model; and when the model meets a convergence condition, completing the training to generate a target mouth shape correction model. The application method includes: inputting a video in which a mouth shape of a digital-human actor is to be corrected and corresponding audio into a mouth shape correction model, to obtain a video in which the mouth shape of the digital-human actor is corrected, wherein the mouth shape correction model is a model trained by using the training method.
Type: Application
Filed: June 21, 2023
Publication date: February 15, 2024
Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng SIMA, Guo YANG
-
Patent number: 11887403
Abstract: Embodiments of this disclosure provide a mouth shape correction model, and model training and application methods. The model includes a mouth feature extraction module, a key point extraction module, a first video module, a second video module, and a discriminator. The training method includes: based on a first original video and a second original video, extracting corresponding features by using the various modules in the model to train the model; and when the model meets a convergence condition, completing the training to generate a target mouth shape correction model. The application method includes: inputting a video in which a mouth shape of a digital-human actor is to be corrected and corresponding audio into a mouth shape correction model, to obtain a video in which the mouth shape of the digital-human actor is corrected, wherein the mouth shape correction model is a model trained by using the training method.
Type: Grant
Filed: June 21, 2023
Date of Patent: January 30, 2024
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Guo Yang
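The model's structure, feature-extraction branches feeding a corrector whose output a discriminator judges, can be sketched as follows; every module here is a toy stand-in for the patented ones, which operate on video frames and audio rather than flat vectors.

```python
# Hypothetical sketch: combine mouth features and key points, generate a
# corrected-frame representation, and score it with a discriminator.
import torch
import torch.nn as nn

mouth_features = torch.randn(1, 128)   # mouth feature extraction module (stand-in)
keypoints = torch.randn(1, 68 * 2)     # key point extraction module (stand-in)

generator = nn.Linear(128 + 68 * 2, 256)        # corrected-frame features
discriminator = nn.Sequential(nn.Linear(256, 1), nn.Sigmoid())

corrected = generator(torch.cat([mouth_features, keypoints], dim=-1))
realism = discriminator(corrected)     # adversarial training signal
print(realism.item())                  # in (0, 1); train until convergence
```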
-
Patent number: 11875775
Abstract: The present disclosure proposes a speech conversion scheme for non-parallel corpus training, to remove dependence on parallel text and resolve the technical problem that speech conversion is difficult to achieve when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition model can be used for any source speaker, that is, the model is speaker-independent; and bottleneck features of audio are more abstract than phonetic posteriorgram features, reflect the decoupling of spoken content from the timbre of the speaker, and meanwhile are not closely bound to a phoneme class or in a clear one-to-one correspondence with it. In this way, the problem of inaccurate pronunciation caused by recognition errors in ASR is alleviated to some extent.
Type: Grant
Filed: April 20, 2021
Date of Patent: January 16, 2024
Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
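The key idea is taking content features from the bottleneck layer of a speaker-independent ASR model rather than from its phoneme posteriors. A minimal sketch, assuming a toy ASR with a linear bottleneck; module names and sizes are hypothetical.

```python
# Hypothetical sketch: a speaker-independent ASR's bottleneck layer yields
# content features decoupled from timbre, which a conversion model re-voices.
import torch
import torch.nn as nn

class TinyASR(nn.Module):
    def __init__(self, n_mels=80, bottleneck=64, n_phonemes=70):
        super().__init__()
        self.encoder = nn.Linear(n_mels, bottleneck)     # bottleneck layer
        self.classifier = nn.Linear(bottleneck, n_phonemes)

    def forward(self, mels):
        bn = torch.relu(self.encoder(mels))  # abstract content features
        return bn, self.classifier(bn)

asr = TinyASR()
mels = torch.randn(1, 200, 80)               # any source speaker's audio
bottleneck_feats, _ = asr(mels)
print(bottleneck_feats.shape)  # torch.Size([1, 200, 64]); conversion-model input
```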
-
Fitness action recognition model, method of training model, and method of recognizing fitness action
Patent number: 11854306
Abstract: A model including: an information extraction layer that obtains image information of a training object in a depth image; a pixel point positioning layer that performs position estimation of the three-dimensional coordinates of human-body key points, defines a body part of the training object as a body component, and calibrates the three-dimensional coordinates of all human-body key points corresponding to the body component; a feature extraction layer that extracts a key-point position feature, a body moving-speed feature, and a key-point moving-speed feature for action recognition; a vector dimensionality reduction layer that combines the key-point position feature, the body moving-speed feature, and the key-point moving-speed feature into a multidimensional feature vector and performs dimensionality reduction on it; and a feature vector classification layer that classifies the dimensionality-reduced multidimensional feature vector to recognize a fitness action.
Type: Grant
Filed: June 28, 2023
Date of Patent: December 26, 2023
Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
Inventors: Huapeng Sima, Hao Jiang, Hongwei Fan, Qixun Qu, Jintai Luan, Jiabin Li
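The last two layers can be pictured as feature concatenation, dimensionality reduction, and classification. A minimal scikit-learn sketch under that reading; PCA and an SVM are illustrative choices, since the abstract does not name the reduction or classification algorithms.

```python
# Hypothetical sketch: concatenate the three feature families, reduce the
# multidimensional vector, and classify it into a fitness action.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
key_point_pos = rng.normal(size=(n, 17 * 3))    # key-point position feature
body_speed = rng.normal(size=(n, 3))            # body moving-speed feature
key_point_speed = rng.normal(size=(n, 17 * 3))  # key-point moving-speed feature
labels = rng.integers(0, 5, size=n)             # five example fitness actions

features = np.hstack([key_point_pos, body_speed, key_point_speed])
reduced = PCA(n_components=16).fit_transform(features)  # dimensionality reduction
clf = SVC().fit(reduced, labels)                         # classification layer
print(clf.predict(reduced[:3]))
```
-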
Patent number: 11847726
Abstract: A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t-n)/2 time point based on input feature vectors of a previous layer between a t time point and a t-n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, sequentially outputting target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.
Type: Grant
Filed: July 22, 2022
Date of Patent: December 19, 2023
Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
Inventors: Huapeng Sima, Cuicui Tang, Zheng Liao
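One way to read the windowed rule, computing the next layer's vector at the midpoint (2t-n)/2 from the previous layer's vectors between t-n and t, is as a one-dimensional convolution with kernel size n+1, with the identity handled by one-hot encoding. A minimal sketch under that interpretation; the dimensions are hypothetical, and this is an illustration rather than the patented encoder.

```python
# Hypothetical sketch: each output frame of a Conv1d with kernel n+1
# depends on n+1 consecutive input frames, i.e. sits mid-window.
import torch
import torch.nn as nn

n = 4                                          # window length in frames
audio_feats = torch.randn(1, 32, 100)          # (batch, feature dim, time)
layer = nn.Conv1d(32, 32, kernel_size=n + 1)   # each output uses n+1 inputs
out = layer(audio_feats)
print(out.shape)                               # torch.Size([1, 32, 96])

# The target identifier is one-hot (binary vector) encoded separately:
identity = nn.functional.one_hot(torch.tensor([2]), num_classes=8).float()
print(identity)                                # target-identifier encoding feature
```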
-
Patent number: 11817079
Abstract: The present disclosure provides a GAN-based speech synthesis model, a training method, and a speech synthesis method. According to the speech synthesis method, to-be-converted text is obtained and converted into text phonemes, the phonemes are digitized to obtain text data, and the text data is converted into a text vector to be input into the speech synthesis model. In this way, target audio corresponding to the to-be-converted text is obtained. When a target Mel-frequency spectrum is generated by using the trained generator, the accuracy of the generated target Mel-frequency spectrum can reach that of a standard Mel-frequency spectrum. Through constant adversarial training of the generator against a discriminator, the acoustic losses of the target Mel-frequency spectrum are reduced, and the acoustic losses of the target audio generated from the target Mel-frequency spectrum are reduced as well, thereby improving the accuracy of the synthesized speech.
Type: Grant
Filed: June 16, 2023
Date of Patent: November 14, 2023
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Zhiqiang Mao
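The adversarial setup pits a generator producing Mel-frequency spectra against a discriminator trained on standard (real) spectra. A minimal sketch of the two losses; the linear generator and discriminator are toy stand-ins, not the patented architectures.

```python
# Hypothetical sketch of the adversarial losses in GAN-based synthesis:
# D learns to separate real from generated Mel frames; G learns to fool D.
import torch
import torch.nn as nn

G = nn.Linear(128, 80)                 # text vector -> Mel frame (stand-in)
D = nn.Sequential(nn.Linear(80, 1))    # real/fake logit (stand-in)
bce = nn.BCEWithLogitsLoss()

text_vec = torch.randn(16, 128)
real_mel = torch.randn(16, 80)         # standard Mel-frequency spectrum
fake_mel = G(text_vec)                 # target Mel-frequency spectrum

d_loss = bce(D(real_mel), torch.ones(16, 1)) + \
         bce(D(fake_mel.detach()), torch.zeros(16, 1))
g_loss = bce(D(fake_mel), torch.ones(16, 1))  # generator tries to fool D
print(d_loss.item(), g_loss.item())
```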
-
Patent number: 11763801
Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram (PPG) classification network model to obtain a PPG feature vector, where the PPG feature vector indicates a phoneme label corresponding to each frame of the source audio and contains text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model, and outputting an acoustic feature vector of target audio based on the phoneme label corresponding to the PPG feature vector, where the target audio contains a plurality of pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, and outputting the target audio through the voice coder.
Type: Grant
Filed: August 29, 2022
Date of Patent: September 19, 2023
Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
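The pipeline runs source audio through a PPG classification network, then a voice conversion network, and finally a voice coder. A minimal sketch of the first two stages; the linear modules, Mel-feature inputs, and dimensions are hypothetical stand-ins.

```python
# Hypothetical sketch: per-frame phoneme posteriors (PPG) from source audio,
# then acoustic features for the target timbre; a voice coder would render audio.
import torch
import torch.nn as nn

n_phonemes, n_mels = 70, 80
ppg_net = nn.Sequential(nn.Linear(n_mels, n_phonemes), nn.Softmax(dim=-1))
conversion_net = nn.Linear(n_phonemes, n_mels)   # PPG -> acoustic features

source_frames = torch.randn(1, 200, n_mels)      # source audio features
ppg = ppg_net(source_frames)                     # phoneme posteriors per frame
acoustic = conversion_net(ppg)                   # target-timbre acoustic features
print(ppg.shape, acoustic.shape)                 # voice coder would output audio
```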
-
Publication number: 20230215068
Abstract: A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t-n)/2 time point based on input feature vectors of a previous layer between a t time point and a t-n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, sequentially outputting target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.
Type: Application
Filed: July 22, 2022
Publication date: July 6, 2023
Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
Inventors: Huapeng SIMA, Cuicui TANG, Zheng LIAO
-
Publication number: 20230197061
Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram (PPG) classification network model to obtain a PPG feature vector, where the PPG feature vector indicates a phoneme label corresponding to each frame of the source audio and contains text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model, and outputting an acoustic feature vector of target audio based on the phoneme label corresponding to the PPG feature vector, where the target audio contains a plurality of pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, and outputting the target audio through the voice coder.
Type: Application
Filed: August 29, 2022
Publication date: June 22, 2023
Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong