Patents by Inventor Huapeng Sima

Huapeng Sima has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11928767
    Abstract: Embodiments of the present disclosure provide a method for audio-driven character lip sync, a model for audio-driven character lip sync, and a training method therefor. A target dynamic image is obtained by acquiring a character image of a target character and speech for generating the target dynamic image, processing the character image and the speech into trainable image-audio data, and mixing the image-audio data with auxiliary data for training. When a large amount of sample data needs to be obtained for training in different scenarios, a video of another character speaking is used as an auxiliary video and processed to obtain the auxiliary data. The auxiliary data, which replaces non-general sample data, and the other data are input into a model in a preset ratio for training. The auxiliary data improves training of the model's synthetic lip sync action, so that no parts unrelated to the synthetic lip sync action remain in the training process.
    Type: Grant
    Filed: June 21, 2023
    Date of Patent: March 12, 2024
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Zheng Liao
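The abstract above hinges on mixing auxiliary data (derived from videos of other speakers) with the primary image-audio data in a preset ratio. A minimal sketch of such batch composition, where the function name, ratio, and batch size are all assumptions, might look like:

```python
import random

def mix_training_batch(primary, auxiliary, aux_ratio=0.3, batch_size=10, seed=0):
    """Compose one training batch by mixing primary image-audio samples
    with auxiliary samples in a preset ratio, as the abstract describes."""
    rng = random.Random(seed)
    n_aux = int(batch_size * aux_ratio)      # auxiliary share of the batch
    n_primary = batch_size - n_aux           # remainder comes from primary data
    batch = rng.sample(primary, n_primary) + rng.sample(auxiliary, n_aux)
    rng.shuffle(batch)                       # interleave the two sources
    return batch
```

The fixed seed only makes the sketch reproducible; a real training loop would draw a fresh batch each step.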
  • Publication number: 20240054811
    Abstract: Embodiments of this disclosure provide a mouth shape correction model, and model training and application methods. The model includes a mouth feature extraction module, a key point extraction module, a first video module, a second video module, and a discriminator. The training method includes: based on a first original video and a second original video, extracting corresponding features by using various modules in the model to train the model; and when the model meets a convergence condition, completing the training to generate a target mouth shape correction model. The application method includes: inputting a video in which a mouth shape of a digital-human actor is to be corrected and corresponding audio into a mouth shape correction model, to obtain a video in which the mouth shape of the digital-human actor in the video is corrected, wherein the mouth shape correction model is a model trained by using the training method.
    Type: Application
    Filed: June 21, 2023
    Publication date: February 15, 2024
    Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng SIMA, Guo YANG
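The abstract above names five modules and a training loop that stops when a convergence condition is met. The skeleton below sketches only that loop; the stand-in model (whose loss simply decays) and the difference-based convergence test are assumptions, not the patented method.

```python
class DummyCorrectionModel:
    """Stand-in for the five-module model (mouth features, key points,
    two video modules, discriminator); the decaying loss is an assumption."""
    def __init__(self):
        self.loss = 1.0

    def step(self, first_video, second_video):
        self.loss *= 0.5                 # pretend each step halves the loss
        return self.loss

def train(model, first_video, second_video, max_steps=100, tol=1e-3):
    """Run training steps until the loss change falls below tol
    (a simple convergence condition), then stop."""
    prev_loss = float("inf")
    loss = None
    for step in range(1, max_steps + 1):
        loss = model.step(first_video, second_video)
        if prev_loss - loss < tol:       # convergence condition met
            return step, loss
        prev_loss = loss
    return max_steps, loss
```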
  • Publication number: 20240054711
    Abstract: Embodiments of the present disclosure provide a method for audio-driven character lip sync, a model for audio-driven character lip sync, and a training method therefor. A target dynamic image is obtained by acquiring a character image of a target character and speech for generating a target dynamic image, processing the character image and the speech as image-audio data that may be trained, respectively, and mixing the image-audio data with auxiliary data for training. When a large amount of sample data needs to be obtained for training in different scenarios, a video when another character speaks is used as an auxiliary video for processing, so as to obtain the auxiliary data. The auxiliary data, which replaces non-general sample data, and other data are input into a model in a preset ratio for training. The auxiliary data may improve a process of training a synthetic lip sync action of the model, so that there are no parts unrelated to the synthetic lip sync action during the training process.
    Type: Application
    Filed: June 21, 2023
    Publication date: February 15, 2024
    Inventors: Huapeng Sima, Zheng Liao
  • Patent number: 11887403
    Abstract: Embodiments of this disclosure provide a mouth shape correction model, and model training and application methods. The model includes a mouth feature extraction module, a key point extraction module, a first video module, a second video module, and a discriminator. The training method includes: based on a first original video and a second original video, extracting corresponding features by using various modules in the model to train the model; and when the model meets a convergence condition, completing the training to generate a target mouth shape correction model. The application method includes: inputting a video in which a mouth shape of a digital-human actor is to be corrected and corresponding audio into a mouth shape correction model, to obtain a video in which the mouth shape of the digital-human actor in the video is corrected, wherein the mouth shape correction model is a model trained by using the training method.
    Type: Grant
    Filed: June 21, 2023
    Date of Patent: January 30, 2024
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Guo Yang
  • Patent number: 11875775
    Abstract: The present disclosure proposes a speech conversion scheme for non-parallel corpus training, to remove the dependence on parallel text and resolve the technical problem that speech conversion is difficult to achieve when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition model can be used for any source speaker, that is, the model is speaker-independent; and bottleneck features of audio are more abstract than phonetic posteriorgram features, can reflect the decoupling of spoken content from the speaker's timbre, and meanwhile are not tightly bound to a phoneme class and have no strict one-to-one correspondence with phonemes. In this way, the problem of inaccurate pronunciation caused by ASR recognition errors is alleviated to some extent.
    Type: Grant
    Filed: April 20, 2021
    Date of Patent: January 16, 2024
    Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
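The key distinction in the abstract above is between bottleneck features (an intermediate hidden representation of an ASR network) and phonetic posteriorgram features (the softmax over phoneme classes). The toy forward pass below returns both, so the difference is concrete; the network shape and weights are assumptions, not the patented model.

```python
import math

def forward(x, w_in, w_out):
    """Tiny ASR-like forward pass returning both the bottleneck (hidden)
    activations and the phonetic posteriorgram (softmax over phonemes).
    Per the abstract, conversion would consume the bottleneck features,
    which are not in one-to-one correspondence with phoneme classes."""
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, col))) for col in w_in]
    logits = [sum(h * w for h, w in zip(hidden, col)) for col in w_out]
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    ppg = [e / total for e in exps]      # posteriorgram: one probability per phoneme
    return hidden, ppg
```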
  • Patent number: 11854306
    Abstract: A model including an information extraction layer that obtains image information of a training object in a depth image; a pixel point positioning layer that performs position estimation on a three-dimensional coordinate of human-body key points, defines a body part of the training object as a body component, and calibrates a three-dimensional coordinate of all human-body key points corresponding to the body component; a feature extraction layer that extracts a key-point position feature, a body moving speed feature, and a key-point moving speed feature for action recognition; a vector dimensionality reduction layer that combines the key-point position feature, the body moving speed feature, and the key-point moving speed feature into a multidimensional feature vector, and performs dimensionality reduction on the multidimensional feature vector; and a feature vector classification layer that classifies the dimensionality-reduced multidimensional feature vector, to recognize a fitness action.
    Type: Grant
    Filed: June 28, 2023
    Date of Patent: December 26, 2023
    Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng Sima, Hao Jiang, Hongwei Fan, Qixun Qu, Jintai Luan, Jiabin Li
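The last three layers in the abstract above form a common pattern: concatenate several features into one multidimensional vector, reduce its dimensionality, then classify. A minimal sketch, in which the projection matrix and the nearest-centroid classifier are assumptions standing in for the patented layers:

```python
def build_feature_vector(key_point_pos, body_speed, key_point_speed):
    """Concatenate the three features named in the abstract into one
    multidimensional feature vector."""
    return list(key_point_pos) + list(body_speed) + list(key_point_speed)

def reduce_dim(vec, projection):
    """Project the vector onto fewer dimensions (stand-in for the
    vector dimensionality reduction layer)."""
    return [sum(v * w for v, w in zip(vec, col)) for col in projection]

def classify(vec, centroids):
    """Nearest-centroid stand-in for the feature vector classification layer."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(vec, centroids[label]))
```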
  • Patent number: 11847726
    Abstract: A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t-n)/2 time point based on an input feature vector of a previous layer between a t time point and a t-n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, outputting sequentially target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.
    Type: Grant
    Filed: July 22, 2022
    Date of Patent: December 19, 2023
    Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng Sima, Cuicui Tang, Zheng Liao
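Two ingredients of the abstract above can be illustrated compactly: a causal encoding step, where the output at time t depends only on inputs in the window [t-n, t] (whose midpoint is (2t-n)/2), and one-hot encoding of the target identifier. The kernel values below are assumptions; this is a generic causal convolution, not the patented encoder.

```python
def causal_conv1d(x, kernel):
    """1-D convolution in which output t uses only inputs at times <= t,
    i.e. the window [t-n, t] from the abstract; no future samples leak in."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            if t - i >= 0:               # skip taps that would read the future
                acc += w * x[t - i]
        out.append(acc)
    return out

def one_hot(index, num_classes):
    """Binary vector encoding of a target identifier."""
    return [1 if i == index else 0 for i in range(num_classes)]
```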
  • Patent number: 11817079
    Abstract: The present disclosure provides a GAN-based speech synthesis model, a training method, and a speech synthesis method. According to the speech synthesis method, to-be-converted text is obtained and converted into text phonemes, the phonemes are digitized to obtain text data, and the text data is converted into a text vector to be input into a speech synthesis model. In this way, target audio corresponding to the to-be-converted text is obtained. When a target Mel-frequency spectrum is generated by the trained generator, the accuracy of the generated target Mel-frequency spectrum can reach that of a standard Mel-frequency spectrum. Through continual adversarial training between the generator and a discriminator, acoustic losses of the target Mel-frequency spectrum are reduced, and acoustic losses of the target audio generated from the target Mel-frequency spectrum are also reduced, thereby improving the accuracy of the synthesized audio.
    Type: Grant
    Filed: June 16, 2023
    Date of Patent: November 14, 2023
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Zhiqiang Mao
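The front end described above (text to phonemes, phonemes digitized, then padded into a fixed-length vector for the model) can be sketched as follows. The phoneme table, the use of characters as stand-in phonemes, and the padding length are all assumptions.

```python
PHONEME_IDS = {"h": 1, "e": 2, "l": 3, "o": 4}   # hypothetical phoneme table

def text_to_vector(text, table=PHONEME_IDS, pad_to=8):
    """Convert text to phonemes (characters here, as a stand-in),
    digitize them, and pad to a fixed-length vector for the model."""
    ids = [table.get(ch, 0) for ch in text]      # 0 = unknown phoneme
    ids += [0] * (pad_to - len(ids))             # right-pad to fixed length
    return ids[:pad_to]
```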
  • Patent number: 11763801
    Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram PPG classification network model to obtain a PPG feature vector, where the PPG feature vector is used for indicating a phoneme label corresponding to each frame of the source audio, and the PPG feature vector contains text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model, and outputting an acoustic feature vector of target audio based on the phoneme label corresponding to the PPG feature vector, where the target audio contains a plurality of pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, and outputting the target audio through the voice coder.
    Type: Grant
    Filed: August 29, 2022
    Date of Patent: September 19, 2023
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
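The method above is a three-stage pipeline: a PPG classification network, a voice conversion network parameterized by timbre, and a voice coder (vocoder). The stubs below only show how the stages compose; every function body is an assumption, not the patented networks.

```python
def ppg_network(audio_frames):
    """Stub PPG classifier: produce one phoneme label per frame."""
    return [frame % 3 for frame in audio_frames]          # fake phoneme labels

def conversion_network(ppg, timbre_id):
    """Stub conversion: map each phoneme label to an acoustic feature,
    offset by the chosen timbre."""
    return [label + 10 * timbre_id for label in ppg]

def vocoder(acoustic_features):
    """Stub voice coder: turn acoustic features into waveform samples."""
    return [float(f) / 10.0 for f in acoustic_features]

def convert_voice(audio_frames, timbre_id):
    """Compose the three stages: audio -> PPG -> acoustic features -> audio."""
    return vocoder(conversion_network(ppg_network(audio_frames), timbre_id))
```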
  • Publication number: 20230215068
    Abstract: A method for outputting a blend shape value includes: performing feature extraction on obtained target audio data to obtain a target audio feature vector; inputting the target audio feature vector and a target identifier into an audio-driven animation model; inputting the target audio feature vector into an audio encoding layer, determining an input feature vector of a next layer at a (2t-n)/2 time point based on an input feature vector of a previous layer between a t time point and a t-n time point, determining a feature vector having a causal relationship with the input feature vector of the previous layer as a valid feature vector, outputting sequentially target-audio encoding features, and inputting the target identifier into a one-hot encoding layer for binary vector encoding to obtain a target-identifier encoding feature; and outputting a blend shape value corresponding to the target audio data.
    Type: Application
    Filed: July 22, 2022
    Publication date: July 6, 2023
    Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng SIMA, Cuicui TANG, Zheng LIAO
  • Publication number: 20230197061
    Abstract: Embodiments of the present application provide a method and system for outputting target audio, a readable storage medium, and an electronic device. The method includes: inputting source audio into a phonetic posteriorgram PPG classification network model to obtain a PPG feature vector, where the PPG feature vector is used for indicating a phoneme label corresponding to each frame of the source audio, and the PPG feature vector contains text information and prosodic information of the source audio; inputting the PPG feature vector into a voice conversion network model, and outputting an acoustic feature vector of target audio based on the phoneme label corresponding to the PPG feature vector, where the target audio contains a plurality of pieces of audio with different timbres; and inputting the acoustic feature vector of the target audio into a voice coder, and outputting the target audio through the voice coder.
    Type: Application
    Filed: August 29, 2022
    Publication date: June 22, 2023
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
  • Patent number: 11682482
    Abstract: Embodiments of the present disclosure provide a method and apparatus for determining a psychological counseling training scheme. The method includes: obtaining training feeling data of a user after each session of psychological counseling training through an interactive inquiry with the user; inputting the training feeling data into a first classification model, identifying a training result of the user after each session of psychological counseling training by using the first classification model, and collecting statistics about the training results of the user in a current training period; and determining a training scheme of the user for a next training period based on the training results in the current training period. The invention addresses the problem that counseling results are suboptimal when a psychological robot counsels users according to a preset procedure that cannot satisfy the individual demands of different users.
    Type: Grant
    Filed: July 8, 2021
    Date of Patent: June 20, 2023
    Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng Sima, Bingtao Hua, Yiping Tang, Cheng Wang
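The procedure above classifies each session's feeling data, aggregates the results over the current period, and picks the next period's scheme from the aggregate. A minimal sketch, in which the thresholds, scheme names, and classifier are all assumptions:

```python
def next_training_scheme(session_feedback, classify, schemes):
    """Classify each session's feeling data, aggregate the period's
    results, and select the next period's scheme from the aggregate."""
    results = [classify(feeling) for feeling in session_feedback]
    positive_ratio = sum(results) / len(results)
    if positive_ratio >= 0.7:            # thresholds are assumptions
        return schemes["advance"]
    if positive_ratio >= 0.4:
        return schemes["maintain"]
    return schemes["ease"]
```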
  • Patent number: 11651139
    Abstract: Embodiments of the present application provide a text output method and system, a storage medium, and an electronic device. The system includes at least an automatic speech recognition ASR model group, a text alignment model, and a re-scoring model that are sequentially connected, where the ASR model group includes a plurality of ASR models each configured to convert input audio data into a respective first text; the text alignment model is configured to align the plurality of first texts to obtain a plurality of target texts of equal length; and the re-scoring model is configured to score the words/terms at each alignment position of the plurality of target texts, take the word/term with the highest score at each alignment position as a target word/term, and assemble the target words/terms at their respective alignment positions into an output text.
    Type: Grant
    Filed: May 25, 2022
    Date of Patent: May 16, 2023
    Assignee: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng Sima, Manhong Wang, Yiping Tang
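Once the hypotheses have been aligned to equal length, the re-scoring step reduces to picking the best word at each position. The sketch below uses simple frequency across hypotheses as the score, which is a simplification of whatever scoring the patented model uses.

```python
from collections import Counter

def rescore(aligned_hypotheses):
    """Score the words at each alignment position of the equal-length
    hypotheses (here: by frequency, an assumed score) and keep the
    highest-scoring word per position as the output text."""
    output = []
    for position in zip(*aligned_hypotheses):
        word, _ = Counter(position).most_common(1)[0]
        output.append(word)
    return " ".join(output)
```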
  • Publication number: 20230121683
    Abstract: Embodiments of the present application provide a text output method and system, a storage medium, and an electronic device. The system includes at least an automatic speech recognition ASR model group, a text alignment model, and a re-scoring model that are sequentially connected, where the ASR model group includes a plurality of ASR models each configured to convert input audio data into respective first texts; the text alignment model is configured to perform alignment for a plurality of first texts, to obtain a plurality of target texts, where lengths of the plurality of target texts are all equal; and the re-scoring model is configured to score words/terms at each alignment position of the plurality of target texts, to obtain a word/term with the highest score at each alignment position, as a target word/term, and determine the target word/terms, as an output text, by the respective alignment positions.
    Type: Application
    Filed: May 25, 2022
    Publication date: April 20, 2023
    Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng SIMA, Manhong WANG, Yiping TANG
  • Patent number: 11516346
    Abstract: A three-way calling terminal for a mobile human-machine coordination calling robot. Technical solutions include: a first speech interface, configured to transfer call audio between a call object and a back-end processing module; a CODEC1 module, configured to encode and decode the call audio between the call object and the back-end processing module; a second speech interface, configured to transfer call audio between the human agent and the call object; a CODEC2 module, configured to encode and decode the call audio between the human agent and the call object; a call control module, configured to process a control signal, and automatically make, answer, and hang up a call; a data processing submodule, configured to process speech data and perform data transfer between the data processing submodule and the back-end processing module; and a networking submodule, configured to be connected to the back-end processing module.
    Type: Grant
    Filed: July 8, 2021
    Date of Patent: November 29, 2022
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventor: Huapeng Sima
  • Publication number: 20220310063
    Abstract: The present disclosure proposes a speech conversion scheme for non-parallel corpus training, to remove the dependence on parallel text and resolve the technical problem that speech conversion is difficult to achieve when resources and equipment are limited. A voice conversion system and a training method therefor are included. Compared with the prior art, according to the embodiments of the present disclosure: a trained speaker-independent automatic speech recognition model can be used for any source speaker, that is, the model is speaker-independent; and bottleneck features of audio are more abstract than phonetic posteriorgram features, can reflect the decoupling of spoken content from the speaker's timbre, and meanwhile are not tightly bound to a phoneme class and have no strict one-to-one correspondence with phonemes. In this way, the problem of inaccurate pronunciation caused by ASR recognition errors is alleviated to some extent.
    Type: Application
    Filed: April 20, 2021
    Publication date: September 29, 2022
    Inventors: Huapeng Sima, Zhiqiang Mao, Xuefei Gong
  • Publication number: 20220277833
    Abstract: Embodiments of the present disclosure provide a method and apparatus for determining a psychological counseling training scheme. The method includes: obtaining training feeling data of a user after each session of psychological counseling training through an interactive inquiry with the user; inputting the training feeling data into a first classification model, identifying a training result of the user after each session of psychological counseling training by using the first classification model, and collecting statistics about the training results of the user in a current training period; and determining a training scheme of the user for a next training period based on the training results in the current training period. The invention addresses the problem that counseling results are suboptimal when a psychological robot counsels users according to a preset procedure that cannot satisfy the individual demands of different users.
    Type: Application
    Filed: July 8, 2021
    Publication date: September 1, 2022
    Applicant: Nanjing Silicon Intelligence Technology Co., Ltd.
    Inventors: Huapeng SIMA, Bingtao HUA, Yiping TANG, Cheng WANG
  • Patent number: 11380327
    Abstract: The present disclosure relates to the field of intelligent communications, and discloses a speech communication system and method with human-machine coordination, which resolve the problems of prior human-machine coordination, in which a switchover during a call introduces a jarring difference in the client's experience and wastes the client's time.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: July 5, 2022
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventor: Huapeng Sima
  • Publication number: 20220210275
    Abstract: A three-way calling terminal for a mobile human-machine coordination calling robot. Technical solutions include: a first speech interface, configured to transfer call audio between a call object and a back-end processing module; a CODEC1 module, configured to encode and decode the call audio between the call object and the back-end processing module; a second speech interface, configured to transfer call audio between the human agent and the call object; a CODEC2 module, configured to encode and decode the call audio between the human agent and the call object; a call control module, configured to process a control signal, and automatically make, answer, and hang up a call; a data processing submodule, configured to process speech data and perform data transfer between the data processing submodule and the back-end processing module; and a networking submodule, configured to be connected to the back-end processing module.
    Type: Application
    Filed: July 8, 2021
    Publication date: June 30, 2022
    Inventor: Huapeng SIMA
  • Publication number: 20220044679
    Abstract: The present disclosure relates to the field of intelligent communications, and discloses a speech communication system and method with human-machine coordination, which resolve the problems of prior human-machine coordination, in which a switchover during a call introduces a jarring difference in the client's experience and wastes the client's time.
    Type: Application
    Filed: April 2, 2021
    Publication date: February 10, 2022
    Applicant: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventor: Huapeng SIMA