Patents by Inventor Jianhua Tao

Jianhua Tao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11963771
    Abstract: Disclosed is an automatic depression detection method using audio-video, including: acquiring original data containing two modalities of long-term audio file and long-term video file from an audio-video file; dividing the long-term audio file into a plurality of audio segments, and meanwhile dividing the long-term video file into a plurality of video segments; inputting each audio segment/each video segment into an audio feature extraction network/a video feature extraction network to obtain in-depth audio features/in-depth video features; calculating the in-depth audio features and the in-depth video features by using a multi-head attention mechanism so as to obtain attention audio features and attention video features; aggregating the attention audio features and the attention video features into audio-video features; and inputting the audio-video features into a decision network to predict a depression level of an individual in the audio-video file.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: April 23, 2024
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Cong Cai, Bin Liu, Mingyue Niu
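The fusion pipeline in this abstract (segment features, multi-head attention, aggregation) can be sketched in a few lines of numpy. This is a minimal illustration, not the patented network: the projection matrices and segment features are random stand-ins, and per-modality self-attention followed by mean-pooling is assumed for the attention and aggregation steps.

```python
import numpy as np

def multi_head_attention(X, num_heads=4, seed=0):
    """Minimal scaled dot-product multi-head self-attention (numpy sketch).

    X: (num_segments, d_model) in-depth segment features.
    The weight matrices are random stand-ins for learned projections.
    """
    n, d = X.shape
    assert d % num_heads == 0
    dh = d // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)       # softmax over segments
        heads.append(attn @ V[:, s])
    return np.concatenate(heads, axis=1)              # (n, d) attention features

# Hypothetical fusion: attention features per modality, mean-pooled and aggregated
audio_feats = multi_head_attention(np.random.default_rng(1).standard_normal((10, 64)))
video_feats = multi_head_attention(np.random.default_rng(2).standard_normal((12, 64)))
av_feature = np.concatenate([audio_feats.mean(0), video_feats.mean(0)])  # (128,)
```

The pooled `av_feature` would then feed the decision network that predicts the depression level.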
  • Patent number: 11917894
    Abstract: Provided are a method for preparing an organic electroluminescent device, an organic electroluminescent device and a display apparatus.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: February 27, 2024
    Assignee: GUANGZHOU NEW VISION OPTO-ELECTRONIC TECHNOLOGY CO., LTD.
    Inventors: Jianhua Zou, Miao Xu, Hong Tao, Lei Wang, Hongmeng Li, Wencong Liu, Hua Xu, Min Li, Junbiao Peng
  • Patent number: 11908240
    Abstract: Disclosed is a micro-expression recognition method based on a multi-scale spatiotemporal feature neural network, in which spatial features and temporal features of a micro-expression are obtained from micro-expression video frames and combined to form more robust micro-expression features. At the same time, since a micro-expression occurs in local areas of the face, the active local areas of the face during the occurrence of the micro-expression and the overall area of the face are combined for micro-expression recognition.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: February 20, 2024
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Hao Zhang, Bin Liu, Wenxiang She
  • Patent number: 11763836
    Abstract: Disclosed is a hierarchical generated audio detection system, comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed a limit; the audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain a CQCC feature and an LFCC feature; the CQCC feature or LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening to separate the first-stage real audio from the first-stage generated audio; and the CQCC feature or LFCC feature of the first-stage generated audio is input into the second-stage fine-level deep identification model to identify the second-stage real audio and the second-stage generated audio.
    Type: Grant
    Filed: February 17, 2022
    Date of Patent: September 19, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
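The two-stage screening described here is a classic detection cascade. A toy sketch, assuming the two models are callables that return a probability of the clip being generated (the stand-in models, features and thresholds below are invented for illustration, not taken from the patent):

```python
def cascade_detect(features, coarse_model, fine_model, threshold=0.5):
    """Two-stage cascade: a lightweight coarse model screens every clip;
    only clips it flags as generated are re-checked by the deeper fine model."""
    results = []
    for f in features:
        if coarse_model(f) < threshold:       # first stage: cheap screen
            results.append("real")
        elif fine_model(f) < threshold:       # second stage: deep check
            results.append("real")
        else:
            results.append("generated")
    return results

# Toy stand-ins for the trained coarse and fine models
coarse = lambda f: 0.9 if f["cqcc_energy"] > 1.0 else 0.1
fine = lambda f: 0.8 if f["lfcc_flatness"] > 0.5 else 0.2
clips = [{"cqcc_energy": 0.3, "lfcc_flatness": 0.9},
         {"cqcc_energy": 1.5, "lfcc_flatness": 0.9},
         {"cqcc_energy": 1.5, "lfcc_flatness": 0.1}]
print(cascade_detect(clips, coarse, fine))  # → ['real', 'generated', 'real']
```

The appeal of the cascade is cost: the expensive fine model only ever sees the small fraction of clips the coarse model could not clear.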
  • Patent number: 11636871
    Abstract: Disclosed are a method and an electronic apparatus for detecting tampered audio, and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and first high-frequency coefficients corresponding to the signal to be detected, the number of which is equal to the first preset order; performing an inverse wavelet transform on the first high-frequency coefficients having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; and calculating a first Mel cepstrum feature of the first high-frequency component signal in units of frames, and concatenating the first Mel cepstrum features of the current frame signal and a preset number of neighboring frame signals.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: April 25, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
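The signal path in this abstract (wavelet decomposition, high-frequency-only reconstruction, frame-wise cepstrum, context concatenation) can be approximated with a one-level Haar wavelet and a plain FFT cepstrum. Both are simplifying assumptions: the patent's preset orders and its Mel filter bank are not reproduced here.

```python
import numpy as np

def haar_decompose(x):
    """One level of a Haar wavelet transform: returns (low, high) coefficients."""
    x = x[: len(x) // 2 * 2]
    low = (x[0::2] + x[1::2]) / np.sqrt(2)
    high = (x[0::2] - x[1::2]) / np.sqrt(2)
    return low, high

def haar_reconstruct(low, high):
    """Inverse Haar transform; zeroing `low` keeps only the high-frequency part."""
    x = np.empty(2 * len(low))
    x[0::2] = (low + high) / np.sqrt(2)
    x[1::2] = (low - high) / np.sqrt(2)
    return x

def framed_cepstrum(x, frame=64, n_ceps=13):
    """Real cepstrum per frame (a plain FFT cepstrum stands in for the Mel cepstrum)."""
    frames = [x[i:i + frame] for i in range(0, len(x) - frame + 1, frame)]
    ceps = []
    for f in frames:
        spec = np.abs(np.fft.rfft(f)) + 1e-10
        ceps.append(np.fft.irfft(np.log(spec))[:n_ceps])
    return np.array(ceps)

def with_context(ceps, k=2):
    """Concatenate each frame's cepstrum with its k preceding/following frames."""
    padded = np.pad(ceps, ((k, k), (0, 0)), mode="edge")
    return np.array([padded[i:i + 2 * k + 1].ravel() for i in range(len(ceps))])

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
low, high = haar_decompose(signal)
hf_signal = haar_reconstruct(np.zeros_like(low), high)  # high-frequency component only
features = with_context(framed_cepstrum(hf_signal))     # per-frame detection features
```

Tampering artifacts tend to live in the high band, which is why the low-frequency coefficients are zeroed before reconstruction.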
  • Publication number: 20230076251
    Abstract: Disclosed are a method and an electronic apparatus for detecting tampered audio, and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and first high-frequency coefficients corresponding to the signal to be detected, the number of which is equal to the first preset order; performing an inverse wavelet transform on the first high-frequency coefficients having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; and calculating a first Mel cepstrum feature of the first high-frequency component signal in units of frames, and concatenating the first Mel cepstrum features of the current frame signal and a preset number of neighboring frame signals.
    Type: Application
    Filed: February 8, 2022
    Publication date: March 9, 2023
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Shan LIANG, Shuai NIE, Jiangyan YI
  • Patent number: 11580957
    Abstract: Disclosed are a method for training a speech recognition model, and a method and a system for speech recognition. The disclosure relates to the field of speech recognition and includes: inputting an audio training sample into the acoustic encoder to encode the acoustic features of the audio training sample and determine an acoustic encoded state vector; inputting a preset vocabulary into the language predictor to determine a text prediction vector; inputting the text prediction vector into the text mapping layer to obtain a text output probability distribution; calculating a first loss function according to a target text sequence corresponding to the audio training sample and the text output probability distribution; and inputting the text prediction vector and the acoustic encoded state vector into the joint network to calculate a second loss function, then performing iterative optimization according to the first loss function and the second loss function.
    Type: Grant
    Filed: June 9, 2022
    Date of Patent: February 14, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
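The two-loss setup described here has the shape of a transducer-style model: an acoustic encoder, a language predictor with a text mapping head, and a joint network over every (frame, label) pair. A toy numpy sketch, with random matrices standing in for the trained parameters and a simple per-pair cross entropy standing in for the full transducer loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, targets):
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-10))

# Toy dimensions; all weights below are random stand-ins for trained parameters.
T, U, D, V = 20, 5, 16, 8          # frames, target length, feature dim, vocab size
audio = rng.standard_normal((T, D))
targets = rng.integers(0, V, U)

W_enc = rng.standard_normal((D, D)) / np.sqrt(D)
embed = rng.standard_normal((V, D)) / np.sqrt(D)
W_map = rng.standard_normal((D, V)) / np.sqrt(D)
W_joint = rng.standard_normal((D, V)) / np.sqrt(D)

acoustic = np.tanh(audio @ W_enc)      # acoustic encoded state vectors
text_pred = np.tanh(embed[targets])    # text prediction vectors from the predictor

# First loss: text mapping layer on the predictor output alone (LM-style CE).
loss1 = cross_entropy(softmax(text_pred @ W_map), targets)

# Second loss: the joint network combines every (frame, label) pair; a per-pair
# cross entropy stands in here for the full transducer loss.
joint = np.tanh(acoustic[:, None, :] + text_pred[None, :, :])   # (T, U, D)
probs = softmax(joint @ W_joint)                                # (T, U, V)
loss2 = -np.mean(np.log(probs[:, np.arange(U), targets] + 1e-10))

total_loss = loss1 + loss2   # both losses are optimized jointly during training
```

Training the predictor with its own text-mapping loss gives the label side an internal language-model objective alongside the joint acoustic-text objective.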
  • Publication number: 20230027645
    Abstract: Disclosed is a hierarchical generated audio detection system, comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed a limit; the audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain a CQCC feature and an LFCC feature; the CQCC feature or LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening to separate the first-stage real audio from the first-stage generated audio; and the CQCC feature or LFCC feature of the first-stage generated audio is input into the second-stage fine-level deep identification model to identify the second-stage real audio and the second-stage generated audio.
    Type: Application
    Filed: February 17, 2022
    Publication date: January 26, 2023
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Zhengkun TIAN, Jiangyan YI
  • Patent number: 11521629
    Abstract: Disclosed is a digital audio tampering forensics method based on phase offset detection, comprising: multiplying a signal to be identified by a time label to obtain a modulation signal of the signal to be identified; performing a short-time Fourier transform on the signal to be identified and on the modulation signal to obtain a signal power spectrum and a modulation signal power spectrum; computing group delay characteristics by using the signal power spectrum and the modulation signal power spectrum; computing the mean value of the group delay characteristics and smoothing it to obtain the phase information of the current frame signal; and computing a dynamic threshold from the phase information of the current frame signal, then deciding whether the signal has been tampered with by using the dynamic threshold and the phase information of the current frame signal.
    Type: Grant
    Filed: February 9, 2022
    Date of Patent: December 6, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
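The time-label modulation here exploits a standard FFT identity: the group delay of a frame x[n] equals Re(X(w)·conj(Y(w))) / |X(w)|², where Y is the transform of n·x[n]. A sketch built on that identity; the moving-average smoother and the mean-plus-two-sigma dynamic threshold are invented simplifications of the patent's smoothing and thresholding steps.

```python
import numpy as np

def group_delay(frame):
    """Group delay of one frame via the time-weighted signal trick:
    tau(w) = Re(X(w) * conj(Y(w))) / |X(w)|^2, with Y = FFT(n * x[n])."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)              # modulation by the time label
    power = np.abs(X) ** 2 + 1e-10
    return np.real(X * np.conj(Y)) / power

def frame_phase_scores(signal, frame_len=256):
    """Mean group delay per frame, smoothed; large jumps hint at splice points."""
    scores = []
    for i in range(0, len(signal) - frame_len + 1, frame_len):
        scores.append(group_delay(signal[i:i + frame_len]).mean())
    scores = np.array(scores)
    kernel = np.ones(3) / 3                              # moving-average smoothing
    smooth = np.convolve(scores, kernel, mode="same")
    threshold = smooth.mean() + 2 * smooth.std()         # simple dynamic threshold
    return smooth, threshold

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8192) / 16000)
smooth, threshold = frame_phase_scores(clean + 0.01 * rng.standard_normal(8192))
suspect = smooth > threshold        # frames flagged as possibly tampered
```

Splices disturb the phase continuity of a recording even when the magnitude spectrum looks plausible, which is why a phase-derived statistic is used rather than a spectral one.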
  • Patent number: 11501759
    Abstract: Disclosed are a method and a system for speech recognition, an electronic device and a storage medium, relating to the technical field of speech recognition. Embodiments of the application comprise: encoding an audio to be recognized to obtain an acoustic encoded state vector sequence of the audio; performing sparse encoding on the acoustic encoded state vector sequence to obtain an acoustic encoded sparse vector; determining a text prediction vector of each label in a preset vocabulary; and recognizing the audio and determining its corresponding text content according to the acoustic encoded sparse vector and the text prediction vector.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: November 15, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
  • Patent number: 11488586
    Abstract: Disclosed is a system for speech recognition text enhancement fusing multi-modal semantic invariance. The system includes an acoustic feature extraction module, an acoustic down-sampling module, and an encoder and a decoder fusing multi-modal semantic invariance. The acoustic feature extraction module is configured for frame-dividing processing of speech data, dividing the speech data into short-term audio frames with a fixed length, extracting Fbank acoustic features from the short-term audio frames, and inputting the acoustic features into the acoustic down-sampling module for down-sampling to obtain an acoustic representation; the speech data is input into an existing speech recognition module to obtain input text data, and the input text data is input into the encoder to obtain an input text encoded representation; the acoustic representation and the input text encoded representation are input into the decoder to be fused.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: November 1, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shuai Zhang, Jiangyan Yi
  • Patent number: 11475877
    Abstract: Disclosed are an end-to-end system for speech recognition and speech translation, and an electronic device. The system comprises an acoustic encoder, a multi-task decoder and a semantic invariance constraint module, and completes the two tasks of speech recognition and speech translation. In addition, exploiting the semantic consistency of texts across the two tasks, semantic constraints are imposed on the model so that it learns high-level semantic information, which can effectively improve the performance of both speech recognition and speech translation. The application has the advantages that the error accumulation problem of a serial system is avoided, the computational cost of the model is low, and the real-time performance is high.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: October 18, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shuai Zhang, Jiangyan Yi
  • Patent number: 11462207
    Abstract: Disclosed are a method and an apparatus for editing audio, an electronic device and a storage medium. The method includes: acquiring a modified text obtained by modifying a known original text of an audio to be edited according to a known text for modification; predicting a duration of an audio corresponding to the text for modification; adjusting a region to be edited of the audio to be edited according to the duration of the audio corresponding to the text for modification, to obtain an adjusted audio to be edited; obtaining, based on a pre-trained audio editing model, an edited audio according to the adjusted audio to be edited and the modified text. In the present disclosure, the edited audio obtained by the audio editing model sounds natural in the context, and supports the function of synthesizing new words that do not appear in the corpus.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: October 4, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Tao Wang, Jiangyan Yi, Ruibo Fu
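The region adjustment step (resizing the span to be edited so it matches the predicted duration of the replacement text) can be sketched as simple sample surgery. The function name and the zero-filled placeholder below are illustrative only; in the patented method a trained audio editing model synthesizes the actual content of the adjusted region.

```python
def adjust_edit_region(audio, region, predicted_duration, sample_rate=16000):
    """Resize the to-be-edited span of an audio sample list so its length
    matches the predicted duration of the replacement text (hypothetical
    helper; placeholder zeros stand in for the model-synthesized content)."""
    start, end = region
    new_len = int(predicted_duration * sample_rate)
    return audio[:start] + [0.0] * new_len + audio[end:]

audio = [0.1] * 16000  # 1 s of toy audio at 16 kHz
adjusted = adjust_edit_region(audio, region=(4000, 8000), predicted_duration=0.5)
len(adjusted)          # 4000 kept + 8000 placeholder + 8000 kept = 20000 samples
```

Predicting the duration first keeps the surrounding audio untouched, so the synthesized span can be dropped in without time-stretching the context.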
  • Publication number: 20220265184
    Abstract: Disclosed is an automatic depression detection method using audio-video, including: acquiring original data containing two modalities of long-term audio file and long-term video file from an audio-video file; dividing the long-term audio file into a plurality of audio segments, and meanwhile dividing the long-term video file into a plurality of video segments; inputting each audio segment/each video segment into an audio feature extraction network/a video feature extraction network to obtain in-depth audio features/in-depth video features; calculating the in-depth audio features and the in-depth video features by using a multi-head attention mechanism so as to obtain attention audio features and attention video features; aggregating the attention audio features and the attention video features into audio-video features; and inputting the audio-video features into a decision network to predict a depression level of an individual in the audio-video file.
    Type: Application
    Filed: September 10, 2021
    Publication date: August 25, 2022
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Cong CAI, Bin LIU, Mingyue NIU
  • Publication number: 20220270636
    Abstract: Disclosed is a dialogue emotion correction method based on a graph neural network, including: extracting acoustic features, text features, and image features from a video file and fusing them into multi-modal features; obtaining an emotion prediction result for each sentence of a dialogue in the video file by using the multi-modal features; fusing the emotion prediction result of each sentence with interaction information between talkers in the video file to obtain interaction information fused emotion features; combining, on the basis of the interaction information fused emotion features, the context-dependence relationship in the dialogue to obtain time-series information fused emotion features; and correcting, by using the time-series information fused emotion features, the emotion prediction result of each sentence obtained previously, so as to obtain a more accurate emotion recognition result.
    Type: Application
    Filed: September 10, 2021
    Publication date: August 25, 2022
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Zheng LIAN, Bin LIU, Xuefei LIU
  • Publication number: 20220269881
    Abstract: Disclosed is a micro-expression recognition method based on a multi-scale spatiotemporal feature neural network, in which spatial features and temporal features of a micro-expression are obtained from micro-expression video frames and combined to form more robust micro-expression features. At the same time, since a micro-expression occurs in local areas of the face, the active local areas of the face during the occurrence of the micro-expression and the overall area of the face are combined for micro-expression recognition.
    Type: Application
    Filed: September 10, 2021
    Publication date: August 25, 2022
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Hao ZHANG, Bin LIU, Wenxiang SHE
  • Patent number: 11410685
    Abstract: Disclosed are a method for detecting speech concatenation points and a storage medium. The method includes: acquiring a speech to be detected, and determining high-frequency components and low-frequency components of the speech to be detected; extracting first cepstrum features and second cepstrum features corresponding to the speech to be detected according to the high-frequency components and the low-frequency components; splicing the first and second cepstrum features of each frame of the speech to be detected so as to obtain a parameter sequence; inputting the parameter sequence into a neural network model so as to obtain a feature sequence corresponding to the speech to be detected, wherein the model has been trained to learn and store a correspondence between parameter sequences and feature sequences; and detecting speech concatenation points in the speech to be detected according to the feature sequence.
    Type: Grant
    Filed: February 9, 2022
    Date of Patent: August 9, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Ruibo Fu, Jiangyan Yi
  • Patent number: 11281945
    Abstract: A multimodal dimensional emotion recognition method includes: acquiring a frame-level audio feature, a frame-level video feature, and a frame-level text feature from an audio, a video, and a corresponding text of a sample to be tested; performing temporal contextual modeling on the frame-level audio feature, the frame-level video feature, and the frame-level text feature respectively by using a temporal convolutional network to obtain a contextual audio feature, a contextual video feature, and a contextual text feature; performing weighted fusion on these three features by using a gated attention mechanism to obtain a multimodal feature; splicing the multimodal feature and these three features together to obtain a spliced feature, and then performing further temporal contextual modeling on the spliced feature by using a temporal convolutional network to obtain a contextual spliced feature; and performing regression prediction on the contextual spliced feature to obtain a final dimensional emotion prediction result.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: March 22, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Licai Sun, Bin Liu, Zheng Lian
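The gated attention fusion step can be sketched as a per-modality sigmoid gate followed by a sum, with the gated result then spliced with the three contextual features. The gate parameterization below is an assumption for illustration, and the weights are random stand-ins for trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gated_fusion(modalities, seed=0):
    """Gated attention fusion (sketch): each modality feature gets a learned
    per-dimension gate in (0, 1) and the gated features are summed."""
    d = modalities[0].shape[1]
    rng = np.random.default_rng(seed)
    fused = np.zeros_like(modalities[0])
    for m in modalities:
        W = rng.standard_normal((d, d)) / np.sqrt(d)   # random stand-in weights
        fused = fused + sigmoid(m @ W) * m             # gate, then weighted sum
    return fused

T, d = 50, 32   # toy frame count and feature dimension
rng = np.random.default_rng(1)
audio, video, text = (rng.standard_normal((T, d)) for _ in range(3))
fused = gated_fusion([audio, video, text])
# splice the multimodal feature with the three contextual features, as in the abstract
spliced = np.concatenate([fused, audio, video, text], axis=1)   # (T, 4*d)
```

Keeping the unimodal features alongside the fused one in the splice lets the second temporal convolutional network recover modality-specific cues the gate may have attenuated.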
  • Patent number: 11266338
    Abstract: An automatic depression detection method includes the following steps of: inputting audio and video files, wherein the audio and video files contain original data in both audio and video modes; conducting segmentation and feature extraction on the audio and video files to obtain a plurality of audio segment horizontal features and video segment horizontal features; combining segment horizontal features into an audio horizontal feature and a video horizontal feature respectively by utilizing a feature evolution pooling objective function; and conducting attentional computation on the segment horizontal features to obtain a video attention audio feature and an audio attention video feature, splicing the audio horizontal feature, the video horizontal feature, the video attention audio feature and the audio attention video feature to form a multimodal spatio-temporal representation, and inputting the multimodal spatio-temporal representation into support vector regression to predict the depression level of individuals.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: March 8, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Mingyue Niu, Bin Liu, Qifei Li
  • Patent number: 11244119
    Abstract: Provided are a multi-modal lie detection method, apparatus and device for improving the accuracy of automatic lie detection. The multi-modal lie detection method includes: inputting original data of three modalities, namely a to-be-detected audio, a to-be-detected video and a to-be-detected text; performing feature extraction on the input contents to obtain deep features of the three modalities; explicitly depicting the first-order, second-order and third-order interactive relationships of the deep features of the three modalities to obtain an integrated multi-modal feature of each word; performing context modeling on the integrated multi-modal feature of each word to obtain a final feature of each word; and pooling the final features of the words to obtain global features, and then obtaining a lie classification result via a fully-connected layer.
    Type: Grant
    Filed: July 30, 2021
    Date of Patent: February 8, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Licai Sun, Bin Liu, Zheng Lian
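Explicit first-, second- and third-order cross-modal interactions are commonly realized as concatenation, pairwise outer products and a triple outer product (tensor-fusion style); whether the patent uses exactly this construction is an assumption, but it illustrates what "explicitly depicting" the interactive relationships means.

```python
import numpy as np

def interaction_features(a, v, t):
    """First-, second-, and third-order cross-modal interactions for one word
    (tensor-fusion-style sketch): concatenation, pairwise outer products,
    and the triple outer product, all flattened into one vector."""
    first = np.concatenate([a, v, t])                       # first order
    second = np.concatenate([np.outer(a, v).ravel(),        # second order
                             np.outer(a, t).ravel(),
                             np.outer(v, t).ravel()])
    third = np.einsum("i,j,k->ijk", a, v, t).ravel()        # third order
    return np.concatenate([first, second, third])

# Hypothetical per-word deep features for the audio, video, and text modalities
a, v, t = np.ones(4), np.full(5, 2.0), np.full(6, 3.0)
feat = interaction_features(a, v, t)
# length: (4+5+6) + (4*5 + 4*6 + 5*6) + 4*5*6 = 15 + 74 + 120 = 209
```

The dimensionality grows multiplicatively with the order, which is why such features are usually followed by pooling and a compact fully-connected classifier.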