Patents by Inventor Jiangyan Yi

Jiangyan Yi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11763836
    Abstract: Disclosed is a hierarchical generated-audio detection system comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed a limit; the audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain a CQCC feature and an LFCC feature; the CQCC or LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening, screening out first-stage real audio and first-stage generated audio; the CQCC or LFCC feature of the first-stage generated audio is input into the second-stage fine-level deep identification model to identify second-stage real audio and second-stage generated audio, and the second-stage generated au…
    Type: Grant
    Filed: February 17, 2022
    Date of Patent: September 19, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
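A minimal sketch of the two-stage cascade the abstract above describes, with scalar threshold tests standing in for the patented CQCC/LFCC-based models; all scores, thresholds and function names here are illustrative assumptions, not the patented method:

```python
# Toy two-stage cascade: a cheap first-stage detector screens clips, and only
# clips it flags as "generated" are passed to a heavier second-stage model.
# Scalar scores stand in for CQCC/LFCC features and model outputs.

def stage1_coarse(score: float, threshold: float = 0.5) -> str:
    """Lightweight coarse screen: a cheap score against a fixed threshold."""
    return "generated" if score > threshold else "real"

def stage2_fine(score: float, threshold: float = 0.8) -> str:
    """Deeper fine-level model, run only on clips the first stage flagged."""
    return "generated" if score > threshold else "real"

def detect(clips):
    """Run the cascade; each clip is (clip_id, coarse_score, fine_score)."""
    results = {}
    for clip_id, coarse, fine in clips:
        if stage1_coarse(coarse) == "real":
            results[clip_id] = "real"             # accepted early, stage 2 skipped
        else:
            results[clip_id] = stage2_fine(fine)  # fine-level re-check
    return results

clips = [("a", 0.2, 0.9), ("b", 0.7, 0.6), ("c", 0.9, 0.95)]
print(detect(clips))  # a passes stage 1; b is cleared by stage 2; c is flagged
```

The design point of such a cascade is cost: the expensive second-stage model only runs on the (usually small) subset of clips the lightweight screen could not clear.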
  • Patent number: 11636871
    Abstract: Disclosed are a method, an electronic apparatus for detecting tampered audio, and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and first high-frequency coefficients corresponding to the signal to be detected, the number of which is equal to the first preset order; performing an inverse wavelet transform on the first high-frequency coefficients having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; calculating a first Mel cepstrum feature of the first high-frequency component signal frame by frame, and concatenating the first Mel cepstrum features of the current frame signal and a preset number of frame signals.
    Type: Grant
    Filed: February 8, 2022
    Date of Patent: April 25, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
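A minimal sketch of the high-frequency path the abstract above describes, under stated assumptions: a single-level Haar wavelet stands in for the order-p wavelet transform, and the Mel cepstrum extraction is replaced by stand-in per-frame features; only the decompose / zero-the-lowpass / invert / stack-context structure is illustrated:

```python
import math

# Single-level Haar analysis + synthesis with the approximation (low-frequency)
# coefficients zeroed, i.e. recovering the high-frequency component signal,
# followed by frame-wise context concatenation of features.

def haar_highpass(x):
    """High-frequency component of x via one Haar level (len(x) must be even)."""
    s = math.sqrt(2.0)
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    out = []
    for d in detail:                      # inverse transform with approx = 0
        out.extend([d / s, -d / s])
    return out

def concat_context(frames, ctx=1):
    """Concatenate each frame's features with ctx frames on each side,
    replicating edge frames (a stand-in for Mel-cepstrum stacking)."""
    n = len(frames)
    stacked = []
    for i in range(n):
        window = []
        for j in range(i - ctx, i + ctx + 1):
            window.extend(frames[min(max(j, 0), n - 1)])
        stacked.append(window)
    return stacked

hf = haar_highpass([4.0, 2.0, 1.0, 3.0])
print([round(v, 6) for v in hf])               # [1.0, -1.0, -1.0, 1.0]
print(concat_context([[1], [2], [3]], ctx=1))  # [[1, 1, 2], [1, 2, 3], [2, 3, 3]]
```

Note that `[1.0, -1.0, -1.0, 1.0]` is exactly the input with each pair's mean removed, which is what zeroing the low-frequency coefficients achieves.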
  • Publication number: 20230076251
    Abstract: Disclosed are a method, an electronic apparatus for detecting tampered audio, and a storage medium. The method includes: acquiring a signal to be detected, and performing a wavelet transform of a first preset order on the signal to be detected so as to obtain a first low-frequency coefficient and first high-frequency coefficients corresponding to the signal to be detected, the number of which is equal to the first preset order; performing an inverse wavelet transform on the first high-frequency coefficients having an order greater than or equal to a second preset order so as to obtain a first high-frequency component signal corresponding to the signal to be detected; calculating a first Mel cepstrum feature of the first high-frequency component signal frame by frame, and concatenating the first Mel cepstrum features of the current frame signal and a preset number of frame signals.
    Type: Application
    Filed: February 8, 2022
    Publication date: March 9, 2023
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Shan LIANG, Shuai NIE, Jiangyan YI
  • Patent number: 11580957
    Abstract: Disclosed are a method for training a speech recognition model, and a method and a system for speech recognition. The disclosure relates to the field of speech recognition and includes: inputting an audio training sample into the acoustic encoder to encode its acoustic features and determine an acoustic encoded state vector; inputting a preset vocabulary into the language predictor to determine a text prediction vector; inputting the text prediction vector into the text mapping layer to obtain a text output probability distribution; calculating a first loss function according to a target text sequence corresponding to the audio training sample and the text output probability distribution; inputting the text prediction vector and the acoustic encoded state vector into the joint network to calculate a second loss function; and performing iterative optimization according to the first loss function and the second loss function.
    Type: Grant
    Filed: June 9, 2022
    Date of Patent: February 14, 2023
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
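A minimal sketch of the two-loss objective the abstract above describes. The real system uses neural encoders and a transducer-style joint network; here the vectors, the element-wise-sum joint, the softmax output layer, and the 0.5/0.5 loss weighting are all illustrative assumptions:

```python
import math

# Toy version of the training objective: loss1 comes from the text-only path
# (language predictor -> text mapping layer), loss2 from the joint of the
# acoustic encoded state vector and the text prediction vector; the model
# would be optimized on their weighted sum.

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(probs, target_idx):
    return -math.log(probs[target_idx])

def joint(acoustic_vec, text_vec):
    """Minimal joint network: element-wise sum of the two state vectors."""
    return [a + t for a, t in zip(acoustic_vec, text_vec)]

acoustic = [0.5, 1.0, -0.5]   # stand-in acoustic encoded state vector
text_pred = [0.2, 0.1, 0.3]   # stand-in text prediction vector
target = 1                    # index of the target label

loss1 = cross_entropy(softmax(text_pred), target)                   # text path
loss2 = cross_entropy(softmax(joint(acoustic, text_pred)), target)  # joint path
total = 0.5 * loss1 + 0.5 * loss2   # the 0.5/0.5 weights are an assumption
print(round(total, 4))
```

The key structural point is that the text path gets its own supervision signal (loss1) in addition to the joint loss (loss2), and both gradients flow in one iterative optimization.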
  • Publication number: 20230027645
    Abstract: Disclosed is a hierarchical generated-audio detection system comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip whose length does not exceed a limit; the audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain a CQCC feature and an LFCC feature; the CQCC or LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening, screening out first-stage real audio and first-stage generated audio; the CQCC or LFCC feature of the first-stage generated audio is input into the second-stage fine-level deep identification model to identify second-stage real audio and second-stage generated audio, and the second-stage generated au…
    Type: Application
    Filed: February 17, 2022
    Publication date: January 26, 2023
    Applicant: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua TAO, Zhengkun TIAN, Jiangyan YI
  • Patent number: 11521629
    Abstract: Disclosed is a digital audio tampering forensics method based on phase offset detection, comprising: multiplying a signal to be identified by a time label to obtain a modulation signal of the signal to be identified; performing a short-time Fourier transform on the signal to be identified and on the modulation signal to obtain a signal power spectrum and a modulation signal power spectrum; computing group delay characteristics using the signal power spectrum and the modulation signal power spectrum; computing the mean value of the group delay characteristics and smoothing it to obtain phase information of the current frame signal; and computing a dynamic threshold from the phase information of the current frame signal, then deciding whether the signal has been tampered with by using the dynamic threshold and the phase information of the current frame signal.
    Type: Grant
    Filed: February 9, 2022
    Date of Patent: December 6, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shan Liang, Shuai Nie, Jiangyan Yi
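A minimal sketch of the group-delay computation at the core of the method above: multiply the frame by its time index, take the transform of both the original and the modulated frame, and read the group delay off their ratio, tau(k) = Re{ DFT(n·x)[k] / DFT(x)[k] }. A plain DFT stands in for the STFT here, and the mean/smoothing and dynamic-threshold decision stages of the patent are not reproduced:

```python
import cmath

# Group delay via the time-modulated signal: tau(k) = Re{ Xt(k) / X(k) },
# where Xt is the DFT of n * x[n]. A delayed impulse is used as a check,
# since its group delay is exactly its delay at every frequency bin.

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def group_delay(frame):
    modulated = [t * v for t, v in enumerate(frame)]  # time-weighted signal
    X = dft(frame)
    Xt = dft(modulated)
    return [(Xt[k] / X[k]).real if abs(X[k]) > 1e-12 else 0.0
            for k in range(len(frame))]

frame = [0.0] * 8
frame[3] = 1.0                  # unit impulse delayed by 3 samples
gd = group_delay(frame)
print([round(g, 6) for g in gd])  # every bin is (close to) 3.0
```

Tampering points such as splices disturb this otherwise smooth phase/group-delay structure, which is what the dynamic threshold in the patent is designed to catch.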
  • Patent number: 11501759
    Abstract: Disclosed are a method and a system for speech recognition, an electronic device and a storage medium, relating to the technical field of speech recognition. Embodiments of the application comprise: encoding an audio to be recognized to obtain an acoustic encoded state vector sequence of the audio; performing sparse encoding on the acoustic encoded state vector sequence to obtain an acoustic encoded sparse vector; determining a text prediction vector for each label in a preset vocabulary; and recognizing the audio and determining the text content corresponding to it according to the acoustic encoded sparse vector and the text prediction vector.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: November 15, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
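A minimal sketch of one plausible reading of the "sparse encoding" step above: keep only the k largest-magnitude entries of each acoustic encoded state vector and zero the rest. The abstract does not specify the scheme, so top-k sparsification (and the value of k) is an assumption for illustration:

```python
# Top-k sparsification of a sequence of acoustic encoded state vectors:
# all but the k largest-magnitude components of each vector are zeroed.

def topk_sparse(vec, k):
    """Zero all but the k largest-magnitude components of vec."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]),
                      reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

state_sequence = [[0.1, -2.0, 0.3, 1.5], [0.9, 0.05, -0.1, 0.0]]
sparse_sequence = [topk_sparse(v, k=2) for v in state_sequence]
print(sparse_sequence)  # [[0.0, -2.0, 0.0, 1.5], [0.9, 0.0, -0.1, 0.0]]
```

Sparsifying the state vectors shrinks what the decoder must attend over, which is the usual motivation for this kind of step.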
  • Patent number: 11488586
    Abstract: Disclosed is a system for speech recognition text enhancement fusing multi-modal semantic invariance. The system includes an acoustic feature extraction module, an acoustic down-sampling module, an encoder and a decoder fusing multi-modal semantic invariance; the acoustic feature extraction module is configured for frame-dividing processing of speech data, dividing the speech data into short-term audio frames of a fixed length, extracting Fbank acoustic features from the short-term audio frames, and inputting the acoustic features into the acoustic down-sampling module for down-sampling to obtain an acoustic representation; the speech data is input into an existing speech recognition module to obtain input text data, and the input text data is input into the encoder to obtain an input text encoded representation; the acoustic representation and the input text encoded representation are input into the decoder to be fused.
    Type: Grant
    Filed: July 19, 2022
    Date of Patent: November 1, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shuai Zhang, Jiangyan Yi
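A minimal sketch of the final fusion step the abstract above ends on. The actual decoder is a neural network; an element-wise gated blend with a fixed weight is an illustrative stand-in for how the two modality representations might be combined:

```python
# Toy fusion of the down-sampled acoustic representation with the encoded
# representation of the first-pass recognition text; the fixed gate weight
# is an assumption, not the patented fusion mechanism.

def fuse(acoustic_repr, text_repr, gate=0.5):
    """Blend the two modality representations with a fixed gate weight."""
    return [gate * a + (1.0 - gate) * t
            for a, t in zip(acoustic_repr, text_repr)]

print(fuse([1.0, 0.0, 2.0], [0.0, 1.0, 2.0]))  # [0.5, 0.5, 2.0]
```

The point of fusing both streams is that the enhancement decoder can fall back on acoustic evidence where the first-pass recognition text is wrong.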
  • Patent number: 11475877
    Abstract: Disclosed are an end-to-end system for speech recognition and speech translation, and an electronic device. The system comprises an acoustic encoder, a multi-task decoder and a semantic invariance constraint module, and completes the two tasks of speech recognition and speech translation. In addition, exploiting the semantic consistency of texts across the different tasks, semantic constraints are imposed on the model so that it learns high-level semantic information, which can effectively improve the performance of both speech recognition and speech translation. The application avoids the error accumulation problem of a serial system, and the computational cost of the model is low while its real-time performance is very high.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: October 18, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Shuai Zhang, Jiangyan Yi
  • Patent number: 11462207
    Abstract: Disclosed are a method and an apparatus for editing audio, an electronic device and a storage medium. The method includes: acquiring a modified text obtained by modifying a known original text of an audio to be edited according to a known text for modification; predicting a duration of an audio corresponding to the text for modification; adjusting a region to be edited of the audio to be edited according to the duration of the audio corresponding to the text for modification, to obtain an adjusted audio to be edited; obtaining, based on a pre-trained audio editing model, an edited audio according to the adjusted audio to be edited and the modified text. In the present disclosure, the edited audio obtained by the audio editing model sounds natural in the context, and supports the function of synthesizing new words that do not appear in the corpus.
    Type: Grant
    Filed: May 5, 2022
    Date of Patent: October 4, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Tao Wang, Jiangyan Yi, Ruibo Fu
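A minimal sketch of the region-adjustment step the abstract above describes: the samples covering the words being replaced are swapped for a region whose length matches the predicted duration of the new text. The duration predictor and the neural audio editing model that would then fill the region are replaced by stand-ins:

```python
# Toy region adjustment for audio editing: audio[start:end] (the region to be
# edited) is replaced by a zeroed placeholder of predicted_len samples, which
# the trained audio editing model would then fill with synthesized speech.

def adjust_region(audio, start, end, predicted_len):
    """Resize the edit region of the waveform to the predicted duration."""
    return audio[:start] + [0.0] * predicted_len + audio[end:]

audio = [0.1, 0.2, 0.3, 0.4, 0.5]
print(adjust_region(audio, 1, 3, 3))  # [0.1, 0.0, 0.0, 0.0, 0.4, 0.5]
```

Adjusting the region length before synthesis is what lets the edited audio keep natural timing even when the replacement text is longer or shorter than the original words.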
  • Patent number: 11410685
    Abstract: Disclosed are a method for detecting speech concatenating points and a storage medium. The method includes: acquiring a speech to be detected, and determining high-frequency components and low-frequency components of the speech to be detected; extracting first cepstrum features and second cepstrum features corresponding to the speech to be detected according to the high-frequency components and the low-frequency components; splicing the first and second cepstrum features of each frame of the speech to be detected so as to obtain a parameter sequence; inputting the parameter sequence into a neural network model so as to obtain a feature sequence corresponding to the speech to be detected, wherein the model has been trained and has learned and stored a correspondence between parameter sequences and feature sequences; and performing detection of speech concatenating points on the speech to be detected according to the feature sequence.
    Type: Grant
    Filed: February 9, 2022
    Date of Patent: August 9, 2022
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jianhua Tao, Ruibo Fu, Jiangyan Yi
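A minimal sketch of the feature-splicing step the abstract above describes. Cepstrum extraction and the trained neural network are omitted; the per-frame feature values are illustrative stand-ins for the high-frequency and low-frequency cepstra:

```python
# Frame-by-frame splicing of two cepstral feature streams into the single
# parameter sequence that would be fed to the concatenation-point detector.

def splice(high_feats, low_feats):
    """Concatenate the two cepstral feature vectors frame by frame."""
    return [h + l for h, l in zip(high_feats, low_feats)]

high = [[1.0, 2.0], [3.0, 4.0]]   # stand-in first (high-frequency) cepstra
low = [[5.0], [6.0]]              # stand-in second (low-frequency) cepstra
print(splice(high, low))  # [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]
```

Carrying both bands per frame matters because splices often leave their clearest traces in the high-frequency band, which single-band features would dilute.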