Patents by Inventor Zejun Ma

Zejun Ma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12614553
    Abstract: Embodiments of the present disclosure provide a method, apparatus, electronic device, and medium for speech processing. The method comprises generating a token-level semantic feature of target speech data based on a frame-level acoustic feature of the target speech data. The method further comprises generating a token-level voiceprint feature of the target speech data based on the frame-level acoustic feature. The method further comprises determining a token in the target speech data where speaker change occurs based on the token-level semantic feature and the token-level voiceprint feature. According to embodiments of the present disclosure, speaker change in speech data is detected at the token level in conjunction with the speaker's acoustic features and speech contents, and speaker-based speech recognition results are output directly without post-processing, simplifying the speech recognition process.
    Type: Grant
    Filed: August 4, 2023
    Date of Patent: April 28, 2026
    Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
    Inventors: Linhao Dong, Zhenlin Liang, Zhiyun Fan, Yi Liu, Zejun Ma
  • Patent number: 12547834
    Abstract: Provided are an electronic device and a computer readable storage medium. The method includes: acquiring a text to be analyzed; performing token conversion on words in the text to be analyzed to obtain a token sequence to be analyzed, where tokens in token sequences to be analyzed corresponding to texts to be analyzed in different languages belong to a same type; and performing feature extraction on the token sequence to be analyzed, and processing a target task based on the extracted feature, to determine an analysis result for the text to be analyzed.
    Type: Grant
    Filed: September 18, 2023
    Date of Patent: February 10, 2026
    Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
    Inventors: Yuxiang Zou, Zejun Ma
  • Patent number: 12536424
    Abstract: The present application relates to an intention recognition method and apparatus, a readable medium, and an electronic device. The method includes: by means of a preset intention recognition quantification model, performing a quantification operation on a dot product of a query vector and a key vector which correspond to each character in a target text, so as to obtain a fixed-point type target vector of a first bit; according to the fixed-point type target vector, determining, by means of a target mapping relationship, a floating-point type attention weight of a second bit corresponding to each character; and according to the floating-point type attention weight, determining a target intention corresponding to the target text, the first bit being smaller than the second bit.
    Type: Grant
    Filed: February 16, 2024
    Date of Patent: January 27, 2026
    Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
    Inventors: Xiaoyang Li, Zilin Yu, Xiangyang Zhang, Xiaogang Tian, Zejun Ma
  • Patent number: 12536990
    Abstract: A model training method, a speech recognition method and apparatus, a medium, and a device are provided. The speech recognition model includes an encoder, a CIF prediction sub-model and a CTC prediction sub-model. The model training method includes: encoding training speech data based on the encoder to obtain an acoustic vector sequence corresponding to the training speech data; obtaining an information amount sequence corresponding to the training speech data based on the acoustic vector sequence and the CIF prediction sub-model; obtaining a target probability sequence based on the acoustic vector sequence and the CTC prediction sub-model; determining a target loss of the speech recognition model based on the information amount sequence and the target probability sequence; and updating, in response to an updating condition being satisfied, a model parameter of the speech recognition model based on the target loss.
    Type: Grant
    Filed: May 7, 2022
    Date of Patent: January 27, 2026
    Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
    Inventors: Linhao Dong, Zejun Ma
  • Publication number: 20260010566
    Abstract: Embodiments of the present disclosure provide an audio recognition method and apparatus, an electronic device, and a computer program product. The method may include obtaining a target feature map of audio data based on a multi-level feature map of the audio data. The method may further include determining a feature representation of the audio data based on the target feature map. In addition, the method may further include determining a recognition result for the audio data at least based on the feature representation. By means of implementing the technical solution of the present disclosure, a determined feature representation has high-resolution position information, thereby optimizing the model performance and improving the user experience.
    Type: Application
    Filed: June 30, 2023
    Publication date: January 8, 2026
    Inventors: Xingjian DU, Huidong LIANG, Bilei ZHU, Zejun MA
  • Patent number: 12444401
    Abstract: A method, apparatus, a computer readable medium, and an electronic device of speech synthesis. The method includes: obtaining a phoneme sequence corresponding to text to be synthesized; generating a phonemic-level TOBI representation sequence and a prosodic-acoustic feature corresponding to the text to be synthesized based on the phoneme sequence and the text to be synthesized, and generating acoustic feature information corresponding to the text to be synthesized based on the TOBI representation sequence and the prosodic-acoustic feature; and generating first audio information corresponding to the text to be synthesized based on the acoustic feature information. The method enables the synthesized audio to be more natural, cadenced, and aligned with the intended semantics of a speaker.
    Type: Grant
    Filed: August 26, 2024
    Date of Patent: October 14, 2025
    Assignee: Beijing Youzhuju Network Technology Co., Ltd.
    Inventors: Haopeng Lin, Zejun Ma
  • Publication number: 20250316149
    Abstract: A tactile signal generation method, a computer readable medium, and an electronic device are provided. The method includes: determining a target frequency domain signal corresponding to a target audio signal to be processed; dividing the target frequency domain signal into a plurality of sub-bands, and determining a target energy value of each of the sub-bands; determining, from a plurality of alternative mapped frequency combinations, a target frequency combination capable of achieving a best tactile feedback effect on a tactile feedback device; and obtaining a tactile feedback signal for tactile feedback according to the target frequency combination.
    Type: Application
    Filed: May 11, 2023
    Publication date: October 9, 2025
    Inventors: Yuzhou GONG, Yangfei XU, Zejun MA
  • Patent number: 12374339
    Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; and extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire CIF mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can effectively improve the accuracy of the detection result of a speaker change point in target voice data with a type of interaction.
    Type: Grant
    Filed: June 12, 2024
    Date of Patent: July 29, 2025
    Assignee: Beijing Youzhuju Network Technology Co., Ltd.
    Inventors: Linhao Dong, Zhiyun Fan, Zejun Ma
  • Publication number: 20250218425
    Abstract: Provided are a method for prosody prediction, an apparatus, a readable medium, and an electronic device.
    Type: Application
    Filed: March 17, 2023
    Publication date: July 3, 2025
    Inventors: Yuxiang Zou, Zejun Ma
  • Publication number: 20250182776
    Abstract: The disclosure relates to a method for generating a feature encoding model. The method includes: acquiring a plurality of sample audios marked with category labels; extracting audio features of the plurality of sample audios; encoding the audio features of the plurality of sample audios by the feature encoding model to obtain a plurality of encoding vectors of the plurality of sample audios, and performing classification processing on the plurality of sample audios based on the plurality of encoding vectors to obtain category prediction values of the plurality of sample audios; and determining a target loss value of a target loss function based on the plurality of encoding vectors, the category prediction values of the plurality of sample audios and the category labels of the plurality of sample audios, and updating a parameter of the feature encoding model based on the target loss value so as to obtain the trained feature encoding model.
    Type: Application
    Filed: January 6, 2023
    Publication date: June 5, 2025
    Inventors: Xingjian DU, Zijie WANG, Zhesong YU, Bilei ZHU, Zejun MA
  • Publication number: 20250182743
    Abstract: A method of speech recognition, an apparatus for speech recognition, a computer-readable medium, an electronic device, a computer program product, and a computer program. The method includes: obtaining a target speech signal comprising a plurality of languages to be recognized; recognizing semantics of the target speech signal through a speech recognition model fusing sparse sub-networks of various languages; and the sparse sub-network being obtained by performing a parameter-pruning processing on a multi-language pre-trained model, the multi-language pre-trained model being obtained by training based on a speech signal comprising the plurality of languages.
    Type: Application
    Filed: March 1, 2023
    Publication date: June 5, 2025
    Inventors: Yizhou Lu, Zejun Ma
  • Patent number: 12231872
    Abstract: An audio signal playing method and apparatus, and an electronic device are provided. The method comprises: separating, from a first audio signal, a recorded audio signal corresponding to each of at least one sound source; on the basis of the first audio signal, determining a real-time orientation of each of the at least one sound source relative to the head of a user; for each sound source, according to the real-time orientation of the sound source and the recorded audio signal corresponding to the sound source, generating a target direct audio signal corresponding to the sound source, and generating a target reverberated audio signal corresponding to the sound source; and playing a second audio signal that is generated by means of fusing the target direct audio signal and the target reverberated audio signal corresponding to each sound source.
    Type: Grant
    Filed: February 28, 2024
    Date of Patent: February 18, 2025
    Assignee: Beijing Youzhuju Network Technology Co., Ltd.
    Inventors: Zheng Xue, Yangfei Xu, Wenzhi Fan, Zhifei Zhang, Yuzhou Gong, Zejun Ma
  • Publication number: 20240420678
    Abstract: A method, apparatus, a computer readable medium, and an electronic device of speech synthesis. The method includes: obtaining a phoneme sequence corresponding to text to be synthesized; generating a phonemic-level TOBI representation sequence and a prosodic-acoustic feature corresponding to the text to be synthesized based on the phoneme sequence and the text to be synthesized, and generating acoustic feature information corresponding to the text to be synthesized based on the TOBI representation sequence and the prosodic-acoustic feature; and generating first audio information corresponding to the text to be synthesized based on the acoustic feature information. The method enables the synthesized audio to be more natural, cadenced, and aligned with the intended semantics of a speaker.
    Type: Application
    Filed: August 26, 2024
    Publication date: December 19, 2024
    Inventors: Haopeng Lin, Zejun Ma
  • Publication number: 20240379116
    Abstract: The disclosure relates to an audio caption alignment method and apparatus, a medium, and an electronic device. The method includes: obtaining a target audio and a target caption text of the target audio; obtaining a plurality of first target audios by slicing the target audio according to a slicing duration in a case that a duration of the target audio is greater than a first preset duration; determining first audio feature information of each of the first target audios; obtaining target audio feature information of the target audio by concatenating all of the first audio feature information in a case that the duration of the target audio is less than or equal to a second preset duration, where the second preset duration is greater than the first preset duration; and generating caption information corresponding to the target audio according to the target caption text and the target audio feature information.
    Type: Application
    Filed: May 13, 2024
    Publication date: November 14, 2024
    Inventors: Xiusong SUN, Zejun MA
  • Publication number: 20240331706
    Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; and extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors of the target voice data; integrating and firing the speaker characterization vectors of the target voice data based on a continuous integrate-and-fire CIF mechanism, to obtain a sequence of speaker characterizations in the target voice data; and determining the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data. This method can effectively improve the accuracy of the detection result of a speaker change point in target voice data with a type of interaction.
    Type: Application
    Filed: June 12, 2024
    Publication date: October 3, 2024
    Inventors: Linhao DONG, Zhiyun FAN, Zejun MA
  • Patent number: 12067987
    Abstract: The present disclosure discloses a method and device of generating acoustic features, speech model training, and speech recognition. By acquiring the acoustic information vector of the current speech frame and the information weight of the current speech frame, and according to the accumulated information weight corresponding to the previous speech frame, the retention rate corresponding to the current speech frame, and the information weight of the current speech frame, the accumulated information weight corresponding to the current speech frame can be obtained. The retention rate is the difference between 1 and a leakage rate.
    Type: Grant
    Filed: January 30, 2024
    Date of Patent: August 20, 2024
    Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
    Inventors: Linhao Dong, Zejun Ma
  • Patent number: 12039981
    Abstract: A method, apparatus, device, and storage medium for speaker change point detection, the method including: acquiring target voice data to be detected; and extracting an acoustic feature characterizing acoustic information of the target voice data from the target voice data; encoding the acoustic feature to obtain speaker characterization vectors at a voice frame level of the target voice data; integrating and firing the speaker characterization vectors at the voice frame level of the target voice data based on a continuous integrate-and-fire CIF mechanism, to obtain a sequence of speaker characterizations bounded by speaker change points in the target voice data; and determining a timestamp corresponding to the speaker change points, according to the sequence of the speaker characterizations bounded by the speaker change points in the target voice data.
    Type: Grant
    Filed: December 22, 2023
    Date of Patent: July 16, 2024
    Assignee: BEIJING YOUZHUJU NETWORK TECHNOLOGY CO., LTD.
    Inventors: Linhao Dong, Zhiyun Fan, Zejun Ma
  • Publication number: 20240221729
    Abstract: The present disclosure provides a voice recognition method and apparatus, a medium, and an electronic device. The method includes: encoding received voice data to obtain an acoustic vector sequence corresponding to the voice data; obtaining, according to the acoustic vector sequence and a first prediction model, an information amount sequence corresponding to the voice data and a first probability sequence corresponding to the voice data; obtaining a second probability sequence according to the acoustic vector sequence and a second prediction model; determining a target probability sequence according to the first probability sequence and the second probability sequence; and determining a target text corresponding to the voice data according to the target probability sequence.
    Type: Application
    Filed: May 7, 2022
    Publication date: July 4, 2024
    Inventors: Linhao DONG, Zejun MA
  • Publication number: 20240205634
    Abstract: An audio signal playing method and apparatus, and an electronic device are provided. The method comprises: separating, from a first audio signal, a recorded audio signal corresponding to each of at least one sound source; on the basis of the first audio signal, determining a real-time orientation of each of the at least one sound source relative to the head of a user; for each sound source, according to the real-time orientation of the sound source and the recorded audio signal corresponding to the sound source, generating a target direct audio signal corresponding to the sound source, and generating a target reverberated audio signal corresponding to the sound source; and playing a second audio signal that is generated by means of fusing the target direct audio signal and the target reverberated audio signal corresponding to each sound source.
    Type: Application
    Filed: February 28, 2024
    Publication date: June 20, 2024
    Inventors: Zheng XUE, Yangfei XU, Wenzhi FAN, Zhifei ZHANG, Yuzhou GONG, Zejun MA
  • Publication number: 20240185046
    Abstract: The present application relates to an intention recognition method and apparatus, a readable medium, and an electronic device. The method includes: by means of a preset intention recognition quantification model, performing a quantification operation on a dot product of a query vector and a key vector which correspond to each character in a target text, so as to obtain a fixed-point type target vector of a first bit; according to the fixed-point type target vector, determining, by means of a target mapping relationship, a floating-point type attention weight of a second bit corresponding to each character; and according to the floating-point type attention weight, determining a target intention corresponding to the target text, the first bit being smaller than the second bit.
    Type: Application
    Filed: February 16, 2024
    Publication date: June 6, 2024
    Inventors: Xiaoyang LI, Zilin YU, Xiangyang ZHANG, Xiaogang TIAN, Zejun MA
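
Illustrative note on patent 12067987 (listed above): the abstract describes obtaining the accumulated information weight of the current speech frame from the previous frame's accumulated weight, a retention rate, and the current frame's information weight, with the retention rate equal to 1 minus a leakage rate. The abstract does not give the exact formula, so the sketch below assumes the simplest reading, accumulated = retention * previous_accumulated + current_weight. The function name, parameter names, and that update rule are assumptions made for illustration, not the patented method.

```python
from typing import List


def accumulate_weights(weights: List[float], leakage_rate: float) -> List[float]:
    """Return the running accumulated information weight for each frame.

    Assumed update rule (not stated in the abstract):
        accumulated[t] = (1 - leakage_rate) * accumulated[t - 1] + weights[t]
    """
    if not 0.0 <= leakage_rate <= 1.0:
        raise ValueError("leakage_rate must be between 0 and 1")

    # The abstract defines the retention rate as 1 minus the leakage rate.
    retention_rate = 1.0 - leakage_rate

    accumulated = 0.0  # nothing has been accumulated before the first frame
    history: List[float] = []
    for weight in weights:
        # Decay the previous total, then add the current frame's weight.
        accumulated = retention_rate * accumulated + weight
        history.append(accumulated)
    return history


if __name__ == "__main__":
    # With no leakage this is a plain running sum.
    print(accumulate_weights([0.5, 0.5, 0.5], leakage_rate=0.0))
    # With leakage, older frames contribute less to the total.
    print(accumulate_weights([0.5, 0.5, 0.5], leakage_rate=0.5))
```

With a leakage rate of 0 the sketch reduces to an ordinary cumulative sum; any positive leakage rate makes earlier frames count for progressively less.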