Abstract: A multi-person speech separation method is provided for a terminal. The method includes extracting a hybrid speech feature from a hybrid speech signal requiring separation, N human voices being mixed in the hybrid speech signal, N being a positive integer greater than or equal to 2; extracting a masking coefficient of the hybrid speech feature by using a generative adversarial network (GAN) model, to obtain a masking matrix corresponding to the N human voices, wherein the GAN model comprises a generative network model and an adversarial network model; and performing a speech separation on the masking matrix corresponding to the N human voices and the hybrid speech signal by using the GAN model, and outputting N separated speech signals corresponding to the N human voices.
Type: Grant
Filed: September 17, 2020
Date of Patent: September 20, 2022
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu
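The abstract does not define the masking matrix. A common choice in mask-based separation is the ideal ratio mask. The pure-Python sketch below shows only how N such masks can be formed from reference voices (for example, as training targets) and then applied to the mixture. It does not implement the GAN. It assumes magnitude spectrograms and assumes that the voices' magnitudes add in the mixture, which ignores phase cancellation. The names `ratio_masks` and `apply_masks` are hypothetical.

```python
from typing import List

# A magnitude spectrogram indexed as [frame][frequency_bin].
Spectrogram = List[List[float]]


def ratio_masks(sources: List[Spectrogram], eps: float = 1e-8) -> List[Spectrogram]:
    """Ideal ratio mask per source: |S_i(t, f)| / sum_j |S_j(t, f)|.

    Assumption: the patent does not say which mask definition its GAN
    model estimates; the ratio mask is used here only for illustration.
    """
    n_frames, n_bins = len(sources[0]), len(sources[0][0])
    masks = []
    for src in sources:
        mask = [
            [src[t][f] / (sum(s[t][f] for s in sources) + eps) for f in range(n_bins)]
            for t in range(n_frames)
        ]
        masks.append(mask)
    return masks


def apply_masks(mixture: Spectrogram, masks: List[Spectrogram]) -> List[Spectrogram]:
    """Multiply the mixture by each of the N masks, giving one estimate per voice."""
    n_frames, n_bins = len(mixture), len(mixture[0])
    return [
        [[m[t][f] * mixture[t][f] for f in range(n_bins)] for t in range(n_frames)]
        for m in masks
    ]


if __name__ == "__main__":
    # Two toy "voices" (N = 2), each with 2 frames x 2 frequency bins.
    s1 = [[1.0, 0.0], [2.0, 3.0]]
    s2 = [[3.0, 2.0], [2.0, 1.0]]
    mix = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
    print(apply_masks(mix, ratio_masks([s1, s2])))
```

In the patented method, the generative network would estimate the masks from features of the mixture alone. The reference voices used here to build the masks would be unavailable at separation time.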
Abstract: A wave-source-direction estimation device includes: a plurality of input units that acquire, as input signals, electrical signals based on waves detected by a plurality of sensors; a signal selection unit that selects a plurality of pairs, each a combination of two input signals from among the plurality of input signals; a relative delay time calculation unit that calculates, for each wave source direction, relative delay times, namely the differences in arrival time of the waves at the two sensors that supply the input signals composing each pair; and an integrated-estimated-direction-information calculation unit that generates per-frequency estimated direction information for each pair from that pair's input signals and relative delay times, and generates integrated estimated direction information by weighting and integrating the estimated direction information of all the pairs.
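The weighted integration over sensor pairs resembles steered-response-power estimation with phase-transform (PHAT) normalization. The sketch below illustrates that technique under several assumptions the abstract does not state. The sensors lie on a line (the x-axis), the waves are far-field plane waves, each sensor contributes one frame of complex spectra, and pair weights are uniform by default. The names `relative_delay`, `estimate_direction`, and `pair_weights` are hypothetical.

```python
import cmath
import math
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

SPEED_OF_SOUND = 343.0  # m/s; assumes acoustic waves in air


def relative_delay(x_i: float, x_j: float, theta: float) -> float:
    """Far-field arrival-time difference tau_i - tau_j (s) for sensors on the
    x-axis and a wave arriving from angle theta (radians from the +x axis)."""
    return -(x_i - x_j) * math.cos(theta) / SPEED_OF_SOUND


def estimate_direction(
    spectra: Sequence[Sequence[complex]],  # spectra[sensor][freq_index]
    sensor_x: Sequence[float],             # sensor positions along x (m)
    freqs: Sequence[float],                # frequency of each bin (Hz)
    candidates: Sequence[float],           # candidate directions (rad)
    pair_weights: Optional[Dict[Tuple[int, int], float]] = None,
) -> Tuple[float, List[float]]:
    """Return the best candidate direction and the integrated score of every candidate."""
    pairs = list(combinations(range(len(sensor_x)), 2))
    if pair_weights is None:
        pair_weights = {p: 1.0 for p in pairs}  # hypothetical: uniform pair weights
    scores = []
    for theta in candidates:
        total = 0.0
        for i, j in pairs:
            tau = relative_delay(sensor_x[i], sensor_x[j], theta)
            pair_score = 0.0
            for k, f in enumerate(freqs):
                cross = spectra[i][k] * spectra[j][k].conjugate()
                if cross == 0:
                    continue  # no energy in this bin for this pair
                # Per-frequency direction information: the PHAT-normalized cross
                # spectrum steered by the expected delay; it equals 1 when theta
                # matches the true direction.
                pair_score += (cross / abs(cross) * cmath.exp(2j * math.pi * f * tau)).real
            total += pair_weights[(i, j)] * pair_score  # weighted integration over pairs
        scores.append(total)
    best = max(range(len(candidates)), key=lambda n: scores[n])
    return candidates[best], scores
```

Uniform weights are only a placeholder. A practical system might weight each pair by its sensor spacing or its coherence, but the abstract does not say how its weights are chosen.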
Abstract: A Hough transform is performed on the point groups forming two-dimensional data to generate, in a Hough voting space, a plurality of loci each corresponding to one of the point groups. When a voting value is added to a position in the Hough voting space through which the plurality of loci pass, the voting value is varied based on a level difference between first and second signals respectively indicated by the two pieces of frequency decomposition information.
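The sketch below shows weighted Hough voting under stated assumptions. Each 2-D point carries the level difference of its two signals, and that point's vote shrinks as the level difference grows. The abstract says only that the voting value is varied based on the level difference. The weighting rule in `default_weight`, the accumulator sizes, and the function names are therefore hypothetical choices.

```python
import math
from typing import Callable, List, Sequence, Tuple


def default_weight(level_diff: float) -> float:
    """Hypothetical weighting: points whose two signals have similar levels
    (small level difference) cast larger votes."""
    return 1.0 / (1.0 + abs(level_diff))


def weighted_hough(
    points: Sequence[Tuple[float, float]],
    level_diffs: Sequence[float],
    n_theta: int = 180,
    rho_max: float = 5.0,
    n_rho: int = 101,
    weight_fn: Callable[[float], float] = default_weight,
) -> List[List[float]]:
    """Accumulate votes in a (theta, rho) Hough space.

    Each point (x, y) traces the locus rho = x*cos(theta) + y*sin(theta);
    instead of a constant vote of 1, every cell on that locus receives
    weight_fn(level_diff) for the point.
    """
    acc = [[0.0] * n_rho for _ in range(n_theta)]
    for (x, y), diff in zip(points, level_diffs):
        vote = weight_fn(diff)
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            r = round((rho + rho_max) / (2 * rho_max) * (n_rho - 1))
            if 0 <= r < n_rho:  # skip loci that leave the accumulator range
                acc[t][r] += vote
    return acc
```

For example, consider four points on the line y = x, where the last point has a large level difference. They all meet in the cell at theta = 135 degrees, rho = 0. The down-weighted point adds only 0.25 to that peak instead of a full vote.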