Abstract: Disclosed is a speaker diarization process for determining which speaker is speaking at what time during a conversation. The process is most easily described in five main parts: segmentation, where speech/non-speech decisions are made; frame feature extraction, where useful information is obtained from the frames; segment modeling, where the frame-level features are combined with segment start and end times to create segment-specific features; speaker decisions, where the segments are clustered to create speaker models; and corrections, where frame-level corrections are applied to the extracted information.
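The five parts named in the abstract can be illustrated with a toy pipeline. This is a minimal sketch, not the patented method: the frame size, the energy threshold, the mean-absolute-amplitude feature, the greedy centroid clustering, and the neighbour-agreement correction are all assumptions chosen for brevity, and every function name is hypothetical.

```python
from statistics import mean

FRAME = 4  # samples per frame (toy value, an assumption)

def frames_of(signal, size=FRAME):
    """Split the signal into non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def energy(frame):
    return sum(x * x for x in frame) / len(frame)

def segment(frames, threshold=0.01):
    """Part 1: speech/non-speech decision per frame, merged into (start, end) runs."""
    segments, start = [], None
    for i, f in enumerate(frames):
        speech = energy(f) > threshold
        if speech and start is None:
            start = i
        elif not speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(frames)))
    return segments

def frame_features(frame):
    """Part 2: one toy feature per frame (mean absolute amplitude, an assumption)."""
    return mean(abs(x) for x in frame)

def model_segments(frames, segments):
    """Part 3: combine frame features with segment start/end indices."""
    feats = [frame_features(f) for f in frames]
    return [{"start": s, "end": e, "feature": mean(feats[s:e])} for s, e in segments]

def decide_speakers(models, max_distance=0.2):
    """Part 4: greedy clustering; join the nearest centroid or open a new speaker."""
    centroids, labels = [], []
    for m in models:
        dists = [abs(m["feature"] - c) for c in centroids]
        if dists and min(dists) <= max_distance:
            labels.append(dists.index(min(dists)))
        else:
            centroids.append(m["feature"])
            labels.append(len(centroids) - 1)
    return labels

def correct(frame_labels):
    """Part 5: relabel an isolated frame whose two neighbours agree with each other."""
    fixed = list(frame_labels)
    for i in range(1, len(fixed) - 1):
        if frame_labels[i - 1] == frame_labels[i + 1] != frame_labels[i]:
            fixed[i] = frame_labels[i - 1]
    return fixed

def diarize(signal):
    """Run all five parts; returns one speaker label (or None) per frame."""
    frames = frames_of(signal)
    models = model_segments(frames, segment(frames))
    frame_labels = [None] * len(frames)
    for m, label in zip(models, decide_speakers(models)):
        for i in range(m["start"], m["end"]):
            frame_labels[i] = label
    return correct(frame_labels)

# Toy input: a quiet talker, silence, a loud talker, silence, the quiet talker again.
signal = [0.3] * 8 + [0.0] * 8 + [0.9] * 8 + [0.0] * 8 + [0.3] * 8
print(diarize(signal))
```

In this sketch a label of None marks a non-speech frame, and the two segments with similar amplitude end up with the same speaker label. A real system would use spectral features and statistical speaker models rather than a single amplitude value.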
Type:
Grant
Filed:
May 3, 2016
Date of Patent:
July 17, 2018
Assignee:
SESTEK Ses ve Iletisim Bilgisayar Tekn. San. Ve Tic A.S.
Inventors:
Mustafa Levent Arslan, Mustafa Erden, Sedat Demirbağ, Gökçe Sarar