Abstract: The present invention relates to a method and apparatus for processing a voice signal. The voice signal encoding method comprises the steps of: generating transform coefficients by transforming the sine wave components that form an input voice signal; determining which of the generated transform coefficients are to be encoded; and transmitting indication information that identifies the determined transform coefficients. The indication information may include position information, magnitude information, and sign information of the transform coefficients.
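The indication information described in this abstract can be illustrated with a minimal sketch. It assumes the transform coefficients have already been computed and that the coefficients to encode are simply the k largest in magnitude; the abstract does not state the selection rule, so that rule and the names `select_coefficients` and `reconstruct` are hypothetical.

```python
from typing import List, Tuple

# Each entry of the indication information: (position, magnitude, sign).
Indication = List[Tuple[int, float, int]]


def select_coefficients(coeffs: List[float], k: int) -> Indication:
    """Describe the k largest-magnitude coefficients.

    Assumption: "largest magnitude" is an illustrative selection rule,
    not one taken from the patent.
    """
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    # Report the selected coefficients in order of position.
    return [(i, abs(coeffs[i]), 1 if coeffs[i] >= 0 else -1) for i in sorted(ranked)]


def reconstruct(indication: Indication, length: int) -> List[float]:
    """Rebuild a coefficient vector; unselected positions are set to zero."""
    out = [0.0] * length
    for position, magnitude, sign in indication:
        out[position] = sign * magnitude
    return out


coeffs = [0.1, -2.5, 0.3, 1.75, -0.05]
indication = select_coefficients(coeffs, k=2)
print(indication)                            # [(1, 2.5, -1), (3, 1.75, 1)]
print(reconstruct(indication, len(coeffs)))  # [0.0, -2.5, 0.0, 1.75, 0.0]
```

A decoder that receives only the three fields per coefficient can restore the selected values and treat every other position as zero.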
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for inputting speech data that corresponds to a particular utterance to a neural network; determining an evaluation vector based on output at a hidden layer of the neural network; comparing the evaluation vector with a reference vector that corresponds to a past utterance of a particular speaker; and based on comparing the evaluation vector and the reference vector, determining whether the particular utterance was likely spoken by the particular speaker.
Type:
Grant
Filed:
March 28, 2014
Date of Patent:
July 26, 2016
Assignee:
Google Inc.
Inventors:
Xin Lei, Erik McDermott, Ehsan Variani, Ignacio L. Moreno
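The comparison step in this abstract can be sketched as follows. The abstract says only that the evaluation vector is compared with a reference vector; cosine similarity and the threshold of 0.8 are assumptions made for illustration, and the plain lists stand in for vectors that would be derived from a hidden layer of the neural network.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_same_speaker(evaluation: Sequence[float],
                    reference: Sequence[float],
                    threshold: float = 0.8) -> bool:
    """Accept the utterance when the similarity reaches the threshold.

    Assumption: both the similarity measure and the threshold value are
    hypothetical choices, not details stated in the abstract.
    """
    return cosine_similarity(evaluation, reference) >= threshold


reference = [0.2, 0.9, 0.4]    # hypothetical vector from a past utterance
evaluation = [0.25, 0.85, 0.5]  # hypothetical vector from the new utterance
print(is_same_speaker(evaluation, reference))
```

In practice the threshold would be tuned on held-out data to trade off false accepts against false rejects.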
Abstract: Disclosed are an apparatus and a method for detecting a speech endpoint using a weighted finite-state transducer (WFST). The apparatus in accordance with an embodiment of the present invention includes: a speech decision portion configured to receive feature vectors, converted from a speech signal in frame units, and to classify each received feature vector into a speech class or a noise class; a frame level WFST configured to receive the speech class and the noise class and to convert them to a WFST format; a speech level WFST configured to detect a speech endpoint by analyzing a relationship between the speech class and noise class and a preset state; a WFST combination portion configured to combine the frame level WFST with the speech level WFST; and an optimization portion configured to optimize the combined WFST so that it has a minimum route.
Type:
Grant
Filed:
March 25, 2014
Date of Patent:
July 19, 2016
Assignee:
ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
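The idea of detecting an endpoint from a sequence of per-frame speech and noise classes can be illustrated with a simplified sketch. It uses a plain hand-written state machine rather than a WFST, and it does not perform the combination or optimization steps named in the abstract. The frame labels `"S"` and `"N"`, the function name `detect_endpoint`, and the `min_speech` and `hangover` parameters are all assumptions.

```python
from typing import Iterable, Optional


def detect_endpoint(labels: Iterable[str],
                    min_speech: int = 3,
                    hangover: int = 4) -> Optional[int]:
    """Return the index of the first frame of the noise run that ends the utterance.

    "S" marks a frame classified as speech and "N" a frame classified as noise.
    Speech is considered started after `min_speech` consecutive speech frames,
    and ended after `hangover` consecutive noise frames. Returns None when no
    endpoint is found.
    """
    state = "silence"
    speech_run = 0
    noise_run = 0
    for i, label in enumerate(labels):
        if state == "silence":
            speech_run = speech_run + 1 if label == "S" else 0
            if speech_run >= min_speech:
                state = "speech"
                noise_run = 0
        else:  # state == "speech"
            noise_run = noise_run + 1 if label == "N" else 0
            if noise_run >= hangover:
                return i - hangover + 1
    return None


frames = "NNSSSSSNSSNNNNNN"     # the single noise frame at index 7 is ignored
print(detect_endpoint(frames))  # 10
```

Requiring a run of noise frames before declaring the endpoint keeps short pauses or isolated misclassified frames from cutting the utterance off early.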
Abstract: A candidate selection apparatus utilizing voice recognition includes an association unit that associates target candidates with candidate numbers so that numerals of the target candidates coincide with numerals of the candidate numbers when the target candidates to be displayed in list form are character strings representing the numerals of the candidate numbers, and a display control unit that displays the target candidates and the candidate numbers in list form in accordance with the associations made between the target candidates and the candidate numbers.
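One way to picture the association step is the sketch below. It assumes candidate numbers run from 1 to `max_number`, that a candidate whose text is itself a numeral in that range claims the matching number, and that the remaining candidates receive the smallest unused numbers. The abstract does not describe how non-numeric candidates are handled, so that part, along with the name `associate`, is hypothetical.

```python
from typing import List, Tuple


def associate(candidates: List[str], max_number: int = 9) -> List[Tuple[int, str]]:
    """Pair each candidate with a candidate number, returned in display order."""
    if len(candidates) > max_number:
        raise ValueError("more candidates than available candidate numbers")
    pairs: List[Tuple[int, str]] = []
    used = set()
    rest: List[str] = []
    for text in candidates:
        # A numeral candidate keeps its own number so that saying it is unambiguous.
        if text.isdecimal() and 1 <= int(text) <= max_number and int(text) not in used:
            used.add(int(text))
            pairs.append((int(text), text))
        else:
            rest.append(text)
    # Assumption: everything else takes the smallest numbers still free.
    free = [n for n in range(1, max_number + 1) if n not in used]
    pairs.extend(zip(free, rest))
    return sorted(pairs)


for number, text in associate(["3", "5", "Jazz"]):
    print(number, text)
# 1 Jazz
# 3 3
# 5 5
```

With this pairing, a user who says "three" selects the candidate displayed as "3" rather than whichever item happens to be third in the list.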