Abstract: Speech signal information is formatted, processed and transported in accordance with a format adapted for TCP/IP protocols used on the Internet and other communications networks. NULL characters are used for indicating the end of a voice segment. The method is useful for distributed speech recognition systems such as a client-server system, typically implemented on an intranet or over the Internet, in which a user submits queries through a speech input interface at a computer, a PDA, or a workstation.
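As an illustration of NULL-terminated voice segments, the following is a minimal sketch in Python. It assumes the segment payload never contains a NUL byte itself (for example, because the speech data is text-encoded before transport); the abstract does not state the encoding. The function name `split_voice_segments` is hypothetical.

```python
from typing import List, Tuple


def split_voice_segments(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split received bytes into complete NUL-terminated segments.

    Returns (complete_segments, remainder), where remainder holds the bytes
    of a segment whose terminating NUL has not arrived yet.
    Assumption: payload bytes never include b"\x00".
    """
    # Everything before the last NUL is complete; the tail is still pending.
    *complete, remainder = buffer.split(b"\x00")
    return complete, remainder
```

A receiver reading from a TCP socket would append each new chunk to the remainder and call the function again, since TCP delivers a byte stream and a segment may arrive split across several reads.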
Abstract: A speech recognition device that is capable of presenting, to a user in an easy-to-understand manner, whether or not the user's utterance is a word unregistered in a speech recognition dictionary and whether or not the utterance should be repeated due to a recognition error includes: a speech recognition vocabulary storage unit (102) which defines vocabulary for speech recognition; a speech recognition unit (101) which checks the uttered speech against the registered words; a reference similarity calculation unit (103) which calculates a similarity between the uttered speech and a combination of acoustic units, which are subwords; an unregistered word judgment unit (104) which judges, based on the result of the check by the speech recognition unit (101) and a result of the calculation performed by the reference similarity calculation unit (103), whether the uttered speech is a registered word or an unregistered word; an unregistered word storage (106) which stores unregistered words; an unregistered word cand
Abstract: The present invention can include a speech enrollment system including an ordered stack of grammars and a recognition engine. The ordered stack of grammars can include an application grammars layer, a confusable grammar layer, a personal grammar layer, a phrase enrolled grammar layer, and an enrollment grammar layer. The recognition engine can return recognition results for speech input by processing the input using the ordered stack of grammars. The processing can occur from the topmost layer in the stack to the bottommost layer in the stack. Each layer in the stack can include exit criteria based upon a defined condition. When the exit criteria are satisfied, a result can be returned based upon that layer and lower layers of the ordered stack can be ignored.
Type: Application
Filed: December 22, 2006
Publication date: June 26, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
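The top-to-bottom processing of the ordered grammar stack can be sketched as follows. This is a toy model, not the applicant's implementation: each layer is reduced to a lookup table of phrases with made-up confidence scores, and the exit criterion is assumed to be a minimum confidence. `GrammarLayer`, `min_confidence`, and `recognize_with_stack` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class GrammarLayer:
    name: str
    phrases: Dict[str, float]  # phrase -> illustrative confidence score
    min_confidence: float      # assumed exit criterion for this layer

    def recognize(self, utterance: str) -> Optional[float]:
        # Stand-in for real recognition: an exact phrase lookup.
        return self.phrases.get(utterance)


def recognize_with_stack(
    stack: List[GrammarLayer], utterance: str
) -> Optional[Tuple[str, float]]:
    """Process layers from topmost to bottommost; stop at the first layer
    whose exit criterion is satisfied and ignore the layers below it."""
    for layer in stack:
        score = layer.recognize(utterance)
        if score is not None and score >= layer.min_confidence:
            return layer.name, score
    return None  # no layer satisfied its exit criterion
```

Because the loop returns as soon as one layer satisfies its criterion, the order of the list determines which grammar wins when the same phrase appears in more than one layer.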
Abstract: A system and method for improving voice recognition processing at a server system that receives voice input from a remotely located user system. The user system includes a microphone, a processor that performs front-end voice recognition processing of the received user voice input, and a communication component configured to send the front-end processed user voice input to a destination wirelessly over a network. The server system includes a communication component configured to receive the sent front-end processed user voice input, and a processor configured to complete voice recognition processing of the sent front-end processed user voice input.
Abstract: A voice dialing method includes the steps of receiving an utterance from a user, decoding the utterance to identify a recognition result for the utterance, and communicating to the user the recognition result. If an indication is received from the user that the communicated recognition result is incorrect, then the incorrect result is added to a rejection reference. Then, when the user repeats the misunderstood utterance, the rejection reference can be used to eliminate the incorrect recognition result as a potential subsequent recognition result. The method can be used for single or multiple digits or digit strings.
Type: Application
Filed: November 28, 2006
Publication date: May 29, 2008
Applicant: GENERAL MOTORS CORPORATION
Inventors: Jason W. Clark, Rathinavelu Chengalvarayan, Timothy J. Grost, Dana B. Fecher, Jeremy M. Spaulding
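The use of a rejection reference in the voice dialing abstract can be illustrated with a short sketch. It assumes the decoder produces an N-best list of (candidate, score) pairs and that the rejection reference is a simple set of previously rejected results; neither detail is stated in the abstract, and `pick_result` is a hypothetical name.

```python
from typing import Iterable, Optional, Set, Tuple


def pick_result(
    n_best: Iterable[Tuple[str, float]], rejected: Set[str]
) -> Optional[str]:
    """Return the highest-scoring candidate not in the rejection reference."""
    for candidate, _score in sorted(n_best, key=lambda p: p[1], reverse=True):
        if candidate not in rejected:
            return candidate
    return None  # every candidate has already been rejected by the user


# Illustrative dialogue: the user rejects the first result, then repeats.
rejected: Set[str] = set()
n_best = [("555-1234", 0.9), ("555-1284", 0.8)]
first = pick_result(n_best, rejected)
rejected.add(first)  # the user indicated that this result was incorrect
second = pick_result(n_best, rejected)
```

On the repeated utterance the previously rejected digit string is skipped, so the next-best candidate is offered instead of the same error.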
Abstract: When the degree of similarity of the recognition candidates is greater than the second threshold value, the speech verification unit outputs the recognition candidates as a recognition result, and when the degree of similarity of the recognition candidates is smaller than the second threshold value, it outputs the recognition candidates as a recognition result if the degree of similarity of the recognition candidates is greater than the first threshold value and, at the same time, the degree of similarity of the recognition candidates is greater than the degree of similarity of the rejection candidates. It should be noted that the first threshold value is a measure used for rejecting input speech. The second threshold value is larger than the first threshold value and is used as a measure for outputting recognition candidates as a recognition result.
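The two-threshold decision described above can be written as a small predicate. This is a sketch under stated assumptions: similarities are plain floats, and a similarity exactly equal to a threshold is treated as not exceeding it, a case the abstract does not address. The function and parameter names are hypothetical.

```python
def accept_candidate(
    candidate_sim: float,
    rejection_sim: float,
    first_threshold: float,
    second_threshold: float,
) -> bool:
    """Decide whether to output the recognition candidate as the result."""
    # The abstract states that the second threshold is the larger one.
    if second_threshold <= first_threshold:
        raise ValueError("second_threshold must exceed first_threshold")
    # Above the second threshold: accept without further comparison.
    if candidate_sim > second_threshold:
        return True
    # Otherwise: accept only if the candidate clears the first threshold
    # and also beats the rejection candidate's similarity.
    return candidate_sim > first_threshold and candidate_sim > rejection_sim
```

The first threshold therefore acts as a floor below which input is always rejected, while the band between the two thresholds is where the comparison against the rejection candidate decides the outcome.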