Abstract: A large amount of human labor is required to transcribe and annotate a training corpus that is needed to create and update models for automatic speech recognition (ASR) and spoken language understanding (SLU). Active learning enables a reduction in the amount of transcribed and annotated data required to train ASR and SLU models. In one aspect of the present invention, an active learning ASR process and active learning SLU process are coupled, thereby enabling further efficiencies to be gained relative to a process that maintains an isolation of data in both the ASR and SLU domains.
Type:
Grant
Filed:
September 26, 2007
Date of Patent:
July 14, 2009
Assignee:
AT&T Intellectual Property II, L.P.
Inventors:
Dilek Z Hakkani-Tur, Mazin G Rahim, Giuseppe Riccardi, Gokhan Tur
Abstract: Speech likeliness or a degree of speech is determined with a simple configuration or with a small amount of processing, and speech parts are separated from an input sound signal. The input sound signal is subjected to a waveform slicing process in frame units. The increase and decrease rate of a half wavelength in the frame is computed. The rate of a zero cross in the frame is computed. The increase and decrease rate of a half wavelength is computed by determining the rate of the portion where the upward half-wavelength or the downward half-wavelength of the waveform of the input sound signal changes to increase and decrease alternately or to decrease and increase alternately. The degree of speech is determined using each rate. Speech processing for separating or accentuating/attenuating speech and background noise in accordance with the degree of speech is performed on the sound signal for each frame.
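The two per-frame rates named in the abstract above can be illustrated with a minimal sketch. It assumes that a half wavelength is the run of samples between two consecutive zero crossings and that the "increase and decrease rate" is the fraction of consecutive length changes that flip direction. The function names are hypothetical, and the abstract does not say how the two rates are combined into a degree of speech, so that step is left out.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def half_wavelengths(frame):
    """Lengths, in samples, of the runs between consecutive zero crossings."""
    if not frame:
        return []
    lengths = []
    run = 1
    for a, b in zip(frame, frame[1:]):
        if (a >= 0) != (b >= 0):
            lengths.append(run)  # sign changed: close the current half wave
            run = 1
        else:
            run += 1
    lengths.append(run)  # the last, possibly truncated, half wave
    return lengths


def alternation_rate(lengths):
    """Fraction of consecutive length changes that reverse direction
    (increase then decrease, or decrease then increase)."""
    diffs = [b - a for a, b in zip(lengths, lengths[1:])]
    pairs = list(zip(diffs, diffs[1:]))
    if not pairs:
        return 0.0
    flips = sum(1 for d1, d2 in pairs if d1 * d2 < 0)
    return flips / len(pairs)
```

To look only at upward or only at downward half wavelengths, as the abstract describes, a caller could pass every other run, for example `alternation_rate(half_wavelengths(frame)[::2])`.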
Abstract: The present invention provides a dialogue system in which semantic ambiguity is reduced by selectively choosing which semantic structures are to be made available for parsing based on previous information obtained from the user or other context information. In one embodiment, the semantic grammar used by the parser is altered so that the grammar is focused based on information about the user or the dialogue state. In other embodiments, the semantic parsing is focused on certain parse structures by giving preference to structures that the dialogue system has marked as being expected.
Abstract: A method for performing a frame erasure concealment for a higher-band signal involves calculating a periodic intensity of the higher-band signal with respect to pitch period information of a lower-band signal and comparing the periodic intensity to a preconfigured threshold. If the periodic intensity is greater than or equal to the preconfigured threshold, the frame erasure concealment is performed with a pitch period repetition based method. If the periodic intensity is less than the preconfigured threshold, the frame erasure concealment is performed with a previous frame data repetition based method. A device for performing a frame erasure concealment includes a periodic intensity calculation module, a pitch period repetition module, and a previous frame data repetition module.
Type:
Grant
Filed:
November 18, 2008
Date of Patent:
June 23, 2009
Assignee:
Huawei Technologies Co., Ltd.
Inventors:
Jianfeng Xu, Lei Miao, Chen Hu, Qing Zhang, Lijing Xu, Wei Li, Zhengzhong Du, Yi Yang, Fengyan Qi, Wuzhou Zhan, Dongqi Wang
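The threshold decision in the frame erasure concealment abstract above can be sketched as follows. The abstract does not define the periodic intensity measure, so this sketch assumes a normalized correlation of the higher-band history with itself delayed by the lower-band pitch period. The function names and the default threshold of 0.5 are hypothetical, and `pitch_period` is assumed to be positive and shorter than `history`.

```python
import math


def periodic_intensity(history, pitch_period):
    """Normalized correlation between the signal and a copy of itself
    delayed by pitch_period samples (an assumed measure)."""
    x = history[pitch_period:]
    y = history[:-pitch_period]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den > 0 else 0.0


def conceal_frame(history, pitch_period, frame_len, threshold=0.5):
    """Build a replacement for a lost frame from previously received samples."""
    if periodic_intensity(history, pitch_period) >= threshold:
        # Strongly periodic: repeat the last pitch cycle to fill the frame.
        cycle = history[-pitch_period:]
        return [cycle[i % pitch_period] for i in range(frame_len)]
    # Weakly periodic: repeat the data of the previous frame.
    return list(history[-frame_len:])
```

A periodic history such as `[0, 1, 0, -1] * 4` with `pitch_period=4` takes the pitch period repetition branch, while a history that correlates poorly at the given lag falls through to previous frame repetition.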
Abstract: An apparatus selects from among a plurality of translation records a translation record for use in translation of a newly received text. Each of the translation records stores with respect to past translation results at least one pair of a source-language text, being a divided part corresponding to a translation segment in the received text, and a target-language text corresponding to the source-language text. A first key generation unit generates an input key for each of the translation segments in the received text. A translation segment is encoded based on a predetermined conversion rule. An acquisition unit acquires a translation record key in which a source-language text is encoded based on the predetermined conversion rule. A key search unit determines whether a translation record key identical with each of the input keys is present or not. A first count unit counts a quantity of input keys that have translation record keys identical with each of the translation records.
Type:
Grant
Filed:
December 8, 2006
Date of Patent:
June 16, 2009
Assignee:
International Business Machines Corporation
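The key generation and counting steps in the translation record abstract above can be sketched as follows. The "predetermined conversion rule" is not specified in the abstract, so this sketch assumes a SHA-1 hash of the whitespace-normalized, lowercased segment. All names are hypothetical, and picking the record with the highest count is an assumption about how the counts are used.

```python
import hashlib


def make_key(segment):
    """Encode a segment with an assumed conversion rule:
    lowercase, collapse whitespace, then hash."""
    normalized = " ".join(segment.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def count_matches(input_segments, records):
    """For each translation record, count the input keys that have an
    identical translation record key."""
    input_keys = [make_key(s) for s in input_segments]
    counts = {}
    for name, pairs in records.items():
        record_keys = {make_key(source) for source, _target in pairs}
        counts[name] = sum(1 for key in input_keys if key in record_keys)
    return counts


def select_record(input_segments, records):
    """Pick the record with the most matching keys (assumed selection rule)."""
    counts = count_matches(input_segments, records)
    return max(counts, key=counts.get)


records = {
    "manual_v1": [("Press the button.", "Appuyez sur le bouton."),
                  ("Close the lid.", "Fermez le couvercle.")],
    "manual_v2": [("Press the button.", "Appuyez sur le bouton.")],
}
segments = ["press  the button.", "Close the lid.", "Remove the tray."]
best = select_record(segments, records)
```

Comparing fixed-length keys rather than full source texts keeps each lookup cheap, which may be the motivation for encoding segments before the search.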
Abstract: Emails that are generated as part of an automated or semi-automated process are to be language sensitive. It is possible, by determining the preferred language of the user of a company's information services, to customize the communication to the user, or, in the case of a user using a server generating electronic mail messages to a company, to customize messages to the company, based on the user's chosen language or an automatically determined language. This customization affects two levels of an electronic mail message. The first is the actual header information provided to the custom electronic mail message to allow proper interpretation of the electronic mail message at the receiving end. The second is to customize information to be written to the addressee section of the body of the electronic mail message.
Abstract: A method for evaluating contents of a message is provided. The method initiates with characterizing a message segment. Then, the message is scanned to define tokens associated with the message segment. Next, the tokens are parsed to define substructures. Then, the rules associated with the tokens are determined, wherein the rules define actions. At the same time, the session or meta session associated with the communication is determined. Then, the actions associated with the message are executed. Next, the message is queued to be sent out. A method for providing content based security, a computer readable medium, an adapter card and a network device configured to provide content based security, and an intrusion protection system are provided.
Abstract: A processor configured to identify message contents is provided. The processor includes a message characterization block configured to characterize a message through analysis of header information associated with the message. A semantic processing block configured to translate the message into tokens associated with segments of the message is included. The semantic processing block identifies rules associated with each of the tokens and is configured to apply the identified rules to the message. A queuing block configured to queue the message to be transmitted from the processor is included. A method for providing content based security, a computer readable medium, an adapter card and a network device configured to provide content based security, and an intrusion protection system are provided.
Abstract: A system and method are disclosed for detecting and repairing audio recordings that contain busy signals and extended periods of silence. Clusters of silence are found by reviewing the amplitude in an audio recording sample and listing each silence and sample time.
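The amplitude review described in the abstract above can be sketched as a scan for runs of low-amplitude samples. The function name, the amplitude threshold, and the minimum duration parameter are all hypothetical, and the sketch covers only detection and listing, not busy-signal detection or repair.

```python
def find_silences(samples, sample_rate, amp_threshold, min_duration):
    """List (start_time, duration) in seconds for every run of samples
    whose absolute amplitude stays at or below amp_threshold for at
    least min_duration seconds."""
    silences = []
    start = None  # index where the current quiet run began, if any

    def close_run(end):
        duration = (end - start) / sample_rate
        if duration >= min_duration:
            silences.append((start / sample_rate, duration))

    for i, sample in enumerate(samples):
        if abs(sample) <= amp_threshold:
            if start is None:
                start = i
        elif start is not None:
            close_run(i)
            start = None
    if start is not None:
        close_run(len(samples))  # recording ended during a quiet run
    return silences
```

Each listed entry gives the time at which a silence begins and how long it lasts, which is the kind of listing a later repair step could act on.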
Abstract: A two stage utterance verification device and a method thereof are provided. The two stage utterance verification method includes performing a first utterance verification function based on an SVM pattern classification method by using feature data inputted from a search block of a speech recognizer and performing a second utterance verification function based on a CART pattern classification method by using heterogeneity feature data including meta data extracted from a preprocessing module, intermediate results from function blocks of the speech recognizer and the result of the first utterance verification function. Therefore, the two stage utterance verification device and the method thereof provide a high quality speech recognition service to a user.
Type:
Grant
Filed:
April 1, 2005
Date of Patent:
May 5, 2009
Assignee:
Electronics and Telecommunications Research Institute
Abstract: A method for authoring a grammar for use in a language processing application is provided. The method includes receiving at least one grammar configuration parameter relating to how to configure a grammar and creating the grammar based on the at least one grammar configuration parameter.
Abstract: A voice based multimodal speaker authentication method and telecommunications application thereof employ a speaker adaptive method for training phoneme specific Gaussian mixture models. Applied to telecommunications services, the method may advantageously be implemented in contemporary wireless terminals.
Abstract: A voice control system receives a voice command from a user via a microphone. A command executability determination circuit determines whether the voice command is executable in the current function setting of a target device controlled by the voice control system. If the command is executable, the command is executed. If the command is inexecutable, a proper usage of the voice command and executable commands related to the voice command are provided to the user. Then, the user is prompted to use one of the executable commands. In other words, the user is notified of the reason why the voice command is not executed and a proper usage of the voice command.
Abstract: A weighted search program is disclosed. The weighted search program may be integrated into a translation program, or the weighted search program may be used independently with an available search engine. When integrated with the translation program, setting and weighting may be combined in a single search. In one embodiment, the weighting would be used in conjunction with a Pin Yin translation program so that a user could set some terms, and allocate a search weight to the remaining terms. The invention may be applied independently in Internet searching so that a user can apply weights to multiple elements of a search term.
Type:
Grant
Filed:
April 19, 2005
Date of Patent:
April 7, 2009
Assignee:
International Business Machines Corporation
Inventors:
Yen-Fu Chen, John W. Dunsmoir, Hari Shankar
Abstract: An apparatus for processing a speech signal includes a receiver, a speech signal decoder, a speech rate conversion information detector, and a speech rate converting processor. The receiver receives a multiplexed signal of information concerning controls and programs, including speech packets, through a transmission line. The decoder decodes the speech signal of packets out of the received signals. The detector detects speech rate conversion execution information in the received signals. The processor subjects the decoded speech signal to a speech rate conversion process if the speech rate conversion execution information indicates that the speech signal has not been subjected to the speech rate conversion process on the transmitting end, and does not subject the decoded speech signal to the speech rate conversion process if the speech rate conversion execution information indicates that the speech signal has been subjected to the speech rate conversion process on the transmitting end.
Abstract: A new approach to speech recognition that reacts to concepts conveyed through speech, which shifts the balance of power in speech recognition from straight sound recognition and statistical models to a more powerful and complete approach determining and addressing conveyed concepts. A probabilistically unbiased multi-phoneme recognition process is employed, followed by a phoneme stream analysis process that builds the list of candidate words derived from recognized phonemes, followed by a permutation analysis process that produces sequences of candidate words with high potential of being syntactically valid, and finally, by processing targeted syntactic sequences in a conceptual analysis process to generate the utterance's conceptual representation that can be used to produce an adequate response.
Abstract: An audio encoder and decoder use architectures and techniques that improve the efficiency of multi-channel audio coding and decoding. The described strategies include various techniques and tools, which can be used in combination or independently. For example, an audio encoder performs a pre-processing multi-channel transform on multi-channel audio data, varying the transform so as to control quality. The encoder groups multiple windows from different channels into one or more tiles and outputs tile configuration information, which allows the encoder to isolate transients that appear in a particular channel with small windows, but use large windows in other channels. Using a variety of techniques, the encoder performs flexible multi-channel transforms that effectively take advantage of inter-channel correlation. An audio decoder performs corresponding processing and decoding. In addition, the decoder performs a post-processing multi-channel transform for any of multiple different purposes.
Abstract: Input is received from at least two different input sources. Information from these sources is combined to provide a result. In a particular example, input from one source corresponds to potential recognition candidates, and input from another source corresponds to other potential candidates. These candidates are combined to select a result.
Type:
Grant
Filed:
June 28, 2005
Date of Patent:
February 24, 2009
Assignee:
Microsoft Corporation
Inventors:
Frank Kao-Ping Soong, Jian-Lai Zhou, Ye Tian
Abstract: A text processing system for processing multi-lingual text for a speech synthesizer includes a first language dependent module for performing at least one of text and prosody analysis on a portion of input text comprising a first language. A second language dependent module performs at least one of text and prosody analysis on a second portion of input text comprising a second language. A third module is adapted to receive outputs from the first and second language dependent modules and performs prosodic and phonetic context abstraction over the outputs based on multi-lingual text.
Abstract: There is disclosed a method and system for preparing a document to be read by a text-to-speech reader. The method can include identifying two or more voice types available to the text-to-speech reader, identifying the text elements within the document, grouping related text elements together, and classifying the text elements according to voice types available to the text-to-speech reader. The method of grouping the related text elements together can include syntactic and intelligent clustering. The classification of text elements can include performing latent semantic analysis on the text elements and characteristics of the available voice types.
Type:
Grant
Filed:
June 26, 2003
Date of Patent:
February 10, 2009
Assignee:
International Business Machines Corporation