Training (epo) Patents (Class 704/E15.008)
  • Publication number: 20130332158
    Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines, or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine is used to match audio to text, where the text may be words or phrases. The matched audio and text are used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the system from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventors: Charles Corfield, Brian Marquette
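    Illustrative sketch: the abstract does not disclose how the evaluation module measures training progress; one minimal reading is to compare the natural language engine's output against the first engine's matched text and switch once the error rate falls below a threshold. Everything below (the `nl_engine` callable, the word-error-rate criterion) is this sketch's assumption, not NVOQ's implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein word error rate between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def ready_to_switch(matched_pairs, nl_engine, wer_threshold=0.05):
    """Return True once the natural-language engine, using the profile
    trained from (audio, text) pairs, matches the first engine closely."""
    errors = [word_error_rate(text, nl_engine(audio))
              for audio, text in matched_pairs]
    return sum(errors) / len(errors) <= wer_threshold
```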
  • Publication number: 20130262106
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection of documents terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Eyal HURVITZ, Ezra Daya, Oren Pereg, Moshe Wasserblat
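    Illustrative sketch: the core detection step can be read as a vocabulary diff between the retrieved documents and the initial language model. A minimal Python sketch, with a toy tokenizer and a frequency cutoff of this sketch's choosing:

```python
import re
from collections import Counter

def detect_new_terms(documents, lm_vocabulary, min_count=5):
    """Find frequent terms in retrieved documents that are missing from
    the initial language model's vocabulary (candidates for adaptation)."""
    counts = Counter(
        tok for doc in documents
        for tok in re.findall(r"[a-z']+", doc.lower()))
    return [term for term, n in counts.most_common()
            if n >= min_count and term not in lm_vocabulary]

# Usage: the returned terms would then be folded into the language
# model, e.g. with n-gram counts estimated from the same documents.
docs = ["The agent escalated the chargeback dispute.",
        "Chargeback requests follow the dispute workflow."]
print(detect_new_terms(docs, lm_vocabulary={"the", "agent"}, min_count=2))
```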
  • Publication number: 20130262114
    Abstract: Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli.
    Type: Application
    Filed: April 3, 2012
    Publication date: October 3, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Christopher John Brockett, Piali Choudhury, William Brennan Dolan, Yun-Cheng Ju, Patrick Pantel, Noelle Mallory Sophy, Svitlana Volkova
  • Publication number: 20130132077
    Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
    Type: Application
    Filed: May 27, 2011
    Publication date: May 23, 2013
    Inventors: Gautham J. Mysore, Paris Smaragdis
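    Illustrative sketch: "semi-supervised source separation using non-negative techniques" with an on-the-fly noise model is commonly realized as non-negative matrix factorization with fixed speech bases and freely learned noise bases. A minimal NumPy sketch under that assumption (the Euclidean objective and multiplicative updates are this sketch's choices):

```python
import numpy as np

def separate(V, W_speech, n_noise=8, n_iter=100, eps=1e-9):
    """Semi-supervised NMF: V (freq x time magnitude spectrogram) is
    approximated as [W_speech | W_noise] @ H. W_speech comes from
    training data and stays fixed; W_noise is learned on the fly."""
    rng = np.random.default_rng(0)
    f, t = V.shape
    W_noise = rng.random((f, n_noise)) + eps
    H = rng.random((W_speech.shape[1] + n_noise, t)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        # Multiplicative updates for the Euclidean NMF objective.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        WH = W @ H
        # Update only the noise columns of W; the speech bases are fixed.
        W_noise *= (V @ H[-n_noise:].T) / (WH @ H[-n_noise:].T + eps)
    W = np.hstack([W_speech, W_noise])
    speech = W_speech @ H[:W_speech.shape[1]]
    # Wiener-style mask: keep the speech share of the mixture energy.
    return V * speech / (W @ H + eps)
```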
  • Publication number: 20130006632
    Abstract: The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
    Type: Application
    Filed: September 12, 2012
    Publication date: January 3, 2013
    Inventors: Alwin B. Carus, Larissa Lapshina, Raghu Vemula
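    Illustrative sketch: the constrained-then-general control flow can be summarized in a few lines. The `recognize(audio, grammar)` callable and the confidence threshold are hypothetical stand-ins, not the patent's actual API:

```python
def recognize_section(recognize, audio, section_values, general_vocab,
                      min_confidence=0.6):
    """Constrain recognition to the section's known values first; fall
    back to the larger, unconstrained vocabulary on a poor match."""
    text, confidence = recognize(audio, grammar=section_values)
    if confidence >= min_confidence:
        return text
    text, _ = recognize(audio, grammar=general_vocab)
    return text

def update_section_values(section_values, accepted_text):
    """Incrementally grow the dynamic section grammar with new values."""
    section_values.add(accepted_text)

# Toy demo with a fake engine that is confident only on in-grammar text.
fake = lambda audio, grammar: (audio, 0.9 if audio in grammar else 0.2)
print(recognize_section(fake, "metformin", {"aspirin"}, {"metformin"}))
```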
  • Publication number: 20130006633
    Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. Each signal may be associated with values for one or more features (e.g., Mel-Frequency Cepstral Coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominant voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominant cluster. A received audio signal may then be processed using the speech model to, e.g.: determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.
    Type: Application
    Filed: January 5, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
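    Illustrative sketch: one concrete reading of cluster-then-train, using k-means over per-frame features and a GMM fit to the predominant cluster. The library choices (scikit-learn) and all parameters are this sketch's assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_predominant_voice_model(features, n_clusters=3, n_components=8):
    """Cluster per-frame features (e.g. MFCCs), keep the largest
    cluster as the predominant voice, and fit a GMM to it."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(features)
    predominant = np.bincount(labels).argmax()
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(features[labels == predominant])
    return gmm

# Scoring a new signal: a higher average log-likelihood suggests the
# enrolled user is speaking.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))           # stand-in for MFCC frames
model = train_predominant_voice_model(frames)
print(model.score(rng.normal(size=(50, 13)))) # avg log-likelihood/frame
```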
  • Publication number: 20130006630
    Abstract: A state detecting apparatus includes a processor and a memory. The processor acquires utterance data related to uttered speech and computes a plurality of statistical quantities for feature parameters of the utterance data. On the basis of those statistical quantities and another plurality of statistical quantities for reference utterance data based on other uttered speech, it creates pseudo-utterance data having at least one statistical quantity equal to a statistical quantity of the reference data. It then computes a plurality of statistical quantities for synthetic utterance data synthesized from the pseudo-utterance data and the utterance data, and determines, by comparing the statistical quantities of the synthetic utterance data with those of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state.
    Type: Application
    Filed: April 13, 2012
    Publication date: January 3, 2013
    Applicant: FUJITSU LIMITED
    Inventors: Shoji HAYAKAWA, Naoshi Matsuo
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method are disclosed. First, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and its word arcs are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag corresponding to the speech signal. The present invention performs rescoring in two stages to improve the recognition rate of basic speech information, and labels the language tag, prosodic tag and phonetic segmentation tag to provide prosodic structure and language information for later-stage voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Publication number: 20120245939
    Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
    Type: Application
    Filed: June 8, 2012
    Publication date: September 27, 2012
    Inventors: Keith Braho, Amro El-Jaroudi
  • Publication number: 20120232902
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating an acoustic model for use in speech recognition. A system configured to practice the method first receives training data and identifies non-contextual lexical-level features in the training data. Then the system infers sentence-level features from the training data and generates a set of decision trees by node-splitting based on the non-contextual lexical-level features and the sentence-level features. The system decorrelates training vectors, based on the training data, for each decision tree in the set of decision trees to approximate full-covariance Gaussian models, and then can train an acoustic model for use in speech recognition based on the training data, the set of decision trees, and the training vectors.
    Type: Application
    Filed: March 8, 2011
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico BOCCHIERI, Diamantino Antonio Caseiro, Dimitrios Dimitriadis
  • Publication number: 20120221333
    Abstract: Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task.
    Type: Application
    Filed: February 24, 2011
    Publication date: August 30, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
  • Publication number: 20120101820
    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Application
    Filed: October 24, 2011
    Publication date: April 26, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Publication number: 20120059654
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Application
    Filed: March 16, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Masafumi Nishimura, Ryuki Tachibana
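    Illustrative sketch: the learning step amounts to fitting a decision tree that maps linguistic features to (time, frequency) shift pairs. The toy features and shift targets below are fabricated placeholders; only the tree-over-shift-amounts structure comes from the abstract:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy stand-in: rows are linguistic feature vectors for points on the
# source F0 pattern (e.g. accent type, position in phrase); targets are
# the (time_shift, frequency_shift) pairs obtained from peak/trough
# alignment between source and target contours.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 5)).astype(float)  # linguistic features
shifts = np.column_stack([
    0.02 * X[:, 0] + rng.normal(0, 0.005, 200),      # time-axis shift (s)
    12.0 * X[:, 1] + rng.normal(0, 2.0, 200),        # frequency shift (Hz)
])

tree = DecisionTreeRegressor(max_depth=6).fit(X, shifts)

# At synthesis time, predicted shifts deform the source F0 pattern
# toward the target speaker's style.
print(tree.predict(X[:3]))
```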
  • Publication number: 20120035928
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Application
    Filed: October 13, 2011
    Publication date: February 9, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Publication number: 20120022869
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
    Type: Application
    Filed: September 30, 2011
    Publication date: January 26, 2012
    Applicant: GOOGLE, INC.
    Inventors: Matthew I. Lloyd, Trausti Kristjansson
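    Illustrative sketch: one simple way to "adapt one or more acoustic models for the geographic location" is to keep pre-adapted models per region and select the nearest one. The region table and nearest-neighbor rule are this sketch's assumptions, not Google's method:

```python
import math

# Hypothetical registry mapping region centers to adapted acoustic
# models; the abstract does not specify this data structure.
REGION_MODELS = {
    (40.71, -74.01): "acoustic_model_nyc",
    (29.76, -95.37): "acoustic_model_houston",
    (51.51, -0.13):  "acoustic_model_london",
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def model_for_location(lat, lon):
    """Pick the acoustic model adapted for the nearest region."""
    return min(REGION_MODELS.items(),
               key=lambda kv: haversine_km(lat, lon, *kv[0]))[1]

print(model_for_location(40.0, -75.0))  # -> acoustic_model_nyc
```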
  • Publication number: 20120010885
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
    Type: Application
    Filed: September 19, 2011
    Publication date: January 12, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Zeynep Hakkani-Tür, Giuseppe Riccardi
  • Publication number: 20110144973
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, which then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Bocchieri, Diamantino Antonio Caseiro
  • Publication number: 20110144991
    Abstract: Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.
    Type: Application
    Filed: December 11, 2009
    Publication date: June 16, 2011
    Applicant: International Business Machines Corporation
    Inventors: Petr Fousek, Vaibhava Goel, Etienne Marcheret, Peder Andreas Olsen
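    Illustrative sketch: assigning quantization levels to values and parameters to levels can be illustrated with a 1-D k-means codebook. The abstract's Viterbi-based non-uniform assignment is not reproduced here; k-means is this sketch's substitute:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_transform(params, n_levels=16):
    """Assign each transform parameter to one of n_levels shared
    quantization values (a 1-D k-means codebook), so the transform can
    be stored as small indices plus a short value table."""
    flat = params.reshape(-1, 1)
    km = KMeans(n_clusters=n_levels, n_init=10, random_state=0).fit(flat)
    indices = km.labels_.reshape(params.shape)        # per-parameter level
    values = km.cluster_centers_.ravel()              # level -> value
    return indices.astype(np.uint8), values

rng = np.random.default_rng(0)
transform = rng.normal(size=(40, 40))                 # e.g. a feature-space matrix
idx, vals = quantize_transform(transform)
reconstructed = vals[idx]
print(np.abs(transform - reconstructed).mean())       # quantization error
```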
  • Publication number: 20110144992
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
  • Publication number: 20110035216
    Abstract: The invention can recognize several languages at the same time without using samples. The important skill is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words. The invention includes 12 elastic frames of equal length, without filter and without overlap, to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as a feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
    Type: Application
    Filed: August 5, 2009
    Publication date: February 10, 2011
    Inventors: Tze Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
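    Illustrative sketch: the fixed 12×12 word feature can be approximated by splitting the waveform into 12 equal frames and reducing each frame to 12 numbers. The abstract fixes only the 12 elastic frames and the 12×12 shape; the log band-energy reduction below is this sketch's substitute for the patent's per-frame features:

```python
import numpy as np

def word_feature_matrix(waveform, n_frames=12, n_bands=12):
    """Normalize a variable-length word waveform into a fixed 12x12
    matrix: 12 equal-length frames (no window, no overlap), each
    reduced to 12 log band energies."""
    frames = np.array_split(np.asarray(waveform, dtype=float), n_frames)
    matrix = np.empty((n_frames, n_bands))
    for i, frame in enumerate(frames):
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(spectrum, n_bands)
        matrix[i] = [np.log(b.sum() + 1e-10) for b in bands]
    return matrix

print(word_feature_matrix(np.random.default_rng(0).normal(size=8000)).shape)
```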
  • Publication number: 20100268536
    Abstract: A method and apparatus for continuously improving the performance of semantic classifiers in the scope of spoken dialog systems are disclosed. Rule-based or statistical classifiers are replaced with better performing rule-based or statistical classifiers and/or certain parameters of existing classifiers are modified. The replacement classifiers or new parameters are trained and tested on a collection of transcriptions and annotations of utterances which are generated manually or in a partially automated fashion. Automated quality assurance leads to more accurate training and testing data, higher classification performance, and feedback into the design of the spoken dialog system by suggesting changes to improve system behavior.
    Type: Application
    Filed: April 17, 2009
    Publication date: October 21, 2010
    Inventors: David Suendermann, Keelan Evanini, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Publication number: 20100204988
    Abstract: A speech recognition method includes receiving a speech input signal, comprising a sequence of observations, in a first noise environment; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model; and adapting the model, which was trained in a second noise environment, to the first environment. Adapting the model includes using second-order or higher-order Taylor expansion coefficients derived for a group of probability distributions, with the same expansion coefficient used for the whole group.
    Type: Application
    Filed: April 20, 2010
    Publication date: August 12, 2010
    Inventors: Haitian XU, Kean Kheong Chin
  • Publication number: 20100161332
    Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
    Type: Application
    Filed: March 8, 2010
    Publication date: June 24, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael L. Seltzer, Alejandro Acero
  • Publication number: 20100161331
    Abstract: In many application environments, it is desirable to provide voice access to tables on Internet pages, where the user asks a subject-related question in a natural language and receives an adequate answer from the table read out to him in a natural language. A method is disclosed for preparing information presented in a tabular form for a speech dialogue system so that the information of the table can be consulted in a user dialogue in a targeted manner.
    Type: Application
    Filed: October 25, 2006
    Publication date: June 24, 2010
    Applicant: Siemens Aktiengesellschaft
    Inventors: Hans-Ulrich Block, Manfred Gehrke, Stefanie Schachchti
  • Publication number: 20100153109
    Abstract: Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may then be evaluated together to determine whether the segment is the speech segment or the non-speech segment.
    Type: Application
    Filed: December 27, 2006
    Publication date: June 17, 2010
    Inventors: Robert Du, Ye Tao, Daren Zu
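    Illustrative sketch: a two-rule fuzzy classifier over frame energy and zero-crossing rate shows the antecedent/consequent structure the abstract describes. The membership functions here are hand-set, whereas the patent trains them:

```python
def triangular(x, left, peak, right):
    """Triangular membership function, the usual fuzzy-set primitive."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (peak - left) if x <= peak \
        else (right - x) / (right - peak)

def classify_segment(energy_db, zero_cross_rate):
    """Two toy rules (memberships are illustrative, not trained):
    IF energy is high AND ZCR is moderate THEN speech;
    IF energy is low OR ZCR is extreme THEN non-speech."""
    speech = min(triangular(energy_db, -40, -10, 0),
                 triangular(zero_cross_rate, 0.02, 0.1, 0.3))
    nonspeech = max(triangular(energy_db, -80, -60, -35),
                    1.0 - triangular(zero_cross_rate, 0.0, 0.1, 0.5))
    return "speech" if speech > nonspeech else "non-speech"

print(classify_segment(energy_db=-12.0, zero_cross_rate=0.09))
```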
  • Publication number: 20100094629
    Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputs the recognition result; a weighting factor updating section that updates a weighting factor applied to the scores obtained from an acoustic model and a language model, so that the difference between a correct-answer score, calculated with the use of a correct-answer text of the learning audio data, and the score of the recognition result becomes large; a convergence determination section that determines, using the score after updating, whether to return to the weighting factor updating section and update the weighting factor again; and a weighting factor convergence determination section that determines, using the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.
    Type: Application
    Filed: February 19, 2008
    Publication date: April 15, 2010
    Inventors: Tadashi Emori, Yoshifumi Onishi
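    Illustrative sketch: the update direction falls out of differentiating the score margin with respect to the weighting factor. A toy single-utterance update step with made-up scores; a real system would aggregate over the whole learning set and check convergence as the abstract describes:

```python
def update_weight(weight, correct_score, hypothesis_score,
                  correct_lm, hypothesis_lm, lr=0.01):
    """One illustrative step: nudge the language-model weight so that
    total_score = acoustic + weight * lm separates the correct answer
    from the current recognition result by a larger margin."""
    # Margin of the correct transcript over the recognized hypothesis.
    margin = (correct_score + weight * correct_lm) \
           - (hypothesis_score + weight * hypothesis_lm)
    # Gradient of the margin with respect to the weight.
    gradient = correct_lm - hypothesis_lm
    return weight + lr * gradient, margin

w, m = 10.0, None
for _ in range(50):                       # iterate toward convergence
    w, m = update_weight(w, correct_score=-420.0, hypothesis_score=-415.0,
                         correct_lm=-18.0, hypothesis_lm=-22.0)
print(round(w, 2), round(m, 2))
```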
  • Publication number: 20100088088
    Abstract: An automated emotional recognition system is adapted to determine emotional states of a speaker based on analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract from it a set of speech features indicative of the emotional state of the speaker. The emotional recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker by exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal.
    Type: Application
    Filed: January 31, 2007
    Publication date: April 8, 2010
    Inventors: Gianmario Bollano, Donato Ettorre, Antonio Esiliato
  • Publication number: 20100057453
    Abstract: Discrimination between at least two classes of events in an input signal is carried out in the following way. A set of frames containing an input signal is received, and at least two different feature vectors are determined for each of said frames. Said at least two different feature vectors are classified using respective sets of preclassifiers trained for said at least two classes of events. Values for at least one weighting factor are determined based on outputs of said preclassifiers for each of said frames. A combined feature vector is calculated for each of said frames by applying said at least one weighting factor to said at least two different feature vectors. Said combined feature vector is classified using a set of classifiers trained for said at least two classes of events.
    Type: Application
    Filed: November 16, 2006
    Publication date: March 4, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Zica Valsan
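    Illustrative sketch: the pipeline is two per-stream preclassifiers, a confidence-derived weighting factor per frame, a weighted combined feature vector, and a final classifier. The logistic-regression stand-ins and the specific weighting formula are this sketch's assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two different per-frame feature streams (e.g. MFCC-like and PLP-like).
n = 400
f1 = rng.normal(size=(n, 10)); f2 = rng.normal(size=(n, 6))
y = (f1[:, 0] + 0.5 * f2[:, 0] > 0).astype(int)    # two event classes

# One preclassifier per feature stream, trained on the same classes.
pre1 = LogisticRegression(max_iter=1000).fit(f1, y)
pre2 = LogisticRegression(max_iter=1000).fit(f2, y)

# Per-frame weighting factor from preclassifier confidence: the more
# confident stream contributes more to the combined vector.
c1 = np.abs(pre1.predict_proba(f1)[:, 1] - 0.5)
c2 = np.abs(pre2.predict_proba(f2)[:, 1] - 0.5)
w = (c1 / (c1 + c2 + 1e-9))[:, None]

combined = np.hstack([w * f1, (1 - w) * f2])       # weighted combination
final = LogisticRegression(max_iter=1000).fit(combined, y)
print(final.score(combined, y))
```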
  • Publication number: 20100042404
    Abstract: A method of generating a natural language understanding (NLU) model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand-crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand-crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand-crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models is then built by adding each previous batch of labeled data to the training data and using a new batch of labeled data as test data, so that the series is generated with constantly growing training data. If not all the labeled data has been received, the step of building a series of NLU models is repeated until all labeled data is received.
    Type: Application
    Filed: October 20, 2009
    Publication date: February 18, 2010
    Applicant: AT&T Corp.
    Inventors: Narendra K. Gupta, Mazin G. Rahim, Gokhan Tur, Antony Van der Mude
  • Publication number: 20090259469
    Abstract: A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame of the audio signal into a set of narrow band feature vectors using a narrow passband, couples the narrow band feature vectors to a speech model, and determines whether the audio signal is a wide band signal. When the audio signal is determined to be a wide band signal, a passband parameter of each of one or more passbands that are outside the narrow passband is generated for each frame, and the one or more passband parameters are coupled to the speech model.
    Type: Application
    Filed: April 14, 2008
    Publication date: October 15, 2009
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Yuan-Jun Wei
  • Publication number: 20090132249
    Abstract: A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. The present invention therefore increases the discrimination of the speech model.
    Type: Application
    Filed: January 10, 2008
    Publication date: May 21, 2009
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Jia-Jang Tu, Yuan-Fu Liao
  • Publication number: 20090063145
    Abstract: Combined active and semi-supervised learning reduces the amount of manual labeling required when training a spoken language understanding classifier. The classifier may be trained with human-labeled utterance data. Utterances from a group of unselected utterance data may be chosen for manual labeling via active learning. The classifier may then be changed, via semi-supervised learning, based on the selected ones of the unselected utterance data.
    Type: Application
    Filed: January 12, 2005
    Publication date: March 5, 2009
    Applicant: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
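    Illustrative sketch: one round of the combined loop queries the least-confident utterances for manual labels (active learning) and self-trains on the most confident ones (semi-supervised learning). The classifier choice, thresholds, and the `oracle` stand-in for the human labeler are all this sketch's assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_semi_supervised_round(clf, X_unlabeled, oracle,
                                 n_query=10, self_train_conf=0.95):
    """One round: query the least-confident utterances for human labels
    (active learning), then self-train on very confident ones
    (semi-supervised learning)."""
    proba = clf.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)
    query = np.argsort(confidence)[:n_query]           # least confident
    X_new = X_unlabeled[query]
    y_new = oracle(X_new)                              # manual labels
    confident = confidence >= self_train_conf
    X_self = X_unlabeled[confident]
    y_self = proba.argmax(axis=1)[confident]           # machine labels
    return np.vstack([X_new, X_self]), np.concatenate([y_new, y_self])

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8)); y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X[:50], y[:50])
X_add, y_add = active_semi_supervised_round(
    clf, X[50:], oracle=lambda q: (q[:, 0] > 0).astype(int))
clf.fit(np.vstack([X[:50], X_add]), np.concatenate([y[:50], y_add]))
```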
  • Publication number: 20090043576
    Abstract: Systems and methods for improving the performance of a speech recognition system. In some embodiments a tuner module and/or a tester module are configured to cooperate with a speech recognition system. The tester and tuner modules can be configured to cooperate with each other. In one embodiment, the tuner module may include a module for playing back a selected portion of a digital data audio file, a module for creating and/or editing a transcript of the selected portion, and/or a module for displaying information associated with a decoding of the selected portion, the decoding generated by a speech recognition engine. In other embodiments, the tester module can include an editor for creating and/or modifying a grammar, a module for receiving a selected portion of a digital audio file and its corresponding transcript, and a scoring module for producing scoring statistics of the decoding based at least in part on the transcript.
    Type: Application
    Filed: October 21, 2008
    Publication date: February 12, 2009
    Applicant: LumenVox, LLC
    Inventors: Edward S. Miller, James F. Blake, II, Keith C. Herold, Michael D. Bergman, Kyle N. Danielson, Alexandra L. Auckland
  • Publication number: 20080319746
    Abstract: A keyword analysis device obtains word vectors representing the documents by analyzing keywords contained in each document input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized by segmenting the topic cluster into subtopic clusters, using as a determination criterion the number of documents, the variance of dates contained in the documents, or the C-value of keywords contained in the documents. A keyword presentation device then presents the characteristic keyword group of each subtopic cluster, arranging the keywords on the basis of the date information.
    Type: Application
    Filed: March 25, 2008
    Publication date: December 25, 2008
    Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
  • Publication number: 20080195387
    Abstract: A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set comprising known speakers, wherein a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance provide a score exceeding a threshold when matched against one or more models constructed from voice samples of each known speaker. The method and apparatus further provide optional enhancements such as determining, using, and updating model normalization parameters, a fast scoring algorithm, summed-calls handling, or quality evaluation of the tested utterance.
    Type: Application
    Filed: October 19, 2006
    Publication date: August 14, 2008
    Applicant: NICE SYSTEMS LTD.
    Inventors: Yaniv ZIGEL, Moshe WASSERBLAT
  • Publication number: 20080167873
    Abstract: A method for the pronunciation of English alphas according to indications at different orientations of the alpha comprises the steps of: dividing the area around an alpha into six sections; indicating short sounds, long sounds and strong sounds by points, lines and slashes; putting a small line (at a different angle) or a point on an alpha to indicate that it is pronounced with the pronunciation of another alpha; using underlines to indicate the long and short sounds of the phonetic symbols of a double-alpha set; using a delete line to indicate that an alpha is not pronounced; using a space area to divide the syllables of a word; using a vertical cut line to indicate that one alpha is pronounced with two sounds; placing an original-sound line at the upper side of the first stroke to represent that the alpha is pronounced with its original sound; and placing a "?" under a double-alpha set to represent that the alpha is pronounced with a reverse sound.
    Type: Application
    Filed: January 8, 2007
    Publication date: July 10, 2008
    Inventor: Wei-Chou Su
  • Publication number: 20080147404
    Abstract: Speech that may be colored by accent is processed. A method for recognizing speech includes maintaining a model of speech accent established from training speech data, wherein the training speech data includes at least a first set of training speech data, and wherein establishing the model of speech accent does not use any phone or phone-class transcription of the first set of training speech data. Related systems are also presented. A system for recognizing speech includes an accent identification module configured to identify the accent of the speech to be recognized, and a recognizer configured to use models to recognize that speech, wherein the models include at least an acoustic model that has been adapted for the identified accent using training speech data of a language, other than the primary language of the speech to be recognized, that is associated with the identified accent. Related methods are also presented.
    Type: Application
    Filed: May 15, 2001
    Publication date: June 19, 2008
    Applicant: NuSuara Technologies SDN BHD
    Inventors: Wai Kat Liu, Pascale Fung
  • Publication number: 20080126089
    Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled
    Type: Application
    Filed: October 31, 2007
    Publication date: May 29, 2008
    Inventors: Harry Printz, Narren Chittar
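    Illustrative sketch: an acoustic confusability measure between two phrases can be framed as a weighted edit distance over their phone sequences, with substitution costs learned from recognizer behavior. The toy cost table below stands in for the empirically derived costs the abstract describes:

```python
def confusability(phones_a, phones_b, sub_cost):
    """Weighted edit distance between two phone sequences; lower cost
    means more acoustically confusable. sub_cost(p, q) would be
    estimated from recognizer confusion counts, per the iterative
    empirical procedure in the abstract; here it is a toy table."""
    n, m = len(phones_a), len(phones_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + 1.0          # deletion
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + 1.0          # insertion
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,
                          d[i][j - 1] + 1.0,
                          d[i - 1][j - 1] + sub_cost(phones_a[i - 1],
                                                     phones_b[j - 1]))
    return d[n][m]

toy = lambda p, q: 0.0 if p == q else (0.3 if {p, q} == {"m", "n"} else 1.0)
print(confusability("m ae n".split(), "n ae n".split(), toy))  # 0.3
```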
  • Publication number: 20080120105
    Abstract: Methods and apparatus to operate an audience metering device with voice commands are described herein. An example method to identify audience members based on voice includes: obtaining an audio input signal including a program audio signal and a human voice signal; receiving an audio line signal from an audio output line of a monitored media device; processing the audio line signal with a filter having adaptive weights to generate a delayed and attenuated line signal; subtracting the delayed and attenuated line signal from the audio input signal to develop a residual audio signal; identifying the person who spoke to create the human voice signal based on the residual audio signal; and logging the identity of that person as an audience member.
    Type: Application
    Filed: February 1, 2008
    Publication date: May 22, 2008
    Inventor: VENUGOPAL SRINIVASAN
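    Illustrative sketch: "a filter having adaptive weights to generate a delayed and attenuated line signal" is the classic normalized-LMS echo-cancellation setup. A NumPy sketch with synthetic signals; the tap count and step size are this sketch's choices:

```python
import numpy as np

def residual_voice(mic, line, taps=64, mu=0.5, eps=1e-8):
    """NLMS adaptive filter: model the (delayed, attenuated) path from
    the audio line output to the microphone, subtract the estimate,
    and keep the residual, which is dominated by the human voice."""
    w = np.zeros(taps)
    residual = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = line[n - taps:n][::-1]          # recent line-signal history
        y_hat = w @ x                       # estimated program component
        e = mic[n] - y_hat                  # residual = voice + misfit
        w += mu * e * x / (x @ x + eps)     # normalized LMS update
        residual[n] = e
    return residual

rng = np.random.default_rng(0)
line = rng.normal(size=8000)                        # program audio
voice = np.zeros(8000); voice[4000:4400] = rng.normal(size=400)
mic = 0.6 * np.roll(line, 5) + voice                # attenuated + delayed
print(np.abs(residual_voice(mic, line)[6000:]).mean())  # near zero
```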
  • Publication number: 20080091410
    Abstract: A method of forming words utilizing a character actuator unit in which the character actuators are segregated into certain categories. First and second categories are employed and activated simultaneously to generate the beginning and ending of a word. First and second actuating categories may be combined with third and fourth categories of actuators to further form and modify words in any languages.
    Type: Application
    Filed: January 4, 2007
    Publication date: April 17, 2008
    Inventor: Sherrie Benson