Training (epo) Patents (Class 704/E15.008)
  • Publication number: 20130332158
    Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines, or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine is used to match audio to text, where the text may be words or phrases. The matched audio and text are used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the system from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventors: Charles Corfield, Brian Marquette
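    Illustrative sketch: the abstract does not disclose how the evaluation module measures training progress; one minimal reading is to compare the natural language engine's output against the first engine's matched text and switch once the error rate falls below a threshold. Everything below (the `nl_engine` callable, the word-error-rate criterion) is this sketch's assumption, not NVOQ's implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein word error rate between two transcripts."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def ready_to_switch(matched_pairs, nl_engine, wer_threshold=0.05):
    """Return True once the natural-language engine, using the profile
    trained from (audio, text) pairs, matches the first engine closely."""
    errors = [word_error_rate(text, nl_engine(audio))
              for audio, text in matched_pairs]
    return sum(errors) / len(errors) <= wer_threshold
```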
  • Publication number: 20130262106
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection of documents terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Eyal HURVITZ, Ezra Daya, Oren Pereg, Moshe Wasserblat
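    Illustrative sketch: the core detection step can be read as a vocabulary diff between the retrieved documents and the initial language model. A minimal Python sketch, with a toy tokenizer and a frequency cutoff of this sketch's choosing:

```python
import re
from collections import Counter

def detect_new_terms(documents, lm_vocabulary, min_count=5):
    """Find frequent terms in retrieved documents that are missing from
    the initial language model's vocabulary (candidates for adaptation)."""
    counts = Counter(
        tok for doc in documents
        for tok in re.findall(r"[a-z']+", doc.lower()))
    return [term for term, n in counts.most_common()
            if n >= min_count and term not in lm_vocabulary]

# Usage: the returned terms would then be folded into the language
# model, e.g. with n-gram counts estimated from the same documents.
docs = ["The agent escalated the chargeback dispute.",
        "Chargeback requests follow the dispute workflow."]
print(detect_new_terms(docs, lm_vocabulary={"the", "agent"}, min_count=2))
```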
  • Publication number: 20130262114
    Abstract: Different advantageous embodiments provide a crowdsourcing method for modeling user intent in conversational interfaces. One or more stimuli are presented to a plurality of describers. One or more sets of describer data are captured from the plurality of describers using a data collection mechanism. The one or more sets of describer data are processed to generate one or more models. Each of the one or more models is associated with a specific stimulus from the one or more stimuli.
    Type: Application
    Filed: April 3, 2012
    Publication date: October 3, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Christopher John Brockett, Piali Choudhury, William Brennan Dolan, Yun-Cheng Ju, Patrick Pantel, Noelle Mallory Sophy, Svitlana Volkova
  • Publication number: 20130132077
    Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
    Type: Application
    Filed: May 27, 2011
    Publication date: May 23, 2013
    Inventors: Gautham J. Mysore, Paris Smaragdis
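    Illustrative sketch: "semi-supervised source separation using non-negative techniques" with an on-the-fly noise model is commonly realized as non-negative matrix factorization with fixed speech bases and freely learned noise bases. A minimal NumPy sketch under that assumption (the Euclidean objective and multiplicative updates are this sketch's choices):

```python
import numpy as np

def separate(V, W_speech, n_noise=8, n_iter=100, eps=1e-9):
    """Semi-supervised NMF: V (freq x time magnitude spectrogram) is
    approximated as [W_speech | W_noise] @ H. W_speech comes from
    training data and stays fixed; W_noise is learned on the fly."""
    rng = np.random.default_rng(0)
    f, t = V.shape
    W_noise = rng.random((f, n_noise)) + eps
    H = rng.random((W_speech.shape[1] + n_noise, t)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_speech, W_noise])
        # Multiplicative updates for the Euclidean NMF objective.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        WH = W @ H
        # Update only the noise columns of W; the speech bases are fixed.
        W_noise *= (V @ H[-n_noise:].T) / (WH @ H[-n_noise:].T + eps)
    W = np.hstack([W_speech, W_noise])
    speech = W_speech @ H[:W_speech.shape[1]]
    # Wiener-style mask: keep the speech share of the mixture energy.
    return V * speech / (W @ H + eps)
```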
  • Publication number: 20130006632
    Abstract: The invention involves the loading and unloading of dynamic section grammars and language models in a speech recognition system. The values of the sections of the structured document are either determined in advance from a collection of documents of the same domain, document type, and speaker; or collected incrementally from documents of the same domain, document type, and speaker; or added incrementally to an already existing set of values. Speech recognition in the context of the given field is constrained to the contents of these dynamic values. If speech recognition fails or produces a poor match within this grammar or section language model, speech recognition against a larger, more general vocabulary that is not constrained to the given section is performed.
    Type: Application
    Filed: September 12, 2012
    Publication date: January 3, 2013
    Inventors: Alwin B. Carus, Larissa Lapshina, Raghu Vemula
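    Illustrative sketch: the constrained-then-general control flow can be summarized in a few lines. The `recognize(audio, grammar)` callable and the confidence threshold are hypothetical stand-ins, not the patent's actual API:

```python
def recognize_section(recognize, audio, section_values, general_vocab,
                      min_confidence=0.6):
    """Constrain recognition to the section's known values first; fall
    back to the larger, unconstrained vocabulary on a poor match."""
    text, confidence = recognize(audio, grammar=section_values)
    if confidence >= min_confidence:
        return text
    text, _ = recognize(audio, grammar=general_vocab)
    return text

def update_section_values(section_values, accepted_text):
    """Incrementally grow the dynamic section grammar with new values."""
    section_values.add(accepted_text)

# Toy demo with a fake engine that is confident only on in-grammar text.
fake = lambda audio, grammar: (audio, 0.9 if audio in grammar else 0.2)
print(recognize_section(fake, "metformin", {"aspirin"}, {"metformin"}))
```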
  • Publication number: 20130006633
    Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. Each signal may be associated with values for one or more features (e.g., Mel-Frequency Cepstral Coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominant voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominant cluster. A received audio signal may then be processed using the speech model to, e.g.: determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.
    Type: Application
    Filed: January 5, 2012
    Publication date: January 3, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Leonard Henry Grokop, Vidya Narayanan
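    Illustrative sketch: one concrete reading of cluster-then-train, using k-means over per-frame features and a GMM fit to the predominant cluster. The library choices (scikit-learn) and all parameters are this sketch's assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_predominant_voice_model(features, n_clusters=3, n_components=8):
    """Cluster per-frame features (e.g. MFCCs), keep the largest
    cluster as the predominant voice, and fit a GMM to it."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(features)
    predominant = np.bincount(labels).argmax()
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(features[labels == predominant])
    return gmm

# Scoring a new signal: a higher average log-likelihood suggests the
# enrolled user is speaking.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))           # stand-in for MFCC frames
model = train_predominant_voice_model(frames)
print(model.score(rng.normal(size=(50, 13)))) # avg log-likelihood/frame
```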
  • Publication number: 20130006630
    Abstract: A state detecting apparatus includes a processor and a memory. The processor acquires utterance data related to uttered speech and computes a plurality of statistical quantities for feature parameters of the utterance data. On the basis of those statistical quantities and another plurality of statistical quantities for reference utterance data based on other uttered speech, it creates pseudo-utterance data having at least one statistical quantity equal to a statistical quantity of the reference data. It then computes a plurality of statistical quantities for synthetic utterance data synthesized from the pseudo-utterance data and the utterance data, and determines, by comparing the statistical quantities of the synthetic utterance data with those of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state.
    Type: Application
    Filed: April 13, 2012
    Publication date: January 3, 2013
    Applicant: FUJITSU LIMITED
    Inventors: Shoji HAYAKAWA, Naoshi Matsuo
  • Publication number: 20120290302
    Abstract: A Chinese speech recognition system and method are disclosed. First, a speech signal is received and recognized to output a word lattice. Next, the word lattice is received, and its word arcs are rescored and reranked with a prosodic break model, a prosodic state model, a syllable prosodic-acoustic model, a syllable-juncture prosodic-acoustic model and a factored language model, so as to output a language tag, a prosodic tag and a phonetic segmentation tag corresponding to the speech signal. The present invention performs rescoring in two stages to improve the recognition rate of basic speech information, and labels the language tag, prosodic tag and phonetic segmentation tag to provide prosodic structure and language information for later-stage voice conversion and voice synthesis.
    Type: Application
    Filed: April 13, 2012
    Publication date: November 15, 2012
    Inventors: Jyh-Her YANG, Chen-Yu Chiang, Ming-Chieh Liu, Yih-Ru Wang, Yuan-Fu Liao, Sin-Horng Chen
  • Publication number: 20120245939
    Abstract: A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.
    Type: Application
    Filed: June 8, 2012
    Publication date: September 27, 2012
    Inventors: Keith Braho, Amro El-Jaroudi
  • Publication number: 20120232902
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating an acoustic model for use in speech recognition. A system configured to practice the method first receives training data and identifies non-contextual lexical-level features in the training data. Then the system infers sentence-level features from the training data and generates a set of decision trees by node-splitting based on the non-contextual lexical-level features and the sentence-level features. The system decorrelates training vectors, based on the training data, for each decision tree in the set of decision trees to approximate full-covariance Gaussian models, and then can train an acoustic model for use in speech recognition based on the training data, the set of decision trees, and the training vectors.
    Type: Application
    Filed: March 8, 2011
    Publication date: September 13, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico BOCCHIERI, Diamantino Antonio Caseiro, Dimitrios Dimitriadis
  • Publication number: 20120221333
    Abstract: Techniques are disclosed for using phonetic features for speech recognition. For example, a method comprises the steps of obtaining a first dictionary and a training data set associated with a speech recognition system, computing one or more support parameters from the training data set, transforming the first dictionary into a second dictionary, wherein the second dictionary is a function of one or more phonetic labels of the first dictionary, and using the one or more support parameters to select one or more samples from the second dictionary to create a set of one or more exemplar-based class identification features for a pattern recognition task.
    Type: Application
    Filed: February 24, 2011
    Publication date: August 30, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dimitri Kanevsky, David Nahamoo, Bhuvana Ramabhadran, Tara N. Sainath
  • Publication number: 20120101820
    Abstract: A method is disclosed for applying a multi-state barge-in acoustic model in a spoken dialogue system. The method includes receiving an audio speech input from the user during the presentation of a prompt, accumulating the audio speech input from the user, applying a non-speech component having at least two one-state Hidden Markov Models (HMMs) to the audio speech input from the user, applying a speech component having at least five three-state HMMs to the audio speech input from the user, in which each of the five three-state HMMs represents a different phonetic category, determining whether the audio speech input is a barge-in-speech input from the user, and if the audio speech input is determined to be the barge-in-speech input from the user, terminating the presentation of the prompt.
    Type: Application
    Filed: October 24, 2011
    Publication date: April 26, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Andrej Ljolje
  • Publication number: 20120059654
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Application
    Filed: March 16, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Masafumi Nishimura, Ryuki Tachibana
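    Illustrative sketch: the learning step amounts to fitting a decision tree that maps linguistic features to (time, frequency) shift pairs. The toy features and shift targets below are fabricated placeholders; only the tree-over-shift-amounts structure comes from the abstract:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy stand-in: rows are linguistic feature vectors for points on the
# source F0 pattern (e.g. accent type, position in phrase); targets are
# the (time_shift, frequency_shift) pairs obtained from peak/trough
# alignment between source and target contours.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(200, 5)).astype(float)  # linguistic features
shifts = np.column_stack([
    0.02 * X[:, 0] + rng.normal(0, 0.005, 200),      # time-axis shift (s)
    12.0 * X[:, 1] + rng.normal(0, 2.0, 200),        # frequency shift (Hz)
])

tree = DecisionTreeRegressor(max_depth=6).fit(X, shifts)

# At synthesis time, predicted shifts deform the source F0 pattern
# toward the target speaker's style.
print(tree.predict(X[:3]))
```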
  • Publication number: 20120035928
    Abstract: A phonetic vocabulary for a speech recognition system is adapted to a particular speaker's pronunciation. A speaker can be attributed specific pronunciation styles, which can be identified from specific pronunciation examples. Consequently, a phonetic vocabulary can be reduced in size, which can improve recognition accuracy and recognition speed.
    Type: Application
    Filed: October 13, 2011
    Publication date: February 9, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Nitendra Rajput, Ashish Verma
  • Publication number: 20120022869
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for enhancing speech recognition accuracy. In one aspect, a method includes receiving an audio signal that corresponds to an utterance recorded by a mobile device, determining a geographic location associated with the mobile device, adapting one or more acoustic models for the geographic location, and performing speech recognition on the audio signal using the one or more acoustic models that are adapted for the geographic location.
    Type: Application
    Filed: September 30, 2011
    Publication date: January 26, 2012
    Applicant: GOOGLE, INC.
    Inventors: Matthew I. Lloyd, Trausti Kristjansson
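    Illustrative sketch: one simple way to "adapt one or more acoustic models for the geographic location" is to keep pre-adapted models per region and select the nearest one. The region table and nearest-neighbor rule are this sketch's assumptions, not Google's method:

```python
import math

# Hypothetical registry mapping region centers to adapted acoustic
# models; the abstract does not specify this data structure.
REGION_MODELS = {
    (40.71, -74.01): "acoustic_model_nyc",
    (29.76, -95.37): "acoustic_model_houston",
    (51.51, -0.13):  "acoustic_model_london",
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(a))

def model_for_location(lat, lon):
    """Pick the acoustic model adapted for the nearest region."""
    return min(REGION_MODELS.items(),
               key=lambda kv: haversine_km(lat, lon, *kv[0]))[1]

print(model_for_location(40.0, -75.0))  # -> acoustic_model_nyc
```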
  • Publication number: 20120010885
    Abstract: A system and method is provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in the performance given the transcribed and un-transcribed data.
    Type: Application
    Filed: September 19, 2011
    Publication date: January 12, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Zeynep Hakkani-Tür, Giuseppe Riccardi
  • Publication number: 20110144973
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for a speech recognition application for directory assistance that is based on a user's spoken search query. The spoken search query is received by a portable device, which then determines its present location. Upon determining the location of the portable device, that information is incorporated into a local language model that is used to process the search query. Finally, the portable device outputs the results of the search query based on the local language model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Enrico Bocchieri, Diamantino Antonio Caseiro
  • Publication number: 20110144991
    Abstract: Methods for compressing a transform associated with a feature space are presented. For example, a method for compressing a transform associated with a feature space includes obtaining the transform including a plurality of transform parameters, assigning each of a plurality of quantization levels for the plurality of transform parameters to one of a plurality of quantization values, and assigning each of the plurality of transform parameters to one of the plurality of quantization values to which one of the plurality of quantization levels is assigned. One or more of obtaining the transform, assigning of each of the plurality of quantization levels, and assigning of each of the transform parameters are implemented as instruction code executed on a processor device. Further, a Viterbi algorithm may be employed for use in non-uniform level/value assignments.
    Type: Application
    Filed: December 11, 2009
    Publication date: June 16, 2011
    Applicant: International Business Machines Corporation
    Inventors: Petr Fousek, Vaibhava Goel, Etienne Marcheret, Peder Andreas Olsen
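    Illustrative sketch: assigning quantization levels to values and parameters to levels can be illustrated with a 1-D k-means codebook. The abstract's Viterbi-based non-uniform assignment is not reproduced here; k-means is this sketch's substitute:

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_transform(params, n_levels=16):
    """Assign each transform parameter to one of n_levels shared
    quantization values (a 1-D k-means codebook), so the transform can
    be stored as small indices plus a short value table."""
    flat = params.reshape(-1, 1)
    km = KMeans(n_clusters=n_levels, n_init=10, random_state=0).fit(flat)
    indices = km.labels_.reshape(params.shape)        # per-parameter level
    values = km.cluster_centers_.ravel()              # level -> value
    return indices.astype(np.uint8), values

rng = np.random.default_rng(0)
transform = rng.normal(size=(40, 40))                 # e.g. a feature-space matrix
idx, vals = quantize_transform(transform)
reconstructed = vals[idx]
print(np.abs(transform - reconstructed).mean())       # quantization error
```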
  • Publication number: 20110144992
    Abstract: Described is a technology for performing unsupervised learning using global features extracted from unlabeled examples. The unsupervised learning process may be used to train a log-linear model, such as for use in morphological segmentation of words. For example, segmentations of the examples are sampled based upon the global features to produce a segmented corpus and log-linear model, which are then iteratively reprocessed to produce a final segmented corpus and a log-linear model.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Kristina N. Toutanova, Colin Andrew Cherry, Hoifung Poon
  • Publication number: 20110035216
    Abstract: The invention can recognize several languages at the same time without using samples. The important skill is that features of known words in any language are extracted from unknown words or continuous voices. These unknown words, represented by matrices, are spread in the 144-dimensional space. The feature of a known word of any language, represented by a matrix, is simulated by the surrounding unknown words. The invention includes 12 elastic frames of equal length, without filter and without overlap, to normalize the signal waveform of variable length for a word, which has one to several syllables, into a 12×12 matrix as a feature of the word. The invention can improve the feature such that the speech recognition of an unknown sentence is correct. The invention can correctly recognize any language without samples, such as English, Chinese, German, French, Japanese, Korean, Russian, Cantonese, Taiwanese, etc.
    Type: Application
    Filed: August 5, 2009
    Publication date: February 10, 2011
    Inventors: Tze Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
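    Illustrative sketch: the fixed 12×12 word feature can be approximated by splitting the waveform into 12 equal frames and reducing each frame to 12 numbers. The abstract fixes only the 12 elastic frames and the 12×12 shape; the log band-energy reduction below is this sketch's substitute for the patent's per-frame features:

```python
import numpy as np

def word_feature_matrix(waveform, n_frames=12, n_bands=12):
    """Normalize a variable-length word waveform into a fixed 12x12
    matrix: 12 equal-length frames (no window, no overlap), each
    reduced to 12 log band energies."""
    frames = np.array_split(np.asarray(waveform, dtype=float), n_frames)
    matrix = np.empty((n_frames, n_bands))
    for i, frame in enumerate(frames):
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        bands = np.array_split(spectrum, n_bands)
        matrix[i] = [np.log(b.sum() + 1e-10) for b in bands]
    return matrix

print(word_feature_matrix(np.random.default_rng(0).normal(size=8000)).shape)
```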
  • Publication number: 20100268536
    Abstract: A method and apparatus for continuously improving the performance of semantic classifiers in the scope of spoken dialog systems are disclosed. Rule-based or statistical classifiers are replaced with better performing rule-based or statistical classifiers and/or certain parameters of existing classifiers are modified. The replacement classifiers or new parameters are trained and tested on a collection of transcriptions and annotations of utterances which are generated manually or in a partially automated fashion. Automated quality assurance leads to more accurate training and testing data, higher classification performance, and feedback into the design of the spoken dialog system by suggesting changes to improve system behavior.
    Type: Application
    Filed: April 17, 2009
    Publication date: October 21, 2010
    Inventors: David Suendermann, Keelan Evanini, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Publication number: 20100204988
    Abstract: A speech recognition method includes receiving a speech input signal, comprising a sequence of observations, in a first noise environment; determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model; and adapting the model, which was trained in a second noise environment, to the first environment. Adapting the model includes using second-order or higher-order Taylor expansion coefficients derived for a group of probability distributions, with the same expansion coefficient used for the whole group.
    Type: Application
    Filed: April 20, 2010
    Publication date: August 12, 2010
    Inventors: Haitian XU, Kean Kheong Chin
  • Publication number: 20100161332
    Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.
    Type: Application
    Filed: March 8, 2010
    Publication date: June 24, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael L. Seltzer, Alejandro Acero
  • Publication number: 20100161331
    Abstract: In many application environments, it is desirable to provide voice access to tables on Internet pages, where the user asks a subject-related question in a natural language and receives an adequate answer from the table read out to him in a natural language. A method is disclosed for preparing information presented in a tabular form for a speech dialogue system so that the information of the table can be consulted in a user dialogue in a targeted manner.
    Type: Application
    Filed: October 25, 2006
    Publication date: June 24, 2010
    Applicant: Siemens Aktiengesellschaft
    Inventors: Hans-Ulrich Block, Manfred Gehrke, Stefanie Schachchti
  • Publication number: 20100153109
    Abstract: Machine-readable media, methods, apparatus and system for speech segmentation are described. In some embodiments, a fuzzy rule may be determined to discriminate a speech segment from a non-speech segment. An antecedent of the fuzzy rule may include an input variable and an input variable membership. A consequent of the fuzzy rule may include an output variable and an output variable membership. An instance of the input variable may be extracted from a segment. An input variable membership function associated with the input variable membership and an output variable membership function associated with the output variable membership may be trained. The instance of the input variable, the input variable membership function, the output variable, and the output variable membership function may then be evaluated together to determine whether the segment is the speech segment or the non-speech segment.
    Type: Application
    Filed: December 27, 2006
    Publication date: June 17, 2010
    Inventors: Robert Du, Ye Tao, Daren Zu
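    Illustrative sketch: a two-rule fuzzy classifier over frame energy and zero-crossing rate shows the antecedent/consequent structure the abstract describes. The membership functions here are hand-set, whereas the patent trains them:

```python
def triangular(x, left, peak, right):
    """Triangular membership function, the usual fuzzy-set primitive."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (peak - left) if x <= peak \
        else (right - x) / (right - peak)

def classify_segment(energy_db, zero_cross_rate):
    """Two toy rules (memberships are illustrative, not trained):
    IF energy is high AND ZCR is moderate THEN speech;
    IF energy is low OR ZCR is extreme THEN non-speech."""
    speech = min(triangular(energy_db, -40, -10, 0),
                 triangular(zero_cross_rate, 0.02, 0.1, 0.3))
    nonspeech = max(triangular(energy_db, -80, -60, -35),
                    1.0 - triangular(zero_cross_rate, 0.0, 0.1, 0.5))
    return "speech" if speech > nonspeech else "non-speech"

print(classify_segment(energy_db=-12.0, zero_cross_rate=0.09))
```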
  • Publication number: 20100094629
    Abstract: A weighting factor learning system includes an audio recognition section that recognizes learning audio data and outputs the recognition result; a weighting factor updating section that updates a weighting factor applied to the scores obtained from an acoustic model and a language model, so that the difference between a correct-answer score, calculated with the use of a correct-answer text of the learning audio data, and the score of the recognition result becomes large; a convergence determination section that determines, using the score after updating, whether to return to the weighting factor updating section and update the weighting factor again; and a weighting factor convergence determination section that determines, using the score after updating, whether to return to the audio recognition section to perform the process again and update the weighting factor using the weighting factor updating section.
    Type: Application
    Filed: February 19, 2008
    Publication date: April 15, 2010
    Inventors: Tadashi Emori, Yoshifumi Onishi
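    Illustrative sketch: the update direction falls out of differentiating the score margin with respect to the weighting factor. A toy single-utterance update step with made-up scores; a real system would aggregate over the whole learning set and check convergence as the abstract describes:

```python
def update_weight(weight, correct_score, hypothesis_score,
                  correct_lm, hypothesis_lm, lr=0.01):
    """One illustrative step: nudge the language-model weight so that
    total_score = acoustic + weight * lm separates the correct answer
    from the current recognition result by a larger margin."""
    # Margin of the correct transcript over the recognized hypothesis.
    margin = (correct_score + weight * correct_lm) \
           - (hypothesis_score + weight * hypothesis_lm)
    # Gradient of the margin with respect to the weight.
    gradient = correct_lm - hypothesis_lm
    return weight + lr * gradient, margin

w, m = 10.0, None
for _ in range(50):                       # iterate toward convergence
    w, m = update_weight(w, correct_score=-420.0, hypothesis_score=-415.0,
                         correct_lm=-18.0, hypothesis_lm=-22.0)
print(round(w, 2), round(m, 2))
```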
  • Publication number: 20100088088
    Abstract: An automated emotional recognition system is adapted to determine emotional states of a speaker based on analysis of a speech signal. The emotional recognition system includes at least one server function and at least one client function in communication with the at least one server function for receiving assistance in determining the emotional states of the speaker. The at least one client function includes an emotional features calculator adapted to receive the speech signal and to extract from it a set of speech features indicative of the emotional state of the speaker. The emotional recognition system further includes at least one emotional state decider adapted to determine the emotional state of the speaker by exploiting the set of speech features based on a decision model. The server function includes at least a decision model trainer adapted to update the selected decision model according to the speech signal.
    Type: Application
    Filed: January 31, 2007
    Publication date: April 8, 2010
    Inventors: Gianmario Bollano, Donato Ettorre, Antonio Esiliato
  • Publication number: 20100057453
    Abstract: Discrimination between at least two classes of events in an input signal is carried out in the following way. A set of frames containing an input signal is received, and at least two different feature vectors are determined for each of said frames. Said at least two different feature vectors are classified using respective sets of preclassifiers trained for said at least two classes of events. Values for at least one weighting factor are determined based on outputs of said preclassifiers for each of said frames. A combined feature vector is calculated for each of said frames by applying said at least one weighting factor to said at least two different feature vectors. Said combined feature vector is classified using a set of classifiers trained for said at least two classes of events.
    Type: Application
    Filed: November 16, 2006
    Publication date: March 4, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Zica Valsan
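    Illustrative sketch: the pipeline is two per-stream preclassifiers, a confidence-derived weighting factor per frame, a weighted combined feature vector, and a final classifier. The logistic-regression stand-ins and the specific weighting formula are this sketch's assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two different per-frame feature streams (e.g. MFCC-like and PLP-like).
n = 400
f1 = rng.normal(size=(n, 10)); f2 = rng.normal(size=(n, 6))
y = (f1[:, 0] + 0.5 * f2[:, 0] > 0).astype(int)    # two event classes

# One preclassifier per feature stream, trained on the same classes.
pre1 = LogisticRegression(max_iter=1000).fit(f1, y)
pre2 = LogisticRegression(max_iter=1000).fit(f2, y)

# Per-frame weighting factor from preclassifier confidence: the more
# confident stream contributes more to the combined vector.
c1 = np.abs(pre1.predict_proba(f1)[:, 1] - 0.5)
c2 = np.abs(pre2.predict_proba(f2)[:, 1] - 0.5)
w = (c1 / (c1 + c2 + 1e-9))[:, None]

combined = np.hstack([w * f1, (1 - w) * f2])       # weighted combination
final = LogisticRegression(max_iter=1000).fit(combined, y)
print(final.score(combined, y))
```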
  • Publication number: 20100042404
    Abstract: A method of generating a natural language understanding (NLU) model for use in a spoken dialog system is disclosed. The method comprises using sample utterances and creating a number of hand-crafted rules for each call-type defined in a labeling guide. A first NLU model is generated and tested using the hand-crafted rules and sample utterances. A second NLU model is built using the sample utterances as new training data and using the hand-crafted rules. The second NLU model is tested for performance using a first batch of labeled data. A series of NLU models is then built by adding each previous batch of labeled data to the training data and using a new batch of labeled data as test data, so that the series is generated with constantly growing training data. If not all the labeled data has been received, the step of building a series of NLU models is repeated until all labeled data is received.
    Type: Application
    Filed: October 20, 2009
    Publication date: February 18, 2010
    Applicant: AT&T Corp.
    Inventors: Narendra K. Gupta, Mazin G. Rahim, Gokhan Tur, Antony Van der Mude
  • Publication number: 20090259469
    Abstract: A method and apparatus for performing speech recognition receives an audio signal, generates a sequence of frames of the audio signal, transforms each frame of the audio signal into a set of narrow band feature vectors using a narrow passband, couples the narrow band feature vectors to a speech model, and determines whether the audio signal is a wide band signal. When the audio signal is determined to be a wide band signal, a passband parameter of each of one or more passbands that are outside the narrow passband is generated for each frame, and the one or more passband parameters are coupled to the speech model.
    Type: Application
    Filed: April 14, 2008
    Publication date: October 15, 2009
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Yuan-Jun Wei
  • Publication number: 20090132249
    Abstract: A modifying method for a speech model and a modifying module thereof are provided. The modifying method is as follows. First, a correct sequence of a speech is generated according to a correct sequence generating method and the speech model. Next, a candidate sequence generating method is selected from a plurality of candidate sequence generating methods, and a candidate sequence of the speech is generated according to the selected candidate sequence generating method and the speech model. Finally, the speech model is modified according to the correct sequence and the candidate sequence. The present invention therefore increases the discrimination of the speech model.
    Type: Application
    Filed: January 10, 2008
    Publication date: May 21, 2009
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Jia-Jang Tu, Yuan-Fu Liao
  • Publication number: 20090063145
    Abstract: Combined active and semi-supervised learning reduces the amount of manual labeling required when training a spoken language understanding classifier. The classifier may be trained with human-labeled utterance data. Utterances from a group of unselected utterance data may be chosen for manual labeling via active learning. The classifier may then be changed, via semi-supervised learning, based on the selected ones of the unselected utterance data.
    Type: Application
    Filed: January 12, 2005
    Publication date: March 5, 2009
    Applicant: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Robert Elias Schapire, Gokhan Tur
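    Illustrative sketch: one round of the combined loop queries the least-confident utterances for manual labels (active learning) and self-trains on the most confident ones (semi-supervised learning). The classifier choice, thresholds, and the `oracle` stand-in for the human labeler are all this sketch's assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_semi_supervised_round(clf, X_unlabeled, oracle,
                                 n_query=10, self_train_conf=0.95):
    """One round: query the least-confident utterances for human labels
    (active learning), then self-train on very confident ones
    (semi-supervised learning)."""
    proba = clf.predict_proba(X_unlabeled)
    confidence = proba.max(axis=1)
    query = np.argsort(confidence)[:n_query]           # least confident
    X_new = X_unlabeled[query]
    y_new = oracle(X_new)                              # manual labels
    confident = confidence >= self_train_conf
    X_self = X_unlabeled[confident]
    y_self = proba.argmax(axis=1)[confident]           # machine labels
    return np.vstack([X_new, X_self]), np.concatenate([y_new, y_self])

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8)); y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X[:50], y[:50])
X_add, y_add = active_semi_supervised_round(
    clf, X[50:], oracle=lambda q: (q[:, 0] > 0).astype(int))
clf.fit(np.vstack([X[:50], X_add]), np.concatenate([y[:50], y_add]))
```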
  • Publication number: 20090043576
    Abstract: Systems and methods for improving the performance of a speech recognition system. In some embodiments a tuner module and/or a tester module are configured to cooperate with a speech recognition system. The tester and tuner modules can be configured to cooperate with each other. In one embodiment, the tuner module may include a module for playing back a selected portion of a digital data audio file, a module for creating and/or editing a transcript of the selected portion, and/or a module for displaying information associated with a decoding of the selected portion, the decoding generated by a speech recognition engine. In other embodiments, the tester module can include an editor for creating and/or modifying a grammar, a module for receiving a selected portion of a digital audio file and its corresponding transcript, and a scoring module for producing scoring statistics of the decoding based at least in part on the transcript.
    Type: Application
    Filed: October 21, 2008
    Publication date: February 12, 2009
    Applicant: LumenVox, LLC
    Inventors: Edward S. Miller, James F. Blake, II, Keith C. Herold, Michael D. Bergman, Kyle N. Danielson, Alexandra L. Auckland
  • Publication number: 20080319746
    Abstract: A keyword analysis device obtains word vectors representing the documents by analyzing keywords contained in each document input in a designated period. A topic cluster extraction device extracts topic clusters belonging to the same topic from a plurality of documents. A keyword extraction device extracts, as a characteristic keyword group, a predetermined number of keywords from the topic cluster in descending order of appearance frequency. A topic structurization determination device determines whether the topic can be structurized by segmenting the topic cluster into subtopic clusters, using as a determination criterion the number of documents, the variance of dates contained in the documents, or the C-value of keywords contained in the documents. A keyword presentation device then presents the characteristic keyword group of each subtopic cluster, arranging the keywords on the basis of the date information.
    Type: Application
    Filed: March 25, 2008
    Publication date: December 25, 2008
    Inventors: Masayuki Okamoto, Masaaki Kikuchi, Kazuyuki Goto
  • Publication number: 20080195387
    Abstract: A method and apparatus for determining whether a speaker uttering an utterance belongs to a predetermined set comprising known speakers, wherein a training utterance is available for each known speaker. The method and apparatus test whether features extracted from the tested utterance provide a score exceeding a threshold when matched against one or more models constructed from voice samples of each known speaker. The method and apparatus further provide optional enhancements such as determining, using, and updating model normalization parameters, a fast scoring algorithm, summed-calls handling, or quality evaluation of the tested utterance.
    Type: Application
    Filed: October 19, 2006
    Publication date: August 14, 2008
    Applicant: NICE SYSTEMS LTD.
    Inventors: Yaniv ZIGEL, Moshe WASSERBLAT
  • Publication number: 20080167873
    Abstract: A method for the pronunciation of English alphas according to indications at different orientations of the alpha comprises the steps of: dividing the area around an alpha into six sections; indicating short sounds, long sounds and strong sounds by points, lines and slashes; putting a small line (at a different angle) or a point on an alpha to indicate that it is pronounced with the pronunciation of another alpha; using underlines to indicate the long and short sounds of the phonetic symbols of a double-alpha set; using a delete line to indicate that an alpha is not pronounced; using a space area to divide the syllables of a word; using a vertical cut line to indicate that one alpha is pronounced with two sounds; placing an original-sound line at the upper side of the first stroke to represent that the alpha is pronounced with its original sound; and placing a "?" under a double-alpha set to represent that the alpha is pronounced with a reverse sound.
    Type: Application
    Filed: January 8, 2007
    Publication date: July 10, 2008
    Inventor: Wei-Chou Su
  • Publication number: 20080147404
    Abstract: Speech that may be colored by accent is processed. A method for recognizing speech includes maintaining a model of speech accent established from training speech data, wherein the training speech data includes at least a first set of training speech data, and wherein establishing the model of speech accent does not use any phone or phone-class transcription of the first set of training speech data. Related systems are also presented. A system for recognizing speech includes an accent identification module configured to identify the accent of the speech to be recognized, and a recognizer configured to use models to recognize that speech, wherein the models include at least an acoustic model that has been adapted for the identified accent using training speech data of a language, other than the primary language of the speech to be recognized, that is associated with the identified accent. Related methods are also presented.
    Type: Application
    Filed: May 15, 2001
    Publication date: June 19, 2008
    Applicant: NuSuara Technologies SDN BHD
    Inventors: Wai Kat Liu, Pascale Fung
  • Publication number: 20080126089
    Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled
    Type: Application
    Filed: October 31, 2007
    Publication date: May 29, 2008
    Inventors: Harry Printz, Narren Chittar
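    Illustrative sketch: an acoustic confusability measure between two phrases can be framed as a weighted edit distance over their phone sequences, with substitution costs learned from recognizer behavior. The toy cost table below stands in for the empirically derived costs the abstract describes:

```python
def confusability(phones_a, phones_b, sub_cost):
    """Weighted edit distance between two phone sequences; lower cost
    means more acoustically confusable. sub_cost(p, q) would be
    estimated from recognizer confusion counts, per the iterative
    empirical procedure in the abstract; here it is a toy table."""
    n, m = len(phones_a), len(phones_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + 1.0          # deletion
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + 1.0          # insertion
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,
                          d[i][j - 1] + 1.0,
                          d[i - 1][j - 1] + sub_cost(phones_a[i - 1],
                                                     phones_b[j - 1]))
    return d[n][m]

toy = lambda p, q: 0.0 if p == q else (0.3 if {p, q} == {"m", "n"} else 1.0)
print(confusability("m ae n".split(), "n ae n".split(), toy))  # 0.3
```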
  • Publication number: 20080120105
    Abstract: Methods and apparatus to operate an audience metering device with voice commands are described herein. An example method to identify audience members based on voice includes: obtaining an audio input signal including a program audio signal and a human voice signal; receiving an audio line signal from an audio output line of a monitored media device; processing the audio line signal with a filter having adaptive weights to generate a delayed and attenuated line signal; subtracting the delayed and attenuated line signal from the audio input signal to develop a residual audio signal; identifying the person who spoke to create the human voice signal based on the residual audio signal; and logging the identity of that person as an audience member.
    Type: Application
    Filed: February 1, 2008
    Publication date: May 22, 2008
    Inventor: VENUGOPAL SRINIVASAN
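    Illustrative sketch: "a filter having adaptive weights to generate a delayed and attenuated line signal" is the classic normalized-LMS echo-cancellation setup. A NumPy sketch with synthetic signals; the tap count and step size are this sketch's choices:

```python
import numpy as np

def residual_voice(mic, line, taps=64, mu=0.5, eps=1e-8):
    """NLMS adaptive filter: model the (delayed, attenuated) path from
    the audio line output to the microphone, subtract the estimate,
    and keep the residual, which is dominated by the human voice."""
    w = np.zeros(taps)
    residual = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = line[n - taps:n][::-1]          # recent line-signal history
        y_hat = w @ x                       # estimated program component
        e = mic[n] - y_hat                  # residual = voice + misfit
        w += mu * e * x / (x @ x + eps)     # normalized LMS update
        residual[n] = e
    return residual

rng = np.random.default_rng(0)
line = rng.normal(size=8000)                        # program audio
voice = np.zeros(8000); voice[4000:4400] = rng.normal(size=400)
mic = 0.6 * np.roll(line, 5) + voice                # attenuated + delayed
print(np.abs(residual_voice(mic, line)[6000:]).mean())  # near zero
```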
  • Publication number: 20080091410
    Abstract: A method of forming words utilizing a character actuator unit in which the character actuators are segregated into certain categories. First and second categories are employed and activated simultaneously to generate the beginning and ending of a word. First and second actuating categories may be combined with third and fourth categories of actuators to further form and modify words in any languages.
    Type: Application
    Filed: January 4, 2007
    Publication date: April 17, 2008
    Inventor: Sherrie Benson