Speech Classification Or Search (EPO) Patents (Class 704/E15.014)
  • Patent number: 11847423
    Abstract: To prevent intent classifiers from potentially choosing intents that are ineligible for the current input due to policies, dynamic intent classification systems and methods are provided that dynamically control the possible set of intents using environment variables (also referred to as external variables). Associations between environment variables and ineligible intents, referred to as culling rules, are used.
    Type: Grant
    Filed: December 27, 2022
    Date of Patent: December 19, 2023
    Assignee: Verint Americas Inc.
    Inventor: Ian Roy Beaver
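The culling-rule mechanism described in the abstract above can be sketched roughly as follows. This is a hypothetical illustration, not the patent's implementation; the rule table, variable names, and intent labels are invented for the example.

```python
# Hypothetical "culling rules": each rule maps an (environment variable, value)
# pair to the set of intents that become ineligible when it holds.
CULLING_RULES = {
    ("store_open", False): {"schedule_pickup"},
    ("user_authenticated", False): {"check_balance", "transfer_funds"},
}

def eligible_intents(all_intents, env):
    """Remove intents culled by any rule matching the current environment,
    so the classifier can only choose from the remaining set."""
    ineligible = set()
    for (var, value), intents in CULLING_RULES.items():
        if env.get(var) == value:
            ineligible |= intents
    return [i for i in all_intents if i not in ineligible]
```

The classifier's candidate set is thus constrained dynamically per request, before classification, rather than by retraining the model.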
  • Patent number: 11763092
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: September 19, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Patent number: 11735164
    Abstract: A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: August 22, 2023
    Assignee: Intel Corporation
    Inventors: Piotr Rozen, Joachim Hofer
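The frequent beam-width adjustment described above can be sketched as a per-frame feedback loop: prune hypotheses against the current beam, then widen or narrow the beam so the surviving count tracks a target. All numbers and names here are illustrative assumptions, not taken from the patent.

```python
def prune(hypotheses, beam):
    """Keep (label, score) hypotheses whose score is within `beam` of the best."""
    best = max(score for _, score in hypotheses)
    return [h for h in hypotheses if h[1] >= best - beam]

def adjust_beam(beam, n_active, target=100, step=0.5, min_beam=1.0, max_beam=20.0):
    """Narrow the beam when too many hypotheses survive, widen it when too few,
    keeping decoding cost roughly constant across frames."""
    if n_active > target:
        beam = max(min_beam, beam - step)
    elif n_active < target:
        beam = min(max_beam, beam + step)
    return beam
```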
  • Patent number: 11568175
    Abstract: To prevent intent classifiers from potentially choosing intents that are ineligible for the current input due to policies, dynamic intent classification systems and methods are provided that dynamically control the possible set of intents using environment variables (also referred to as external variables). Associations between environment variables and ineligible intents, referred to as culling rules, are used.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: January 31, 2023
    Assignee: Verint Americas Inc.
    Inventor: Ian Roy Beaver
  • Publication number: 20120323574
    Abstract: Event audio data that is based on verbal utterances associated with a medical event associated with a patient is received. A list of a plurality of candidate text strings that match interpretations of the event audio data is obtained, based on information included in a medical speech repository, information included in a speech accent repository, and a matching function. A selection of at least one of the candidate text strings included in the list is obtained. A population of at least one field of an electronic medical form is initiated, based on the obtained selection.
    Type: Application
    Filed: June 17, 2011
    Publication date: December 20, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Tao Wang, Bin Zhou
  • Publication number: 20120239402
    Abstract: A speech recognition device includes, a speech recognition section that conducts a search, by speech recognition, on audio data stored in a first memory section to extract word-spoken portions where plural words transferred are each spoken and, of the word-spoken portions extracted, rejects the word-spoken portion for the word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word being set as an outputting object or a rejecting object by the acquisition section; and an output section that outputs the word-spoken portions extracted and not rejected in the search.
    Type: Application
    Filed: February 1, 2012
    Publication date: September 20, 2012
    Applicant: Fujitsu Limited
    Inventors: Nobuyuki WASHIO, Shouji HARADA
  • Publication number: 20120197642
    Abstract: Embodiments of the present invention relate to a signal identifying method, including: obtaining signal characteristics of a current frame of input signals; deciding, according to the signal characteristics of the current frame and updated signal characteristics of a background signal frame before the current frame, whether the current frame is a background signal frame; detecting whether the current frame serving as a background signal frame is in a first type signal state; and adjusting a signal classification decision threshold according to whether the current frame serving as a background signal frame is in the first type signal state to enhance the speech signal identification capability.
    Type: Application
    Filed: April 12, 2012
    Publication date: August 2, 2012
    Applicant: Huawei Technologies Co., Ltd.
    Inventors: Yuanyuan Liu, Zhe Wang, Eyal Shlomot
  • Publication number: 20120166195
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Application
    Filed: October 5, 2011
    Publication date: June 28, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Shoji HAYAKAWA, Naoshi Matsuo
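The two-model likelihood comparison in the state detection abstract above amounts to scoring the input under each speaker model and taking the larger total. A minimal sketch, assuming each model is a single 1-D Gaussian over a scalar feature (a deliberate simplification of real speaker models):

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar x under a 1-D Gaussian (mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify_state(features, normal_model, depressed_model):
    """Sum per-frame log-likelihoods under each specific-speaker model;
    the model with the larger total determines the state."""
    l1 = sum(gaussian_loglik(f, *normal_model) for f in features)
    l2 = sum(gaussian_loglik(f, *depressed_model) for f in features)
    return "normal" if l1 >= l2 else "depressed"
```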
  • Publication number: 20120143609
    Abstract: An approach for providing speech recognition is disclosed. A name is retrieved for a user based on data provided by the user. The user is prompted for a name of the user. A first audio input is received from the user in response to the prompt. Speech recognition is applied to the first audio input using a name grammar database to output a recognized name. A determination is made whether the recognized name matches the retrieved name. If no match is determined, the user is re-prompted for the name of the user for a second audio input. Speech recognition is applied to the second audio input using a confidence database having fewer entries than the name grammar database.
    Type: Application
    Filed: November 30, 2011
    Publication date: June 7, 2012
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: David Sannerud
  • Publication number: 20120072217
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Junlan Feng, Michael Johnston, Taniya Mishra
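The lattice reweighting step above can be sketched as re-ranking recognition paths after boosting each path's score by the prosodic salience of its words. The path/salience representation here is a hypothetical simplification (a real word lattice is a graph, not a list of paths):

```python
def reweight(paths, salience):
    """paths: list of (word_sequence, asr_score). Add the summed prosodic
    salience of each path's words to its score, then re-rank best-first."""
    def boosted(p):
        words, score = p
        return score + sum(salience.get(w, 0.0) for w in words)
    return sorted(paths, key=boosted, reverse=True)
```

A prosodically salient word (e.g. one the speaker stressed) can thus promote an otherwise lower-scoring hypothesis before the query is answered.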
  • Publication number: 20120072216
    Abstract: A method and device are configured to receive voice data from a user and perform speech recognition on the received voice data. A confidence score is calculated that represents the likelihood that received voice data has been accurately recognized. A likely age range is determined associated with the user based on the confidence score.
    Type: Application
    Filed: November 30, 2011
    Publication date: March 22, 2012
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: Kevin R. Witzman
  • Publication number: 20120065976
    Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
    Type: Application
    Filed: September 15, 2010
    Publication date: March 15, 2012
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, George Edward Dahl
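In hybrid systems of the kind described above, the deep network outputs posterior probabilities of context-dependent units, which are converted to the scaled likelihoods an HMM decoder expects by dividing out the state priors. A minimal sketch of that standard conversion (the softmax stands in for the network's output layer; no actual network is trained here):

```python
import math

def softmax(logits):
    """Numerically stable softmax: posteriors over HMM states."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_likelihoods(posteriors, priors):
    """Standard hybrid trick: p(obs|state) ∝ p(state|obs) / p(state)."""
    return [p / q for p, q in zip(posteriors, priors)]
```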
  • Publication number: 20120046946
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Application
    Filed: August 20, 2010
    Publication date: February 23, 2012
    Applicant: ADACEL SYSTEMS, INC.
    Inventor: Chang-Qing Shu
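The merging step above selects feature data from each independently computed stream to form one vector per frame. A toy sketch, assuming (hypothetically) that the low-order dimensions come from the ear microphone and the rest from the mouth microphone:

```python
def merge_features(ear_feats, mouth_feats, ear_dims):
    """Frame by frame, take the first `ear_dims` components from the ear
    stream and the remaining components from the mouth stream."""
    merged = []
    for e, m in zip(ear_feats, mouth_feats):
        merged.append(e[:ear_dims] + m[ear_dims:])
    return merged
```

The merged vectors are what the decoder consumes; which dimensions to take from which stream would be a tuning decision, not fixed as here.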
  • Publication number: 20110307252
    Abstract: Described is the use of utterance classification based methods and other machine learning techniques to provide a telephony application or other voice menu application (e.g., an automotive application) that need not use Context-Free-Grammars to determine a user's spoken intent. A classifier receives text from an information retrieval-based speech recognizer and outputs a semantic label corresponding to the likely intent of a user's speech. The semantic label is then output, such as for use by a voice menu program in branching between menus. Also described is training, including training the language model from acoustic data without transcriptions, and training the classifier from speech-recognized acoustic data having associated semantic labels.
    Type: Application
    Filed: June 15, 2010
    Publication date: December 15, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Yun-Cheng Ju, James Garnet Droppo, III
  • Publication number: 20110307247
    Abstract: A method and a system for lexical navigation of a corpus of items are provided. For example, the method may include generating a data structure in a non-transitory, computer readable medium. The data structure may include a number of items, a number of keywords, and a frequency that each of the keywords is associated with each of the items. The method may further include generating a top-level lexical cloud that includes a subset of the keywords. Each keyword in the subset may be associated with a size that is proportional to its frequency of occurrence. Finally, the method may include generating a plurality of lower-level lexical clouds by eliminating any one of the plurality of items not associated with a particular one of the keywords from the data structure, and generating the lower-level lexical cloud as a second subset of the plurality of keywords that remain in the data structure.
    Type: Application
    Filed: June 14, 2010
    Publication date: December 15, 2011
    Inventor: Nathan Moroney
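The cloud-and-drill-down procedure above is straightforward to sketch: count keyword frequencies across items for the top-level cloud, then filter items by a selected keyword before recounting. The data shapes here are assumptions for illustration:

```python
from collections import Counter

def top_level_cloud(item_keywords, k=3):
    """item_keywords: {item: [keywords]}. Return the k most frequent keywords
    with counts (display size would be proportional to the count)."""
    freq = Counter(kw for kws in item_keywords.values() for kw in kws)
    return dict(freq.most_common(k))

def drill_down(item_keywords, keyword):
    """Eliminate items not associated with `keyword`; the lower-level cloud
    is then rebuilt from the keywords of the remaining items."""
    return {i: kws for i, kws in item_keywords.items() if keyword in kws}
```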
  • Publication number: 20110295605
    Abstract: This speech recognition system provides a function that is capable of adjusting memory usage according to different target resources. It extracts a sequence of feature vectors from the input speech signal. A module for constructing the search space reads a text file and generates a word-level search space in an off-line phase. After removing redundancy, the word-level search space is expanded to a phone-level one and is represented by a tree structure. This may be performed by combining the information from a dictionary which gives the mapping from a word to its phonetic sequence(s). In the online phase, a decoder traverses the search space, takes the dictionary and at least one acoustic model as input, computes scores of feature vectors and outputs the decoding result.
    Type: Application
    Filed: December 28, 2010
    Publication date: December 1, 2011
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventor: Shiuan-Sung LIN
  • Publication number: 20110213612
    Abstract: A system classifies the source of an input signal. The system determines whether a sound source belongs to classes that may include human speech, musical instruments, machine noise, or other classes of sound sources. The system is robust, performing classification despite variation in sound level and noise masking. Additionally, the system consumes relatively few computational resources and adapts over time to provide consistently accurate classification.
    Type: Application
    Filed: May 11, 2011
    Publication date: September 1, 2011
    Inventor: Pierre Zakarauskas
  • Publication number: 20110202337
    Abstract: For classifying different segments of a signal which has segments of at least a first type and second type, e.g. audio and speech segments, the signal is short-term classified on the basis of the at least one short-term feature extracted from the signal and a short-term classification result is delivered. The signal is also long-term classified on the basis of the at least one short-term feature and at least one long-term feature extracted from the signal and a long-term classification result is delivered. The short-term classification result and the long-term classification result are combined to provide an output signal indicating whether a segment of the signal is of the first type or of the second type.
    Type: Application
    Filed: January 11, 2011
    Publication date: August 18, 2011
    Inventors: Guillaume Fuchs, Stefan Bayer, Jens Hirschfeld, Juergen Herre, Jeremie Lecomte, Frederik Nagel, Nikolaus Rettelbach, Stefan Wabnik, Yoshikazu Yokotani
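The combination step above fuses a short-term and a long-term classification result into one decision. A minimal sketch, assuming (hypothetically) that each classifier emits a score in [0, 1] toward the first segment type and the combiner is a weighted average with a fixed threshold:

```python
def combine(short_term, long_term, weight=0.5, threshold=0.5):
    """Weighted fusion of short-term and long-term classifier scores;
    labels the segment as the first type ("speech") or second ("music")."""
    score = weight * short_term + (1 - weight) * long_term
    return "speech" if score >= threshold else "music"
```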
  • Publication number: 20110153646
    Abstract: A system and method for triaging of information feeds is provided. A plurality of information feeds are received. At least one topic is identified from each information feed. At least one topic is presented to a user in topic facet including a plurality of identified topics. A selection of one of the plurality of topics is received from the user. The user interface is updated to display only the feeds that contain the selected topic.
    Type: Application
    Filed: December 23, 2009
    Publication date: June 23, 2011
    Applicant: Palo Alto Research Center Incorporated
    Inventors: Lichan Hong, Gregorio Covertino, Bongwon Suh, Ed H. Chi
  • Publication number: 20110153327
    Abstract: According to embodiments of the present disclosure, a matching module is configured to accurately match a probe identity of an entity to a collection of entities. The matching module is configured to match the probe identity of the entity to the collection of entities based on a combination of phonetic matching processes and edit distance processes. The matching module is configured to create phonetic groups for name parts of identities in the collection. The matching module is configured to compare probe name parts of the probe identity to the name parts associated with the phonetic groups.
    Type: Application
    Filed: February 22, 2008
    Publication date: June 23, 2011
    Inventor: Anthony S. Iasso
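The combination of phonetic grouping and edit-distance comparison described above can be sketched with a deliberately crude phonetic key (first letter plus consonant skeleton, standing in for a real algorithm such as Soundex) and the standard Levenshtein distance:

```python
def phonetic_key(name):
    """Crude phonetic key: first letter plus remaining consonants.
    A stand-in for a real phonetic algorithm, used only for grouping."""
    name = name.lower()
    return name[0] + "".join(c for c in name[1:] if c not in "aeiouy")

def edit_distance(a, b):
    """Levenshtein distance with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def best_matches(probe, names, k=2):
    """Compare the probe only against names in its phonetic group,
    ranked by edit distance."""
    group = [n for n in names if phonetic_key(n) == phonetic_key(probe)]
    return sorted(group, key=lambda n: edit_distance(probe.lower(), n.lower()))[:k]
```

Grouping first keeps the expensive pairwise edit-distance comparison confined to plausible candidates.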
  • Publication number: 20110144986
    Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios. Different calibration models may be used with different usage scenarios, e.g., during different conditions. The calibration model may comprise a maximum entropy classifier with distribution constraints, trained with continuous raw confidence scores and multi-valued word tokens, and/or other distributions and extracted features.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Jinyu Li
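A common minimal form of confidence calibration is a learned sigmoid mapping from raw to calibrated scores (the patent's model is a maximum entropy classifier with distribution constraints; the sketch below is the simpler Platt-scaling flavor of the same idea, with made-up coefficients):

```python
import math

def calibrate(raw_conf, a=4.0, b=-2.0):
    """Map a raw engine confidence through sigmoid(a*x + b). The coefficients
    a, b would be fit on a calibration set from the target usage scenario;
    the values here are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(a * raw_conf + b)))
```

Per-scenario fitting is the key point: the same raw score from the engine can merit different calibrated confidences under different conditions.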
  • Publication number: 20110122137
    Abstract: A video summarization method based on mining the story structure and semantic relations among concept entities has steps of: processing a video to generate multiple important shots that are annotated with respective keywords; performing a concept expansion process by using the keywords to create expansion trees for the annotated shots; rearranging the keywords of the expansion trees and classifying them to calculate their relations; and applying a graph entropy algorithm to determine significant shots and the edges interconnecting them. Based on the result of the graph entropy algorithm, a structured relational graph is built to display the significant shots and their edges. Consequently, users can more rapidly browse the content of a video and comprehend whether different shots are related.
    Type: Application
    Filed: November 23, 2009
    Publication date: May 26, 2011
    Applicant: NATIONAL CHENG KUNG UNIVERSITY
    Inventors: Jhing-Fa WANG, Bo-Wei CHEN, Jia-Ching WANG, Chia-Hung CHANG
  • Publication number: 20110119057
    Abstract: Disclosed are systems, methods, and computer-program products for segmenting content of an input signal and applications thereof. In an embodiment, the system includes simulated neurons, a phase modulator, and an entity-identifier module. Each simulated neuron is connected to one or more other simulated neurons and is associated with an activity and a phase. The activity and the phase of each simulated neuron is set based on the activity and the phase of the one or more other simulated neurons connected to each simulated neuron. The phase modulator includes individual modulators, each configured to modulate the activity and the phase of each of the plurality of simulated neurons based on a modulation function. The entity-identifier module is configured to identify one or more distinct entities (e.g., objects, sound sources, etc.) included in the input signal based on the one or more distinct collections of simulated neurons that have substantially distinct phases.
    Type: Application
    Filed: November 18, 2009
    Publication date: May 19, 2011
    Applicant: The Intellisis Corporation
    Inventors: Douglas A. Moore, Kristi H. Tsukida, Paulo B. Ang
  • Publication number: 20110082695
    Abstract: An electronic device includes a call analysis module that is configured to analyze characteristics of a phone call and to generate an indicium that represents a prevailing mood associated with the phone call based on the analyzed characteristics.
    Type: Application
    Filed: October 2, 2009
    Publication date: April 7, 2011
    Inventor: Emil Morgan Billing Bengt
  • Publication number: 20110071826
    Abstract: A method and apparatus for ordering results from a query is provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example a set search strings may be created from the N-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings are sent to at least one search engine, and search results are obtained. The search results are then re-arranged or reordered based on a semantic similarity between the search results and the word lattice.
    Type: Application
    Filed: September 23, 2009
    Publication date: March 24, 2011
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Harry M. Bliss
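The search-string construction above (unigrams and bigrams from the lattice, ordered and truncated by confidence) can be sketched over a single best path; handling a full word lattice would generalize the same idea:

```python
def search_strings(words, confidences, max_strings=5):
    """Build unigram and bigram search strings from a recognized word
    sequence, ordered by mean per-word confidence and truncated."""
    cands = []
    for n in (1, 2):
        for i in range(len(words) - n + 1):
            conf = sum(confidences[i:i + n]) / n
            cands.append((" ".join(words[i:i + n]), conf))
    cands.sort(key=lambda t: -t[1])
    return [s for s, _ in cands[:max_strings]]
```

The resulting strings would be sent to the search engine, and the returned results re-ranked by semantic similarity to the lattice, as the abstract describes.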
  • Publication number: 20110064302
    Abstract: A method is disclosed for recognition of high-dimensional data in the presence of occlusion, including: receiving a target data that includes an occlusion and is of an unknown class, wherein the target data includes a known object; sampling a plurality of training data files comprising a plurality of distinct classes of the same object as that of the target data; and identifying the class of the target data through linear superposition of the sampled training data files using l1 minimization, wherein a linear superposition with a sparsest number of coefficients is used to identify the class of the target data.
    Type: Application
    Filed: January 29, 2009
    Publication date: March 17, 2011
    Inventors: Yi Ma, Allen Yang Yang, John Norbert Wright, Andrew William Wagner
  • Publication number: 20110066434
    Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since only the F most similar unknown voices are found among m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 17, 2011
    Inventors: Tze-Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Publication number: 20110066425
    Abstract: Systems, methods, and apparatus provide clinical terminology services including a controlled medical vocabulary supplemented by local clinical content.
    Type: Application
    Filed: September 17, 2009
    Publication date: March 17, 2011
    Inventors: Darren S. Hudgins, Thomas A. Oniki
  • Publication number: 20110046951
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production and log data reporting the association between the state of the system at the moment when the utterances were recorded and the utterance. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Application
    Filed: August 21, 2009
    Publication date: February 24, 2011
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Publication number: 20100332226
    Abstract: A mobile terminal and controlling method thereof are disclosed, by which a specific content and another content associated with the specific content can be quickly searched using a user's voice. The present invention includes inputting a voice for a search for a specific content provided to the mobile terminal via a microphone, analyzing a meaning of the inputted voice, searching a memory for at least one content to which a voice name having a meaning associated with the analyzed voice is tagged, and displaying the searched at least one content.
    Type: Application
    Filed: June 30, 2010
    Publication date: December 30, 2010
    Applicant: LG ELECTRONICS INC.
    Inventors: In Jik Lee, Jong Keun Youn, Dae Sung Jung, Jae Min Joh, Sun Hwa Cha, Seung Heon Yang, Jae Hoon Yu
  • Publication number: 20100318536
    Abstract: System, computer implemented method and computer program product for preparing and navigating a query tree including a plurality of query nodes and informational nodes. Each query node is associated with a prompt, branching criteria and keywords. A current query node provides a prompt to a user and a user response is received and analyzed to identify branching criteria and keywords from the user response. The method navigates to another node in the query tree in consideration of the branching criteria received in the user response and a comparison between the keywords received in the user response and the keywords associated with the query nodes. The comparison may validate navigation to a destination node corresponding to the branching criteria or the comparison may indicate incorrect navigation of the query tree. Corrective navigation can be implemented in various ways based upon the keywords received in the user response.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Justin P. Bandholz, William G. Pagan, William J. Piazza
  • Publication number: 20100299146
    Abstract: Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
    Type: Application
    Filed: May 19, 2009
    Publication date: November 25, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, JR.
  • Publication number: 20100169094
    Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating what type of the phoneme or the word is included in a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-to-phoneme basis or a word-to-word basis, and the decision trees being configured to reply to questions relating to the feature value and output likelihoods in the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker, the decision trees being adapted using speaker adaptation data vocalized by the speaker of an input speech.
    Type: Application
    Filed: September 17, 2009
    Publication date: July 1, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
  • Publication number: 20100153107
    Abstract: A trend evaluation device includes trend evaluation means having at least one of relative cooccurrence probability calculation means for calculating a change of cooccurrence probability of a keyword and an associated word and relative associated word similarity calculation means for calculating a change degree of a conversation topic concerning the keyword, so as to calculate a trend score by considering one or more combinations of the relative cooccurrence probability and the relative associated word similarity obtained by these means.
    Type: Application
    Filed: September 25, 2006
    Publication date: June 17, 2010
    Applicant: NEC CORPORATION
    Inventor: Hideki Kawai
  • Publication number: 20100145681
    Abstract: The invention for processing speech that is described herein measures the periodic changes of multiple acoustic features in a digitized utterance without regard for lexical, sublexical, or prosodic features. These measurements of periodic, simultaneous changes of multiple acoustic features are assembled into transformational structures. Various types of transformational structures are identified, quantified, and displayed by the invention. The invention is useful for the study of such speaker characteristics as cognitive, emotional, linguistic, and behavioral functioning, and may be employed in the study of other phenomena of interest to the user.
    Type: Application
    Filed: December 8, 2008
    Publication date: June 10, 2010
    Inventor: Daniel M. Begel
  • Publication number: 20100138223
    Abstract: An object of the present invention is to allow classification of sequentially input speech signals with good accuracy, based on similarity of speakers and environments, using a realistic memory footprint, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates the probability that the latest of the speech signals (speech data) belongs to each cluster, based on a generative model, which is a probability model. A parameter updating means 107 successively estimates the parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (see FIG. 1).
    Type: Application
    Filed: March 13, 2008
    Publication date: June 3, 2010
    Inventor: Takafumi Koshinaka
  • Publication number: 20100106498
    Abstract: Disclosed herein are systems, methods, and computer-readable media for targeted advertising, the method including receiving an audio stream containing user speech from a first device, generating text based on the speech contained in the audio stream, identifying at least one key phrase in the text, receiving from an advertiser an advertisement related to the identified at least one key phrase, and displaying the advertisement. In one aspect, the method further includes receiving from an advertiser a set of rules associated with the received advertisement and displaying the advertisement in accordance with the associated set of rules. The first device can be a converged voice and data communications device connected to a network. The communications device can generate text based on the speech. In one aspect, the method displays the advertisement on one or both of a converged voice and data communications device and a second communications device. A central server can generate text based on the speech.
    Type: Application
    Filed: October 24, 2008
    Publication date: April 29, 2010
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Patrick Jason MORRISON
  • Publication number: 20100094626
    Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword in speech, and a speech recognition system. The method includes the steps of: extracting feature parameters from the frames constituting the recognition target speech to form a feature parameter vector sequence that represents the speech; normalizing the feature parameter vector sequence with a codebook containing a plurality of codebook vectors to obtain a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces against the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook, which makes resampling unnecessary when performing linear movement matching of speech wave frames having similar phonological feature structures.
    Type: Application
    Filed: September 27, 2007
    Publication date: April 15, 2010
    Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
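A toy sketch of the trace-based keyword location idea: frames are quantized against a shared codebook to form a trace, and a keyword template trace drawn from the same codebook is matched against it. Euclidean quantization, Hamming-distance matching, and all names here are illustrative assumptions, not the patented procedure.

```python
def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector (Euclidean)."""
    def nearest(f):
        return min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(f, codebook[i])))
    return [nearest(f) for f in frames]

def locate_keyword(trace, template):
    """Return the start index where the template trace best matches the feature trace."""
    best_pos, best_dist = -1, float("inf")
    for start in range(len(trace) - len(template) + 1):
        window = trace[start:start + len(template)]
        dist = sum(a != b for a, b in zip(window, template))
        if dist < best_dist:
            best_pos, best_dist = start, dist
    return best_pos

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
utterance = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9), (0.0, 0.1), (1.1, -0.1)]
trace = quantize(utterance, codebook)
template = [1, 2]                      # keyword trace drawn with the same codebook
pos = locate_keyword(trace, template)  # keyword starts at frame 1
```

Because both traces use the same codebook indices, matching reduces to comparing index sequences, which is the property the abstract relies on to avoid resampling.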
  • Publication number: 20100057452
    Abstract: The described implementations relate to speech interfaces and in some instances to speech pattern recognition techniques that enable speech interfaces. One system includes a feature pipeline configured to produce speech feature vectors from input speech. This system also includes a classifier pipeline configured to classify individual speech feature vectors utilizing multi-level classification.
    Type: Application
    Filed: August 28, 2008
    Publication date: March 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Kunal Mukerjee, Brendan Meeder
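Multi-level classification of speech feature vectors, as described in this abstract, can be sketched as a coarse-to-fine pipeline: a first-level classifier separates silence from speech, and a second level refines speech frames. The energy threshold and the two toy levels are hypothetical stand-ins for the patented classifier pipeline.

```python
def coarse_classify(vec):
    """Level 1: silence vs. speech by frame energy (hypothetical threshold)."""
    energy = sum(v * v for v in vec)
    return "speech" if energy > 0.5 else "silence"

def fine_classify(vec):
    """Level 2: vowel-like vs. fricative-like by low/high-band balance (toy rule)."""
    low, high = vec[0], vec[-1]
    return "vowel" if low >= high else "fricative"

def classify(vec):
    """Run the multi-level pipeline on one feature vector."""
    label = coarse_classify(vec)
    if label == "silence":
        return ("silence",)
    return ("speech", fine_classify(vec))

result = classify([0.9, 0.2, 0.1])
```

The advantage of the multi-level structure is that the cheap first level filters out frames the expensive second level never needs to see.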
  • Publication number: 20100023331
    Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criterion, the semantic classification system is automatically retrained based on the adaptation training data.
    Type: Application
    Filed: July 15, 2009
    Publication date: January 28, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Nicolae Duta, Réal Tremblay, Andy Mauro, Douglas Peters
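The adaptation criterion in this abstract might, for example, require a minimum number of manually labeled utterances per semantic class before retraining is triggered. The sketch below assumes exactly that rule; the criterion, names, and threshold are illustrative, not the patented condition.

```python
def ready_to_retrain(adaptation_data, classes, min_per_class=2):
    """True when every semantic class has at least min_per_class labeled utterances."""
    counts = {c: 0 for c in classes}
    for _utterance, label in adaptation_data:
        if label in counts:
            counts[label] += 1
    return all(n >= min_per_class for n in counts.values())

classes = ["billing", "support"]
data = [("pay my bill", "billing"),
        ("bill question", "billing"),
        ("it is broken", "support")]
ok = ready_to_retrain(data, classes)   # "support" has only one labeled example
```

A per-class minimum prevents retraining on data that would leave rare call-routing destinations underrepresented.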
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. CONKIE, Yeon-Jun KIM
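Spectral boundary correction of the kind mentioned in this abstract can be illustrated by snapping a phone boundary to the nearby frame with the largest spectral change. The distance measure, window size, and names below are assumptions for illustration, not the patented correction step.

```python
def spectral_change(frames, i):
    """Euclidean distance between adjacent feature frames (a toy spectral measure)."""
    return sum((a - b) ** 2 for a, b in zip(frames[i - 1], frames[i])) ** 0.5

def correct_boundary(frames, boundary, window=2):
    """Snap a phone boundary to the nearby frame with the largest spectral change."""
    lo = max(1, boundary - window)
    hi = min(len(frames) - 1, boundary + window + 1)
    return max(range(lo, hi), key=lambda i: spectral_change(frames, i))

# One-dimensional feature frames: a flat region, then an abrupt spectral shift.
frames = [(0.0,), (0.1,), (0.1,), (0.9,), (1.0,), (1.0,)]
new_boundary = correct_boundary(frames, boundary=2)   # moves to the sharp change
```

In a full system the initial boundary would come from HMM forced alignment, and the correction only searches a small window so a bad alignment cannot drag the boundary far.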
  • Publication number: 20090313008
    Abstract: An information apparatus for use in a mobile unit, mounted on the mobile unit, includes at least a broadcast receiver 11 and 13 for receiving a broadcasting signal containing a broadcasting station name; a recognition dictionary 30 for registering the broadcasting station name; and a voice recognition section 27 for performing voice recognition of a voice input indicating the broadcasting station name. Referring to the dictionary, the apparatus tunes to the broadcasting station associated with the broadcasting station name corresponding to the voice recognition result.
    Type: Application
    Filed: April 4, 2006
    Publication date: December 17, 2009
    Inventors: Reiko Okada, Tadashi Suzuki, Yuzo Maruta
  • Publication number: 20090271196
    Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a frame of the signal representing speech. The frame can be classified as unvoiced or voiced based on occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. In response to classifying the frame as voiced, the frame can be processed.
    Type: Application
    Filed: October 23, 2008
    Publication date: October 29, 2009
    Applicant: Red Shift Company, LLC
    Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
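Classifying a frame as voiced based on event occurrence can be sketched by counting glottal-pulse-like peaks within the frame. The peak detector, thresholds, and names below are hypothetical simplifications of the patented event detection.

```python
def count_pulses(frame, threshold=0.5):
    """Count local maxima above a threshold -- a stand-in for glottal pulse events."""
    pulses = 0
    for i in range(1, len(frame) - 1):
        if frame[i] > threshold and frame[i] > frame[i - 1] and frame[i] >= frame[i + 1]:
            pulses += 1
    return pulses

def classify_frame(frame, min_pulses=2):
    """Label a frame voiced when enough pulse events occur within it."""
    return "voiced" if count_pulses(frame) >= min_pulses else "unvoiced"

voiced_frame = [0.0, 0.9, 0.1, 0.0, 0.8, 0.1, 0.0, 0.9, 0.0]   # periodic pulses
noise_frame = [0.1, 0.2, 0.1, 0.15, 0.1, 0.2, 0.1]             # no strong events
```

Gating further processing on the voiced label, as the abstract describes, means the expensive analysis only runs on frames that actually contain pitch events.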
  • Publication number: 20090265163
    Abstract: Methods and systems to exchange and display data among a plurality of devices in response to one or more of user input and context-based information. User input may include one or more of motion, speech, text, pointing, and touch-selecting. Context-based information may include one or more of user location, which may be relative to one or more devices, background audio, information related to one or more products and/or services, and user-based context information. User context-based information may correspond to one or more of prior transactions, prior activities, prior content exposure, and demographic information. Also disclosed herein are methods and systems to correlate user speech to one or more of commands and data objects, with respect to context-based information. Methods and systems to recognize speech may be implemented in combination with methods and systems to exchange and/or display data among a plurality of devices, and in other environments.
    Type: Application
    Filed: February 12, 2009
    Publication date: October 22, 2009
    Applicant: Phone Through, Inc.
    Inventors: Lehmann Li, Donald Addiss
  • Publication number: 20090240498
    Abstract: Systems and methods to perform short text segment similarity measures. Illustratively, a short text segment similarity environment comprises a short text engine operative to process data representative of short segments of text and an instruction set comprising at least one instruction to instruct the short text engine to process data representative of short text segment inputs according to a selected short text similarity identification paradigm. Illustratively, two or more short text segments can be received as input by the short text engine, along with a request to identify similarities among the two or more short text segments. Responsive to the request and data input, the short text engine executes a selected similarity identification technique in accordance with the short text similarity identification paradigm to process the received data and to identify similarities between the short text segment inputs.
    Type: Application
    Filed: March 19, 2008
    Publication date: September 24, 2009
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Alexei V. Bocharov, Christopher A. Meek
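One common short-text similarity paradigm such an engine might select is cosine similarity over word-count vectors; the sketch below assumes that technique for illustration and is not the patented method.

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Cosine similarity between two short text segments over word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

score = cosine_similarity("book a cheap flight", "cheap flight deals")
```

Because the segments are short, exact-word overlap is sparse; richer paradigms (as the abstract's selectable-paradigm design allows) would add stemming or semantic features on top of this baseline.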
  • Publication number: 20090204399
    Abstract: Necessary portions of stored speech data representing conference content are summarized and reproduced in a predetermined time. Conference speech is summarized and reproduced using a speech data summarizing and reproducing apparatus comprising a speech data divider for dividing and structuring conference speech data into several utterance unit data based on utterers, distributed documents, the occurrence frequency of words in speech recognition results, and pauses, an importance level calculator for determining important utterance unit data based on the occurrence frequency of keywords, the information of utterers, and data specified by the user, a summarizer for extracting important utterance unit data and summarizing them within a specified time, and a speech data reproducer for reproducing the summarized speech data in chronological order or an order of importance levels with auxiliary information added thereto.
    Type: Application
    Filed: May 7, 2007
    Publication date: August 13, 2009
    Applicant: NEC CORPORATION
    Inventor: Susumu Akamine
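The importance-level calculation and time-bounded summarization described above can be sketched as scoring utterance units by keyword frequency, greedily selecting high scorers within a time budget, and replaying them in chronological order. All names and the scoring rule are illustrative assumptions, not the patented apparatus.

```python
def summarize(utterances, keywords, time_budget):
    """Pick the highest-scoring utterance units that fit the budget; replay in order."""
    scored = []
    for idx, (text, duration) in enumerate(utterances):
        score = sum(text.lower().count(k) for k in keywords)
        scored.append((score, idx, text, duration))
    scored.sort(key=lambda t: -t[0])           # most important first
    chosen, used = [], 0.0
    for score, idx, text, duration in scored:
        if score > 0 and used + duration <= time_budget:
            chosen.append((idx, text))
            used += duration
    chosen.sort()                              # restore chronological order
    return [text for _idx, text in chosen]

utterances = [("we agreed on the budget", 4.0),
              ("small talk about lunch", 3.0),
              ("budget deadline is friday", 5.0)]
summary = summarize(utterances, ["budget", "deadline"], time_budget=9.0)
```

The two-phase structure (select by importance, then re-sort by time) mirrors the abstract's option to reproduce the summary either chronologically or by importance level.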
  • Publication number: 20090171661
    Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
    Type: Application
    Filed: June 27, 2008
    Publication date: July 2, 2009
    Applicant: International Business Machines Corporation
    Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
  • Publication number: 20090150155
    Abstract: The present invention aims at extracting a keyword from conversation without advance preparation, i.e., without anticipating conversation keywords beforehand.
    Type: Application
    Filed: March 14, 2008
    Publication date: June 11, 2009
    Applicant: PANASONIC CORPORATION
    Inventors: Mitsuru Endo, Maki Yamada, Keiko Morii, Tomohiro Konuma, Kazuya Nomura
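Extracting keywords without pre-registered candidates could, for instance, rank non-stopword terms by frequency across the conversation. The stopword list, names, and ranking rule below are illustrative assumptions, not the patented extraction.

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "we", "i", "it", "should", "too"}

def extract_keywords(utterances, top_n=2):
    """Rank non-stopword terms by frequency across the conversation (toy extractor)."""
    counts = Counter()
    for utterance in utterances:
        counts.update(w for w in utterance.lower().split() if w not in STOPWORDS)
    return [word for word, _count in counts.most_common(top_n)]

conversation = ["we should book the hotel",
                "the hotel near the station",
                "book it and the station too"]
keywords = extract_keywords(conversation)
```

Because candidates emerge from the conversation itself, no keyword list needs to be prepared ahead of time, which is the point the abstract emphasizes.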
  • Publication number: 20090119105
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during the training process of the acoustic models. When the present invention for adapting acoustic models and a conventional acoustic model adaptation scheme are combined, more-enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Application
    Filed: March 30, 2007
    Publication date: May 7, 2009
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Publication number: 20090112593
    Abstract: A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user.
    Type: Application
    Filed: October 24, 2008
    Publication date: April 30, 2009
    Applicant: Harman Becker Automotive Systems GmbH
    Inventors: Lars Konig, Andreas Low, Udo Haiber
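Turning a recognition hypothesis into a database query might look like the following sketch: keep only in-vocabulary recognized words as the hypothesis and join them into a query string. The AND-joined query format and all names are assumptions for illustration, not the patented system.

```python
def build_query(recognized_words, vocabulary):
    """Keep only in-vocabulary words as the search hypothesis, then form a query."""
    hypothesis = [w for w in recognized_words if w in vocabulary]
    return " AND ".join(hypothesis)

vocabulary = {"jazz", "radio", "station"}
query = build_query(["play", "jazz", "radio", "please"], vocabulary)
```

Restricting the hypothesis to vocabulary words keeps recognition noise (filler words, misrecognitions) out of the database search.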