Speech Classification Or Search (EPO) Patents (Class 704/E15.014)
  • Patent number: 11847423
    Abstract: To prevent intent classifiers from potentially choosing intents that are ineligible for the current input due to policies, dynamic intent classification systems and methods are provided that dynamically control the possible set of intents using environment variables (also referred to as external variables). Associations between environment variables and ineligible intents, referred to as culling rules, are used.
    Type: Grant
    Filed: December 27, 2022
    Date of Patent: December 19, 2023
    Assignee: Verint Americas Inc.
    Inventor: Ian Roy Beaver
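The culling-rule mechanism described in the abstract above can be sketched roughly as follows. This is a hypothetical illustration, not the patent's implementation; the rule table, variable names, and intent labels are invented for the example.

```python
# Hypothetical "culling rules": each rule maps an (environment variable, value)
# pair to the set of intents that become ineligible when it holds.
CULLING_RULES = {
    ("store_open", False): {"schedule_pickup"},
    ("user_authenticated", False): {"check_balance", "transfer_funds"},
}

def eligible_intents(all_intents, env):
    """Remove intents culled by any rule matching the current environment,
    so the classifier can only choose from the remaining set."""
    ineligible = set()
    for (var, value), intents in CULLING_RULES.items():
        if env.get(var) == value:
            ineligible |= intents
    return [i for i in all_intents if i not in ineligible]
```

The classifier's candidate set is thus constrained dynamically per request, before classification, rather than by retraining the model.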
  • Patent number: 11763092
    Abstract: The present disclosure relates to techniques for identifying out-of-domain utterances.
    Type: Grant
    Filed: March 30, 2021
    Date of Patent: September 19, 2023
    Assignee: Oracle International Corporation
    Inventors: Thanh Long Duong, Mark Edward Johnson, Vishal Vishnoi, Crystal C. Pan, Vladislav Blinov, Cong Duy Vu Hoang, Elias Luqman Jalaluddin, Duy Vu, Balakota Srinivas Vinnakota
  • Patent number: 11735164
    Abstract: A system, article, and method of automatic speech recognition with highly efficient decoding is accomplished by frequent beam width adjustment.
    Type: Grant
    Filed: August 9, 2021
    Date of Patent: August 22, 2023
    Assignee: Intel Corporation
    Inventors: Piotr Rozen, Joachim Hofer
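The frequent beam-width adjustment described above can be sketched as a per-frame feedback loop: prune hypotheses against the current beam, then widen or narrow the beam so the surviving count tracks a target. All numbers and names here are illustrative assumptions, not taken from the patent.

```python
def prune(hypotheses, beam):
    """Keep (label, score) hypotheses whose score is within `beam` of the best."""
    best = max(score for _, score in hypotheses)
    return [h for h in hypotheses if h[1] >= best - beam]

def adjust_beam(beam, n_active, target=100, step=0.5, min_beam=1.0, max_beam=20.0):
    """Narrow the beam when too many hypotheses survive, widen it when too few,
    keeping decoding cost roughly constant across frames."""
    if n_active > target:
        beam = max(min_beam, beam - step)
    elif n_active < target:
        beam = min(max_beam, beam + step)
    return beam
```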
  • Patent number: 11568175
    Abstract: To prevent intent classifiers from potentially choosing intents that are ineligible for the current input due to policies, dynamic intent classification systems and methods are provided that dynamically control the possible set of intents using environment variables (also referred to as external variables). Associations between environment variables and ineligible intents, referred to as culling rules, are used.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: January 31, 2023
    Assignee: Verint Americas Inc.
    Inventor: Ian Roy Beaver
  • Publication number: 20120323574
    Abstract: Event audio data that is based on verbal utterances associated with a medical event associated with a patient is received. A list of a plurality of candidate text strings that match interpretations of the event audio data is obtained, based on information included in a medical speech repository, information included in a speech accent repository, and a matching function. A selection of at least one of the candidate text strings included in the list is obtained. A population of at least one field of an electronic medical form is initiated, based on the obtained selection.
    Type: Application
    Filed: June 17, 2011
    Publication date: December 20, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Tao Wang, Bin Zhou
  • Publication number: 20120239402
    Abstract: A speech recognition device includes, a speech recognition section that conducts a search, by speech recognition, on audio data stored in a first memory section to extract word-spoken portions where plural words transferred are each spoken and, of the word-spoken portions extracted, rejects the word-spoken portion for the word designated as a rejecting object; an acquisition section that obtains a derived word of a designated search target word, the derived word being generated in accordance with a derived word generation rule stored in a second memory section or read out from the second memory section; a transfer section that transfers the derived word and the search target word to the speech recognition section, the derived word being set as an outputting object or a rejecting object by the acquisition section; and an output section that outputs the word-spoken portions extracted and not rejected in the search.
    Type: Application
    Filed: February 1, 2012
    Publication date: September 20, 2012
    Applicant: Fujitsu Limited
    Inventors: Nobuyuki WASHIO, Shouji HARADA
  • Publication number: 20120197642
    Abstract: Embodiments of the present invention relate to a signal identifying method, including: obtaining signal characteristics of a current frame of input signals; deciding, according to the signal characteristics of the current frame and updated signal characteristics of a background signal frame before the current frame, whether the current frame is a background signal frame; detecting whether the current frame serving as a background signal frame is in a first type signal state; and adjusting a signal classification decision threshold according to whether the current frame serving as a background signal frame is in the first type signal state to enhance the speech signal identification capability.
    Type: Application
    Filed: April 12, 2012
    Publication date: August 2, 2012
    Applicant: Huawei Technologies Co., Ltd.
    Inventors: Yuanyuan Liu, Zhe Wang, Eyal Shlomot
  • Publication number: 20120166195
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Application
    Filed: October 5, 2011
    Publication date: June 28, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Shoji HAYAKAWA, Naoshi Matsuo
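The two-model likelihood comparison in the state detection abstract above amounts to scoring the input under each speaker model and taking the larger total. A minimal sketch, assuming each model is a single 1-D Gaussian over a scalar feature (a deliberate simplification of real speaker models):

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar x under a 1-D Gaussian (mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify_state(features, normal_model, depressed_model):
    """Sum per-frame log-likelihoods under each specific-speaker model;
    the model with the larger total determines the state."""
    l1 = sum(gaussian_loglik(f, *normal_model) for f in features)
    l2 = sum(gaussian_loglik(f, *depressed_model) for f in features)
    return "normal" if l1 >= l2 else "depressed"
```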
  • Publication number: 20120143609
    Abstract: An approach for providing speech recognition is disclosed. A name is retrieved for a user based on data provided by the user. The user is prompted for a name of the user. A first audio input is received from the user in response to the prompt. Speech recognition is applied to the first audio input using a name grammar database to output a recognized name. A determination is made whether the recognized name matches the retrieved name. If no match is determined, the user is re-prompted for the name of the user for a second audio input. Speech recognition is applied to the second audio input using a confidence database having fewer entries than the name grammar database.
    Type: Application
    Filed: November 30, 2011
    Publication date: June 7, 2012
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: David Sannerud
  • Publication number: 20120072217
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
    Type: Application
    Filed: September 17, 2010
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Junlan Feng, Michael Johnston, Taniya Mishra
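The lattice reweighting step above can be sketched as re-ranking recognition paths after boosting each path's score by the prosodic salience of its words. The path/salience representation here is a hypothetical simplification (a real word lattice is a graph, not a list of paths):

```python
def reweight(paths, salience):
    """paths: list of (word_sequence, asr_score). Add the summed prosodic
    salience of each path's words to its score, then re-rank best-first."""
    def boosted(p):
        words, score = p
        return score + sum(salience.get(w, 0.0) for w in words)
    return sorted(paths, key=boosted, reverse=True)
```

A prosodically salient word (e.g. one the speaker stressed) can thus promote an otherwise lower-scoring hypothesis before the query is answered.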
  • Publication number: 20120072216
    Abstract: A method and device are configured to receive voice data from a user and perform speech recognition on the received voice data. A confidence score is calculated that represents the likelihood that received voice data has been accurately recognized. A likely age range is determined associated with the user based on the confidence score.
    Type: Application
    Filed: November 30, 2011
    Publication date: March 22, 2012
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: Kevin R. Witzman
  • Publication number: 20120065976
    Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units with connecting weights between layers trained by a pretraining step followed by a fine-tuning step.
    Type: Application
    Filed: September 15, 2010
    Publication date: March 15, 2012
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, George Edward Dahl
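In hybrid systems of the kind described above, the deep network outputs posterior probabilities of context-dependent units, which are converted to the scaled likelihoods an HMM decoder expects by dividing out the state priors. A minimal sketch of that standard conversion (the softmax stands in for the network's output layer; no actual network is trained here):

```python
import math

def softmax(logits):
    """Numerically stable softmax: posteriors over HMM states."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_likelihoods(posteriors, priors):
    """Standard hybrid trick: p(obs|state) ∝ p(state|obs) / p(state)."""
    return [p / q for p, q in zip(posteriors, priors)]
```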
  • Publication number: 20120046946
    Abstract: A system and method for merging audio data streams receive audio data streams from separate inputs, independently transform each data stream from the time to the frequency domain, and generate separate feature data sets for the transformed data streams. Feature data from each of the separate feature data sets is selected to form a merged feature data set that is output to a decoder for recognition purposes. The separate inputs can include an ear microphone and a mouth microphone.
    Type: Application
    Filed: August 20, 2010
    Publication date: February 23, 2012
    Applicant: ADACEL SYSTEMS, INC.
    Inventor: Chang-Qing Shu
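The merging step above selects feature data from each independently computed stream to form one vector per frame. A toy sketch, assuming (hypothetically) that the low-order dimensions come from the ear microphone and the rest from the mouth microphone:

```python
def merge_features(ear_feats, mouth_feats, ear_dims):
    """Frame by frame, take the first `ear_dims` components from the ear
    stream and the remaining components from the mouth stream."""
    merged = []
    for e, m in zip(ear_feats, mouth_feats):
        merged.append(e[:ear_dims] + m[ear_dims:])
    return merged
```

The merged vectors are what the decoder consumes; which dimensions to take from which stream would be a tuning decision, not fixed as here.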
  • Publication number: 20110307252
    Abstract: Described is the use of utterance classification based methods and other machine learning techniques to provide a telephony application or other voice menu application (e.g., an automotive application) that need not use Context-Free-Grammars to determine a user's spoken intent. A classifier receives text from an information retrieval-based speech recognizer and outputs a semantic label corresponding to the likely intent of a user's speech. The semantic label is then output, such as for use by a voice menu program in branching between menus. Also described is training, including training the language model from acoustic data without transcriptions, and training the classifier from speech-recognized acoustic data having associated semantic labels.
    Type: Application
    Filed: June 15, 2010
    Publication date: December 15, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Yun-Cheng Ju, James Garnet Droppo, III
  • Publication number: 20110307247
    Abstract: A method and a system for lexical navigation of a corpus of items are provided. For example, the method may include generating a data structure in a non-transitory, computer readable medium. The data structure may include a number of items, a number of keywords, and a frequency that each of the keywords is associated with each of the items. The method may further include generating a top-level lexical cloud that includes a subset of the keywords. Each keyword in the subset may be associated with a size that is proportional to its frequency of occurrence. Finally, the method may include generating a plurality of lower-level lexical clouds by eliminating any one of the plurality of items not associated with a particular one of the keywords from the data structure, and generating the lower-level lexical cloud as a second subset of the plurality of keywords that remain in the data structure.
    Type: Application
    Filed: June 14, 2010
    Publication date: December 15, 2011
    Inventor: Nathan Moroney
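The cloud-and-drill-down procedure above is straightforward to sketch: count keyword frequencies across items for the top-level cloud, then filter items by a selected keyword before recounting. The data shapes here are assumptions for illustration:

```python
from collections import Counter

def top_level_cloud(item_keywords, k=3):
    """item_keywords: {item: [keywords]}. Return the k most frequent keywords
    with counts (display size would be proportional to the count)."""
    freq = Counter(kw for kws in item_keywords.values() for kw in kws)
    return dict(freq.most_common(k))

def drill_down(item_keywords, keyword):
    """Eliminate items not associated with `keyword`; the lower-level cloud
    is then rebuilt from the keywords of the remaining items."""
    return {i: kws for i, kws in item_keywords.items() if keyword in kws}
```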
  • Publication number: 20110295605
    Abstract: This speech recognition system provides a function that is capable of adjusting memory usage according to different target resources. It extracts a sequence of feature vectors from the input speech signal. A module for constructing the search space reads a text file and generates a word-level search space in an off-line phase. After removing redundancy, the word-level search space is expanded to a phone-level one and is represented by a tree structure. This may be performed by combining the information from a dictionary which gives the mapping from a word to its phonetic sequence(s). In the online phase, a decoder traverses the search space, takes the dictionary and at least one acoustic model as input, computes scores of feature vectors and outputs the decoding result.
    Type: Application
    Filed: December 28, 2010
    Publication date: December 1, 2011
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventor: Shiuan-Sung LIN
  • Publication number: 20110213612
    Abstract: A system classifies the source of an input signal. The system determines whether a sound source belongs to classes that may include human speech, musical instruments, machine noise, or other classes of sound sources. The system is robust, performing classification despite variation in sound level and noise masking. Additionally, the system consumes relatively few computational resources and adapts over time to provide consistently accurate classification.
    Type: Application
    Filed: May 11, 2011
    Publication date: September 1, 2011
    Inventor: Pierre Zakarauskas
  • Publication number: 20110202337
    Abstract: For classifying different segments of a signal which has segments of at least a first type and second type, e.g. audio and speech segments, the signal is short-term classified on the basis of the at least one short-term feature extracted from the signal and a short-term classification result is delivered. The signal is also long-term classified on the basis of the at least one short-term feature and at least one long-term feature extracted from the signal and a long-term classification result is delivered. The short-term classification result and the long-term classification result are combined to provide an output signal indicating whether a segment of the signal is of the first type or of the second type.
    Type: Application
    Filed: January 11, 2011
    Publication date: August 18, 2011
    Inventors: Guillaume Fuchs, Stefan Bayer, Jens Hirschfeld, Juergen Herre, Jeremie Lecomte, Frederik Nagel, Nikolaus Rettelbach, Stefan Wabnik, Yoshikazu Yokotani
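The combination step above fuses a short-term and a long-term classification result into one decision. A minimal sketch, assuming (hypothetically) that each classifier emits a score in [0, 1] toward the first segment type and the combiner is a weighted average with a fixed threshold:

```python
def combine(short_term, long_term, weight=0.5, threshold=0.5):
    """Weighted fusion of short-term and long-term classifier scores;
    labels the segment as the first type ("speech") or second ("music")."""
    score = weight * short_term + (1 - weight) * long_term
    return "speech" if score >= threshold else "music"
```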
  • Publication number: 20110153646
    Abstract: A system and method for triaging of information feeds is provided. A plurality of information feeds are received. At least one topic is identified from each information feed. At least one topic is presented to a user in topic facet including a plurality of identified topics. A selection of one of the plurality of topics is received from the user. The user interface is updated to display only the feeds that contain the selected topic.
    Type: Application
    Filed: December 23, 2009
    Publication date: June 23, 2011
    Applicant: Palo Alto Research Center Incorporated
    Inventors: Lichan Hong, Gregorio Covertino, Bongwon Suh, Ed H. Chi
  • Publication number: 20110153327
    Abstract: According to embodiments of the present disclosure, a matching module is configured to accurately match a probe identity of an entity to a collection of entities. The matching module is configured to match the probe identity of the entity to the collection of entities based on a combination of phonetic matching processes and edit distance processes. The matching module is configured to create phonetic groups for name parts of identities in the collection. The matching module is configured to compare probe name parts of the probe identity to the name parts associated with the phonetic groups.
    Type: Application
    Filed: February 22, 2008
    Publication date: June 23, 2011
    Inventor: Anthony S. Iasso
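The combination of phonetic grouping and edit-distance comparison described above can be sketched with a deliberately crude phonetic key (first letter plus consonant skeleton, standing in for a real algorithm such as Soundex) and the standard Levenshtein distance:

```python
def phonetic_key(name):
    """Crude phonetic key: first letter plus remaining consonants.
    A stand-in for a real phonetic algorithm, used only for grouping."""
    name = name.lower()
    return name[0] + "".join(c for c in name[1:] if c not in "aeiouy")

def edit_distance(a, b):
    """Levenshtein distance with a single rolling row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def best_matches(probe, names, k=2):
    """Compare the probe only against names in its phonetic group,
    ranked by edit distance."""
    group = [n for n in names if phonetic_key(n) == phonetic_key(probe)]
    return sorted(group, key=lambda n: edit_distance(probe.lower(), n.lower()))[:k]
```

Grouping first keeps the expensive pairwise edit-distance comparison confined to plausible candidates.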
  • Publication number: 20110144986
    Abstract: Described is a calibration model for use in a speech recognition system. The calibration model adjusts the confidence scores output by a speech recognition engine to thereby provide an improved calibrated confidence score for use by an application. The calibration model is one that has been trained for a specific usage scenario, e.g., for that application, based upon a calibration training set obtained from a previous similar/corresponding usage scenario or scenarios. Different calibration models may be used with different usage scenarios, e.g., during different conditions. The calibration model may comprise a maximum entropy classifier with distribution constraints, trained with continuous raw confidence scores and multi-valued word tokens, and/or other distributions and extracted features.
    Type: Application
    Filed: December 10, 2009
    Publication date: June 16, 2011
    Applicant: Microsoft Corporation
    Inventors: Dong Yu, Li Deng, Jinyu Li
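A common minimal form of confidence calibration is a learned sigmoid mapping from raw to calibrated scores (the patent's model is a maximum entropy classifier with distribution constraints; the sketch below is the simpler Platt-scaling flavor of the same idea, with made-up coefficients):

```python
import math

def calibrate(raw_conf, a=4.0, b=-2.0):
    """Map a raw engine confidence through sigmoid(a*x + b). The coefficients
    a, b would be fit on a calibration set from the target usage scenario;
    the values here are illustrative only."""
    return 1.0 / (1.0 + math.exp(-(a * raw_conf + b)))
```

Per-scenario fitting is the key point: the same raw score from the engine can merit different calibrated confidences under different conditions.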
  • Publication number: 20110122137
    Abstract: A video summarization method based on mining the story structure and semantic relations among concept entities has steps of: processing a video to generate multiple important shots that are annotated with respective keywords; performing a concept expansion process by using the keywords to create expansion trees for the annotated shots; rearranging the keywords of the expansion trees and classifying them to calculate their relations; and applying a graph entropy algorithm to determine significant shots and the edges interconnecting them. Based on the result of the graph entropy algorithm, a structured relational graph is built to display the significant shots and their edges. Consequently, users can more rapidly browse the content of a video and comprehend whether different shots are related.
    Type: Application
    Filed: November 23, 2009
    Publication date: May 26, 2011
    Applicant: NATIONAL CHENG KUNG UNIVERSITY
    Inventors: Jhing-Fa WANG, Bo-Wei CHEN, Jia-Ching WANG, Chia-Hung CHANG
  • Publication number: 20110119057
    Abstract: Disclosed are systems, methods, and computer-program products for segmenting content of an input signal and applications thereof. In an embodiment, the system includes simulated neurons, a phase modulator, and an entity-identifier module. Each simulated neuron is connected to one or more other simulated neurons and is associated with an activity and a phase. The activity and the phase of each simulated neuron is set based on the activity and the phase of the one or more other simulated neurons connected to each simulated neuron. The phase modulator includes individual modulators, each configured to modulate the activity and the phase of each of the plurality of simulated neurons based on a modulation function. The entity-identifier module is configured to identify one or more distinct entities (e.g., objects, sound sources, etc.) included in the input signal based on the one or more distinct collections of simulated neurons that have substantially distinct phases.
    Type: Application
    Filed: November 18, 2009
    Publication date: May 19, 2011
    Applicant: The Intellisis Corporation
    Inventors: Douglas A. Moore, Kristi H. Tsukida, Paulo B. Ang
  • Publication number: 20110082695
    Abstract: An electronic device includes a call analysis module that is configured to analyze characteristics of a phone call and to generate an indicium that represents a prevailing mood associated with the phone call based on the analyzed characteristics.
    Type: Application
    Filed: October 2, 2009
    Publication date: April 7, 2011
    Inventor: Emil Morgan Billing Bengt
  • Publication number: 20110071826
    Abstract: A method and apparatus for ordering results from a query is provided herein. During operation, a spoken query is received and converted to a textual representation, such as a word lattice. Search strings are then created from the word lattice. For example a set search strings may be created from the N-grams, such as unigrams and bigrams, of the word lattice. The search strings may be ordered and truncated based on confidence values assigned to the n-grams by the speech recognition system. The set of search strings are sent to at least one search engine, and search results are obtained. The search results are then re-arranged or reordered based on a semantic similarity between the search results and the word lattice.
    Type: Application
    Filed: September 23, 2009
    Publication date: March 24, 2011
    Applicant: MOTOROLA, INC.
    Inventors: Changxue Ma, Harry M. Bliss
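The search-string construction above (unigrams and bigrams from the lattice, ordered and truncated by confidence) can be sketched over a single best path; handling a full word lattice would generalize the same idea:

```python
def search_strings(words, confidences, max_strings=5):
    """Build unigram and bigram search strings from a recognized word
    sequence, ordered by mean per-word confidence and truncated."""
    cands = []
    for n in (1, 2):
        for i in range(len(words) - n + 1):
            conf = sum(confidences[i:i + n]) / n
            cands.append((" ".join(words[i:i + n]), conf))
    cands.sort(key=lambda t: -t[1])
    return [s for s, _ in cands[:max_strings]]
```

The resulting strings would be sent to the search engine, and the returned results re-ranked by semantic similarity to the lattice, as the abstract describes.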
  • Publication number: 20110064302
    Abstract: A method is disclosed for recognition of high-dimensional data in the presence of occlusion, including: receiving a target data that includes an occlusion and is of an unknown class, wherein the target data includes a known object; sampling a plurality of training data files comprising a plurality of distinct classes of the same object as that of the target data; and identifying the class of the target data through linear superposition of the sampled training data files using l1 minimization, wherein a linear superposition with a sparsest number of coefficients is used to identify the class of the target data.
    Type: Application
    Filed: January 29, 2009
    Publication date: March 17, 2011
    Inventors: Yi Ma, Allen Yang Yang, John Norbert Wright, Andrew William Wagner
  • Publication number: 20110066434
    Abstract: The invention can recognize all languages and input words. It needs m unknown voices to represent m categories of known words with similar pronunciations. Words can be pronounced in any language, dialect or accent. Each will be classified into one of the m categories represented by its most similar unknown voice. When a user pronounces a word, the invention finds its F most similar unknown voices. All words in the F categories represented by those F unknown voices are arranged according to their pronunciation similarity and alphabetic letters. The pronounced word should be among the top words. Since only the F most similar unknown voices are found among m (=500) unknown voices, and since the same word can be classified into several categories, the recognition method is stable for all users and can quickly and accurately recognize all languages (English, Chinese, etc.) and accept many more input words without using samples.
    Type: Application
    Filed: September 29, 2009
    Publication date: March 17, 2011
    Inventors: Tze-Fen LI, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Publication number: 20110066425
    Abstract: Systems, methods, and apparatus provide clinical terminology services including a controlled medical vocabulary supplemented by local clinical content.
    Type: Application
    Filed: September 17, 2009
    Publication date: March 17, 2011
    Inventors: Darren S. Hudgins, Thomas A. Oniki
  • Publication number: 20110046951
    Abstract: A system and a method to generate statistical utterance classifiers optimized for the individual states of a spoken dialog system is disclosed. The system and method make use of large databases of transcribed and annotated utterances from calls collected in a dialog system in production and log data reporting the association between the state of the system at the moment when the utterances were recorded and the utterance. From the system state, being a vector of multiple system variables, subsets of these variables, certain variable ranges, quantized variable values, etc. can be extracted to produce a multitude of distinct utterance subsets matching every possible system state. For each of these subset and variable combinations, statistical classifiers can be trained, tuned, and tested, and the classifiers can be stored together with the performance results and the state subset and variable combination.
    Type: Application
    Filed: August 21, 2009
    Publication date: February 24, 2011
    Inventors: David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini
  • Publication number: 20100332226
    Abstract: A mobile terminal and controlling method thereof are disclosed, by which a specific content and another content associated with the specific content can be quickly searched using a user's voice. The present invention includes inputting a voice for a search for a specific content provided to the mobile terminal via a microphone, analyzing a meaning of the inputted voice, searching a memory for at least one content to which a voice name having a meaning associated with the analyzed voice is tagged, and displaying the searched at least one content.
    Type: Application
    Filed: June 30, 2010
    Publication date: December 30, 2010
    Applicant: LG ELECTRONICS INC.
    Inventors: In Jik Lee, Jong Keun Youn, Dae Sung Jung, Jae Min Joh, Sun Hwa Cha, Seung Heon Yang, Jae Hoon Yu
  • Publication number: 20100318536
    Abstract: System, computer implemented method and computer program product for preparing and navigating a query tree including a plurality of query nodes and informational nodes. Each query node is associated with a prompt, branching criteria and keywords. A current query node provides a prompt to a user and a user response is received and analyzed to identify branching criteria and keywords from the user response. The method navigates to another node in the query tree in consideration of the branching criteria received in the user response and a comparison between the keywords received in the user response and the keywords associated with the query nodes. The comparison may validate navigation to a destination node corresponding to the branching criteria or the comparison may indicate incorrect navigation of the query tree. Corrective navigation can be implemented in various ways based upon the keywords received in the user response.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Justin P. Bandholz, William G. Pagan, William J. Piazza
  • Publication number: 20100299146
    Abstract: Improving speech capabilities of a multimodal application including receiving, by the multimodal browser, a media file having a metadata container; retrieving, by the multimodal browser, from the metadata container a speech artifact related to content stored in the media file for inclusion in the speech engine available to the multimodal browser; determining whether the speech artifact includes a grammar rule or a pronunciation rule; if the speech artifact includes a grammar rule, modifying, by the multimodal browser, the grammar of the speech engine to include the grammar rule; and if the speech artifact includes a pronunciation rule, modifying, by the multimodal browser, the lexicon of the speech engine to include the pronunciation rule.
    Type: Application
    Filed: May 19, 2009
    Publication date: November 25, 2010
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, JR.
  • Publication number: 20100169094
    Abstract: A speaker adaptation apparatus includes an acquiring unit configured to acquire an acoustic model including HMMs and decision trees for estimating what type of the phoneme or the word is included in a feature value used for speech recognition, the HMMs having a plurality of states on a phoneme-to-phoneme basis or a word-to-word basis, and the decision trees being configured to reply to questions relating to the feature value and output likelihoods in the respective states of the HMMs, and a speaker adaptation unit configured to adapt the decision trees to a speaker, the decision trees being adapted using speaker adaptation data vocalized by the speaker of an input speech.
    Type: Application
    Filed: September 17, 2009
    Publication date: July 1, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masami Akamine, Jitendra Ajmera, Partha Lal
  • Publication number: 20100153107
    Abstract: A trend evaluation device includes trend evaluation means having at least one of relative cooccurrence probability calculation means for calculating a change of cooccurrence probability of a keyword and an associated word and relative associated word similarity calculation means for calculating a change degree of a conversation topic concerning the keyword, so as to calculate a trend score by considering one or more combinations of the relative cooccurrence probability and the relative associated word similarity obtained by these means.
    Type: Application
    Filed: September 25, 2006
    Publication date: June 17, 2010
    Applicant: NEC CORPORATION
    Inventor: Hideki Kawai
  • Publication number: 20100145681
    Abstract: The invention for processing speech that is described herein measures the periodic changes of multiple acoustic features in a digitized utterance without regard for lexical, sublexical, or prosodic features. These measurements of periodic, simultaneous changes of multiple acoustic features are assembled into transformational structures. Various types of transformational structures are identified, quantified, and displayed by the invention. The invention is useful for the study of such speaker characteristics as cognitive, emotional, linguistic, and behavioral functioning, and may be employed in the study of other phenomena of interest to the user.
    Type: Application
    Filed: December 8, 2008
    Publication date: June 10, 2010
    Inventor: Daniel M. Begel
  • Publication number: 20100138223
    Abstract: An object of the present invention is to allow classification of sequentially input speech signals with good accuracy, based on similarity of speakers and environments, using a realistic memory footprint, a realistic processing speed, and an on-line operation. A speech classification probability calculation means 103 calculates the probability that the latest of the speech signals (speech data) belongs to each cluster, based on a generative model, which is a probability model. A parameter updating means 107 successively estimates the parameters that define the generative model based on the probability of classification of the speech data into each cluster calculated by the speech classification probability calculation means 103 (see FIG. 1).
    Type: Application
    Filed: March 13, 2008
    Publication date: June 3, 2010
    Inventor: Takafumi Koshinaka
  • Publication number: 20100106498
    Abstract: Disclosed herein are systems, methods, and computer-readable media for targeted advertising, the method including receiving an audio stream containing user speech from a first device, generating text based on the speech contained in the audio stream, identifying at least one key phrase in the text, receiving from an advertiser an advertisement related to the identified at least one key phrase, and displaying the advertisement. In one aspect, the method further includes receiving from an advertiser a set of rules associated with the received advertisement and displaying the advertisement in accordance with the associated set of rules. The first device can be a converged voice and data communications device connected to a network. The communications device can generate text based on the speech. In one aspect, the method displays the advertisement on one or both of a converged voice and data communications device and a second communications device. A central server can generate text based on the speech.
    Type: Application
    Filed: October 24, 2008
    Publication date: April 29, 2010
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Patrick Jason MORRISON
  • Publication number: 20100094626
    Abstract: It is an object of the present invention to provide a method and apparatus for locating a keyword in speech, and a speech recognition system. The method includes the steps of: extracting feature parameters from the frames constituting the recognition target speech to form a feature parameter vector sequence that represents the speech; normalizing the feature parameter vector sequence with a codebook containing a plurality of codebook vectors to obtain a feature trace of the recognition target speech in a vector space; and specifying the position of a keyword by matching prestored keyword template traces against the feature trace. According to the present invention, a keyword template trace and a feature space trace of a target speech are drawn in accordance with an identical codebook, which makes resampling unnecessary when performing linear movement matching of speech wave frames having similar phonological feature structures.
    Type: Application
    Filed: September 27, 2007
    Publication date: April 15, 2010
    Inventors: Fengqin Li, Yadong Wu, Qinqtao Yang, Chen Chen
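A toy sketch of the trace-based keyword location idea: frames are quantized against a shared codebook to form a trace, and a keyword template trace drawn from the same codebook is matched against it. Euclidean quantization, Hamming-distance matching, and all names here are illustrative assumptions, not the patented procedure.

```python
def quantize(frames, codebook):
    """Map each frame to the index of its nearest codebook vector (Euclidean)."""
    def nearest(f):
        return min(range(len(codebook)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(f, codebook[i])))
    return [nearest(f) for f in frames]

def locate_keyword(trace, template):
    """Return the start index where the template trace best matches the feature trace."""
    best_pos, best_dist = -1, float("inf")
    for start in range(len(trace) - len(template) + 1):
        window = trace[start:start + len(template)]
        dist = sum(a != b for a, b in zip(window, template))
        if dist < best_dist:
            best_pos, best_dist = start, dist
    return best_pos

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
utterance = [(0.1, 0.0), (0.9, 0.1), (0.1, 0.9), (0.0, 0.1), (1.1, -0.1)]
trace = quantize(utterance, codebook)
template = [1, 2]                      # keyword trace drawn with the same codebook
pos = locate_keyword(trace, template)  # keyword starts at frame 1
```

Because both traces use the same codebook indices, matching reduces to comparing index sequences, which is the property the abstract relies on to avoid resampling.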
  • Publication number: 20100057452
    Abstract: The described implementations relate to speech interfaces and in some instances to speech pattern recognition techniques that enable speech interfaces. One system includes a feature pipeline configured to produce speech feature vectors from input speech. This system also includes a classifier pipeline configured to classify individual speech feature vectors utilizing multi-level classification.
    Type: Application
    Filed: August 28, 2008
    Publication date: March 4, 2010
    Applicant: Microsoft Corporation
    Inventors: Kunal Mukerjee, Brendan Meeder
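Multi-level classification of speech feature vectors, as described in this abstract, can be sketched as a coarse-to-fine pipeline: a first-level classifier separates silence from speech, and a second level refines speech frames. The energy threshold and the two toy levels are hypothetical stand-ins for the patented classifier pipeline.

```python
def coarse_classify(vec):
    """Level 1: silence vs. speech by frame energy (hypothetical threshold)."""
    energy = sum(v * v for v in vec)
    return "speech" if energy > 0.5 else "silence"

def fine_classify(vec):
    """Level 2: vowel-like vs. fricative-like by low/high-band balance (toy rule)."""
    low, high = vec[0], vec[-1]
    return "vowel" if low >= high else "fricative"

def classify(vec):
    """Run the multi-level pipeline on one feature vector."""
    label = coarse_classify(vec)
    if label == "silence":
        return ("silence",)
    return ("speech", fine_classify(vec))

result = classify([0.9, 0.2, 0.1])
```

The advantage of the multi-level structure is that the cheap first level filters out frames the expensive second level never needs to see.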
  • Publication number: 20100023331
    Abstract: An automated method is described for developing an automated speech input semantic classification system such as a call routing system. A set of semantic classifications is defined for classification of input speech utterances, where each semantic classification represents a specific semantic classification of the speech input. The semantic classification system is trained from training data having little or no in-domain manually transcribed training data, and then operated to assign input speech utterances to the defined semantic classifications. Adaptation training data based on input speech utterances is collected with manually assigned semantic labels. When the adaptation training data satisfies a pre-determined adaptation criterion, the semantic classification system is automatically retrained based on the adaptation training data.
    Type: Application
    Filed: July 15, 2009
    Publication date: January 28, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Nicolae Duta, Réal Tremblay, Andy Mauro, Douglas Peters
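The adaptation criterion in this abstract might, for example, require a minimum number of manually labeled utterances per semantic class before retraining is triggered. The sketch below assumes exactly that rule; the criterion, names, and threshold are illustrative, not the patented condition.

```python
def ready_to_retrain(adaptation_data, classes, min_per_class=2):
    """True when every semantic class has at least min_per_class labeled utterances."""
    counts = {c: 0 for c in classes}
    for _utterance, label in adaptation_data:
        if label in counts:
            counts[label] += 1
    return all(n >= min_per_class for n in counts.values())

classes = ["billing", "support"]
data = [("pay my bill", "billing"),
        ("bill question", "billing"),
        ("it is broken", "support")]
ok = ready_to_retrain(data, classes)   # "support" has only one labeled example
```

A per-class minimum prevents retraining on data that would leave rare call-routing destinations underrepresented.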
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. CONKIE, Yeon-Jun KIM
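Spectral boundary correction of the kind mentioned in this abstract can be illustrated by snapping a phone boundary to the nearby frame with the largest spectral change. The distance measure, window size, and names below are assumptions for illustration, not the patented correction step.

```python
def spectral_change(frames, i):
    """Euclidean distance between adjacent feature frames (a toy spectral measure)."""
    return sum((a - b) ** 2 for a, b in zip(frames[i - 1], frames[i])) ** 0.5

def correct_boundary(frames, boundary, window=2):
    """Snap a phone boundary to the nearby frame with the largest spectral change."""
    lo = max(1, boundary - window)
    hi = min(len(frames) - 1, boundary + window + 1)
    return max(range(lo, hi), key=lambda i: spectral_change(frames, i))

# One-dimensional feature frames: a flat region, then an abrupt spectral shift.
frames = [(0.0,), (0.1,), (0.1,), (0.9,), (1.0,), (1.0,)]
new_boundary = correct_boundary(frames, boundary=2)   # moves to the sharp change
```

In a full system the initial boundary would come from HMM forced alignment, and the correction only searches a small window so a bad alignment cannot drag the boundary far.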
  • Publication number: 20090313008
    Abstract: An information apparatus for use in a mobile unit, mounted on the mobile unit, includes at least a broadcast receiver 11 and 13 for receiving a broadcasting signal containing a broadcasting station name; a recognition dictionary 30 for registering the broadcasting station name; and a voice recognition section 27 for performing voice recognition of a voice input indicating the broadcasting station name. Referring to the dictionary, the apparatus tunes to the broadcasting station associated with the broadcasting station name corresponding to the voice recognition result.
    Type: Application
    Filed: April 4, 2006
    Publication date: December 17, 2009
    Inventors: Reiko Okada, Tadashi Suzuki, Yuzo Maruta
  • Publication number: 20090271196
    Abstract: Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a frame of the signal representing speech. The frame can be classified as unvoiced or voiced based on occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. In response to classifying the frame as voiced, the frame can be processed.
    Type: Application
    Filed: October 23, 2008
    Publication date: October 29, 2009
    Applicant: Red Shift Company, LLC
    Inventors: Joel K. Nyquist, Erik N. Reckase, Matthew D. Robinson, John F. Remillard
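Classifying a frame as voiced based on event occurrence can be sketched by counting glottal-pulse-like peaks within the frame. The peak detector, thresholds, and names below are hypothetical simplifications of the patented event detection.

```python
def count_pulses(frame, threshold=0.5):
    """Count local maxima above a threshold -- a stand-in for glottal pulse events."""
    pulses = 0
    for i in range(1, len(frame) - 1):
        if frame[i] > threshold and frame[i] > frame[i - 1] and frame[i] >= frame[i + 1]:
            pulses += 1
    return pulses

def classify_frame(frame, min_pulses=2):
    """Label a frame voiced when enough pulse events occur within it."""
    return "voiced" if count_pulses(frame) >= min_pulses else "unvoiced"

voiced_frame = [0.0, 0.9, 0.1, 0.0, 0.8, 0.1, 0.0, 0.9, 0.0]   # periodic pulses
noise_frame = [0.1, 0.2, 0.1, 0.15, 0.1, 0.2, 0.1]             # no strong events
```

Gating further processing on the voiced label, as the abstract describes, means the expensive analysis only runs on frames that actually contain pitch events.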
  • Publication number: 20090265163
    Abstract: Methods and systems to exchange and display data among a plurality of devices in response to one or more of user input and context-based information. User input may include one or more of motion, speech, text, pointing, and touch-selecting. Context-based information may include one or more of user location, which may be relative to one or more devices, background audio, information related to one or more products and/or services, and user-based context information. User context-based information may correspond to one or more of prior transactions, prior activities, prior content exposure, and demographic information. Also disclosed herein are methods and systems to correlate user speech to one or more of commands and data objects, with respect to context-based information. Methods and systems to recognize speech may be implemented in combination with methods and systems to exchange and/or display data among a plurality of devices, and in other environments.
    Type: Application
    Filed: February 12, 2009
    Publication date: October 22, 2009
    Applicant: Phone Through, Inc.
    Inventors: Lehmann Li, Donald Addiss
  • Publication number: 20090240498
    Abstract: Systems and methods to perform short text segment similarity measures. Illustratively, a short text segment similarity environment comprises a short text engine operative to process data representative of short segments of text and an instruction set comprising at least one instruction to instruct the short text engine to process data representative of short text segment inputs according to a selected short text similarity identification paradigm. Illustratively, two or more short text segments can be received as input by the short text engine, along with a request to identify similarities among the two or more short text segments. Responsive to the request and data input, the short text engine executes a selected similarity identification technique in accordance with the short text similarity identification paradigm to process the received data and to identify similarities between the short text segment inputs.
    Type: Application
    Filed: March 19, 2008
    Publication date: September 24, 2009
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Alexei V. Bocharov, Christopher A. Meek
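One common short-text similarity paradigm such an engine might select is cosine similarity over word-count vectors; the sketch below assumes that technique for illustration and is not the patented method.

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Cosine similarity between two short text segments over word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

score = cosine_similarity("book a cheap flight", "cheap flight deals")
```

Because the segments are short, exact-word overlap is sparse; richer paradigms (as the abstract's selectable-paradigm design allows) would add stemming or semantic features on top of this baseline.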
  • Publication number: 20090204399
    Abstract: Necessary portions of stored speech data representing conference content are summarized and reproduced in a predetermined time. Conference speech is summarized and reproduced using a speech data summarizing and reproducing apparatus comprising a speech data divider for dividing and structuring conference speech data into several utterance unit data based on utterers, distributed documents, the occurrence frequency of words in speech recognition results, and pauses, an importance level calculator for determining important utterance unit data based on the occurrence frequency of keywords, the information of utterers, and data specified by the user, a summarizer for extracting important utterance unit data and summarizing them within a specified time, and a speech data reproducer for reproducing the summarized speech data in chronological order or an order of importance levels with auxiliary information added thereto.
    Type: Application
    Filed: May 7, 2007
    Publication date: August 13, 2009
    Applicant: NEC CORPORATION
    Inventor: Susumu Akamine
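The importance-level calculation and time-bounded summarization described above can be sketched as scoring utterance units by keyword frequency, greedily selecting high scorers within a time budget, and replaying them in chronological order. All names and the scoring rule are illustrative assumptions, not the patented apparatus.

```python
def summarize(utterances, keywords, time_budget):
    """Pick the highest-scoring utterance units that fit the budget; replay in order."""
    scored = []
    for idx, (text, duration) in enumerate(utterances):
        score = sum(text.lower().count(k) for k in keywords)
        scored.append((score, idx, text, duration))
    scored.sort(key=lambda t: -t[0])           # most important first
    chosen, used = [], 0.0
    for score, idx, text, duration in scored:
        if score > 0 and used + duration <= time_budget:
            chosen.append((idx, text))
            used += duration
    chosen.sort()                              # restore chronological order
    return [text for _idx, text in chosen]

utterances = [("we agreed on the budget", 4.0),
              ("small talk about lunch", 3.0),
              ("budget deadline is friday", 5.0)]
summary = summarize(utterances, ["budget", "deadline"], time_budget=9.0)
```

The two-phase structure (select by importance, then re-sort by time) mirrors the abstract's option to reproduce the summary either chronologically or by importance level.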
  • Publication number: 20090171661
    Abstract: Techniques for assessing pronunciation abilities of a user are provided. The techniques include recording a sentence spoken by a user, performing a classification of the spoken sentence, wherein the classification is performed with respect to at least one N-ordered class, and wherein the spoken sentence is represented by a set of at least one acoustic feature extracted from the spoken sentence, and determining a score based on the classification, wherein the score is used to determine an optimal set of at least one question to assess pronunciation ability of the user without human intervention.
    Type: Application
    Filed: June 27, 2008
    Publication date: July 2, 2009
    Applicant: International Business Machines Corporation
    Inventors: Jayadeva, Sachindra Joshi, Himanshu Pant, Ashish Verma
  • Publication number: 20090150155
    Abstract: The present invention aims at extracting a keyword from conversation without advance preparation, i.e., without anticipating conversation keywords beforehand.
    Type: Application
    Filed: March 14, 2008
    Publication date: June 11, 2009
    Applicant: PANASONIC CORPORATION
    Inventors: Mitsuru Endo, Maki Yamada, Keiko Morii, Tomohiro Konuma, Kazuya Nomura
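Extracting keywords without pre-registered candidates could, for instance, rank non-stopword terms by frequency across the conversation. The stopword list, names, and ranking rule below are illustrative assumptions, not the patented extraction.

```python
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "we", "i", "it", "should", "too"}

def extract_keywords(utterances, top_n=2):
    """Rank non-stopword terms by frequency across the conversation (toy extractor)."""
    counts = Counter()
    for utterance in utterances:
        counts.update(w for w in utterance.lower().split() if w not in STOPWORDS)
    return [word for word, _count in counts.most_common(top_n)]

conversation = ["we should book the hotel",
                "the hotel near the station",
                "book it and the station too"]
keywords = extract_keywords(conversation)
```

Because candidates emerge from the conversation itself, no keyword list needs to be prepared ahead of time, which is the point the abstract emphasizes.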
  • Publication number: 20090119105
    Abstract: The example embodiment of the present invention provides an acoustic model adaptation method for enhancing recognition performance for a non-native speaker's speech. In order to adapt acoustic models, first, pronunciation variations are examined by analyzing a non-native speaker's speech. Thereafter, based on the pronunciation variations of a non-native speaker's speech, acoustic models are adapted in a state-tying step during the training process of the acoustic models. When the present invention for adapting acoustic models and a conventional acoustic model adaptation scheme are combined, more-enhanced recognition performance can be obtained. The example embodiment of the present invention enhances recognition performance for a non-native speaker's speech while reducing the degradation of recognition performance for a native speaker's speech.
    Type: Application
    Filed: March 30, 2007
    Publication date: May 7, 2009
    Inventors: Hong Kook Kim, Yoo Rhee Oh, Jae Sam Yoon
  • Publication number: 20090112593
    Abstract: A system is provided for recognizing speech for searching a database. The system receives speech input as a spoken search request and then processes the speech input in a speech recognition step using a vocabulary for recognizing the spoken request. By processing the speech input, words recognized in the speech input and included in the vocabulary are obtained to form at least one hypothesis. The hypothesis is then utilized to search a database using the at least one hypothesis as a search query. A search result is then received from the database and provided to the user.
    Type: Application
    Filed: October 24, 2008
    Publication date: April 30, 2009
    Applicant: Harman Becker Automotive Systems GmbH
    Inventors: Lars Konig, Andreas Low, Udo Haiber
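Turning a recognition hypothesis into a database query might look like the following sketch: keep only in-vocabulary recognized words as the hypothesis and join them into a query string. The AND-joined query format and all names are assumptions for illustration, not the patented system.

```python
def build_query(recognized_words, vocabulary):
    """Keep only in-vocabulary words as the search hypothesis, then form a query."""
    hypothesis = [w for w in recognized_words if w in vocabulary]
    return " AND ".join(hypothesis)

vocabulary = {"jazz", "radio", "station"}
query = build_query(["play", "jazz", "radio", "please"], vocabulary)
```

Restricting the hypothesis to vocabulary words keeps recognition noise (filler words, misrecognitions) out of the database search.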