Similarity Patents (Class 704/239)
  • Publication number: 20120232899
    Abstract: A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the two phonograms. The selected referential utterances include formant paths of at least three formant frequencies; those that share at least two identical formant frequencies are compared with each other. Similarity of the compared referential utterances is evaluated from the matching of their other formant frequencies, and similarity of the phonograms is determined from the evaluation of similarity of all the compared referential utterances.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 13, 2012
    Applicant: Obschestvo s ogranichennoi otvetstvennost'yu "Centr Rechevyh Technologij"
    Inventor: Sergey Lvovich Koval
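The formant-matching comparison described in the abstract above can be sketched in a few lines. This is a toy illustration with invented function names, tolerances, and anchor counts, not the patent's actual method: utterance pairs sharing at least two near-identical formants qualify for comparison, and similarity is scored from how well the remaining formants match.

```python
def utterance_similarity(formants_a, formants_b, tol=50.0, anchors=2):
    """Compare two referential utterances, each a list of >= 3 formant
    frequencies (Hz). At least `anchors` formants must agree within `tol`
    Hz before the pair is comparable; similarity is then the fraction of
    the remaining formants that also agree. Illustrative sketch only."""
    matched = [abs(a - b) <= tol for a, b in zip(formants_a, formants_b)]
    if sum(matched) < anchors:
        return None  # too few identical formants: pair is not comparable
    # The first `anchors` matching formants only license the comparison;
    # similarity is judged from the other formant frequencies.
    anchor_idx = [i for i, m in enumerate(matched) if m][:anchors]
    rest = [m for i, m in enumerate(matched) if i not in anchor_idx]
    return 1.0 if not rest else sum(rest) / len(rest)

def phonogram_similarity(utts_a, utts_b, tol=50.0):
    """Average utterance-pair similarity over all comparable pairs."""
    scores = [s for fa in utts_a for fb in utts_b
              if (s := utterance_similarity(fa, fb, tol)) is not None]
    return sum(scores) / len(scores) if scores else 0.0
```

A pair like `[700, 1200, 2600]` vs `[710, 1190, 2580]` anchors on the first two formants and scores the third, while `[700, 1200, 2600]` vs `[900, 1500, 2600]` has only one matching formant and is skipped.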
  • Publication number: 20120232900
    Abstract: The present invention relates to a method for speaker recognition, comprising the steps of obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to the at least one unknown speaker thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes of speech samples; combining the extracted speaker information for each of the speaker-dependent classes of speech samples; comparing the combined extracted speaker information for each of the speaker-dependent classes of speech samples with the stored speaker information for the at least one target speaker to obtain at least one comparison result; and determining whether one of the at least one unknown speakers is identical with the at least one target speaker based on the at least one comparison result.
    Type: Application
    Filed: November 12, 2009
    Publication date: September 13, 2012
    Inventors: Johan Nikolaas Langehoveen Brummer, Luis Buera Rodriguez, Martha Garcia Gomar
  • Patent number: 8260061
    Abstract: In an image data output processing apparatus of the present invention, an image matching section is capable of determining whether a similarity exists between each image of an N-up document and a reference document when input image data is indicative of the N-up document. An output process control section is capable of regulating an output process of each image in accordance with a result of determining whether the similarity exists between each image of the N-up document and the reference document. This allows detecting with high accuracy a document image under regulation on the output process and regulating the output process, when the input image data is indicative of an N-up document and includes the document image under regulation on the output process.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: September 4, 2012
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Hitoshi Hirohata
  • Patent number: 8255214
    Abstract: A first signal of two signals to be compared for similarity is divided into small areas and one small area is selected for calculating the correlation with a second signal using a correlative method. Then, the quantity of translation, expansion rate and similarity in an area where the similarity, which is the square of the correlation value, reaches its maximum, are found. Values based on the similarity are integrated at a position represented by the quantity of translation and expansion rate. Similar processing is performed with respect to all the small areas, and at a peak where the maximum integral value of the similarity is obtained, its magnitude is compared with a threshold value to evaluate the similarity. The small area voted for that peak can be extracted.
    Type: Grant
    Filed: October 15, 2002
    Date of Patent: August 28, 2012
    Assignee: Sony Corporation
    Inventors: Mototsugu Abe, Masayuki Nishiguchi
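The core step of the abstract above, sliding a small area over the second signal and taking the square of the normalized correlation as similarity, can be sketched as follows. This covers only the translation search, not the expansion-rate estimation or the voting/integration over all small areas; names and the 1-D simplification are mine, not the patent's.

```python
import math

def best_match(small, signal):
    """Slide `small` over `signal` and return (best_offset, similarity),
    where similarity is the square of the normalized correlation value,
    as in the abstract. 1-D pure-Python sketch."""
    n = len(small)
    mu_s = sum(small) / n
    es = math.sqrt(sum((x - mu_s) ** 2 for x in small))
    best = (0, -1.0)
    for off in range(len(signal) - n + 1):
        win = signal[off:off + n]
        mu_w = sum(win) / n
        ew = math.sqrt(sum((x - mu_w) ** 2 for x in win))
        if es == 0 or ew == 0:
            continue  # flat segment: correlation undefined
        r = sum((a - mu_s) * (b - mu_w)
                for a, b in zip(small, win)) / (es * ew)
        sim = r * r  # similarity = square of the correlation value
        if sim > best[1]:
            best = (off, sim)
    return best
```

In the full scheme each small area votes its (translation, expansion-rate) estimate into an accumulator, and the peak of the accumulated similarity is compared against a threshold.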
  • Publication number: 20120209606
    Abstract: Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 16, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Maya Gorodetsky, Ezra Daya, Oren Pereg
  • Patent number: 8239197
    Abstract: A system and method for efficiently transcribing verbal messages transmitted over the Internet (or other network) into text. The verbal messages are initially checked to ensure that they are in a valid format and include a return network address, and if so, are processed either as whole verbal messages or split into segments. These whole verbal messages and segments are processed by an automated speech recognition (ASR) program, which produces automatically recognized text. The automatically recognized text messages or segments are assigned to selected workbenches for manual editing and transcription, producing edited text. The segments of edited text are reassembled to produce whole edited text messages, undergo post processing to correct minor errors and output as an email, an SMS message, a file, or an input to a program. The automatically recognized text and manual edits thereof are returned as feedback to the ASR program to improve its accuracy.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: August 7, 2012
    Assignee: Intellisist, Inc.
    Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
  • Publication number: 20120191453
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determine at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 26, 2012
    Applicant: Cyberpulse L.L.C.
    Inventors: James ROBERGE, Jeffrey Soble
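The exact/inexact/partial distinction in the abstract above can be illustrated with ordinary string similarity. The thresholds, tie-breaking order, and use of `difflib` are my assumptions for the sketch; the patent does not specify them.

```python
import difflib

def classify_match(word, term, inexact_cutoff=0.8):
    """Classify how an utterance word matches a template term:
    'exact' (identical), 'inexact' (high string similarity, e.g. a
    morphological variant), 'partial' (prefix overlap), or None.
    The 0.8 cutoff is illustrative, not from the patent."""
    w, t = word.lower(), term.lower()
    if w == t:
        return "exact"
    if difflib.SequenceMatcher(None, w, t).ratio() >= inexact_cutoff:
        return "inexact"
    if t.startswith(w) or w.startswith(t):
        return "partial"
    return None
```

For example, "fractures" vs. "fracture" is an inexact match, "frac" vs. "fracture" a partial one; a template slot could be populated whenever any of the three match types fires.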
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time-mediated averaging of class-dependent models. A technique is described to take advantage of gender information in training data and to obtain female, male, and gender-independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross-gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
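The probability-weighted averaging of male and female GMMs described above can be shown in one dimension. The means, variances, and the idea of using a pitch-like feature are invented for illustration; the point is only the interpolation of the two class-dependent likelihoods by a gender probability.

```python
import math

def gauss(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gmm_pdf(x, comps):
    """GMM likelihood; comps is a list of (weight, mean, var) tuples."""
    return sum(w * gauss(x, m, v) for w, m, v in comps)

def blended_pdf(x, male_gmm, female_gmm, p_female):
    """Average the class-dependent models with the probability that the
    current frame is female, so decoding degrades gracefully when the
    gender decision is uncertain (the cross-gender failure mode the
    abstract mentions)."""
    return (1.0 - p_female) * gmm_pdf(x, male_gmm) + p_female * gmm_pdf(x, female_gmm)
```

With `p_female = 0.5` the blended likelihood is exactly midway between the two models, so a wrong hard gender decision can never zero out the correct model's contribution.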
  • Patent number: 8214211
    Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating a similarity of the input sound relative to a male speaker sound model. A female voice index calculator calculates a female voice index indicating a similarity of the input sound relative to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound which may be either of the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound based on the male voice index and the female voice index in case that the first discriminator discriminates the human voice sound.
    Type: Grant
    Filed: August 26, 2008
    Date of Patent: July 3, 2012
    Assignee: Yamaha Corporation
    Inventor: Yasuo Yoshioka
  • Patent number: 8209174
    Abstract: A text-independent speaker verification system utilizes mel frequency cepstral coefficients analysis in the feature extraction blocks, template modeling with vector quantization in the pattern matching blocks, an adaptive threshold and an adaptive decision verdict and is implemented in a stand-alone device using less powerful microprocessors and smaller data storage devices than used by comparable systems of the prior art.
    Type: Grant
    Filed: April 17, 2009
    Date of Patent: June 26, 2012
    Assignee: Saudi Arabian Oil Company
    Inventor: Essam Abed Al-Telmissani
  • Patent number: 8204749
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
    Type: Grant
    Filed: March 21, 2011
    Date of Patent: June 19, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Patent number: 8175869
    Abstract: A method, apparatus, and medium for classifying a speech signal and a method, apparatus, and medium for encoding the speech signal using the same are provided. The method for classifying a speech signal includes calculating classification parameters from an input signal having block units, calculating a plurality of classification criteria from the classification parameters, and classifying the level of the input signal using the plurality of classification criteria. The classification parameters include at least one of an energy parameter of the input signal, a cross-correlation parameter between a specific block of a present frame and the input signal, and an integrated cross-correlation parameter obtained by accumulating the cross-correlation parameter.
    Type: Grant
    Filed: July 5, 2006
    Date of Patent: May 8, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hosang Sung, Rakesh Taori, Kangeun Lee
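Two of the classification parameters named in the abstract above, block energy and the cross-correlation between a block and a reference signal, are easy to compute directly. The class labels and thresholds below are invented for illustration; the patent's actual classification criteria and levels are more elaborate.

```python
import math

def block_energy(block):
    """Energy parameter of one block of samples."""
    return sum(x * x for x in block)

def norm_xcorr(a, b):
    """Energy-normalized cross-correlation between two equal-length blocks."""
    ea, eb = math.sqrt(block_energy(a)), math.sqrt(block_energy(b))
    if ea == 0 or eb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (ea * eb)

def classify_level(block, prev_block, e_thresh=1.0, c_thresh=0.7):
    """Toy level classifier: low energy -> 'silence'; strong correlation
    with the previous block (periodicity) -> 'voiced'; else 'unvoiced'.
    Thresholds are illustrative assumptions."""
    if block_energy(block) < e_thresh:
        return "silence"
    if abs(norm_xcorr(block, prev_block)) >= c_thresh:
        return "voiced"
    return "unvoiced"
```

The integrated cross-correlation parameter of the abstract would then be an accumulation of `norm_xcorr` values over successive blocks.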
  • Patent number: 8175882
    Abstract: A method for task execution improvement, the method includes: generating a baseline model for executing a task; recording a user executing a task; comparing the baseline model to the user's execution of the task; and providing feedback to the user based on the differences in the user's execution and the baseline model.
    Type: Grant
    Filed: January 25, 2008
    Date of Patent: May 8, 2012
    Assignee: International Business Machines Corporation
    Inventors: Sara H. Basson, Dimitiri Kanevsky, Edward E. Kelley, Bhuvana Ramabhadran
  • Patent number: 8160370
    Abstract: An image processing apparatus includes: an image pyramid forming section configured to form an image pyramid by hierarchically creating layered image data including differently scaled images from inputted image data; a position calculating section configured to determine an in-image-pyramid position as a height position in the image pyramid to which template image data having an image portion of a target object at a fixed scale corresponds; an upper-layer-side likelihood calculating section configured to determine a likelihood for a target object by matching between upper-side layer image data directly above the in-image-pyramid position, and a state prediction; a lower-layer-side likelihood calculating section configured to determine a likelihood for a target object by matching between lower-side layer image data directly below the in-image-pyramid position, and a state prediction; and a true likelihood calculating section configured to determine a true likelihood from the likelihoods determined by the upper-layer-side and lower-layer-side likelihood calculating sections.
    Type: Grant
    Filed: November 6, 2009
    Date of Patent: April 17, 2012
    Assignee: Sony Corporation
    Inventors: Yuyu Liu, Keisuke Yamaoka, Takayuki Yoshigahara, Xi Chen
  • Patent number: 8160866
    Abstract: The present invention can recognize both English and Chinese at the same time. The key technique is that the features of all English words (without samples) are entirely extracted from the features of Chinese syllables. The invention normalizes the variable-length signal waveforms of English words (Chinese syllables) such that the same words (syllables) have the same features at the same time positions. Hence the Bayesian classifier can recognize both fast and slow utterances of sentences. The invention can improve the features such that speech recognition of unknown English (Chinese) is guaranteed to be correct. Furthermore, since the invention can create the features of English words from the features of Chinese syllables, it can likewise create the features of other languages from the features of Chinese syllables, and hence it can also recognize other languages, such as German, French, Japanese, Korean, Russian, etc.
    Type: Grant
    Filed: October 10, 2008
    Date of Patent: April 17, 2012
    Inventors: Tze Fen Li, Tai-Jan Lee Li, Shih-Tzung Li, Shih-Hon Li, Li-Chuan Liao
  • Patent number: 8145484
    Abstract: The described implementations relate to speech spelling by a user. One method identifies one or more symbols that may match a user utterance and displays an individual symbol for confirmation by the user.
    Type: Grant
    Filed: November 11, 2008
    Date of Patent: March 27, 2012
    Assignee: Microsoft Corporation
    Inventor: Geoffrey Zweig
  • Patent number: 8140330
    Abstract: Embodiments of a method and system for detecting repeated patterns in dialog systems are described. The system includes a dynamic time warping (DTW) based pattern comparison algorithm that is used to find the best matching parts between a correction utterance and an original utterance. Reference patterns are generated from the correction utterance by an unsupervised segmentation scheme. No significant information about the position of the repeated parts in the correction utterance is assumed, as each reference pattern is compared with the original utterance from the beginning of the utterance to the end. A pattern comparison process with DTW is executed without knowledge of fixed end-points. A recursive DTW computation is executed to find the best matching parts that are considered as the repeated parts as well as the end-points of the utterance.
    Type: Grant
    Filed: June 13, 2008
    Date of Patent: March 20, 2012
    Assignee: Robert Bosch GmbH
    Inventors: Mert Cevik, Fuliang Weng
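The DTW comparison at the heart of the abstract above follows the standard textbook recurrence. This is a sketch of plain fixed-endpoint DTW, not Bosch's recursive open-endpoint variant, but it shows why warping lets a slow repetition match the original utterance.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping distance between sequences a and b.
    Standard O(len(a)*len(b)) dynamic program; the patent's variant
    additionally searches over end-points."""
    inf = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # insertion
                D[i][j - 1],      # deletion
                D[i - 1][j - 1])  # match
    return D[n][m]
```

A sequence aligned against a time-stretched copy of itself, e.g. `[1, 2, 3]` vs. `[1, 1, 2, 2, 3, 3]`, still has distance zero, which is exactly the property that makes DTW suitable for matching a slowly repeated correction against the original utterance.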
  • Patent number: 8140334
    Abstract: An apparatus and method for recognizing voice.
    Type: Grant
    Filed: June 28, 2006
    Date of Patent: March 20, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Sang-bae Jeong, Nam-hoon Kim, Jeong-su Kim, In-jeong Choi, Ick-sang Han
  • Patent number: 8073262
    Abstract: In an image matching apparatus of the present invention, only a connected region in which the number of pixels included therein exceeds a threshold value, among connected regions that are specified by a labeling process section, is sent to a centroid calculation process section from a threshold value processing section, and a centroid (feature point) of the connected region is calculated. When it is determined that a target document to be matched is an N-up document, the threshold value processing section uses, instead of a default threshold value, a variant threshold value that varies depending on the number of images laid out on the N-up document and a document size that are found and detected by an N-up document determination section and a document size detection section. This makes it possible to determine a similarity to a reference document with high accuracy even in a case of an N-up document, i.e., a case where each target image to be matched is reduced in size from an original image.
    Type: Grant
    Filed: September 8, 2008
    Date of Patent: December 6, 2011
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Hitoshi Hirohata
  • Patent number: 8050918
    Abstract: A method and system for evaluating the quality of voice input recognition by a voice portal is provided. An analysis interface extracts a set of current grammars from the voice portal. A test pattern generator generates a test input for each current grammar. The test input includes a test pattern and a set of active grammars corresponding to each current grammar. The system further includes a text-to-speech engine for entering each test pattern into the voice server. A results collector analyzes each test pattern entered into the voice server with the speech recognition engine against the set of active grammars corresponding to the current grammar for said test pattern. A results analyzer derives a set of statistics of a quality of recognition of each current grammar.
    Type: Grant
    Filed: December 11, 2003
    Date of Patent: November 1, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Reza Ghasemi, Walter Haenel
  • Patent number: 8050505
    Abstract: In a card serving as an external storage device to be inserted into a digital color multi-function printer, features of a reference image and processing rule information indicating processing content to be applied to input image data judged to be similar to the reference image are prestored. Then, in cases where the input image data is judged to be similar to the reference image, the content of a process to be performed on the input image data is controlled in accordance with the processing rule information corresponding to the reference image. This makes it possible to save users the trouble of setting the content of a process to be performed on input image data, and to prevent a shortage of memory capacity of a memory of the image processing apparatus even in cases where there is an increase in the number of reference images.
    Type: Grant
    Filed: July 30, 2008
    Date of Patent: November 1, 2011
    Assignee: Sharp Kabushiki Kaisha
    Inventors: Hitoshi Hirohata, Masakazu Ohira
  • Patent number: 8041570
    Abstract: Representation-neutral dialogue systems and methods (“RNDS”) are described that include multi-application, multi-device spoken-language dialogue systems based on the information-state update approach. The RNDS includes representation-neutral core components of a dialogue system that provide scripted domain-specific extensions to routines such as dialogue move modeling and reference resolution, easy substitution of specific semantic representations and associated routines, and clean interfaces to external components for language-understanding (i.e., speech-recognition and parsing) and language-generation, and to domain-specific knowledge sources. The RNDS also allows seamless interaction with a community of devices.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: October 18, 2011
    Assignee: Robert Bosch Corporation
    Inventors: Danilo Mirkovic, Lawrence Cavedon
  • Publication number: 20110238409
    Abstract: Semantic clustering techniques are described. In various implementations, a conversational agent is configured to perform semantic clustering of a corpus of user utterances. Semantic clustering may be used to provide a variety of functionality, such as to group a corpus of utterances into semantic clusters in which each cluster pertains to a similar topic. These clusters may then be leveraged to identify topics and assess their relative importance, for example to prioritize topics whose handling by the conversational agent should be improved. A variety of utterances may be processed using these techniques, such as spoken words, textual descriptions entered via live chat, instant messaging, a website interface, email, SMS, a social network, a blogging or micro-blogging interface, and so on.
    Type: Application
    Filed: March 26, 2010
    Publication date: September 29, 2011
    Inventors: Jean-Marie Henri Daniel Larcheveque, Elizabeth Ireland Powers, Freya Kate Recksiek, Dan Teodosiu
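Grouping utterances into topic clusters, as the abstract above describes, can be approximated with a greedy single-pass scheme over bag-of-words vectors. This is a toy stand-in for the patent's semantic clustering: the representation (word counts), the similarity (cosine), and the 0.5 threshold are all my assumptions.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two word-count Counters."""
    dot = sum(u[w] * v.get(w, 0) for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_utterances(utterances, threshold=0.5):
    """Greedy single-pass clustering: each utterance joins the first
    cluster whose founding utterance it resembles closely enough,
    otherwise it starts a new cluster."""
    clusters = []  # list of (representative Counter, member utterances)
    for utt in utterances:
        vec = Counter(utt.lower().split())
        for rep, members in clusters:
            if cosine(vec, rep) >= threshold:
                members.append(utt)
                break
        else:
            clusters.append((vec, [utt]))
    return [members for _, members in clusters]
```

Cluster sizes then give the relative importance of each topic, e.g. which intents the agent should handle better first.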
  • Patent number: 8024187
    Abstract: A pulse allocating method capable of coding stereophonic voice signals efficiently. In the fixed codebook searches of this pulse allocating method, the stereophonic voice signals are compared for each subframe to judge the similarity between channels and to judge their characteristics. On the basis of the similarity between the channels and the characteristics of the stereophonic signals, the numbers of pulses to be allocated to the individual channels are determined. Pulse searches are then executed to determine the pulse positions for the individual channels, and the determined pulses are coded.
    Type: Grant
    Filed: February 9, 2006
    Date of Patent: September 20, 2011
    Assignee: Panasonic Corporation
    Inventors: Chun Woei Teo, Sua Hong Neo, Koji Yoshida, Michiyo Goto
  • Publication number: 20110202341
    Abstract: A device receives a voice recognition statistic from a voice recognition application and applies a grammar improvement rule based on the voice recognition statistic. The device also automatically adjusts a weight of the voice recognition statistic based on the grammar improvement rule, and outputs the weight adjusted voice recognition statistic for use in the voice recognition application.
    Type: Application
    Filed: April 29, 2011
    Publication date: August 18, 2011
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: Kevin W. BROWN
  • Patent number: 7962338
    Abstract: When the degree of similarity of the recognition candidates is greater than the second threshold value, the speech verification unit outputs the recognition candidates as a recognition result. When the degree of similarity of the recognition candidates is smaller than the second threshold value, it outputs the recognition candidates as a recognition result only if their degree of similarity is greater than the first threshold value and, at the same time, greater than the degree of similarity of the rejection candidates. It should be noted that the first threshold value is a measure used for rejecting input speech. The second threshold value is larger than the first threshold value and is used as a measure for outputting recognition candidates as a recognition result.
    Type: Grant
    Filed: November 13, 2007
    Date of Patent: June 14, 2011
    Assignee: Fujitsu Limited
    Inventor: Shouji Harada
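The two-threshold decision logic of the abstract above fits in a single function. The names are mine; the branch structure follows the abstract directly: accept outright above the higher threshold, accept between the thresholds only when the candidate also beats the best rejection candidate, and reject otherwise.

```python
def verdict(cand_score, reject_score, t1, t2):
    """Two-threshold speech verification verdict.

    t1: lower threshold, the measure for rejecting input speech.
    t2: higher threshold (t2 > t1), the measure for outputting the
        recognition candidates directly as a result.
    """
    if cand_score > t2:
        return "accept"  # confidently above the second threshold
    if cand_score > t1 and cand_score > reject_score:
        return "accept"  # borderline, but beats the rejection candidates
    return "reject"
```

The middle branch is what distinguishes this scheme from a single fixed threshold: a borderline candidate is only kept when it outranks the competing rejection hypotheses.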
  • Publication number: 20110071828
    Abstract: A speech discriminability assessment system includes: a biological signal measurement section for measuring an electroencephalogram signal of a user; a presented-speech sound control section for determining a speech sound to be presented to the user by referring to a speech sound database retaining a plurality of monosyllabic sound data; an audio presentation section for presenting an audio associated with the determined speech sound to the user; a character presentation section for presenting a character associated with the determined speech sound to the user, subsequent to the presentation of the audio by the audio presentation section; an unexpectedness detection section for detecting presence or absence of an unexpectedness signal from the measured electroencephalogram signal of the user, the unexpectedness signal representing a positive component at 600 ms±100 ms after a time point when the character was presented to the user; and a speech sound discriminability determination section for determining a speech sound discriminability.
    Type: Application
    Filed: December 3, 2010
    Publication date: March 24, 2011
    Inventors: Shinobu ADACHI, Koji Morikawa
  • Patent number: 7912720
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog between a human and a computing device, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
    Type: Grant
    Filed: July 20, 2005
    Date of Patent: March 22, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Patent number: 7904297
    Abstract: Representation-neutral dialogue systems and methods (“RNDS”) are described that include multi-application, multi-device spoken-language dialogue systems based on the information-state update approach. The RNDS includes representation-neutral core components of a dialogue system that provide scripted domain-specific extensions to routines such as dialogue move modeling and reference resolution, easy substitution of specific semantic representations and associated routines, and clean interfaces to external components for language-understanding (i.e., speech-recognition and parsing) and language-generation, and to domain-specific knowledge sources. The RNDS also resolves multi-device dialogue by evaluating and selecting among candidate dialogue moves based on features at multiple levels. Multiple sources of information are combined, multiple speech recognition and parsing hypotheses tested, and multiple device and moves considered to choose the highest scoring hypothesis overall.
    Type: Grant
    Filed: December 8, 2005
    Date of Patent: March 8, 2011
    Assignee: Robert Bosch GmbH
    Inventors: Danilo Mirkovic, Lawrence Cavedon, Matthew Purver, Florin Ratiu, Tobias Scheideck, Fuliang Weng, Qi Zhang, Kui Xu
  • Publication number: 20110035219
    Abstract: A language identification system that includes a universal phoneme decoder (UPD) is described. The UPD contains a universal phoneme set representing both 1) all phonemes occurring in the set of two or more spoken languages, and 2) captures phoneme correspondences across languages, such that a set of unique phoneme patterns and probabilities are calculated in order to identify a most likely phoneme occurring each time in the audio files in the set of two or more potential languages in which the UPD was trained on. Each statistical language models (SLM) uses the set of unique phoneme patterns created for each language in the set to distinguish between spoken human languages in the set of languages. The run-time language identifier module identifies a particular human language being spoken by utilizing the linguistic probabilities supplied by the one or more SLMs that are based on the set of unique phoneme patterns created for each language.
    Type: Application
    Filed: August 4, 2009
    Publication date: February 10, 2011
    Applicant: AUTONOMY CORPORATION LTD.
    Inventors: Mahapathy Kadirkamanathan, Christopher John Waple
  • Patent number: 7860714
    Abstract: The present invention is a detection system of a segment including specific sound signal which detects a segment in a stored sound signal similar to a reference sound signal, including: a reference signal spectrogram division portion which divides a reference signal spectrogram into spectrograms of small-regions; a small-region reference signal spectrogram coding portion which encodes the small-region reference signal spectrogram to a reference signal small-region code; a small-region stored signal spectrogram coding portion which encodes a small-region stored signal spectrogram to a stored signal small-region code; a similar small-region spectrogram detection portion which detects a small-region spectrogram similar to the small-region reference signal spectrograms based on a degree of similarity of a code; and a degree of segment similarity calculation portion which uses a degree of small-region similarity and calculates a degree of similarity between the segment of the stored signal and the reference signal.
    Type: Grant
    Filed: July 1, 2005
    Date of Patent: December 28, 2010
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Hidehisa Nagano, Takayuki Kurozumi, Kunio Kashino
  • Patent number: 7842873
    Abstract: A system and method for detecting a refrain in an audio file having vocal components. The method and system includes generating a phonetic transcription of a portion of the audio file, analyzing the phonetic transcription and identifying a vocal segment in the generated phonetic transcription that is repeated frequently. The method and system further relate to the speech-driven selection based on similarity of detected refrain and user input.
    Type: Grant
    Filed: February 12, 2007
    Date of Patent: November 30, 2010
    Assignee: Harman Becker Automotive Systems GmbH
    Inventors: Franz S. Gerl, Daniel Willett, Raymond Brueckner
  • Patent number: 7822604
    Abstract: One embodiment of the present method and apparatus for identifying a conversing pair of users of a two-way speech medium includes receiving a plurality of binary voice activity streams, where the plurality of voice activity streams includes a first voice activity stream associated with a first user, and pairing the first voice activity stream with a second voice activity stream associated with a second user, in accordance with a complementary similarity between the first voice activity stream and the second voice activity stream.
    Type: Grant
    Filed: October 31, 2006
    Date of Patent: October 26, 2010
    Assignee: International Business Machines Corporation
    Inventors: Lisa Amini, Eric Bouillet, Olivier Verscheure, Michail Vlachos
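The pairing idea in the abstract above relies on the fact that in a real conversation the two voice activity streams tend to interleave: one party speaks while the other listens. One plausible formalization (my assumption; the patent's exact measure may differ) scores a pair by the fraction of frames in which exactly one stream is active.

```python
def complementary_similarity(a, b):
    """Fraction of frames where exactly one of two equal-length binary
    voice activity streams is active. High values suggest turn-taking,
    i.e. a conversing pair."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b)) / len(a)

def best_pair(first, others):
    """Pair the first user's stream with the candidate stream that
    interleaves with it best. `others` maps user names to streams."""
    return max(others, key=lambda name: complementary_similarity(first, others[name]))
```

Two identical streams (both talking over each other, or a stream paired with itself) score 0, while perfectly alternating streams score 1.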
  • Patent number: 7813925
    Abstract: At adjacent times, or when an observation signal changes only slightly, the distribution that maximizes the output probability of a mixture distribution is highly likely to remain the same. Using this fact, when obtaining the output probability of a mixture-distribution HMM, the distribution yielding the maximum output probability is stored. At adjacent times, or when the observation signal is determined to have changed only slightly, the output probability of the stored distribution serves as the output probability of the mixture distribution. This avoids computing the output probabilities of the other distributions when calculating the output probability of the mixture distribution, thereby reducing the calculation amount required for output probabilities.
    Type: Grant
    Filed: April 6, 2006
    Date of Patent: October 12, 2010
    Assignee: Canon Kabushiki Kaisha
    Inventors: Hiroki Yamamoto, Masayuki Yamada
  • Patent number: 7813928
    Abstract: A speech recognition device presenting whether a user's utterance is an unregistered word and whether the utterance should be repeated. The device includes a vocabulary storage unit (102) defining a vocabulary for speech recognition, and a speech recognition unit (101) checking the uttered speech against registered words. The device also includes a similarity calculation unit (103) calculating a similarity between the uttered speech and acoustic units, a judgment unit (104) judging, based on the check by the speech recognition unit (101) and the calculation performed by the similarity calculation unit (103), whether the uttered speech is a registered or unregistered word, an unregistered word unit (106) storing unregistered words, an unregistered word candidate search unit (105) searching the unregistered word unit (106) for unregistered word candidates when the judgment unit (104) judges that the uttered speech is an unregistered word, and a display unit (107) displaying the result.
    Type: Grant
    Filed: June 2, 2005
    Date of Patent: October 12, 2010
    Assignee: Panasonic Corporation
    Inventors: Yoshiyuki Okimoto, Tsuyoshi Inoue, Takashi Tsuzuki
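The registered/unregistered judgment and candidate search can be illustrated as a score comparison against a free acoustic-unit (filler) path, followed by an edit-distance search of the unregistered-word store. All names and the decision rule are illustrative assumptions, not the patent's numbered units.

```python
def judge_and_search(best_word_score, filler_score, utterance,
                     unregistered_store, margin=0.0):
    """If the best registered-word score does not beat the filler score by
    `margin`, judge the utterance unregistered and rank stored candidates
    by edit distance to the (text form of the) utterance."""
    if best_word_score >= filler_score + margin:
        return "registered", []

    def edit_distance(a, b):
        # Standard Levenshtein distance via dynamic programming.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    ranked = sorted(unregistered_store, key=lambda w: edit_distance(utterance, w))
    return "unregistered", ranked[:3]
```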
  • Patent number: 7809579
    Abstract: Polyphonic signals are used to create a main signal, typically a mono signal, and a side signal. A number of encoding schemes for the side signal are provided. Each encoding scheme is characterized by a set of sub-frames of different lengths. The total length of the sub-frames corresponds to the length of the encoding frame of the encoding scheme. The encoding scheme to be used on the side signal is selected dependent on the present signal content of the polyphonic signals. In a preferred embodiment, a side residual signal is created as the difference between the side signal and the main signal scaled with a balance factor. The balance factor is selected to minimize the side residual signal. The optimized side residual signal and the balance factor are encoded and provided as encoding parameters representing the side signal.
    Type: Grant
    Filed: December 15, 2004
    Date of Patent: October 5, 2010
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Stefan Bruhn, Ingemar Johansson, Anisse Taleb, Daniel Enström
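The balance-factor step has a closed form: choosing b to minimize the energy of the side residual s − b·m is a least-squares fit, giving b = Σ(s·m) / Σ(m·m). A minimal sketch under that reading (function name assumed):

```python
def encode_side(main, side):
    """Least-squares balance factor b minimising sum((side - b*main)**2),
    plus the resulting side residual signal."""
    num = sum(s * m for s, m in zip(side, main))
    den = sum(m * m for m in main)
    b = num / den if den else 0.0
    residual = [s - b * m for s, m in zip(side, main)]
    return b, residual
```

With main = (L+R)/2 and side = (L−R)/2, a side signal that is just a scaled copy of the main signal yields a zero residual, so only the balance factor needs to be transmitted.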
  • Patent number: 7752044
    Abstract: To increase the robustness and/or the recognition rate of methods for recognizing speech, it is proposed to include phone boundary verification measure features in the process of obtaining and/or generating confidence measures for obtained recognition results.
    Type: Grant
    Filed: October 10, 2003
    Date of Patent: July 6, 2010
    Assignee: Sony Deutschland GmbH
    Inventors: Yin Hay Lam, Ralf Kompe
  • Patent number: 7739110
    Abstract: A method and an apparatus for multimedia data management are disclosed. The method provides an indexing and retrieval scheme for digital photos with speech annotations, based on image-like patterns transformed from the recognized syllable candidates. For annotated spoken content, the recognized n-best syllable candidates are transformed into a sequence of syllable-transformed patterns. Eigen-image analysis is further adopted to extract the significant information and reduce the dimensionality. Vector quantization is applied to quantize the syllable-transformed patterns into feature vectors for indexing. The indexing scheme reduces the dimensionality and noise of the data, and improves speech-annotated photo retrieval performance by 16.26%.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: June 15, 2010
    Assignee: Industrial Technology Research Institute
    Inventors: Chung-Hsien Wu, Yu-Sheng Lai, Chien-Lin Huang, Chia-Hao Kang
  • Patent number: 7738635
    Abstract: A method for improving the recognition confidence of alphanumeric spoken input, suitable for use in a speech recognition telephony application such as a voice response system. An alphanumeric candidate is determined from the spoken input, which may be the best available representation of the spoken input. Recognition confidence is compared with a preestablished threshold. If the recognition confidence exceeds the threshold, the alphanumeric candidate is selected to represent the spoken input. Otherwise, present call data associated with the spoken input is determined. Call data may include automatic number identification (ANI) information, caller-ID information, and/or dialed number information service (DNIS) information. Information associated with the alphanumeric candidate and information associated with the present call data are correlated in order to select alphanumeric information that best represents the spoken input.
    Type: Grant
    Filed: January 6, 2005
    Date of Patent: June 15, 2010
    Assignees: International Business Machines Corporation; Nuance Communications, Inc.
    Inventors: Christopher Ryan Groves, Kevin James Muterspaugh
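The threshold-then-correlate flow can be sketched as follows; the character-overlap correlation and all names are assumptions standing in for the patent's matching of candidate information against ANI/caller-ID/DNIS records.

```python
def resolve_alphanumeric(candidate, confidence, call_records, threshold=0.8):
    """Accept the recognizer's candidate when confidence clears the
    threshold; otherwise fall back to call data (e.g. entries associated
    with the caller's ANI) and pick the stored entry that best matches
    the candidate."""
    if confidence >= threshold:
        return candidate

    def overlap(a, b):
        # Positional character agreement as a crude correlation measure.
        return sum(1 for x, y in zip(a, b) if x == y)

    return max(call_records, key=lambda rec: overlap(rec, candidate))
```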
  • Publication number: 20100121639
    Abstract: The described implementations relate to speech spelling by a user. One method identifies one or more symbols that may match a user utterance and displays an individual symbol for confirmation by the user.
    Type: Application
    Filed: November 11, 2008
    Publication date: May 13, 2010
    Applicant: Microsoft Corporation
    Inventor: Geoffrey Zweig
  • Patent number: 7698138
    Abstract: A broadcast receiving system includes a broadcast receiving part for receiving a broadcast in which additional information that corresponds to an object appearing in broadcast contents and that contains keyword information for specifying the object is broadcasted simultaneously with the broadcast contents; a recognition vocabulary generating section for generating a recognition vocabulary set in a manner corresponding to the additional information by using a synonym dictionary; a speech recognition section for performing the speech recognition of a voice uttered by a viewing person, and for thereby specifying keyword information corresponding to a recognition vocabulary set when a word recognized as the speech recognition result is contained in the recognition vocabulary set; and a displaying section for displaying additional information corresponding to the specified keyword information.
    Type: Grant
    Filed: December 26, 2003
    Date of Patent: April 13, 2010
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai, Hideyuki Yoshida, Yoshifumi Hirose
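The synonym-based vocabulary generation and keyword lookup can be sketched as a simple dictionary expansion. The dictionary shape and function names are assumptions, not the patent's components.

```python
def build_vocabulary(keyword, synonym_dict):
    """Recognition vocabulary set for a keyword: the keyword itself plus
    its synonyms from the dictionary."""
    return {keyword, *synonym_dict.get(keyword, ())}

def match_keyword(recognized_word, keywords, synonym_dict):
    """Return the keyword whose vocabulary set contains the recognized
    word, or None if no set matches."""
    for kw in keywords:
        if recognized_word in build_vocabulary(kw, synonym_dict):
            return kw
    return None
```

A viewer saying "film" would thus select the additional information keyed to "movie" if the synonym dictionary links the two.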
  • Patent number: 7684986
    Abstract: An apparatus, medium, and method for recognizing speech. The method may include calculating scores indicating the degree of similarity between a characteristic of an input speech and the characteristics of speech models, based on the degree of similarity between the length of each phoneme in the input speech and the length of the phonemes in each speech model, and determining the speech model with the highest score to be the recognized speech for the input. By doing so, the speech recognition rate may be greatly enhanced, and when an input speech includes continuous identical phonemes the word error rate (WER) may be greatly reduced.
    Type: Grant
    Filed: December 23, 2005
    Date of Patent: March 23, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Icksang Han, Sangbae Jeong, Jeongsu Kim
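The duration term can be illustrated as a per-phoneme length ratio averaged over the utterance and interpolated with the acoustic score. The ratio form and the weight `w` are assumptions, not the patent's exact scoring.

```python
def duration_score(input_lengths, model_lengths):
    """Similarity between two phoneme-duration sequences: the ratio of the
    shorter to the longer duration per phoneme, averaged; 1.0 means the
    durations agree exactly."""
    ratios = [min(a, b) / max(a, b)
              for a, b in zip(input_lengths, model_lengths)]
    return sum(ratios) / len(ratios)

def combined_score(acoustic, input_lengths, model_lengths, w=0.3):
    """Interpolate the acoustic similarity with the duration similarity."""
    return (1 - w) * acoustic + w * duration_score(input_lengths, model_lengths)
```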
  • Patent number: 7680663
    Abstract: A hidden dynamics value in speech is represented by a higher order, discretized dynamic model, which predicts the discretized dynamic variable that changes over time. Parameters are trained for the model. A decoder algorithm is developed for estimating the underlying phonological speech units in sequence that correspond to the observed speech signal using the higher order, discretized dynamic model.
    Type: Grant
    Filed: August 21, 2006
    Date of Patent: March 16, 2010
    Assignee: Microsoft Corporation
    Inventor: Li Deng
  • Patent number: 7634405
    Abstract: The subject invention leverages spectral “palettes” or representations of an input sequence to provide recognition and/or synthesizing of a class of data. The class can include, but is not limited to, individual events, distributions of events, and/or environments relating to the input sequence. The representations are compressed versions of the data that utilize a substantially smaller amount of system resources to store and/or manipulate. Segments of the palettes are employed to facilitate in reconstruction of an event occurring in the input sequence. This provides an efficient means to recognize events, even when they occur in complex environments. The palettes themselves are constructed or “trained” utilizing any number of data compression techniques such as, for example, epitomes, vector quantization, and/or Huffman codes and the like.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: December 15, 2009
    Assignee: Microsoft Corporation
    Inventors: Sumit Basu, Nebojsa Jojic, Ashish Kapoor
  • Publication number: 20090271195
    Abstract: A speech recognition apparatus is provided that attains high recognition accuracy within practical processing time on a computing machine of standard performance, by appropriately adapting a language model to speech about a certain topic, irrespective of the topic's degree of detail and diversity and of the confidence score of an initial speech recognition result.
    Type: Application
    Filed: July 6, 2007
    Publication date: October 29, 2009
    Applicant: NEC Corporation
    Inventors: Tasuku Kitade, Takafumi Koshinaka
  • Publication number: 20090259465
    Abstract: A method and system for training an automatic speech recognition system are provided. The method includes separating training data into speaker specific segments, and for each speaker specific segment, performing the following acts: generating spectral data, selecting a first warping factor and warping the spectral data, and comparing the warped spectral data with a speech model. The method also includes iteratively performing the steps of selecting another warping factor and generating another warped spectral data, comparing the other warped spectral data with the speech model, and if the other warping factor produces a closer match to the speech model, saving the other warping factor as the best warping factor for the speaker specific segment. The system includes modules configured to control a processor in the system to perform the steps of the method.
    Type: Application
    Filed: June 24, 2009
    Publication date: October 15, 2009
    Applicant: AT&T Corp.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
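The warping-factor search described above amounts to a grid search: warp the spectrum's frequency axis by each candidate factor and keep the factor whose warped spectrum best matches the speech model. A sketch with piecewise-linear resampling and squared-error scoring (both are assumptions; real systems typically score against an HMM likelihood):

```python
def best_warping_factor(spectrum, model, alphas=(0.88, 0.94, 1.0, 1.06, 1.12)):
    """Return the warping factor whose warped spectrum is closest to the model."""
    n = len(spectrum)

    def warp(alpha):
        # Resample the spectrum at f/alpha via linear interpolation,
        # stretching (alpha < 1) or compressing (alpha > 1) the axis.
        out = []
        for i in range(n):
            pos = min(i / alpha, n - 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            out.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
        return out

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(alphas, key=lambda a: dist(warp(a), model))
```

Iterating this per speaker-specific segment, and saving the winning alpha, mirrors the loop the abstract describes.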
  • Publication number: 20090240498
    Abstract: Systems and methods to perform short text segment similarity measures. Illustratively, a short text segment similarity environment comprises a short text engine operative to process data representative of short segments of text, and an instruction set comprising at least one instruction to instruct the short text engine to process data representative of short text segment inputs according to a selected short text similarity identification paradigm. Illustratively, two or more short text segments can be received as input by the short text engine, along with a request to identify similarities among them. Responsive to the request and data input, the short text engine executes a selected similarity identification technique in accordance with the short text similarity identification paradigm to process the received data and to identify similarities between the short text segment inputs.
    Type: Application
    Filed: March 19, 2008
    Publication date: September 24, 2009
    Applicant: Microsoft Corporation
    Inventors: Wen-tau Yih, Alexei V. Bocharov, Christopher A. Meek
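One concrete similarity identification technique such an engine might select is bag-of-words cosine similarity; this is an illustrative choice, not necessarily one of the patent's paradigms.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two short text segments over
    lower-cased word counts; 1.0 for identical bags, 0.0 for disjoint."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```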
  • Patent number: 7590537
    Abstract: A speech recognition method and apparatus perform speaker clustering and speaker adaptation using average model variation information over speakers, analyzing both the quantity variation amount and the directional variation amount. In the speaker clustering method, a speaker group model variation is generated based on the model variation between a speaker-independent model and a training speaker ML model. In the speaker adaptation method, the model whose variation between the test speaker ML model and the ML model of the speaker group to which the test speaker belongs is most similar to a training speaker group model variation is found, and speaker adaptation is performed on that model. Herein, the model variations in the speaker clustering and the speaker adaptation are calculated by analyzing both the quantity variation amount and the directional variation amount. The present invention may be applied to any speaker adaptation algorithm, such as MLLR and MAP.
    Type: Grant
    Filed: December 27, 2004
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Namhoon Kim, Injeong Choi, Yoonkyung Song
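Treating a model variation as a pair (quantity, direction) suggests comparing two variations by both a magnitude ratio and the cosine between their directions. A sketch over mean vectors (the combination rule below is an assumption, not the patent's measure):

```python
import math

def model_variation(model_a, model_b):
    """Model variation between two mean vectors as a (quantity, direction)
    pair: the Euclidean norm of the difference and its unit direction."""
    diff = [b - a for a, b in zip(model_a, model_b)]
    magnitude = math.sqrt(sum(d * d for d in diff))
    direction = [d / magnitude for d in diff] if magnitude else diff
    return magnitude, direction

def variation_similarity(var1, var2):
    """Compare two variations on both amount and direction: ratio of the
    magnitudes times the cosine of the angle between the directions."""
    (m1, d1), (m2, d2) = var1, var2
    if not m1 or not m2:
        return 0.0
    cos = sum(x * y for x, y in zip(d1, d2))
    return (min(m1, m2) / max(m1, m2)) * cos
```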
  • Patent number: 7584099
    Abstract: A method, a system and a computer program product for interpreting a verbal input in a multimodal dialog system are provided. The method includes assigning (302) a confidence value to at least one word generated by a verbal recognition component. The method further includes generating (304) a semantic unit confidence score for the verbal input. The generation of a semantic unit confidence score is based on the confidence value of at least one word and at least one semantic confidence operator.
    Type: Grant
    Filed: April 6, 2005
    Date of Patent: September 1, 2009
    Assignee: Motorola, Inc.
    Inventors: Changxue C. Ma, Harry M. Bliss, Yan M. Cheng
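A semantic unit confidence score built from word confidences via a semantic confidence operator might, for instance, take the minimum, average, or product of the word scores; these operators are illustrative assumptions, not the patent's defined set.

```python
import math

def semantic_confidence(word_confidences, operator="min"):
    """Combine per-word confidence values into a semantic unit confidence
    score using a named semantic confidence operator."""
    ops = {
        "min": min,                                   # weakest-link score
        "avg": lambda xs: sum(xs) / len(xs),          # mean confidence
        "prod": lambda xs: math.prod(xs),             # joint confidence
    }
    return ops[operator](word_confidences)
```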
  • Patent number: 7571093
    Abstract: A method of identifying duplicate voice recordings by receiving digital voice recordings; selecting one of the recordings; segmenting the selected recording; extracting a pitch value per segment; estimating the total time that voice appears in the recording; removing pitch values less than or equal to a user-definable value; identifying unique pitch values; determining the frequency of occurrence of the unique pitch values; normalizing the frequencies of occurrence; determining an average pitch value; determining the distribution percentiles of the frequencies of occurrence; returning to the second step if additional recordings are to be processed; and otherwise comparing the total voice time, average pitch value, and distribution percentiles for each recording processed, and declaring as duplicates those recordings that match to within user-definable thresholds for total voice time, average pitch value, and distribution percentiles.
    Type: Grant
    Filed: August 17, 2006
    Date of Patent: August 4, 2009
    Assignee: The United States of America as represented by the Director, National Security Agency
    Inventor: Adolf Cusmariu
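The comparison pipeline in this abstract maps naturally to a per-recording signature plus a thresholded comparison. A simplified sketch (the percentile convention, the quartile choice, and the thresholds are assumptions):

```python
def pitch_signature(pitch_values, floor=60.0):
    """Summary of per-segment pitch values: drop values at or below the
    floor, then return (voiced segment count, average pitch, quartiles)."""
    voiced = sorted(p for p in pitch_values if p > floor)
    if not voiced:
        return 0, 0.0, (0.0, 0.0, 0.0)
    avg = sum(voiced) / len(voiced)

    def pct(q):
        # Nearest-rank percentile of the retained pitch values.
        return voiced[min(int(q * len(voiced)), len(voiced) - 1)]

    return len(voiced), avg, (pct(0.25), pct(0.5), pct(0.75))

def is_duplicate(sig_a, sig_b, time_tol=2, pitch_tol=5.0):
    """Declare two recordings duplicates when voiced time, average pitch,
    and the quartiles all agree within the given tolerances."""
    ta, pa, qa = sig_a
    tb, pb, qb = sig_b
    return (abs(ta - tb) <= time_tol and abs(pa - pb) <= pitch_tol
            and all(abs(x - y) <= pitch_tol for x, y in zip(qa, qb)))
```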