Similarity Patents (Class 704/239)
  • Patent number: 8731922
    Abstract: A method of accessing a dial-up service is disclosed. An example method of providing access to a service includes receiving a first speech signal from a user to form a first utterance; recognizing the first utterance using speaker independent speaker recognition; requesting the user to enter a personal identification number; and when the personal identification number is valid, receiving a second speech signal to form a second utterance and providing access to the service.
    Type: Grant
    Filed: April 30, 2013
    Date of Patent: May 20, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Robert Wesley Bossemeyer, Jr.
  • Patent number: 8731207
    Abstract: An embodiment of an apparatus for computing control information for a suppression filter for filtering a second audio signal to suppress an echo based on a first audio signal includes a computer having a value determiner for determining at least one energy-related value for a band-pass signal of at least two temporally successive data blocks of at least one signal of a group of signals. The computer further includes a mean value determiner for determining at least one mean value of the at least one determined energy-related value for the band-pass signal. The computer further includes a modifier for modifying the at least one energy-related value for the band-pass signal on the basis of the determined mean value for the band-pass signal. The computer further includes a control information computer for computing the control information for the suppression filter on the basis of the at least one modified energy-related value.
    Type: Grant
    Filed: January 12, 2009
    Date of Patent: May 20, 2014
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der Angewandten Forschung e.V.
    Inventors: Fabian Kuech, Markus Kallinger, Christof Faller, Alexis Favrot
  • Patent number: 8725829
    Abstract: A method and system is described which allows users to identify (pre-recorded) sounds such as music, radio broadcast, commercials, and other audio signals in almost any environment. The audio signal (or sound) must be a recording represented in a database of recordings. The service can quickly identify the signal from just a few seconds of excerption, while tolerating high noise and distortion. Once the signal is identified to the user, the user may perform transactions interactively in real-time or offline using the identification information.
    Type: Grant
    Filed: April 26, 2004
    Date of Patent: May 13, 2014
    Assignee: Shazam Investments Limited
    Inventors: Avery Li-Chun Wang, Christopher Jacques Penrose Barton, Dheeraj Shankar Mukherjee, Philip Inghelbrecht
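The lookup described in this abstract can be sketched as a hash-vote scheme: landmark hashes from the query are looked up in an inverted index, and a track is declared a match when many hashes agree on a single time offset. This is a toy illustration, not the patented implementation; hash extraction from real audio is omitted, and `build_index`/`identify` are illustrative names.

```python
from collections import defaultdict

def build_index(tracks):
    """tracks: {name: [(hash_value, time), ...]} -> inverted index."""
    index = defaultdict(list)
    for name, hashes in tracks.items():
        for h, t in hashes:
            index[h].append((name, t))
    return index

def identify(index, query_hashes, min_votes=3):
    """Vote on (track, time-offset) pairs; a consistent offset with enough
    votes identifies the recording despite noise and distortion."""
    votes = defaultdict(int)
    for h, qt in query_hashes:
        for name, dbt in index.get(h, []):
            votes[(name, dbt - qt)] += 1
    if not votes:
        return None
    (name, _), count = max(votes.items(), key=lambda kv: kv[1])
    return name if count >= min_votes else None
```

Offset-consistent voting is what lets a few seconds of excerpt suffice: spurious hash collisions scatter across offsets, while true matches pile up at one.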
  • Patent number: 8688451
    Abstract: A speech recognition method includes receiving input speech from a user, processing the input speech using a first grammar to obtain parameter values of a first N-best list of vocabulary, comparing a parameter value of a top result of the first N-best list to a threshold value, and if the compared parameter value is below the threshold value, then additionally processing the input speech using a second grammar to obtain parameter values of a second N-best list of vocabulary. Other preferred steps include: determining the input speech to be in-vocabulary if any of the results of the first N-best list is also present within the second N-best list, but out-of-vocabulary if none of the results of the first N-best list is within the second N-best list; and providing audible feedback to the user if the input speech is determined to be out-of-vocabulary.
    Type: Grant
    Filed: May 11, 2006
    Date of Patent: April 1, 2014
    Assignee: General Motors LLC
    Inventors: Timothy J. Grost, Rathinavelu Chengalvarayan
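The two-pass decision in this abstract can be sketched directly: the second grammar runs only when the top first-pass confidence is below threshold, and the utterance is in-vocabulary if any first-pass result reappears in the second N-best list. Recognizer calls are stubbed as data; the function name is illustrative, not from the patent.

```python
def classify_utterance(first_nbest, second_nbest_fn, threshold=0.5):
    """first_nbest: [(word, confidence), ...] sorted best-first.
    second_nbest_fn: callable producing the second grammar's N-best list,
    invoked only when the top first-pass confidence falls below threshold.
    Returns 'in-vocabulary' or 'out-of-vocabulary'."""
    if first_nbest and first_nbest[0][1] >= threshold:
        return "in-vocabulary"
    second_words = {w for w, _ in second_nbest_fn()}
    if any(w in second_words for w, _ in first_nbest):
        return "in-vocabulary"
    return "out-of-vocabulary"
```

An out-of-vocabulary result would then trigger the audible feedback step the abstract mentions.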
  • Patent number: 8682660
    Abstract: A system and a method to correct semantic interpretation recognition errors presented in this invention applies to Automatic Speech Recognition systems returning recognition results with semantic interpretations. The method finds the most likely intended semantic interpretation given the recognized sequence of words and the recognized semantic interpretation. The key point is the computation of the conditional probability of the recognized sequence of words given the recognized semantic interpretation and a particular intended semantic interpretation. It is done with the use of Conditional Language Models which are Statistical Language Models trained on a corpus of utterances collected under the condition of a particular recognized semantic interpretation and a particular intended semantic interpretation. Based on these conditional probabilities and the joint probabilities of the recognized and intended semantic interpretations, new semantic interpretation confidences are computed.
    Type: Grant
    Filed: May 16, 2009
    Date of Patent: March 25, 2014
    Assignee: Resolvity, Inc.
    Inventors: Yevgeniy Lyudovyk, Jacek Jarmulak
  • Patent number: 8676580
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
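The modification step above can be sketched as follows: each language-model word's transition cost is inflated in proportion to its best acoustic similarity to any rule-based grammar word. Approximating acoustic similarity with a normalized edit distance over letter strings is an assumption of this sketch; the patent does not prescribe a particular measure.

```python
def edit_distance(a, b):
    """Standard Wagner-Fischer dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    return 1.0 - edit_distance(a, b) / max(len(a), len(b), 1)

def modified_costs(lm_words, grammar_words, base_cost=1.0, penalty=2.0):
    """Return {lm_word: transition cost}, raised by the word's best
    similarity to any grammar word, biasing decoding toward the grammar."""
    return {w: base_cost + penalty * max(similarity(w, g) for g in grammar_words)
            for w in lm_words}
```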
  • Patent number: 8660844
    Abstract: Systems, methods and computer-readable media associated with using a divergence metric to evaluate user simulations in a spoken dialog system. The method employs user simulations of a spoken dialog system and includes aggregating a first set of one or more scores from a real user dialog, aggregating a second set of one or more scores from a simulated user dialog associated with a user model, determining a similarity of distributions associated with each of the first set and the second set, wherein the similarity is determined using a divergence metric that does not require any assumptions regarding a shape of the distributions. It is preferable to use a Cramér-von Mises divergence.
    Type: Grant
    Filed: November 1, 2007
    Date of Patent: February 25, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
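The divergence step above can be sketched with the two-sample Cramér-von Mises statistic, which compares the empirical CDFs of the two score sets and, as the abstract notes, requires no assumption about the shape of either distribution.

```python
def cvm_divergence(real_scores, sim_scores):
    """Two-sample Cramér-von Mises statistic between scores aggregated
    from real dialogs and from simulated dialogs."""
    def ecdf(sample, x):
        return sum(s <= x for s in sample) / len(sample)
    combined = sorted(real_scores) + sorted(sim_scores)
    n, m = len(real_scores), len(sim_scores)
    total = sum((ecdf(real_scores, x) - ecdf(sim_scores, x)) ** 2
                for x in combined)
    return n * m / (n + m) ** 2 * total
```

A small value indicates the user model produces dialog scores distributed like those of real users.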
  • Patent number: 8655655
    Abstract: A sound event detecting module for detecting whether a sound event with the characteristic of repeating is generated. A sound end recognizing unit recognizes ends of sounds according to a sound signal to generate sound sections and multiple sets of feature vectors of the sound sections correspondingly. A storage unit stores at least M sets of feature vectors. A similarity comparing unit compares the at least M sets of feature vectors with each other, and correspondingly generates a similarity score matrix, which stores similarity scores of any two of the at least M sound sections. A correlation arbitrating unit determines the number of sound sections with high correlations to each other according to the similarity score matrix. When the number is greater than a threshold value, the correlation arbitrating unit indicates that the sound event with the characteristic of repeating is generated.
    Type: Grant
    Filed: December 30, 2010
    Date of Patent: February 18, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Yuh-Ching Wang, Kuo-Yuan Li
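The arbitration step can be illustrated with a toy version: given the matrix of pairwise similarity scores between the M most recent sound sections, count the sections highly correlated with at least one other section and flag a repeating event when the count exceeds a threshold. The score values and thresholds here are illustrative.

```python
def repeating_event(sim_matrix, high=0.8, min_sections=2):
    """sim_matrix[i][j]: similarity score of sound sections i and j.
    Returns True when more than min_sections sections each have a
    high-similarity partner, i.e. a repeating sound event is detected."""
    m = len(sim_matrix)
    correlated = sum(
        any(sim_matrix[i][j] >= high for j in range(m) if j != i)
        for i in range(m))
    return correlated > min_sections
```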
  • Publication number: 20140012575
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Patent number: 8620655
    Abstract: A speech processing method, comprising: receiving a speech input which comprises a sequence of feature vectors; determining the likelihood of a sequence of words arising from the sequence of feature vectors using an acoustic model and a language model, comprising: providing an acoustic model for performing speech recognition on an input signal which comprises a sequence of feature vectors, said model having a plurality of model parameters relating to the probability distribution of a word or part thereof being related to a feature vector, wherein said speech input is a mismatched speech input which is received from a speaker in an environment which is not matched to the speaker or environment under which the acoustic model was trained; and adapting the acoustic model to the mismatched speech input, the speech processing method further comprising determining the likelihood of a sequence of features occurring in a given language using a language model; and combining the likelihoods determined by the acoustic model and the language model.
    Type: Grant
    Filed: August 10, 2011
    Date of Patent: December 31, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Haitian Xu, Kean Kheong Chin, Mark John Francis Gales
  • Patent number: 8606580
    Abstract: To provide a data process unit and data process unit control program that are suitable for generating acoustic models for unspecified speakers taking distribution of diversifying feature parameters into consideration under such specific conditions as the type of speaker, speech lexicons, speech styles, and speech environment and that are suitable for providing acoustic models intended for unspecified speakers and adapted to speech of a specific person. The data process unit comprises a data classification section, data storing section, pattern model generating section, data control section, mathematical distance calculating section, pattern model converting section, pattern model display section, region dividing section, division changing section, region selecting section, and specific pattern model generating section.
    Type: Grant
    Filed: December 30, 2008
    Date of Patent: December 10, 2013
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Makoto Shozakai, Goshu Nagino
  • Publication number: 20130325470
    Abstract: A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the first phonogram and the second phonogram. Referential utterances of speech signals are selected from the first phonogram and the second phonogram, where the referential utterances include formant paths of at least three formant frequencies. The selected referential utterances including at least two identical formant frequencies are compared therebetween. Similarity of the compared referential utterances from matching other formant frequencies is evaluated, where similarity of the phonograms is determined from evaluation of similarity of all the compared referential utterances.
    Type: Application
    Filed: July 31, 2013
    Publication date: December 5, 2013
    Applicant: Obschestvo s ogranichennoi otvetstvennost'yu "Centr Rechevyh Tehnologij"
    Inventor: Sergey Lvovich Koval
  • Publication number: 20130325469
    Abstract: A method for providing a voice recognition function and an electronic device thereof are provided. The method provides a voice recognition function in an electronic device that includes outputting, when a voice instruction is input, a list of prediction instructions that are candidate instructions similar to the input voice instruction, updating, when a correction instruction correcting the output candidate instructions is input, the list of prediction instructions, and performing, if the correction instruction matches with an instruction of high similarity in the updated list of prediction instructions, a voice recognition function corresponding to the voice instruction.
    Type: Application
    Filed: May 24, 2013
    Publication date: December 5, 2013
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Hee-Woon KIM, Yu-Mi AHN, Seon-Hwa KIM, Ha-Young JEON
  • Patent number: 8600747
    Abstract: A spoken dialog system and method having a dialog management module are disclosed. The dialog management module includes a plurality of dialog motivators for handling various operations during a spoken dialog. The dialog motivators comprise error handling, disambiguation, assumption, confirmation, missing information, and continuation. The spoken dialog system uses the assumption dialog motivator in either a-priori or a-posteriori modes. A-priori assumption is based on predefined requirements for the call flow and a-posteriori assumption can work with the confirmation dialog motivator to assume the content of received user input and confirm received user input.
    Type: Grant
    Filed: June 17, 2008
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alicia Abella, Allen Louis Gorin
  • Publication number: 20130304469
    Abstract: Among multiple documents presented to a user, a high interest and a low interest document are specified, a word group in the high interest document is compared with a word group in the low interest document, and a string of word groups associated with weight values is generated as a user feature vector. A word group included in each of multiple data items targeted for assigning priorities is extracted, and data feature vectors are generated specific to each data item, based on the word groups extracted. A degree of similarity between each data feature vector of the multiple data items and the user feature vector is obtained, and according to the degree of similarity, priorities are assigned to the multiple data items to be presented to the user. Therefore, it is possible to extract user feature information that more effectively reflects the user's interests and tastes.
    Type: Application
    Filed: April 29, 2013
    Publication date: November 14, 2013
    Inventors: Tomihisa KAMADA, Keisuke HARA
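The ranking step above amounts to comparing weighted word bags. A minimal sketch, assuming cosine similarity as the degree-of-similarity measure (the abstract does not name one):

```python
import math

def cosine(u, v):
    """Cosine similarity between two {word: weight} feature vectors."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_items(user_vec, items):
    """items: {name: feature_dict} -> names sorted by similarity, best first."""
    return sorted(items, key=lambda n: cosine(user_vec, items[n]), reverse=True)
```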
  • Publication number: 20130304470
    Abstract: An electronic device used for detecting pornographic audio contents includes a memory, a reading module, a calculating module, a comparing module, and a determining module. The memory stores multiple sample curves of pornographic audio contents. The reading module accesses audio contents from an audio/video source. The calculating module calculates a plurality of pitch curves of the audio contents. The comparing module compares the pitch curves of the audio contents with the sample curves of pornographic audio contents to gain similarities of the pitch curves and the sample curves of pornographic audio contents. The determining module determines whether the audio contents are pornographic audio contents according to the similarities.
    Type: Application
    Filed: May 12, 2013
    Publication date: November 14, 2013
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventor: CHUN-TE WU
  • Patent number: 8583436
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Grant
    Filed: December 19, 2008
    Date of Patent: November 12, 2013
    Assignee: NEC Corporation
    Inventors: Hitoshi Yamamoto, Kiyokazu Miki
  • Patent number: 8583433
    Abstract: A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents starting with those segments that have the lowest confidence ratings. Transcription from the human agents is received for the segments assigned to that agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message.
    Type: Grant
    Filed: August 6, 2012
    Date of Patent: November 12, 2013
    Assignee: Intellisist, Inc.
    Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
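The routing logic in this abstract can be sketched plainly: segments whose ASR confidence falls below the threshold go to human agents, lowest confidence first, and the final message interleaves human transcriptions with the retained automatic text in original order. Function names are illustrative.

```python
def route_segments(segments, threshold=0.7):
    """segments: [(auto_text, confidence), ...] in message order.
    Returns (to_humans, keep): to_humans lists segment indices sorted
    by ascending confidence; keep lists indices left automatic."""
    to_humans = sorted(
        (i for i, (_, c) in enumerate(segments) if c < threshold),
        key=lambda i: segments[i][1])
    keep = [i for i, (_, c) in enumerate(segments) if c >= threshold]
    return to_humans, keep

def assemble(segments, human_text):
    """human_text: {index: transcription} for the routed segments."""
    return " ".join(human_text.get(i, t) for i, (t, _) in enumerate(segments))
```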
  • Publication number: 20130289988
    Abstract: A post-processing speech system includes a natural language-based speech recognition system that compares a spoken utterance to a natural language vocabulary that includes words used to generate a natural language speech recognition result. A master conversation module engine compares the natural language speech recognition result to domain specific words and phrases. A voting engine selects a word or a phrase from the domain specific words and phrases that is transmitted to an application control system. The application control system transmits one or more control signals that are used to control an internal or an external device or an internal or an external process.
    Type: Application
    Filed: April 30, 2012
    Publication date: October 31, 2013
    Applicant: QNX SOFTWARE SYSTEMS LIMITED
    Inventor: Darrin Kenneth Fry
  • Patent number: 8571864
    Abstract: A system and method are described for recognizing repeated audio material within at least one media stream without prior knowledge of the nature of the repeated material. The system and method are able to create a screening database from the media stream or streams. An unknown sample audio fragment is taken from the media stream and compared against the screening database to find if there are matching fragments within the media streams by determining if the unknown sample matches any samples in the screening database.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: October 29, 2013
    Assignee: Shazam Investments Limited
    Inventors: David L. DeBusk, Darren P. Briggs, Michael Karliner, Richard W. Cheong Tang, Avery Li-Chun Wang
  • Patent number: 8566088
    Abstract: Speech recognition is performed in near-real-time and improved by exploiting events and event sequences, employing machine learning techniques including boosted classifiers, ensembles, detectors and cascades and using perceptual clusters. Speech recognition is also improved using tandem processing. An automatic punctuator injects punctuation into recognized text streams.
    Type: Grant
    Filed: November 11, 2009
    Date of Patent: October 22, 2013
    Assignee: SCTI Holdings, Inc.
    Inventors: Mark Pinson, David Pinson, Sr., Mary Flanagan, Shahrokh Makanvand
  • Publication number: 20130262106
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection of documents terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Eyal HURVITZ, Ezra Daya, Oren Pereg, Moshe Wasserblat
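The detection step above reduces to a vocabulary difference: terms seen often enough in the harvested documents but absent from the initial model's vocabulary are the candidates to add. A minimal sketch, with whitespace tokenization and a frequency floor as assumptions:

```python
def new_terms(documents, lm_vocab, min_count=2):
    """Return terms seen at least min_count times across documents
    that the initial language model's vocabulary does not cover."""
    counts = {}
    for doc in documents:
        for term in doc.lower().split():
            counts[term] = counts.get(term, 0) + 1
    return sorted(t for t, c in counts.items()
                  if c >= min_count and t not in lm_vocab)
```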
  • Patent number: 8543399
    Abstract: An apparatus for speech recognition includes: a first confidence score calculator calculating a first confidence score using a ratio between a likelihood of a keyword model for feature vectors per frame of a speech signal and a likelihood of a Filler model for the feature vectors; a second confidence score calculator calculating a second confidence score by comparing a Gaussian distribution trace of the keyword model per frame of the speech signal with a Gaussian distribution trace sample of a stored corresponding keyword of the keyword model; and a determination module determining a confidence of a result using the keyword model in accordance with a position determined by the first and second confidence scores on a confidence coordinate system.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: September 24, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jae-hoon Jeong, Sang-bae Jeong, Jeong-su Kim, Nam-hoon Kim
  • Patent number: 8504364
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: August 6, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Thomas James Watson, Daniel Mark Schumacher
  • Patent number: 8494668
    Abstract: Character value of a sound signal is extracted for each unit portion, and degrees of similarity between the character values of the individual unit portions are calculated and arranged in a matrix configuration. The matrix has arranged in each column the degrees of similarity acquired by comparing, for each of the unit portions, the sound signal and a delayed sound signal obtained by delaying the sound signal by a time difference equal to an integral multiple of a time length of the unit portion, and it has a plurality of the columns in association with different time differences. Repetition probability is calculated for each of the columns corresponding to the different time differences in the matrix. A plurality of peaks in a distribution of the repetition probabilities are identified. The loop region in the sound signal is identified by collating a reference matrix with the degree of similarity matrix.
    Type: Grant
    Filed: February 19, 2009
    Date of Patent: July 23, 2013
    Assignee: Yamaha Corporation
    Inventors: Bee Suan Ong, Sebastian Streich, Takuya Fujishima, Keita Arimoto
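The matrix construction above can be illustrated with a toy: column d holds, per unit portion, the similarity between the signal and a copy delayed by d units, so a loop with period d shows up as a run of high values down column d. The feature-to-similarity mapping here is an illustrative choice, not the patented one.

```python
def lag_similarity_matrix(features, max_lag):
    """features: per-unit character values (floats). Returns matrix[t][d-1]
    = similarity between unit t and unit t-d (0.0 when t-d < 0)."""
    n = len(features)
    return [[1.0 / (1.0 + abs(features[t] - features[t - d]))
             if t - d >= 0 else 0.0
             for d in range(1, max_lag + 1)]
            for t in range(n)]

def best_lag(matrix):
    """Column with the highest summed similarity: the dominant repeat period."""
    cols = len(matrix[0])
    sums = [sum(row[d] for row in matrix) for d in range(cols)]
    return 1 + max(range(cols), key=sums.__getitem__)
```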
  • Patent number: 8489397
    Abstract: A machine-readable medium and a network device are provided for speech-to-text translation. Speech packets are received at a broadband telephony interface and stored in a buffer. The speech packets are processed and textual representations thereof are displayed as words on a display device. Speech processing is activated and deactivated in response to a command from a subscriber.
    Type: Grant
    Filed: September 11, 2012
    Date of Patent: July 16, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Charles David Caldwell, John Bruce Harlow, Robert J. Sayko, Norman Shaye
  • Patent number: 8447605
    Abstract: A game apparatus includes a CPU core for creating an input envelope and a registered envelope. The input envelope has a plurality of envelope values detected from a voice waveform input in real time through a microphone. The registered envelope has a plurality of envelope values detected from a voice waveform previously input. Both of the input envelope and the registered envelope are stored in a RAM. The CPU core evaluates difference of the envelope values between the input envelope and the registered envelope. When an evaluated value satisfies a condition, the CPU core executes a process according to a command assigned to the registered envelope.
    Type: Grant
    Filed: June 3, 2005
    Date of Patent: May 21, 2013
    Assignee: Nintendo Co., Ltd.
    Inventor: Yoji Inagaki
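The evaluation step can be sketched as an envelope comparison: the input envelope is scored against each registered envelope by mean absolute difference, and the command whose difference clears a tolerance fires. Names and the tolerance value are illustrative assumptions.

```python
def match_command(input_env, registered, tolerance=0.1):
    """registered: {command: envelope}; envelopes are equal-length lists of
    envelope values. Returns the best-matching command, or None when no
    registered envelope is close enough to the input."""
    best, best_diff = None, tolerance
    for command, env in registered.items():
        diff = sum(abs(a - b) for a, b in zip(input_env, env)) / len(env)
        if diff < best_diff:
            best, best_diff = command, diff
    return best
```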
  • Patent number: 8438028
    Abstract: A method of and system for managing nametags including receiving a command from a user to store a nametag, prompting the user to input a number to be stored in association with the nametag, receiving an input for the number from the user, prompting the user to input the nametag to be stored in association with the number, receiving an input for the nametag from the user, processing the nametag input, and calculating confusability of the nametag input in multiple individual domains including a nametag domain, a number domain, and a command domain.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: May 7, 2013
    Assignee: General Motors LLC
    Inventors: Rathinavelu Chengalvarayan, Lawrence D. Cepuran
  • Patent number: 8433569
    Abstract: A method of accessing a dial-up service is disclosed. An example method of providing access to a service includes receiving a first speech signal from a user to form a first utterance; recognizing the first utterance using speaker independent speaker recognition; requesting the user to enter a personal identification number; and when the personal identification number is valid, receiving a second speech signal to form a second utterance and providing access to the service.
    Type: Grant
    Filed: October 3, 2011
    Date of Patent: April 30, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Robert Wesley Bossemeyer, Jr.
  • Patent number: 8433575
    Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing for example a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: April 30, 2013
    Assignee: AMBX UK Limited
    Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
  • Publication number: 20130054242
    Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
    Type: Application
    Filed: August 24, 2011
    Publication date: February 28, 2013
    Applicant: SENSORY, INCORPORATED
    Inventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
  • Publication number: 20130046539
    Abstract: A method, an apparatus and an article of manufacture for automatic speech recognition. The method includes obtaining at least one language model word and at least one rule-based grammar word, determining an acoustic similarity of at least one pair of language model word and rule-based grammar word, and increasing a transition cost to the at least one language model word based on the acoustic similarity of the at least one language model word with the at least one rule-based grammar word to generate a modified language model for automatic speech recognition.
    Type: Application
    Filed: August 16, 2011
    Publication date: February 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Etienne Marcheret, Shajith I. Mohamed, Ashish Verma, Karthik Visweswariah
  • Patent number: 8374868
    Abstract: A method for recognizing speech involves reciting, into a speech recognition system, an utterance including a numeric sequence that contains a digit string including a plurality of tokens and detecting a co-articulation problem related to at least two potentially co-articulated tokens in the digit string. The numeric sequence may be identified using i) a dynamically generated possible numeric sequence that potentially corresponds with the numeric sequence, and/or ii) at least one supplemental acoustic model. Also disclosed herein is a system for accomplishing the same.
    Type: Grant
    Filed: August 21, 2009
    Date of Patent: February 12, 2013
    Assignee: General Motors LLC
    Inventors: Uma Arun, Sherri J Voran-Nowak, Rathinavelu Chengalvarayan, Gaurav Talwar
  • Patent number: 8355907
    Abstract: In one embodiment, the present invention comprises a vocoder having at least one input and at least one output, an encoder comprising a filter having at least one input operably connected to the input of the vocoder and at least one output, a decoder comprising a synthesizer having at least one input operably connected to the at least one output of the encoder, and at least one output operably connected to the at least one output of the vocoder, wherein the decoder comprises a memory and the decoder is adapted to execute instructions stored in the memory comprising phase matching and time-warping a speech frame.
    Type: Grant
    Filed: July 27, 2005
    Date of Patent: January 15, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Rohit Kapoor, Serafin Diaz Spindola
  • Publication number: 20130006630
    Abstract: A state detecting apparatus includes: a processor to execute acquiring utterance data related to uttered speech, computing a plurality of statistical quantities for feature parameters regarding features of the utterance data, creating, on the basis of the plurality of statistical quantities regarding the utterance data and another plurality of statistical quantities regarding reference utterance data based on other uttered speech, pseudo-utterance data having at least one statistical quantity equal to a statistical quantity in the other plurality of statistical quantities, computing a plurality of statistical quantities for synthetic utterance data synthesized on the basis of the pseudo-utterance data and the utterance data, and determining, on the basis of a comparison between statistical quantities of the synthetic utterance data and statistical quantities of the reference utterance data, whether the speaker who produced the uttered speech is in a first state or a second state; and a memory.
    Type: Application
    Filed: April 13, 2012
    Publication date: January 3, 2013
    Applicant: FUJITSU LIMITED
    Inventors: Shoji HAYAKAWA, Naoshi Matsuo
  • Publication number: 20120330662
    Abstract: An input supporting system (1) includes a database (10) which accumulates data for a plurality of items therein, an extraction unit (104) which compares, with the data for the items in the database (10), input data which is obtained as a result of a speech recognition process on speech data (D0), and extracts data similar to the input data from the database, and a presentation unit (106) which presents the extracted data as candidates to be registered in the database (10).
    Type: Application
    Filed: January 17, 2011
    Publication date: December 27, 2012
    Applicant: NEC CORPORATION
    Inventor: Masahiro Saikou
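The extraction step in this abstract (compare recognized input against stored items, keep the similar ones as registration candidates) can be sketched with Python's difflib; the similarity measure and threshold below are illustrative assumptions, not the patent's:

```python
from difflib import SequenceMatcher

def extract_candidates(recognized, database, threshold=0.6):
    """Compare ASR output against stored items and return those similar
    enough to present as candidates, most similar first."""
    scored = []
    for item in database:
        score = SequenceMatcher(None, recognized, item).ratio()
        if score >= threshold:
            scored.append((score, item))
    return [item for _, item in sorted(scored, reverse=True)]

db = ["Tanaka Ichiro", "Tanaka Jiro", "Suzuki Hanako"]
candidates = extract_candidates("Tanaka Ichiro", db)
```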
  • Patent number: 8332220
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Grant
    Filed: March 25, 2008
    Date of Patent: December 11, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, Michael J. Burkhart, Daniel G. Eisenhauer, Daniel M. Schumacher, Thomas J. Watson
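One plausible reading of the simultaneous-speech check in this abstract, sketched over per-user speech intervals (the interval representation is an assumption; the patent detects overlap from the streamed presentation speech itself):

```python
from itertools import combinations

def has_simultaneous_speech(intervals):
    """Return True when two different users' speech intervals overlap
    in time. Each interval is (user, start, end)."""
    for (u1, s1, e1), (u2, s2, e2) in combinations(intervals, 2):
        if u1 != u2 and s1 < e2 and s2 < e1:
            return True
    return False
```

When this returns True, the system would display the converted text so listeners can untangle the overlapping speakers.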
  • Patent number: 8306817
    Abstract: In an automatic speech recognition system, a feature extractor extracts features from a speech signal, and speech is recognized by the automatic speech recognition system based on the extracted features. Noise reduction as part of the feature extractor is provided by feature enhancement in which feature-domain noise reduction in the form of Mel-frequency cepstra is provided based on the minimum mean square error criterion. Specifically, the devised method takes into account the random phase between the clean speech and the mixing noise. The feature-domain noise reduction is performed in a dimension-wise fashion to the individual dimensions of the feature vectors input to the automatic speech recognition system, in order to perform environment-robust speech recognition.
    Type: Grant
    Filed: January 8, 2008
    Date of Patent: November 6, 2012
    Assignee: Microsoft Corporation
    Inventors: Dong Yu, Alejandro Acero, James G. Droppo, Li Deng
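A heavily simplified stand-in for the dimension-wise feature enhancement this abstract describes: per-dimension noise statistics drive a Wiener-style gain. The patent's MMSE estimator, which also models the random phase between clean speech and noise, is considerably more involved than this sketch:

```python
def enhance_features(frames, noise_frames):
    """Estimate per-dimension noise statistics from noise-only frames,
    then apply a Wiener-style gain to each dimension of each feature
    vector. Illustrative simplification, not the patent's estimator."""
    dims = len(frames[0])
    noise_mean = [sum(f[d] for f in noise_frames) / len(noise_frames)
                  for d in range(dims)]
    noise_pow = [sum((f[d] - noise_mean[d]) ** 2 for f in noise_frames)
                 / len(noise_frames) for d in range(dims)]
    enhanced = []
    for f in frames:
        out = []
        for d in range(dims):
            signal = f[d] - noise_mean[d]           # remove noise bias
            snr = (signal ** 2) / (noise_pow[d] + 1e-9)
            gain = snr / (snr + 1.0)                # Wiener-style gain
            out.append(noise_mean[d] + gain * signal)
        enhanced.append(out)
    return enhanced

noise = [[0.1, -0.1], [-0.1, 0.1], [0.0, 0.0]]
enhanced = enhance_features([[5.0, 0.0]], noise)
```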
  • Publication number: 20120259637
    Abstract: An electronic apparatus and method for retrieving a song, and a storage medium. The electronic apparatus includes: a storage unit which stores a plurality of songs; a user input unit which receives a hummed query which is inputted for retrieving a song; and a song retrieving unit which retrieves a song based on the hummed query from among the plurality of stored songs when the hummed query is received. The song retrieving unit extracts a pitch and a duration of the hummed query, converts each of the extracted pitch and duration into multi-level symbols, calculates a string edit distance between the hummed query and one of the plurality of songs based on the symbols, and determines a similarity between the hummed query and a song based on edit operations which are performed within the calculated string edit distance.
    Type: Application
    Filed: April 11, 2012
    Publication date: October 11, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: A. Srinivas, P. Krishnamoorthy, Rajen Bhatt, Sarvesh Kumar
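The string edit distance at the core of this abstract's retrieval step is the standard Levenshtein distance over symbol sequences; the three-level pitch alphabet below is an illustrative simplification of the patent's multi-level pitch and duration symbols:

```python
def edit_distance(a, b):
    """Standard edit distance (insert/delete/substitute, cost 1 each)
    between two symbol sequences, e.g. symbols extracted from a hummed
    query and from a stored song."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution / match
        prev = cur
    return prev[n]

# Pitch contour quantized to U(p)/D(own)/S(ame) -- an illustrative
# 3-level alphabet.
query = "UUDSU"
song_a = "UUDSU"   # identical contour
song_b = "UDDSD"   # differs at two positions
```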
  • Publication number: 20120239398
    Abstract: In one aspect, a method for determining a validity of an identity asserted by a speaker using a voice print is provided. The method comprises acts of performing a first verification stage comprising comparing a first voice signal from the speaker uttering at least one first challenge utterance with at least a portion of the voice print and performing a second verification stage if it is concluded in the first verification stage that the first voice signal was obtained from an utterance by the user. The second verification stage comprises adapting at least one parameter of the voice print based, at least in part, on the first voice signal to obtain an adapted voice print, and comparing a second voice signal from the speaker uttering at least one second challenge utterance with at least a portion of the adapted voice print.
    Type: Application
    Filed: April 9, 2012
    Publication date: September 20, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Kevin R. Farrell, David A. James, William F. Ganong, III, Jerry K. Carter
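The adaptation step between the two verification stages can be sketched as interpolating the enrolled voice print toward the features of the first accepted utterance; the mean-vector model and the weight alpha are assumptions, since the abstract only says "at least one parameter" is adapted:

```python
def adapt_voice_print(voice_print, first_signal_features, alpha=0.2):
    """Shift each voice-print parameter toward the statistics of the
    speaker's first accepted utterance before the second verification
    stage (illustrative interpolation rule)."""
    return [(1 - alpha) * vp + alpha * obs
            for vp, obs in zip(voice_print, first_signal_features)]

enrolled = [1.0, 0.0, 2.0]
observed = [1.5, 0.5, 2.0]   # features from the first challenge utterance
adapted = adapt_voice_print(enrolled, observed)
```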
  • Publication number: 20120232900
    Abstract: The present invention relates to a method for speaker recognition, comprising the steps of obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to the at least one unknown speaker thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes of speech samples; combining the extracted speaker information for each of the speaker-dependent classes of speech samples; comparing the combined extracted speaker information for each of the speaker-dependent classes of speech samples with the stored speaker information for the at least one target speaker to obtain at least one comparison result; and determining whether one of the at least one unknown speakers is identical with the at least one target speaker based on the at least one comparison result.
    Type: Application
    Filed: November 12, 2009
    Publication date: September 13, 2012
    Inventors: Johan Nikolaas Langehoveen Brummer, Luis Buera Rodriguez, Martha Garcia Gomar
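The combine-then-compare step in this abstract can be sketched with per-call speaker embeddings: average the embeddings of one unknown-speaker class, then score the average against the stored target speaker. Mean combination and cosine scoring are illustrative choices the abstract leaves open:

```python
def cosine(a, b):
    """Cosine similarity between two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return num / (na * nb)

def combined_score(call_embeddings, target):
    """Combine the speaker information of one unknown-speaker class by
    averaging its per-call embeddings, then compare to the target."""
    dims = len(target)
    mean = [sum(e[d] for e in call_embeddings) / len(call_embeddings)
            for d in range(dims)]
    return cosine(mean, target)

target = [1.0, 0.0]                                    # stored target speaker
same_speaker = combined_score([[0.9, 0.1], [1.1, -0.1]], target)
other_speaker = combined_score([[0.0, 1.0], [0.2, 0.8]], target)
```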
  • Publication number: 20120232899
    Abstract: A system and method for identification of a speaker by phonograms of oral speech is disclosed. Similarity between a first phonogram of the speaker and a second, or sample, phonogram is evaluated by matching formant frequencies in referential utterances of a speech signal, where the utterances for comparison are selected from the first phonogram and the second phonogram. Referential utterances of speech signals are selected from the first phonogram and the second phonogram, where the referential utterances include formant paths of at least three formant frequencies. The selected referential utterances including at least two identical formant frequencies are compared therebetween. Similarity of the compared referential utterances from matching other formant frequencies is evaluated, where similarity of the phonograms is determined from evaluation of similarity of all the compared referential utterances.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 13, 2012
    Applicant: Obschestvo s orgranichennoi otvetstvennost'yu "Centr Rechevyh Technologij"
    Inventor: Sergey Lvovich Koval
  • Patent number: 8260061
    Abstract: In an image data output processing apparatus of the present invention, an image matching section is capable of determining whether a similarity exists between each image of an N-up document and a reference document when input image data is indicative of the N-up document. An output process control section is capable of regulating an output process of each image in accordance with a result of determining whether the similarity exists between each image of the N-up document and the reference document. This allows detecting with high accuracy a document image under regulation on the output process and regulating the output process, when the input image data is indicative of an N-up document and includes the document image under regulation on the output process.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: September 4, 2012
    Assignee: Sharp Kabushiki Kaisha
    Inventor: Hitoshi Hirohata
  • Patent number: 8255214
    Abstract: A first signal of two signals to be compared for similarity is divided into small areas and one small area is selected for calculating the correlation with a second signal using a correlative method. Then, the quantity of translation, expansion rate and similarity in an area where the similarity, which is the square of the correlation value, reaches its maximum, are found. Values based on the similarity are integrated at a position represented by the quantity of translation and expansion rate. Similar processing is performed with respect to all the small areas, and at a peak where the maximum integral value of the similarity is obtained, its magnitude is compared with a threshold value to evaluate the similarity. The small area voted for that peak can be extracted.
    Type: Grant
    Filed: October 15, 2002
    Date of Patent: August 28, 2012
    Assignee: Sony Corporation
    Inventors: Mototsugu Abe, Masayuki Nishiguchi
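A 1-D, translation-only sketch of this abstract's voting scheme (the patent also estimates an expansion rate and votes in that extra dimension): each small area of the first signal votes its squared best correlation into a bin indexed by the translation that achieved it, and the peak of the vote space is then compared with a threshold:

```python
def correlate(a, b):
    """Normalized correlation between two equal-length windows."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def vote_similarity(first, second, area=4):
    """Split `first` into small areas, slide each across `second`, and
    vote the squared best correlation at the translation achieving it."""
    votes = {}
    for start in range(0, len(first) - area + 1, area):
        block = first[start:start + area]
        best_shift, best_sim = 0, -1.0
        for shift in range(len(second) - area + 1):
            sim = correlate(block, second[shift:shift + area]) ** 2
            if sim > best_sim:
                best_sim, best_shift = sim, shift - start
        votes[best_shift] = votes.get(best_shift, 0.0) + best_sim
    return votes

sig = [0.0, 1.0, 0.0, -1.0, 2.0, 0.0, 1.0, 1.0]
shifted = [9.9, 9.9] + sig          # same signal delayed by two samples
votes = vote_similarity(sig, shifted)
```

Both small areas vote for translation 2, so the vote space peaks there with an integrated similarity near 2.0.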
  • Publication number: 20120209606
    Abstract: Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 16, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Maya Gorodetsky, Ezra Daya, Oren Pereg
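The rule-matching and unification steps in this abstract can be sketched as regex rules applied to ASR transcripts; the rule names and patterns below are invented for illustration:

```python
import re

# Illustrative extraction rules: each rule name maps to a pattern
# matched against transcribed audio interactions.
RULES = {
    "cancel_event": re.compile(r"\bcancel(?:led|ling)? my (\w+)"),
    "agent_entity": re.compile(r"\bmy name is (\w+)"),
}

def extract(transcripts):
    """Match every transcript against every rule; unify duplicates by
    keeping one copy per (rule, value) pair."""
    seen = set()
    results = []
    for text in transcripts:
        for rule, pattern in RULES.items():
            for m in pattern.finditer(text):
                key = (rule, m.group(1))
                if key not in seen:
                    seen.add(key)
                    results.append(key)
    return results

calls = [
    "hello my name is dana i want to cancel my subscription",
    "i said i want to cancel my subscription today",
]
matches = extract(calls)
```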
  • Patent number: 8239197
    Abstract: A system and method for efficiently transcribing verbal messages transmitted over the Internet (or other network) into text. The verbal messages are initially checked to ensure that they are in a valid format and include a return network address, and if so, are processed either as whole verbal messages or split into segments. These whole verbal messages and segments are processed by an automated speech recognition (ASR) program, which produces automatically recognized text. The automatically recognized text messages or segments are assigned to selected workbenches for manual editing and transcription, producing edited text. The segments of edited text are reassembled to produce whole edited text messages, undergo post processing to correct minor errors and output as an email, an SMS message, a file, or an input to a program. The automatically recognized text and manual edits thereof are returned as feedback to the ASR program to improve its accuracy.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: August 7, 2012
    Assignee: Intellisist, Inc.
    Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
  • Publication number: 20120191453
    Abstract: A system and methods for matching at least one word of an utterance against a set of template hierarchies to select the best matching template or set of templates corresponding to the utterance. The system and methods determines at least one exact, inexact, and partial match between the at least one word of the utterance and at least one term within the template hierarchy to select and populate a template or set of templates corresponding to the utterance. The populated template or set of templates may then be used to generate a narrative template or a report template.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 26, 2012
    Applicant: Cyberpulse L.L.C.
    Inventors: James ROBERGE, Jeffrey Soble
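The exact/inexact/partial distinction in this abstract can be sketched as follows; treating "inexact" as a high difflib similarity ratio and flattening each template hierarchy to a term list are illustrative simplifications:

```python
from difflib import SequenceMatcher

def match_term(word, term):
    """Classify a word/term match: 'exact' for identity, 'partial' for
    a prefix relationship, 'inexact' for a close variant (difflib ratio
    stands in for the system's variant matching), else None."""
    if word == term:
        return "exact"
    if term.startswith(word) or word.startswith(term):
        return "partial"
    if SequenceMatcher(None, word, term).ratio() >= 0.8:
        return "inexact"
    return None

def best_template(utterance_words, templates):
    """Score each template (flattened here to a term list) against the
    utterance words and return the best-matching template name."""
    weights = {"exact": 1.0, "inexact": 0.8, "partial": 0.5}
    def score(terms):
        return sum(weights.get(match_term(w, t), 0.0)
                   for w in utterance_words for t in terms)
    return max(templates, key=lambda name: score(templates[name]))

templates = {
    "chest_xray": ["lungs", "clear", "effusion"],
    "knee_mri": ["meniscus", "ligament", "tear"],
}
chosen = best_template(["lungs", "are", "clear"], templates)
```

The selected template would then be populated with the matched words to generate a narrative or report template.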
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
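The probability-weighted averaging this abstract describes blends the two gender-dependent models, so an uncertain gender decision degrades gracefully instead of committing to the wrong model. Single 1-D Gaussians stand in for the full GMMs in this sketch:

```python
import math

def gaussian(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def averaged_likelihood(x, p_male, male, female):
    """Blend male and female model likelihoods with gender probability
    p_male (single Gaussians as stand-ins for the GMMs)."""
    return p_male * gaussian(x, *male) + (1 - p_male) * gaussian(x, *female)

male_model = (120.0, 400.0)     # illustrative pitch mean/variance
female_model = (210.0, 400.0)

# A frame near the male model scores higher when p_male is high.
lik_confident = averaged_likelihood(125.0, 0.95, male_model, female_model)
lik_wrong = averaged_likelihood(125.0, 0.05, male_model, female_model)
```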
  • Patent number: 8214211
    Abstract: In a voice processing device, a male voice index calculator calculates a male voice index indicating a similarity of the input sound relative to a male speaker sound model. A female voice index calculator calculates a female voice index indicating a similarity of the input sound relative to a female speaker sound model. A first discriminator discriminates the input sound between a non-human-voice sound and a human voice sound which may be either of the male voice sound or the female voice sound. A second discriminator discriminates the input sound between the male voice sound and the female voice sound based on the male voice index and the female voice index in case that the first discriminator discriminates the human voice sound.
    Type: Grant
    Filed: August 26, 2008
    Date of Patent: July 3, 2012
    Assignee: Yamaha Corporation
    Inventor: Yasuo Yoshioka
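The two-stage decision in this abstract can be sketched directly from the two indices; using the larger index against a threshold for the voice/non-voice stage is an illustrative choice the abstract does not pin down:

```python
def classify(male_index, female_index, voice_threshold=0.5):
    """Stage one: decide human voice vs non-voice. Stage two: only for
    voice, compare the male and female similarity indices."""
    if max(male_index, female_index) < voice_threshold:
        return "non-voice"
    return "male" if male_index >= female_index else "female"
```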
  • Patent number: 8209174
    Abstract: A text-independent speaker verification system utilizes mel frequency cepstral coefficients analysis in the feature extraction blocks, template modeling with vector quantization in the pattern matching blocks, an adaptive threshold and an adaptive decision verdict and is implemented in a stand-alone device using less powerful microprocessors and smaller data storage devices than used by comparable systems of the prior art.
    Type: Grant
    Filed: April 17, 2009
    Date of Patent: June 26, 2012
    Assignee: Saudi Arabian Oil Company
    Inventor: Essam Abed Al-Telmissani
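The pattern-matching stage of this abstract (vector-quantization template modeling with an adaptive threshold) can be sketched as nearest-codebook distortion; deriving the threshold from impostor scores is one way to make it adaptive, not necessarily the patent's rule:

```python
def vq_distortion(features, codebook):
    """Average distance from each feature vector to its nearest codebook
    entry; low distortion means the voice resembles the enrolled
    speaker's codebook."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(min(dist(f, c) for c in codebook) for f in features) / len(features)

def verify(features, codebook, impostor_scores, margin=0.9):
    """Accept when the claimant's distortion is clearly below typical
    impostor distortion (illustrative adaptive-threshold rule)."""
    threshold = margin * (sum(impostor_scores) / len(impostor_scores))
    return vq_distortion(features, codebook) < threshold

codebook = [[0.0, 0.0], [1.0, 1.0]]     # toy 2-entry VQ codebook
genuine = [[0.1, 0.0], [0.9, 1.1]]      # close to the codebook
impostor = [[3.0, 3.0], [4.0, -2.0]]    # far from it
impostor_baseline = [vq_distortion(impostor, codebook)]
```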