Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 9405828
    Abstract: A method of phonetically searching media information comprises receiving a plurality of search queries from one or more client systems and providing a phonetic representation of each search query. One or more search jobs are instantiated, each search job comprising a plurality of tasks, each task being arranged to sequentially read a block from an archive file. The archive file is stored within a distributed filing system (DFS) in which sequential blocks of data comprising the archive file are replicated to be locally available to one or more processors from a cluster of processors for executing the tasks. Each block stores index files corresponding to a plurality of source media files, each index file containing a phonetic stream corresponding to audio information for a given source media file. Each task obtains phonetic representations of outstanding search queries for a block and sequentially searches the block for each outstanding search query.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: August 2, 2016
    Assignee: Avaya Inc.
    Inventors: Malcolm Fintan Wilkins, Gareth Alan Wynn
  • Patent number: 9374464
    Abstract: Systems and methods are disclosed for online data-linked telecommunications decisioning and distribution. One method includes receiving call data relating to a telephone call from a telephone device of a user to an interactive voice response (“IVR”) system; accessing a database storing correlated call data and user data; retrieving correlated call data and user data based on the telephone number of the call data; determining a confidence score defining a confidence that the received call data relates to the retrieved correlated call data and user data; correlating the received call data with retrieved call data and user data when the confidence score is greater than a threshold value; determining an IVR response to present to the user via the IVR system; and transmitting the determined IVR response to the IVR system for presentation to the telephone device of the user.
    Type: Grant
    Filed: December 3, 2015
    Date of Patent: June 21, 2016
    Assignee: AOL Advertising Inc.
    Inventor: Seth Mitchell Demsey
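The decisioning step above (confidence score, threshold, IVR response selection) can be sketched as a minimal toy; the scoring rule, field names, and threshold here are invented for illustration, not the patented method:

```python
# Illustrative sketch of the confidence-score/threshold step; all names and
# the scoring rule are hypothetical, not the patented implementation.

def confidence_score(call_data: dict, stored: dict) -> float:
    """Fraction of fields present in both records whose values agree."""
    shared = set(call_data) & set(stored)
    if not shared:
        return 0.0
    matches = sum(1 for k in shared if call_data[k] == stored[k])
    return matches / len(shared)

def decide_ivr_response(call_data: dict, stored: dict, threshold: float = 0.5) -> str:
    """Correlate the records only when confidence clears the threshold."""
    if confidence_score(call_data, stored) > threshold:
        return "personalized"   # tailor the IVR response to the known user
    return "generic"            # fall back to a default IVR prompt
```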
  • Patent number: 9363365
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for evaluating the quality of a communication session or of a communication path used for the communication session. One of the methods includes initiating a communication session between a first communications device and a second communications device, wherein initiating the communication session comprises routing session data for the communication session along a first communication path between the first communications device and the second communications device; generating, at the first communications device, a plurality of reference content samples; generating a recording of the communication session as received at a first destination along the first communication path; and evaluating a quality of the communication session or of the first communication path by comparing the plurality of reference content samples with the recorded communication session.
    Type: Grant
    Filed: April 27, 2015
    Date of Patent: June 7, 2016
    Assignee: RingCentral, Inc.
    Inventor: Mikhail Nekorystnov
  • Patent number: 9355642
    Abstract: A speaker recognition method through emotional model synthesis based on the Neighbors Preserving Principle is disclosed. The method includes the following steps: (1) training the reference speaker's and user's speech models; (2) extracting the neutral-to-emotion transformation/mapping sets of GMM reference models; (3) extracting the emotion reference Gaussian components mapped by, or corresponding to, several neutral reference Gaussian components close to the user's neutral training Gaussian component; (4) synthesizing the user's emotion training Gaussian component and then synthesizing the user's emotion training model; (5) synthesizing all the user's GMM training models; (6) inputting test speech and conducting the identification.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: May 31, 2016
    Assignee: ZHEJIANG UNIVERSITY
    Inventors: Zhaohui Wu, Yingchun Yang, Li Chen
  • Patent number: 9336780
    Abstract: A method for speaker identification includes detecting a target speaker's utterance locally; extracting features from the detected utterance locally; analyzing the extracted features in the local device to obtain information on the speaker identification and/or encoding the extracted features locally; transmitting the encoded extracted features to a remote server; decoding and analyzing the received extracted features by the server to obtain information on the speaker identification; and transmitting the information on the speaker identification from the server to the location where the speaker's utterance was detected. The method further includes detecting speech activity locally. Extracting features, encoding the extracted features, and/or transmitting the encoded extracted features to the server are only performed if speech activity above some predetermined threshold is detected.
    Type: Grant
    Filed: June 20, 2011
    Date of Patent: May 10, 2016
    Assignee: AGNITIO, S.L.
    Inventors: Luis Buera Rodriguez, Carlos Vaquero Aviles-Casco, Marta Garcia Gomar
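The speech-activity gate described above (extract, encode, and transmit only when activity clears a threshold) might look like the following toy; the energy measure, feature extractor, and threshold are illustrative placeholders:

```python
# Sketch of the speech-activity gate; the energy measure, stand-in feature
# extractor, and threshold are invented, not the patented method.

def speech_activity(frame):
    """Crude mean-absolute-amplitude estimate of voice activity."""
    return sum(abs(s) for s in frame) / len(frame)

def extract_features(frame):
    return [round(s * s, 6) for s in frame]       # stand-in feature extraction

def maybe_transmit(frame, threshold=0.1):
    """Extract, encode, and 'transmit' only when activity clears the threshold."""
    if speech_activity(frame) <= threshold:
        return None                               # silence: do nothing at all
    encoded = ",".join(str(f) for f in extract_features(frame))
    return encoded                                # payload bound for the server
```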
  • Patent number: 9336770
    Abstract: Provided is a pattern recognition apparatus for creating multiple systems and combining the multiple systems to improve the recognition performance, including a discriminative training unit for constructing model parameters of a second or subsequent system based on an output tendency of a previously-constructed model so as to be different from the output tendency of the previously-constructed model. Accordingly, when multiple systems are combined, the recognition performance can be improved without trial and error.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: May 10, 2016
    Assignees: MITSUBISHI ELECTRIC CORPORATION, MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC.
    Inventors: Yuki Tachioka, Shinji Watanabe
  • Patent number: 9313359
    Abstract: A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
    Type: Grant
    Filed: August 21, 2012
    Date of Patent: April 12, 2016
    Assignee: Gracenote, Inc.
    Inventors: Mihailo M. Stojancic, Jose Pio Pereira, Peter Wendt, Shashank Merchant, Sunil Suresh Kulkarni
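One of the audio features named above, frame frequency-domain entropy, can be illustrated with a toy signature function; the quantization scheme and the stand-in "spectrum" input are invented, not the patented fingerprint:

```python
import math

# Toy audio-frame signature in the spirit of the frequency-domain-entropy
# feature; the quantization and inputs are stand-ins, not the patented design.

def entropy(distribution):
    """Shannon entropy of a normalized magnitude distribution."""
    total = sum(distribution)
    probs = [v / total for v in distribution if v > 0]
    return -sum(p * math.log2(p) for p in probs)

def frame_signature(spectrum_magnitudes, bits=4):
    """Quantize the frame entropy into a small fingerprint code."""
    h = entropy(spectrum_magnitudes)
    max_h = math.log2(len(spectrum_magnitudes))   # entropy of a flat spectrum
    return min(int(h / max_h * (2 ** bits)), 2 ** bits - 1)
```

A flat spectrum yields maximal entropy (highest code); a peaky, tonal frame yields a low code, so successive codes form a compact, comparable stream.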
  • Patent number: 9286408
    Abstract: Methods for analyzing a Uniform Resource Locator (URL) and apparatus for performing such methods. The methods include parsing the URL into text segments and generating n-grams from the text segments. The methods further include generating annotations, each annotation corresponding to one of the n-grams and comprising a match value for its corresponding n-gram, a description of its match value, and a score. The methods still further include selecting a subset of the annotations.
    Type: Grant
    Filed: January 30, 2013
    Date of Patent: March 15, 2016
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Georgia Koutrika
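The parse → n-gram → annotate → select pipeline above can be sketched directly; the delimiter set, vocabulary, scoring, and selection rule are invented for illustration:

```python
import re

# Hypothetical sketch of URL parsing, n-gram generation, annotation, and
# subset selection; all values here are invented, not the patented method.

def parse_segments(url: str):
    """Split a URL into lowercase text segments."""
    return [t for t in re.split(r"[/.\-_?=&:]+", url.lower()) if t]

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def annotate(url, vocabulary, max_n=2, keep=3):
    """One annotation per matching n-gram: match value, description, score."""
    tokens = parse_segments(url)
    annotations = []
    for n in range(1, max_n + 1):
        for g in ngrams(tokens, n):
            if g in vocabulary:
                annotations.append(
                    {"ngram": g, "match": vocabulary[g],
                     "description": f"matched {n}-gram", "score": n})
    return sorted(annotations, key=lambda a: -a["score"])[:keep]  # select subset
```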
  • Patent number: 9269374
    Abstract: The methods and systems described herein predict user behavior based on analysis of a user video communication. The methods include receiving a user video communication, extracting video facial analysis data from the video communication, extracting voice analysis data from the video communication, associating the video facial analysis data with the voice analysis data to determine an emotional state of a user, applying a linguistic-based psychological behavioral model to the voice analysis data to determine personality type of the user, and inputting the emotional state and personality type into a predictive model to determine a likelihood of an outcome of the video communication.
    Type: Grant
    Filed: October 27, 2014
    Date of Patent: February 23, 2016
    Assignee: Mattersight Corporation
    Inventors: Kelly Conway, Christopher Danson
  • Patent number: 9251202
    Abstract: A system determines search hypotheses for a search query, each search hypothesis defining a search type and corresponding to a resource corpus of a type that matches the search type. For each search hypothesis, the system generates a hypothesis search query based on the search query and the search type and submits the hypothesis search query to a search service to determine a search hypothesis score. For each search hypothesis whose score meets a search hypothesis threshold, the system provides search results for the search operation performed for the hypothesis search query; for each search hypothesis whose score does not meet the threshold, it does not provide search results for that search operation.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: February 2, 2016
    Assignee: Google Inc.
    Inventors: Jakob D. Uszkoreit, Percy Liang, Daniel M. Bikel, Pravir K. Gupta, Omer Bar-or
  • Patent number: 9245526
    Abstract: A speech recognition method includes receiving a nametag utterance, decoding the nametag utterance to recognize constituent subwords of the nametag utterance, determining the number of subwords in the nametag utterance, and associating the nametag utterance with one or more of a plurality of different nametag clusters based on the number of subwords in the nametag utterance. According to preferred aspects of the method, a confusability check is performed on the nametag utterance within the cluster(s) associated with the nametag utterance, stored nametags are received from memory by decoding the nametag utterance within the cluster(s) associated with the nametag utterance, and the stored nametags are played back by cluster.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: January 26, 2016
    Assignee: General Motors LLC
    Inventor: Rathinavelu Chengalvarayan
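The clustering-by-subword-count idea above, with a within-cluster confusability check, might be sketched like this; the cluster bands and the similarity rule are invented:

```python
# Toy sketch of clustering nametags by subword count and checking
# confusability only within a cluster; bands and similarity rule are invented.

CLUSTER_BANDS = ((1, 2), (3, 4), (5, 99))    # subword-count range per cluster

def cluster_for(subwords):
    n = len(subwords)
    for i, (lo, hi) in enumerate(CLUSTER_BANDS):
        if lo <= n <= hi:
            return i
    return len(CLUSTER_BANDS) - 1

def confusable(tag_a, tag_b):
    """Only nametags in the same cluster are ever compared."""
    if cluster_for(tag_a) != cluster_for(tag_b):
        return False
    shared = len(set(tag_a) & set(tag_b))
    return shared / max(len(tag_a), len(tag_b)) > 0.5
```

Restricting comparison (and playback) to one cluster is what keeps the confusability check cheap: nametags with very different subword counts are never candidates for confusion.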
  • Patent number: 9232057
    Abstract: A method and apparatus of processing caller experiences is disclosed. One example method may include determining a call event type occurring during a call and assigning a weight to the call event type via a processing device. The method may also include calculating a caller experience metric value representing a caller's current call status responsive to determining the at least one call event type, the caller experience metric being a function of the current event type weight and a discounting variable that discounts a value of past events. The method may also provide comparing the caller experience metric to a predefined threshold value and determining whether to perform at least one of transferring the call to a live agent and switching from a current caller modality to a different caller modality.
    Type: Grant
    Filed: June 10, 2014
    Date of Patent: January 5, 2016
    Assignee: West Corporation
    Inventors: Silke Witt-Ehsani, Aaron Scott Fisher
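The metric above, a function of the current event's weight and a discounting variable over past events, is easy to sketch; the event types, weights, discount, and threshold below are all hypothetical values:

```python
# Minimal sketch of the discounted caller-experience metric; event types,
# weights, discount, and threshold are hypothetical, not the patented values.

EVENT_WEIGHTS = {"success": 1.0, "timeout": -2.0, "misrecognition": -3.0}

def caller_experience(events, discount=0.8):
    """Discounted running sum: the newest event counts fully, older ones decay."""
    score = 0.0
    for event in events:                       # oldest event first
        score = discount * score + EVENT_WEIGHTS.get(event, 0.0)
    return score

def next_action(events, threshold=-3.0):
    """Transfer to a live agent (or switch modality) below the threshold."""
    if caller_experience(events) < threshold:
        return "transfer_to_agent"
    return "continue_ivr"
```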
  • Patent number: 9230541
    Abstract: This application discloses a method of recognizing a keyword in speech that includes a sequence of audio frames, including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network, and when the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated at least based on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion.
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: January 5, 2016
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Lu Li, Li Lu, Jianxiong Ma, Linghui Kong, Feng Rao, Shuai Yue, Xiang Zhang, Haibo Liu, Eryu Wang, Bo Chen
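The cross-language penalty on the confidence score can be illustrated with a toy; the language heuristic, scores, penalty factor, and criterion are all invented:

```python
# Rough sketch of the cross-language penalty on a confidence score;
# the language heuristic and all numbers are invented for illustration.

def language_of(word):
    """Toy heuristic: any non-ASCII character marks a second language."""
    return "non-ascii" if any(ord(c) > 127 for c in word) else "ascii"

def sequence_confidence(candidate, word_option, word_scores, penalty=0.3):
    """Combine the two word confidences; shrink the score for a mixed pair."""
    score = word_scores[candidate] * word_scores[word_option]
    if language_of(candidate) != language_of(word_option):
        score *= penalty                 # penalty factor for mixed languages
    return score

def contains_keyword(candidate, word_option, word_scores, criterion=0.25):
    return sequence_confidence(candidate, word_option, word_scores) >= criterion
```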
  • Patent number: 9137238
    Abstract: Authentication techniques, and in particular, authentication techniques which can be used in conjunction with input constrained devices are described herein. A plurality of words is received. The received words are parsed. A credential is authenticated by determining a match based on information associated with at least one of the received words in the plurality.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: September 15, 2015
    Assignee: RightQuestions, LLC
    Inventor: Bjorn Markus Jakobsson
  • Patent number: 9093081
    Abstract: The subject matter discloses a computerized method for real time emotion detection in audio interactions comprising: receiving at a computer server a portion of an audio interaction between a customer and an organization representative, the portion of the audio interaction comprises a speech signal; extracting feature vectors from the speech signal; obtaining a statistical model; producing adapted statistical data by adapting the statistical model according to the speech signal using the feature vectors extracted from the speech signal; obtaining an emotion classification model; and producing an emotion score based on the adapted statistical data and the emotion classification model, said emotion score represents the probability that the speaker that produced the speech signal is in an emotional state.
    Type: Grant
    Filed: March 10, 2013
    Date of Patent: July 28, 2015
    Assignee: NICE-SYSTEMS LTD
    Inventors: Ronen Laperdon, Moshe Wasserblat, Tzach Ashkenazi, Ido David David, Oren Pereg
  • Patent number: 9089772
    Abstract: Methods and systems establish games with automation using verbal communication for exchanges between the automated game and the one or more game players. Game information data is converted into verbal information that is provided to the individual. The individual provides verbal instruction which is received and converted into the instruction data. The instruction data is applied to the current game to update the current game status. Information data for the current game status is converted to verbal information for the current game status which is provided to the individual. The game may be implemented on a local device of the individual or may be network-based and accessed remotely by the individual through verbal communication over a voice connection. The voice connection may be of various forms such as a conventional voiced call to a voice services node of a telephone network or a voice-over IP voiced call on a data network.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: July 28, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Dave Anderson, Senis Busayapongchai
  • Patent number: 9047870
    Abstract: Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: June 2, 2015
    Assignee: Google Inc.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen, Michael D. Riley
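The interpolation step above, a weighted sum over base language models with context-dependent weights, reduces to a few lines for unigrams; the corpora, weights, and smoothing floor below are illustrative, not the patented system:

```python
# Sketch of an interpolated unigram language model; the toy corpora,
# weights, and smoothing floor are invented, not the patented system.

def interpolated_p(base_models, weights, word, floor=1e-6):
    """P(word) as a normalized weighted sum over base language models."""
    total = sum(weights)
    return sum((w / total) * model.get(word, floor)
               for model, w in zip(base_models, weights))

# two toy base models, each corresponding to a distinct textual corpus
chat_model = {"lol": 0.05, "meeting": 0.001}
work_model = {"lol": 0.0005, "meeting": 0.02}

# contextual metadata (say, the voice input arrives from a calendar app)
# shifts the interpolation weights toward the work corpus
p_meeting = interpolated_p([chat_model, work_model], [0.2, 0.8], "meeting")
```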
  • Patent number: 9037462
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: May 19, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
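The core move above, pooling a belief distribution over intentions across N-best lists rather than trusting a single top hypothesis, can be sketched like this; the intention mapping and confidences are invented:

```python
from collections import defaultdict

# Toy version of pooling belief across contextually similar N-best lists;
# the intention mapping and confidence values are invented for illustration.

def intention_of(hypothesis):
    """Stand-in mapping from a recognition hypothesis to a user intention."""
    return "book_flight" if "flight" in hypothesis else "other"

def select_intention(nbest_lists):
    """Sum confidence mass per underlying intention across all N-best lists."""
    belief = defaultdict(float)
    for nbest in nbest_lists:
        for hypothesis, confidence in nbest:
            belief[intention_of(hypothesis)] += confidence
    return max(belief, key=belief.get)
```

Note that in the test below the first list's top hypothesis is wrong; the pooled belief across both lists still recovers the intended meaning, which is the point of using alternates.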
  • Patent number: 9037465
    Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
    Type: Grant
    Filed: February 21, 2013
    Date of Patent: May 19, 2015
    Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: I. Dan Melamed, Andrej Ljolje, Bernard Renger, Yeon-Jun Kim, David J. Smith
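Phrase detection plus recipient rating reduces to a small sketch; the phrase list and rating rule are hypothetical, not the patented compliance criteria:

```python
# Minimal sketch of pre-determined phrase detection and recipient rating;
# the phrase list and rating rule are invented for illustration.

REQUIRED_PHRASES = ("this call may be recorded", "is there anything else")

def detected_phrases(communication: str):
    text = communication.lower()
    return [p for p in REQUIRED_PHRASES if p in text]

def compliance_rating(communication: str) -> float:
    """Fraction of the pre-determined phrases present in the communication."""
    return len(detected_phrases(communication)) / len(REQUIRED_PHRASES)
```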
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9026437
    Abstract: A location determination system includes a first mobile terminal and a second mobile terminal. The first mobile terminal includes a first processor to acquire a first sound signal, analyze the first sound signal to obtain a first analysis result, and transmit the first analysis result. The second mobile terminal includes a second processor to acquire a second sound signal, analyze the second sound signal to obtain a second analysis result, receive the first analysis result from the first mobile terminal, compare the second analysis result with the first analysis result to obtain a comparison result, and determine, based on the comparison result, whether the first mobile terminal is located in the area in which the second mobile terminal is located.
    Type: Grant
    Filed: March 26, 2012
    Date of Patent: May 5, 2015
    Assignee: Fujitsu Limited
    Inventor: Eiji Hasegawa
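The comparison idea above, two terminals hearing the same ambient sound should produce matching analysis results, can be shown with a toy; the "analysis" here is a coarse energy signature and the match rule is invented:

```python
# Toy version of the two-terminal comparison; the "analysis" is a coarse
# energy signature and the tolerance rule is invented, not the patent's.

def analyze(sound):
    """Stand-in analysis: a few coarse mean-energy buckets over the signal."""
    n = max(len(sound) // 3, 1)
    return [round(sum(abs(s) for s in sound[i:i + n]) / n, 3)
            for i in range(0, len(sound), n)]

def same_area(first_result, second_result, tolerance=0.1):
    """Terminals hearing the same ambient sound should analyze alike."""
    if len(first_result) != len(second_result):
        return False
    return all(abs(a - b) <= tolerance
               for a, b in zip(first_result, second_result))
```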
  • Publication number: 20150120296
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for making a multi-factor decision whether to process speech or language requests via a network-based speech processor or a local speech processor. An example local device configured to practice the method, having a local speech processor, and having access to a remote speech processor, receives a request to process speech. The local device can analyze multi-vector context data associated with the request to identify one of the local speech processor and the remote speech processor as an optimal speech processor. Then the local device can process the speech, in response to the request, using the optimal speech processor. If the optimal speech processor is local, then the local device processes the speech. If the optimal speech processor is remote, the local device passes the request and any supporting data to the remote speech processor and waits for a result.
    Type: Application
    Filed: October 29, 2013
    Publication date: April 30, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. STERN, Enrico Luigi BOCCHIERI, Diamantino Antonio CASEIRO, Danilo GIULIANELLI, Ladan GOLIPOUR
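The multi-vector routing decision between the local and remote processors might be sketched as a small scoring rule; the context vectors, weights, and result strings are invented:

```python
# Sketch of a multi-factor routing decision between a local and a remote
# speech processor; the context factors and scoring weights are invented.

def choose_processor(context):
    """Positive evidence favors local processing, negative favors remote."""
    score = 0
    score += 3 if context.get("network") == "none" else -1   # offline forces local
    score += 1 if context.get("task") == "simple_command" else -2
    score += 1 if context.get("battery_low") else 0          # avoid the radio
    return "local" if score > 0 else "remote"

def process_speech(audio, context):
    processor = choose_processor(context)
    # a real device would dispatch to the chosen speech engine here
    return processor, f"{processor}-result({len(audio)} samples)"
```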
  • Patent number: 9020816
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 28, 2015
    Assignee: 21CT, Inc.
    Inventor: Matthew McClain
  • Publication number: 20150112678
    Abstract: Broadly speaking, embodiments of the present invention provide a device, systems and methods for capturing sounds, generating a sound model (or “sound pack”) for each captured sound, and identifying a detected sound using the sound model(s). Preferably, a single device is used to capture a sound, store sound models, and to identify a detected sound using the stored sound models.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 23, 2015
    Applicant: AUDIO ANALYTIC LTD
    Inventors: DOMINIC FRANK JULIAN BINKS, SACHA KRSTULOVIC, CHRISTOPHER JAMES MITCHELL
  • Patent number: 9015044
    Abstract: Implementations of systems, method and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 21, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
  • Publication number: 20150106095
    Abstract: A digital sound identification system for storing a Markov model is disclosed. A processor is coupled to a sound data input, working memory, and a stored program memory for executing processor control code to input sound data for a sound to be identified. The sample sound data defines sample frequency-domain energy in a range of frequencies. Mean and variance values for a Markov model of the sample sound are generated, and the Markov model is stored in the non-volatile memory. Interference sound data defining interference frequency-domain data is input, and the mean and variance values of the Markov model are adjusted using the interference frequency-domain data. Sound data defining other sound frequency-domain data is input, and a probability of the other sound frequency-domain data fitting the Markov model is determined. Finally, sound identification data dependent on the probability is output.
    Type: Application
    Filed: November 5, 2014
    Publication date: April 16, 2015
    Applicant: AUDIO ANALYTIC LTD.
    Inventor: CHRISTOPHER JAMES MITCHELL
  • Patent number: 9009043
    Abstract: Methods and apparatus for identifying a user group in connection with user group-based speech recognition. An exemplary method comprises receiving, from a user, a user group identifier that identifies a user group to which the user was previously assigned based on training data. The user group comprises a plurality of individuals including the user. The method further comprises using the user group identifier, identifying a pattern processing data set corresponding to the user group, and receiving speech input from the user to be recognized using the pattern processing data set.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 14, 2015
    Assignee: Nuance Communications, Inc.
    Inventor: Peter Beyerlein
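The lookup described above, resolving a user group identifier to a pattern-processing data set before recognition, is simple to sketch; the group table, data-set contents, and result strings are invented placeholders:

```python
# Sketch of selecting a pattern-processing data set by user group identifier;
# the group table and data-set names are invented placeholders.

GROUP_DATA_SETS = {
    "grp-accent-a": {"model": "acoustic-a", "lexicon": "lex-a"},
    "grp-accent-b": {"model": "acoustic-b", "lexicon": "lex-b"},
}

def data_set_for(group_id):
    """Resolve the user group identifier received from the user."""
    return GROUP_DATA_SETS.get(group_id)

def recognize(speech, group_id):
    ds = data_set_for(group_id)
    if ds is None:
        return "error: unknown user group"
    return f"decoded {len(speech)} frames with {ds['model']}"
```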
  • Patent number: 9009041
    Abstract: A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
    Type: Grant
    Filed: July 26, 2011
    Date of Patent: April 14, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: George Zavaliagkos, William F. Ganong, III, Uwe H. Jost, Shreedhar Madhavapeddi, Gary B. Clayton
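A toy re-scoring pass in the spirit of the method above: swap an ASR-transcribed word for a close match from the personal vocabulary. The vocabulary entries, the similarity measure (`difflib` string similarity here), and the cutoff are illustrative stand-ins:

```python
import difflib

# Toy re-scoring pass: replace an ASR word with a close match from the
# user's personal vocabulary. Names, cutoff, and similarity are stand-ins.

PERSONAL_VOCAB = ["Katarina", "Bergström"]   # e.g. mined from the user's contacts

def rescore_word(transcribed, cutoff=0.75):
    """Return a personal-vocabulary replacement if one is similar enough."""
    match = difflib.get_close_matches(transcribed, PERSONAL_VOCAB,
                                      n=1, cutoff=cutoff)
    return match[0] if match else transcribed

def correct_transcript(words):
    return [rescore_word(w) for w in words]
```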
  • Patent number: 9008329
    Abstract: Provided are methods and systems for noise suppression within multiple time-frequency points of spectral representations. A multi-feature cluster tracker is used to track signal and noise sources and to predict signal versus noise dominance at each time-frequency point. Multiple features, such as binaural and monaural features, may be used for these purposes. A Gaussian mixture model (GMM) is developed and, in some embodiments, dynamically updated for distinguishing signal from noise and performing mask-based noise reduction. Each frequency band may use a different GMM or share a GMM with other frequency bands. A GMM may be combined from two models, with one trained to model time-frequency points in which the target dominates and another trained to model time-frequency points in which the noise dominates. Dynamic updates of a GMM may be performed using an expectation-maximization algorithm in an unsupervised fashion.
    Type: Grant
    Filed: June 8, 2012
    Date of Patent: April 14, 2015
    Assignee: Audience, Inc.
    Inventors: Michael Mandel, Carlos Avendano
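The combined-model idea above, one component modeling target-dominated points and one modeling noise-dominated points, can be reduced to a two-component toy per time-frequency point; all distribution parameters here are invented:

```python
import math

# Highly simplified stand-in for the GMM-based mask: a two-component
# (speech vs noise) model per time-frequency point. All parameters invented.

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def speech_posterior(energy_db, speech=(-5.0, 4.0), noise=(-20.0, 4.0)):
    """P(speech-dominated | energy) under equal priors."""
    ps = gaussian_pdf(energy_db, *speech)
    pn = gaussian_pdf(energy_db, *noise)
    return ps / (ps + pn)

def mask(energies_db, keep=0.5):
    """Keep points the model calls speech-dominated; suppress the rest."""
    return [e if speech_posterior(e) > keep else None for e in energies_db]
```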
  • Publication number: 20150095026
    Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Amazon Technologies, Inc.
    Inventors: Michael Maximilian Emanuel Bisani, Nikko Strom, Bjorn Hoffmeister, Ryan Paul Thomas
  • Publication number: 20150088506
    Abstract: The speech recognition results from the general-purpose server and from the specialized speech recognition server are integrated in an optimal manner, thereby providing a speech recognition function with the fewest errors. The specialized speech recognition server 108 is constructed using the words contained in the user dictionary data, and the performance of the general-purpose speech recognition server 106 is preliminarily evaluated with that user dictionary data. Based on the evaluation result, information on which recognition results from the specialized and general-purpose speech recognition servers are adopted, and on how the adopted recognition results are weighted to obtain an optimal recognition result, is retained in advance in the form of a database.
    Type: Application
    Filed: April 3, 2013
    Publication date: March 26, 2015
    Inventors: Yasunari Obuchi, Takeshi Homma
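The adopt-and-weight integration step might look like the following toy, where a pre-built weight table (standing in for the database built during the preliminary evaluation) biases the choice toward user-dictionary words; all values are invented:

```python
# Sketch of adopting and weighting results from a general-purpose and a
# specialized recognizer via a pre-built weight table; all values invented.

def integrate(general, specialized, word_weights):
    """Pick, word by word, the hypothesis with the higher weighted confidence."""
    merged = []
    for (g_word, g_conf), (s_word, s_conf) in zip(general, specialized):
        g_score = g_conf * word_weights.get(g_word, 1.0)
        s_score = s_conf * word_weights.get(s_word, 1.0)
        merged.append(g_word if g_score >= s_score else s_word)
    return merged
```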
  • Publication number: 20150088507
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
    Type: Application
    Filed: December 1, 2014
    Publication date: March 26, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
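    The confusable-word check described in the abstract above might look like the following sketch. The confusable sets and the word-level comparison are illustrative assumptions, not Nuance's actual method.

    ```python
    # Sketch: flag alternative recognition results whose meaning differs
    # from the top result in a way that matters for a domain (toy medical
    # example). The confusable sets here are illustrative.
    CONFUSABLE_SETS = [
        {"hypertension", "hypotension"},
        {"fifteen", "fifty"},
    ]

    def significant_alternatives(top, alternatives):
        """Return alternatives that swap a domain-critical word of the
        top result for a confusable counterpart."""
        flagged = []
        top_words = set(top.split())
        for alt in alternatives:
            alt_words = set(alt.split())
            for group in CONFUSABLE_SETS:
                # Both results draw from the same confusable set, but use
                # different members -> potential significant error.
                if (top_words & group) and (alt_words & group) and \
                        (top_words & group) != (alt_words & group):
                    flagged.append(alt)
                    break
        return flagged
    ```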
  • Patent number: 8990082
    Abstract: A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on an audio quality of the speech sample, an amount of speech of the speech sample, a degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: March 24, 2015
    Assignee: Educational Testing Service
    Inventors: Su-Youn Yoon, Derrick Higgins, Klaus Zechner, Shasha Xie, Je Hun Jeon, Keelan Evanini
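    The scorability gate in the abstract above can be sketched as a set of quality checks that must all pass before a sample reaches the scoring model. The metric names and thresholds below are assumptions for illustration, not ETS's actual criteria.

    ```python
    # Sketch: route a speech sample to scoring only if none of several
    # quality checks fail; otherwise mark it non-scorable.
    def is_scorable(metrics):
        """metrics: dict of speech-quality measurements for one sample."""
        checks = [
            metrics["audio_quality"] >= 0.5,    # adequate audio quality
            metrics["speech_seconds"] >= 10.0,  # enough speech present
            metrics["off_topic_score"] < 0.8,   # response is on topic
            metrics["language_match"],          # correct language spoken
            not metrics["plagiarized"],         # no plagiarized material
        ]
        return all(checks)

    def route(sample, metrics, scoring_model):
        if is_scorable(metrics):
            return scoring_model(sample)
        return "non-scorable"
    ```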
  • Patent number: 8990081
    Abstract: A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 24, 2015
    Assignee: Newsouth Innovations Pty Limited
    Inventors: Wenliang Lu, Dipanjan Sen
  • Patent number: 8990076
    Abstract: In automated speech recognition (ASR), multiple devices may be employed to perform the ASR in a distributed environment. To reduce bandwidth use in transmitting between devices, ASR information is compressed prior to transmission. To counteract the fidelity loss that may accompany such compression, two versions of an audio signal are processed by an acoustic front end (AFE): one version is unaltered, and one is compressed and decompressed prior to AFE processing. The two versions are compared, and the comparison data is sent to a recipient for further ASR processing. The recipient uses the comparison data and a received version of the compressed audio signal to recreate the post-AFE processing results for the received audio signal. The result is improved ASR accuracy and decreased bandwidth usage between distributed ASR devices.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: March 24, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Nikko Strom
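    The compare-and-reconstruct flow in the abstract above can be sketched with toy stand-ins for the front end and codec. The AFE (per-frame energy) and the "codec" (integer quantization) below are deliberately simplistic assumptions; only the sender/recipient protocol mirrors the abstract.

    ```python
    # Sketch: the sender runs the acoustic front end (AFE) on both the
    # original audio and a compress/decompress round-trip of it, and ships
    # only the feature difference ("comparison data") plus the compressed
    # audio; the recipient reconstructs the clean AFE features.
    def afe(samples):
        # Toy front end: per-frame energy over frames of 4 samples.
        return [sum(x * x for x in samples[i:i + 4])
                for i in range(0, len(samples), 4)]

    def codec_roundtrip(samples):
        # Toy lossy codec: quantize each sample to the nearest integer.
        return [float(round(x)) for x in samples]

    def sender(samples):
        lossy = codec_roundtrip(samples)
        comparison = [c - l for c, l in zip(afe(samples), afe(lossy))]
        return lossy, comparison      # what actually gets transmitted

    def recipient(lossy, comparison):
        # Recreate post-AFE results as if the clean audio were local.
        return [f + d for f, d in zip(afe(lossy), comparison)]
    ```

    By construction the recipient's features match the sender's clean-audio features exactly, while only the compressed audio and a small difference vector cross the network.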
  • Publication number: 20150081295
    Abstract: According to an aspect of the present disclosure, a method for controlling access to a plurality of applications in an electronic device is disclosed. The method includes receiving a voice command from a speaker for accessing a target application among the plurality of applications, and verifying whether the voice command is indicative of a user authorized to access the applications based on a speaker model of the authorized user. In this method, each application is associated with a security level having a threshold value. The method further includes updating the speaker model with the voice command if the voice command is verified to be indicative of the user, and adjusting at least one of the threshold values based on the updated speaker model.
    Type: Application
    Filed: September 16, 2013
    Publication date: March 19, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Sungrack Yun, Taesu Kim, Jun-Cheol Cho, Min-Kyu Park, Kyu Woong Hwang
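    The per-application verification scheme in the abstract above might be sketched as follows. The one-dimensional voice "feature", the running-mean speaker model, and the distance thresholds are all illustrative assumptions, not Qualcomm's method.

    ```python
    # Sketch: each application has a security level with its own
    # threshold; the speaker model (a running mean of a toy 1-D voice
    # feature) is updated whenever a voice command is verified.
    class SpeakerGate:
        def __init__(self, enrolled_feature, thresholds):
            self.mean = enrolled_feature    # toy speaker model
            self.count = 1
            self.thresholds = thresholds    # app -> max allowed distance

        def verify(self, app, feature):
            """Accept if the voice feature is close enough for this app."""
            if abs(feature - self.mean) > self.thresholds[app]:
                return False
            # Update the speaker model with the accepted sample.
            self.count += 1
            self.mean += (feature - self.mean) / self.count
            return True
    ```

    A high-security app (small threshold) can reject the same voice command that a low-security app accepts, and each accepted command refines the model.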
  • Patent number: 8977547
    Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing the voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern using the voice data for which the utterance stability verification unit 13 determines that registration is acceptable.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: March 10, 2015
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
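    The stability check in the abstract above can be sketched as: enroll a standard pattern only when repeated utterances are mutually similar. The similarity measure, the averaging, and the threshold below are illustrative assumptions.

    ```python
    # Sketch: reject enrollment when any pair of repeated utterances is
    # too dissimilar; otherwise create a standard pattern by averaging.
    def similarity(a, b):
        """Toy similarity between two feature sequences (1 = identical)."""
        dist = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        return 1.0 / (1.0 + dist)

    def register(utterances, threshold=0.8):
        """Return an averaged standard pattern, or None if unstable."""
        pairs = [(a, b) for i, a in enumerate(utterances)
                 for b in utterances[i + 1:]]
        if any(similarity(a, b) <= threshold for a, b in pairs):
            return None                 # registration rejected
        n = len(utterances)
        return [sum(vals) / n for vals in zip(*utterances)]
    ```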
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal, and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student, and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
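    The frequency split described in the abstract above is straightforward to sketch: count each utterance, then route high-frequency utterances to a grammar-based model and low-frequency ones to a statistical model. The model builders below are placeholders; only the split follows the abstract.

    ```python
    # Sketch: partition training utterances by frequency and hand each
    # partition to a different kind of language model.
    from collections import Counter

    def split_by_frequency(utterances, threshold):
        counts = Counter(utterances)
        high = [u for u, c in counts.items() if c > threshold]
        low = [u for u, c in counts.items() if c <= threshold]
        return high, low

    def build_models(utterances, threshold=2):
        high, low = split_by_frequency(utterances, threshold)
        grammar_lm = {"type": "grammar", "training": sorted(high)}
        statistical_lm = {"type": "statistical", "training": sorted(low)}
        return grammar_lm, statistical_lm
    ```

    The intuition: frequent, stereotyped commands are well served by a rigid grammar, while the long tail of rare phrasings is better covered statistically.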
  • Patent number: 8972416
    Abstract: Disclosed are various embodiments of a content management application that facilitates a content management system. Content items that can include audio and/or video can be stored in the content management system. A transcript is generated that corresponds to spoken words within the content. Content can be tagged based upon the transcript. Content anomalies can also be detected as well as editing functionality provided.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: March 3, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Erin Nicole Rifkin, Joshua Drew Ramsden-Pogue
  • Publication number: 20150058012
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: November 10, 2014
    Publication date: February 26, 2015
    Inventors: Sumit CHOPRA, Dimitrios DIMITRIADIS, Patrick HAFFNER
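    The pooling ensemble in the abstract above can be sketched with toy pooling strategies and classifiers. The pooling functions and scoring below are illustrative stand-ins for the patent's pooling interface units (PIUs) and segmental classification units (SCUs).

    ```python
    # Sketch: several pooling strategies summarize frame-level features,
    # each feeds its own classifier, and the ensemble averages the scores.
    def mean_pool(frames):
        return [sum(col) / len(frames) for col in zip(*frames)]

    def max_pool(frames):
        return [max(col) for col in zip(*frames)]

    def ensemble_score(frames, piu_scu_pairs):
        """Average the scores of each (pooling, classifier) combination."""
        scores = [scu(piu(frames)) for piu, scu in piu_scu_pairs]
        return sum(scores) / len(scores)
    ```

    Diversifying the pooling operations across PIU-SCU pairs is what gives the ensemble complementary views of the same segment.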
  • Publication number: 20150058011
    Abstract: An information processing apparatus recognizes input data as character information formed by character strings each being in a predetermined unit based on information relating to a character string as a recognition target, and performs processing based on the recognized character information.
    Type: Application
    Filed: August 6, 2014
    Publication date: February 26, 2015
    Inventors: Naoya Morita, Satoshi Aoki, Shinya Miyazaki, Kazutaka Murakami, Yasuko Hashimoto
  • Publication number: 20150058010
    Abstract: Method for measuring level of speech determined by an audio signal in a manner which corrects for and reduces the effect of modification of the signal by the addition of noise thereto and/or amplitude compression thereof, and a system configured to perform any embodiment of the method. In some embodiments, the method includes steps of generating frequency banded, frequency-domain data indicative of an input speech signal, determining from the data a Gaussian parametric spectral model of the speech signal, and determining from the parametric spectral model an estimated mean speech level and a standard deviation value for each frequency band of the data; and generating speech level data indicative of a bias corrected mean speech level for each frequency band, including using at least one correction value to correct the estimated mean speech level for the frequency band, where each correction value has been predetermined using a reference speech model.
    Type: Application
    Filed: March 21, 2013
    Publication date: February 26, 2015
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventors: David Gunawan, Glenn Dickins
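    The banded level estimate in the abstract above can be sketched as fitting a Gaussian to each band's dB levels and applying a predetermined per-band correction. The statistics and correction values below are illustrative assumptions, not Dolby's reference speech model.

    ```python
    # Sketch: per-band Gaussian fit of speech levels in dB, followed by a
    # bias correction predetermined from a reference speech model.
    import math

    def band_level(db_values, correction):
        n = len(db_values)
        mean = sum(db_values) / n
        var = sum((x - mean) ** 2 for x in db_values) / n
        std = math.sqrt(var)
        # Bias-corrected mean speech level for this band.
        return mean + correction, std

    def speech_level(banded_db, corrections):
        """banded_db: per-band dB sequences; corrections: one per band."""
        return [band_level(band, c) for band, c in zip(banded_db, corrections)]
    ```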
  • Patent number: 8965762
    Abstract: A method is disclosed for recognizing emotion by assigning different weights to at least two kinds of unidentified information, such as image and audio information, based on their respective recognition reliability. The weights are determined by the distance between the test data and the hyperplane and the standard deviation of the training data, normalized by the mean distance between the training data and the hyperplane, and represent the classification reliability of the different kinds of information. When the kinds of unidentified information yield different results under the hyperplane classification, the method recognizes emotion according to the information with the higher weight and corrects the wrong classification result of the other information, thereby raising the accuracy of emotion recognition. The present disclosure also provides a learning step with faster learning speed through an iterative algorithm.
    Type: Grant
    Filed: February 7, 2011
    Date of Patent: February 24, 2015
    Assignee: Industrial Technology Research Institute
    Inventors: Kai-Tai Song, Meng-Ju Han, Jing-Huai Hsu, Jung-Wei Hong, Fuh-Yu Chang
  • Patent number: 8959019
    Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: February 17, 2015
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Narren Chittar
  • Publication number: 20150039309
    Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system.
    Type: Application
    Filed: October 17, 2014
    Publication date: February 5, 2015
    Inventors: Keith P. Braho, Jeffrey P. Pike, Lisa A. Pike
  • Patent number: 8948466
    Abstract: In real biometric systems, false match rates and false non-match rates of 0% do not exist. There is always some probability that a purported match is false, and that a genuine match is not identified. The performance of biometric systems is often expressed in part in terms of their false match rate and false non-match rate, with the equal error rate being when the two are equal. There is a tradeoff between the FMR and FNMR in biometric systems which can be adjusted by changing a matching threshold. This matching threshold can be automatically, dynamically and/or user adjusted so that a biometric system of interest can achieve a desired FMR and FNMR.
    Type: Grant
    Filed: October 11, 2013
    Date of Patent: February 3, 2015
    Assignee: Aware, Inc.
    Inventor: David Benini
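    The FMR/FNMR trade-off in the abstract above can be sketched as sweeping the matching threshold over scored comparisons and choosing the threshold closest to a target false match rate. The scores and tuning criterion below are toy illustrations, not Aware's method.

    ```python
    # Sketch: compute false match rate (FMR) and false non-match rate
    # (FNMR) at a threshold, then pick the threshold whose FMR is closest
    # to a desired target.
    def rates(threshold, genuine_scores, impostor_scores):
        fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        return fmr, fnmr

    def tune_threshold(target_fmr, genuine_scores, impostor_scores):
        candidates = sorted(set(genuine_scores + impostor_scores))
        return min(candidates,
                   key=lambda t: abs(rates(t, genuine_scores,
                                           impostor_scores)[0] - target_fmr))
    ```

    Raising the threshold lowers the FMR at the cost of a higher FNMR; the equal error rate is the operating point where the two curves cross.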
  • Patent number: 8942978
    Abstract: Parameters for distributions of a hidden trajectory model including means and variances are estimated using an acoustic likelihood function for observation vectors as an objection function for optimization. The estimation includes only acoustic data and not any intermediate estimate on hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
    Type: Grant
    Filed: July 14, 2011
    Date of Patent: January 27, 2015
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
  • Patent number: 8938390
    Abstract: In one embodiment, a method for detecting autism in a natural language environment uses a microphone, a sound recorder, and a computer programmed with software for the specialized purpose of processing recordings captured by the microphone and sound recorder combination. The method includes segmenting an audio signal captured by the microphone and sound recorder combination, using the computer programmed for the specialized purpose, into a plurality of recording segments. The method further includes determining which of the plurality of recording segments correspond to a key child, and determining which of those segments are classified as key child recordings.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: January 20, 2015
    Assignee: LENA Foundation
    Inventors: Dongxin D. Xu, Terrance D. Paul
  • Publication number: 20150019218
    Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
    Type: Application
    Filed: April 25, 2014
    Publication date: January 15, 2015
    Applicant: Speech Morphing Systems, Inc.
    Inventors: Fathy Yassa, Ben Reaves
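    The pause-based segmentation in the abstract above can be sketched with a simple short-time-energy classifier. The frame length and energy threshold are illustrative assumptions; real systems use more robust speech/non-speech features.

    ```python
    # Sketch: classify fixed-length intervals as speech or non-speech by
    # short-time energy, then cut segments at the detected pauses.
    def is_speech(frame, energy_threshold=0.01):
        energy = sum(x * x for x in frame) / len(frame)
        return energy >= energy_threshold

    def segment(samples, frame_len=4, energy_threshold=0.01):
        """Return (start_frame, end_frame) pairs of contiguous speech."""
        frames = [samples[i:i + frame_len]
                  for i in range(0, len(samples) - frame_len + 1, frame_len)]
        segments, start = [], None
        for i, frame in enumerate(frames):
            if is_speech(frame, energy_threshold):
                if start is None:
                    start = i          # speech begins
            elif start is not None:
                segments.append((start, i))   # pause ends a segment
                start = None
        if start is not None:
            segments.append((start, len(frames)))
        return segments
    ```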