Specialized Equations Or Comparisons Patents (Class 704/236)
  • Patent number: 9405828
    Abstract: A method of phonetically searching media information comprises receiving a plurality of search queries from one or more client systems and providing a phonetic representation of each search query. One or more search jobs are instantiated, each search job comprising a plurality of tasks, each task being arranged to sequentially read a block from an archive file. The archive file is stored within a distributed filing system (DFS) in which sequential blocks of data comprising the archive file are replicated to be locally available to one or more processors from a cluster of processors for executing the tasks. Each block stores index files corresponding to a plurality of source media files, each index file containing a phonetic stream corresponding to audio information for a given source media file. Each task obtains phonetic representations of outstanding search queries for a block and sequentially searches the block for each outstanding search query.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: August 2, 2016
    Assignee: Avaya Inc.
    Inventors: Malcolm Fintan Wilkins, Gareth Alan Wynn
  • Patent number: 9374464
    Abstract: Systems and methods are disclosed for online data-linked telecommunications decisioning and distribution. One method includes receiving call data relating to a telephone call from a telephone device of a user to an interactive voice response (“IVR”) system; accessing a database storing correlated call data and user data; retrieving correlated call data and user data based on the telephone number of the call data; determining a confidence score defining a confidence that the received call data relates to the retrieved correlated call data and user data; correlating the received call data with retrieved call data and user data when the confidence score is greater than a threshold value; determining an IVR response to present to the user via the IVR system; and transmitting the determined IVR response to the IVR system for presentation to the telephone device of the user.
    Type: Grant
    Filed: December 3, 2015
    Date of Patent: June 21, 2016
    Assignee: AOL Advertising Inc.
    Inventor: Seth Mitchell Demsey
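The decisioning step above (confidence score, threshold, IVR response selection) can be sketched as a minimal toy; the scoring rule, field names, and threshold here are invented for illustration, not the patented method:

```python
# Illustrative sketch of the confidence-score/threshold step; all names and
# the scoring rule are hypothetical, not the patented implementation.

def confidence_score(call_data: dict, stored: dict) -> float:
    """Fraction of fields present in both records whose values agree."""
    shared = set(call_data) & set(stored)
    if not shared:
        return 0.0
    matches = sum(1 for k in shared if call_data[k] == stored[k])
    return matches / len(shared)

def decide_ivr_response(call_data: dict, stored: dict, threshold: float = 0.5) -> str:
    """Correlate the records only when confidence clears the threshold."""
    if confidence_score(call_data, stored) > threshold:
        return "personalized"   # tailor the IVR response to the known user
    return "generic"            # fall back to a default IVR prompt
```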
  • Patent number: 9363365
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for evaluating the quality of a communication session or of a communication path used for the communication session. One of the methods includes initiating a communication session between a first communications device and a second communications device, wherein initiating the communication session comprises routing session data for the communication session along a first communication path between the first communications device and the second communications device; generating, at the first communications device, a plurality of reference content samples; generating a recording of the communication session as received at a first destination along the first communication path; and evaluating a quality of the communication session or of the first communication path by comparing the plurality of reference content samples with the recorded communication session.
    Type: Grant
    Filed: April 27, 2015
    Date of Patent: June 7, 2016
    Assignee: RingCentral, Inc.
    Inventor: Mikhail Nekorystnov
  • Patent number: 9355642
    Abstract: A speaker recognition method through emotional model synthesis based on the Neighbors Preserving Principle is disclosed. The method includes the following steps: (1) training the reference speaker's and user's speech models; (2) extracting the neutral-to-emotion transformation/mapping sets of GMM reference models; (3) extracting the emotion reference Gaussian components mapped by, or corresponding to, several neutral reference Gaussian components close to the user's neutral training Gaussian component; (4) synthesizing the user's emotion training Gaussian component and then synthesizing the user's emotion training model; (5) synthesizing all the user's GMM training models; (6) inputting test speech and conducting the identification.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: May 31, 2016
    Assignee: ZHEJIANG UNIVERSITY
    Inventors: Zhaohui Wu, Yingchun Yang, Li Chen
  • Patent number: 9336780
    Abstract: A method for speaker identification includes detecting a target speaker's utterance locally; extracting features from the detected utterance locally; analyzing the extracted features in the local device to obtain information on the speaker identification and/or encoding the extracted features locally; transmitting the encoded extracted features to a remote server; decoding and analyzing the received extracted features by the server to obtain information on the speaker identification; and transmitting the information on the speaker identification from the server to the location where the speaker's utterance was detected. The method further includes detecting speech activity locally. Extracting features, encoding the extracted features, and/or transmitting the encoded extracted features to the server are only performed if speech activity above some predetermined threshold is detected.
    Type: Grant
    Filed: June 20, 2011
    Date of Patent: May 10, 2016
    Assignee: AGNITIO, S.L.
    Inventors: Luis Buera Rodriguez, Carlos Vaquero Aviles-Casco, Marta Garcia Gomar
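The speech-activity gate described above (extract, encode, and transmit only when activity clears a threshold) might look like the following toy; the energy measure, feature extractor, and threshold are illustrative placeholders:

```python
# Sketch of the speech-activity gate; the energy measure, stand-in feature
# extractor, and threshold are invented, not the patented method.

def speech_activity(frame):
    """Crude mean-absolute-amplitude estimate of voice activity."""
    return sum(abs(s) for s in frame) / len(frame)

def extract_features(frame):
    return [round(s * s, 6) for s in frame]       # stand-in feature extraction

def maybe_transmit(frame, threshold=0.1):
    """Extract, encode, and 'transmit' only when activity clears the threshold."""
    if speech_activity(frame) <= threshold:
        return None                               # silence: do nothing at all
    encoded = ",".join(str(f) for f in extract_features(frame))
    return encoded                                # payload bound for the server
```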
  • Patent number: 9336770
    Abstract: Provided is a pattern recognition apparatus for creating multiple systems and combining the multiple systems to improve the recognition performance, including a discriminative training unit for constructing model parameters of a second or subsequent system based on an output tendency of a previously-constructed model so as to be different from the output tendency of the previously-constructed model. Accordingly, when multiple systems are combined, the recognition performance can be improved without trial and error.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: May 10, 2016
    Assignees: MITSUBISHI ELECTRIC CORPORATION, MITSUBISHI ELECTRIC RESEARCH LABORATORIES, INC.
    Inventors: Yuki Tachioka, Shinji Watanabe
  • Patent number: 9313359
    Abstract: A mobile device responds in real time to media content presented on a media device, such as a television. The mobile device captures temporal fragments of audio-video content on its microphone, camera, or both and generates corresponding audio-video query fingerprints. The query fingerprints are transmitted to a search server located remotely or used with a search function on the mobile device for content search and identification. Audio features are extracted and audio signal global onset detection is used for input audio frame alignment. Additional audio feature signatures are generated from local audio frame onsets, audio frame frequency domain entropy, and maximum change in the spectral coefficients. Video frames are analyzed to find a television screen in the frames, and a detected active television quadrilateral is used to generate video fingerprints to be combined with audio fingerprints for more reliable content identification.
    Type: Grant
    Filed: August 21, 2012
    Date of Patent: April 12, 2016
    Assignee: Gracenote, Inc.
    Inventors: Mihailo M. Stojancic, Jose Pio Pereira, Peter Wendt, Shashank Merchant, Sunil Suresh Kulkarni
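One of the audio features named above, frame frequency-domain entropy, can be illustrated with a toy signature function; the quantization scheme and the stand-in "spectrum" input are invented, not the patented fingerprint:

```python
import math

# Toy audio-frame signature in the spirit of the frequency-domain-entropy
# feature; the quantization and inputs are stand-ins, not the patented design.

def entropy(distribution):
    """Shannon entropy of a normalized magnitude distribution."""
    total = sum(distribution)
    probs = [v / total for v in distribution if v > 0]
    return -sum(p * math.log2(p) for p in probs)

def frame_signature(spectrum_magnitudes, bits=4):
    """Quantize the frame entropy into a small fingerprint code."""
    h = entropy(spectrum_magnitudes)
    max_h = math.log2(len(spectrum_magnitudes))   # entropy of a flat spectrum
    return min(int(h / max_h * (2 ** bits)), 2 ** bits - 1)
```

A flat spectrum yields maximal entropy (highest code); a peaky, tonal frame yields a low code, so successive codes form a compact, comparable stream.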
  • Patent number: 9286408
    Abstract: Methods for analyzing a Uniform Resource Locator (URL) and apparatus for performing such methods. The methods include parsing the URL into text segments and generating n-grams from the text segments. The methods further include generating annotations, each annotation corresponding to one of the n-grams and comprising a match value for its corresponding n-gram, a description of its match value, and a score. The methods still further include selecting a subset of the annotations.
    Type: Grant
    Filed: January 30, 2013
    Date of Patent: March 15, 2016
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Georgia Koutrika
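The parse → n-gram → annotate → select pipeline above can be sketched directly; the delimiter set, vocabulary, scoring, and selection rule are invented for illustration:

```python
import re

# Hypothetical sketch of URL parsing, n-gram generation, annotation, and
# subset selection; all values here are invented, not the patented method.

def parse_segments(url: str):
    """Split a URL into lowercase text segments."""
    return [t for t in re.split(r"[/.\-_?=&:]+", url.lower()) if t]

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def annotate(url, vocabulary, max_n=2, keep=3):
    """One annotation per matching n-gram: match value, description, score."""
    tokens = parse_segments(url)
    annotations = []
    for n in range(1, max_n + 1):
        for g in ngrams(tokens, n):
            if g in vocabulary:
                annotations.append(
                    {"ngram": g, "match": vocabulary[g],
                     "description": f"matched {n}-gram", "score": n})
    return sorted(annotations, key=lambda a: -a["score"])[:keep]  # select subset
```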
  • Patent number: 9269374
    Abstract: The methods and systems described herein predict user behavior based on analysis of a user video communication. The methods include receiving a user video communication, extracting video facial analysis data from the video communication, extracting voice analysis data from the video communication, associating the video facial analysis data with the voice analysis data to determine an emotional state of a user, applying a linguistic-based psychological behavioral model to the voice analysis data to determine personality type of the user, and inputting the emotional state and personality type into a predictive model to determine a likelihood of an outcome of the video communication.
    Type: Grant
    Filed: October 27, 2014
    Date of Patent: February 23, 2016
    Assignee: Mattersight Corporation
    Inventors: Kelly Conway, Christopher Danson
  • Patent number: 9251202
    Abstract: A system determines search hypotheses for a search query, each search hypothesis defining a search type and corresponding to a resource corpus of a type that matches the search type. For each search hypothesis, the system generates a hypothesis search query based on the search query and the search type and submits the hypothesis search query to a search service to determine a search hypothesis score. For each search hypothesis whose score meets a search hypothesis threshold, the system provides search results for the search operation performed for the hypothesis search query; for each search hypothesis whose score does not meet the threshold, it does not provide search results for that search operation.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: February 2, 2016
    Assignee: Google Inc.
    Inventors: Jakob D. Uszkoreit, Percy Liang, Daniel M. Bikel, Pravir K. Gupta, Omer Bar-or
  • Patent number: 9245526
    Abstract: A speech recognition method includes receiving a nametag utterance, decoding the nametag utterance to recognize constituent subwords of the nametag utterance, determining the number of subwords in the nametag utterance, and associating the nametag utterance with one or more of a plurality of different nametag clusters based on the number of subwords in the nametag utterance. According to preferred aspects of the method, a confusability check is performed on the nametag utterance within the cluster(s) associated with the nametag utterance, stored nametags are received from memory by decoding the nametag utterance within the cluster(s) associated with the nametag utterance, and the stored nametags are played back by cluster.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: January 26, 2016
    Assignee: General Motors LLC
    Inventor: Rathinavelu Chengalvarayan
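The clustering-by-subword-count idea above, with a within-cluster confusability check, might be sketched like this; the cluster bands and the similarity rule are invented:

```python
# Toy sketch of clustering nametags by subword count and checking
# confusability only within a cluster; bands and similarity rule are invented.

CLUSTER_BANDS = ((1, 2), (3, 4), (5, 99))    # subword-count range per cluster

def cluster_for(subwords):
    n = len(subwords)
    for i, (lo, hi) in enumerate(CLUSTER_BANDS):
        if lo <= n <= hi:
            return i
    return len(CLUSTER_BANDS) - 1

def confusable(tag_a, tag_b):
    """Only nametags in the same cluster are ever compared."""
    if cluster_for(tag_a) != cluster_for(tag_b):
        return False
    shared = len(set(tag_a) & set(tag_b))
    return shared / max(len(tag_a), len(tag_b)) > 0.5
```

Restricting comparison (and playback) to one cluster is what keeps the confusability check cheap: nametags with very different subword counts are never candidates for confusion.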
  • Patent number: 9232057
    Abstract: A method and apparatus of processing caller experiences is disclosed. One example method may include determining a call event type occurring during a call and assigning a weight to the call event type via a processing device. The method may also include calculating a caller experience metric value representing a caller's current call status responsive to determining the at least one call event type, the caller experience metric being a function of the current event type weight and a discounting variable that discounts a value of past events. The method may also provide comparing the caller experience metric to a predefined threshold value and determining whether to perform at least one of transferring the call to a live agent and switching from a current caller modality to a different caller modality.
    Type: Grant
    Filed: June 10, 2014
    Date of Patent: January 5, 2016
    Assignee: West Corporation
    Inventors: Silke Witt-Ehsani, Aaron Scott Fisher
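The metric above, a function of the current event's weight and a discounting variable over past events, is easy to sketch; the event types, weights, discount, and threshold below are all hypothetical values:

```python
# Minimal sketch of the discounted caller-experience metric; event types,
# weights, discount, and threshold are hypothetical, not the patented values.

EVENT_WEIGHTS = {"success": 1.0, "timeout": -2.0, "misrecognition": -3.0}

def caller_experience(events, discount=0.8):
    """Discounted running sum: the newest event counts fully, older ones decay."""
    score = 0.0
    for event in events:                       # oldest event first
        score = discount * score + EVENT_WEIGHTS.get(event, 0.0)
    return score

def next_action(events, threshold=-3.0):
    """Transfer to a live agent (or switch modality) below the threshold."""
    if caller_experience(events) < threshold:
        return "transfer_to_agent"
    return "continue_ivr"
```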
  • Patent number: 9230541
    Abstract: This application discloses a method of recognizing a keyword in speech that includes a sequence of audio frames, including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network, and when the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated at least based on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion.
    Type: Grant
    Filed: December 11, 2014
    Date of Patent: January 5, 2016
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Lu Li, Li Lu, Jianxiong Ma, Linghui Kong, Feng Rao, Shuai Yue, Xiang Zhang, Haibo Liu, Eryu Wang, Bo Chen
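The cross-language penalty on the confidence score can be illustrated with a toy; the language heuristic, scores, penalty factor, and criterion are all invented:

```python
# Rough sketch of the cross-language penalty on a confidence score;
# the language heuristic and all numbers are invented for illustration.

def language_of(word):
    """Toy heuristic: any non-ASCII character marks a second language."""
    return "non-ascii" if any(ord(c) > 127 for c in word) else "ascii"

def sequence_confidence(candidate, word_option, word_scores, penalty=0.3):
    """Combine the two word confidences; shrink the score for a mixed pair."""
    score = word_scores[candidate] * word_scores[word_option]
    if language_of(candidate) != language_of(word_option):
        score *= penalty                 # penalty factor for mixed languages
    return score

def contains_keyword(candidate, word_option, word_scores, criterion=0.25):
    return sequence_confidence(candidate, word_option, word_scores) >= criterion
```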
  • Patent number: 9137238
    Abstract: Authentication techniques, and in particular, authentication techniques which can be used in conjunction with input constrained devices are described herein. A plurality of words is received. The received words are parsed. A credential is authenticated by determining a match based on information associated with at least one of the received words in the plurality.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: September 15, 2015
    Assignee: RightQuestions, LLC
    Inventor: Bjorn Markus Jakobsson
  • Patent number: 9093081
    Abstract: The subject matter discloses a computerized method for real time emotion detection in audio interactions comprising: receiving at a computer server a portion of an audio interaction between a customer and an organization representative, the portion of the audio interaction comprises a speech signal; extracting feature vectors from the speech signal; obtaining a statistical model; producing adapted statistical data by adapting the statistical model according to the speech signal using the feature vectors extracted from the speech signal; obtaining an emotion classification model; and producing an emotion score based on the adapted statistical data and the emotion classification model, said emotion score represents the probability that the speaker that produced the speech signal is in an emotional state.
    Type: Grant
    Filed: March 10, 2013
    Date of Patent: July 28, 2015
    Assignee: NICE-SYSTEMS LTD
    Inventors: Ronen Laperdon, Moshe Wasserblat, Tzach Ashkenazi, Ido David David, Oren Pereg
  • Patent number: 9089772
    Abstract: Methods and systems establish games with automation using verbal communication for exchanges between the automated game and the one or more game players. Game information data is converted into verbal information that is provided to the individual. The individual provides verbal instruction which is received and converted into the instruction data. The instruction data is applied to the current game to update the current game status. Information data for the current game status is converted to verbal information for the current game status which is provided to the individual. The game may be implemented on a local device of the individual or may be network-based and accessed remotely by the individual through verbal communication over a voice connection. The voice connection may be of various forms such as a conventional voiced call to a voice services node of a telephone network or a voice-over IP voiced call on a data network.
    Type: Grant
    Filed: December 3, 2007
    Date of Patent: July 28, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Dave Anderson, Senis Busayapongchai
  • Patent number: 9047870
    Abstract: Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
    Type: Grant
    Filed: September 29, 2011
    Date of Patent: June 2, 2015
    Assignee: Google Inc.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen, Michael D. Riley
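The interpolation step above, a weighted sum over base language models with context-dependent weights, reduces to a few lines for unigrams; the corpora, weights, and smoothing floor below are illustrative, not the patented system:

```python
# Sketch of an interpolated unigram language model; the toy corpora,
# weights, and smoothing floor are invented, not the patented system.

def interpolated_p(base_models, weights, word, floor=1e-6):
    """P(word) as a normalized weighted sum over base language models."""
    total = sum(weights)
    return sum((w / total) * model.get(word, floor)
               for model, w in zip(base_models, weights))

# two toy base models, each corresponding to a distinct textual corpus
chat_model = {"lol": 0.05, "meeting": 0.001}
work_model = {"lol": 0.0005, "meeting": 0.02}

# contextual metadata (say, the voice input arrives from a calendar app)
# shifts the interpolation weights toward the work corpus
p_meeting = interpolated_p([chat_model, work_model], [0.2, 0.8], "meeting")
```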
  • Patent number: 9037462
    Abstract: Disclosed herein are systems, computer-implemented methods, and tangible computer-readable media for using alternate recognition hypotheses to improve whole-dialog understanding accuracy. The method includes receiving an utterance as part of a user dialog, generating an N-best list of recognition hypotheses for the user dialog turn, selecting an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list, and responding to the user based on the selected underlying user intention. Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: May 19, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Jason Williams
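The core move above, pooling a belief distribution over intentions across N-best lists rather than trusting a single top hypothesis, can be sketched like this; the intention mapping and confidences are invented:

```python
from collections import defaultdict

# Toy version of pooling belief across contextually similar N-best lists;
# the intention mapping and confidence values are invented for illustration.

def intention_of(hypothesis):
    """Stand-in mapping from a recognition hypothesis to a user intention."""
    return "book_flight" if "flight" in hypothesis else "other"

def select_intention(nbest_lists):
    """Sum confidence mass per underlying intention across all N-best lists."""
    belief = defaultdict(float)
    for nbest in nbest_lists:
        for hypothesis, confidence in nbest:
            belief[intention_of(hypothesis)] += confidence
    return max(belief, key=belief.get)
```

Note that in the test below the first list's top hypothesis is wrong; the pooled belief across both lists still recovers the intended meaning, which is the point of using alternates.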
  • Patent number: 9037465
    Abstract: A method of detecting pre-determined phrases to determine compliance quality is provided. The method includes determining whether at least one of an event or a precursor event has occurred based on a comparison between pre-determined phrases and a communication between a sender and a recipient in a communications network, and rating the recipient based on the presence of the pre-determined phrases associated with the event or the presence of the pre-determined phrases associated with the precursor event in the communication.
    Type: Grant
    Filed: February 21, 2013
    Date of Patent: May 19, 2015
    Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: I. Dan Melamed, Andrej Ljolje, Bernard Renger, Yeon-Jun Kim, David J. Smith
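Phrase detection plus recipient rating reduces to a small sketch; the phrase list and rating rule are hypothetical, not the patented compliance criteria:

```python
# Minimal sketch of pre-determined phrase detection and recipient rating;
# the phrase list and rating rule are invented for illustration.

REQUIRED_PHRASES = ("this call may be recorded", "is there anything else")

def detected_phrases(communication: str):
    text = communication.lower()
    return [p for p in REQUIRED_PHRASES if p in text]

def compliance_rating(communication: str) -> float:
    """Fraction of the pre-determined phrases present in the communication."""
    return len(detected_phrases(communication)) / len(REQUIRED_PHRASES)
```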
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9026437
    Abstract: A location determination system includes a first mobile terminal and a second mobile terminal. The first mobile terminal includes a first processor to acquire a first sound signal, analyze the first sound signal to obtain a first analysis result, and transmit the first analysis result. The second mobile terminal includes a second processor to acquire a second sound signal, analyze the second sound signal to obtain a second analysis result, receive the first analysis result from the first mobile terminal, compare the second analysis result with the first analysis result to obtain a comparison result, and determine, based on the comparison result, whether the first mobile terminal is located in the area in which the second mobile terminal is located.
    Type: Grant
    Filed: March 26, 2012
    Date of Patent: May 5, 2015
    Assignee: Fujitsu Limited
    Inventor: Eiji Hasegawa
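The comparison idea above, two terminals hearing the same ambient sound should produce matching analysis results, can be shown with a toy; the "analysis" here is a coarse energy signature and the match rule is invented:

```python
# Toy version of the two-terminal comparison; the "analysis" is a coarse
# energy signature and the tolerance rule is invented, not the patent's.

def analyze(sound):
    """Stand-in analysis: a few coarse mean-energy buckets over the signal."""
    n = max(len(sound) // 3, 1)
    return [round(sum(abs(s) for s in sound[i:i + n]) / n, 3)
            for i in range(0, len(sound), n)]

def same_area(first_result, second_result, tolerance=0.1):
    """Terminals hearing the same ambient sound should analyze alike."""
    if len(first_result) != len(second_result):
        return False
    return all(abs(a - b) <= tolerance
               for a, b in zip(first_result, second_result))
```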
  • Publication number: 20150120296
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for making a multi-factor decision whether to process speech or language requests via a network-based speech processor or a local speech processor. An example local device configured to practice the method, having a local speech processor, and having access to a remote speech processor, receives a request to process speech. The local device can analyze multi-vector context data associated with the request to identify one of the local speech processor and the remote speech processor as an optimal speech processor. Then the local device can process the speech, in response to the request, using the optimal speech processor. If the optimal speech processor is local, then the local device processes the speech. If the optimal speech processor is remote, the local device passes the request and any supporting data to the remote speech processor and waits for a result.
    Type: Application
    Filed: October 29, 2013
    Publication date: April 30, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Benjamin J. STERN, Enrico Luigi BOCCHIERI, Diamantino Antonio CASEIRO, Danilo GIULIANELLI, Ladan GOLIPOUR
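The multi-vector routing decision between the local and remote processors might be sketched as a small scoring rule; the context vectors, weights, and result strings are invented:

```python
# Sketch of a multi-factor routing decision between a local and a remote
# speech processor; the context factors and scoring weights are invented.

def choose_processor(context):
    """Positive evidence favors local processing, negative favors remote."""
    score = 0
    score += 3 if context.get("network") == "none" else -1   # offline forces local
    score += 1 if context.get("task") == "simple_command" else -2
    score += 1 if context.get("battery_low") else 0          # avoid the radio
    return "local" if score > 0 else "remote"

def process_speech(audio, context):
    processor = choose_processor(context)
    # a real device would dispatch to the chosen speech engine here
    return processor, f"{processor}-result({len(audio)} samples)"
```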
  • Patent number: 9020816
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 28, 2015
    Assignee: 21CT, Inc.
    Inventor: Matthew McClain
  • Publication number: 20150112678
    Abstract: Broadly speaking, embodiments of the present invention provide a device, systems and methods for capturing sounds, generating a sound model (or “sound pack”) for each captured sound, and identifying a detected sound using the sound model(s). Preferably, a single device is used to capture a sound, store sound models, and to identify a detected sound using the stored sound models.
    Type: Application
    Filed: December 30, 2014
    Publication date: April 23, 2015
    Applicant: AUDIO ANALYTIC LTD
    Inventors: DOMINIC FRANK JULIAN BINKS, SACHA KRSTULOVIC, CHRISTOPHER JAMES MITCHELL
  • Patent number: 9015044
    Abstract: Implementations of systems, method and devices described herein enable enhancing the intelligibility of a target voice signal included in a noisy audible signal received by a hearing aid device or the like. In particular, in some implementations, systems, methods and devices are operable to generate a machine readable formant based codebook. In some implementations, the method includes determining whether or not a candidate codebook tuple includes a sufficient amount of new information to warrant either adding the candidate codebook tuple to the codebook or using at least a portion of the candidate codebook tuple to update an existing codebook tuple. Additionally and/or alternatively, in some implementations systems, methods and devices are operable to reconstruct a target voice signal by detecting formants in an audible signal, using the detected formants to select codebook tuples, and using the formant information in the selected codebook tuples to reconstruct the target voice signal.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 21, 2015
    Assignee: Malaspina Labs (Barbados) Inc.
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S. H. Chu, Shawn E. Stevenson
  • Publication number: 20150106095
    Abstract: A digital sound identification system for storing a Markov model is disclosed. A processor is coupled to a sound data input, working memory, and a stored program memory for executing processor control code to input sound data for a sound to be identified. The sample sound data defines sample frequency-domain energy in a range of frequencies. Mean and variance values for a Markov model of the sample sound are generated, and the Markov model is stored in the non-volatile memory. Interference sound data defining interference frequency-domain data is input, and the mean and variance values of the Markov model are adjusted using the interference frequency-domain data. Sound data defining other sound frequency-domain data is input, and a probability of the other sound frequency-domain data fitting the Markov model is determined. Finally, sound identification data dependent on the probability is output.
    Type: Application
    Filed: November 5, 2014
    Publication date: April 16, 2015
    Applicant: AUDIO ANALYTIC LTD.
    Inventor: CHRISTOPHER JAMES MITCHELL
  • Patent number: 9009043
    Abstract: Methods and apparatus for identifying a user group in connection with user group-based speech recognition. An exemplary method comprises receiving, from a user, a user group identifier that identifies a user group to which the user was previously assigned based on training data. The user group comprises a plurality of individuals including the user. The method further comprises using the user group identifier, identifying a pattern processing data set corresponding to the user group, and receiving speech input from the user to be recognized using the pattern processing data set.
    Type: Grant
    Filed: August 20, 2012
    Date of Patent: April 14, 2015
    Assignee: Nuance Communications, Inc.
    Inventor: Peter Beyerlein
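The lookup described above, resolving a user group identifier to a pattern-processing data set before recognition, is simple to sketch; the group table, data-set contents, and result strings are invented placeholders:

```python
# Sketch of selecting a pattern-processing data set by user group identifier;
# the group table and data-set names are invented placeholders.

GROUP_DATA_SETS = {
    "grp-accent-a": {"model": "acoustic-a", "lexicon": "lex-a"},
    "grp-accent-b": {"model": "acoustic-b", "lexicon": "lex-b"},
}

def data_set_for(group_id):
    """Resolve the user group identifier received from the user."""
    return GROUP_DATA_SETS.get(group_id)

def recognize(speech, group_id):
    ds = data_set_for(group_id)
    if ds is None:
        return "error: unknown user group"
    return f"decoded {len(speech)} frames with {ds['model']}"
```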
  • Patent number: 9009041
    Abstract: A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
    Type: Grant
    Filed: July 26, 2011
    Date of Patent: April 14, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: George Zavaliagkos, William F. Ganong, III, Uwe H. Jost, Shreedhar Madhavapeddi, Gary B. Clayton
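A toy re-scoring pass in the spirit of the method above: swap an ASR-transcribed word for a close match from the personal vocabulary. The vocabulary entries, the similarity measure (`difflib` string similarity here), and the cutoff are illustrative stand-ins:

```python
import difflib

# Toy re-scoring pass: replace an ASR word with a close match from the
# user's personal vocabulary. Names, cutoff, and similarity are stand-ins.

PERSONAL_VOCAB = ["Katarina", "Bergström"]   # e.g. mined from the user's contacts

def rescore_word(transcribed, cutoff=0.75):
    """Return a personal-vocabulary replacement if one is similar enough."""
    match = difflib.get_close_matches(transcribed, PERSONAL_VOCAB,
                                      n=1, cutoff=cutoff)
    return match[0] if match else transcribed

def correct_transcript(words):
    return [rescore_word(w) for w in words]
```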
  • Patent number: 9008329
    Abstract: Provided are methods and systems for noise suppression within multiple time-frequency points of spectral representations. A multi-feature cluster tracker is used to track signal and noise sources and to predict signal versus noise dominance at each time-frequency point. Multiple features, such as binaural and monaural features, may be used for these purposes. A Gaussian mixture model (GMM) is developed and, in some embodiments, dynamically updated for distinguishing signal from noise and performing mask-based noise reduction. Each frequency band may use a different GMM or share a GMM with other frequency bands. A GMM may be combined from two models, with one trained to model time-frequency points in which the target dominates and another trained to model time-frequency points in which the noise dominates. Dynamic updates of a GMM may be performed using an expectation-maximization algorithm in an unsupervised fashion.
    Type: Grant
    Filed: June 8, 2012
    Date of Patent: April 14, 2015
    Assignee: Audience, Inc.
    Inventors: Michael Mandel, Carlos Avendano
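The combined-model idea above, one component modeling target-dominated points and one modeling noise-dominated points, can be reduced to a two-component toy per time-frequency point; all distribution parameters here are invented:

```python
import math

# Highly simplified stand-in for the GMM-based mask: a two-component
# (speech vs noise) model per time-frequency point. All parameters invented.

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def speech_posterior(energy_db, speech=(-5.0, 4.0), noise=(-20.0, 4.0)):
    """P(speech-dominated | energy) under equal priors."""
    ps = gaussian_pdf(energy_db, *speech)
    pn = gaussian_pdf(energy_db, *noise)
    return ps / (ps + pn)

def mask(energies_db, keep=0.5):
    """Keep points the model calls speech-dominated; suppress the rest."""
    return [e if speech_posterior(e) > keep else None for e in energies_db]
```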
  • Publication number: 20150095026
    Abstract: In an automatic speech recognition (ASR) processing system, ASR processing may be configured to process speech based on multiple channels of audio received from a beamformer. The ASR processing system may include a microphone array and the beamformer to output multiple channels of audio such that each channel isolates audio in a particular direction. The multichannel audio signals may include spoken utterances/speech from one or more speakers as well as undesired audio, such as noise from a household appliance. The ASR device may simultaneously perform speech recognition on the multi-channel audio to provide more accurate speech recognition results.
    Type: Application
    Filed: September 27, 2013
    Publication date: April 2, 2015
    Applicant: Amazon Technologies, Inc.
    Inventors: Michael Maximilian Emanuel Bisani, Nikko Strom, Bjorn Hoffmeister, Ryan Paul Thomas
  • Publication number: 20150088506
    Abstract: The speech recognition results from the general-purpose server and from the specialized speech recognition server are integrated in an optimal manner, thereby providing a speech recognition function with the fewest errors. The specialized speech recognition server 108 is constructed using the words contained in the user dictionary data, and the performance of the general-purpose speech recognition server 106 is preliminarily evaluated with that user dictionary data. Based on the evaluation result, information on which recognition results from the specialized and general-purpose speech recognition servers are adopted, and on how the adopted recognition results are weighted to obtain an optimal recognition result, is retained in advance in the form of a database.
    Type: Application
    Filed: April 3, 2013
    Publication date: March 26, 2015
    Inventors: Yasunari Obuchi, Takeshi Homma
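The adopt-and-weight integration step might look like the following toy, where a pre-built weight table (standing in for the database built during the preliminary evaluation) biases the choice toward user-dictionary words; all values are invented:

```python
# Sketch of adopting and weighting results from a general-purpose and a
# specialized recognizer via a pre-built weight table; all values invented.

def integrate(general, specialized, word_weights):
    """Pick, word by word, the hypothesis with the higher weighted confidence."""
    merged = []
    for (g_word, g_conf), (s_word, s_conf) in zip(general, specialized):
        g_score = g_conf * word_weights.get(g_word, 1.0)
        s_score = s_conf * word_weights.get(s_word, 1.0)
        merged.append(g_word if g_score >= s_score else s_word)
    return merged
```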
  • Publication number: 20150088507
    Abstract: In some embodiments, the recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential significant errors. In some embodiments, the recognition results may be evaluated to determine whether a meaning of any of the alternative recognition results differs from a meaning of the top recognition result in a manner that is significant for a domain, such as the medical domain. In some embodiments, words and/or phrases that may be confused by an ASR system may be determined and associated in sets of words and/or phrases. Words and/or phrases that may be determined include those that change a meaning of a phrase or sentence when included in the phrase/sentence.
    Type: Application
    Filed: December 1, 2014
    Publication date: March 26, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
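    The confusable-word check described in the abstract above might look like the following sketch. The confusable sets and the word-level comparison are illustrative assumptions, not Nuance's actual method.

    ```python
    # Sketch: flag alternative recognition results whose meaning differs
    # from the top result in a way that matters for a domain (toy medical
    # example). The confusable sets here are illustrative.
    CONFUSABLE_SETS = [
        {"hypertension", "hypotension"},
        {"fifteen", "fifty"},
    ]

    def significant_alternatives(top, alternatives):
        """Return alternatives that swap a domain-critical word of the
        top result for a confusable counterpart."""
        flagged = []
        top_words = set(top.split())
        for alt in alternatives:
            alt_words = set(alt.split())
            for group in CONFUSABLE_SETS:
                # Both results draw from the same confusable set, but use
                # different members -> potential significant error.
                if (top_words & group) and (alt_words & group) and \
                        (top_words & group) != (alt_words & group):
                    flagged.append(alt)
                    break
        return flagged
    ```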
  • Patent number: 8990082
    Abstract: A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on an audio quality of the speech sample, an amount of speech of the speech sample, a degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: March 24, 2015
    Assignee: Educational Testing Service
    Inventors: Su-Youn Yoon, Derrick Higgins, Klaus Zechner, Shasha Xie, Je Hun Jeon, Keelan Evanini
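    The scorability gate in the abstract above can be sketched as a set of quality checks that must all pass before a sample reaches the scoring model. The metric names and thresholds below are assumptions for illustration, not ETS's actual criteria.

    ```python
    # Sketch: route a speech sample to scoring only if none of several
    # quality checks fail; otherwise mark it non-scorable.
    def is_scorable(metrics):
        """metrics: dict of speech-quality measurements for one sample."""
        checks = [
            metrics["audio_quality"] >= 0.5,    # adequate audio quality
            metrics["speech_seconds"] >= 10.0,  # enough speech present
            metrics["off_topic_score"] < 0.8,   # response is on topic
            metrics["language_match"],          # correct language spoken
            not metrics["plagiarized"],         # no plagiarized material
        ]
        return all(checks)

    def route(sample, metrics, scoring_model):
        if is_scorable(metrics):
            return scoring_model(sample)
        return "non-scorable"
    ```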
  • Patent number: 8990081
    Abstract: A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 24, 2015
    Assignee: Newsouth Innovations Pty Limited
    Inventors: Wenliang Lu, Dipanjan Sen
  • Patent number: 8990076
    Abstract: In automated speech recognition (ASR), multiple devices may be employed to perform the ASR in a distributed environment. To reduce bandwidth use in transmitting between devices, ASR information is compressed prior to transmission. To counteract the fidelity loss that may accompany such compression, two versions of an audio signal are processed by an acoustic front end (AFE): one version is unaltered, and one is compressed and decompressed prior to AFE processing. The two versions are compared, and the comparison data is sent to a recipient for further ASR processing. The recipient uses the comparison data and a received version of the compressed audio signal to recreate the post-AFE processing results for the received audio signal. The result is improved ASR accuracy and decreased bandwidth usage between distributed ASR devices.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: March 24, 2015
    Assignee: Amazon Technologies, Inc.
    Inventor: Nikko Strom
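    The compare-and-reconstruct flow in the abstract above can be sketched with toy stand-ins for the front end and codec. The AFE (per-frame energy) and the "codec" (integer quantization) below are deliberately simplistic assumptions; only the sender/recipient protocol mirrors the abstract.

    ```python
    # Sketch: the sender runs the acoustic front end (AFE) on both the
    # original audio and a compress/decompress round-trip of it, and ships
    # only the feature difference ("comparison data") plus the compressed
    # audio; the recipient reconstructs the clean AFE features.
    def afe(samples):
        # Toy front end: per-frame energy over frames of 4 samples.
        return [sum(x * x for x in samples[i:i + 4])
                for i in range(0, len(samples), 4)]

    def codec_roundtrip(samples):
        # Toy lossy codec: quantize each sample to the nearest integer.
        return [float(round(x)) for x in samples]

    def sender(samples):
        lossy = codec_roundtrip(samples)
        comparison = [c - l for c, l in zip(afe(samples), afe(lossy))]
        return lossy, comparison      # what actually gets transmitted

    def recipient(lossy, comparison):
        # Recreate post-AFE results as if the clean audio were local.
        return [f + d for f, d in zip(afe(lossy), comparison)]
    ```

    By construction the recipient's features match the sender's clean-audio features exactly, while only the compressed audio and a small difference vector cross the network.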
  • Publication number: 20150081295
    Abstract: According to an aspect of the present disclosure, a method for controlling access to a plurality of applications in an electronic device is disclosed. The method includes receiving a voice command from a speaker for accessing a target application among the plurality of applications, and verifying whether the voice command is indicative of a user authorized to access the applications based on a speaker model of the authorized user. In this method, each application is associated with a security level having a threshold value. The method further includes updating the speaker model with the voice command if the voice command is verified to be indicative of the user, and adjusting at least one of the threshold values based on the updated speaker model.
    Type: Application
    Filed: September 16, 2013
    Publication date: March 19, 2015
    Applicant: QUALCOMM Incorporated
    Inventors: Sungrack Yun, Taesu Kim, Jun-Cheol Cho, Min-Kyu Park, Kyu Woong Hwang
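    The per-application verification scheme in the abstract above might be sketched as follows. The one-dimensional voice "feature", the running-mean speaker model, and the distance thresholds are all illustrative assumptions, not Qualcomm's method.

    ```python
    # Sketch: each application has a security level with its own
    # threshold; the speaker model (a running mean of a toy 1-D voice
    # feature) is updated whenever a voice command is verified.
    class SpeakerGate:
        def __init__(self, enrolled_feature, thresholds):
            self.mean = enrolled_feature    # toy speaker model
            self.count = 1
            self.thresholds = thresholds    # app -> max allowed distance

        def verify(self, app, feature):
            """Accept if the voice feature is close enough for this app."""
            if abs(feature - self.mean) > self.thresholds[app]:
                return False
            # Update the speaker model with the accepted sample.
            self.count += 1
            self.mean += (feature - self.mean) / self.count
            return True
    ```

    A high-security app (small threshold) can reject the same voice command that a low-security app accepts, and each accepted command refines the model.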
  • Patent number: 8977547
    Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing the voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern using the voice data for which the utterance stability verification unit 13 determines that registration is acceptable.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: March 10, 2015
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
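    The stability check in the abstract above can be sketched as: enroll a standard pattern only when repeated utterances are mutually similar. The similarity measure, the averaging, and the threshold below are illustrative assumptions.

    ```python
    # Sketch: reject enrollment when any pair of repeated utterances is
    # too dissimilar; otherwise create a standard pattern by averaging.
    def similarity(a, b):
        """Toy similarity between two feature sequences (1 = identical)."""
        dist = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        return 1.0 / (1.0 + dist)

    def register(utterances, threshold=0.8):
        """Return an averaged standard pattern, or None if unstable."""
        pairs = [(a, b) for i, a in enumerate(utterances)
                 for b in utterances[i + 1:]]
        if any(similarity(a, b) <= threshold for a, b in pairs):
            return None                 # registration rejected
        n = len(utterances)
        return [sum(vals) / n for vals in zip(*utterances)]
    ```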
  • Patent number: 8972259
    Abstract: A method and system for teaching non-lexical speech effects includes delexicalizing a first speech segment to provide a first prosodic speech signal, and data indicative of the first prosodic speech signal is stored in a computer memory. The first speech segment is audibly played to a language student, and the student is prompted to recite the speech segment. The speech uttered by the student in response to the prompt is recorded.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: March 3, 2015
    Assignee: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
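    The frequency split described in the abstract above is straightforward to sketch: count each utterance, then route high-frequency utterances to a grammar-based model and low-frequency ones to a statistical model. The model builders below are placeholders; only the split follows the abstract.

    ```python
    # Sketch: partition training utterances by frequency and hand each
    # partition to a different kind of language model.
    from collections import Counter

    def split_by_frequency(utterances, threshold):
        counts = Counter(utterances)
        high = [u for u, c in counts.items() if c > threshold]
        low = [u for u, c in counts.items() if c <= threshold]
        return high, low

    def build_models(utterances, threshold=2):
        high, low = split_by_frequency(utterances, threshold)
        grammar_lm = {"type": "grammar", "training": sorted(high)}
        statistical_lm = {"type": "statistical", "training": sorted(low)}
        return grammar_lm, statistical_lm
    ```

    The intuition: frequent, stereotyped commands are well served by a rigid grammar, while the long tail of rare phrasings is better covered statistically.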
  • Patent number: 8972416
    Abstract: Disclosed are various embodiments of a content management application that facilitates a content management system. Content items that can include audio and/or video can be stored in the content management system. A transcript is generated that corresponds to spoken words within the content. Content can be tagged based upon the transcript. Content anomalies can also be detected as well as editing functionality provided.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: March 3, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Erin Nicole Rifkin, Joshua Drew Ramsden-Pogue
  • Publication number: 20150058012
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: November 10, 2014
    Publication date: February 26, 2015
    Inventors: Sumit CHOPRA, Dimitrios DIMITRIADIS, Patrick HAFFNER
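    The pooling ensemble in the abstract above can be sketched with toy pooling strategies and classifiers. The pooling functions and scoring below are illustrative stand-ins for the patent's pooling interface units (PIUs) and segmental classification units (SCUs).

    ```python
    # Sketch: several pooling strategies summarize frame-level features,
    # each feeds its own classifier, and the ensemble averages the scores.
    def mean_pool(frames):
        return [sum(col) / len(frames) for col in zip(*frames)]

    def max_pool(frames):
        return [max(col) for col in zip(*frames)]

    def ensemble_score(frames, piu_scu_pairs):
        """Average the scores of each (pooling, classifier) combination."""
        scores = [scu(piu(frames)) for piu, scu in piu_scu_pairs]
        return sum(scores) / len(scores)
    ```

    Diversifying the pooling operations across PIU-SCU pairs is what gives the ensemble complementary views of the same segment.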
  • Publication number: 20150058011
    Abstract: An information processing apparatus recognizes input data as character information formed by character strings each being in a predetermined unit based on information relating to a character string as a recognition target, and performs processing based on the recognized character information.
    Type: Application
    Filed: August 6, 2014
    Publication date: February 26, 2015
    Inventors: Naoya Morita, Satoshi Aoki, Shinya Miyazaki, Kazutaka Murakami, Yasuko Hashimoto
  • Publication number: 20150058010
    Abstract: Method for measuring level of speech determined by an audio signal in a manner which corrects for and reduces the effect of modification of the signal by the addition of noise thereto and/or amplitude compression thereof, and a system configured to perform any embodiment of the method. In some embodiments, the method includes steps of generating frequency banded, frequency-domain data indicative of an input speech signal, determining from the data a Gaussian parametric spectral model of the speech signal, and determining from the parametric spectral model an estimated mean speech level and a standard deviation value for each frequency band of the data; and generating speech level data indicative of a bias corrected mean speech level for each frequency band, including using at least one correction value to correct the estimated mean speech level for the frequency band, where each correction value has been predetermined using a reference speech model.
    Type: Application
    Filed: March 21, 2013
    Publication date: February 26, 2015
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventors: David Gunawan, Glenn Dickins
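    The banded level estimate in the abstract above can be sketched as fitting a Gaussian to each band's dB levels and applying a predetermined per-band correction. The statistics and correction values below are illustrative assumptions, not Dolby's reference speech model.

    ```python
    # Sketch: per-band Gaussian fit of speech levels in dB, followed by a
    # bias correction predetermined from a reference speech model.
    import math

    def band_level(db_values, correction):
        n = len(db_values)
        mean = sum(db_values) / n
        var = sum((x - mean) ** 2 for x in db_values) / n
        std = math.sqrt(var)
        # Bias-corrected mean speech level for this band.
        return mean + correction, std

    def speech_level(banded_db, corrections):
        """banded_db: per-band dB sequences; corrections: one per band."""
        return [band_level(band, c) for band, c in zip(banded_db, corrections)]
    ```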
  • Patent number: 8965762
    Abstract: A method is disclosed for recognizing emotion by assigning different weights to at least two kinds of unidentified information, such as image and audio information, based on their respective recognition reliability. The weights are determined by the distance between the test data and the hyperplane and the standard deviation of the training data, normalized by the mean distance between the training data and the hyperplane, and represent the classification reliability of the different kinds of information. When the kinds of unidentified information yield different results under the hyperplane classification, the method recognizes emotion according to the information with the higher weight and corrects the wrong classification result of the other information, thereby raising the accuracy of emotion recognition. The present disclosure also provides a learning step with faster learning speed through an iterative algorithm.
    Type: Grant
    Filed: February 7, 2011
    Date of Patent: February 24, 2015
    Assignee: Industrial Technology Research Institute
    Inventors: Kai-Tai Song, Meng-Ju Han, Jing-Huai Hsu, Jung-Wei Hong, Fuh-Yu Chang
  • Patent number: 8959019
    Abstract: Efficient empirical determination, computation, and use of an acoustic confusability measure comprises: (1) an empirically derived acoustic confusability measure, comprising a means for determining the acoustic confusability between any two textual phrases in a given language, where the measure of acoustic confusability is empirically derived from examples of the application of a specific speech recognition technology, where the procedure does not require access to the internal computational models of the speech recognition technology, and does not depend upon any particular internal structure or modeling technique, and where the procedure is based upon iterative improvement from an initial estimate; (2) techniques for efficient computation of empirically derived acoustic confusability measure, comprising means for efficient application of an acoustic confusability score, allowing practical application to very large-scale problems; and (3) a method for using acoustic confusability measures to make principled
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: February 17, 2015
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Narren Chittar
  • Publication number: 20150039309
    Abstract: Methods are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. A method for model adaptation for a speech recognition system includes determining an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system. The method may further include adjusting an adaptation, of the model for the word or various models for the various words, based on the error rate. Apparatus are disclosed for identifying possible errors made by a speech recognition system without using a transcript of words input to the system. An apparatus for model adaptation for a speech recognition system includes a processor adapted to estimate an error rate, corresponding to either recognition of instances of a word or recognition of instances of various words, without using a transcript of words input to the system.
    Type: Application
    Filed: October 17, 2014
    Publication date: February 5, 2015
    Inventors: Keith P. Braho, Jeffrey P. Pike, Lisa A. Pike
  • Patent number: 8948466
    Abstract: In real biometric systems, false match rates and false non-match rates of 0% do not exist. There is always some probability that a purported match is false, and that a genuine match is not identified. The performance of biometric systems is often expressed in part in terms of their false match rate and false non-match rate, with the equal error rate being when the two are equal. There is a tradeoff between the FMR and FNMR in biometric systems which can be adjusted by changing a matching threshold. This matching threshold can be automatically, dynamically and/or user adjusted so that a biometric system of interest can achieve a desired FMR and FNMR.
    Type: Grant
    Filed: October 11, 2013
    Date of Patent: February 3, 2015
    Assignee: Aware, Inc.
    Inventor: David Benini
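    The FMR/FNMR trade-off in the abstract above can be sketched as sweeping the matching threshold over scored comparisons and choosing the threshold closest to a target false match rate. The scores and tuning criterion below are toy illustrations, not Aware's method.

    ```python
    # Sketch: compute false match rate (FMR) and false non-match rate
    # (FNMR) at a threshold, then pick the threshold whose FMR is closest
    # to a desired target.
    def rates(threshold, genuine_scores, impostor_scores):
        fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        return fmr, fnmr

    def tune_threshold(target_fmr, genuine_scores, impostor_scores):
        candidates = sorted(set(genuine_scores + impostor_scores))
        return min(candidates,
                   key=lambda t: abs(rates(t, genuine_scores,
                                           impostor_scores)[0] - target_fmr))
    ```

    Raising the threshold lowers the FMR at the cost of a higher FNMR; the equal error rate is the operating point where the two curves cross.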
  • Patent number: 8942978
    Abstract: Parameters for distributions of a hidden trajectory model including means and variances are estimated using an acoustic likelihood function for observation vectors as an objection function for optimization. The estimation includes only acoustic data and not any intermediate estimate on hidden dynamic variables. Gradient ascent methods can be developed for optimizing the acoustic likelihood function.
    Type: Grant
    Filed: July 14, 2011
    Date of Patent: January 27, 2015
    Assignee: Microsoft Corporation
    Inventors: Li Deng, Dong Yu, Xiaolong Li, Alejandro Acero
  • Patent number: 8938390
    Abstract: In one embodiment, a method for detecting autism in a natural language environment uses a microphone, a sound recorder, and a computer programmed with software for the specialized purpose of processing recordings captured by the microphone and sound recorder combination. The method includes segmenting an audio signal captured by the microphone and sound recorder combination, using the computer programmed for the specialized purpose, into a plurality of recording segments. The method further includes determining which of the plurality of recording segments correspond to a key child, and determining which of those segments are classified as key child recordings.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: January 20, 2015
    Assignee: LENA Foundation
    Inventors: Dongxin D. Xu, Terrance D. Paul
  • Publication number: 20150019218
    Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
    Type: Application
    Filed: April 25, 2014
    Publication date: January 15, 2015
    Applicant: Speech Morphing Systems, Inc.
    Inventors: Fathy Yassa, Ben Reaves
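    The pause-based segmentation in the abstract above can be sketched with a simple short-time-energy classifier. The frame length and energy threshold are illustrative assumptions; real systems use more robust speech/non-speech features.

    ```python
    # Sketch: classify fixed-length intervals as speech or non-speech by
    # short-time energy, then cut segments at the detected pauses.
    def is_speech(frame, energy_threshold=0.01):
        energy = sum(x * x for x in frame) / len(frame)
        return energy >= energy_threshold

    def segment(samples, frame_len=4, energy_threshold=0.01):
        """Return (start_frame, end_frame) pairs of contiguous speech."""
        frames = [samples[i:i + frame_len]
                  for i in range(0, len(samples) - frame_len + 1, frame_len)]
        segments, start = [], None
        for i, frame in enumerate(frames):
            if is_speech(frame, energy_threshold):
                if start is None:
                    start = i          # speech begins
            elif start is not None:
                segments.append((start, i))   # pause ends a segment
                start = None
        if start is not None:
            segments.append((start, len(frames)))
        return segments
    ```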