Endpoint Detection Patents (Class 704/253)
  • Patent number: 10672397
    Abstract: The present teaching relates to facilitating a guided dialog with a user. In one example, an input utterance is obtained from the user. One or more task sets are estimated based on the input utterance. Each of the one or more task sets includes a plurality of tasks estimated to be requested by the user via the input utterance and is associated with a confidence score computed based on statistics with respect to the plurality of tasks in the task set. At least one of the one or more task sets is selected based on the respective confidence scores. A response is generated based on the tasks in the selected at least one task set. The response is provided to the user.
    Type: Grant
    Filed: July 26, 2019
    Date of Patent: June 2, 2020
    Assignee: Oath Inc.
    Inventors: Sungjin Lee, Amanda Stent
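As a rough illustration of the selection step this abstract describes (score candidate task sets from co-occurrence statistics, then pick the highest-confidence set), here is a minimal sketch. The function names and the frequency-based scoring formula are illustrative assumptions, not the patented implementation.

```python
# Hypothetical sketch: estimate task-set confidence from co-occurrence
# statistics and select the best-scoring set(s).

def confidence(task_set, co_occurrence_counts, total):
    """Score a task set by how often its tasks were requested together."""
    key = frozenset(task_set)
    return co_occurrence_counts.get(key, 0) / total if total else 0.0

def select_task_sets(task_sets, co_occurrence_counts, total, top_k=1):
    scored = [(confidence(ts, co_occurrence_counts, total), ts) for ts in task_sets]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [ts for _, ts in scored[:top_k]]

# Toy statistics: "book_flight + book_hotel" co-occurred 40 of 50 times.
counts = {frozenset({"book_flight", "book_hotel"}): 40,
          frozenset({"book_flight", "rent_car"}): 10}
best = select_task_sets([["book_flight", "book_hotel"],
                         ["book_flight", "rent_car"]], counts, 50)
print(best)  # → [['book_flight', 'book_hotel']]
```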
  • Patent number: 10667155
    Abstract: A method, a device, and a non-transitory storage medium for estimating voice call quality include performing automatic speech recognition, for each of a plurality of voice calls, to generate recognized text for both an originating device acoustic signal and a receiving device acoustic signal. The recognized text for each of the originating and receiving device acoustic signals is compared to reference text to identify recognition errors, and a voice call quality score is determined for each signal. A correlation between the network conditions and the voice call quality scores is then determined.
    Type: Grant
    Filed: July 16, 2018
    Date of Patent: May 26, 2020
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Ye Ouyang, Krishna Pichumani Iyer, Zhenyi Lin, Le Su
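The core comparison in this abstract (recognized text versus reference text, with errors driving a quality score) can be sketched with a standard word-level edit distance. The mapping from error rate to score is an assumption for illustration; the patent does not specify this formula.

```python
# Sketch: count word-level recognition errors against a reference transcript
# and derive a simple 0..1 call-quality score.

def word_errors(reference, recognized):
    """Word-level edit distance (insertions, deletions, substitutions)."""
    ref, hyp = reference.split(), recognized.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1]

def quality_score(reference, recognized):
    """Assumed mapping: quality = 1 - word error rate, floored at 0."""
    n = max(len(reference.split()), 1)
    return max(0.0, 1.0 - word_errors(reference, recognized) / n)

print(quality_score("please confirm the order", "please confirm order"))  # → 0.75
```

In the patented method this score would be computed per call for both the originating and receiving signals, then correlated against observed network conditions.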
  • Patent number: 10582046
    Abstract: A voice recognition-based dialing method and a voice recognition-based dialing system are provided. The method includes determining a recognition result based on a user's voice input, at least one acoustic model, and at least one language model, where the at least one acoustic model and the at least one language model are obtained based on information collected in an electronic device. The system is configured to obtain at least one acoustic model and at least one language model based on information collected in an electronic device, and to determine a recognition result based on a user's voice input and those models. The acoustic models and the language models are updated based on the information collected in the electronic device, which may improve voice recognition-based dialing.
    Type: Grant
    Filed: December 30, 2014
    Date of Patent: March 3, 2020
    Assignee: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED
    Inventors: Jianjun Ma, Liping Hu, Richard Allen Kreifeldt
  • Patent number: 10523897
    Abstract: One or more sensor devices detect a condition of each of a plurality of users. For a user set whose elements are at least two of the users, an information processing apparatus calculates an agreement degree, representing the degree of agreement between those users, based on their portion of the obtained condition information on all the users. An information presentation device provides presentation information based on the result of the agreement degree calculation.
    Type: Grant
    Filed: August 30, 2017
    Date of Patent: December 31, 2019
    Assignee: PANASONIC INTELLECTUAL PROPERTY MANAGEMENT CO., LTD.
    Inventors: Yuri Nishikawa, Masayuki Misaki
  • Patent number: 10447675
    Abstract: A method for delivering primary information that exists in at least one electronic form includes transmission of the primary information via at least one communication network to at least one communication system allocated to an addressee of the primary information; creation of verification information relating to the acknowledgement of the primary information by the addressee; and saving and/or transmission of the verification information via at least one communication network. Individualized data is thus made available which documents not only the receipt of the primary information by the addressee but also the addressee's acknowledgement of that receipt. A telecommunication arrangement and a telecommunication unit which are suitable for carrying out the method are also disclosed.
    Type: Grant
    Filed: November 30, 2007
    Date of Patent: October 15, 2019
    Assignee: Sigram Schindler Beteiligungsgesellschaft MbH
    Inventors: Sigram Schindler, Juergen Schulze
  • Patent number: 10403273
    Abstract: The present teaching relates to facilitating a guided dialog with a user. In one example, an input utterance is obtained from the user. One or more task sets are estimated based on the input utterance. Each of the one or more task sets includes a plurality of tasks estimated to be requested by the user via the input utterance and is associated with a confidence score computed based on statistics with respect to the plurality of tasks in the task set. At least one of the one or more task sets is selected based on the respective confidence scores. A response is generated based on the tasks in the selected at least one task set. The response is provided to the user.
    Type: Grant
    Filed: September 9, 2016
    Date of Patent: September 3, 2019
    Assignee: Oath Inc.
    Inventors: Sungjin Lee, Amanda Stent
  • Patent number: 10402500
    Abstract: Provided are a method and electronic device for voice translation. The electronic device includes a voice receiver configured to receive a voice signal; a processor configured to divide the voice signal into a plurality of voice segments, determine an input language and a speaker that correspond to each of the plurality of voice segments, determine a translation direction based on the input language and the speaker of the voice segments, and translate the voice segments according to the translation direction to generate a translation result; and an output device configured to output the translation result.
    Type: Grant
    Filed: February 8, 2017
    Date of Patent: September 3, 2019
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Marcin Chochowski, Pawel Przybysz, Elzbieta Gajewska-Dendek
  • Patent number: 10276150
    Abstract: A correction system of the embodiment includes an interface system, a calculator, a generator, and a display controller. The interface system receives correction information for correcting a voice recognition result. The calculator estimates a part of the voice recognition result to be corrected and calculates a degree of association between the part to be corrected and the correction information. The generator generates corrected display information comprising at least one of the correction information and the part to be corrected using a display format corresponding to the degree of association. The display controller outputs the corrected display information on a display.
    Type: Grant
    Filed: February 23, 2017
    Date of Patent: April 30, 2019
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kosei Fume, Taira Ashikawa, Masayuki Ashikawa, Hiroshi Fujimura
  • Patent number: 10049657
    Abstract: Phoneme boundaries may be determined from a signal corresponding to recorded audio by extracting auditory attention features from the signal and extracting phoneme posteriors from the signal. The auditory attention features and phoneme posteriors may then be combined to detect boundaries in the signal.
    Type: Grant
    Filed: May 26, 2017
    Date of Patent: August 14, 2018
    Assignee: SONY INTERACTIVE ENTERTAINMENT INC.
    Inventor: Ozlem Kalinli-Akbacak
  • Patent number: 10002259
    Abstract: An always-listening-capable computing device includes a receiver for input from a user, a module for communication with a remote server, and a gate-keeping module that, when enabled, prevents the communication module from transmitting data external to the device. After determining that user input includes a first wake up phrase, the device processor automatically transmits a representation of user input subsequent to the phrase, activates an always-receiving mode to transmit a stream of user input captured subsequent to the phrase, deactivates the always-receiving mode to prevent transmission of user input received subsequent to the phrase, unless also preceded by a second wake up phrase, or enables the gate-keeping module to prevent transmission of data external to the device.
    Type: Grant
    Filed: November 14, 2017
    Date of Patent: June 19, 2018
    Inventor: Xiao Ming Mai
  • Patent number: 9972308
    Abstract: Methods, a system, and a classifier are provided. A method includes preparing, by a processor, pairs for an information retrieval task. Each pair includes (i) a training-stage speech recognition result for a respective sequence of training words and (ii) an answer label corresponding to the training-stage speech recognition result. The method further includes obtaining, by the processor, a respective rank for the answer label included in each pair to obtain a set of ranks. The method also includes determining, by the processor, for each pair, an end of question part in the training-stage speech recognition result based on the set of ranks. The method additionally includes building, by the processor, the classifier such that the classifier receives a recognition-stage speech recognition result and returns a corresponding end of question part for the recognition-stage speech recognition result, based on the end of question part determined for the pairs.
    Type: Grant
    Filed: November 8, 2016
    Date of Patent: May 15, 2018
    Assignee: International Business Machines Corporation
    Inventors: Tohru Nagano, Ryuki Tachibana
  • Patent number: 9954507
    Abstract: Various aspects of this disclosure describe setting an audio compressor threshold using averaged audio measurements. Examples include calculating one or more average values of the amplitude values of an audio file, and setting a threshold used in the audio compressor based on the calculated average values. Samples of the audio file with amplitude values above the threshold are attenuated, while samples with amplitude values below the threshold are not attenuated. The threshold can be set equal to a calculated average value, or derived from a function of one or more calculated average values. The different audio channels comprising the audio file can be processed separately to set a respective compressor threshold for each audio channel.
    Type: Grant
    Filed: August 1, 2016
    Date of Patent: April 24, 2018
    Assignee: ADOBE SYSTEMS INCORPORATED
    Inventor: Matthew Gehring Stegner
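The averaged-threshold idea above is simple enough to sketch directly: set the threshold to the mean absolute amplitude, then attenuate only the portion of each sample that exceeds it. The attenuation ratio and the choice of plain mean are illustrative assumptions; the patent covers other functions of the averages as well.

```python
# Minimal sketch of an averaged-threshold audio compressor.

def compress(samples, ratio=0.5):
    """Attenuate samples above a threshold set to the mean absolute amplitude."""
    threshold = sum(abs(s) for s in samples) / len(samples)
    out = []
    for s in samples:
        if abs(s) > threshold:
            # Scale down only the excess over the threshold.
            excess = (abs(s) - threshold) * ratio
            out.append((threshold + excess) * (1 if s >= 0 else -1))
        else:
            out.append(s)  # below-threshold samples pass through unchanged
    return threshold, out

threshold, out = compress([0.1, 0.9, -0.2, 0.6])
print(round(threshold, 2))  # → 0.45
```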
  • Patent number: 9947321
    Abstract: Methods and systems for handling speech recognition processing in effectively real-time, via the Internet, so that users do not experience noticeable delays between speaking and receiving responsive feedback. A user uses a client to access the Internet and a server supporting speech recognition processing. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech and provides responsive feedback to the client, again in approximate real-time, with minimal latency. The client, upon receiving responsive feedback from the server, displays, or otherwise provides, the feedback to the user.
    Type: Grant
    Filed: February 21, 2017
    Date of Patent: April 17, 2018
    Assignee: PEARSON EDUCATION, INC.
    Inventor: Christopher S. Jochumson
  • Patent number: 9922640
    Abstract: The disclosure describes a system and method for detecting one or more segments of desired speech utterances from an audio stream using timings of events from other modes that are correlated to the timings of the desired segments of speech. The redundant information from other modes results in highly accurate and robust utterance detection.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: March 20, 2018
    Inventor: Ashwin P Rao
  • Patent number: 9886943
    Abstract: A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors. The occurrence distribution in WCS is different depending on whether the word was correctly identified and based on the type of error. This is used to determine thresholds in WCS for insertion and substitution errors. By processing the hypothetical word (HYP) (output of the decoder), a mHYP (modified HYP) is determined. In some circumstances, depending on the WCS's value in relation to insertion and substitution threshold values, mHYP is set equal to: null, a substituted HYP, or HYP.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: February 6, 2018
    Assignee: Adacel Inc.
    Inventor: Chang-Qing Shu
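The mHYP decision in this abstract reduces to comparing a word's confidence score against two thresholds. The sketch below makes that branching concrete; the threshold values and the substitution lookup table are illustrative assumptions, not values from the patent.

```python
# Sketch of the modified-hypothesis (mHYP) decision: given a decoded word
# (HYP) and its word confidence score (WCS), emit null, a substituted
# word, or the original word.

def modify_hyp(hyp, wcs, insertion_threshold, substitution_threshold, substitutes):
    if wcs < insertion_threshold:
        return None                       # likely an insertion error: drop the word
    if wcs < substitution_threshold:
        return substitutes.get(hyp, hyp)  # likely a substitution: swap if possible
    return hyp                            # confident enough: keep HYP as-is

subs = {"too": "two"}  # hypothetical substitution table
print(modify_hyp("too", 0.35, 0.2, 0.5, subs))    # → two
print(modify_hyp("hello", 0.9, 0.2, 0.5, subs))   # → hello
print(modify_hyp("uh", 0.1, 0.2, 0.5, subs))      # → None
```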
  • Patent number: 9805714
    Abstract: The disclosure is directed to a directional keyword verification method and an electronic device using the same method. According to an exemplary embodiment, the proposed keyword verification method would include receiving an audio stream; analyzing the audio stream to obtain at least a word; determining whether the word matches a keyword from a keyword database; assigning the word as a filler if the word does not match the keyword from the keyword database; determining whether a vowel pattern of the word matches the vowel pattern of the keyword if the word matches the keyword from the keyword database; assigning the word as a trigger or command word if the vowel pattern of the word matches the vowel pattern of the keyword; and otherwise assigning the word as a filler if the vowel pattern of the word does not match the vowel pattern of the keyword.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: October 31, 2017
    Assignee: ASUSTeK COMPUTER INC.
    Inventors: Bhoomek D. Pandya, Hsing-Yu Tsai, Min-Hong Wang, Cheng-Chung Hsu
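The two-stage check described above (text match, then vowel-pattern match) can be sketched as follows. Here the acoustically observed vowel sequence is passed in separately from the recognized text, since a text match alone would make the vowel check redundant; that framing, and the letter-based vowel extraction, are simplifying assumptions.

```python
# Sketch: a word counts as a trigger/command only if it matches a keyword
# textually AND its observed vowel pattern matches the keyword's.

def vowel_pattern(word):
    """Assumed simplification: vowel letters in spelling order."""
    return [c for c in word.lower() if c in "aeiou"]

def classify(word, acoustic_vowels, keywords):
    for kw in keywords:
        if word.lower() == kw.lower():
            if acoustic_vowels == vowel_pattern(kw):
                return "trigger"
            return "filler"  # text matched but the heard vowels disagreed
    return "filler"          # no keyword match at all

print(classify("record", ["e", "o"], ["record"]))  # → trigger
print(classify("record", ["a", "o"], ["record"]))  # → filler
```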
  • Patent number: 9806740
    Abstract: A device for data compression includes a processing unit, a temporary memory, and a storage device. The temporary memory temporarily stores data to be compressed. The storage device includes multiple physical blocks, each with the same volume size. The processing unit compresses the to-be-compressed data, generates compressed data, and stores the compressed data into one of the physical blocks. The processing unit compares the data size of the compressed data with the volume size of one physical block, and when the data size of the compressed data is smaller than the volume size, the processing unit stores remnant data into the same physical block as the compressed data, such that the total data size of the remnant data plus the compressed data equals the volume size of the physical block in which both are stored.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: October 31, 2017
    Assignee: ACCELSTOR, INC.
    Inventors: An-Nan Chang, Shih-Chiang Tsao, Pao-Chien Li, Chih-Kang Nung
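The packing scheme above (compressed payload plus remnant data exactly filling one fixed-size physical block) can be sketched as below. The 4 KiB block size, the use of zlib, and zero-padding when the remnant pool runs short are illustrative assumptions.

```python
# Sketch: compress a payload and fill the rest of a fixed-size physical
# block with remnant data so the block is exactly full.

import zlib

BLOCK_SIZE = 4096  # assumed physical block size

def pack_block(data, remnant_pool):
    compressed = zlib.compress(data)
    if len(compressed) > BLOCK_SIZE:
        raise ValueError("compressed data exceeds one physical block")
    free = BLOCK_SIZE - len(compressed)
    remnant, rest = remnant_pool[:free], remnant_pool[free:]
    # Pad with zero bytes if the remnant pool cannot fill the block.
    block = compressed + remnant + b"\x00" * (free - len(remnant))
    return block, len(compressed), rest

block, used, leftover = pack_block(b"hello world" * 100, b"R" * 10000)
print(len(block))  # → 4096
```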
  • Patent number: 9711133
    Abstract: A desired character train included in a predefined reference character train, such as lyrics, is set as a target character train, and a user designates a target phoneme train that is indirectly representative of the target character train by use of a limited plurality of kinds of particular phonemes, such as vowels and particular consonants. A reference phoneme train indirectly representative of the reference character train by use of the particular phonemes is prepared in advance. Based on a comparison between the target phoneme train and the reference phoneme train, a sequence of the particular phonemes in the reference phoneme train that matches the target phoneme train is identified, and a character sequence in the reference character train that corresponds to the identified sequence of the particular phonemes is identified. The thus-identified character sequence estimates the target character train.
    Type: Grant
    Filed: July 29, 2015
    Date of Patent: July 18, 2017
    Assignee: YAMAHA CORPORATION
    Inventor: Kazuhiko Yamamoto
  • Patent number: 9646603
    Abstract: A method, apparatus, and system are described for a continuous speech recognition engine that includes a fine speech recognizer model, a coarse sound representation generator, and a coarse match generator. The fine speech recognizer model receives a time coded sequence of sound feature frames, applies a speech recognition process to the sound feature frames and determines at least a best guess at each recognizable word that corresponds to the sound feature frames. The coarse sound representation generator generates a coarse sound representation of the recognized word. The coarse match generator determines a likelihood of the coarse sound representation actually being the recognized word based on comparing the coarse sound representation of the recognized word to a database containing the known sound of that recognized word and assigns the likelihood as a robust confidence level parameter to that recognized word.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: May 9, 2017
    Assignee: LONGSAND LIMITED
    Inventor: Mahapathy Kadirkamanathan
  • Patent number: 9583094
    Abstract: A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors. The occurrence distribution in WCS is different depending on whether the word was correctly identified and based on the type of error. This is used to determine thresholds in WCS for insertion and substitution errors. By processing the hypothetical word (HYP) (output of the decoder), a mHYP (modified HYP) is determined. In some circumstances, depending on the WCS's value in relation to insertion and substitution threshold values, mHYP is set equal to: null, a substituted HYP, or HYP.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: February 28, 2017
    Assignee: ADACEL, INC.
    Inventor: Chang-Qing Shu
  • Patent number: 9542945
    Abstract: Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for adjusting language models. In one aspect, a method includes accessing audio data. Information that indicates a first context is accessed, the first context being associated with the audio data. At least one term is accessed. Information that indicates a second context is accessed, the second context being associated with the term. A similarity score is determined that indicates a degree of similarity between the second context and the first context. A language model is adjusted based on the accessed term and the determined similarity score to generate an adjusted language model. Speech recognition is performed on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
    Type: Grant
    Filed: June 10, 2015
    Date of Patent: January 10, 2017
    Assignee: Google Inc.
    Inventor: Matthew I. Lloyd
  • Patent number: 9489371
    Abstract: A method for detecting data in a sequence of characters or text using both a statistical engine and a pattern engine. The statistical engine is trained to recognize certain types of data and the pattern engine is programmed to recognize the grammatical pattern of certain types of data. The statistical engine may scan the sequence of characters to output first data, and the pattern engine may break down the first data into subsets of data. Alternatively, the statistical engine may output items that have a predetermined probability or greater of being a certain type of data and the pattern engine may then detect the data from the output items and/or remove incorrect information from the output items.
    Type: Grant
    Filed: July 12, 2013
    Date of Patent: November 8, 2016
    Assignee: Apple Inc.
    Inventors: Olivier Bonnet, Frederick de Jaeger, Romain Goyet, Jean-Pierre Ciudad
  • Patent number: 9478218
    Abstract: A method and system for improving the accuracy of a speech recognition system using word confidence score (WCS) processing is introduced. Parameters in a decoder are selected to minimize a weighted total error rate, such that deletion errors are weighted more heavily than substitution and insertion errors. The occurrence distribution in WCS is different depending on whether the word was correctly identified and based on the type of error. This is used to determine thresholds in WCS for insertion and substitution errors. By processing the hypothetical word (HYP) (output of the decoder), a mHYP (modified HYP) is determined. In some circumstances, depending on the WCS's value in relation to insertion and substitution threshold values, mHYP is set equal to: null, a substituted HYP, or HYP.
    Type: Grant
    Filed: October 24, 2008
    Date of Patent: October 25, 2016
    Assignee: Adacel, Inc.
    Inventor: Chang-Qing Shu
  • Patent number: 9401145
    Abstract: A method for converting speech to text in a speech analytics system is provided. The method includes receiving audio data containing speech made up of sounds from an audio source, processing the sounds with a phonetic module resulting in symbols corresponding to the sounds, and processing the symbols with a language module and occurrence table resulting in text. The method also includes determining a probability of correct translation for each word in the text, comparing the probability of correct translation for each word in the text to the occurrence table, and adjusting the occurrence table based on the probability of correct translation for each word in the text.
    Type: Grant
    Filed: May 5, 2014
    Date of Patent: July 26, 2016
    Assignee: VERINT SYSTEMS LTD.
    Inventors: Omer Ziv, Ran Achituv, Ido Shapira
  • Patent number: 9378728
    Abstract: Methods and systems are provided for gathering research data that includes information pertaining to audio signals received on a portable device, such as a cell phone. Frequency domain data is received or produced, a signature is extracted from the frequency domain data and an ancillary code is read from the frequency domain data.
    Type: Grant
    Filed: April 25, 2014
    Date of Patent: June 28, 2016
    Assignee: The Nielsen Company (US), LLC
    Inventor: Alan R. Neuhauser
  • Patent number: 9330676
    Abstract: A method to filter out speech interference is provided. The method includes defining a time threshold by using a probability distribution model. When a current instruction from a speech input is recognized, a reference instruction recognized from the speech input is obtained. The current instruction is recognized immediately after the recognition of the reference instruction, wherein the reference instruction and the current instruction correspond to a first time point and a second time point respectively. The method includes determining whether speech interference occurs according to a comparison of the time threshold with the interval between the first time point and the second time point, as well as a state corresponding to the first time point. The method includes filtering out the reference instruction and the current instruction if the speech interference occurs, and outputting the reference instruction or the current instruction if the speech interference does not occur.
    Type: Grant
    Filed: October 18, 2013
    Date of Patent: May 3, 2016
    Assignee: Wistron Corporation
    Inventor: Hsi-Chun Hsiao
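The timing test in this abstract can be sketched with a fixed threshold: if two consecutively recognized instructions arrive closer together than the threshold, both are treated as interference and dropped. Using a constant rather than a fitted probability distribution, and ignoring the state term, are simplifications for illustration.

```python
# Sketch of timing-based speech-interference filtering.

TIME_THRESHOLD = 0.8  # seconds; stand-in for the probability-model threshold

def filter_interference(events):
    """events: list of (time_seconds, instruction); returns kept instructions."""
    kept, i = [], 0
    while i < len(events):
        if i + 1 < len(events) and events[i + 1][0] - events[i][0] < TIME_THRESHOLD:
            i += 2  # reference + current instruction both filtered out
        else:
            kept.append(events[i][1])
            i += 1
    return kept

print(filter_interference([(0.0, "play"), (0.3, "stop"), (5.0, "pause")]))
# → ['pause']
```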
  • Patent number: 9324316
    Abstract: There is provided a prosody generator that generates prosody information for implementing highly natural speech synthesis without unnecessarily collecting large quantities of learning data. A data dividing means 81 divides the data space of a learning database, an assembly of learning data indicative of the feature quantities of speech waveforms, into subspaces. A density information extracting means 82 extracts density information indicative of the information-quantity density of the learning data in each of the subspaces divided by the data dividing means 81. A prosody information generating method selecting means 83 selects either a first method or a second method as the prosody information generating method based on the density information, the first method generating the prosody information using a statistical technique, the second method generating the prosody information using rules based on heuristics.
    Type: Grant
    Filed: May 10, 2012
    Date of Patent: April 26, 2016
    Assignee: NEC CORPORATION
    Inventors: Yasuyuki Mitsui, Reishi Kondo, Masanori Kato
  • Patent number: 9318103
    Abstract: An automatic speech recognition system for recognizing a user voice command in a noisy environment, including: matching means for matching elements retrieved from speech units forming said command with templates in a template library; characterized by processing means including a MultiLayer Perceptron for computing posterior templates (P(Otemplate(q))) stored as said templates in said template library; and means for retrieving posterior vectors (P(Otest(q))) from said speech units, said posterior vectors being used as said elements. The present invention also relates to a method for recognizing a user voice command in noisy environments.
    Type: Grant
    Filed: February 21, 2013
    Date of Patent: April 19, 2016
    Assignee: VEOVOX SA
    Inventors: John Dines, Jorge Carmona, Olivier Masson, Guillermo Aradilla
  • Patent number: 9263061
    Abstract: Methods and systems are provided for detecting chop in an audio signal. A time-frequency representation, such as a spectrogram, is created for an audio signal and used to calculate a gradient of mean power per frame of the audio signal. Positive and negative gradients are defined for the signal based on the gradient of mean power, and a maximum overlap offset between the positive and negative gradients is determined by calculating a value that maximizes the cross-correlation of the positive and negative gradients. The negative gradient values may be combined (e.g., summed) with the overlap offset, and the combined values then compared with a threshold to estimate the amount of chop present in the audio signal. The chop detection model provided is low-complexity and is applicable to narrowband, wideband, and superwideband speech.
    Type: Grant
    Filed: May 21, 2013
    Date of Patent: February 16, 2016
    Assignee: GOOGLE INC.
    Inventors: Andrew J. Hines, Jan Skoglund, Naomi Harte, Anil Kokaram
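The chop-detection pipeline above has a concrete shape: gradient of per-frame mean power, split into positive and negative parts, lag search by cross-correlation, then a combined score. The toy sketch below follows that shape on a hand-made power curve; the exact gradient combination and thresholding in the patent are more involved, so treat this as an assumed simplification.

```python
# Toy sketch of chop detection: align the negative power gradient (signal
# drop) with the positive gradient (recovery) and sum the drops as a score.

def chop_score(frame_power):
    grad = [b - a for a, b in zip(frame_power, frame_power[1:])]
    pos = [max(g, 0.0) for g in grad]   # power recoveries
    neg = [max(-g, 0.0) for g in grad]  # power drops
    # Find the lag of the recovery relative to the drop that maximizes
    # their cross-correlation.
    best_lag, best_corr = 0, float("-inf")
    for lag in range(len(grad)):
        corr = sum(n * p for n, p in zip(neg, pos[lag:]))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, sum(neg)

# A 2-frame power dip (chop): drop at frame 2, recovery at frame 4.
lag, score = chop_score([1.0, 1.0, 0.1, 0.1, 1.0, 1.0])
print(lag, round(score, 2))  # → 2 0.9
```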
  • Patent number: 9117460
    Abstract: The present invention relates to speech recognition systems, especially to arranging detection of end-of-utterance in such systems. A speech recognizer of the system is configured to determine whether the recognition result determined from received speech data has stabilized. The speech recognizer is configured to process values of best state scores and best token scores associated with frames of received speech data for end-of-utterance detection purposes. Further, the speech recognizer is configured to determine, based on the processing and provided the recognition result has stabilized, whether end of utterance is detected.
    Type: Grant
    Filed: May 12, 2004
    Date of Patent: August 25, 2015
    Assignee: Core Wireless Licensing S.A.R.L.
    Inventor: Tommi Lahti
  • Patent number: 9099098
    Abstract: In speech processing systems, compensation is made for sudden changes in the background noise in the average signal-to-noise ratio (SNR) calculation. SNR outlier filtering may be used, alone or in conjunction with weighting the average SNR. Adaptive weights may be applied on the SNRs per band before computing the average SNR. The weighting function can be a function of noise level, noise type, and/or instantaneous SNR value. Another weighting mechanism applies a null filtering or outlier filtering which sets the weight in a particular band to be zero. This particular band may be characterized as the one that exhibits an SNR that is several times higher than the SNRs in other bands.
    Type: Grant
    Filed: November 6, 2012
    Date of Patent: August 4, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Venkatraman Srinivasa Atti, Venkatesh Krishnan
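The null-filtering variant above (zero the weight of a band whose SNR is several times higher than the others, then average the rest) can be sketched directly. The outlier factor of 3 and the uniform weights for surviving bands are illustrative assumptions; the patent also covers weights that depend on noise level, noise type, and instantaneous SNR.

```python
# Sketch: average per-band SNR with outlier (null) filtering.

def average_snr(band_snrs, outlier_factor=3.0):
    if not band_snrs:
        return 0.0
    mean = sum(band_snrs) / len(band_snrs)
    # Null filtering: zero-weight any band several times the mean SNR.
    weights = [0.0 if snr > outlier_factor * mean else 1.0 for snr in band_snrs]
    total = sum(weights)
    if total == 0:
        return mean
    return sum(w * s for w, s in zip(weights, band_snrs)) / total

# The 60 dB band is a sudden-noise outlier and is excluded from the average.
print(average_snr([5.0, 6.0, 4.0, 60.0]))  # → 5.0
```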
  • Patent number: 9076453
    Abstract: The present invention relates to a postfilter and a postfilter control to be associated with a postfilter for improving perceived quality of speech reconstructed at a speech decoder. The postfilter control comprises means for measuring stationarity of a speech signal reconstructed at a decoder, means for determining a coefficient to a postfilter control parameter based on the measured stationarity, and means for transmitting the determined coefficient to a postfilter, such that the postfilter can process the reconstructed speech signal by applying the determined coefficient to the postfilter control parameter to obtain an enhanced speech signal.
    Type: Grant
    Filed: May 15, 2014
    Date of Patent: July 7, 2015
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: Volodya Grancharov
  • Patent number: 9076445
    Abstract: Methods, systems, and apparatuses, including computer programs encoded on a computer storage medium, for adjusting language models. In one aspect, a method includes accessing audio data. Information that indicates a first context is accessed, the first context being associated with the audio data. At least one term is accessed. Information that indicates a second context is accessed, the second context being associated with the term. A similarity score is determined that indicates a degree of similarity between the second context and the first context. A language model is adjusted based on the accessed term and the determined similarity score to generate an adjusted language model. Speech recognition is performed on the audio data using the adjusted language model to select one or more candidate transcriptions for a portion of the audio data.
    Type: Grant
    Filed: December 5, 2012
    Date of Patent: July 7, 2015
    Assignee: Google Inc.
    Inventor: Matthew I. Lloyd
  • Patent number: 9026065
    Abstract: Methods and apparatus for voice and data interlacing in a system having a shared antenna. In one embodiment, a voice and data communication system has a shared antenna for transmitting and receiving information in time slots, wherein the antenna can only be used for transmit or receive at a given time. The system determines timing requirements for data transmission and reception and interrupts data transmission for transmission of speech in selected intervals while meeting the data transmission timing and throughput requirements. The speech can be manipulated to fit with the selected intervals, to preserve the intelligibility of the manipulated speech.
    Type: Grant
    Filed: March 21, 2012
    Date of Patent: May 5, 2015
    Assignee: Raytheon Company
    Inventors: David R. Peterson, Timothy S. Loos, David F. Ring, James F. Keating
  • Patent number: 9009048
    Abstract: A speech recognition method, medium, and system. The method includes detecting an energy change of each frame making up signals including speech and non-speech signals, and identifying a speech segment corresponding to frames that include only speech signals from among the frames based on the detected energy change.
    Type: Grant
    Filed: August 1, 2007
    Date of Patent: April 14, 2015
    Assignees: Samsung Electronics Co., Ltd., Apple Inc.
    Inventors: Giljin Jang, Jeongsu Kim, John S. Bridle, Melvyn J. Hunt
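A minimal sketch of the energy-change detection this abstract describes, assuming sum-of-squares frame energy and a single delta threshold (a sharp rise marks a segment onset, a sharp fall its offset); these choices are illustrative, not from the patent.

```python
def frame_energy(frames):
    """Sum-of-squares energy per frame of samples."""
    return [sum(s * s for s in f) for f in frames]

def speech_segments(frames, delta_threshold):
    """Detect speech segments from per-frame energy changes: an energy rise
    larger than the threshold opens a segment, a matching fall closes it."""
    energy = frame_energy(frames)
    segments, start = [], None
    for i in range(1, len(energy)):
        delta = energy[i] - energy[i - 1]
        if delta > delta_threshold and start is None:
            start = i
        elif delta < -delta_threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:  # speech runs to the end of the signal
        segments.append((start, len(energy) - 1))
    return segments
```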
  • Publication number: 20150100316
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
    Type: Application
    Filed: December 10, 2014
    Publication date: April 9, 2015
    Inventors: Jason D. Williams, Ethan Selfridge
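The second condition in this abstract, the "pinch node", can be illustrated with a toy search space. Here each active search path is a plain node sequence; the real system operates on a speech lattice, so this representation is an assumption for illustration only.

```python
def find_pinch_node(paths):
    """Return the earliest node (other than the shared start node) through
    which every active search path passes -- a 'pinch node' -- or None."""
    if not paths:
        return None
    # Nodes common to every path, excluding each path's start node.
    common = set(paths[0][1:])
    for p in paths[1:]:
        common &= set(p[1:])
    if not common:
        return None
    # Earliest common node, by position along the first path.
    return min(common, key=paths[0].index)
```

When a pinch node exists, everything before it is stable and the partial recognition result up to that node can be communicated early.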
  • Patent number: 9002709
    Abstract: Provided is a voice recognition system capable of correctly estimating the utterance sections to be recognized while suppressing the negative influence of sounds that are not to be recognized. A voice segmenting means calculates voice feature values, and segments voice sections or non-voice sections by comparing the voice feature values with a threshold value. Then, the voice segmenting means determines, to be first voice sections, those segmented sections or sections obtained by adding a margin to the front and rear of each of those segmented sections. On the basis of voice and non-voice likelihoods, a search means determines, to be second voice sections, sections to which voice recognition is to be applied. A parameter updating means updates the threshold value and the margin. The voice segmenting means then determines the first voice sections using whichever of the threshold value and the margin has been updated by the parameter updating means.
    Type: Grant
    Filed: November 26, 2010
    Date of Patent: April 7, 2015
    Assignee: NEC Corporation
    Inventor: Takayuki Arakawa
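The first-pass segmentation described here (threshold comparison, then widening each section by a margin) might look like the following sketch; the per-frame feature, the merge step, and all parameter values are illustrative assumptions.

```python
def segment_voice(features, threshold, margin):
    """First-pass voice segmentation: frames whose feature value exceeds the
    threshold form raw voice sections; each section is then widened by
    `margin` frames on both sides, and overlapping sections are merged."""
    n = len(features)
    raw, start = [], None
    for i, v in enumerate(features):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            raw.append((start, i - 1))
            start = None
    if start is not None:
        raw.append((start, n - 1))
    # Add the margin, clamped to the signal bounds, then merge overlaps.
    widened = [(max(0, s - margin), min(n - 1, e + margin)) for s, e in raw]
    merged = []
    for s, e in widened:
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

The parameter-updating step of the patent would adjust `threshold` and `margin` between passes based on voice/non-voice likelihoods.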
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
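The frequency split that drives this two-model scheme reduces to a counting step. In this sketch, utterances whose count equals the threshold go to the statistical model; that tie-breaking choice is ours, since the abstract only specifies "exceeds" and "below".

```python
from collections import Counter

def split_by_frequency(utterances, threshold):
    """Split training utterances into a high-frequency set (to train a
    grammar-based language model) and a low-frequency set (to train a
    statistical language model)."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]
    low = [u for u, c in counts.items() if c <= threshold]  # ties go here
    return high, low
```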
  • Patent number: 8972840
    Abstract: Methods and apparatuses in which two or more types of attributes from an information stream are identified. Each of the identified attributes from the information stream is encoded. A time ordered indication is assigned with each of the identified attributes. Each of the identified attributes shares a common time reference measurement. A time ordered index of the identified attributes is generated.
    Type: Grant
    Filed: April 12, 2007
    Date of Patent: March 3, 2015
    Assignee: Longsand Limited
    Inventors: D. Matthew Karas, William J. Muldrew
  • Publication number: 20150051911
    Abstract: Disclosed is a method for dividing pronunciation units which includes the steps of: extracting voice-intensity maxima and minima from the voice waveforms of letter sequences; forming a group by grouping the extracted maxima together; and dividing the letter sequences into pronunciation units at the points nearest to either side of the group, chosen from among the minima on both sides of the group, the voice start points, and the voice end points.
    Type: Application
    Filed: April 3, 2013
    Publication date: February 19, 2015
    Inventor: Byoung Ki Choi
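The extrema-based division might be sketched as below, with one unit per intensity maximum and boundaries at the nearest flanking minima (or signal endpoints). Treating each maximum as its own group is a simplification of the grouping step the abstract describes.

```python
def local_extrema(intensity):
    """Indices of local maxima and minima in an intensity curve."""
    maxima, minima = [], []
    for i in range(1, len(intensity) - 1):
        if intensity[i - 1] < intensity[i] > intensity[i + 1]:
            maxima.append(i)
        elif intensity[i - 1] > intensity[i] < intensity[i + 1]:
            minima.append(i)
    return maxima, minima

def split_at_minima(intensity):
    """Divide the signal into units bounded by intensity minima or the
    signal endpoints, one unit per local maximum (illustrative)."""
    maxima, minima = local_extrema(intensity)
    bounds = [0] + minima + [len(intensity) - 1]
    units = []
    for m in maxima:
        left = max(b for b in bounds if b < m)
        right = min(b for b in bounds if b > m)
        units.append((left, right))
    return units
```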
  • Patent number: 8954332
    Abstract: A computer-implemented system and method for masking special data is provided. Speakers of a call recording are identified. The call recording is separated into strands corresponding to each of the speakers. A prompt list of elements that prompt the speaker of the other strand to utter special information is applied to one of the strands. At least one of the elements of the prompt list is identified in the one strand. A special information candidate is identified in the other strand and is located after a location in time where the element was found in the voice recording of the one strand. A confidence score is assigned to the element located in the one strand and to the special information candidate in the other strand. The confidence scores are combined and a threshold is applied. The special information candidate is rendered unintelligible when the combined confidence scores satisfy the threshold.
    Type: Grant
    Filed: November 4, 2013
    Date of Patent: February 10, 2015
    Assignee: Intellisist, Inc.
    Inventors: Howard M. Lee, Steven Lutz, Gilad Odinak
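A toy version of the two-strand masking flow might look like this. The word representation, the digit-only candidate test, and the additive combination of confidence scores are all assumptions made for illustration; the patent does not specify them.

```python
def mask_special_info(agent_words, caller_words, prompts, threshold):
    """Scan the agent strand for prompt elements (e.g. 'card'); for each hit,
    mask caller-strand candidates that occur later in time and whose combined
    confidence score meets the threshold. Words are (text, time, confidence)."""
    masked = list(caller_words)
    for text, t, conf in agent_words:
        if text not in prompts:
            continue
        for i, (ctext, ct, cconf) in enumerate(masked):
            # Candidate: a digit string spoken after the prompt element.
            if ct > t and ctext.isdigit() and conf + cconf >= threshold:
                masked[i] = ("****", ct, cconf)
    return masked
```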
  • Patent number: 8914288
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
    Type: Grant
    Filed: September 1, 2011
    Date of Patent: December 16, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Jason Williams, Ethan Selfridge
  • Patent number: 8909538
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: November 11, 2013
    Date of Patent: December 9, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: James Mark Kondziela
  • Patent number: 8838452
    Abstract: A method (400) and system (200) for classifying an audio signal are described. The method (400) operates by first receiving a sequence of audio frame feature data, each characterising an audio frame along the audio segment. In response to receipt of each of the audio frame feature data, statistical data characterising the audio segment is updated with the received frame feature data. The received frame feature data is then discarded. A preliminary classification for the audio segment may be determined from the statistical data. Upon receipt of a notification of an end boundary of the audio segment, the audio segment is classified (410) based on the statistical data.
    Type: Grant
    Filed: June 6, 2005
    Date of Patent: September 16, 2014
    Assignee: Canon Kabushiki Kaisha
    Inventors: Reuben Kan, Dmitri Katchalov, Muhammad Majid, George Politis, Timothy John Wark
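The update-then-discard pattern in this abstract is essentially online statistics. A sketch using Welford's running mean/variance update, with a scalar frame feature and a mean-based decision rule as stand-in assumptions:

```python
class StreamingClassifier:
    """Keep only running statistics (count, mean, M2) of frame features,
    letting the caller discard each frame after it is absorbed; classify
    when the segment's end boundary is reported. Welford's update keeps
    the variance numerically stable."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add_frame(self, feature):
        self.n += 1
        delta = feature - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (feature - self.mean)

    def classify(self, speech_mean_threshold=0.5):
        variance = self.m2 / self.n if self.n else 0.0
        label = "speech" if self.mean > speech_mean_threshold else "non-speech"
        return label, variance
```

Calling `classify` before the end boundary gives the preliminary classification the abstract mentions.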
  • Patent number: 8831946
    Abstract: A method and system of indexing speech data. The method includes indexing word transcripts including a timestamp for a word occurrence; and indexing sub-word transcripts including a timestamp for a sub-word occurrence. A timestamp in the index indicates the time and duration of occurrence of the word or sub-word in the speech data, and word and sub-word occurrences can be correlated using the timestamps. A method of searching speech transcripts is also provided in which a search query in the form of a phrase to be searched includes at least one in-vocabulary word and at least one out-of-vocabulary word.
    Type: Grant
    Filed: July 23, 2007
    Date of Patent: September 9, 2014
    Assignee: Nuance Communications, Inc.
    Inventor: Jonathan Joseph Mamou
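The timestamped index and the word/sub-word correlation can be sketched as follows; the transcript tuple layout and the overlap tolerance are illustrative assumptions.

```python
def build_index(transcript):
    """Inverted index: unit -> list of (start_time, duration) occurrences.
    The same structure serves both word and sub-word transcripts."""
    index = {}
    for unit, start, duration in transcript:
        index.setdefault(unit, []).append((start, duration))
    return index

def cooccur(word_hits, subword_hits, tolerance=0.05):
    """Correlate word and sub-word occurrences by timestamp: keep sub-word
    hits that start within a word hit's time span (plus a small tolerance)."""
    return [(w, s) for w in word_hits for s in subword_hits
            if w[0] - tolerance <= s[0] <= w[0] + w[1] + tolerance]
```

A phrase query mixing in-vocabulary and out-of-vocabulary words would look up the former in the word index, the latter in the sub-word index, and join the hits by timestamp.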
  • Patent number: 8825489
    Abstract: Provided in some embodiments is a computer implemented method that includes providing script data including script words indicative of dialog words to be spoken, providing audio data corresponding to at least a portion of the dialog words to be spoken, wherein the audio data includes timecodes associated with dialog words, generating a sequential alignment of the script words to the dialog words, matching at least some of the script words to corresponding dialog words to determine alignment points, determining corresponding timecodes for unmatched script words using interpolation based on the timecodes associated with matching script words, and generating time-aligned script data including the script words and their corresponding time codes.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: September 2, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Jerry R. Scoggins, II, Walter W. Chang
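The interpolation step for unmatched script words can be illustrated directly: given timecodes at the matched alignment points, fill in the words between each adjacent pair linearly. The anchor-dictionary representation is an assumption for this sketch.

```python
def interpolate_timecodes(script_words, anchors):
    """Given anchor points {script_index: timecode} from matched words,
    linearly interpolate timecodes for the unmatched script words between
    each pair of adjacent anchors. Words outside any anchor pair get None."""
    times = dict(anchors)
    idxs = sorted(anchors)
    for a, b in zip(idxs, idxs[1:]):
        span, gap = b - a, anchors[b] - anchors[a]
        for i in range(a + 1, b):
            times[i] = anchors[a] + gap * (i - a) / span
    return [times.get(i) for i in range(len(script_words))]
```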
  • Patent number: 8825488
    Abstract: A method includes receiving script data including script words for dialogue, receiving audio data corresponding to at least a portion of the dialogue, wherein the audio data includes timecodes associated with dialogue words, generating a sequential alignment of the script words to the dialogue words, matching at least some of the script words to corresponding dialogue words to determine hard alignment points, partitioning the sequential alignment of script words into alignment sub-sets, wherein the bounds of the alignment sub-sets are defined by adjacent hard alignment points, and wherein each alignment sub-set includes a sub-set of the script words and a corresponding sub-set of dialogue words that occur between the hard alignment points, determining corresponding timecodes for the sub-set of script words in an alignment sub-set based on the timecodes associated with the sub-set of dialogue words, and generating time-aligned script data including the sub-set of script words and their corresponding timecodes.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: September 2, 2014
    Assignee: Adobe Systems Incorporated
    Inventors: Jerry R. Scoggins, II, Walter W. Chang, David A. Kuspa, Charles E. Van Winkle, Simon R. Hayhurst
  • Patent number: 8818802
    Abstract: A method for real-time data-pattern analysis. The method includes receiving and queuing at least one data-pattern analysis request by a data-pattern analysis unit controller. At least one data stream portion is also received and stored by the data-pattern analysis unit controller, each data stream portion corresponding to a received data-pattern analysis request. Next, a received data-pattern analysis request is selected by the data-pattern analysis unit controller along with a corresponding data stream portion. A data-pattern analysis is performed based on the selected data-pattern analysis request and the corresponding data stream portion, wherein the data-pattern analysis is performed by one of a plurality of data-pattern analysis units.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: August 26, 2014
    Assignee: Spansion LLC
    Inventors: Richard Fastow, Qamrul Hasan
  • Patent number: 8818811
    Abstract: This application relates to a voice activity detection (VAD) apparatus configured to provide a voice activity detection decision for an input audio signal. The VAD apparatus includes a state detector and a voice activity calculator. The state detector is configured to determine, based on the input audio signal, a current working state of the VAD apparatus among at least two different working states. Each of the at least two different working states is associated with a corresponding working state parameter decision set which includes at least one voice activity decision parameter. The voice activity calculator is configured to calculate a voice activity detection parameter value for the at least one voice activity decision parameter of the working state parameter decision set associated with the current working state, and to provide the voice activity detection decision by comparing the calculated voice activity detection parameter value with a threshold.
    Type: Grant
    Filed: June 24, 2013
    Date of Patent: August 26, 2014
    Assignee: Huawei Technologies Co., Ltd.
    Inventor: Zhe Wang
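The state-dependent decision structure of this VAD might be sketched as follows. The energy-based state detector, the `gain`/`threshold` parameter names, and the single decision parameter are illustrative assumptions; the patent only requires that each working state carry its own parameter decision set.

```python
def vad_decision(frame_energy, state_params, state_detector):
    """Pick the parameter set for the detected working state, compute the
    voice activity decision parameter value, and compare it with that
    state's threshold to produce the detection decision."""
    state = state_detector(frame_energy)        # e.g. 'quiet' or 'noisy'
    params = state_params[state]
    value = frame_energy * params["gain"]       # decision parameter value
    return value > params["threshold"]
```

Usage: with a noisier state mapped to a smaller gain (or a higher threshold), the same frame energy is less likely to be declared voice under noisy conditions.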
  • Patent number: 8805677
    Abstract: Methods for creating and processing a natural language grammar data set based on an input text string are disclosed. The method may include tagging the input text string, and examining, via a processor, the input text string for at least one first set of substitutions based on the content of the input text string. The method may also include determining whether the input text string is a substring of a previously tagged input text string by comparing the input text string to the previously tagged input text string, such that the substring determination operation determines whether the input text string is wholly included in the previously tagged input text string.
    Type: Grant
    Filed: February 4, 2014
    Date of Patent: August 12, 2014
    Assignee: West Corporation
    Inventor: Steven John Schanbacher