Clustering Patents (Class 704/245)
  • Patent number: 9672814
    Abstract: Software that trains an artificial neural network for generating vector representations for natural language text, by performing the following steps: (i) receiving, by one or more processors, a set of natural language text; (ii) generating, by one or more processors, a set of first metadata for the set of natural language text, where the first metadata is generated using supervised learning method(s); (iii) generating, by one or more processors, a set of second metadata for the set of natural language text, where the second metadata is generated using unsupervised learning method(s); and (iv) training, by one or more processors, an artificial neural network adapted to generate vector representations for natural language text, where the training is based, at least in part, on the received natural language text, the generated set of first metadata, and the generated set of second metadata.
    Type: Grant
    Filed: May 8, 2015
    Date of Patent: June 6, 2017
    Assignee: International Business Machines Corporation
    Inventors: Liangliang Cao, James J. Fan, Chang Wang, Bing Xiang, Bowen Zhou
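A minimal Python sketch of the claimed training inputs: text plus a supervised ("first") and an unsupervised ("second") metadata set are assembled into training examples. All function names and metadata choices here are illustrative assumptions, not from the patent.

```python
# Hypothetical sketch; the metadata contents are placeholders.

def supervised_metadata(texts, labels):
    # first metadata: produced with a supervised method (human labels here)
    return {t: {"label": lab} for t, lab in zip(texts, labels)}

def unsupervised_metadata(texts):
    # second metadata: derived without labels (token counts here)
    return {t: {"n_tokens": len(t.split())} for t in texts}

def build_training_set(texts, labels):
    first = supervised_metadata(texts, labels)
    second = unsupervised_metadata(texts)
    # step (iv): the network would be trained on the text plus both
    # metadata sets; here we only assemble those training triples
    return [(t, first[t], second[t]) for t in texts]

examples = build_training_set(["the cat sat", "dogs bark"], ["animal", "animal"])
```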
  • Patent number: 9666192
    Abstract: Methods and apparatus for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result.
    Type: Grant
    Filed: May 26, 2015
    Date of Patent: May 30, 2017
    Assignee: Nuance Communications, Inc.
    Inventor: Mark Fanty
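The latency-saving control flow claimed above can be sketched as follows: if the ASR result already supports a valid action, act immediately; otherwise process second audio. Names and the set-membership test for "valid action" are illustrative assumptions.

```python
def handle_speech(asr_result, valid_actions, fetch_second_audio):
    # An ASR result is already available from the audio that preceded
    # the detected end of speech.
    if asr_result in valid_actions:
        # a valid action can be performed immediately -> latency saved
        return ("act", asr_result)
    # otherwise, second audio is processed before acting
    return ("need_more_audio", fetch_second_audio())

decision = handle_speech("call mom", {"call mom", "play music"},
                         lambda: "extra audio")
```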
  • Patent number: 9659560
    Abstract: Software that trains an artificial neural network for generating vector representations for natural language text, by performing the following steps: (i) receiving, by one or more processors, a set of natural language text; (ii) generating, by one or more processors, a set of first metadata for the set of natural language text, where the first metadata is generated using supervised learning method(s); (iii) generating, by one or more processors, a set of second metadata for the set of natural language text, where the second metadata is generated using unsupervised learning method(s); and (iv) training, by one or more processors, an artificial neural network adapted to generate vector representations for natural language text, where the training is based, at least in part, on the received natural language text, the generated set of first metadata, and the generated set of second metadata.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: May 23, 2017
    Assignee: International Business Machines Corporation
    Inventors: Liangliang Cao, James J. Fan, Chang Wang, Bing Xiang, Bowen Zhou
  • Patent number: 9641968
    Abstract: A system for sharing moment experiences is described. A system receives moment data from an input to a mobile device. The system receives geographic location information, time information, and contextual information that is local to the mobile device. The system creates a message about the moment data based on the geographic location information, the time information, and the contextual information. The system outputs the moment data with the message.
    Type: Grant
    Filed: May 15, 2015
    Date of Patent: May 2, 2017
    Assignee: Krumbs, Inc.
    Inventors: Neilesh Jain, Ramesh Jain, Pinaki Sinha
  • Patent number: 9620148
    Abstract: Systems, vehicles, and methods for limiting speech-based access to an audio metadata database are described herein. Audio metadata databases described herein include a plurality of audio metadata entries. Each audio metadata entry includes metadata information associated with at least one audio file. Embodiments described herein determine when a size of the audio metadata database reaches a threshold size, and limit which of the plurality of audio metadata entries may be accessed in response to the speech input signal when the size of the audio metadata database reaches the threshold size.
    Type: Grant
    Filed: July 1, 2013
    Date of Patent: April 11, 2017
    Assignee: Toyota Motor Engineering & Manufacturing North America, Inc.
    Inventor: Eric Randell Schmidt
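The threshold behavior described in the abstract might look like the following sketch. The threshold value and the "keep only the most recent entries" limiting policy are illustrative assumptions; the patent only claims that access is limited once the threshold is reached.

```python
THRESHOLD = 4  # hypothetical threshold size

def accessible_entries(metadata_db):
    # Below the threshold, every entry may be accessed by speech input.
    if len(metadata_db) < THRESHOLD:
        return list(metadata_db)
    # At or above the threshold, limit which entries are accessible
    # (here: an illustrative policy keeping the most recent entries).
    return list(metadata_db)[-THRESHOLD // 2:]

small_db = ["song1", "song2"]
big_db = ["song1", "song2", "song3", "song4", "song5"]
```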
  • Patent number: 9595260
    Abstract: A modeling device comprises a front end which receives enrollment speech data from each target speaker, a reference anchor set generation unit which generates a reference anchor set using the enrollment speech data based on an anchor space, and a voice print generation unit which generates voice prints based on the reference anchor set and the enrollment speech data. By taking the enrollment speech and a speaker adaptation technique into account, anchor models of smaller size can be generated, so reliable and robust speaker recognition is possible with a smaller reference anchor set.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: March 14, 2017
    Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA
    Inventors: Haifeng Shen, Long Ma, Bingqi Zhang
  • Patent number: 9576582
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: February 23, 2016
    Date of Patent: February 21, 2017
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
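The core idea above — each dictionary phoneme's acoustic model becomes a weighted sum of the native acoustic models of all plausible phonemes — can be sketched with models reduced to mean vectors. Phoneme names and weights are illustrative assumptions.

```python
def blended_phoneme_model(native_models, weights):
    # The model of the acoustic space for one dictionary phoneme becomes a
    # weighted sum of the native acoustic models of all plausible phonemes.
    dim = len(next(iter(native_models.values())))
    out = [0.0] * dim
    for phoneme, w in weights.items():
        for i, v in enumerate(native_models[phoneme]):
            out[i] += w * v
    return out

native = {"ae": [1.0, 0.0], "eh": [0.0, 1.0]}
blended = blended_phoneme_model(native, {"ae": 0.75, "eh": 0.25})
```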
  • Patent number: 9524291
    Abstract: Techniques involving visual display of information related to matching user utterances against graph patterns are described. In one or more implementations, an utterance of a user is obtained that has been indicated as corresponding to a graph pattern through linguistic analysis. The utterance is displayed in a user interface as a representation of the graph pattern.
    Type: Grant
    Filed: October 6, 2010
    Date of Patent: December 20, 2016
    Assignee: Virtuoz SA
    Inventors: Dan Teodosiu, Elizabeth Ireland Powers, Pierre Serge Vincent LeRoy, Sebastien Jean-Marie Christian Saunier
  • Patent number: 9514391
    Abstract: In an image classification method, a feature vector representing an input image is generated by unsupervised operations including extracting local descriptors from patches distributed over the input image, and a classification value for the input image is generated by applying a neural network (NN) to the feature vector. Extracting the feature vector may include encoding the local descriptors extracted from each patch using a generative model, such as Fisher vector encoding, aggregating the encoded local descriptors to form a vector, projecting the vector into a space of lower dimensionality, for example using Principal Component Analysis (PCA), and normalizing the feature vector of lower dimensionality to produce the feature vector representing the input image. A set of mid-level features representing the input image may be generated as the output of an intermediate layer of the NN.
    Type: Grant
    Filed: April 20, 2015
    Date of Patent: December 6, 2016
    Assignee: XEROX CORPORATION
    Inventors: Florent C. Perronnin, Diane Larlus-Larrondo
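Two of the unsupervised steps in the pipeline above, aggregation of encoded local descriptors and normalization of the resulting vector, can be sketched in plain Python (the Fisher-vector encoding and PCA projection themselves are omitted; sum pooling and L2 normalization are illustrative stand-ins).

```python
import math

def sum_pool(encoded_patches):
    # aggregate the encoded local descriptors into a single vector
    dim = len(encoded_patches[0])
    return [sum(p[i] for p in encoded_patches) for i in range(dim)]

def l2_normalize(vec):
    # normalize the (possibly dimensionality-reduced) feature vector
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

feature = l2_normalize(sum_pool([[3.0, 0.0], [0.0, 4.0]]))
```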
  • Patent number: 9449051
    Abstract: According to one embodiment, a topic extracting apparatus extracts each term from a target document set, and calculates an appearance frequency and a document frequency for each term. The topic extracting apparatus acquires a document set of appearance documents with respect to each extracted term, calculates a topic degree, extracts each term whose topic degree is not lower than a predetermined value as a topic word, and calculates freshness of the extracted topic word based on an appearance date and time. The topic extracting apparatus presents the extracted topic words in order of freshness and also presents the number of appearance documents of each presented topic word per unit span.
    Type: Grant
    Filed: September 10, 2013
    Date of Patent: September 20, 2016
    Assignees: KABUSHIKI KAISHA TOSHIBA, TOSHIBA SOLUTIONS CORPORATION
    Inventors: Hideki Iwasaki, Kazuyuki Goto, Shigeru Matsumoto, Yasunari Miyabe, Mikito Kobayashi
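A toy sketch of the extraction-and-ranking flow described above. The topic-degree formula (frequency divided by document frequency) and the freshness formula are illustrative assumptions; the patent does not specify them.

```python
def extract_topics(term_stats, min_degree, now):
    # term_stats: term -> (appearance_frequency, document_frequency, last_seen)
    topics = []
    for term, (freq, doc_freq, last_seen) in term_stats.items():
        degree = freq / doc_freq           # illustrative topic-degree formula
        if degree >= min_degree:           # keep sufficiently "topical" terms
            freshness = 1.0 / (1 + (now - last_seen))
            topics.append((term, freshness, doc_freq))
    # present topic words in order of freshness, with their document counts
    return sorted(topics, key=lambda t: -t[1])

stats = {"earthquake": (10, 2, 9), "the": (100, 100, 0)}
ranked = extract_topics(stats, min_degree=2.0, now=10)
```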
  • Patent number: 9412381
    Abstract: A triple factor authentication in one step method and system is disclosed. According to one embodiment, an Integrated Voice Biometrics Cloud Security Gateway (IVCS Gateway) intercepts an access request to a resource server from a user using a user device. IVCS Gateway then authenticates the user by placing a call to the user device and sending a challenge message prompting the user to respond by voice. After receiving the voice sample of the user, the voice sample is compared against a stored voice biometrics record for the user. The voice sample is also converted into a text phrase and compared against a stored secret text phrase. In an alternative embodiment, an IVCS Gateway that is capable of making non-binary access decisions and associating multiple levels of access with a single user or group is described.
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: August 9, 2016
    Assignee: ACK3 BIONETICS PRIVATE LTD.
    Inventor: Sajit Bhaskaran
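The one-step check can be sketched as below: answering the call on the registered device supplies the possession factor, and the single voice response then supplies both the biometric and knowledge factors. The string-equality "matcher" and transcriber are placeholders for real voice-biometric and speech-to-text components.

```python
def authenticate(voice_sample, stored_voiceprint, transcribe, secret_phrase):
    # Both remaining factors come from one voice response.
    biometric_ok = voice_sample == stored_voiceprint   # placeholder matcher
    knowledge_ok = transcribe(voice_sample) == secret_phrase
    return biometric_ok and knowledge_ok

granted = authenticate("sample-042", "sample-042",
                       lambda s: "open sesame", "open sesame")
```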
  • Patent number: 9411829
    Abstract: Disclosed herein is a system and method that facilitate searching and/or browsing of images by clustering, or grouping, the images into a set of image clusters using facets, such as without limitation visual properties or visual characteristics, of the images, and representing each image cluster by a representative image selected for the image cluster. A map-reduce based probabilistic topic model may be used to identify one or more images belonging to each image cluster and update model parameters.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: August 9, 2016
    Assignee: Yahoo! Inc.
    Inventors: Jia Li, Nadav Golbandi, XianXing Zhang
  • Patent number: 9378742
    Abstract: Disclosed are an apparatus for recognizing voice using multiple acoustic models according to the present invention and a method thereof. An apparatus for recognizing voice using multiple acoustic models includes a voice data database (DB) configured to store voice data collected in various noise environments; a model generating means configured to perform classification for each speaker and environment based on the collected voice data, and to generate an acoustic model of a binary tree structure as the classification result; and a voice recognizing means configured to extract feature data of voice data when the voice data is received from a user, to select multiple models from the generated acoustic model based on the extracted feature data, to recognize the voice data in parallel based on the selected multiple models, and to output a word string corresponding to the voice data as the recognition result.
    Type: Grant
    Filed: March 18, 2013
    Date of Patent: June 28, 2016
    Assignee: Electronics and Telecommunications Research Institute
    Inventor: Dong Hyun Kim
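The parallel-recognition step might be sketched as below: each selected acoustic model scores the feature data (conceptually in parallel), and the best-scoring word string is output. The models are placeholder callables, not the claimed binary-tree structures.

```python
def recognize_parallel(feature, selected_models):
    # Run each selected model and keep the best-scoring word string.
    best_word, best_score = None, float("-inf")
    for model in selected_models:
        word, score = model(feature)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

quiet_model = lambda f: ("hello", 0.9 if f == "clean" else 0.2)
noisy_model = lambda f: ("hello there", 0.8 if f == "noisy" else 0.1)
result = recognize_parallel("noisy", [quiet_model, noisy_model])
```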
  • Patent number: 9373338
    Abstract: An automatic speech recognition engine receives an acoustic-echo processed signal from an acoustic-echo processing (AEP) module, where said echo processed signal contains mainly the speech from the near-end talker. The automatic speech recognition engine analyzes the content of the acoustic-echo processed signal to determine whether words or keywords are present. Based upon the results of this analysis, the automatic speech recognition engine produces a value reflecting the likelihood that some words or keywords are detected. Said value is provided to the AEP module. Based upon the value, the AEP module determines if there is double talk and processes the incoming signals accordingly to enhance its performance.
    Type: Grant
    Filed: June 25, 2012
    Date of Patent: June 21, 2016
    Assignee: Amazon Technologies, Inc.
    Inventors: Ramya Gopalan, Kavitha Velusamy, Wai C. Chu, Amit S. Chhetri
  • Patent number: 9336774
    Abstract: Methods, systems, and apparatus, for pattern recognition. One aspect includes a pattern recognizing engine that includes multiple pattern recognizer processors that form a hierarchy of pattern recognizer processors. The pattern recognizer processors include a child pattern recognizer processor at a lower level in the hierarchy and a parent pattern recognizer processor at a higher level of the hierarchy, where the child pattern recognizer processor is configured to provide a first complex recognition output signal to a pattern recognizer processor at a higher level than the child pattern recognizer processor, and the parent pattern recognizer processor is configured to receive as an input a second complex recognition output signal from a pattern recognizer processor at a lower level than the parent pattern recognizer processor.
    Type: Grant
    Filed: April 22, 2013
    Date of Patent: May 10, 2016
    Assignee: Google Inc.
    Inventor: Raymond C. Kurzweil
  • Patent number: 9305553
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition. In one aspect, a computer-based method includes receiving a speech corpus at a speech management server system that includes multiple speech recognition engines tuned to different speaker types; using the speech recognition engines to associate the received speech corpus with a selected one of multiple different speaker types; and sending a speaker category identification code that corresponds to the associated speaker type from the speech management server system over a network. The speaker category identification code can be used by any one of the speech-interactive applications coupled to the network to select an appropriate one of multiple application-accessible speech recognition engines tuned to the different speaker types, in response to an indication that a user accessing the application is associated with a particular one of the speaker category identification codes.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: April 5, 2016
    Inventor: William S. Meisel
  • Patent number: 9305547
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: April 28, 2015
    Date of Patent: April 5, 2016
    Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9275044
    Abstract: A method and system are provided for finding synonyms which are more contextually relevant to the intended use of a particular word. The system finds a list of synonyms for the input word and also finds a list of synonyms for an additional word entered by the user to approximate the intended usage of the input word. These two lists of synonyms are compared to find words common to both lists, and the common words are presented to the user as potential synonyms which are appropriate for the intended use.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: March 1, 2016
    Assignee: SearchLeaf, LLC
    Inventors: Thomas Lund, Bryce Lund
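The intersection step at the heart of this method is simple to sketch: find words common to the synonym lists of the input word and of a context word supplied by the user. The toy thesaurus here is an illustrative assumption.

```python
thesaurus = {"bright": ["smart", "shiny", "luminous"],
             "clever": ["smart", "witty", "cunning"]}

def contextual_synonyms(input_word, context_word):
    # Words common to both synonym lists are the contextually
    # relevant suggestions presented to the user.
    common = set(thesaurus[input_word]) & set(thesaurus[context_word])
    return sorted(common)

suggestions = contextual_synonyms("bright", "clever")
```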
  • Patent number: 9262694
    Abstract: Provided is a technology which enables further improvement of the accuracy of determination in pattern matching processing. A dictionary learning device 1 includes a score calculation unit 2 and a learning unit 3. The score calculation unit 2 calculates a matching score representing the degree of similarity between a sample pattern, which is a sample of a pattern likely to be subjected to pattern matching processing, and a degradation pattern resulting from a degrading processing applied to the sample pattern. The learning unit 3 learns a quality dictionary based on the calculated matching score and the degradation pattern. The quality dictionary is used in a processing to evaluate the degradation degree (quality) of a matching target pattern, that is, the pattern of an object on which the pattern matching processing is carried out.
    Type: Grant
    Filed: December 12, 2012
    Date of Patent: February 16, 2016
    Assignee: NEC Corporation
    Inventor: Masato Ishii
  • Patent number: 9122931
    Abstract: An object identification method is provided. The method includes dividing an input video into a number of video shots, each containing one or more video frames. The method also includes detecting target-class object occurrences and related-class object occurrences in each video shot. Further, the method includes generating hint information including a small subset of frames representing the input video and performing object tracking and recognition based on the hint information. The method also includes fusing tracking and recognition results and outputting labeled objects based on the combined tracking and recognition results.
    Type: Grant
    Filed: October 25, 2013
    Date of Patent: September 1, 2015
    Assignee: TCL RESEARCH AMERICA INC.
    Inventors: Liang Peng, Haohong Wang
  • Patent number: 9098576
    Abstract: Systems and methods for audio matching are disclosed herein. In one embodiment, a system includes both interest point mixing and fingerprint mixing by using multiple interest point detection methods in parallel. Since multiple interest point detection methods are used in parallel, accuracy of audio matching is improved across a wide variety of audio signals. In addition the scalability of the disclosed audio matching system is increased by matching the fingerprint of an audio sample with a fingerprint of a reference sample versus matching an entire spectrogram. Accordingly, a more accurate and more general solution to audio matching can be accomplished.
    Type: Grant
    Filed: October 17, 2011
    Date of Patent: August 4, 2015
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Gheorghe Postelnicu, George Tzanetakis, Dominik Roblek
  • Patent number: 9053579
    Abstract: A system and method generate a graph lattice from exemplary images. At least one processor receives exemplary data graphs of the exemplary images and generates graph lattice nodes of size one from primitives. Until a termination condition is met, the at least one processor repeatedly: 1) generates candidate graph lattice nodes from accepted graph lattice nodes; 2) selects one or more candidate graph lattice nodes preferentially discriminating exemplary data graphs which are less discriminable than other exemplary data graphs using the accepted graph lattice nodes; and 3) promotes the selected graph lattice nodes to accepted status. The graph lattice is formed from the accepted graph lattice nodes and relations between the accepted graph lattice nodes.
    Type: Grant
    Filed: June 19, 2012
    Date of Patent: June 9, 2015
    Assignee: Palo Alto Research Center Incorporated
    Inventor: Eric Saund
  • Patent number: 9047286
    Abstract: Content from multiple different stations can be divided into segments based on time. Matched segments associated with each station can be identified by comparing content included in a first segment associated with a first station, to content included in a second segment associated with a second station. Syndicated content can be identified and tagged based, at least in part, on a relationship between sequences of matched segments on different stations. Various embodiments also include identifying main sequences associated with each station under consideration, removing some of the main sequences, and consolidating remaining main sequences based on various threshold criteria.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: June 2, 2015
    Assignee: iHeartMedia Management Services, Inc.
    Inventors: Periklis Beltas, Philippe Generali, David C. Jellison, Jr.
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9020816
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 28, 2015
    Assignee: 21CT, Inc.
    Inventor: Matthew McClain
  • Patent number: 9009038
    Abstract: A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate a spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.
    Type: Grant
    Filed: May 22, 2013
    Date of Patent: April 14, 2015
    Assignee: National Taiwan Normal University
    Inventors: Jon-Chao Hong, Chao-Hsin Wu, Mei-Yung Chen
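The routing logic in steps (c) through (e) can be sketched as follows. The spectral test that separates "basic" from "special" cries, the lookup key, and the stand-in network are all illustrative assumptions.

```python
def analyze_cry(spectral_data, tf_feature, basic_lookup, special_net):
    # (c) categorize by spectral data (an illustrative energy test);
    # (d) basic cries consult a predetermined lookup table;
    # (e) special cries go to a pre-trained network (a callable here).
    cry_type = "basic" if max(spectral_data) < 0.5 else "special"
    if cry_type == "basic":
        return basic_lookup[tf_feature]
    return special_net(tf_feature)

lookup = {"short-burst": "hungry"}
need = analyze_cry([0.1, 0.3], "short-burst", lookup, lambda f: "in pain")
```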
  • Publication number: 20150081298
    Abstract: In a speech processing apparatus, an acquisition unit is configured to acquire a speech. A separation unit is configured to separate the speech into a plurality of sections in accordance with a prescribed rule. A calculation unit is configured to calculate a degree of similarity in each combination of the sections. An estimation unit is configured to estimate, with respect to the each section, a direction of arrival of the speech. A correction unit is configured to group the sections whose directions of arrival are mutually similar into a same group and correct the degree of similarity with respect to the combination of the sections in the same group. A clustering unit is configured to cluster the sections by using the corrected degree of similarity.
    Type: Application
    Filed: September 12, 2014
    Publication date: March 19, 2015
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Ning DING, Yusuke KIDA, Makoto HIROHATA
  • Publication number: 20150073798
    Abstract: Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and rank contextual synonyms within the domain. The computing device maintains a domain model including intents representing actions in the domain and slots representing parameters of actions or entities in the domain. The computing device performs intent discovery to discover intents and intent patterns by analyzing the web corpus using the semantic graph. The computing device performs slot discovery to discover slots, slot patterns, and slot values by analyzing the web corpus using the semantic graph. Other embodiments are described and claimed.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yael Karov, Eran Levy, Sari Brosh-Lipstein
  • Patent number: 8972261
    Abstract: A computer-implemented system and method for voice transcription error reduction is provided. Speech utterances are obtained from a voice stream and each speech utterance is associated with a transcribed value and a confidence score. Those utterances with transcription values associated with lower confidence scores are identified as questionable utterances. One of the questionable utterances is selected from the voice stream. A predetermined number of questionable utterances from other voice streams and having transcribed values similar to the transcribed value of the selected questionable utterance are identified as a pool of related utterances. A further transcribed value is received for each of a plurality of the questionable utterances in the pool of related utterances. A transcribed message is generated for the voice stream using those transcribed values with higher confidence scores and the further transcribed value for the selected questionable utterance.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: March 3, 2015
    Assignee: Intellisist, Inc.
    Inventor: David Milstein
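The final assembly step can be sketched as below: transcribed values with high confidence are kept, and each questionable value is replaced by a value derived from the pool of related utterances. The threshold and the pre-computed pooled value are illustrative assumptions.

```python
def build_transcript(utterances, confidence_threshold, pooled_value):
    # utterances: list of (transcribed_value, confidence_score) pairs
    words = []
    for value, confidence in utterances:
        words.append(value if confidence >= confidence_threshold
                     else pooled_value)
    return " ".join(words)

message = build_transcript([("please", 0.95), ("cal", 0.40), ("me", 0.90)],
                           confidence_threshold=0.6, pooled_value="call")
```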
  • Publication number: 20150051910
    Abstract: A natural language understanding system performs automatic unsupervised clustering of dialog data from a natural language dialog application. A log parser automatically extracts structured dialog data from application logs. A dialog generalizing module generalizes the extracted dialog data to generalization identifier vectors. A data clustering module automatically clusters the dialog data based on the generalization identifier vectors using an unsupervised density-based clustering algorithm without a predefined number of clusters and without a predefined distance threshold in an iterative approach based on a hierarchical ordering of the generalization.
    Type: Application
    Filed: August 19, 2013
    Publication date: February 19, 2015
    Applicant: Nuance Communications, Inc.
    Inventor: Jean-Francois Lavallée
  • Publication number: 20150032452
    Abstract: A method for identifying concepts in a plurality of interactions includes: filtering, on a processor, the interactions based on intervals; creating, on the processor, a plurality of sentences from the filtered interactions; computing, on the processor, a saliency of each the sentences; pruning away, on the processor, sentences with low saliency for generating a set of informative sentences; clustering, on the processor, the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts; computing, on the processor, a saliency of each of the clusters; and naming, on the processor, each of the clusters.
    Type: Application
    Filed: July 26, 2013
    Publication date: January 29, 2015
    Applicant: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
    Inventors: Amir Lev-Tov, Avraham Faizakof, David Ollinger, Yochai Konig
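The prune-then-cluster portion of this method can be sketched as follows. The saliency function, threshold, and the pairwise "same concept" test are illustrative placeholders for the claimed saliency computation and clustering.

```python
def concept_clusters(sentences, saliency, min_saliency, same_concept):
    # Prune low-saliency sentences, then greedily cluster the
    # remaining informative sentences; each cluster is a concept.
    informative = [s for s in sentences if saliency(s) >= min_saliency]
    clusters = []
    for s in informative:
        for cluster in clusters:
            if same_concept(cluster[0], s):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

sents = ["cancel my order", "cancel the order", "uh okay", "track my parcel"]
clusters = concept_clusters(
    sents,
    saliency=lambda s: 0.0 if s == "uh okay" else 1.0,
    min_saliency=0.5,
    same_concept=lambda a, b: ("cancel" in a) == ("cancel" in b))
```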
  • Patent number: 8942979
    Abstract: An acoustic processing apparatus is provided. The acoustic processing apparatus includes a first extracting unit configured to extract a first acoustic model that corresponds with a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model, each corresponding to a second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof.
    Type: Grant
    Filed: July 28, 2011
    Date of Patent: January 27, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Nam-Hoon Kim, Jeong-Su Kim, Jeong-Mi Cho
  • Publication number: 20150025887
    Abstract: In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
    Type: Application
    Filed: June 30, 2014
    Publication date: January 22, 2015
    Applicant: VERINT SYSTEMS LTD.
    Inventors: Oana Sidi, Ron Wein
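The clustering step of this diarization pipeline can be sketched with utterance models reduced to mean vectors; speaker models and the hidden Markov model would then be built from the resulting clusters. The greedy one-pass strategy and tolerance are illustrative assumptions, not the claimed algorithm.

```python
def cluster_utterance_models(models, tolerance):
    # Greedily assign each utterance model to the first cluster whose
    # representative (first member) is within tolerance per dimension.
    clusters = []
    for model in models:
        for cluster in clusters:
            if all(abs(a - b) <= tolerance
                   for a, b in zip(cluster[0], model)):
                cluster.append(model)
                break
        else:
            clusters.append([model])
    return clusters

speakers = cluster_utterance_models([[0.0], [0.1], [5.0], [5.2]],
                                    tolerance=0.5)
```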
  • Patent number: 8938390
    Abstract: In one embodiment, a method for detecting autism in a natural language environment using a microphone, sound recorder, and a computer programmed with software for the specialized purpose of processing recordings captured by the microphone and sound recorder combination, the computer programmed to execute the method, includes segmenting an audio signal captured by the microphone and sound recorder combination using the computer programmed for the specialized purpose into a plurality of recording segments. The method further includes determining which of the plurality of recording segments correspond to a key child. The method further includes determining which of the plurality of recording segments that correspond to the key child are classified as key child recordings.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: January 20, 2015
    Assignee: LENA Foundation
    Inventors: Dongxin D. Xu, Terrance D. Paul
  • Patent number: 8930190
    Abstract: An audio processing device including a feature calculation unit, a boundary calculation unit and a judgment unit, detects points of change of audio features from an audio signal in an AV content. The feature calculation unit calculates, for each unit section of the audio signal, section feature data expressing features of the audio signal in the unit section. The boundary calculation unit calculates, for each target unit section among the unit sections of the audio signal, a piece of boundary information relating to at least one boundary of a similarity section. The similarity section consists of consecutive unit sections, inclusive of the target unit section, which each have similar section feature data. The judgment unit calculates a priority of each boundary indicated by one or more of the pieces of boundary information and judges whether the boundary is a scene change point based on the priority.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: January 6, 2015
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Tomohiro Konuma, Tsutomu Uenoyama
  • Publication number: 20150006175
    Abstract: The present invention relates to an apparatus and a method for recognizing continuous speech with a large vocabulary. In the present invention, the large vocabulary of a large-vocabulary continuous-speech task, which contains many words of similar kinds, is divided into a reasonable number of clusters; a representative word is selected for each cluster and a first recognition pass is performed over the representative words; then, based on the result of the first pass, re-recognition is performed over all words in the cluster to which the recognized representative word belongs.
    Type: Application
    Filed: June 13, 2014
    Publication date: January 1, 2015
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Ki-Young PARK, Yun-Keun LEE, Hoon CHUNG
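The two-pass scheme in the abstract above can be sketched with a toy scorer. This is purely illustrative and uses assumed names: `score` here is plain string similarity standing in for an acoustic likelihood, and `clusters` maps each representative word to the full word list of its cluster.

```python
import difflib

def two_pass_recognize(signal, clusters, score=None):
    """First pass: score only each cluster's representative word.
    Second pass: re-score every word in the winning cluster."""
    if score is None:
        # Toy stand-in for an acoustic score: string similarity.
        score = lambda s, w: difflib.SequenceMatcher(None, s, w).ratio()
    best_rep = max(clusters, key=lambda rep: score(signal, rep))
    return max(clusters[best_rep], key=lambda w: score(signal, w))
```

The point of the scheme is that the expensive scoring is applied to only one word per cluster in the first pass, and to a single cluster's words in the second.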
  • Publication number: 20140358541
    Abstract: Reliable speaker-based clustering of speech utterances allows improved speaker recognition and speaker-based speech segmentation. According to at least one example embodiment, an iterative bottom-up speaker-based clustering approach employs voiceprints of speech utterances, such as i-vectors. At each iteration, a clustering confidence score in terms of Silhouette Width Criterion (SWC) values is evaluated, and a pair of nearest clusters is merged into a single cluster. The pair of nearest clusters merged is determined based on a similarity score indicative of similarity between voiceprints associated with different clusters. A final clustering pattern is then determined as a set of clusters associated with an iteration corresponding to the highest clustering confidence score evaluated. The SWC used may further be a modified SWC enabling detection of an early stop of the iterative approach.
    Type: Application
    Filed: May 31, 2013
    Publication date: December 4, 2014
    Inventors: Daniele Ernesto Colibro, Claudio Vair, Kevin R. Farrell
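A minimal version of the Silhouette Width Criterion that the abstract above relies on can be written directly from its textbook definition. This is a sketch with an assumed `dist` callable, not the modified SWC the application describes.

```python
def silhouette_width(clusters, dist):
    """Mean silhouette width over all points: for each point,
    a = mean distance to the rest of its own cluster,
    b = mean distance to the nearest other cluster,
    s = (b - a) / max(a, b)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            others = [q for q in cluster if q is not p]
            a = sum(dist(p, q) for q in others) / len(others) if others else 0.0
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

In the iterative approach described, this score would be evaluated after every merge, and the clustering at the highest-scoring iteration kept as the final pattern.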
  • Patent number: 8892438
    Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains is examined. A significant word change or a significant word class change within the utterances is determined. A first cluster of utterances including the word or word class corresponding to the significant change is generated from the utterances. A second cluster of utterances not including that word or word class is generated from the utterances.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: November 18, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
  • Patent number: 8892436
    Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech while taking into account at least one frame positioned before the current frame of the first speech.
    Type: Grant
    Filed: October 19, 2011
    Date of Patent: November 18, 2014
    Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation
    Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
  • Publication number: 20140337027
    Abstract: A voice processing device includes: an acquirer which acquires feature quantities of vowel sections included in voice data; a classifier which classifies, among the acquired feature quantities, feature quantities corresponding to a plurality of same vowels into a plurality of clusters for respective vowels with unsupervised classification; and a determiner which determines a combination of clusters corresponding to the same speaker from clusters classified for the plurality of vowels.
    Type: Application
    Filed: April 11, 2014
    Publication date: November 13, 2014
    Applicant: CASIO COMPUTER CO., LTD.
    Inventor: Hiroyasu IDE
  • Patent number: 8886535
    Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
    Type: Grant
    Filed: January 23, 2014
    Date of Patent: November 11, 2014
    Assignee: Accumente, LLC
    Inventors: Jike Chong, Ian Richard Lane, Senaka Wimal Buthpitiya
  • Patent number: 8880107
    Abstract: In one embodiment, a method provides for monitoring and analyzing communications of a monitored user on behalf of a monitoring user, to determine whether the communication includes a violation. For example, SMS messages, MMS messages, IMs, e-mails, social network site postings or voice mails of a child may be monitored on behalf of a parent. In one embodiment, an algorithm is used to analyze a normalized version of the communication, which algorithm is retrained using results of past analysis, to determine a probability of a communication including a violation.
    Type: Grant
    Filed: January 28, 2011
    Date of Patent: November 4, 2014
    Assignee: Protext Mobility, Inc.
    Inventors: Edward Movsesyan, Igor Slavinsky
  • Publication number: 20140316784
    Abstract: Technology for improving the predictive accuracy of input word recognition on a device by dynamically updating the lexicon of recognized words based on the word choices made by similar users. The technology collects users' vocabulary choices (e.g., words that each user uses, or adds to or removes from a word recognition dictionary), associates users who make similar choices, aggregates related vocabulary choices, filters the words, and sends words identified as likely choices for that user to the user's device. Clusters may include, for example, users in a particular location (e.g., sets of people who use words such as “Puyallup,” “Gloucester,” or “Waiheke”), users with a particular professional or hobby vocabulary, or application-specific vocabulary (e.g., word choices in map searches or email messages).
    Type: Application
    Filed: April 24, 2013
    Publication date: October 23, 2014
    Inventors: Ethan R. Bradford, Simon Corston, David J. Kay, Donni McCray, Keith Trnka
  • Publication number: 20140303978
    Abstract: A method and apparatus are provided for automatically acquiring grammar fragments for recognizing and understanding fluently spoken language. Grammar fragments representing a set of syntactically and semantically similar phrases may be generated using three probability distributions: of succeeding words, of preceding words, and of associated call-types. The similarity between phrases may be measured by applying the Kullback-Leibler distance to these three probability distributions. Phrases that are close in all three distances may be clustered into a grammar fragment.
    Type: Application
    Filed: March 4, 2014
    Publication date: October 9, 2014
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Kazuhiro Arai, Allen L. Gorin, Giuseppe Riccardi, Jeremy H. Wright
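The three-distance comparison in the abstract above can be sketched directly. This is a hedged toy under assumed representations: each phrase context is a dict of dicts keyed by `"succ"`, `"prec"`, and `"calltype"` (names chosen here for illustration), each mapping words or call-types to probabilities, and the distance is a symmetrised Kullback-Leibler divergence.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) over the union vocabulary of two word->prob dicts.
    eps guards against log of zero for words missing from q."""
    vocab = set(p) | set(q)
    return sum(
        p[w] * math.log((p[w] + eps) / (q.get(w, 0.0) + eps))
        for w in vocab if p.get(w, 0.0) > 0.0
    )

def phrase_distance(a, b):
    """Phrases are close only if succeeding-word, preceding-word, and
    call-type distributions are ALL close, so take the max over the
    three symmetrised KL distances."""
    return max(
        kl_divergence(a[key], b[key]) + kl_divergence(b[key], a[key])
        for key in ("succ", "prec", "calltype")
    )
```

Taking the maximum over the three distances mirrors the abstract's requirement that phrases be close in all three before being clustered into one fragment.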
  • Patent number: 8838448
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Grant
    Filed: April 5, 2012
    Date of Patent: September 16, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
  • Patent number: 8812315
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: October 1, 2013
    Date of Patent: August 19, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
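The "weighted sum of acoustic models" idea in the abstract above can be illustrated with one-dimensional Gaussians. This is a deliberately simplified sketch: each plausible phoneme contributes a Gaussian likelihood, and the custom model for a dictionary phoneme scores a frame as a weighted mixture of those likelihoods; real acoustic models are multivariate HMM/GMM states, not scalar Gaussians.

```python
import math

def gaussian_lik(x, mean, var):
    """Likelihood of a 1-D observation under N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def custom_phoneme_lik(x, components):
    """Score a frame under a weighted sum of per-phoneme models.
    components: list of (weight, mean, var) triples, one per
    plausible phoneme, with weights summing to 1."""
    return sum(w * gaussian_lik(x, m, v) for w, m, v in components)
```

The key property, as in the abstract, is that the pronouncing dictionary is untouched: only the acoustic model behind each dictionary phoneme is replaced by the mixture.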
  • Patent number: 8812316
    Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
    Type: Grant
    Filed: June 5, 2014
    Date of Patent: August 19, 2014
    Assignee: Apple Inc.
    Inventor: Lik Harry Chen
  • Patent number: 8804973
    Abstract: In an example signal clustering apparatus, a feature of a signal is divided into segments. A first feature vector of each segment is calculated, the first feature vector having a plurality of elements, one corresponding to each reference model. The value of an element attenuates when the feature of the segment shifts away from the center of the distribution of the corresponding reference model. A similarity between two reference models is calculated. A second feature vector of each segment is calculated, the second feature vector also having a plurality of elements corresponding to the reference models; the value of each element is a weighted sum. Segments whose second feature vectors have similar element values are clustered into one class.
    Type: Grant
    Filed: March 19, 2012
    Date of Patent: August 12, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Makoto Hirohata, Kazunori Imoto, Hisashi Aoki
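The attenuation behaviour of the first feature vector described above can be sketched with a Gaussian kernel over scalar features. The concrete form is an assumption made here for illustration: the abstract only requires that an element's value decay as the segment feature shifts away from the corresponding reference model's centre.

```python
import math

def first_feature_vector(segment_feature, reference_centers, scale=1.0):
    """One element per reference model; the value attenuates as the
    segment feature moves away from that model's centre (Gaussian
    kernel with an assumed bandwidth `scale`)."""
    return [
        math.exp(-((segment_feature - c) ** 2) / (2 * scale ** 2))
        for c in reference_centers
    ]
```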
  • Patent number: 8788266
    Abstract: The present invention uses a language model creation device 200 that creates a new language model using a standard language model created from standard language text. The language model creation device 200 includes a transformation rule storage section 201 that stores transformation rules used for transforming dialect-containing word strings into standard language word strings, and a dialect language model creation section 203 that creates dialect-containing n-grams by applying the transformation rules to word n-grams in the standard language model and, furthermore, creates the new language model (dialect language model) by adding the created dialect-containing n-grams to the word n-grams.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: July 22, 2014
    Assignee: NEC Corporation
    Inventors: Tasuku Kitade, Takafumi Koshinaka, Yoshifumi Onishi
  • Patent number: 8781831
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: September 5, 2013
    Date of Patent: July 15, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
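The retrieval order in the abstract above is a straightforward fallback chain. A minimal sketch, with plain dictionary lookups standing in for the infrastructure's model stores:

```python
def select_model(user, supervised, unsupervised, generic):
    """Prefer a user-specific supervised model, then an unsupervised
    model for that user, then fall back to the generic model."""
    if user in supervised:
        return supervised[user]
    if user in unsupervised:
        return unsupervised[user]
    return generic
```

The recognizer would then decode the received speech with whichever model this selection returns.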