Clustering Patents (Class 704/245)
  • Patent number: 9672814
    Abstract: Software that trains an artificial neural network for generating vector representations for natural language text, by performing the following steps: (i) receiving, by one or more processors, a set of natural language text; (ii) generating, by one or more processors, a set of first metadata for the set of natural language text, where the first metadata is generated using supervised learning method(s); (iii) generating, by one or more processors, a set of second metadata for the set of natural language text, where the second metadata is generated using unsupervised learning method(s); and (iv) training, by one or more processors, an artificial neural network adapted to generate vector representations for natural language text, where the training is based, at least in part, on the received natural language text, the generated set of first metadata, and the generated set of second metadata.
    Type: Grant
    Filed: May 8, 2015
    Date of Patent: June 6, 2017
    Assignee: International Business Machines Corporation
    Inventors: Liangliang Cao, James J. Fan, Chang Wang, Bing Xiang, Bowen Zhou
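A minimal Python sketch of the claimed training inputs: text plus a supervised ("first") and an unsupervised ("second") metadata set are assembled into training examples. All function names and metadata choices here are illustrative assumptions, not from the patent.

```python
# Hypothetical sketch; the metadata contents are placeholders.

def supervised_metadata(texts, labels):
    # first metadata: produced with a supervised method (human labels here)
    return {t: {"label": lab} for t, lab in zip(texts, labels)}

def unsupervised_metadata(texts):
    # second metadata: derived without labels (token counts here)
    return {t: {"n_tokens": len(t.split())} for t in texts}

def build_training_set(texts, labels):
    first = supervised_metadata(texts, labels)
    second = unsupervised_metadata(texts)
    # step (iv): the network would be trained on the text plus both
    # metadata sets; here we only assemble those training triples
    return [(t, first[t], second[t]) for t in texts]

examples = build_training_set(["the cat sat", "dogs bark"], ["animal", "animal"])
```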
  • Patent number: 9666192
    Abstract: Methods and apparatus for reducing latency in speech recognition applications. The method comprises receiving first audio comprising speech from a user of a computing device, detecting an end of speech in the first audio, generating an ASR result based, at least in part, on a portion of the first audio prior to the detected end of speech, determining whether a valid action can be performed by a speech-enabled application installed on the computing device using the ASR result, and processing second audio when it is determined that a valid action cannot be performed by the speech-enabled application using the ASR result.
    Type: Grant
    Filed: May 26, 2015
    Date of Patent: May 30, 2017
    Assignee: Nuance Communications, Inc.
    Inventor: Mark Fanty
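The latency-saving control flow claimed above can be sketched as follows: if the ASR result already supports a valid action, act immediately; otherwise process second audio. Names and the set-membership test for "valid action" are illustrative assumptions.

```python
def handle_speech(asr_result, valid_actions, fetch_second_audio):
    # An ASR result is already available from the audio that preceded
    # the detected end of speech.
    if asr_result in valid_actions:
        # a valid action can be performed immediately -> latency saved
        return ("act", asr_result)
    # otherwise, second audio is processed before acting
    return ("need_more_audio", fetch_second_audio())

decision = handle_speech("call mom", {"call mom", "play music"},
                         lambda: "extra audio")
```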
  • Patent number: 9659560
    Abstract: Software that trains an artificial neural network for generating vector representations for natural language text, by performing the following steps: (i) receiving, by one or more processors, a set of natural language text; (ii) generating, by one or more processors, a set of first metadata for the set of natural language text, where the first metadata is generated using supervised learning method(s); (iii) generating, by one or more processors, a set of second metadata for the set of natural language text, where the second metadata is generated using unsupervised learning method(s); and (iv) training, by one or more processors, an artificial neural network adapted to generate vector representations for natural language text, where the training is based, at least in part, on the received natural language text, the generated set of first metadata, and the generated set of second metadata.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: May 23, 2017
    Assignee: International Business Machines Corporation
    Inventors: Liangliang Cao, James J. Fan, Chang Wang, Bing Xiang, Bowen Zhou
  • Patent number: 9641968
    Abstract: A system for sharing moment experiences is described. A system receives moment data from an input to a mobile device. The system receives geographic location information, time information, and contextual information that is local to the mobile device. The system creates a message about the moment data based on the geographic location information, the time information, and the contextual information. The system outputs the moment data with the message.
    Type: Grant
    Filed: May 15, 2015
    Date of Patent: May 2, 2017
    Assignee: Krumbs, Inc.
    Inventors: Neilesh Jain, Ramesh Jain, Pinaki Sinha
  • Patent number: 9620148
    Abstract: Systems, vehicles, and methods for limiting speech-based access to an audio metadata database are described herein. Audio metadata databases described herein include a plurality of audio metadata entries. Each audio metadata entry includes metadata information associated with at least one audio file. Embodiments described herein determine when a size of the audio metadata database reaches a threshold size, and limit which of the plurality of audio metadata entries may be accessed in response to the speech input signal when the size of the audio metadata database reaches the threshold size.
    Type: Grant
    Filed: July 1, 2013
    Date of Patent: April 11, 2017
    Assignee: Toyota Motor Engineering & Manufacturing North America, Inc.
    Inventor: Eric Randell Schmidt
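The threshold behavior described in the abstract might look like the following sketch. The threshold value and the "keep only the most recent entries" limiting policy are illustrative assumptions; the patent only claims that access is limited once the threshold is reached.

```python
THRESHOLD = 4  # hypothetical threshold size

def accessible_entries(metadata_db):
    # Below the threshold, every entry may be accessed by speech input.
    if len(metadata_db) < THRESHOLD:
        return list(metadata_db)
    # At or above the threshold, limit which entries are accessible
    # (here: an illustrative policy keeping the most recent entries).
    return list(metadata_db)[-THRESHOLD // 2:]

small_db = ["song1", "song2"]
big_db = ["song1", "song2", "song3", "song4", "song5"]
```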
  • Patent number: 9595260
    Abstract: A modeling device comprises a front end which receives enrollment speech data from each target speaker, a reference anchor set generation unit which generates a reference anchor set using the enrollment speech data based on an anchor space, and a voice print generation unit which generates voice prints based on the reference anchor set and the enrollment speech data. By taking the enrollment speech and a speaker adaptation technique into account, anchor models of smaller size can be generated, so reliable and robust speaker recognition is possible with a smaller reference anchor set.
    Type: Grant
    Filed: December 10, 2010
    Date of Patent: March 14, 2017
    Assignee: PANASONIC INTELLECTUAL PROPERTY CORPORATION OF AMERICA
    Inventors: Haifeng Shen, Long Ma, Bingqi Zhang
  • Patent number: 9576582
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: February 23, 2016
    Date of Patent: February 21, 2017
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
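The core idea above — each dictionary phoneme's acoustic model becomes a weighted sum of the native acoustic models of all plausible phonemes — can be sketched with models reduced to mean vectors. Phoneme names and weights are illustrative assumptions.

```python
def blended_phoneme_model(native_models, weights):
    # The model of the acoustic space for one dictionary phoneme becomes a
    # weighted sum of the native acoustic models of all plausible phonemes.
    dim = len(next(iter(native_models.values())))
    out = [0.0] * dim
    for phoneme, w in weights.items():
        for i, v in enumerate(native_models[phoneme]):
            out[i] += w * v
    return out

native = {"ae": [1.0, 0.0], "eh": [0.0, 1.0]}
blended = blended_phoneme_model(native, {"ae": 0.75, "eh": 0.25})
```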
  • Patent number: 9524291
    Abstract: Techniques involving visual display of information related to matching user utterances against graph patterns are described. In one or more implementations, an utterance of a user is obtained that has been indicated as corresponding to a graph pattern through linguistic analysis. The utterance is displayed in a user interface as a representation of the graph pattern.
    Type: Grant
    Filed: October 6, 2010
    Date of Patent: December 20, 2016
    Assignee: Virtuoz SA
    Inventors: Dan Teodosiu, Elizabeth Ireland Powers, Pierre Serge Vincent LeRoy, Sebastien Jean-Marie Christian Saunier
  • Patent number: 9514391
    Abstract: In an image classification method, a feature vector representing an input image is generated by unsupervised operations including extracting local descriptors from patches distributed over the input image, and a classification value for the input image is generated by applying a neural network (NN) to the feature vector. Extracting the feature vector may include encoding the local descriptors extracted from each patch using a generative model, such as Fisher vector encoding, aggregating the encoded local descriptors to form a vector, projecting the vector into a space of lower dimensionality, for example using Principal Component Analysis (PCA), and normalizing the feature vector of lower dimensionality to produce the feature vector representing the input image. A set of mid-level features representing the input image may be generated as the output of an intermediate layer of the NN.
    Type: Grant
    Filed: April 20, 2015
    Date of Patent: December 6, 2016
    Assignee: XEROX CORPORATION
    Inventors: Florent C. Perronnin, Diane Larlus-Larrondo
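Two of the unsupervised steps in the pipeline above, aggregation of encoded local descriptors and normalization of the resulting vector, can be sketched in plain Python (the Fisher-vector encoding and PCA projection themselves are omitted; sum pooling and L2 normalization are illustrative stand-ins).

```python
import math

def sum_pool(encoded_patches):
    # aggregate the encoded local descriptors into a single vector
    dim = len(encoded_patches[0])
    return [sum(p[i] for p in encoded_patches) for i in range(dim)]

def l2_normalize(vec):
    # normalize the (possibly dimensionality-reduced) feature vector
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

feature = l2_normalize(sum_pool([[3.0, 0.0], [0.0, 4.0]]))
```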
  • Patent number: 9449051
    Abstract: According to one embodiment, a topic extracting apparatus extracts each term from a target document set, and calculates an appearance frequency and a document frequency for each term. The topic extracting apparatus acquires a document set of appearance documents with respect to each extracted term, calculates a topic degree, extracts each term whose topic degree is not lower than a predetermined value as a topic word, and calculates freshness of the extracted topic word based on an appearance date and time. The topic extracting apparatus presents the extracted topic words in order of freshness and also presents the number of appearance documents of each presented topic word per unit span.
    Type: Grant
    Filed: September 10, 2013
    Date of Patent: September 20, 2016
    Assignees: KABUSHIKI KAISHA TOSHIBA, TOSHIBA SOLUTIONS CORPORATION
    Inventors: Hideki Iwasaki, Kazuyuki Goto, Shigeru Matsumoto, Yasunari Miyabe, Mikito Kobayashi
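A toy sketch of the extraction-and-ranking flow described above. The topic-degree formula (frequency divided by document frequency) and the freshness formula are illustrative assumptions; the patent does not specify them.

```python
def extract_topics(term_stats, min_degree, now):
    # term_stats: term -> (appearance_frequency, document_frequency, last_seen)
    topics = []
    for term, (freq, doc_freq, last_seen) in term_stats.items():
        degree = freq / doc_freq           # illustrative topic-degree formula
        if degree >= min_degree:           # keep sufficiently "topical" terms
            freshness = 1.0 / (1 + (now - last_seen))
            topics.append((term, freshness, doc_freq))
    # present topic words in order of freshness, with their document counts
    return sorted(topics, key=lambda t: -t[1])

stats = {"earthquake": (10, 2, 9), "the": (100, 100, 0)}
ranked = extract_topics(stats, min_degree=2.0, now=10)
```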
  • Patent number: 9412381
    Abstract: A triple factor authentication in one step method and system is disclosed. According to one embodiment, an Integrated Voice Biometrics Cloud Security Gateway (IVCS Gateway) intercepts an access request to a resource server from a user using a user device. IVCS Gateway then authenticates the user by placing a call to the user device and sending a challenge message prompting the user to respond by voice. After receiving the voice sample of the user, the voice sample is compared against a stored voice biometrics record for the user. The voice sample is also converted into a text phrase and compared against a stored secret text phrase. In an alternative embodiment, an IVCS Gateway that is capable of making non-binary access decisions and associating multiple levels of access with a single user or group is described.
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: August 9, 2016
    Assignee: ACK3 BIONETICS PRIVATE LTD.
    Inventor: Sajit Bhaskaran
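The one-step check can be sketched as below: answering the call on the registered device supplies the possession factor, and the single voice response then supplies both the biometric and knowledge factors. The string-equality "matcher" and transcriber are placeholders for real voice-biometric and speech-to-text components.

```python
def authenticate(voice_sample, stored_voiceprint, transcribe, secret_phrase):
    # Both remaining factors come from one voice response.
    biometric_ok = voice_sample == stored_voiceprint   # placeholder matcher
    knowledge_ok = transcribe(voice_sample) == secret_phrase
    return biometric_ok and knowledge_ok

granted = authenticate("sample-042", "sample-042",
                       lambda s: "open sesame", "open sesame")
```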
  • Patent number: 9411829
    Abstract: Disclosed herein is a system and method that facilitate searching and/or browsing of images by clustering, or grouping, the images into a set of image clusters using facets, such as without limitation visual properties or visual characteristics, of the images, and representing each image cluster by a representative image selected for the image cluster. A map-reduce based probabilistic topic model may be used to identify one or more images belonging to each image cluster and update model parameters.
    Type: Grant
    Filed: June 10, 2013
    Date of Patent: August 9, 2016
    Assignee: Yahoo! Inc.
    Inventors: Jia Li, Nadav Golbandi, XianXing Zhang
  • Patent number: 9378742
    Abstract: Disclosed are an apparatus for recognizing voice using multiple acoustic models according to the present invention and a method thereof. An apparatus for recognizing voice using multiple acoustic models includes a voice data database (DB) configured to store voice data collected in various noise environments; a model generating means configured to perform classification for each speaker and environment based on the collected voice data, and to generate an acoustic model of a binary tree structure as the classification result; and a voice recognizing means configured to extract feature data of voice data when the voice data is received from a user, to select multiple models from the generated acoustic model based on the extracted feature data, to recognize the voice data in parallel based on the selected multiple models, and to output a word string corresponding to the voice data as the recognition result.
    Type: Grant
    Filed: March 18, 2013
    Date of Patent: June 28, 2016
    Assignee: Electronics and Telecommunications Research Institute
    Inventor: Dong Hyun Kim
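The parallel-recognition step might be sketched as below: each selected acoustic model scores the feature data (conceptually in parallel), and the best-scoring word string is output. The models are placeholder callables, not the claimed binary-tree structures.

```python
def recognize_parallel(feature, selected_models):
    # Run each selected model and keep the best-scoring word string.
    best_word, best_score = None, float("-inf")
    for model in selected_models:
        word, score = model(feature)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

quiet_model = lambda f: ("hello", 0.9 if f == "clean" else 0.2)
noisy_model = lambda f: ("hello there", 0.8 if f == "noisy" else 0.1)
result = recognize_parallel("noisy", [quiet_model, noisy_model])
```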
  • Patent number: 9373338
    Abstract: An automatic speech recognition engine receives an acoustic-echo processed signal from an acoustic-echo processing (AEP) module, where said echo processed signal contains mainly the speech from the near-end talker. The automatic speech recognition engine analyzes the content of the acoustic-echo processed signal to determine whether words or keywords are present. Based upon the results of this analysis, the automatic speech recognition engine produces a value reflecting the likelihood that some words or keywords are detected. Said value is provided to the AEP module. Based upon the value, the AEP module determines if there is double talk and processes the incoming signals accordingly to enhance its performance.
    Type: Grant
    Filed: June 25, 2012
    Date of Patent: June 21, 2016
    Assignee: Amazon Technologies, Inc.
    Inventors: Ramya Gopalan, Kavitha Velusamy, Wai C. Chu, Amit S. Chhetri
  • Patent number: 9336774
    Abstract: Methods, systems, and apparatus, for pattern recognition. One aspect includes a pattern recognizing engine that includes multiple pattern recognizer processors that form a hierarchy of pattern recognizer processors. The pattern recognizer processors include a child pattern recognizer processor at a lower level in the hierarchy and a parent pattern recognizer processor at a higher level of the hierarchy, where the child pattern recognizer processor is configured to provide a first complex recognition output signal to a pattern recognizer processor at a higher level than the child pattern recognizer processor, and the parent pattern recognizer processor is configured to receive as an input a second complex recognition output signal from a pattern recognizer processor at a lower level than the parent pattern recognizer processor.
    Type: Grant
    Filed: April 22, 2013
    Date of Patent: May 10, 2016
    Assignee: Google Inc.
    Inventor: Raymond C. Kurzweil
  • Patent number: 9305553
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition. In one aspect, a computer-based method includes receiving a speech corpus at a speech management server system that includes multiple speech recognition engines tuned to different speaker types; using the speech recognition engines to associate the received speech corpus with a selected one of multiple different speaker types; and sending a speaker category identification code that corresponds to the associated speaker type from the speech management server system over a network. The speaker category identification code can be used by any one of the speech-interactive applications coupled to the network to select an appropriate one of multiple application-accessible speech recognition engines tuned to the different speaker types, in response to an indication that a user accessing the application is associated with a particular one of the speaker category identification codes.
    Type: Grant
    Filed: April 28, 2011
    Date of Patent: April 5, 2016
    Inventor: William S. Meisel
  • Patent number: 9305547
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: April 28, 2015
    Date of Patent: April 5, 2016
    Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9275044
    Abstract: A method and system are provided for finding synonyms which are more contextually relevant to the intended use of a particular word. The system finds a list of synonyms for the input word and also finds a list of synonyms for an additional word entered by the user to approximate the intended usage of the input word. These two lists of synonyms are compared to find words common to both lists, and the common words are presented to the user as potential synonyms which are appropriate for the intended use.
    Type: Grant
    Filed: March 6, 2013
    Date of Patent: March 1, 2016
    Assignee: SearchLeaf, LLC
    Inventors: Thomas Lund, Bryce Lund
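The intersection step at the heart of this method is simple to sketch: find words common to the synonym lists of the input word and of a context word supplied by the user. The toy thesaurus here is an illustrative assumption.

```python
thesaurus = {"bright": ["smart", "shiny", "luminous"],
             "clever": ["smart", "witty", "cunning"]}

def contextual_synonyms(input_word, context_word):
    # Words common to both synonym lists are the contextually
    # relevant suggestions presented to the user.
    common = set(thesaurus[input_word]) & set(thesaurus[context_word])
    return sorted(common)

suggestions = contextual_synonyms("bright", "clever")
```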
  • Patent number: 9262694
    Abstract: Provided is a technology which enables further improvement of the accuracy of determination in pattern matching processing. A dictionary learning device 1 includes a score calculation unit 2 and a learning unit 3. The score calculation unit 2 calculates a matching score representing the degree of similarity between a sample pattern, which is a sample of a pattern likely to be subjected to pattern matching processing, and a degradation pattern resulting from a degrading processing applied to the sample pattern. The learning unit 3 learns a quality dictionary based on the calculated matching score and the degradation pattern. The quality dictionary is used in a processing to evaluate the degradation degree (quality) of a matching target pattern, that is, the pattern of an object on which the pattern matching processing is carried out.
    Type: Grant
    Filed: December 12, 2012
    Date of Patent: February 16, 2016
    Assignee: NEC Corporation
    Inventor: Masato Ishii
  • Patent number: 9122931
    Abstract: An object identification method is provided. The method includes dividing an input video into a number of video shots, each containing one or more video frames. The method also includes detecting target-class object occurrences and related-class object occurrences in each video shot. Further, the method includes generating hint information including a small subset of frames representing the input video and performing object tracking and recognition based on the hint information. The method also includes fusing tracking and recognition results and outputting labeled objects based on the combined tracking and recognition results.
    Type: Grant
    Filed: October 25, 2013
    Date of Patent: September 1, 2015
    Assignee: TCL RESEARCH AMERICA INC.
    Inventors: Liang Peng, Haohong Wang
  • Patent number: 9098576
    Abstract: Systems and methods for audio matching are disclosed herein. In one embodiment, a system includes both interest point mixing and fingerprint mixing by using multiple interest point detection methods in parallel. Since multiple interest point detection methods are used in parallel, accuracy of audio matching is improved across a wide variety of audio signals. In addition the scalability of the disclosed audio matching system is increased by matching the fingerprint of an audio sample with a fingerprint of a reference sample versus matching an entire spectrogram. Accordingly, a more accurate and more general solution to audio matching can be accomplished.
    Type: Grant
    Filed: October 17, 2011
    Date of Patent: August 4, 2015
    Assignee: Google Inc.
    Inventors: Matthew Sharifi, Gheorghe Postelnicu, George Tzanetakis, Dominik Roblek
  • Patent number: 9053579
    Abstract: A system and method generate a graph lattice from exemplary images. At least one processor receives exemplary data graphs of the exemplary images and generates graph lattice nodes of size one from primitives. Until a termination condition is met, the at least one processor repeatedly: 1) generates candidate graph lattice nodes from accepted graph lattice nodes; 2) selects one or more candidate graph lattice nodes preferentially discriminating exemplary data graphs which are less discriminable than other exemplary data graphs using the accepted graph lattice nodes; and 3) promotes the selected graph lattice nodes to accepted status. The graph lattice is formed from the accepted graph lattice nodes and relations between the accepted graph lattice nodes.
    Type: Grant
    Filed: June 19, 2012
    Date of Patent: June 9, 2015
    Assignee: Palo Alto Research Center Incorporated
    Inventor: Eric Saund
  • Patent number: 9047286
    Abstract: Content from multiple different stations can be divided into segments based on time. Matched segments associated with each station can be identified by comparing content included in a first segment associated with a first station, to content included in a second segment associated with a second station. Syndicated content can be identified and tagged based, at least in part, on a relationship between sequences of matched segments on different stations. Various embodiments also include identifying main sequences associated with each station under consideration, removing some of the main sequences, and consolidating remaining main sequences based on various threshold criteria.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: June 2, 2015
    Assignee: iHeartMedia Management Services, Inc.
    Inventors: Periklis Beltas, Philippe Generali, David C. Jellison, Jr.
  • Patent number: 9026442
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: May 5, 2015
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 9020816
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 28, 2015
    Assignee: 21CT, Inc.
    Inventor: Matthew McClain
  • Patent number: 9009038
    Abstract: A method for analyzing a digital audio signal associated with a baby cry, comprising the steps of: (a) processing the digital audio signal using a spectral analysis to generate a spectral data; (b) processing the digital audio signal using a time-frequency analysis to generate a time-frequency characteristic; (c) categorizing the baby cry into one of a basic type and a special type based on the spectral data; (d) if the baby cry is of the basic type, determining a basic need based on the time-frequency characteristic and a predetermined lookup table; and (e) if the baby cry is of the special type, determining a special need by inputting the time-frequency characteristic into a pre-trained artificial neural network.
    Type: Grant
    Filed: May 22, 2013
    Date of Patent: April 14, 2015
    Assignee: National Taiwan Normal University
    Inventors: Jon-Chao Hong, Chao-Hsin Wu, Mei-Yung Chen
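The routing logic in steps (c) through (e) can be sketched as follows. The spectral test that separates "basic" from "special" cries, the lookup key, and the stand-in network are all illustrative assumptions.

```python
def analyze_cry(spectral_data, tf_feature, basic_lookup, special_net):
    # (c) categorize by spectral data (an illustrative energy test);
    # (d) basic cries consult a predetermined lookup table;
    # (e) special cries go to a pre-trained network (a callable here).
    cry_type = "basic" if max(spectral_data) < 0.5 else "special"
    if cry_type == "basic":
        return basic_lookup[tf_feature]
    return special_net(tf_feature)

lookup = {"short-burst": "hungry"}
need = analyze_cry([0.1, 0.3], "short-burst", lookup, lambda f: "in pain")
```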
  • Publication number: 20150081298
    Abstract: In a speech processing apparatus, an acquisition unit is configured to acquire a speech. A separation unit is configured to separate the speech into a plurality of sections in accordance with a prescribed rule. A calculation unit is configured to calculate a degree of similarity in each combination of the sections. An estimation unit is configured to estimate, with respect to the each section, a direction of arrival of the speech. A correction unit is configured to group the sections whose directions of arrival are mutually similar into a same group and correct the degree of similarity with respect to the combination of the sections in the same group. A clustering unit is configured to cluster the sections by using the corrected degree of similarity.
    Type: Application
    Filed: September 12, 2014
    Publication date: March 19, 2015
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Ning DING, Yusuke KIDA, Makoto HIROHATA
  • Publication number: 20150073798
    Abstract: Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and rank contextual synonyms within the domain. The computing device maintains a domain model including intents representing actions in the domain and slots representing parameters of actions or entities in the domain. The computing device performs intent discovery to discover intents and intent patterns by analyzing the web corpus using the semantic graph. The computing device performs slot discovery to discover slots, slot patterns, and slot values by analyzing the web corpus using the semantic graph. Other embodiments are described and claimed.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yael Karov, Eran Levy, Sari Brosh-Lipstein
  • Patent number: 8972261
    Abstract: A computer-implemented system and method for voice transcription error reduction is provided. Speech utterances are obtained from a voice stream and each speech utterance is associated with a transcribed value and a confidence score. Those utterances with transcription values associated with lower confidence scores are identified as questionable utterances. One of the questionable utterances is selected from the voice stream. A predetermined number of questionable utterances from other voice streams and having transcribed values similar to the transcribed value of the selected questionable utterance are identified as a pool of related utterances. A further transcribed value is received for each of a plurality of the questionable utterances in the pool of related utterances. A transcribed message is generated for the voice stream using those transcribed values with higher confidence scores and the further transcribed value for the selected questionable utterance.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: March 3, 2015
    Assignee: Intellisist, Inc.
    Inventor: David Milstein
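The final assembly step can be sketched as below: transcribed values with high confidence are kept, and each questionable value is replaced by a value derived from the pool of related utterances. The threshold and the pre-computed pooled value are illustrative assumptions.

```python
def build_transcript(utterances, confidence_threshold, pooled_value):
    # utterances: list of (transcribed_value, confidence_score) pairs
    words = []
    for value, confidence in utterances:
        words.append(value if confidence >= confidence_threshold
                     else pooled_value)
    return " ".join(words)

message = build_transcript([("please", 0.95), ("cal", 0.40), ("me", 0.90)],
                           confidence_threshold=0.6, pooled_value="call")
```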
  • Publication number: 20150051910
    Abstract: A natural language understanding system performs automatic unsupervised clustering of dialog data from a natural language dialog application. A log parser automatically extracts structured dialog data from application logs. A dialog generalizing module generalizes the extracted dialog data to generalization identifier vectors. A data clustering module automatically clusters the dialog data based on the generalization identifier vectors using an unsupervised density-based clustering algorithm without a predefined number of clusters and without a predefined distance threshold in an iterative approach based on a hierarchical ordering of the generalization.
    Type: Application
    Filed: August 19, 2013
    Publication date: February 19, 2015
    Applicant: Nuance Communications, Inc.
    Inventor: Jean-Francois Lavallée
  • Publication number: 20150032452
    Abstract: A method for identifying concepts in a plurality of interactions includes: filtering, on a processor, the interactions based on intervals; creating, on the processor, a plurality of sentences from the filtered interactions; computing, on the processor, a saliency of each the sentences; pruning away, on the processor, sentences with low saliency for generating a set of informative sentences; clustering, on the processor, the sentences of the set of informative sentences for generating a plurality of sentence clusters, each of the clusters corresponding to a concept of the concepts; computing, on the processor, a saliency of each of the clusters; and naming, on the processor, each of the clusters.
    Type: Application
    Filed: July 26, 2013
    Publication date: January 29, 2015
    Applicant: GENESYS TELECOMMUNICATIONS LABORATORIES, INC.
    Inventors: Amir Lev-Tov, Avraham Faizakof, David Ollinger, Yochai Konig
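The prune-then-cluster portion of this method can be sketched as follows. The saliency function, threshold, and the pairwise "same concept" test are illustrative placeholders for the claimed saliency computation and clustering.

```python
def concept_clusters(sentences, saliency, min_saliency, same_concept):
    # Prune low-saliency sentences, then greedily cluster the
    # remaining informative sentences; each cluster is a concept.
    informative = [s for s in sentences if saliency(s) >= min_saliency]
    clusters = []
    for s in informative:
        for cluster in clusters:
            if same_concept(cluster[0], s):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

sents = ["cancel my order", "cancel the order", "uh okay", "track my parcel"]
clusters = concept_clusters(
    sents,
    saliency=lambda s: 0.0 if s == "uh okay" else 1.0,
    min_saliency=0.5,
    same_concept=lambda a, b: ("cancel" in a) == ("cancel" in b))
```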
  • Patent number: 8942979
    Abstract: An acoustic processing apparatus is provided. The acoustic processing apparatus includes a first extracting unit configured to extract a first acoustic model that corresponds with a first position among positions set in a speech recognition target area, a second extracting unit configured to extract at least one second acoustic model, each corresponding to a second position in proximity to the first position, and an acoustic model generating unit configured to generate a third acoustic model based on the first acoustic model, the second acoustic model, or a combination thereof.
    Type: Grant
    Filed: July 28, 2011
    Date of Patent: January 27, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Nam-Hoon Kim, Jeong-Su Kim, Jeong-Mi Cho
  • Publication number: 20150025887
    Abstract: In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
    Type: Application
    Filed: June 30, 2014
    Publication date: January 22, 2015
    Applicant: VERINT SYSTEMS LTD.
    Inventors: Oana Sidi, Ron Wein
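The clustering step of this diarization pipeline can be sketched with utterance models reduced to mean vectors; speaker models and the hidden Markov model would then be built from the resulting clusters. The greedy one-pass strategy and tolerance are illustrative assumptions, not the claimed algorithm.

```python
def cluster_utterance_models(models, tolerance):
    # Greedily assign each utterance model to the first cluster whose
    # representative (first member) is within tolerance per dimension.
    clusters = []
    for model in models:
        for cluster in clusters:
            if all(abs(a - b) <= tolerance
                   for a, b in zip(cluster[0], model)):
                cluster.append(model)
                break
        else:
            clusters.append([model])
    return clusters

speakers = cluster_utterance_models([[0.0], [0.1], [5.0], [5.2]],
                                    tolerance=0.5)
```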
  • Patent number: 8938390
    Abstract: In one embodiment, a method for detecting autism in a natural language environment using a microphone, sound recorder, and a computer programmed with software for the specialized purpose of processing recordings captured by the microphone and sound recorder combination, the computer programmed to execute the method, includes segmenting an audio signal captured by the microphone and sound recorder combination using the computer programmed for the specialized purpose into a plurality of recording segments. The method further includes determining which of the plurality of recording segments correspond to a key child. The method further includes determining which of the plurality of recording segments that correspond to the key child are classified as key child recordings.
    Type: Grant
    Filed: February 27, 2009
    Date of Patent: January 20, 2015
    Assignee: LENA Foundation
    Inventors: Dongxin D. Xu, Terrance D. Paul
  • Patent number: 8930190
    Abstract: An audio processing device including a feature calculation unit, a boundary calculation unit and a judgment unit, detects points of change of audio features from an audio signal in an AV content. The feature calculation unit calculates, for each unit section of the audio signal, section feature data expressing features of the audio signal in the unit section. The boundary calculation unit calculates, for each target unit section among the unit sections of the audio signal, a piece of boundary information relating to at least one boundary of a similarity section. The similarity section consists of consecutive unit sections, inclusive of the target unit section, which each have similar section feature data. The judgment unit calculates a priority of each boundary indicated by one or more of the pieces of boundary information and judges whether the boundary is a scene change point based on the priority.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: January 6, 2015
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Tomohiro Konuma, Tsutomu Uenoyama
  • Publication number: 20150006175
    Abstract: The present invention relates to an apparatus and a method for recognizing continuous speech with a large vocabulary. In the present invention, the large vocabulary of a large-vocabulary continuous-speech task, which contains many words of similar kinds, is divided into a reasonable number of clusters; a representative word is selected for each cluster and a first recognition pass is performed over the representative words; then, based on the result of the first pass, re-recognition is performed over all words in the cluster to which the recognized representative word belongs.
    Type: Application
    Filed: June 13, 2014
    Publication date: January 1, 2015
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Ki-Young PARK, Yun-Keun LEE, Hoon CHUNG
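The two-pass scheme in the abstract above can be sketched with a toy scorer. This is purely illustrative and uses assumed names: `score` here is plain string similarity standing in for an acoustic likelihood, and `clusters` maps each representative word to the full word list of its cluster.

```python
import difflib

def two_pass_recognize(signal, clusters, score=None):
    """First pass: score only each cluster's representative word.
    Second pass: re-score every word in the winning cluster."""
    if score is None:
        # Toy stand-in for an acoustic score: string similarity.
        score = lambda s, w: difflib.SequenceMatcher(None, s, w).ratio()
    best_rep = max(clusters, key=lambda rep: score(signal, rep))
    return max(clusters[best_rep], key=lambda w: score(signal, w))
```

The point of the scheme is that the expensive scoring is applied to only one word per cluster in the first pass, and to a single cluster's words in the second.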
  • Publication number: 20140358541
    Abstract: Reliable speaker-based clustering of speech utterances allows improved speaker recognition and speaker-based speech segmentation. According to at least one example embodiment, an iterative bottom-up speaker-based clustering approach employs voiceprints of speech utterances, such as i-vectors. At each iteration, a clustering confidence score in terms of Silhouette Width Criterion (SWC) values is evaluated, and a pair of nearest clusters is merged into a single cluster. The pair of nearest clusters merged is determined based on a similarity score indicative of similarity between voiceprints associated with different clusters. A final clustering pattern is then determined as a set of clusters associated with an iteration corresponding to the highest clustering confidence score evaluated. The SWC used may further be a modified SWC enabling detection of an early stop of the iterative approach.
    Type: Application
    Filed: May 31, 2013
    Publication date: December 4, 2014
    Inventors: Daniele Ernesto Colibro, Claudio Vair, Kevin R. Farrell
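A minimal version of the Silhouette Width Criterion that the abstract above relies on can be written directly from its textbook definition. This is a sketch with an assumed `dist` callable, not the modified SWC the application describes.

```python
def silhouette_width(clusters, dist):
    """Mean silhouette width over all points: for each point,
    a = mean distance to the rest of its own cluster,
    b = mean distance to the nearest other cluster,
    s = (b - a) / max(a, b)."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            others = [q for q in cluster if q is not p]
            a = sum(dist(p, q) for q in others) / len(others) if others else 0.0
            b = min(
                sum(dist(p, q) for q in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

In the iterative approach described, this score would be evaluated after every merge, and the clustering at the highest-scoring iteration kept as the final pattern.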
  • Patent number: 8892438
    Abstract: An apparatus, a method, and a machine-readable medium are provided for characterizing differences between two language models. A group of utterances from each of a group of time domains is examined. A significant word change or a significant word class change within the utterances is determined. A first cluster of utterances including the word or word class corresponding to the significant change is generated from the utterances. A second cluster of utterances not including that word or word class is generated from the utterances.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: November 18, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Allen Louis Gorin, John Grothendieck, Jeremy Huntley Greet Wright
  • Patent number: 8892436
    Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech while taking into account at least one frame positioned before the current frame of the first speech.
    Type: Grant
    Filed: October 19, 2011
    Date of Patent: November 18, 2014
    Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation
    Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
  • Publication number: 20140337027
    Abstract: A voice processing device includes: an acquirer which acquires feature quantities of vowel sections included in voice data; a classifier which classifies, among the acquired feature quantities, feature quantities corresponding to a plurality of same vowels into a plurality of clusters for respective vowels with unsupervised classification; and a determiner which determines a combination of clusters corresponding to the same speaker from clusters classified for the plurality of vowels.
    Type: Application
    Filed: April 11, 2014
    Publication date: November 13, 2014
    Applicant: CASIO COMPUTER CO., LTD.
    Inventor: Hiroyasu IDE
  • Patent number: 8886535
    Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
    Type: Grant
    Filed: January 23, 2014
    Date of Patent: November 11, 2014
    Assignee: Accumente, LLC
    Inventors: Jike Chong, Ian Richard Lane, Senaka Wimal Buthpitiya
  • Patent number: 8880107
    Abstract: In one embodiment, a method provides for monitoring and analyzing communications of a monitored user on behalf of a monitoring user, to determine whether the communication includes a violation. For example, SMS messages, MMS messages, IMs, e-mails, social network site postings or voice mails of a child may be monitored on behalf of a parent. In one embodiment, an algorithm is used to analyze a normalized version of the communication, which algorithm is retrained using results of past analysis, to determine a probability of a communication including a violation.
    Type: Grant
    Filed: January 28, 2011
    Date of Patent: November 4, 2014
    Assignee: Protext Mobility, Inc.
    Inventors: Edward Movsesyan, Igor Slavinsky
  • Publication number: 20140316784
    Abstract: Technology for improving the predictive accuracy of input word recognition on a device by dynamically updating the lexicon of recognized words based on the word choices made by similar users. The technology collects users' vocabulary choices (e.g., words that each user uses, or adds to or removes from a word recognition dictionary), associates users who make similar choices, aggregates related vocabulary choices, filters the words, and sends words identified as likely choices for that user to the user's device. Clusters may include, for example, users in a particular location (e.g., sets of people who use words such as “Puyallup,” “Gloucester,” or “Waiheke”), users with a particular professional or hobby vocabulary, or application-specific vocabulary (e.g., word choices in map searches or email messages).
    Type: Application
    Filed: April 24, 2013
    Publication date: October 23, 2014
    Inventors: Ethan R. Bradford, Simon Corston, David J. Kay, Donni McCray, Keith Trnka
  • Publication number: 20140303978
    Abstract: A method and apparatus are provided for automatically acquiring grammar fragments for recognizing and understanding fluently spoken language. Grammar fragments representing a set of syntactically and semantically similar phrases may be generated using three probability distributions: of succeeding words, of preceding words, and of associated call-types. The similarity between phrases may be measured by applying the Kullback-Leibler distance to these three probability distributions. Phrases that are close in all three distances may be clustered into a grammar fragment.
    Type: Application
    Filed: March 4, 2014
    Publication date: October 9, 2014
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Kazuhiro Arai, Allen L. Gorin, Giuseppe Riccardi, Jeremy H. Wright
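The three-distance comparison in the abstract above can be sketched directly. This is a hedged toy under assumed representations: each phrase context is a dict of dicts keyed by `"succ"`, `"prec"`, and `"calltype"` (names chosen here for illustration), each mapping words or call-types to probabilities, and the distance is a symmetrised Kullback-Leibler divergence.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) over the union vocabulary of two word->prob dicts.
    eps guards against log of zero for words missing from q."""
    vocab = set(p) | set(q)
    return sum(
        p[w] * math.log((p[w] + eps) / (q.get(w, 0.0) + eps))
        for w in vocab if p.get(w, 0.0) > 0.0
    )

def phrase_distance(a, b):
    """Phrases are close only if succeeding-word, preceding-word, and
    call-type distributions are ALL close, so take the max over the
    three symmetrised KL distances."""
    return max(
        kl_divergence(a[key], b[key]) + kl_divergence(b[key], a[key])
        for key in ("succ", "prec", "calltype")
    )
```

Taking the maximum over the three distances mirrors the abstract's requirement that phrases be close in all three before being clustered into one fragment.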
  • Patent number: 8838448
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Grant
    Filed: April 5, 2012
    Date of Patent: September 16, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
  • Patent number: 8812315
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for recognizing speech by adapting automatic speech recognition pronunciation by acoustic model restructuring. The method identifies an acoustic model and a matching pronouncing dictionary trained on typical native speech in a target dialect. The method collects speech from a new speaker resulting in collected speech and transcribes the collected speech to generate a lattice of plausible phonemes. Then the method creates a custom speech model for representing each phoneme used in the pronouncing dictionary by a weighted sum of acoustic models for all the plausible phonemes, wherein the pronouncing dictionary does not change, but the model of the acoustic space for each phoneme in the dictionary becomes a weighted sum of the acoustic models of phonemes of the typical native speech. Finally the method includes recognizing via a processor additional speech from the target speaker using the custom speech model.
    Type: Grant
    Filed: October 1, 2013
    Date of Patent: August 19, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Alistair D. Conkie, Ann K. Syrdal
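The "weighted sum of acoustic models" idea in the abstract above can be illustrated with one-dimensional Gaussians. This is a deliberately simplified sketch: each plausible phoneme contributes a Gaussian likelihood, and the custom model for a dictionary phoneme scores a frame as a weighted mixture of those likelihoods; real acoustic models are multivariate HMM/GMM states, not scalar Gaussians.

```python
import math

def gaussian_lik(x, mean, var):
    """Likelihood of a 1-D observation under N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def custom_phoneme_lik(x, components):
    """Score a frame under a weighted sum of per-phoneme models.
    components: list of (weight, mean, var) triples, one per
    plausible phoneme, with weights summing to 1."""
    return sum(w * gaussian_lik(x, m, v) for w, m, v in components)
```

The key property, as in the abstract, is that the pronouncing dictionary is untouched: only the acoustic model behind each dictionary phoneme is replaced by the mixture.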
  • Patent number: 8812316
    Abstract: A speech control system that can recognize a spoken command and associated words (such as “call mom at home”) and can cause a selected application (such as a telephone dialer) to execute the command to cause a data processing system, such as a smartphone, to perform an operation based on the command (such as look up mom's phone number at home and dial it to establish a telephone call). The speech control system can use a set of interpreters to repair recognized text from a speech recognition system, and results from the set can be merged into a final repaired transcription which is provided to the selected application.
    Type: Grant
    Filed: June 5, 2014
    Date of Patent: August 19, 2014
    Assignee: Apple Inc.
    Inventor: Lik Harry Chen
  • Patent number: 8804973
    Abstract: In an example signal clustering apparatus, a feature of a signal is divided into segments. A first feature vector of each segment is calculated, the first feature vector having a plurality of elements, one corresponding to each reference model. The value of an element attenuates when the feature of the segment shifts away from the center of the distribution of the corresponding reference model. A similarity between two reference models is calculated. A second feature vector of each segment is calculated, the second feature vector also having a plurality of elements corresponding to the reference models; the value of each element is a weighted sum. Segments whose second feature vectors have similar element values are clustered into one class.
    Type: Grant
    Filed: March 19, 2012
    Date of Patent: August 12, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Makoto Hirohata, Kazunori Imoto, Hisashi Aoki
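The attenuation behaviour of the first feature vector described above can be sketched with a Gaussian kernel over scalar features. The concrete form is an assumption made here for illustration: the abstract only requires that an element's value decay as the segment feature shifts away from the corresponding reference model's centre.

```python
import math

def first_feature_vector(segment_feature, reference_centers, scale=1.0):
    """One element per reference model; the value attenuates as the
    segment feature moves away from that model's centre (Gaussian
    kernel with an assumed bandwidth `scale`)."""
    return [
        math.exp(-((segment_feature - c) ** 2) / (2 * scale ** 2))
        for c in reference_centers
    ]
```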
  • Patent number: 8788266
    Abstract: The present invention uses a language model creation device 200 that creates a new language model using a standard language model created from standard language text. The language model creation device 200 includes a transformation rule storage section 201 that stores transformation rules used for transforming dialect-containing word strings into standard language word strings, and a dialect language model creation section 203 that creates dialect-containing n-grams by applying the transformation rules to word n-grams in the standard language model and, furthermore, creates the new language model (dialect language model) by adding the created dialect-containing n-grams to the word n-grams.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: July 22, 2014
    Assignee: NEC Corporation
    Inventors: Tasuku Kitade, Takafumi Koshinaka, Yoshifumi Onishi
  • Patent number: 8781831
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
    Type: Grant
    Filed: September 5, 2013
    Date of Patent: July 15, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Andrej Ljolje, Bernard S. Renger, Steven Neil Tischer
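The retrieval order in the abstract above is a straightforward fallback chain. A minimal sketch, with plain dictionary lookups standing in for the infrastructure's model stores:

```python
def select_model(user, supervised, unsupervised, generic):
    """Prefer a user-specific supervised model, then an unsupervised
    model for that user, then fall back to the generic model."""
    if user in supervised:
        return supervised[user]
    if user in unsupervised:
        return unsupervised[user]
    return generic
```

The recognizer would then decode the received speech with whichever model this selection returns.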