Specialized Models Patents (Class 704/250)
  • Patent number: 10366693
    Abstract: Disclosed herein are methods of diarizing audio data using first-pass blind diarization and second-pass blind diarization that generate speaker statistical models, wherein the first-pass blind diarization is on a per-frame basis and the second-pass blind diarization is on a per-word basis, and methods of creating acoustic signatures for a common speaker based only on the statistical models of the speakers in each audio session.
    Type: Grant
    Filed: January 22, 2018
    Date of Patent: July 30, 2019
    Assignee: Verint Systems LTD.
    Inventors: Alex Gorodetski, Ido Shapira, Ron Wein, Oana Sidi
  • Patent number: 10304460
    Abstract: According to an embodiment, a conference support system includes a recognizer, a classifier, a first caption controller, a second caption controller, and a display controller. The recognizer is configured to recognize text data corresponding to speech from a speech section and configured to distinguish between the speech section and a non-speech section in speech data. The classifier is configured to classify the text data into first utterance data representing a principal utterance and second utterance data representing another utterance. The first caption controller is configured to generate first caption data for displaying the first utterance data without waiting for identification of the first utterance data to finish. The second caption controller is configured to generate second caption data for displaying the second utterance data after identification of the second utterance data finishes. The display controller is configured to control a display of the first caption data and the second caption data.
    Type: Grant
    Filed: February 23, 2017
    Date of Patent: May 28, 2019
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Taira Ashikawa, Kosei Fume, Masayuki Ashikawa, Hiroshi Fujimura
  • Patent number: 10304445
    Abstract: A wearable utterance training system includes a wearable utterance training device. The system may, for example: (1) receive one or more target utterances from the user; (2) detect a use of one of the one or more target utterances by the user; and (3) in response, provide one or more responsive effects. The one or more responsive effects may include, for example: (1) providing one or more shocks to the user using the wearable utterance training device; (2) initiating a transfer of money between an account associated with the user and a third party account; (3) creating a public disclosure of the utterance (e.g., by posting the disclosure on one or more social media websites) and/or (4) playing a recording of the user's use of the target utterance or other sound.
    Type: Grant
    Filed: October 13, 2016
    Date of Patent: May 28, 2019
    Assignee: Viesoft, Inc.
    Inventor: Anthony Vierra
  • Patent number: 10276149
    Abstract: Systems, methods, and devices for dynamically outputting TTS content are disclosed. A speech-controlled device captures a spoken command, and sends audio data corresponding thereto to a server(s). The server(s) determines output content responsive to the spoken command. The server(s) may also determine a user that spoke the command and determine an average speech characteristic (e.g., tone, pitch, speed, number of words, etc.) used by the user when speaking commands. The server(s) may also determine a speech characteristic of the presently spoken command, as well as determine a difference between the speech characteristic of the presently spoken command and the average speech characteristic of the user. The server(s) may then cause the speech-controlled device to output audio based on the difference.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: April 30, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Nancy Yi Liang, Aaron Takayanagi Barnet
  • Patent number: 10269345
    Abstract: This relates to systems and processes for operating an automated assistant to process messages. In one example process, an electronic device receives a communication including a text string and determines whether a portion of the text string is associated with a data type of a plurality of data types. The data type is associated with at least one task. In accordance with a determination that the portion of the text string is associated with the data type, the electronic device receives a user input indicative of a task of the at least one task, and in response, causes the task to be performed based on the portion of the text string. In accordance with a determination that the portion of the text string is not associated with the data type, the electronic device foregoes causing the task to be performed based on the portion of the text string.
    Type: Grant
    Filed: September 19, 2016
    Date of Patent: April 23, 2019
    Assignee: Apple Inc.
    Inventors: Jose A. Castillo Sanchez, Garett R. Nell, Kimberly D. Beverett
  • Patent number: 10249314
    Abstract: A voice conversion system for generating realistic, natural-sounding target speech is disclosed. The voice conversion system preferably comprises a neural network for converting the source speech data to estimated target speech data; a global variance correction module; a modulation spectrum correction module; and a waveform generator. The global variance correction module is configured to scale and shift (or normalize and de-normalize) the estimated target speech based on (i) a mean and standard deviation of the source speech data, and further based on (ii) a mean and standard deviation of the estimated target speech data. The modulation spectrum correction module is configured to apply a plurality of filters to the estimated target speech data after it has been scaled and shifted by the global variance correction module. Each filter is designed to correct the trajectory representing the curve of one MCEP coefficient over time.
    Type: Grant
    Filed: July 21, 2017
    Date of Patent: April 2, 2019
    Assignee: OBEN, INC.
    Inventor: Sandesh Aryal
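The scale-and-shift step named in the abstract above can be illustrated with a minimal sketch. It assumes one-dimensional feature trajectories and treats the second argument as the reference statistics to match; the function name `scale_and_shift` and the choice of population standard deviation are illustrative assumptions, not details taken from the patent.

```python
from statistics import mean, pstdev

def scale_and_shift(estimated, reference):
    """Normalize `estimated` with its own mean/std, then de-normalize
    with the mean/std of `reference` (hypothetical helper)."""
    mu_est, sd_est = mean(estimated), pstdev(estimated)
    mu_ref, sd_ref = mean(reference), pstdev(reference)
    if sd_est == 0:
        # A constant trajectory has no variance to rescale; only shift it.
        return [mu_ref for _ in estimated]
    return [(x - mu_est) / sd_est * sd_ref + mu_ref for x in estimated]

reference = [10.0, 20.0, 30.0]
corrected = scale_and_shift([1.0, 2.0, 3.0], reference)
```

After the call, `corrected` has the same mean and standard deviation as `reference`, which is the effect a normalize-then-de-normalize correction is meant to have.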
  • Patent number: 10176163
    Abstract: Embodiments herein include a natural language computing system that provides a diagnosis for a participant in a conversation which indicates the likelihood that the participant exhibited a symptom of autism. To provide the diagnosis, the computing system includes a diagnosis system that performs a training process to generate a machine learning model which is then used to evaluate a textual representation of the conversation. For example, the diagnosis system may receive one or more examples of baseline conversations that exhibit symptoms of autism and those that do not. The diagnosis system may annotate the baseline conversations and identify features that are used to identify the symptoms of autism. The system generates a machine learning model that weights the features according to whether the identified features are, or are not, an indicator of autism.
    Type: Grant
    Filed: December 19, 2014
    Date of Patent: January 8, 2019
    Assignee: International Business Machines Corporation
    Inventors: Adam T. Clark, Brian J. Cragun, Anthony W. Eichenlaub, John E. Petri, John C. Unterholzner
  • Patent number: 10158593
    Abstract: Non-limiting examples of the present disclosure describe proactive action by an intelligent personal assistant application/service to improve functionality of one or more applications. In one example, an intelligent personal assistant service may interface with a messaging application to analyze a message thread within the messaging application. The intelligent personal assistant service may analyze the message thread by evaluating context of message content within the message thread. Analysis of the message thread may occur proactively without requiring an explicit request for assistance from a user of a processing device. In response to the analyzing of the message thread, the intelligent personal assistant service may proactively provide a cue that includes content retrieved by the intelligent personal assistant service. An input may be received to include the cue within the message thread. In response to receiving the input, the cue may be displayed within the message thread.
    Type: Grant
    Filed: April 8, 2016
    Date of Patent: December 18, 2018
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Zachary Adam Pfriem, Mark Jozef Vitazko, Jared Frew, Jason Michael Nelson
  • Patent number: 10102760
    Abstract: This disclosure is directed to a system that includes a receiver configured to receive audio data from a vehicle. In some examples, the system includes processing circuitry configured to determine an expected maneuver for the vehicle based on the audio data. In some examples, the processing circuitry is further configured to determine whether to output an alert based on the expected maneuver determined from the audio data.
    Type: Grant
    Filed: August 23, 2017
    Date of Patent: October 16, 2018
    Assignee: Honeywell International Inc.
    Inventors: Stanislav Foltan, Robert Sosovicka, Eva Josth Adamova
  • Patent number: 10048079
    Abstract: A destination determination device for a vehicle includes: a communication unit that performs a wireless communication with a mobile terminal used by an occupant in the vehicle and having a destination search function; a search condition acquisition unit that acquires destination search conditions from the mobile terminal through the communication unit; a search unit that performs a destination search based on an AND search of a combined search condition in which a plurality of acquired destination search conditions are combined together when the search condition acquisition unit acquires the plurality of destination search conditions; and a search result output unit that outputs a search result of the destination search performed by the search unit under the combined search condition.
    Type: Grant
    Filed: June 2, 2015
    Date of Patent: August 14, 2018
    Assignee: DENSO CORPORATION
    Inventors: Takamitsu Suzuki, Takahira Katoh, Takeshi Yamamoto, Yuuko Nakamura
  • Patent number: 9967724
    Abstract: A method and apparatus for changing a persona of a digital assistant is provided herein. During operation a digital assistant will determine a public-safety incident type and then change its persona based on the public-safety incident type.
    Type: Grant
    Filed: May 8, 2017
    Date of Patent: May 8, 2018
    Assignee: MOTOROLA SOLUTIONS, INC.
    Inventors: Guo Dong Gan, Kong Yong Foo, Mun Yew Tham, Bing Qin Lim
  • Patent number: 9916830
    Abstract: Techniques for altering audio being output by a voice-controlled device, or another device, to enable more accurate automatic speech recognition (ASR) by the voice-controlled device. For instance, a voice-controlled device may output audio within an environment using a speaker of the device. While outputting the audio, a microphone of the device may capture sound within the environment and may generate an audio signal based on the captured sound. The device may then analyze the audio signal to identify speech of a user within the signal, with the speech indicating that the user is going to provide a subsequent command to the device. Thereafter, the device may alter the output of the audio (e.g., attenuate the audio, pause the audio, switch from stereo to mono, etc.) to facilitate speech recognition of the user's subsequent command.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: March 13, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Gregory Michael Hart, William Spencer Worley, III
  • Patent number: 9711145
    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: July 18, 2017
    Assignee: Google Inc.
    Inventors: Michael J. LeBeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti T. Kristjansson
  • Patent number: 9679569
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a dynamic threshold for speaker verification are disclosed. In one aspect, a method includes the actions of receiving, for each of multiple utterances of a hotword, a data set including at least a speaker verification confidence score, and environmental context data. The actions further include selecting from among the data sets, a subset of the data sets that are associated with a particular environmental context. The actions further include selecting a particular data set from among the subset of data sets based on one or more selection criteria. The actions further include selecting, as a speaker verification threshold for the particular environmental context, the speaker verification confidence score. The actions further include providing the speaker verification threshold for use in performing speaker verification of utterances that are associated with the particular environmental context.
    Type: Grant
    Filed: November 3, 2016
    Date of Patent: June 13, 2017
    Assignee: Google Inc.
    Inventors: Jakob Nicolaus Foerster, Diego Melendo Casado
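The selection steps in the abstract above can be sketched as follows. The abstract leaves the selection criteria open, so this sketch assumes one hypothetical criterion: within each environmental context, pick the data set whose confidence score sits at a chosen fraction of the sorted scores. The name `thresholds_by_context` and the `fraction` parameter are assumptions for illustration.

```python
from collections import defaultdict

def thresholds_by_context(data_sets, fraction=0.1):
    """Map each environmental context to a speaker verification threshold.

    `data_sets` is an iterable of (confidence_score, context) pairs.
    """
    # Group the data sets by their environmental context.
    grouped = defaultdict(list)
    for score, context in data_sets:
        grouped[context].append(score)

    thresholds = {}
    for context, scores in grouped.items():
        scores.sort()
        # Hypothetical selection criterion: the score at the given fraction.
        index = min(int(fraction * len(scores)), len(scores) - 1)
        thresholds[context] = scores[index]
    return thresholds

samples = [(0.9, "quiet"), (0.8, "quiet"), (0.95, "quiet"),
           (0.6, "noisy"), (0.7, "noisy")]
thresholds = thresholds_by_context(samples, fraction=0.5)
```

Keeping one threshold per context lets a noisy environment use a lower acceptance score than a quiet one, which is the point of making the threshold dynamic.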
  • Patent number: 9495127
    Abstract: Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: November 15, 2016
    Assignee: Google Inc.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
  • Patent number: 9465794
    Abstract: Disclosed is a mobile terminal and control method thereof for inputting a voice to automatically generate a message to be sent during conversation using a mobile messenger, and it may include a microphone for inputting a user's voice, a display unit for displaying a mobile messenger; and a controller for inputting and recognizing a user's voice when a mobile messenger is implemented and then converting into a message to display the message on a message input window of the mobile messenger, and sending the displayed message to the other party which has been preset, and displaying the message sent to the other party and a message received from the other party in the sending and receiving order on a send/receive display window of the mobile messenger.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: October 11, 2016
    Assignee: LG ELECTRONICS INC.
    Inventors: Sun-Hwa Cha, Jong-Keun Youn
  • Patent number: 9460722
    Abstract: In a method of diarization of audio data, audio data is segmented into a plurality of utterances. Each utterance is represented as an utterance model representative of a plurality of feature vectors. The utterance models are clustered. A plurality of speaker models are constructed from the clustered utterance models. A hidden Markov model is constructed of the plurality of speaker models. A sequence of identified speaker models is decoded.
    Type: Grant
    Filed: June 30, 2014
    Date of Patent: October 4, 2016
    Assignee: Verint Systems Ltd.
    Inventors: Oana Sidi, Ron Wein
  • Patent number: 9412392
    Abstract: An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
    Type: Grant
    Filed: January 27, 2014
    Date of Patent: August 9, 2016
    Assignee: Apple Inc.
    Inventor: Aram M. Lindahl
  • Patent number: 9379884
    Abstract: A symbol clock recovery circuit comprising an ADC, a controllable inverter and a timing detector. A timing detector input terminal is configured to receive an ADC output signal from an ADC output terminal; a timing detector output terminal is configured to provide a digital output signal; and a first timing detector feedback terminal is configured to provide a first feedback signal to the inverter control terminal. The timing detector is configured to determine an error signal associated with the received ADC output signal, and set the first feedback signal in accordance with the error signal.
    Type: Grant
    Filed: May 1, 2015
    Date of Patent: June 28, 2016
    Assignee: NXP B.V.
    Inventors: Massimo Ciacci, Ghiath Al-kadi, Remco van de Beek
  • Patent number: 9378729
    Abstract: Features are disclosed for applying maximum likelihood methods to channel normalization in automatic speech recognition (“ASR”). Feature vectors computed from an audio input of a user utterance can be compared to a Gaussian mixture model. The Gaussian that corresponds to each feature vector can be determined, and statistics (e.g., constrained maximum likelihood linear regression statistics) can then be accumulated for each feature vector. Using these statistics, or some subset thereof, offsets and/or a diagonal transform matrix can be computed for each feature vector. The offsets and/or diagonal transform matrix can be applied to the corresponding feature vector to generate a feature vector normalized based on maximum likelihood methods. The ASR process can then proceed using the transformed feature vectors.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: June 28, 2016
    Assignee: Amazon Technologies, Inc.
    Inventor: Stan Weidner Salvador
  • Patent number: 9223863
    Abstract: Disclosed are various systems, methods, and programs embodied in a computer-readable medium for sound analysis. The sound analysis involves transforming a sound print into a frequency domain in a memory to generate a frequency spectrum. A plurality of signatures are identified in the frequency spectrum. Also, a plurality of frequency ranges associated with the signatures are identified in the sound print. The frequencies associated with a physiological profile are cross-referenced with the frequency ranges to determine if the physiological profile is applicable to the sound print.
    Type: Grant
    Filed: December 5, 2012
    Date of Patent: December 29, 2015
    Assignee: Dean Enterprises, LLC
    Inventor: Vickie A. Dean
  • Patent number: 9122453
    Abstract: The disclosed embodiments illustrate methods and systems for processing one or more crowdsourced tasks. The method comprises converting an audio input received from a crowdworker to one or more phrases by one or more processors in at least one computing device. The audio input is at least a response to a crowdsourced task. A mode of the audio input is selected based on one or more parameters associated with the crowdworker. Thereafter, the one or more phrases are presented on a display of the at least one computing device by the one or more processors. Finally, one of the one or more phrases is selected by the crowdworker as a correct response to the crowdsourced task.
    Type: Grant
    Filed: July 16, 2013
    Date of Patent: September 1, 2015
    Assignee: Xerox Corporation
    Inventor: Shailesh Vaya
  • Patent number: 9075870
    Abstract: A system for detecting related topics and competition topics for a target topic includes an information extracting apparatus configured to create topic templates and association words from documents created online to generate topic templates and association words. The system also includes a related topic detecting apparatus configured to detect and trace related topics and competition topics for the target topic based on the topic templates and the association words.
    Type: Grant
    Filed: September 12, 2012
    Date of Patent: July 7, 2015
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventor: Chung Hee Lee
  • Patent number: 9043207
    Abstract: The present invention relates to a method for speaker recognition, comprising the steps of obtaining and storing speaker information for at least one target speaker; obtaining a plurality of speech samples from a plurality of telephone calls from at least one unknown speaker; classifying the speech samples according to the at least one unknown speaker thereby providing speaker-dependent classes of speech samples; extracting speaker information for the speech samples of each of the speaker-dependent classes of speech samples; combining the extracted speaker information for each of the speaker-dependent classes of speech samples; comparing the combined extracted speaker information for each of the speaker-dependent classes of speech samples with the stored speaker information for the at least one target speaker to obtain at least one comparison result; and determining whether one of the at least one unknown speakers is identical with the at least one target speaker based on the at least one comparison result.
    Type: Grant
    Filed: November 12, 2009
    Date of Patent: May 26, 2015
    Assignee: Agnitio S.L.
    Inventors: Johan Nikolaas Langehoven Brummer, Luis Buera Rodriguez, Marta Garcia Gomar
  • Patent number: 9026431
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for semantic parsing with multiple parsers. One of the methods includes obtaining one or more transcribed prompt n-grams from a speech to text recognizer, providing the transcribed prompt n-grams to a first semantic parser that executes on the user device and accesses a first knowledge base for results responsive to the spoken prompt, providing the transcribed prompt n-grams to a second semantic parser that accesses a second knowledge base for results responsive to the spoken prompt, the first knowledge base including first data not included in the second knowledge base, receiving a result responsive to the spoken prompt from the first semantic parser or the second semantic parser, wherein the result is selected from the knowledge base associated with the semantic parser that provided the result to the user device, and performing an operation based on the result.
    Type: Grant
    Filed: July 30, 2013
    Date of Patent: May 5, 2015
    Assignee: Google Inc.
    Inventors: Pedro J. Moreno Mengibar, Diego Melendo Casado, Fadi Biadsy
  • Patent number: 9009045
    Abstract: Methods and systems for model-driven candidate sorting for evaluating digital interviews are described. In one embodiment, a model-driven candidate-sorting tool selects a data set of digital interview data for sorting. The data set includes candidate data for interviewing candidates (also referred to herein as interviewees). The model-driven candidate-sorting tool analyzes the candidate data for the respective interviewing candidate to identify digital interviewing cues and applies the digital interview cues to a prediction model to predict an achievement index for the respective interviewing candidate. This is performed without reviewer input at the model-driven candidate-sorting tool. The list of interview candidates is sorted according to the predicted achievement indices and the sorted list is presented to the reviewer in a user interface.
    Type: Grant
    Filed: February 18, 2014
    Date of Patent: April 14, 2015
    Assignee: HireVue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
  • Patent number: 9002709
    Abstract: Provided is a voice recognition system capable of, while suppressing negative influences from sound not to be recognized, correctly estimating utterance sections that are to be recognized. A voice segmenting means calculates voice feature values, and segments voice sections or non-voice sections by comparing the voice feature values with a threshold value. Then, the voice segmenting means determines, to be first voice sections, those segmented sections or sections obtained by adding a margin to the front and rear of each of those segmented sections. On the basis of voice and non-voice likelihoods, a search means determines, to be second voice sections, sections to which voice recognition is to be applied. A parameter updating means updates the threshold value and the margin. The voice segmenting means determines the first voice sections by using the one of the threshold value and the margin which has been updated by the parameter updating means.
    Type: Grant
    Filed: November 26, 2010
    Date of Patent: April 7, 2015
    Assignee: NEC Corporation
    Inventor: Takayuki Arakawa
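The threshold-and-margin segmentation described in the abstract above can be sketched in a few lines. This covers only the first-voice-section step, under the assumption that the voice feature is a per-frame scalar such as energy; the function name `first_voice_sections` and the frame-index representation are illustrative, and the likelihood-based search and parameter update are not modeled.

```python
def first_voice_sections(feature_values, threshold, margin):
    """Return (start, end) frame ranges where the feature exceeds
    `threshold`, widened by `margin` frames at the front and rear."""
    sections = []
    start = None
    for i, value in enumerate(feature_values):
        if value > threshold and start is None:
            start = i                        # a voice section begins
        elif value <= threshold and start is not None:
            sections.append((start, i - 1))  # the voice section ends
            start = None
    if start is not None:
        sections.append((start, len(feature_values) - 1))

    last = len(feature_values) - 1
    # Add the margin on both sides, clamped to the valid frame range.
    return [(max(0, s - margin), min(last, e + margin)) for s, e in sections]

energies = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.1, 0.7, 0.1]
sections = first_voice_sections(energies, threshold=0.5, margin=1)
```

With these inputs, frames 2 to 3 and frame 7 exceed the threshold, so the widened sections are frames 1 to 4 and frames 6 to 8.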
  • Patent number: 9002707
    Abstract: An information processing apparatus includes: a plurality of information input units; an event detection unit that generates event information including estimated position information and estimated identification information of users present in the real space based on analysis of the information from the information input unit; and an information integration processing unit that inputs the event information, and generates target information including a position of each user and user identification information based on the input event information, and signal information representing a probability value of the event generation source, wherein the information integration processing unit includes an utterance source probability calculation unit, and wherein the utterance source probability calculation unit performs a process of calculating an utterance source score as an index value representing an utterance source probability of each target by multiplying weights based on utterance situations by a plurality of d
    Type: Grant
    Filed: November 6, 2012
    Date of Patent: April 7, 2015
    Assignee: Sony Corporation
    Inventor: Keiichi Yamada
  • Patent number: 8996373
    Abstract: A state detection device includes: a first model generation unit to generate a first specific speaker model obtained by modeling speech features of a specific speaker in an undepressed state; a second model generation unit to generate a second specific speaker model obtained by modeling speech features of the specific speaker in the depressed state; a likelihood calculation unit to calculate a first likelihood as a likelihood of the first specific speaker model with respect to input voice, and a second likelihood as a likelihood of the second specific speaker model with respect to the input voice; and a state determination unit to determine a state of the speaker of the input voice using the first likelihood and the second likelihood.
    Type: Grant
    Filed: October 5, 2011
    Date of Patent: March 31, 2015
    Assignee: Fujitsu Limited
    Inventors: Shoji Hayakawa, Naoshi Matsuo
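The two-likelihood decision in the abstract above can be illustrated with a minimal sketch. It assumes each speaker model is a single Gaussian over one scalar speech feature (for example, pitch in Hz) and that the state with the higher log-likelihood wins; the patent does not specify the model family or the decision rule, so `fit_gaussian`, `log_likelihood`, and `determine_state` are hypothetical names.

```python
import math
from statistics import mean, pstdev

def fit_gaussian(samples):
    """Model a scalar speech feature as (mean, std); std is floored to avoid zero."""
    return mean(samples), max(pstdev(samples), 1e-6)

def log_likelihood(model, observations):
    """Log-likelihood of the observations under a Gaussian model."""
    mu, sd = model
    return sum(
        -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
        for x in observations
    )

def determine_state(undepressed_model, depressed_model, observations):
    """Compare the first and second likelihoods and return the likelier state."""
    first = log_likelihood(undepressed_model, observations)
    second = log_likelihood(depressed_model, observations)
    return "undepressed" if first >= second else "depressed"

undepressed = fit_gaussian([200.0, 210.0, 220.0])  # made-up training values
depressed = fit_gaussian([150.0, 160.0, 170.0])    # made-up training values
state = determine_state(undepressed, depressed, [155.0, 165.0])
```

Here the observations lie close to the mean of the second model, so `state` is `"depressed"`.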
  • Patent number: 8996387
    Abstract: For clearing transaction data selected for a processing, there is generated in a portable data carrier (1) a transaction acoustic signal (003; 103; 203) (S007; S107; S207) upon whose acoustic reproduction by an end device (10) at least transaction data selected for the processing are reproduced superimposed acoustically with a melody specific to a user of the data carrier (1) (S009; S109; S209). The generated transaction acoustic signal (003; 103; 203) is electronically transferred to an end device (10) (S108; S208), which processes the selected transaction data (S011; S121; S216) only when the user of the data carrier (1) confirms vis-à-vis the end device (10) an at least partial match both of the acoustically reproduced melody with the user-specific melody and of the acoustically reproduced transaction data with the selected transaction data (S010; S110, S116; S210).
    Type: Grant
    Filed: September 8, 2009
    Date of Patent: March 31, 2015
    Assignee: Giesecke & Devrient GmbH
    Inventors: Thomas Stocker, Michael Baldischweiler
  • Patent number: 8977547
    Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold Tl; and a standard pattern creation unit 14 for creating a standard pattern by using the voice data where the utterance stability verification unit 13 determines that registration is acceptable.
    Type: Grant
    Filed: October 8, 2009
    Date of Patent: March 10, 2015
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
  • Patent number: 8972266
    Abstract: A speaker intent analysis system and method for validating the truthfulness and intent of a plurality of participants' responses to questions. A computer stores, retrieves, and transmits a series of questions to be answered audibly by participants. The participants' answers are received by a data processor. The data processor analyzes and records the participants' speech parameters for determining the likelihood of dishonesty. In addition to analyzing participants' speech parameters for distinguishing stress or other abnormality, the processor may be equipped with voice recognition software to screen responses that, while not dishonest, are indicative of possible malfeasance on the part of the participants. Once the responses are analyzed, the processor produces an output that is indicative of the participant's credibility. The output may be sent to proper parties and/or devices such as a web page, computer, e-mail, PDA, pager, database, report, etc. for appropriate action.
    Type: Grant
    Filed: June 12, 2012
    Date of Patent: March 3, 2015
    Inventor: David Bezar
  • Patent number: 8954327
    Abstract: A voice data analyzing device comprises speaker model deriving means which derives speaker models as models each specifying character of voice of each speaker from voice data including a plurality of utterances to each of which a speaker label as information for identifying a speaker has been assigned and speaker co-occurrence model deriving means which derives a speaker co-occurrence model as a model representing the strength of co-occurrence relationship among the speakers from session data obtained by segmenting the voice data in units of sequences of conversation by use of the speaker models derived by the speaker model deriving means.
    Type: Grant
    Filed: June 3, 2010
    Date of Patent: February 10, 2015
    Assignee: NEC Corporation
    Inventor: Takafumi Koshinaka
  • Patent number: 8935169
    Abstract: According to one embodiment, an electronic apparatus includes an acquiring module and a display process module. The acquiring module is configured to acquire information regarding a plurality of persons using information of video content data, the plurality of persons appearing in a plurality of sections in the video content data. The display process module is configured to display (i) a time bar representative of a sequence of the video content data, (ii) information regarding a first person appearing in a first section of the sections, and (iii) information regarding a second person different from the first person, the second person appearing in a second section of the sections. A first area of the time bar corresponding to the first section is displayed in a first form, and a second area of the time bar corresponding to the second section is displayed in a second form different from the first form.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: January 13, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Tetsuya Fujii
  • Patent number: 8935151
    Abstract: A source language sentence is tagged with non-lexical tags, such as part-of-speech tags, and is parsed using a lexicalized parser trained in the source language. A target language sentence that is a translation of the source language sentence is tagged with non-lexical labels (e.g., part-of-speech tags) and is parsed using a delexicalized parser that has been trained in the source language to produce k-best parses. The best parse is selected based on the parse's alignment with the lexicalized parse of the source language sentence. The selected best parse can be used to update the parameter vector of a lexicalized parser for the target language.
    Type: Grant
    Filed: December 7, 2011
    Date of Patent: January 13, 2015
    Assignee: Google Inc.
    Inventors: Slav Petrov, Ryan McDonald, Keith Hall
  • Publication number: 20140334682
    Abstract: A monitoring device is provided, which includes an inputter configured to receive an input of a plurality of images captured at separate positions and a plurality of sound sources heard at separate positions, a saliency map generator configured to generate a plurality of mono saliency maps for the plurality of images and to generate a dynamic saliency map using the plurality of mono saliency maps generated, a position determinator configured to determine the positions of the sound sources through analysis of the plurality of sound sources, a scan path recognizer configured to generate scan paths of the plurality of images based on the generated dynamic saliency map and the determined positions of the sound sources, and an outputter configured to output the generated scan paths.
    Type: Application
    Filed: December 5, 2012
    Publication date: November 13, 2014
    Applicants: KYUNGPOCK NATIONAL INDUSTRY ACADEMIC COOPERATION FOUNDATION, INDUSTRY-UNIVERSITY COOPERATION FOUNDATION SOGANG UNIVERSITY
    Inventors: Minho Lee, Young-Min Jang, Sungmoon Jeong, Bumhwi Kim, Hyung-Min Park, Minook Kim
  • Patent number: 8874442
    Abstract: Device, system, and method of liveness detection using voice biometrics. For example, a method comprises: generating a first matching score based on a comparison between: (a) a voice-print from a first text-dependent audio sample received at an enrollment stage, and (b) a second text-dependent audio sample received at an authentication stage; generating a second matching score based on a text-independent audio sample; and generating a liveness score by taking into account at least the first matching score and the second matching score.
    Type: Grant
    Filed: April 17, 2013
    Date of Patent: October 28, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Almog Aley-Raz, Nir Moshe Krause, Michael Itzhak Salmon, Ran Yehoshua Gazit
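
The abstract of patent 8874442 describes a liveness score that takes two matching scores into account but does not say how they are combined. The following is a minimal sketch of one possible fusion, assuming both scores are already normalized to [0, 1] and using a weighted average. The function name `liveness_score` and the `weight` parameter are hypothetical and are not taken from the patent.

```python
def liveness_score(text_dependent_score: float,
                   text_independent_score: float,
                   weight: float = 0.5) -> float:
    """Fuse two matching scores into a single liveness score.

    Assumption: both inputs lie in [0, 1]; `weight` is the share given
    to the text-dependent score (a hypothetical tuning parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Weighted average: one simple way to take both scores into account.
    return weight * text_dependent_score + (1.0 - weight) * text_independent_score
```

A caller would then compare the fused value against an application-specific acceptance threshold, which is likewise left open here.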
  • Patent number: 8868409
    Abstract: In some implementations, audio data for an utterance is provided over a network. At a client device and over the network, information is received that indicates candidate transcriptions for the utterance and semantic information for the candidate transcriptions. A semantic parser is used at the client device to evaluate each of at least a plurality of the candidate transcriptions. One of the candidate transcriptions is selected based on at least the received semantic information and the output of the semantic parser for the plurality of candidate transcriptions that are evaluated.
    Type: Grant
    Filed: January 16, 2014
    Date of Patent: October 21, 2014
    Assignee: Google Inc.
    Inventors: Pedro J. Moreno Mengibar, Fadi Biadsy, Diego Melendo Casado
  • Patent number: 8854232
    Abstract: A handheld electronic device includes a reduced QWERTY keyboard and is enabled with disambiguation software. The device provides output in the form of a default output and a number of variants. The output is based largely upon the frequency, i.e., the likelihood that a user intended a particular output, but various features of the device provide additional variants that are not based solely on frequency and rather are provided by various logic structures resident on the device. The device enables editing during text entry and also provides a learning function that allows the disambiguation function to adapt to provide a customized experience for the user. The disambiguation function can be selectively disabled and an alternate keystroke interpretation system provided.
    Type: Grant
    Filed: June 30, 2011
    Date of Patent: October 7, 2014
    Assignee: BlackBerry Limited
    Inventors: Vadim Fux, Michael G. Elizarov, Sergey V. Kolomiets
  • Patent number: 8843372
    Abstract: A system for analyzing conversations or speech, especially “turns” (a point in time in a person's or an animal's talk when another may or does speak) comprises a computer (100) with a memory (105), at least one microphone (115, 120), and software (110) running in the computer. The system is arranged to recognize and quantify utterances including spoken words, pauses between words, in-breaths, vowel extensions, and the like, and to recognize questions and sentences. The system rapidly and efficiently quantifies and qualifies speech and thus offers substantial improvement over prior-art computerized response systems that use traditional linguistic approaches that depend on single words or a small number of words in grammatical sentences as a basic unit of analysis. The system and method are useful in many applications, including teaching colloquial use of turn-taking, and in forensic linguistics.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: September 23, 2014
    Inventor: Herbert M. Isenberg
  • Patent number: 8831942
    Abstract: A method is provided for identifying a gender of a speaker. The method steps include obtaining speech data of the speaker, extracting vowel-like speech frames from the speech data, analyzing the vowel-like speech frames to generate a feature vector having pitch values corresponding to the vowel-like frames, analyzing the pitch values to generate a most frequent pitch value, determining, in response to the most frequent pitch value being between a first pre-determined threshold and a second pre-determined threshold, an output of a male Gaussian Mixture Model (GMM) and an output of a female GMM using the pitch values as inputs to the male GMM and the female GMM, and identifying the gender of the speaker by comparing the output of the male GMM and the output of the female GMM based on a pre-determined criterion.
    Type: Grant
    Filed: March 19, 2010
    Date of Patent: September 9, 2014
    Assignee: Narus, Inc.
    Inventor: Antonio Nucci
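
The decision flow in the abstract of patent 8831942 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the GMM parameters, the 10 Hz histogram bins, the two thresholds, and the rule that a most frequent pitch outside the threshold band is classified directly are all hypothetical. The abstract only specifies that the two GMM outputs are compared when the most frequent pitch lies between the thresholds.

```python
import math
from collections import Counter

# Hypothetical 1-D mixtures: (weight, mean in Hz, standard deviation in Hz).
MALE_GMM = [(0.6, 110.0, 15.0), (0.4, 135.0, 20.0)]
FEMALE_GMM = [(0.5, 200.0, 20.0), (0.5, 230.0, 25.0)]


def gmm_log_likelihood(pitches, gmm):
    """Sum of log mixture densities over all pitch values."""
    total = 0.0
    for p in pitches:
        density = sum(
            w * math.exp(-0.5 * ((p - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
            for w, mu, sd in gmm
        )
        total += math.log(density + 1e-300)  # guard against log(0)
    return total


def most_frequent_pitch(pitches, bin_hz=10.0):
    """Center of the most populated histogram bin (assumed bin width)."""
    bins = Counter(int(p // bin_hz) for p in pitches)
    top_bin = bins.most_common(1)[0][0]
    return (top_bin + 0.5) * bin_hz


def identify_gender(pitches, low_hz=140.0, high_hz=180.0):
    """Classify from pitch values taken from vowel-like frames."""
    mode = most_frequent_pitch(pitches)
    # Assumption: outside the ambiguous band the mode decides directly.
    if mode < low_hz:
        return "male"
    if mode > high_hz:
        return "female"
    # Inside the band, compare the outputs of the two models.
    male = gmm_log_likelihood(pitches, MALE_GMM)
    female = gmm_log_likelihood(pitches, FEMALE_GMM)
    return "male" if male >= female else "female"
```

Extracting vowel-like frames and estimating their pitch is outside the scope of this sketch; `pitches` is assumed to be a non-empty list of per-frame pitch values in Hz.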
  • Patent number: 8825482
    Abstract: Consumer electronic devices have been developed with enormous information processing capabilities, high quality audio and video outputs, large amounts of memory, and may also include wired and/or wireless networking capabilities. Additionally, relatively unsophisticated and inexpensive sensors, such as microphones, video cameras, GPS or other position sensors, when coupled with devices having these enhanced capabilities, can be used to detect subtle features about users and their environments. A variety of audio, video, simulation and user interface paradigms have been developed to utilize the enhanced capabilities of these devices. These paradigms can be used separately or together in any combination. One paradigm automatically creates user identities using speaker identification. Another paradigm includes a control button with 3-axis pressure sensitivity for use with game controllers and other input devices.
    Type: Grant
    Filed: September 15, 2006
    Date of Patent: September 2, 2014
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Gustavo Hernandez-Abrego, Xavier Menendez-Pidal, Steven Osman, Ruxin Chen, Rishi Deshpande, Care Michaud-Wideman, Richard Marks, Eric Larsen, Xiaodong Mao
  • Patent number: 8818810
    Abstract: A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state a name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, name of the registered user, and health tips.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: August 26, 2014
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Taufiq Hasan, Zhe Feng
  • Patent number: 8812318
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Grant
    Filed: February 6, 2012
    Date of Patent: August 19, 2014
    Assignee: III Holdings 1, LLC
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
  • Patent number: 8805685
    Abstract: Disclosed herein are systems, methods, and tangible computer-readable media for detecting synthetic speaker verification. The method comprises receiving a plurality of speech samples of the same word or phrase for verification, comparing each of the plurality of speech samples to each other, denying verification if the plurality of speech samples demonstrates little variance over time or are the same, and verifying the plurality of speech samples if the plurality of speech samples demonstrates sufficient variance over time. One embodiment further adds that each of the plurality of speech samples is collected at different times or in different contexts. In other embodiments, variance is based on a pre-determined threshold or the threshold for variance is adjusted based on a need for authentication certainty. In another embodiment, if the initial comparison is inconclusive, additional speech samples are received.
    Type: Grant
    Filed: August 5, 2013
    Date of Patent: August 12, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst J. Schroeter
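
The variance check described in the abstract of patent 8805685 might look something like the sketch below. It assumes each speech sample has already been reduced to a fixed-length numeric feature vector and measures variation as the mean pairwise Euclidean distance. The feature representation, the distance measure, and the `min_variation` threshold are illustrative assumptions; the abstract does not specify them.

```python
import math
from itertools import combinations


def pairwise_distances(samples):
    """Euclidean distance between every pair of feature vectors."""
    return [math.dist(a, b) for a, b in combinations(samples, 2)]


def verify_samples(samples, min_variation=0.05):
    """Return True when the samples vary enough to look like live speech.

    Identical or near-identical samples (mean distance below the assumed
    threshold) are treated as likely synthetic or replayed and rejected.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    distances = pairwise_distances(samples)
    mean_distance = sum(distances) / len(distances)
    return mean_distance >= min_variation
```

Raising `min_variation` corresponds to the embodiment in which the threshold is adjusted for greater authentication certainty. The embodiment that requests additional samples after an inconclusive comparison is not modeled here.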
  • Publication number: 20140222428
    Abstract: Most speaker recognition systems use i-vectors which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification, comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of a linear operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition.
    Type: Application
    Filed: April 4, 2013
    Publication date: August 7, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sandro Cumani, Pietro Laface
  • Patent number: 8798994
    Abstract: The present invention discloses a solution for conserving computing resources when implementing transformation based adaptation techniques. The disclosed solution limits the amount of speech data used by real-time adaptation algorithms to compute a transformation, which results in substantial computational savings. Appreciably, application of a transform is a relatively low memory and computationally cheap process compared to memory and resource requirements for computing the transform to be applied.
    Type: Grant
    Filed: February 6, 2008
    Date of Patent: August 5, 2014
    Assignee: International Business Machines Corporation
    Inventors: John W. Eckhart, Michael Florio, Radek Hampl, Pavel Krbec, Jonathan Palgon
  • Patent number: 8793127
    Abstract: In addition to conveying primary information, human speech also conveys information concerning the speaker's gender, age, socioeconomic status, accent, language spoken, emotional state, or other personal characteristics, which is referred to as secondary information. Disclosed herein are both the means of automatic discovery and use of such secondary information to direct other aspects of the behavior of a controlled system. One embodiment of the invention comprises an improved method to determine, with high reliability, the gender of an adult speaker. A further embodiment of the invention comprises the use of this information to display a gender-appropriate advertisement to the user of an information retrieval system that uses a cell phone as the input and output device.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: July 29, 2014
    Assignee: Promptu Systems Corporation
    Inventors: Harry Printz, Vikas Gulati
  • Patent number: 8775178
    Abstract: Updating a voice template for recognizing a speaker on the basis of a voice uttered by the speaker is disclosed. Stored voice templates indicate distinctive characteristics of utterances from speakers. Distinctive characteristics are extracted for a specific speaker based on a voice message utterance received from that speaker. The distinctive characteristics are compared to the characteristics indicated by the stored voice templates to select a template that matches within a predetermined threshold. The selected template is updated on the basis of the extracted characteristics.
    Type: Grant
    Filed: October 27, 2009
    Date of Patent: July 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Yukari Miki, Masami Noguchi
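
The select-and-update step in the abstract of patent 8775178 can be illustrated with a short sketch. It assumes templates and extracted characteristics are equal-length numeric vectors, uses Euclidean distance for matching, and blends the new features into the matched template with an exponential moving average. The function name, the `threshold` and `rate` parameters, and the update rule are hypothetical; the abstract does not state how the update is computed.

```python
import math


def update_matching_template(templates, features, threshold=1.0, rate=0.1):
    """Update the closest stored template if it matches within `threshold`.

    `templates` maps a speaker id to a feature vector and is modified in
    place. Returns the id of the updated template, or None if no stored
    template is close enough.
    """
    best_id, best_dist = None, float("inf")
    for speaker_id, template in templates.items():
        dist = math.dist(template, features)
        if dist < best_dist:
            best_id, best_dist = speaker_id, dist
    if best_id is None or best_dist > threshold:
        return None
    # Assumed update rule: move the template a small step toward the new features.
    templates[best_id] = [
        (1.0 - rate) * t + rate * f for t, f in zip(templates[best_id], features)
    ]
    return best_id
```

A small `rate` keeps the template stable against a single noisy utterance while still letting it track gradual changes in a speaker's voice.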
  • Publication number: 20140188468
    Abstract: An apparatus, system and method for calculating passphrase variability are disclosed. The passphrase variability value can then be used for generating phonetically rich passwords in text-dependent speaker recognition systems, or for estimating the variability of the input passphrase in a text-independent system during the enrolling process in a speech recognition security system.
    Type: Application
    Filed: December 28, 2012
    Publication date: July 3, 2014
    Inventors: Dmitry Dyrmovskiy, Mikhail Khitrov