Abstract: Techniques for generating language models. The techniques include: obtaining language data comprising training data and associated values for one or more metadata attributes, the language data comprising a plurality of instances of language data, an instance of language data comprising an instance of training data and one or more metadata attribute values associated with the instance of training data; identifying, by processing the language data using at least one processor, a set of one or more of the metadata attributes to use for clustering the instances of training data into a plurality of clusters; clustering the training data instances based on their respective values for the identified set of metadata attributes into the plurality of clusters; and generating a language model for each of the plurality of clusters.
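To make the clustering idea above concrete, here is a minimal Python sketch, not the patented implementation: training instances are grouped by their values for a chosen set of metadata attributes, and a trivial unigram count model stands in for the per-cluster language model. The attribute names and example data are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training instances: text plus metadata attribute values.
instances = [
    {"text": "play some jazz music", "domain": "music", "locale": "en-US"},
    {"text": "play the latest news", "domain": "news", "locale": "en-US"},
    {"text": "what's the weather today", "domain": "weather", "locale": "en-GB"},
]

# Metadata attributes selected for clustering (in the patent this set is
# identified automatically; here it is simply fixed by hand).
cluster_attrs = ["domain"]

# Cluster instances by their values for the selected attributes.
clusters = defaultdict(list)
for inst in instances:
    key = tuple(inst[a] for a in cluster_attrs)
    clusters[key].append(inst["text"])

# Build a trivial unigram "language model" (word counts) per cluster.
language_models = {
    key: Counter(word for text in texts for word in text.split())
    for key, texts in clusters.items()
}

for key, lm in language_models.items():
    print(key, lm.most_common(3))
```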
Abstract: The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialog session involving a speech application. The method establishes a dialog session between a user and the speech application. During the dialog session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialog session. The original modality and the second modality can be different modalities; one including a text exchange modality and another including a speech modality. (A hedged sketch of this fallback logic follows this entry.)
Type:
Grant
Filed:
July 6, 2012
Date of Patent:
October 28, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
William V. Da Palma, Baiju D. Mandalia, Victor S. Moore, Wendi L. Nusbickel
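As referenced in the abstract above, the following is a hedged sketch of that kind of modality fallback: interaction problems are inferred from per-turn signals, and the session switches from speech to text exchange without being interrupted. The signals, thresholds, and class names are invented and not taken from the patent.

```python
# Hypothetical dialog session that falls back from speech to text exchange
# when conditions suggest the speech modality is failing; the speech
# application itself only ever sees speech-modality input.
class DialogSession:
    def __init__(self):
        self.modality = "speech"
        self.low_confidence_turns = 0

    def observe_turn(self, asr_confidence: float, noise_level: float) -> None:
        # Infer interaction problems from low ASR confidence or high noise.
        if asr_confidence < 0.4 or noise_level > 0.8:
            self.low_confidence_turns += 1
        else:
            self.low_confidence_turns = 0

    def maybe_switch_modality(self) -> None:
        # Switch transparently to text exchange after repeated problems,
        # without ending the ongoing session.
        if self.modality == "speech" and self.low_confidence_turns >= 2:
            self.modality = "text"

session = DialogSession()
for conf, noise in [(0.9, 0.1), (0.3, 0.9), (0.2, 0.95)]:
    session.observe_turn(conf, noise)
    session.maybe_switch_modality()
print(session.modality)  # -> "text"
```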
Abstract: Device, system, and method of liveness detection using voice biometrics. For example, a method comprises: generating a first matching score based on a comparison between: (a) a voice-print from a first text-dependent audio sample received at an enrollment stage, and (b) a second text-dependent audio sample received at an authentication stage; generating a second matching score based on a text-independent audio sample; and generating a liveness score by taking into account at least the first matching score and the second matching score.
Type:
Grant
Filed:
April 17, 2013
Date of Patent:
October 28, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
Almog Aley-Raz, Nir Moshe Krause, Michael Itzhak Salmon, Ran Yehoshua Gazit
Abstract: A speech recognition system includes distributed processing across a client and server for recognizing a spoken query by a user. A number of different speech models for different languages are used to support and detect a language spoken by a user. In some implementations, an interactive electronic agent responds in the user's language to facilitate a real-time, human-like dialogue.
Type:
Application
Filed:
April 18, 2014
Publication date:
October 23, 2014
Applicant:
Nuance Communications, Inc.
Inventors:
Ian M. Bennett, Bandi Ramesh Babu, Kishor Morkhandikar, Pallaki Gururaj
Abstract: A text content summary is created from speech content. A "focus more" signal is issued by a user while receiving the speech content. The "focus more" signal is associated with a time window, and the time window is associated with a part of the speech content. Whether to use the part of the speech content associated with the time window to generate a text content summary is determined based on the number of "focus more" signals associated with the time window. The user may thereby express the relative significance of different portions of the speech content, so as to generate a personal text content summary. (A hedged sketch of this window-counting step follows this entry.)
Type:
Grant
Filed:
August 22, 2011
Date of Patent:
October 21, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
Bao Hua Cao, Le He, Xing Jin, Qing Bo Wang, Xin Zhou
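As referenced in the abstract above, this is a toy sketch of the window-counting step: "focus more" signals are bucketed into time windows, and a part of the speech content is kept for the summary only if its window collected enough signals. The window size, threshold, and example content are invented.

```python
from collections import Counter

# Hypothetical accumulation of "focus more" signals into time windows and
# selection of speech-content parts for a personal summary.
WINDOW_SECONDS = 30
THRESHOLD = 2

# Timestamp (in seconds) of each "focus more" signal issued by the user.
signals = [12, 17, 95, 100, 104, 200]

# Transcribed speech content keyed by window index (window 0 = 0-30 s, ...).
content_by_window = {0: "intro remarks", 3: "key budget discussion", 6: "closing notes"}

# Count signals per time window.
counts = Counter(t // WINDOW_SECONDS for t in signals)

# Keep only the parts whose window received enough "focus more" signals.
summary_parts = [
    text for window, text in sorted(content_by_window.items())
    if counts[window] >= THRESHOLD
]
print(" ... ".join(summary_parts))
```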
Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
Type:
Grant
Filed:
September 11, 2012
Date of Patent:
October 21, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
Stephane H. Maes, Ponani S. Gopalakrishnan
Abstract: Network communications, Web-based services and customized services using the Web-based services may be provided over a peer-to-peer network from a first peer to a second peer (e.g., automobile head unit) wherein the first peer has a separate connection to a more general server-based network such as the Internet. A communications device application based on a peer communications framework component in communication with a peer network stack on the communications device may work as middleware, with connections both to a more general server-based network such as the Internet and to an external device, such as a head unit of an automobile. Although the communications device has a separate connection out to the Internet via a general network stack co-existing on the same communications device, the peer network stack and the general network stack are not directly connected.
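A minimal sketch of the two-stack arrangement described above, with invented class names: the peer stack (toward the head unit) and the general stack (toward the Internet) co-exist in one application, and only the middleware application bridges them, never the stacks directly.

```python
# Hypothetical middleware app on a communications device (e.g., a phone):
# one stack speaks a peer protocol to the automobile head unit, another
# speaks ordinary requests to the Internet; only the app bridges the two.
class PeerNetworkStack:
    def receive_request(self) -> str:
        return "GET /traffic"          # request arriving from the head unit

    def send_response(self, payload: str) -> None:
        print("to head unit:", payload)

class GeneralNetworkStack:
    def fetch(self, url: str) -> str:
        return f"<data for {url}>"     # stand-in for a real network fetch

class MiddlewareApp:
    def __init__(self):
        self.peer = PeerNetworkStack()
        self.internet = GeneralNetworkStack()

    def serve_one_request(self) -> None:
        # The peer stack never talks to the general stack directly; the
        # application copies data between them.
        request = self.peer.receive_request()
        data = self.internet.fetch("https://example.com" + request.split()[1])
        self.peer.send_response(data)

MiddlewareApp().serve_one_request()
```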
Abstract: Methods and apparatus for beamforming and performing echo compensation for the beamformed signal with an echo canceller, including calculating a set of filter coefficients as an estimate for a new steering direction without a complete adaptation of the echo canceller.
Type:
Application
Filed:
June 25, 2014
Publication date:
October 16, 2014
Applicant:
Nuance Communications, Inc.
Inventors:
Markus Buck, Gerhard Uwe Schmidt, Tobias Wolff
Abstract: A method for intercepting calls from a remote or mobile device for customer self-support detects when users or subscribers dial one or more predetermined numbers. If the number corresponds to one of the predetermined numbers (such as a customer support number), the phone may intercept the call and display a list of potential solutions to the subscriber's problems. Various other features and embodiments are disclosed. (A hedged sketch of this interception check follows this entry.)
Type:
Application
Filed:
May 19, 2014
Publication date:
October 16, 2014
Applicant:
Nuance Communications, Inc.
Inventors:
Brian Roundtree, Kevin Allan, Linda Beinikis, Keldon Rush
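As referenced in the abstract above, this is a toy sketch of the interception check; the predetermined numbers and the solution lists are invented.

```python
# Hypothetical on-device check run when the user dials a number: if it is a
# predetermined customer-support number, intercept and show self-support
# solutions instead of (or before) placing the call.
SUPPORT_NUMBERS = {"+18005551234", "611"}

SOLUTIONS = {
    "+18005551234": ["Reset voicemail PIN", "Check data balance"],
    "611": ["Troubleshoot no-signal issues", "View current bill"],
}

def on_dial(number: str) -> dict:
    if number in SUPPORT_NUMBERS:
        # Intercept: return the self-support options to display.
        return {"intercepted": True, "solutions": SOLUTIONS[number]}
    return {"intercepted": False}

print(on_dial("611"))
```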
Abstract: A method for training a system is provided. The method may include storing one or more backend communication logs, each of the one or more backend communication logs including a user query and a corresponding backend query. The method may further include parsing the one or more backend communication logs to extract statistical information and generating a mapping between each user query and a corresponding set of language tags. The method may also include sorting the one or more backend communication logs based upon, at least in part, the extracted statistical information.
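A minimal sketch of the training steps named above (storing backend communication logs, parsing them for statistical information, mapping user queries to language tags, and sorting), using an invented log format and a toy keyword-based tagger.

```python
from collections import Counter

# Hypothetical backend communication logs: (user query, backend query) pairs.
logs = [
    ("show me flights to boston", "SELECT * FROM flights WHERE dest='BOS'"),
    ("flights to boston tomorrow", "SELECT * FROM flights WHERE dest='BOS' AND day=+1"),
    ("weather in boston", "CALL weather_api(city='Boston')"),
]

# Parse the logs to extract simple statistical information (query frequency).
backend_counts = Counter(backend for _, backend in logs)

# Map each user query to a set of language tags (toy keyword-based tagging).
def tag(query: str) -> set:
    tags = set()
    if "flights" in query:
        tags.add("FLIGHT_SEARCH")
    if "weather" in query:
        tags.add("WEATHER")
    if "boston" in query:
        tags.add("CITY")
    return tags

mapping = {user: tag(user) for user, _ in logs}

# Sort the logs by how common their backend query is.
sorted_logs = sorted(logs, key=lambda pair: backend_counts[pair[1]], reverse=True)

print(mapping)
print(sorted_logs[0])
```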
Abstract: Speech-enabled content navigation and control of a distributed multimodal browser is disclosed, the browser providing an execution environment for a multimodal application, the browser including a graphical user agent (‘GUA’) and a voice user agent (‘VUA’), the GUA operating on a multimodal device, the VUA operating on a voice server, that includes: transmitting, by the GUA, a link message to the VUA, the link message specifying voice commands that control the browser and an event corresponding to each voice command; receiving, by the GUA, a voice utterance from a user, the voice utterance specifying a particular voice command; transmitting, by the GUA, the voice utterance to the VUA for speech recognition by the VUA; receiving, by the GUA, an event message from the VUA, the event message specifying a particular event corresponding to the particular voice command; and controlling, by the GUA, the browser in dependence upon the particular event.
Type:
Grant
Filed:
April 12, 2007
Date of Patent:
October 14, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
Abstract: Establishing a multimodal advertising personality for a sponsor of a multimodal application, including associating one or more vocal demeanors with a sponsor of a multimodal application and presenting a speech portion of the multimodal application for the sponsor using at least one of the vocal demeanors associated with the sponsor.
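A toy sketch of the association described above: sponsors mapped to vocal-demeanor settings that could be handed to a TTS engine when presenting that sponsor's speech portion. All names and parameter values are invented.

```python
# Hypothetical mapping from sponsor to vocal demeanors (voice settings that
# a TTS engine could apply when speaking that sponsor's prompts).
DEMEANORS = {
    "acme_airlines": [{"voice": "female_1", "rate": 0.95, "pitch": "+2st", "style": "friendly"}],
    "city_bank":     [{"voice": "male_2",   "rate": 0.90, "pitch": "-1st", "style": "formal"}],
}

def speech_settings_for(sponsor: str) -> dict:
    # Pick one of the demeanors associated with the sponsor (here, the first).
    demeanors = DEMEANORS.get(sponsor, [{"voice": "default", "rate": 1.0}])
    return demeanors[0]

print(speech_settings_for("acme_airlines"))
```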
Abstract: A method and a system for a speech recognition system, in which an electronic speech-based document is associated with a document template and comprises one or more sections of text recognized or transcribed from sections of speech. The sections of speech are transcribed by the speech recognition system into corresponding sections of text of the electronic speech-based document. The method includes the steps of dynamically creating sub-contexts and associating the sub-contexts with sections of text of the document template.
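As a hedged illustration of the sub-context idea above, this sketch dynamically builds a small vocabulary ("sub-context") for each section of a hypothetical report template; the section names and vocabularies are invented.

```python
# Hypothetical document template with named sections; each section gets a
# dynamically created sub-context (here, just a vocabulary list) that a
# recognizer could use while that section is being dictated.
template_sections = ["History", "Findings", "Impression"]

SECTION_VOCAB = {
    "History": ["patient", "presents", "history of"],
    "Findings": ["opacity", "effusion", "unremarkable"],
    "Impression": ["consistent with", "recommend", "follow-up"],
}

def create_sub_context(section: str) -> dict:
    return {"section": section, "vocabulary": SECTION_VOCAB.get(section, [])}

# Associate a sub-context with each section of the template.
sub_contexts = {s: create_sub_context(s) for s in template_sections}
print(sub_contexts["Findings"])
```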
Abstract: Techniques for converting spoken speech into written speech are provided. The techniques include transcribing input speech via speech recognition, mapping each spoken utterance from input speech into a corresponding formal utterance, and mapping each formal utterance into a stylistically formatted written utterance.
Type:
Grant
Filed:
May 13, 2011
Date of Patent:
October 7, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
Sara H. Basson, Rick Hamilton, Dan Ning Jiang, Dimitri Kanevsky, David Nahamoo, Michael Picheny, Bhuvana Ramabhadran, Tara N. Sainath
Abstract: A speech analysis system and method for analyzing speech. The system includes: a voice recognition system for converting inputted speech to text; an analytics system for generating feedback information by analyzing the inputted speech and text; and a feedback system for outputting the feedback information.
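A minimal pipeline sketch of the three components named above (voice recognition, analytics, feedback); the analysis is a trivial speaking-rate and filler-word check, and every name is invented.

```python
# Toy three-stage pipeline: recognize speech, analyze it, return feedback.
def recognize(audio_seconds: float) -> str:
    # Stand-in for a real voice recognition system.
    return "um so basically the quarterly numbers look um fine"

def analyze(text: str, audio_seconds: float) -> dict:
    words = text.split()
    return {
        "words_per_minute": len(words) / (audio_seconds / 60),
        "filler_count": sum(w in {"um", "uh", "basically"} for w in words),
    }

def feedback(metrics: dict) -> str:
    notes = []
    if metrics["filler_count"] > 2:
        notes.append("try to reduce filler words")
    if metrics["words_per_minute"] > 180:
        notes.append("slow down slightly")
    return "; ".join(notes) or "sounds good"

audio_len = 12.0
text = recognize(audio_len)
print(feedback(analyze(text, audio_len)))
```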
Abstract: A plurality of clinical facts may be extracted from a free-form narration of a patient encounter provided by a clinician. The plurality of clinical facts may include a first fact and a second fact. The first fact may be extracted from a first portion of the free-form narration, and the second fact may be extracted from a second portion of the free-form narration. A first indicator that indicates a first linkage between the first fact and the first portion of the free-form narration may be provided to a user. A second indicator, different from the first indicator, that indicates a second linkage between the second fact and the second portion of the free-form narration may also be provided to the user.
Type:
Application
Filed:
June 11, 2014
Publication date:
October 2, 2014
Applicant:
Nuance Communications, Inc.
Inventors:
James R. Flanagan, Davide Zaccagnini, Frank Montyne, David Decraene, David Hellman, Matthew R. Shelton, Mariana Casella dos Santos, Karen Anne Doyle, Johan Raedemaeker, Joeri Van der Vloet, Isam H. Habboush, Anthony J. Elcocks, Anush Hartunian
Abstract: An automated arrangement is described for conducting natural language interactions with a human user. A user interface is provided for user communication in a given active natural language interaction with a natural language application during an automated dialog session. An automatic speech recognition (ASR) engine processes unknown user speech inputs from the user interface to produce corresponding speech recognition results. A natural language concept module processes the speech recognition results to develop corresponding natural language concept items. A concept item storage holds selected concept items for reuse in a subsequent natural language interaction with the user during the automated dialog session.
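A hedged sketch of the concept-item reuse described above: concept items extracted in one turn are stored per session and used to fill gaps in a later turn. The slot names and the resolution rule are invented.

```python
# Hypothetical per-session store of concept items extracted from earlier turns,
# reused when a later utterance leaves a concept unspecified.
class ConceptItemStorage:
    def __init__(self):
        self._items = {}          # session_id -> {concept: value}

    def store(self, session_id: str, concepts: dict) -> None:
        self._items.setdefault(session_id, {}).update(concepts)

    def resolve(self, session_id: str, concepts: dict) -> dict:
        # Fill missing concepts from what was said earlier in the session.
        remembered = self._items.get(session_id, {})
        return {**remembered, **concepts}

store = ConceptItemStorage()
store.store("sess-1", {"city": "Boston", "topic": "weather"})
# Later turn: "what about tomorrow?" supplies only a date concept.
print(store.resolve("sess-1", {"date": "tomorrow"}))
# -> {'city': 'Boston', 'topic': 'weather', 'date': 'tomorrow'}
```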
Abstract: An ontology stores information about a domain of an automatic speech recognition (ASR) application program. The ontology is augmented with information that enables subsequent automatic generation of a speech understanding grammar for use by the ASR application program. The information includes hints about how a human might talk about objects in the domain, such as preludes (phrases that introduce an identification of the object) and postludes (phrases that follow an identification of the object).
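The sketch below shows one hypothetical way preludes and postludes attached to ontology objects could be expanded into grammar phrases; the ontology entries and the output format are invented and are not the patent's grammar formalism.

```python
# Hypothetical ontology fragment: each object carries hints about how a
# person might talk about it (preludes before the name, postludes after).
ontology = {
    "checking_account": {
        "names": ["checking", "my checking account"],
        "preludes": ["", "balance of", "how much is in"],
        "postludes": ["", "please"],
    },
}

def generate_phrases(obj: dict):
    # Expand prelude + name + postlude combinations into grammar phrases.
    for pre in obj["preludes"]:
        for name in obj["names"]:
            for post in obj["postludes"]:
                yield " ".join(part for part in (pre, name, post) if part)

for phrase in generate_phrases(ontology["checking_account"]):
    print(phrase)
```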
Abstract: An embodiment according to the invention provides automatic discovery, via Automatic Speech Recognition (ASR) and Voice Biometrics, of the identification of a caller, when the caller is making a phone call from, for example, a residential line. The caller may, for example, initiate a phone call by voice request to a computer or other device. The device initiates the call, but rather than using the conventional technique of determining Calling Name via lookup to the Transaction Capabilities Application Part (TCAP) database, the embodiment uses a technique of ASR in tandem with voice or other biometrics to recognize who within the residence is making the call, and to use the name associated with the requesting caller's voiceprint for determining the Calling Name to display to the called party. Other forms of biometrics, such as image biometrics (e.g., facial or iris biometrics), may alternatively be employed.
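A toy sketch of the household-member identification step described above: the requesting voice is scored against stored voiceprints, and the best match's name is used as the Calling Name. The feature vectors, scoring function, and threshold are invented stand-ins for real voice biometrics.

```python
# Hypothetical voiceprints for members of the household, each a feature
# vector; real voice biometrics would be far more involved than this.
VOICEPRINTS = {
    "Alice Smith": [0.9, 0.1, 0.4],
    "Bob Smith":   [0.2, 0.8, 0.5],
}

def similarity(a, b) -> float:
    # Toy similarity score (negative squared distance).
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def calling_name_for(voice_features, threshold=-0.1) -> str:
    name, score = max(
        ((n, similarity(voice_features, vp)) for n, vp in VOICEPRINTS.items()),
        key=lambda pair: pair[1],
    )
    # Fall back to a generic name if no voiceprint matches well enough.
    return name if score >= threshold else "Smith Residence"

print(calling_name_for([0.88, 0.12, 0.41]))  # -> "Alice Smith"
```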
Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. The tool provides facilities to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
Type:
Grant
Filed:
April 3, 2013
Date of Patent:
September 30, 2014
Assignee:
Nuance Communications, Inc.
Inventors:
Raimo Bakis, Ellen Marie Eide, Roberto Pieraccini, Maria E. Smith, Jie Z. Zeng