Patents Assigned to Nuance Communications
-
Patent number: 10276181
Abstract: A method, computer program product, and computer system for addressing acoustic signal reverberation is provided. Embodiments may include receiving, at one or more microphones, a first audio signal and a reverberation audio signal. Embodiments may further include processing at least one of the first audio signal and the reverberation audio signal. Embodiments may also include limiting a model based reverberation equalizer using a temporal constraint for direct sound distortions, the model based reverberation equalizer configured to generate one or more outputs, based upon, at least in part, at least one of the first audio signal and the reverberation audio signal.
Type: Grant
Filed: September 5, 2017
Date of Patent: April 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Tobias Wolff, Lars Tebelmann
-
Patent number: 10276157
Abstract: Some embodiments provide techniques performed by at least one voice agent. The techniques include receiving voice input; identifying at least one application program as relating to the received voice input; and displaying at least one selectable visual representation that, when selected, causes focus of the computing device to be directed to the at least one application program identified as relating to the received voice input.
Type: Grant
Filed: October 1, 2012
Date of Patent: April 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Timothy Lynch, Sean P. Brown, Paweena Attayadmawittaya, Tiago Goncalves Cabaco, Victor Shine Chen
-
Patent number: 10276166
Abstract: A method of detecting an occurrence of splicing in a speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation ST of the speech signal, and calculating a frame-based spectral-like representation SE of a reference speech signal corresponding to the speech signal. The method further includes aligning ST and SE in time and frequency, calculating a distance function associated with aligned ST and SE, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the speech signal.
Type: Grant
Filed: July 22, 2014
Date of Patent: April 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
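The distance, score, and threshold steps of this abstract can be illustrated with a minimal sketch. It assumes the two representations ST and SE have already been aligned and are given as equal-length lists of per-frame feature vectors. The Euclidean distance, the use of the maximum frame distance as the score, and the function names are illustrative assumptions, not details taken from the patent.

```python
import math

def frame_distances(st, se):
    """Per-frame Euclidean distance between aligned feature frames (assumed metric)."""
    return [math.dist(a, b) for a, b in zip(st, se)]

def splice_score(distances):
    """Score the comparison by its largest frame distance (assumed scoring rule)."""
    return max(distances, default=0.0)

def is_spliced(st, se, threshold):
    """Flag the test signal when the score exceeds the threshold."""
    return splice_score(frame_distances(st, se)) > threshold

# Hypothetical aligned frames: the third test frame departs from the reference.
reference = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
test = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(is_spliced(test, reference, threshold=1.0))
```

A single large frame distance is enough to trip the threshold here; a real system would choose the score so that ordinary speaker variation does not.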
-
Publication number: 20190109880
Abstract: Methods and apparatus for communicating between virtual agents associated with users of electronic devices connected via at least one network. A first user may instruct an associated first virtual agent to invoke a communication session with a second virtual agent associated with a second user. To invoke the communication session, the first virtual agent may send an outgoing communication to the second virtual agent and the outgoing communication may instruct the second virtual agent to perform at least one action on behalf of the first user. Virtual agents associated with different users may alternatively communicate with each other in the absence of user interaction to perform a collaborative action.
Type: Application
Filed: December 7, 2018
Publication date: April 11, 2019
Applicant: Nuance Communications, Inc.
Inventors: Michael Stuart Phillips, John Nguyen, Thomas Jay Leonard, David Grannan
-
Publication number: 20190108830
Abstract: Techniques for performing multi-style speech synthesis. The techniques include using at least one computer hardware processor to perform: obtaining input comprising text and an identification of a desired speaking style to use in rendering the text as speech; identifying a plurality of speech segments for use in rendering the text as speech, the identifying comprising identifying a first speech segment recorded and/or synthesized in a first speaking style that is different from the desired speaking style based at least in part on a measure of similarity between the desired speaking style and the first speaking style; synthesizing speech from the text in the desired speaking style at least in part by using the first speech segment; and outputting the synthesized speech.
Type: Application
Filed: June 4, 2018
Publication date: April 11, 2019
Applicant: Nuance Communications, Inc.
Inventor: Vincent Pollet
-
Patent number: 10242690
Abstract: Embodiments of the present disclosure may include a system and method for speech enhancement using the coherent to diffuse sound ratio. Embodiments may include receiving an audio signal at one or more microphones and controlling one or more adaptive filters of a beamformer using a coherent to diffuse ratio (“CDR”).
Type: Grant
Filed: December 12, 2014
Date of Patent: March 26, 2019
Assignee: Nuance Communications, Inc.
Inventors: Tobias Wolff, Timo Matheja, Markus Buck
-
Patent number: 10237412
Abstract: The present disclosure is directed towards an audio conferencing method. Some embodiments may include receiving, at a first mixing device, a first audio stream from one or more participant conferencing devices. The method may further include generating a top-N voice stream at the first mixing device, wherein the top-N voice stream corresponds with at least one top-N talker and wherein the identification of the at least one top-N talker is based upon, at least in part, an activity ranking. The method may also include receiving the top-N voice stream at a centralized mixing device and generating at least one mixed audio stream at the centralized mixing device.
Type: Grant
Filed: April 18, 2014
Date of Patent: March 19, 2019
Assignee: Nuance Communications, Inc.
Inventors: Sridhar Pilli, Mahesh Godavarti
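As a rough illustration of the two-stage mixing described in this abstract, the sketch below ranks participants by an activity score at a first-stage mixer, forwards only the top-N streams, and sums them at a central mixer. The activity scores, the dictionary-of-samples stream format, and all function names are hypothetical; the abstract does not specify how activity is ranked or how streams are mixed.

```python
import heapq

def top_n_talkers(activity, n):
    """Return the ids of the n participants with the highest activity score."""
    return heapq.nlargest(n, activity, key=activity.get)

def first_stage_mix(streams, activity, n):
    """Keep only the streams of the top-n talkers at a first mixing device."""
    return {pid: streams[pid] for pid in top_n_talkers(activity, n)}

def central_mix(top_n_groups):
    """Sum equal-length sample lists received from several first-stage mixers."""
    frames = [samples for group in top_n_groups for samples in group.values()]
    return [sum(column) for column in zip(*frames)]

# Hypothetical scores and two-sample streams for one first-stage mixer.
activity = {"a": 0.9, "b": 0.1, "c": 0.5}
streams = {"a": [1, 2], "b": [9, 9], "c": [3, 4]}
selected = first_stage_mix(streams, activity, n=2)
print(central_mix([selected]))
```

Forwarding only the top-N streams keeps the load on the central mixer bounded by N per first-stage device rather than by the total number of participants.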
-
Patent number: 10235359
Abstract: Inferring a natural language grammar is based on providing natural language understanding (NLU) data with concept annotations according to an application ontology characterizing a relationship structure between application-related concepts for a given NLU application. An application grammar is then inferred from the concept annotations and the application ontology.
Type: Grant
Filed: July 15, 2013
Date of Patent: March 19, 2019
Assignee: Nuance Communications, Inc.
Inventors: Réal Tremblay, Jerome Tremblay, Stephen Douglas Peters, Serge Robillard
-
Patent number: 10229106
Abstract: Designing a natural language understanding (NLU) model for an application from scratch can be difficult for non-experts. A system can simplify the design process by providing an interface allowing a designer to input example usage sentences and build an NLU model based on presented matches to those example sentences. In one embodiment, a method for initializing a workspace for building an NLU system includes parsing a sample sentence to select at least one candidate stub grammar from among multiple candidate stub grammars. The method can include presenting, to a user, respective representations of the candidate stub grammars selected by the parsing of the sample sentence. The method can include enabling the user to choose one of the respective representations of the candidate stub grammars. The method can include adding to the workspace a stub grammar corresponding to the representation of the candidate stub grammar chosen by the user.
Type: Grant
Filed: July 26, 2013
Date of Patent: March 12, 2019
Assignee: Nuance Communications, Inc.
Inventor: Jeffrey N. Marcus
-
Patent number: 10229701
Abstract: A mobile device is adapted for automatic speech recognition (ASR). A user interface for interaction with a user includes an input microphone for obtaining speech inputs from the user for automatic speech recognition, and an output interface for system output to the user based on ASR results that correspond to the speech input. A local controller obtains a sample of non-ASR audio from the input microphone for ASR-adaptation to channel-specific ASR characteristics, and then provides a representation of the non-ASR audio to a remote ASR server for server-side adaptation to the channel-specific ASR characteristics, and then provides a representation of an unknown ASR speech input from the input microphone to the remote ASR server for determining ASR results corresponding to the unknown ASR speech input, and then provides the system output to the output interface.
Type: Grant
Filed: June 12, 2017
Date of Patent: March 12, 2019
Assignee: Nuance Communications, Inc.
Inventors: Daniel Willett, Jean-Guy E. Dahan, William F. Ganong, III, Jianxiong Wu
-
Publication number: 20190073999
Abstract: According to some aspects, a system for detecting a designated wake-up word is provided, the system comprising a plurality of microphones to detect acoustic information from a physical space having a plurality of acoustic zones, at least one processor configured to receive a first acoustic signal representing the acoustic information received by the plurality of microphones, process the first acoustic signal to identify content of the first acoustic signal originating from each of the plurality of acoustic zones, provide a plurality of second acoustic signals, each of the plurality of second acoustic signals substantially corresponding to the content identified as originating from a respective one of the plurality of acoustic zones, and performing automatic speech recognition on each of the plurality of second acoustic signals to determine whether the designated wake-up word was spoken.
Type: Application
Filed: February 10, 2016
Publication date: March 7, 2019
Applicant: Nuance Communications, Inc.
Inventors: Julien Prémont, Tim Haulick, Emanuele Dalmasso, Munir Nikolai Alexander Georges, Andreas Kellner, Gaetan Martens, Oliver van Porten, Holger Quast, Martin Rößler, Tobias Wolff, Markus Buck
-
Patent number: 10223411
Abstract: A method of providing a task assistant is described. The task assistant is designed to receive input from a user through multimodal input including a plurality of speech input, typing input, and touch input, determine the meaning of the input, and determining whether there is a context based on prior interactions with the user. The method further to generate an interpreted input based on a combination of the input and the context, and providing a formatted query to an application. The method further to receive data from the application in response to the formatted query, and provide a response to the user through multimodal output including a plurality of: speech output, text output, non-speech audio output, haptic output, and visual non-text output. The method further to update the context based on the interpreted input.
Type: Grant
Filed: March 6, 2013
Date of Patent: March 5, 2019
Assignee: Nuance Communications, Inc.
Inventors: David Andrew Mauro, Henri Bouvier, Elizabeth Ann Dykstra-Erickson, Simona Gandrabur, Susan Dawnstarr Daniel, Aimee Piercy, Robert Douglas Sharp
-
Patent number: 10199124
Abstract: Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated.
Type: Grant
Filed: October 9, 2017
Date of Patent: February 5, 2019
Assignee: Nuance Communications, Inc.
Inventor: Mariana Casella dos Santos
-
Patent number: 10192541
Abstract: A text-to-speech (TTS) system includes components capable of supporting the generation of speech output in any of multiple styles, and may switch seamlessly from producing speech output in one style to producing speech output in another style. For example, a concatenative TTS system may include a speech base storing speech units associated with multiple speech styles, and a linguistic analysis component to generate a phonetic transcription specifying speech output in any of multiple styles. Text input may include a style indication associated with a particular segment of the input text. The linguistic analysis component may invoke encoded rules and/or components based upon the style indication, and generate a phonetic transcription specifying a speech style, which may be processed to generate output speech.
Type: Grant
Filed: June 5, 2014
Date of Patent: January 29, 2019
Assignee: Nuance Communications, Inc.
Inventors: Paolo Mairano, Corinne Bos-Plachez, Sourav Nandy, Johan Wouters, Silvia Maria Antonella Quazza, Dong-Jian Yue
-
Patent number: 10192543
Abstract: A method (300) and system (100) is provided to add the creation of examples at a developer level in the generation of Natural Language Understanding (NLU) models, tying the examples into a NLU sentence database (130), automatically validating (310) a correct outcome of using the examples, and automatically resolving (316) problems the user has using the examples. The method (300) can convey examples of what a caller can say to a Natural Language Understanding (NLU) application. The method includes entering at least one example associated with an existing routing destination, and ensuring an NLU model correctly interprets the example unambiguously for correctly routing a call to the routing destination. The method can include presenting the example sentence in a help message (126) within an NLU dialogue as an example of what a caller can say for connecting the caller to a desired routing destination.
Type: Grant
Filed: May 10, 2016
Date of Patent: January 29, 2019
Assignee: Nuance Communications, Inc.
Inventors: Rajesh Balchandran, Linda M. Boyer, James R. Lewis, Brent D. Metz
-
Publication number: 20190027149
Abstract: Described herein are embodiments of a system configured to receive text input (e.g., in the form of speech input) that includes provisional text and interpret the provisional text to produce substitute text with which the provisional text is replaced. A user dictating speech input may dictate the provisional text along with other content of the speech, and the speech input including the provisional text may be converted to text in a speech recognition process performed by an automatic speech recognition (ASR) system. The text corresponding to the speech input may be reviewed to determine whether any character strings included in the text match a character pattern defined for provisional text. If so, the character string is interpreted to determine a data field indicated by the provisional text, and substitute text including a value for the data field is determined. The provisional text may then be replaced with the substitute text.
Type: Application
Filed: July 20, 2017
Publication date: January 24, 2019
Applicant: Nuance Communications, Inc.
Inventor: Markus Vogel
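The match-interpret-replace flow in this abstract can be sketched with a regular expression. The double-bracket pattern, the field names, and the lookup dictionary below are purely hypothetical stand-ins; the abstract does not say what character pattern marks provisional text or where field values come from.

```python
import re

# Hypothetical character pattern for provisional text: [[field_name]]
PROVISIONAL = re.compile(r"\[\[(\w+)\]\]")

def replace_provisional(text, fields):
    """Replace each provisional marker with the value of the data field it names.

    Markers whose field has no known value are left unchanged.
    """
    def substitute(match):
        name = match.group(1)
        return str(fields.get(name, match.group(0)))
    return PROVISIONAL.sub(substitute, text)

# Hypothetical recognized text and field values.
recognized = "Patient [[name]] was seen on [[visit_date]]."
print(replace_provisional(recognized, {"name": "Jane Doe"}))
```

Leaving unresolved markers in place, rather than deleting them, keeps them visible for later review; that choice is also an assumption of this sketch.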
-
Patent number: 10186256
Abstract: Typical speech recognition systems usually use speaker-specific speech data to apply speaker adaptation to models and parameters associated with the speech recognition system. Given that speaker-specific speech data may not be available to the speech recognition system, information indicative of language skills is employed in adapting configurations of a speech recognition system. According to at least one example embodiment, a method and corresponding apparatus, for speech recognition comprise maintaining information indicative of language skills of users of the speech recognition system. A configuration of the speech recognition system for a user is determined based at least in part on corresponding information indicative of language skills of the user. Upon receiving speech data from the user, the configuration of the speech recognition system determined is employed in performing speech recognition.
Type: Grant
Filed: January 23, 2014
Date of Patent: January 22, 2019
Assignee: Nuance Communications, Inc.
Inventors: Weiying Li, Daniel Willett
-
Patent number: 10181325
Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
Type: Grant
Filed: June 30, 2017
Date of Patent: January 15, 2019
Assignee: Nuance Communications, Inc.
Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
-
Patent number: 10176803
Abstract: Technology for improving the predictive accuracy of input word recognition on a device by dynamically updating the lexicon of recognized words based on the word choices made by similar users. The technology collects users' vocabulary choices (e.g., words that each user uses, or adds to or removes from a word recognition dictionary), associates users who make similar choices, aggregates related vocabulary choices, filters the words, and sends words identified as likely choices for that user to the user's device. Clusters may include, for example, users in a particular location (e.g., sets of people who use words such as “Puyallup,” “Gloucester,” or “Waiheke”), users with a particular professional or hobby vocabulary, or application-specific vocabulary (e.g., word choices in map searches or email messages).
Type: Grant
Filed: June 5, 2017
Date of Patent: January 8, 2019
Assignee: Nuance Communications, Inc.
Inventors: Ethan R. Bradford, Simon Corston, David J. Kay, Donni McCray, Keith Trnka
-
Publication number: 20180367674
Abstract: A method for residual echo suppression is provided. Embodiments may include receiving an original reference signal and applying a distortion function to the original reference signal to generate a second signal. Embodiments may include generating a non-linear signal from the distortion function that does not include linear components of the original reference signal. Embodiments may also include calculating a residual echo power of a linear component and a non-linear component, wherein the linear component is based upon the original reference signal and the non-linear component is based upon the non-linear signal. Embodiments may further include applying a room model to each of the original reference signal and the non-linear signal and estimating a power associated with the original reference signal and the non-linear signal. Embodiments may include calculating a combined echo power estimate as a weighted sum of a weighted original reference signal power and a weighted non-linear signal power.
Type: Application
Filed: December 8, 2015
Publication date: December 20, 2018
Applicant: Nuance Communications, Inc.
Inventors: Ingo Schalk-Schupp, Markus Buck, Friedrich Faubel
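A minimal sketch of the combined echo power estimate described in this abstract follows. Every concrete choice in it is an assumption made for illustration: hard clipping as the distortion function, a least-squares projection to remove the linear component, a short FIR impulse response as the room model, mean-square power, and fixed scalar weights. None of these details come from the application itself.

```python
def clip(x, limit=0.5):
    """Hard clipping, used here as a stand-in distortion function."""
    return max(-limit, min(limit, x))

def nonlinear_part(reference, distort=clip):
    """Distort the reference, then subtract its least-squares linear component."""
    distorted = [distort(x) for x in reference]
    energy = sum(x * x for x in reference)
    gain = sum(d * x for d, x in zip(distorted, reference)) / energy if energy else 0.0
    return [d - gain * x for d, x in zip(distorted, reference)]

def apply_room(signal, impulse_response):
    """Room model as an FIR filter, truncated to the input length."""
    return [
        sum(h * signal[n - k] for k, h in enumerate(impulse_response) if n - k >= 0)
        for n in range(len(signal))
    ]

def power(signal):
    """Mean-square power of a sample list."""
    return sum(x * x for x in signal) / len(signal) if signal else 0.0

def combined_echo_power(reference, impulse_response, w_linear, w_nonlinear):
    """Weighted sum of the linear and non-linear echo power estimates."""
    linear = power(apply_room(reference, impulse_response))
    nonlinear = power(apply_room(nonlinear_part(reference), impulse_response))
    return w_linear * linear + w_nonlinear * nonlinear

# Hypothetical reference samples, room response, and weights.
reference = [0.2, 0.9, -0.7, 0.1]
print(combined_echo_power(reference, [1.0, 0.5], w_linear=1.0, w_nonlinear=2.0))
```

When the reference never exceeds the clipping limit, the distorted signal equals the reference, the non-linear part is zero, and the estimate reduces to the weighted linear term alone.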