Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
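    The monitor-verify-record workflow in the abstract above can be sketched as follows. This is a minimal illustration only: the audio stream is reduced to decoded text, and the `verify` callback stands in for the user-confirmation step the patent describes.

    ```python
    def monitor(stream_text, phrase, verify):
        """Scan a (text-decoded) stream for a specified phrase.

        verify: callback returning True if the user confirms the detection;
        only confirmed detections are recorded for future matching.
        """
        recorded = []
        idx = stream_text.find(phrase)
        if idx != -1 and verify(phrase):
            # Record the portion of the stream containing the phrase.
            recorded.append(stream_text[idx:idx + len(phrase)])
        return recorded
    ```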
  • Patent number: 8693977
    Abstract: Techniques for achieving personal security via mobile devices are presented. A portable mobile communication device, such as a phone or a personal digital assistant (PDA), is equipped with geographic positioning capabilities and is equipped with audio and visual devices. A panic mode of operation can be automatically detected in which real time audio and video for an environment surrounding the portable communication device are captured along with a geographic location for the portable communication device. This information is streamed over the Internet to a secure site where it can be viewed in real time and/or later inspected.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 8, 2014
    Assignee: Novell, Inc.
    Inventors: Sandeep Patnaik, Saheednanda Singh, AnilKumar Bolleni
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140074472
    Abstract: A voice control system is adapted for controlling an electrical appliance, and includes a host and a portable voice control device. The portable voice control device is capable of wireless communication with the host, and includes an audio pick-up unit for receiving a voice input. One of the host and the portable voice control device includes a voice recognition control module that is configured to recognize a control command from the voice input. The host controls operation of the electrical appliance according to the control command, and transmits an appliance status message to the portable voice control device. The portable voice control device further includes an output unit for outputting the appliance status message.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Chih-Hung Lin, Teh-Jang Chen
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe the word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and be used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust the internals of a speech recognition engine to achieve a consistent behavior for all inputs for various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
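    The feature-to-FOM mapping in the abstract above might look like the following toy sketch. The features and the nearest-neighbour regressor are invented stand-ins for the patent's "relevant features" and "suitable machine learning algorithm", not the actual implementation.

    ```python
    def keyword_features(word, vocabulary):
        """Features describing the word individually and relative to the language."""
        return [
            len(word),                                        # longer words are easier to spot
            sum(ch in "aeiou" for ch in word) / len(word),    # vowel ratio (voicing proxy)
            sum(w.startswith(word[:3]) for w in vocabulary),  # confusable-prefix count
        ]

    def predict_fom(word, training, vocabulary):
        """Predict FOM for a new keyword as the FOM of its nearest
        training keyword in feature space."""
        fx = keyword_features(word, vocabulary)

        def dist(kw):
            feat = keyword_features(kw, vocabulary)
            return sum((a - b) ** 2 for a, b in zip(fx, feat))

        return training[min(training, key=dist)]
    ```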
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Patent number: 8655660
    Abstract: The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from phone conversations of a user by a voice learning processor to generate a personalized voice font and delivering the personalized voice font (PVF) to a server.
    Type: Grant
    Filed: February 10, 2009
    Date of Patent: February 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Zsolt Szalai, Philippe Bazot, Bernard Pucci, Joel Vitale
  • Publication number: 20140039885
    Abstract: Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: David E. Reich, Christopher Hardy
  • Publication number: 20140039891
    Abstract: Systems and methods for audio editing are provided. In one implementation, a computer-implemented method is provided. The method includes receiving digital audio data including a plurality of distinct vocal components. Each distinct vocal component is automatically identified using one or more attributes that uniquely identify each distinct vocal component. The audio data is separated into two or more individual tracks where each individual track comprises audio data corresponding to one distinct vocal component. The separated individual tracks are then made available for further processing.
    Type: Application
    Filed: October 16, 2007
    Publication date: February 6, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Nariman Sodeifi, David E. Johnston
  • Publication number: 20140039881
    Abstract: The instant application includes computationally-implemented systems and methods that include managing adaptation data, the adaptation data is at least partly based on at least one speech interaction of a particular party, facilitating transmission of the adaptation data to a target device when there is an indication of a speech-facilitated transaction between the target device and the particular party, such that the adaptation data is to be applied to the target device to assist in execution of the speech-facilitated transaction, and facilitating acquisition of adaptation result data that is based on at least one aspect of the speech-facilitated transaction and to be used in determining whether to modify the adaptation data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 6, 2014
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud
  • Publication number: 20140029733
    Abstract: A speech server and methods provide audio stream analysis for tone detection in addition to speech recognition to implement an accurate and efficient answering machine detection strategy. By performing both tone detection and speech recognition in a single component, such as the speech server, the number of components for digital signal processing may be reduced. The speech server communicates tone events detected at the telephony level and enables voice applications to detect tone events consistently and provide consistent support and accuracy of both inbound and outbound voice applications independent of the hardware or geographical location of the telephony network. In addition, an improved opportunity for signaling of an appropriate moment for an application to leave a message is provided, thereby supporting automation.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Kenneth W.D. Smith, Jaques de Broin
  • Publication number: 20140025377
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: August 10, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fernando Luiz Koch, Julio Nogima
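    The winnow-and-update loop in the abstract above can be sketched as below. Representing the context model as a simple frequency counter is an assumption for illustration; the patent does not specify the model's form.

    ```python
    from collections import Counter

    class ContextModel:
        """Toy context model: prefers candidates seen before in this context."""

        def __init__(self):
            self.counts = Counter()

        def winnow(self, candidates):
            """Resolve recognition ambiguity: keep the candidate most
            frequent in this context (unseen candidates count as zero)."""
            return max(candidates, key=lambda c: self.counts[c])

        def update(self, match):
            """Feed the accepted match back into the context model."""
            self.counts[match] += 1
    ```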
  • Publication number: 20140012579
    Abstract: In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Publication number: 20140012582
    Abstract: In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
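    The replace-and-rescore check in the abstract above can be rendered as a small sketch. The confusable sets, the bigram "language model", and the alert margin are all toy stand-ins for illustration.

    ```python
    CONFUSABLE_SETS = [{"hypertension", "hypotension"}, {"fifteen", "fifty"}]

    # Toy bigram language model; unseen bigrams get a floor penalty.
    BIGRAM_LOGPROB = {
        ("patient", "has"): -0.1,
        ("has", "hypertension"): -0.5,
        ("has", "hypotension"): -1.2,
    }

    def score(words):
        """Sum of bigram log-probabilities for a string of words."""
        return sum(BIGRAM_LOGPROB.get(pair, -5.0) for pair in zip(words, words[1:]))

    def alerts(result, margin=2.0):
        """Swap each confusable word for its alternatives; flag the pair if
        the alternative string is nearly as likely under the language model."""
        words = result.split()
        flagged = []
        for i, w in enumerate(words):
            for cset in CONFUSABLE_SETS:
                if w in cset:
                    base = score(words)
                    for alt in cset - {w}:
                        candidate = words[:i] + [alt] + words[i + 1:]
                        if score(candidate) >= base - margin:
                            flagged.append((w, alt))
        return flagged
    ```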
  • Publication number: 20140006025
    Abstract: This disclosure includes, for example, methods and computer systems for providing audio-activated resource access for user devices. The computer systems may store instructions to cause the processor to perform operations, comprising capturing audio at a user device. The operations may also comprise using a speaker recognition system to identify a speaker in the transmitted audio and/or using a speech-to-text converter to identify text in the captured audio. The speaker identity or a condensed version of the speaker identity or other metadata along with the speaker identity may be transmitted to a server system to determine a corresponding speaker identity entry. The operations may also comprise receiving a resource corresponding to the identified speaker entry in the server system.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Harshini Ramnath Krishnan, Andrew Fregly
  • Publication number: 20130346066
    Abstract: Joint decoding of words and tags may be provided. Upon receiving an input from a user comprising a plurality of elements, the input may be decoded into a word lattice comprising a plurality of words. A tag may be assigned to each of the plurality of words and a most-likely sequence of word-tag pairs may be identified. The most-likely sequence of word-tag pairs may be evaluated to identify an action request from the user.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Anoop Kiran Deoras, Dilek Zeynep Hakkani-Tur, Ruhi Sarikaya, Gokhan Tur
  • Publication number: 20130339027
    Abstract: A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
    Type: Application
    Filed: June 15, 2012
    Publication date: December 19, 2013
    Inventors: Tarek El Dokor, James Holmes, Jordan Cluster, Stuart Yamamoto, Pedram Vaghefinazari
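    The depth-based command pruning in the abstract above reduces to selecting a command set by context. The depth threshold, context names, and command sets below are invented for illustration.

    ```python
    # One verbal command set per gesture context (invented examples).
    COMMANDS_BY_CONTEXT = {
        "near_screen": {"zoom in", "zoom out", "select"},
        "at_wheel": {"answer call", "next track"},
    }

    def context_from_depth(hand_depth_m):
        """Map the hand's depth (from the depth camera) to a command context."""
        return "near_screen" if hand_depth_m < 0.4 else "at_wheel"

    def active_commands(hand_depth_m):
        """Only this subset is handed to the recognizer, improving accuracy."""
        return COMMANDS_BY_CONTEXT[context_from_depth(hand_depth_m)]
    ```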
  • Publication number: 20130339021
    Abstract: Techniques, an apparatus and an article of manufacture are provided for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties, computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation, and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance which is most likely to carry intent of the speaker.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Sachindra Joshi, Saurabh Saket, Ashish Verma
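    The scoring in the abstract above is concrete enough to sketch: a word's score combines its uniqueness in the conversation with how often it recurs later, and an utterance's score sums over its words. The exact weighting (log-inverse frequency times an echo count) is an assumption, not the patent's formula.

    ```python
    import math

    def rank_by_intent(utterances):
        """Rank utterances from most to least likely to carry speaker intent."""
        tokenized = [u.lower().split() for u in utterances]
        n = len(tokenized)

        def word_score(word, idx):
            in_utts = sum(word in t for t in tokenized)
            uniqueness = math.log(n / in_utts)                  # rarer => higher
            later = sum(t.count(word) for t in tokenized[idx + 1:])
            return uniqueness * (1 + later)                     # reinforced by echoes

        scored = [
            (sum(word_score(w, i) for w in t), u)
            for i, (u, t) in enumerate(zip(utterances, tokenized))
        ]
        return [u for s, u in sorted(scored, reverse=True)]
    ```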
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
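    The per-request threshold logic in the abstract above can be sketched in a few lines. The confidence score would come from comparing the voice input against the stored speaker model; here it is passed in directly, and the per-service minimums are invented.

    ```python
    # Minimum confidence required per service request (invented values):
    # higher-risk requests demand a stronger voice match.
    MIN_CONFIDENCE = {
        "check_balance": 0.60,
        "transfer_funds": 0.90,
    }

    def authorize(service_request, confidence_score):
        """Initiate the request only if the voice match clears the service's bar."""
        minimum = MIN_CONFIDENCE.get(service_request, 1.0)  # unknown => deny
        return confidence_score >= minimum
    ```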
  • Publication number: 20130332147
    Abstract: The technology of the present application provides a method and apparatus to allow for dynamically updating a language model across a large number of similarly situated users. The system identifies individual changes to user profiles and evaluates the change for broader application, such as a dialect correction for a speech recognition engine. An administrator for the system identifies similarly situated user profiles and downloads the profile change to effect a dynamic change to the language model of similarly situated users.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventor: Charles Corfield
  • Publication number: 20130325459
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130317823
    Abstract: Systems, methods, and computer-readable media that may be used to modify a voice action system to include voice actions provided by advertisers or users are provided. One method includes receiving electronic voice action bids from advertisers to modify the voice action system to include a specific voice action (e.g., a triggering phrase and an action). One or more bids may be selected. The method includes, for each of the selected bids, modifying data associated with the voice action system to include the voice action associated with the bid, such that the action associated with the respective voice action is performed when voice input from a user is received that the voice action system determines to correspond to the triggering phrase associated with the respective voice action.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 28, 2013
    Inventor: Pedro J. Moreno Mengibar
  • Publication number: 20130317820
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
  • Publication number: 20130304468
    Abstract: A method for contextual voice query dilation in a Spoken Web search includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is also provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: August 8, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
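    The dilation operators in the abstract above expand a recognizer's query terms into broader sub-sets. The two operators below (clipped-ending tolerance and a confusion table) are invented examples of what such operators might look like.

    ```python
    def drop_last_char(term):
        """Tolerate a clipped final phoneme by also matching the stem."""
        return {term, term[:-1]} if len(term) > 3 else {term}

    def common_confusions(term):
        """Tolerate known acoustic confusions (toy table)."""
        table = {"wheat": {"weed"}, "rate": {"late"}}
        return {term} | table.get(term, set())

    def dilate(query_terms, operators):
        """Apply every operator to every term, producing one dilated
        sub-set of query terms per operator."""
        return [set().union(*(op(t) for t in query_terms)) for op in operators]
    ```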
  • Publication number: 20130297306
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: QNX Software Systems Limited
    Inventors: Phillip Alan Hetherington, Xueman Li
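    The curve-weighting step in the abstract above amounts to blending two predetermined per-band target curves according to the intelligibility measurement. The linear blend below is an assumption; the patent does not disclose the exact combination rule.

    ```python
    def weighted_curve(curve_low, curve_high, intelligibility):
        """Blend two per-band long-term average speech curves.

        intelligibility in [0, 1]: low intelligibility pulls the target
        toward curve_high (more spectral emphasis); high intelligibility
        leaves it near curve_low.
        """
        w = max(0.0, min(1.0, 1.0 - intelligibility))
        return [(1 - w) * lo + w * hi for lo, hi in zip(curve_low, curve_high)]
    ```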
  • Publication number: 20130297316
    Abstract: A method, system, and computer program product for voice entry of information are provided in the illustrative embodiments. A conversion rule is applied to a voice input. An entry field input is generated, wherein the conversion rule allows the voice input to be distinct from the entry field input, and wherein the voice input obfuscates the entry field input. The entry field input is provided to an application, wherein the entry field input is usable to populate a data entry field in the application.
    Type: Application
    Filed: May 3, 2012
    Publication date: November 7, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian John Cragun, Marc Kevin Johlic
  • Publication number: 20130282371
    Abstract: A method is disclosed herein for recognizing a repeated utterance in a mobile computing device via a processor. A first utterance is detected being spoken into a first mobile computing device. Likewise, a second utterance is detected being spoken into a second mobile computing device within a predetermined time period. The second utterance substantially matches the first spoken utterance and the first and second mobile computing devices are communicatively coupled to each other. The processor enables capturing, at least temporarily, a matching utterance for performing a subsequent processing function. The performed subsequent processing function is based on a type of captured utterance.
    Type: Application
    Filed: April 20, 2012
    Publication date: October 24, 2013
    Applicant: Motorola Mobility, Inc.
    Inventors: Rachid M Alameh, Jiri Slaby, Hisashi D. Watanabe
  • Publication number: 20130275136
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment. The method can further include assigning probability to the weighted grammar based on historical user responses and activating each weighted grammar based on the assigned probability. Weighted grammars can be generated based on a user profile. A weighted grammar can be generated for two or more segments.
    Type: Application
    Filed: April 13, 2012
    Publication date: October 17, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Michael Czahor
  • Publication number: 20130268270
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael Lewis Seltzer, Alejandro Acero
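    The two-stage compensation in the abstract above can be sketched with plain affine transforms: select one transform per variability source (e.g. speaker and environment) and apply them in sequence to the feature vector. The transform tables and values are invented for the example.

    ```python
    def apply_affine(A, b, x):
        """y = A @ x + b, written out for plain lists."""
        return [sum(a * v for a, v in zip(row, x)) + bi for row, bi in zip(A, b)]

    # One linear transform per value of each variability source (toy values).
    SPEAKER_TRANSFORMS = {"male": ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0])}
    NOISE_TRANSFORMS = {"car": ([[2.0, 0.0], [0.0, 2.0]], [0.0, -1.0])}

    def compensate(x, speaker, noise):
        """Apply the first selected transform, then the second on the
        intermediate transformed data, as the abstract describes."""
        A1, b1 = SPEAKER_TRANSFORMS[speaker]
        intermediate = apply_affine(A1, b1, x)      # first variability source
        A2, b2 = NOISE_TRANSFORMS[noise]
        return apply_affine(A2, b2, intermediate)   # second, on the intermediate
    ```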
  • Publication number: 20130249783
    Abstract: The invention relates to a method and system for annotating image regions with specific concepts based on multimodal user input. The system (10) comprises an identification unit (11) for the identification of a region of interest on a multidimensional image; an automatic speech recognition unit (12) for recognizing speech input in a natural language; a natural language understanding unit (13) which interprets the speech input in the context of a specific application domain; a fusion unit (14) which combines the multimodal user input from the identification unit (11) and the natural language understanding unit (13); and an annotation unit (15) which annotates the result of the natural language understanding unit (13) on the image regions and optionally provides user feedback about the annotation process. Thus, the system advantageously facilitates a user's task to annotate specific image regions with standardized key concepts based on multimodal speech-based user input.
    Type: Application
    Filed: March 22, 2012
    Publication date: September 26, 2013
    Inventor: Daniel Sonntag
  • Publication number: 20130238326
    Abstract: In an environment including multiple electronic devices that are each capable of being controlled by a user's voice command, an individual device is able to distinguish a voice command intended particularly for the device from among other voice commands that are intended for other devices present in the common environment. The device is able to accomplish this distinction by identifying unique attributes belonging to the device itself from within a user's voice command. Thus, only voice commands that include attribute information supported by the device will be recognized by the device, and other voice commands that include attribute information not supported by the device may be effectively ignored for voice control purposes of the device.
    Type: Application
    Filed: March 8, 2012
    Publication date: September 12, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Yongsin KIM, Dami CHOE, Hyorim PARK
  • Publication number: 20130231932
    Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, as noted above, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
    Type: Application
    Filed: August 20, 2012
    Publication date: September 5, 2013
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
  • Publication number: 20130226557
    Abstract: The present disclosure describes a teleconferencing system that may use a virtual participant processor to translate language content of the teleconference into each participant's spoken language without additional user inputs. The virtual participant processor may connect to the teleconference as do the other participants. All text or audio data previously exchanged between the participants may now be intercepted by the virtual participant processor. Upon obtaining a partial or complete language recognition result or making a language preference determination, the virtual participant processor may call a translation engine appropriate for each of the participants. The virtual participant processor may send the resulting translation to a teleconference management processor. The teleconference management processor may deliver the respective translated text or audio data to the appropriate participant.
    Type: Application
    Filed: April 30, 2012
    Publication date: August 29, 2013
    Applicant: Google Inc.
    Inventors: Jakob David Uszkoreit, Ashish Venugopal, Johan Schalkwyk, Joshua James Estelle
  • Publication number: 20130225240
    Abstract: An electronic device is configured to receive data from a keypad key, wherein the key is associated with first and second alphanumeric characters. The device includes a keypad interface and a data entry processor. The keypad interface is configured to determine the first and second alphanumeric characters when the key is pressed. The data entry processor is configured to select the first alphanumeric character from among the first and second alphanumeric characters when a speech recognizer determines that a spoken entry identifies the first alphanumeric character.
    Type: Application
    Filed: February 29, 2012
    Publication date: August 29, 2013
    Applicant: NVIDIA Corporation
    Inventors: Henry P. Largey, Gabriel Rivera
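The selection rule the abstract describes (a key carries two characters, and the spoken entry picks one) can be sketched as follows; the helper name and the exact-match rule are assumptions for illustration:

```python
def select_character(key_chars, recognized_text):
    """Select the character on a multi-character key that the speech
    recognizer's result identifies; fall back to the key's first
    character when the spoken entry matches neither."""
    spoken = recognized_text.strip().lower()
    for ch in key_chars:
        if spoken == ch.lower():
            return ch
    return key_chars[0]

# A key carrying both '2' and 'A': saying "a" selects the letter.
choice = select_character(("2", "A"), "a")
```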
  • Publication number: 20130195285
    Abstract: A speech from a speaker proximate to one or more microphones within an environment can be received. The microphones can be a directional microphone or an omni-directional microphone. The speech can be processed to produce an utterance, which can be used to determine the identity of the speaker. The identity of the speaker can be associated with a voiceprint. The identity can be associated with a user's credentials of a computing system. The credentials can uniquely identify the user within the computing system. The utterance can be analyzed to establish a zone in which the speaker is present. The zone can be a bounded region within the environment. The zone can be mapped within the environment to determine a location of the speaker. The location can be a relative or an absolute location.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 1, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: STEPHANIE DE LA FUENTE, GREGORY S. JONES, JOHN S. PANNELL
  • Publication number: 20130197906
    Abstract: Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
    Type: Application
    Filed: January 27, 2012
    Publication date: August 1, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Mini Varkey, Bernardo Sana, Victor Boctor, Diego Carlomagno
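The caching flow described above (look up first; normalize and store on a miss; key per culture) might look like this in outline. The normalization rule here (lowercase, strip punctuation) is a stand-in, since the real pipeline is culture-specific:

```python
# Cache keyed by (culture, name) so the same name can normalize
# differently under different cultures.
normalization_cache = {}

def normalize_name(name, culture):
    """Return a speech-grammar-friendly form of `name`, consulting the
    per-culture cache before doing any work."""
    key = (culture, name)
    if key in normalization_cache:
        return normalization_cache[key]          # cache hit: reuse result
    # Cache miss: normalize (stand-in rule) and store for next time.
    normalized = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    normalized = " ".join(normalized.split())
    normalization_cache[key] = normalized
    return normalized
```

Repeated names, which are common in large address books, then cost one dictionary lookup instead of a full normalization pass.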
  • Publication number: 20130179164
    Abstract: A vehicle voice interface system calibration method comprising electronically convolving voice command data with voice impulse response data, electronically convolving audio system output data with feedback impulse response data, and calibrating the vehicle voice interface system. The voice command data is electronically convolved with voice impulse response data representing a voice acoustic signal path between an artificial mouth simulator and a first microphone, to simulate a voice acoustic transfer function pertaining to the passenger compartment. The audio system output data is convolved with feedback impulse response data representing a feedback acoustic signal path between a vehicle audio system output and a second microphone, to simulate a feedback acoustic transfer function pertaining to the passenger compartment. The voice interface system is calibrated to recognize voice commands represented by the voice command data based on the simulated voice and feedback acoustic transfer functions.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 11, 2013
    Applicant: Nissan North America, Inc.
    Inventor: Patrick Dennis
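The two convolution steps in the abstract can be sketched with NumPy; the impulse-response argument names and the simple summing-microphone model are assumptions:

```python
import numpy as np

def simulate_cabin_paths(voice_cmd, audio_out, voice_ir, feedback_ir):
    """Convolve dry voice-command and audio-system output signals with
    measured impulse responses to simulate the two in-cabin acoustic
    transfer functions; the microphone 'hears' the sum of both paths."""
    voice_at_mic = np.convolve(voice_cmd, voice_ir)
    feedback_at_mic = np.convolve(audio_out, feedback_ir)
    # Sum the two simulated paths, padding to the longer result.
    n = max(len(voice_at_mic), len(feedback_at_mic))
    mix = np.zeros(n)
    mix[:len(voice_at_mic)] += voice_at_mic
    mix[:len(feedback_at_mic)] += feedback_at_mic
    return mix
```

With a unit impulse response the path is transparent, which gives a quick sanity check that calibration data passes through unchanged.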
  • Publication number: 20130179162
    Abstract: An inventive system and method for touch free operation of a device is presented. The system can comprise a depth sensor for detecting a movement, motion software to receive the detected movement from the depth sensor, deduce a gesture based on the detected movement, and filter the gesture to accept an applicable gesture, and client software to receive the applicable gesture at a client computer for performing a task in accordance with client logic based on the applicable gesture. The client can be a mapping device and the task can be one of various mapping operations. The system can also comprise hardware for making the detected movement an applicable gesture. The system can also comprise voice recognition providing voice input for enabling the client to perform the task based on the voice input in conjunction with the applicable gesture. The applicable gesture can be a movement authorized using facial recognition.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Applicant: BIOSENSE WEBSTER (ISRAEL), LTD.
    Inventors: Asaf Merschon, Assaf Govari, Andres Claudio Altmann, Yitzhack Schwartz
  • Patent number: 8483540
    Abstract: A method, apparatus and system for synchronizing between two recording modes includes identifying a common event in the two recording modes. The event in time is recognized for a higher accuracy mode of the two modes. The event is predicted in a lower accuracy mode of the two modes by determining a time when the event occurred between frames in the lower accuracy mode. The event in the higher accuracy mode is synchronized to the lower accuracy mode to provide sub-frame accuracy alignment between the two modes. In one embodiment of the invention, the common event includes the closing of a clap slate, and the two modes include audio and video recording modes.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: July 9, 2013
    Assignee: Thomson Licensing
    Inventors: Ingo Doser, Ana Belen Benitez, Dong-Qing Zhang
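Predicting where the event falls between frames of the lower-accuracy mode reduces to locating a fractional frame index. A sketch, under the assumption that both modes share a common time origin:

```python
def subframe_event_time(event_time_hi, frame_rate_lo):
    """Given the event time measured in the higher-accuracy mode (e.g. a
    clap-slate close located in 48 kHz audio) and the lower-accuracy
    frame rate (e.g. 24 fps video), return the frame just before the
    event and the sub-frame offset into the next frame."""
    frame_pos = event_time_hi * frame_rate_lo   # fractional frame index
    before = int(frame_pos)                     # frame just before the event
    fraction = frame_pos - before               # sub-frame offset in [0, 1)
    return before, fraction

# A clap located at 1.05 s in the audio track, against 24 fps video:
before, fraction = subframe_event_time(1.05, 24)
```

The fractional part is what provides the sub-frame alignment the abstract claims: the video alone could only say "between frames 25 and 26".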
  • Publication number: 20130173264
    Abstract: An apparatus for utilizing textual data and acoustic data corresponding to speech data to detect sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including evaluating textual data and acoustic data corresponding to voice data associated with captured speech content. The computer program code may further cause the apparatus to analyze the textual data and the acoustic data to detect whether the textual data or the acoustic data includes one or more words indicating at least one sentiment of a user that spoke the speech content. The computer program code may further cause the apparatus to assign at least one predefined sentiment to at least one of the words in response to detecting that the word(s) indicates the sentiment of the user. Corresponding methods and computer program products are also provided.
    Type: Application
    Filed: January 3, 2012
    Publication date: July 4, 2013
    Applicant: NOKIA CORPORATION
    Inventors: Imre Attila Kiss, Joseph Polifroni, Francois Mairesse, Mark Adler
  • Publication number: 20130173701
    Abstract: A system, method, and computer-readable medium, is described that implements a domain name registration suggestion tool that receives one or more inputs, extracts information from the inputs into a submission string, submits the submission string to a domain name suggestion tool, and receives domain name suggestions based on the submission string. Input types may include images, audio clips, and metadata. The input sources may be processed to extract information related to the image source to build the submission string.
    Type: Application
    Filed: December 30, 2011
    Publication date: July 4, 2013
    Inventors: Neel Goyal, Vincent Raemy, Harshini Ramnath Krishnan
  • Publication number: 20130173268
    Abstract: A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state a name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, name of the registered user, and health tips.
    Type: Application
    Filed: December 29, 2011
    Publication date: July 4, 2013
    Applicant: Robert Bosch GmbH
    Inventors: Fuliang Weng, Taufiq Hasan, Zhe Feng
  • Publication number: 20130173269
    Abstract: An apparatus for generating a review based in part on detected sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including determining a location(s) of the apparatus and a time(s) that the location(s) was determined responsive to capturing voice data of speech content associated with spoken reviews of entities. The computer program code may further cause the apparatus to analyze textual and acoustic data corresponding to the voice data to detect whether the textual or acoustic data includes words indicating a sentiment(s) of a user speaking the speech content. The computer program code may further cause the apparatus to generate a review of an entity corresponding to a spoken review(s) based on assigning a predefined sentiment to a word(s) responsive to detecting that the word indicates the sentiment of the user. Corresponding methods and computer program products are also provided.
    Type: Application
    Filed: January 3, 2012
    Publication date: July 4, 2013
    Applicant: NOKIA CORPORATION
    Inventors: Mark Adler, Imre Attila Kiss, Francois Mairesse, Joseph Polifroni
  • Publication number: 20130158997
    Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), and a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit.
    Type: Application
    Filed: June 6, 2012
    Publication date: June 20, 2013
    Applicant: Spansion LLC
    Inventors: Venkataraman Natarajan, Stephan Rosner
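The per-dimension comparison the APU performs can be illustrated with a diagonal-covariance Gaussian log-likelihood, accumulated dimension by dimension. This is a sketch of the standard computation, not the patented hardware; real senone scores combine many weighted Gaussians, and the max-approximation used below is one common simplification:

```python
import math

def gaussian_log_score(frame, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of one frame vector,
    accumulated one dimension at a time."""
    score = 0.0
    for x, m, v in zip(frame, mean, var):
        score += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return score

def senone_score(frame, gaussians):
    """Score a senone as the best (max) Gaussian log-likelihood among its
    mixture components, a common approximation to the weighted log-sum."""
    return max(gaussian_log_score(frame, m, v) for m, v in gaussians)
```

A hardware scorer like the one described would evaluate the per-dimension terms of the inner loop in parallel rather than sequentially.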
  • Publication number: 20130158977
    Abstract: Systems and methods are provided for detecting and analyzing speech spoken in the vicinity of a user. The detected speech may be analyzed to determine the quality, volume, complexity, language, and other attributes. A value metric may be calculated for the received speech, such as to inform parents of a child's progress related to learning to speak, or to provide feedback to a foreign language learner. A corresponding device may display the number of words, the value metric, or other information about speech received by the device.
    Type: Application
    Filed: June 14, 2011
    Publication date: June 20, 2013
    Inventor: Andrew Senior
  • Publication number: 20130158996
    Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. The apparatus can include a senone scoring unit (SSU) control module, a distance calculator, and an addition module. The SSU control module can be configured to receive a feature vector. The distance calculator can be configured to receive a plurality of Gaussian probability distributions via a data bus having a width of at least one Gaussian probability distribution and the feature vector from the SSU control module. The distance calculator can include a plurality of arithmetic logic units to calculate a plurality of dimension distance scores and an accumulator to sum the dimension distance scores to generate a Gaussian distance score. Further, the addition module is configured to sum a plurality of Gaussian distance scores to generate a senone score.
    Type: Application
    Filed: June 6, 2012
    Publication date: June 20, 2013
    Applicant: Spansion LLC
    Inventors: Richard Fastow, Jens Olson
  • Publication number: 20130144616
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternately, the system can use a task model to compare the current user utterance to the conversation context.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Srinivas BANGALORE
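The accept-or-discard decision based on a perplexity threshold can be sketched with a smoothed unigram model standing in for the n-gram and task models the abstract mentions; the function names and add-one smoothing are assumptions:

```python
import math
from collections import Counter

def unigram_model(context_words, alpha=1.0):
    """Add-alpha smoothed unigram distribution over the conversation context."""
    counts = Counter(context_words)
    vocab = len(counts) + 1          # +1 reserves mass for unseen words
    total = sum(counts.values())
    return lambda w: (counts[w] + alpha) / (total + alpha * vocab)

def perplexity(model, words):
    """Per-word perplexity of an utterance under the context model; low
    perplexity means the utterance fits the conversation so far."""
    log_prob = sum(math.log(model(w)) for w in words)
    return math.exp(-log_prob / len(words))

def accept_utterance(context, utterance, threshold):
    """Incorporate the utterance only if its perplexity against the
    conversation context is below the threshold; otherwise discard it
    as out-of-conversation speech."""
    model = unigram_model(context)
    return perplexity(model, utterance) < threshold
```

An utterance reusing words from the context scores a lower perplexity than one made of unseen words, which is the similarity signal the threshold acts on.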
  • Publication number: 20130144627
    Abstract: A control circuit employed in an electronic device includes a microphone, a level conversion circuit, and a voice processing circuit. The voice processing circuit includes a voice operated switch connected between the microphone and the level conversion circuit. The microphone picks up voice commands; the voice operated switch receives the voice commands from the microphone and outputs a high voltage signal when the volume of the voice commands is greater than or equal to a predetermined volume threshold or falls within a predetermined volume range; and the level conversion circuit converts the high voltage signal into a low voltage signal for turning on the electronic device.
    Type: Application
    Filed: March 9, 2012
    Publication date: June 6, 2013
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD.
    Inventor: JIE LI
  • Publication number: 20130144623
    Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to determine and present speaker-related information based on speaker utterances. In one embodiment, the AEFS receives data that represents an utterance of a speaker received by a hearing device of the user, such as a hearing aid, smart phone, media player/device, or the like. The AEFS identifies the speaker based on the received data, such as by performing speaker recognition. The AEFS determines speaker-related information associated with the identified speaker, such as by determining an identifier (e.g., name or title) of the speaker, by locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs the user of the speaker-related information, such as by presenting the speaker-related information on a display of the hearing device or some other device accessible to the user.
    Type: Application
    Filed: December 13, 2011
    Publication date: June 6, 2013
    Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor