Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20150127345
    Abstract: A computer-implemented method includes listening for audio name information indicative of a name of a computer, with the computer configured to listen for the audio name information in a first power mode that promotes a conservation of power; detecting the audio name information indicative of the name of the computer; after detection of the audio name information, switching to a second power mode that promotes a performance of speech recognition; receiving audio command information; and performing speech recognition on the audio command information.
    Type: Application
    Filed: September 30, 2011
    Publication date: May 7, 2015
    Inventors: Evan H. Parker, Michal R. Grabowski
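The two-mode flow this abstract describes can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, and a real device would run a low-power keyword spotter on raw audio rather than matching text transcripts.

```python
from enum import Enum

class PowerMode(Enum):
    LOW_POWER = 1    # listen only for the device's name (conserves power)
    FULL_POWER = 2   # full speech recognition enabled

class NameActivatedRecognizer:
    """Stay in a low-power mode until the device's name is heard,
    then switch modes and treat subsequent audio as commands."""

    def __init__(self, device_name):
        self.device_name = device_name.lower()
        self.mode = PowerMode.LOW_POWER

    def process_audio(self, transcript):
        if self.mode is PowerMode.LOW_POWER:
            # Only check for the name; no command recognition yet.
            if self.device_name in transcript.lower():
                self.mode = PowerMode.FULL_POWER
            return None
        # In FULL_POWER mode, the audio is interpreted as a command.
        return transcript.strip()
```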
  • Patent number: 8942980
    Abstract: A method of navigating in a sound content wherein at least one key word is stored in association with at least two positions representative of said key word in the sound content, and wherein the method comprises: a step of displaying a representation of the sound content; during playback of the sound content, a step of detecting a current extract representative of a key word stored at a first position; a step of determining at least one second extract representative of said key word and a second position as a function of the stored positions; and a step of highlighting the position of the extracts in the representation of the sound content. The invention also relates to a system adapted to implement the navigation method.
    Type: Grant
    Filed: February 11, 2011
    Date of Patent: January 27, 2015
    Assignee: Orange
    Inventors: Pascal Le Mer, Delphine Charlet, Marc Denjean, Antoine Gonot
  • Patent number: 8930576
    Abstract: The present invention is directed to a secure communication network that enables multi-point to multi-point proxy communication over the network. The network employs a smart server that establishes a secure communication link with each of a plurality of smart client devices deployed on local client networks. Each smart client device is in communication with a plurality of agent devices. A plurality of remote devices can access the smart server directly and communicate with an agent device via the secure communication link between the smart server and one of the smart client devices.
    Type: Grant
    Filed: July 11, 2014
    Date of Patent: January 6, 2015
    Assignee: KE2 Therm Solutions, Inc.
    Inventors: Steve Roberts, Cetin Sert
  • Publication number: 20140372119
    Abstract: In general, the subject matter described in this specification can be embodied in methods, systems, and program products for performing compounded text segmentation. Compounded text that is extracted from one or more search queries submitted to a search engine is received. The compounded text includes a plurality of individual words that are joined together without intervening spaces. An electronic dictionary including words is accessed. A data structure representing possible segmentations of the compounded text is generated based on whether words in the possible segmentations occur in the electronic dictionary. A data store comprising data associated with a same field of usage as the compounded text is accessed to determine a frequency of occurrence for possible segmentations of the data structure. A segmentation of the compounded text that is most probable based on the data is determined. A language model is trained using the determined segmentation of the compounded text.
    Type: Application
    Filed: September 28, 2009
    Publication date: December 18, 2014
    Inventors: Carolina Parada, Boulos Harb, Johan Schalkwyk
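The segmentation step this abstract outlines — enumerate splits of the compounded text that consist of dictionary words, then rank them by frequency in domain data — can be sketched as follows. The dictionary and frequency table here are toy stand-ins for the electronic dictionary and the query-log data store the abstract names, and scoring by a product of counts is one simple choice among several.

```python
def segmentations(compound, dictionary):
    """Enumerate all ways to split `compound` into dictionary words."""
    if not compound:
        return [[]]
    results = []
    for i in range(1, len(compound) + 1):
        head = compound[:i]
        if head in dictionary:
            for rest in segmentations(compound[i:], dictionary):
                results.append([head] + rest)
    return results

def best_segmentation(compound, dictionary, frequency):
    """Pick the segmentation whose words are most frequent in the
    domain data (here: product of per-word counts)."""
    def score(seg):
        s = 1.0
        for w in seg:
            s *= frequency.get(w, 0)
        return s
    candidates = segmentations(compound, dictionary)
    return max(candidates, key=score) if candidates else None
```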
  • Patent number: 8850072
    Abstract: The present invention is directed to a secure communication network that enables multi-point to multi-point proxy communication over the network. The network employs a smart server that establishes a secure communication link with each of a plurality of smart client devices installed on local client networks. Each smart client device is in communication with a plurality of agent devices. A plurality of remote devices can access the smart server directly and communicate with agent devices via the secure communication link between the smart server and one of the smart client devices. This communication is enabled without complex configuration of firewall or network parameters by the user.
    Type: Grant
    Filed: July 25, 2013
    Date of Patent: September 30, 2014
    Assignee: KE2 Therm Solutions, Inc.
    Inventors: Steve Roberts, Cetin Sert
  • Publication number: 20140249813
    Abstract: A transcript interface for displaying a plurality of words of a transcript in a text editor can be provided and configured to receive a command to edit the transcript. Limited edits to data corresponding to the transcript can be made in response to commands received via the user interface module. For example, edits may be limited to selection of a single word in the text editor for editing via a given command. The edit may affect an adjacent word in some instances, such as when two adjacent words are merged. In some embodiments, data corresponding to the selected word of the transcript is changed to reflect the edit without changing data defining the relative timing of those words of the transcript that are not adjacent to the selected word.
    Type: Application
    Filed: December 1, 2008
    Publication date: September 4, 2014
    Applicant: Adobe Systems Incorporated
    Inventor: Steven Hoeg
  • Patent number: 8731609
    Abstract: A mobile device, such as a cellular telephone, includes a voice interface that includes one part that may not be specific to a particular carrier, and a second part that provides an interface to services that are specific to a carrier or to service or information providers that are not necessarily available with all carriers. A voice command interface provides easy access to the carrier services. The set of carrier services is optionally extendible by the carrier.
    Type: Grant
    Filed: August 9, 2011
    Date of Patent: May 20, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel L. Roth, Chris Reiner, Mark Furnari, Jordan Cohen
  • Publication number: 20140129218
    Abstract: Computer-based speech recognition can be improved by recognizing words with an accurate accent model. In order to provide a large number of possible accents, while providing real-time speech recognition, a language tree data structure of possible accents is provided in one embodiment such that a computerized speech recognition system can benefit from choosing among accent categories when searching for an appropriate accent model for speech recognition.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Applicant: Spansion LLC
    Inventors: Chen Liu, Richard Fastow
  • Publication number: 20140129217
    Abstract: Embodiments of the present invention include an apparatus, method, and system for calculating senone scores for multiple concurrent input speech streams. The method can include the following: receiving one or more feature vectors from one or more input streams; accessing the acoustic model one senone at a time; and calculating separate senone scores corresponding to each incoming feature vector. The calculation uses a single read access to the acoustic model for a single senone and calculates a set of separate senone scores for the one or more feature vectors, before proceeding to the next senone in the acoustic model.
    Type: Application
    Filed: November 6, 2012
    Publication date: May 8, 2014
    Inventor: Ojas A. BAPAT
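The access pattern claimed here — read each senone from the acoustic model once, then score every concurrent stream's feature vector against it before moving to the next senone — might look like the sketch below. Each senone is simplified to a single diagonal Gaussian (mean and variance per dimension) rather than a full Gaussian mixture, which is an assumption for illustration.

```python
import math

def score_senones(acoustic_model, feature_vectors):
    """One read access per senone; all streams scored before the
    next senone is fetched. Returns log-likelihoods per senone."""
    scores = {name: [] for name in acoustic_model}
    for name, (means, variances) in acoustic_model.items():  # single read per senone
        for vec in feature_vectors:  # every concurrent stream, same senone
            log_prob = 0.0
            for x, m, v in zip(vec, means, variances):
                log_prob += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
            scores[name].append(log_prob)
    return scores
```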
  • Publication number: 20140122086
    Abstract: Embodiments related to the use of depth imaging to augment speech recognition are disclosed. For example, one disclosed embodiment provides, on a computing device, a method including receiving depth information of a physical space from a depth camera, receiving audio information from one or more microphones, identifying a set of one or more possible spoken words from the audio information, determining a speech input for the computing device based upon comparing the set of one or more possible spoken words from the audio information and the depth information, and taking an action on the computing device based upon the speech input determined.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 1, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Jay Kapur, Ivan Tashev, Mike Seltzer, Stephen Edward Hodges
  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
  • Patent number: 8693977
    Abstract: Techniques for achieving personal security via mobile devices are presented. A portable mobile communication device, such as a phone or a personal digital assistant (PDA), is equipped with geographic positioning capabilities and is equipped with audio and visual devices. A panic mode of operation can be automatically detected in which real time audio and video for an environment surrounding the portable communication device are captured along with a geographic location for the portable communication device. This information is streamed over the Internet to a secure site where it can be viewed in real time and/or later inspected.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 8, 2014
    Assignee: Novell, Inc.
    Inventors: Sandeep Patnaik, Saheednanda Singh, AnilKumar Bolleni
  • Publication number: 20140074472
    Abstract: A voice control system is adapted for controlling an electrical appliance, and includes a host and a portable voice control device. The portable voice control device is capable of wireless communication with the host, and includes an audio pick-up unit for receiving a voice input. One of the host and the portable voice control device includes a voice recognition control module that is configured to recognize a control command from the voice input. The host controls operation of the electrical appliance according to the control command, and transmits an appliance status message to the portable voice control device. The portable voice control device further includes an output unit for outputting the appliance status message.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Chih-Hung Lin, Teh-Jang Chen
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe the word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and be used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust internals of the speech recognition engine to achieve a consistent behavior for all inputs for various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
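The learned mapping from keyword features to Figure of Merit can be illustrated with the simplest possible model: a one-feature least-squares fit. The abstract's actual mapping is a general machine-learning model over multiple word features; this single-feature version is only meant to show the fit-then-predict pattern.

```python
def fit_linear(features, foms):
    """Least-squares fit of FOM against a single scalar feature
    (e.g., a hypothetical keyword-length feature)."""
    n = len(features)
    mx = sum(features) / n
    my = sum(foms) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(features, foms))
    var = sum((x - mx) ** 2 for x in features)
    slope = cov / var
    intercept = my - slope * mx
    return slope, intercept

def predict_fom(model, feature):
    """Predict FOM for a new keyword from its feature value."""
    slope, intercept = model
    return slope * feature + intercept
```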
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Patent number: 8655660
    Abstract: The present invention is a system and method for generating a personal voice font including monitoring voice segments automatically from phone conversations of a user by a voice learning processor to generate a personalized voice font and delivering the personalized voice font (PVF) to a server.
    Type: Grant
    Filed: February 10, 2009
    Date of Patent: February 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Zsolt Szalai, Philippe Bazot, Bernard Pucci, Joel Vitale
  • Publication number: 20140039891
    Abstract: Systems and methods for audio editing are provided. In one implementation, a computer-implemented method is provided. The method includes receiving digital audio data including a plurality of distinct vocal components. Each distinct vocal component is automatically identified using one or more attributes that uniquely identify each distinct vocal component. The audio data is separated into two or more individual tracks where each individual track comprises audio data corresponding to one distinct vocal component. The separated individual tracks are then made available for further processing.
    Type: Application
    Filed: October 16, 2007
    Publication date: February 6, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Nariman Sodeifi, David E. Johnston
  • Publication number: 20140039881
    Abstract: The instant application includes computationally-implemented systems and methods that include managing adaptation data, the adaptation data is at least partly based on at least one speech interaction of a particular party, facilitating transmission of the adaptation data to a target device when there is an indication of a speech-facilitated transaction between the target device and the particular party, such that the adaptation data is to be applied to the target device to assist in execution of the speech-facilitated transaction, and facilitating acquisition of adaptation result data that is based on at least one aspect of the speech-facilitated transaction and to be used in determining whether to modify the adaptation data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 6, 2014
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud
  • Publication number: 20140039885
    Abstract: Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: David E. Reich, Christopher Hardy
  • Publication number: 20140029733
    Abstract: A speech server and methods provide audio stream analysis for tone detection in addition to speech recognition to implement an accurate and efficient answering machine detection strategy. By performing both tone detection and speech recognition in a single component, such as the speech server, the number of components for digital signal processing may be reduced. The speech server communicates tone events detected at the telephony level and enables voice applications to detect tone events consistently and provide consistent support and accuracy of both inbound and outbound voice applications independent of the hardware or geographical location of the telephony network. In addition, an improved opportunity for signaling of an appropriate moment for an application to leave a message is provided, thereby supporting automation.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Kenneth W.D. Smith, Jaques de Broin
  • Publication number: 20140025377
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: August 10, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fernando Luiz Koch, Julio Nogima
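The winnowing step — reweighting acoustically ambiguous candidates by how often each phrase occurs in the user's context — can be sketched as follows. The context-model structure (a per-context phrase-count table) and the multiplicative combination are assumptions for illustration.

```python
def resolve_candidates(candidates, context_model, context):
    """When ASR returns several textual candidates for one utterance,
    weight each acoustic likelihood by the phrase's frequency in the
    user's context and keep the best-scoring candidate."""
    def score(cand):
        text, acoustic_likelihood = cand
        context_weight = context_model.get(context, {}).get(text, 1)
        return acoustic_likelihood * context_weight
    best = max(candidates, key=score)
    return best[0]
```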
  • Publication number: 20140012582
    Abstract: In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
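The iterative-replacement check described above can be sketched like this, with a toy bigram scorer standing in for the language model. The function names, the score-difference threshold, and the alert format are all assumptions; the point is the pattern of swapping each confusable word for its set-mates and re-scoring.

```python
def flag_confusions(words, confusable_sets, lm_score, threshold):
    """For each word in a confusable set, substitute every alternative
    and score the resulting word string with the language model. If an
    alternative is markedly more likely, flag a potential misrecognition."""
    alerts = []
    base = lm_score(words)
    for i, w in enumerate(words):
        for group in confusable_sets:
            if w in group:
                for alt in group:
                    if alt == w:
                        continue
                    candidate = words[:i] + [alt] + words[i + 1:]
                    if lm_score(candidate) - base > threshold:
                        alerts.append((i, w, alt))
    return alerts
```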
  • Publication number: 20140012579
    Abstract: In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Publication number: 20140006025
    Abstract: This disclosure includes, for example, methods and computer systems for providing audio-activated resource access for user devices. The computer systems may store instructions to cause the processor to perform operations, comprising capturing audio at a user device. The operations may also comprise using a speaker recognition system to identify a speaker in the transmitted audio and/or using a speech-to-text converter to identify text in the captured audio. The speaker identity or a condensed version of the speaker identity or other metadata along with the speaker identity may be transmitted to a server system to determine a corresponding speaker identity entry. The operations may also comprise receiving a resource corresponding to the identified speaker entry in the server system.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Harshini Ramnath Krishnan, Andrew Fregly
  • Publication number: 20130346066
    Abstract: Joint decoding of words and tags may be provided. Upon receiving an input from a user comprising a plurality of elements, the input may be decoded into a word lattice comprising a plurality of words. A tag may be assigned to each of the plurality of words and a most-likely sequence of word-tag pairs may be identified. The most-likely sequence of word-tag pairs may be evaluated to identify an action request from the user.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Anoop Kiran Deoras, Dilek Zeynep Hakkani-Tur, Ruhi Sarikaya, Gokhan Tur
  • Publication number: 20130339027
    Abstract: A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
    Type: Application
    Filed: June 15, 2012
    Publication date: December 19, 2013
    Inventors: Tarek El Dokor, James Holmes, Jordan Cluster, Stuart Yamamoto, Pedram Vaghefinazari
  • Publication number: 20130339021
    Abstract: Techniques, an apparatus, and an article of manufacture for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties, computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation, and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance which is most likely to carry intent of the speaker.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Sachindra Joshi, Saurabh Saket, Ashish Verma
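The per-word scoring described in this abstract — subsequent occurrences of a word scaled by its rarity in the conversation — can be sketched as follows. The exact weighting in the patent may differ; treating rarity as the inverse of the word's total count is an assumption.

```python
def rank_utterances(utterances):
    """Score each utterance by summing per-word intent scores: a word
    scores higher when it recurs in later utterances (ii), scaled by
    its overall rarity in the conversation (i). Returns utterances
    ranked from highest to lowest intent confidence."""
    all_words = [w for u in utterances for w in u.split()]
    total = {w: all_words.count(w) for w in set(all_words)}
    scores = []
    for idx, utt in enumerate(utterances):
        later = [w for u in utterances[idx + 1:] for w in u.split()]
        s = 0.0
        for w in utt.split():
            subsequent = later.count(w)   # times the word recurs later
            uniqueness = 1.0 / total[w]   # rarer words weigh more
            s += subsequent * uniqueness
        scores.append((s, utt))
    scores.sort(key=lambda p: -p[0])
    return [u for _, u in scores]
```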
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
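The per-service confidence gate this abstract describes reduces to a small check once a speaker-verification score is available. In this sketch, `confidence_fn` is a hypothetical stand-in for the comparison of the voice sample against the stored, speaker-specific voice model; defaulting unknown services to the strictest threshold is an assumption.

```python
def authorize(confidence_fn, voice_sample, service_request, min_scores):
    """Score the sample against the stored voice model, look up the
    minimum confidence the requested service demands, and allow the
    request only if the score clears that bar."""
    score = confidence_fn(voice_sample)
    required = min_scores.get(service_request, 1.0)  # unknown: strictest
    return score >= required
```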
  • Publication number: 20130332147
    Abstract: The technology of the present application provides a method and apparatus to allow for dynamically updating a language model across a large number of similarly situated users. The system identifies individual changes to user profiles and evaluates each change for broader application, such as a dialect correction for a speech recognition engine. An administrator for the system identifies similarly situated user profiles and downloads the profile change to effect a dynamic change to the language model of similarly situated users.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventor: Charles Corfield
  • Publication number: 20130325459
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130317820
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
  • Publication number: 20130317823
    Abstract: Systems, methods, and computer-readable media that may be used to modify a voice action system to include voice actions provided by advertisers or users are provided. One method includes receiving electronic voice action bids from advertisers to modify the voice action system to include a specific voice action (e.g., a triggering phrase and an action). One or more bids may be selected. The method includes, for each of the selected bids, modifying data associated with the voice action system to include the voice action associated with the bid, such that the action associated with the respective voice action is performed when voice input from a user is received that the voice action system determines to correspond to the triggering phrase associated with the respective voice action.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 28, 2013
    Inventor: Pedro J. Moreno Mengibar
  • Publication number: 20130304468
    Abstract: A method for contextual voice query dilation in a Spoken Web search includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is also provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: August 8, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
  • Publication number: 20130297316
    Abstract: A method, system, and computer program product for voice entry of information are provided in the illustrative embodiments. A conversion rule is applied to a voice input. An entry field input is generated, wherein the conversion rule allows the voice input to be distinct from the entry field input, and wherein the voice input obfuscates the entry field input. The entry field input is provided to an application, wherein the entry field input is usable to populate a data entry field in the application.
    Type: Application
    Filed: May 3, 2012
    Publication date: November 7, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian John Cragun, Marc Kevin Johlic
  • Publication number: 20130297306
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: QNX Software Systems Limited
    Inventors: Phillip Alan Hetherington, Xueman Li
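The curve-blending step this abstract describes can be illustrated with a small sketch: interpolate between two predetermined long-term average speech curves using the intelligibility measurement as the weight, then nudge per-band equalization gains toward the blended target. The band values, step size, and normalization are invented for the example, not taken from the patent.

```python
# Illustrative sketch of weighted-curve adaptive equalization.

def weighted_speech_curve(curve_a, curve_b, intelligibility):
    """Interpolate between two long-term average speech curves (dB per
    band); `intelligibility` is assumed normalized to [0, 1]."""
    w = max(0.0, min(1.0, intelligibility))
    return [w * a + (1.0 - w) * b for a, b in zip(curve_a, curve_b)]

def adapt_eq_coefficients(measured_spectrum, target_curve, step=0.1):
    """Move per-band EQ gains a small step toward the target curve."""
    return [step * (t - m) for m, t in zip(measured_spectrum, target_curve)]

target = weighted_speech_curve([0.0, -3.0, -6.0], [2.0, 0.0, -2.0], 0.5)
gains = adapt_eq_coefficients([1.0, -2.0, -3.0], target)
```

The small `step` keeps the equalizer from reacting abruptly to momentary swings in the intelligibility measurement.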
  • Publication number: 20130282371
    Abstract: A method is disclosed herein for recognizing a repeated utterance in a mobile computing device via a processor. A first utterance is detected being spoken into a first mobile computing device. Likewise, a second utterance is detected being spoken into a second mobile computing device within a predetermined time period. The second utterance substantially matches the first spoken utterance and the first and second mobile computing devices are communicatively coupled to each other. The processor enables capturing, at least temporarily, a matching utterance for performing a subsequent processing function. The performed subsequent processing function is based on a type of captured utterance.
    Type: Application
    Filed: April 20, 2012
    Publication date: October 24, 2013
    Applicant: Motorola Mobility, Inc.
    Inventors: Rachid M Alameh, Jiri Slaby, Hisashi D. Watanabe
  • Publication number: 20130275136
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment. The method can further include assigning probability to the weighted grammar based on historical user responses and activating each weighted grammar is based on the assigned probability. Weighted grammars can be generated based on a user profile. A weighted grammar can be generated for two or more segments.
    Type: Application
    Filed: April 13, 2012
    Publication date: October 17, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Michael Czahor
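The per-segment grammar activation this abstract describes can be sketched as follows: weight each segment's grammar from historical user responses, then exclusively activate the current segment's grammar during recognition. The segment names, phrases, and counts below are placeholders for illustration.

```python
# Minimal sketch of segment-scoped weighted grammars.

def build_segment_grammars(segments, historical_responses):
    """For each dialog-turn segment, weight candidate phrases by how
    often users historically responded with them in that segment."""
    grammars = {}
    for seg, phrases in segments.items():
        counts = historical_responses.get(seg, {})
        total = sum(counts.values()) or 1
        grammars[seg] = {p: counts.get(p, 0) / total for p in phrases}
    return grammars

def recognize(utterance, grammars, current_segment):
    """Exclusively activate the current segment's grammar; all other
    grammars are ignored. Return the matching in-grammar phrase."""
    active = grammars[current_segment]
    candidates = [p for p in active if p == utterance]
    return max(candidates, key=active.get) if candidates else None

segments = {"greeting": ["hello", "hi"], "menu": ["billing", "support"]}
history = {"greeting": {"hello": 8, "hi": 2}}
g = build_segment_grammars(segments, history)
```

Because only one grammar is active at a time, an utterance valid in another segment (e.g. "billing" during the greeting) is simply not recognized, which is the accuracy mechanism the abstract describes.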
  • Publication number: 20130268270
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
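The candidate-selection step this abstract describes can be sketched as a simple filter over per-user performance data: users with enough speech data and sufficiently poor accuracy are the ones for whom adaptation is likely to pay off. The thresholds and data shape below are invented for illustration.

```python
# Hedged sketch of identifying speaker-adaptation candidates.

def adaptation_candidates(performance, min_utterances=50, max_accuracy=0.85):
    """Return users with enough recorded utterances and low enough
    recognition accuracy that an adaptation pass is likely to help.
    `performance` maps user -> (utterance_count, accuracy)."""
    return [user for user, (utterances, accuracy) in performance.items()
            if utterances >= min_utterances and accuracy < max_accuracy]

perf = {"alice": (120, 0.78), "bob": (30, 0.60), "carol": (200, 0.95)}
```

A discriminative criterion, as the abstract names, would replace the raw accuracy threshold with a measure of how separable the user's errors are, but the filtering structure is the same.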
  • Publication number: 20130249783
    Abstract: The invention relates to a method and system for annotating image regions with specific concepts based on multimodal user input. The system (10) comprises an identification unit (11) for the identification of a region of interest on a multidimensional image; an automatic speech recognition unit (12) for recognizing speech input in a natural language; a natural language understanding unit (13) which interprets the speech input in the context of a specific application domain; a fusion unit (14) which combines the multimodal user input from the identification unit (11) and the natural language understanding unit (13); and an annotation unit (15) which annotates the result of the natural language understanding unit (13) on the image regions and optionally provides user feedback about the annotation process. Thus, the system advantageously facilitates a user's task of annotating specific image regions with standardized key concepts based on multimodal speech-based user input.
    Type: Application
    Filed: March 22, 2012
    Publication date: September 26, 2013
    Inventor: Daniel Sonntag
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael Lewis Seltzer, Alejandro Acero
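The cascaded-transform idea this abstract describes can be sketched directly: select one transform from each set based on the value of its variability source (e.g. speaker and environment), then apply them in sequence to the feature data. The matrices and variability keys below are toy values for illustration only.

```python
# Illustrative cascade of two per-variability-source linear transforms.

def select_transform(transforms, variability_value):
    """Pick the transform trained for the nearest variability value."""
    key = min(transforms, key=lambda v: abs(v - variability_value))
    return transforms[key]

def apply_transform(matrix, features):
    """Apply a linear transform (matrix) to a feature vector."""
    return [sum(m * f for m, f in zip(row, features)) for row in matrix]

# First set compensates variability source 1, second set source 2.
set1 = {0.0: [[1.0, 0.0], [0.0, 1.0]], 1.0: [[2.0, 0.0], [0.0, 2.0]]}
set2 = {0.0: [[1.0, 0.0], [0.0, 0.5]]}

t1 = select_transform(set1, 0.9)                 # nearest key is 1.0
t2 = select_transform(set2, 0.1)
intermediate = apply_transform(t1, [1.0, 2.0])   # first compensation
transformed = apply_transform(t2, intermediate)  # second compensation
```

Because each set compensates one source independently, new combinations of speaker and environment can be handled without training a transform for every pair.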
  • Publication number: 20130238326
    Abstract: In an environment including multiple electronic devices that are each capable of being controlled by a user's voice command, an individual device is able to distinguish a voice command intended particularly for the device from among other voice commands that are intended for other devices present in the common environment. The device is able to accomplish this distinction by identifying unique attributes belonging to the device itself from within a user's voice command. Thus only voice commands that include attribute information that are supported by the device will be recognized by the device, and other voice commands that include attribute information that are not supported by the device may be effectively ignored for voice control purposes of the device.
    Type: Application
    Filed: March 8, 2012
    Publication date: September 12, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Yongsin KIM, Dami CHOE, Hyorim PARK
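The attribute-matching behavior this abstract describes reduces to a simple membership test: a device acts on a voice command only if the command names an attribute the device owns, and silently ignores commands whose attributes belong to other devices. The attribute sets below are invented for the example.

```python
# Minimal sketch of attribute-based voice command filtering.

class VoiceControlledDevice:
    def __init__(self, attributes):
        self.attributes = set(attributes)   # attributes unique to this device

    def handles(self, command_words):
        """Accept the command only if it names one of this device's
        attributes; otherwise ignore it for voice-control purposes."""
        return bool(self.attributes & set(command_words))

tv = VoiceControlledDevice({"tv", "screen", "volume"})
fridge = VoiceControlledDevice({"fridge", "temperature"})
command = "turn up the tv volume".split()
```

With every device applying the same test to the shared audio, only the intended device responds, without any central coordinator.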
  • Publication number: 20130231932
    Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, as noted above, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
    Type: Application
    Filed: August 20, 2012
    Publication date: September 5, 2013
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
  • Publication number: 20130226557
    Abstract: The present disclosure describes a teleconferencing system that may use a virtual participant processor to translate language content of the teleconference into each participant's spoken language without additional user inputs. The virtual participant processor may connect to the teleconference as do the other participants. Text or audio data that was previously exchanged directly between the participants may now be intercepted by the virtual participant processor. Upon obtaining a partial or complete language recognition result or making a language preference determination, the virtual participant processor may call a translation engine appropriate for each of the participants. The virtual participant processor may send the resulting translation to a teleconference management processor. The teleconference management processor may deliver the respective translated text or audio data to the appropriate participant.
    Type: Application
    Filed: April 30, 2012
    Publication date: August 29, 2013
    Applicant: Google Inc.
    Inventors: Jakob David Uszkoreit, Ashish Venugopal, Johan Schalkwyk, Joshua James Estelle
  • Publication number: 20130225240
    Abstract: An electronic device is configured to receive data from a keypad key, wherein the key is associated with first and second alphanumeric characters. The device includes a keypad interface and a data entry processor. The keypad interface is configured to determine the first and second alphanumeric characters when the key is pressed. The data entry processor is configured to select the first alphanumeric character from among the first and second alphanumeric characters when a speech recognizer determines that a spoken entry identifies the first alphanumeric character.
    Type: Application
    Filed: February 29, 2012
    Publication date: August 29, 2013
    Applicant: NVIDIA Corporation
    Inventors: Henry P. Largey, Gabriel Rivera
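The disambiguation step this abstract describes can be sketched with a T9-style keypad: a pressed key maps to several characters, and the speech recognizer's output selects among them. The keypad mapping and fallback rule below are assumptions for the example, not details from the patent.

```python
# Sketch of resolving a multi-character key press with a spoken entry.

KEYPAD = {"2": "abc", "3": "def"}   # each key maps to several characters

def resolve_keypress(key, spoken_entry):
    """Select the character on `key` that the speech recognizer heard;
    fall back to the key's first character if speech doesn't match."""
    candidates = KEYPAD[key]
    for ch in candidates:
        if spoken_entry.lower() == ch:
            return ch
    return candidates[0]
```

Constraining the recognizer to only the characters on the pressed key keeps the speech task trivial even in noise, which is the point of combining the two modalities.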
  • Publication number: 20130195285
    Abstract: A speech from a speaker proximate to one or more microphones within an environment can be received. The microphones can be a directional microphone or an omni-directional microphone. The speech can be processed to produce an utterance to determine the identity of the speaker. The identity of the speaker can be associated with a voiceprint. The identity can be associated with a user's credentials of a computing system. The credentials can uniquely identify the user within the computing system. The utterance can be analyzed to establish a zone in which the speaker is present. The zone can be a bounded region within the environment. The zone can be mapped within the environment to determine a location of the speaker. The location can be a relative or an absolute location.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 1, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: STEPHANIE DE LA FUENTE, GREGORY S. JONES, JOHN S. PANNELL
  • Publication number: 20130197906
    Abstract: Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
    Type: Application
    Filed: January 27, 2012
    Publication date: August 1, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Mini Varkey, Bernardo Sana, Victor Boctor, Diego Carlomagno
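The caching pattern this abstract describes is a straightforward check-then-compute memoization keyed per culture. The sketch below uses a placeholder normalization rule; the real per-culture normalization logic is what the cache is meant to avoid re-running.

```python
# Sketch of per-culture name-normalization caching.

normalization_cache = {}   # (culture, name) -> normalized form

def normalize_name(name, culture):
    """Return the cached normalization when present; otherwise
    normalize the name, cache the result per culture, and return it."""
    key = (culture, name)
    if key in normalization_cache:
        return normalization_cache[key]
    normalized = name.strip().title()     # placeholder normalization
    normalization_cache[key] = normalized
    return normalized
```

Keying on the culture as well as the name matters because the same string can normalize differently under different cultures' rules.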
  • Publication number: 20130179164
    Abstract: A vehicle voice interface system calibration method comprising electronically convolving voice command data with voice impulse response data, electronically convolving audio system output data with feedback impulse response data, and calibrating the vehicle voice interface system. The voice command data is electronically convolved with voice impulse response data representing a voice acoustic signal path between an artificial mouth simulator and a first microphone, to simulate a voice acoustic transfer function pertaining to the passenger compartment. The audio system output data is convolved with feedback impulse response data representing a feedback acoustic signal path between a vehicle audio system output and a second microphone, to simulate a feedback acoustic transfer function pertaining to the passenger compartment. The voice interface system is calibrated to recognize voice commands represented by the voice command data based on the simulated voice and feedback acoustic transfer functions.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 11, 2013
    Applicant: Nissan North America, Inc.
    Inventor: Patrick Dennis
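The convolution step at the heart of this abstract, passing signal data through a measured acoustic path, can be illustrated with a direct-form discrete convolution. The sample values below are toy numbers; a real calibration would use measured cabin impulse responses, typically via FFT-based convolution for speed.

```python
# Toy sketch of simulating an acoustic transfer path by convolving
# command audio with an impulse response.

def convolve(signal, impulse_response):
    """Direct-form discrete convolution; the output has
    len(signal) + len(impulse_response) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

simulated = convolve([1.0, 0.5], [1.0, 0.0, 0.25])
```

Running both the voice path and the audio-system feedback path through their respective impulse responses, as the abstract describes, lets the recognizer be calibrated against realistic in-cabin signals without a physical vehicle test for every condition.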
  • Publication number: 20130179162
    Abstract: An inventive system and method for touch-free operation of a device is presented. The system can comprise a depth sensor for detecting a movement, motion software to receive the detected movement from the depth sensor, deduce a gesture based on the detected movement, and filter the gesture to accept an applicable gesture, and client software to receive the applicable gesture at a client computer for performing a task in accordance with client logic based on the applicable gesture. The client can be a mapping device and the task can be one of various mapping operations. The system can also comprise hardware for making the detected movement an applicable gesture. The system can also comprise voice recognition providing voice input for enabling the client to perform the task based on the voice input in conjunction with the applicable gesture. The applicable gesture can be a movement authorized using facial recognition.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Applicant: BIOSENSE WEBSTER (ISRAEL), LTD.
    Inventors: Asaf Merschon, Assaf Govari, Andres Claudio Altmann, Yitzhack Schwartz
  • Patent number: 8483540
    Abstract: A method, apparatus and system for synchronizing between two recording modes includes identifying a common event in the two recording modes. The event in time is recognized for a higher accuracy mode of the two modes. The event is predicted in a lower accuracy mode of the two modes by determining a time when the event occurred between frames in the lower accuracy mode. The event in the higher accuracy mode is synchronized to the lower accuracy mode to provide sub-frame accuracy alignment between the two modes. In one embodiment of the invention, the common event includes the closing of a clap slate, and the two modes include audio and video recording modes.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: July 9, 2013
    Assignee: Thomson Licensing
    Inventors: Ingo Doser, Ana Belen Benitez, Dong-Qing Zhang