Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20140100848
    Abstract: Methods and systems for identifying specified phrases within audio streams are provided. More particularly, a phrase is specified. An audio stream is then monitored for the phrase. In response to determining that the audio stream contains the phrase, verification from a user that the phrase was in fact included in the audio stream is requested. If such verification is received, the portion of the audio stream including the phrase is recorded. The recorded phrase can then be applied to identify future instances of the phrase in monitored audio streams.
    Type: Application
    Filed: October 5, 2012
    Publication date: April 10, 2014
    Applicant: AVAYA INC.
    Inventors: Shmuel Shaffer, Keith Ponting, Valentine C. Matula
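    The monitor-verify-record workflow in the abstract above can be sketched as follows. This is a minimal illustration only: the audio stream is reduced to decoded text, and the `verify` callback stands in for the user-confirmation step the patent describes.

    ```python
    def monitor(stream_text, phrase, verify):
        """Scan a (text-decoded) stream for a specified phrase.

        verify: callback returning True if the user confirms the detection;
        only confirmed detections are recorded for future matching.
        """
        recorded = []
        idx = stream_text.find(phrase)
        if idx != -1 and verify(phrase):
            # Record the portion of the stream containing the phrase.
            recorded.append(stream_text[idx:idx + len(phrase)])
        return recorded
    ```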
  • Patent number: 8693977
    Abstract: Techniques for achieving personal security via mobile devices are presented. A portable mobile communication device, such as a phone or a personal digital assistant (PDA), is equipped with geographic positioning capabilities and is equipped with audio and visual devices. A panic mode of operation can be automatically detected in which real time audio and video for an environment surrounding the portable communication device are captured along with a geographic location for the portable communication device. This information is streamed over the Internet to a secure site where it can be viewed in real time and/or later inspected.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: April 8, 2014
    Assignee: Novell, Inc.
    Inventors: Sandeep Patnaik, Saheednanda Singh, AnilKumar Bolleni
  • Publication number: 20140074468
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
  • Publication number: 20140074472
    Abstract: A voice control system is adapted for controlling an electrical appliance, and includes a host and a portable voice control device. The portable voice control device is capable of wireless communication with the host, and includes an audio pick-up unit for receiving a voice input. One of the host and the portable voice control device includes a voice recognition control module that is configured to recognize a control command from the voice input. The host controls operation of the electrical appliance according to the control command, and transmits an appliance status message to the portable voice control device. The portable voice control device further includes an output unit for outputting the appliance status message.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Inventors: Chih-Hung Lin, Teh-Jang Chen
  • Publication number: 20140067391
    Abstract: A system and method are presented for predicting speech recognition performance using accuracy scores in speech recognition systems within the speech analytics field. A keyword set is selected. Figure of Merit (FOM) is computed for the keyword set. Relevant features that describe the word individually and in relation to other words in the language are computed. A mapping from these features to FOM is learned. This mapping can be generalized via a suitable machine learning algorithm and be used to predict FOM for a new keyword. In at least one embodiment, the predicted FOM may be used to adjust the internals of a speech recognition engine to achieve a consistent behavior for all inputs for various settings of confidence values.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERACTIVE INTELLIGENCE, INC.
    Inventors: Aravind Ganapathiraju, Yingyi Tan, Felix Immanuel Wyss, Scott Allen Randal
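    The feature-to-FOM mapping in the abstract above might look like the following toy sketch. The features and the nearest-neighbour regressor are invented stand-ins for the patent's "relevant features" and "suitable machine learning algorithm", not the actual implementation.

    ```python
    def keyword_features(word, vocabulary):
        """Features describing the word individually and relative to the language."""
        return [
            len(word),                                        # longer words are easier to spot
            sum(ch in "aeiou" for ch in word) / len(word),    # vowel ratio (voicing proxy)
            sum(w.startswith(word[:3]) for w in vocabulary),  # confusable-prefix count
        ]

    def predict_fom(word, training, vocabulary):
        """Predict FOM for a new keyword as the FOM of its nearest
        training keyword in feature space."""
        fx = keyword_features(word, vocabulary)

        def dist(kw):
            feat = keyword_features(kw, vocabulary)
            return sum((a - b) ** 2 for a, b in zip(fx, feat))

        return training[min(training, key=dist)]
    ```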
  • Publication number: 20140067392
    Abstract: A method of providing hands-free services using a mobile device having wireless access to computer-based services includes receiving speech in a vehicle from a vehicle occupant; recording the speech using a mobile device; transmitting the recorded speech from the mobile device to a cloud speech service; receiving automatic speech recognition (ASR) results from the cloud speech service at the mobile device; and comparing the recorded speech with the received ASR results at the mobile device to identify one or more error conditions.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventors: Denis R. Burke, Danilo Gurovich, Daniel E. Rudman, Keith A. Fry, Shane M. McCutchen, Marco T. Carnevale, Mukesh Gupta
  • Patent number: 8655660
    Abstract: The present invention is a system and method for generating a personal voice font, including monitoring voice segments automatically from phone conversations of a user by a voice learning processor to generate a personalized voice font and delivering the personalized voice font (PVF) to a server.
    Type: Grant
    Filed: February 10, 2009
    Date of Patent: February 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Zsolt Szalai, Philippe Bazot, Bernard Pucci, Joel Vitale
  • Publication number: 20140039885
    Abstract: Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: David E. Reich, Christopher Hardy
  • Publication number: 20140039891
    Abstract: Systems and methods for audio editing are provided. In one implementation, a computer-implemented method is provided. The method includes receiving digital audio data including a plurality of distinct vocal components. Each distinct vocal component is automatically identified using one or more attributes that uniquely identify each distinct vocal component. The audio data is separated into two or more individual tracks where each individual track comprises audio data corresponding to one distinct vocal component. The separated individual tracks are then made available for further processing.
    Type: Application
    Filed: October 16, 2007
    Publication date: February 6, 2014
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Nariman Sodeifi, David E. Johnston
  • Publication number: 20140039881
    Abstract: The instant application includes computationally-implemented systems and methods that include managing adaptation data, the adaptation data is at least partly based on at least one speech interaction of a particular party, facilitating transmission of the adaptation data to a target device when there is an indication of a speech-facilitated transaction between the target device and the particular party, such that the adaptation data is to be applied to the target device to assist in execution of the speech-facilitated transaction, and facilitating acquisition of adaptation result data that is based on at least one aspect of the speech-facilitated transaction and to be used in determining whether to modify the adaptation data. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 6, 2014
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud
  • Publication number: 20140029733
    Abstract: A speech server and methods provide audio stream analysis for tone detection in addition to speech recognition to implement an accurate and efficient answering machine detection strategy. By performing both tone detection and speech recognition in a single component, such as the speech server, the number of components for digital signal processing may be reduced. The speech server communicates tone events detected at the telephony level and enables voice applications to detect tone events consistently and provide consistent support and accuracy of both inbound and outbound voice applications independent of the hardware or geographical location of the telephony network. In addition, an improved opportunity for signaling of an appropriate moment for an application to leave a message is provided, thereby supporting automation.
    Type: Application
    Filed: July 26, 2012
    Publication date: January 30, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Kenneth W.D. Smith, Jaques de Broin
  • Publication number: 20140025377
    Abstract: A speech recognition system, method of recognizing speech and a computer program product therefor. A client device identified with a context for an associated user selectively streams audio to a provider computer, e.g., a cloud computer. Speech recognition receives streaming audio, maps utterances to specific textual candidates and determines a likelihood of a correct match for each mapped textual candidate. A context model selectively winnows candidates to resolve recognition ambiguity according to context whenever multiple textual candidates are recognized as potential matches for the same mapped utterance. Matches are used to update the context model, which may be used for multiple users in the same context.
    Type: Application
    Filed: August 10, 2012
    Publication date: January 23, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fernando Luiz Koch, Julio Nogima
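    The winnow-and-update loop in the abstract above can be sketched as below. Representing the context model as a simple frequency counter is an assumption for illustration; the patent does not specify the model's form.

    ```python
    from collections import Counter

    class ContextModel:
        """Toy context model: prefers candidates seen before in this context."""

        def __init__(self):
            self.counts = Counter()

        def winnow(self, candidates):
            """Resolve recognition ambiguity: keep the candidate most
            frequent in this context (unseen candidates count as zero)."""
            return max(candidates, key=lambda c: self.counts[c])

        def update(self, match):
            """Feed the accepted match back into the context model."""
            self.counts[match] += 1
    ```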
  • Publication number: 20140012579
    Abstract: In some embodiments, recognition results produced by a speech processing system (which may include two or more recognition results, including a top recognition result and one or more alternative recognition results) based on an analysis of a speech input, are evaluated for indications of potential errors. In some embodiments, the indications of potential errors may include discrepancies between recognition results that are meaningful for a domain, such as medically-meaningful discrepancies. The evaluation of the recognition results may be carried out using any suitable criteria, including one or more criteria that differ from criteria used by an ASR system in determining the top recognition result and the alternative recognition results from the speech input. In some embodiments, a recognition result may additionally or alternatively be processed to determine whether the recognition result includes a word or phrase that is unlikely to appear in a domain to which speech input relates.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
  • Publication number: 20140012582
    Abstract: In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 9, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Raghu Vemula, Robert Fleming
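    The replace-and-rescore check in the abstract above can be rendered as a small sketch. The confusable sets, the bigram "language model", and the alert margin are all toy stand-ins for illustration.

    ```python
    CONFUSABLE_SETS = [{"hypertension", "hypotension"}, {"fifteen", "fifty"}]

    # Toy bigram language model; unseen bigrams get a floor penalty.
    BIGRAM_LOGPROB = {
        ("patient", "has"): -0.1,
        ("has", "hypertension"): -0.5,
        ("has", "hypotension"): -1.2,
    }

    def score(words):
        """Sum of bigram log-probabilities for a string of words."""
        return sum(BIGRAM_LOGPROB.get(pair, -5.0) for pair in zip(words, words[1:]))

    def alerts(result, margin=2.0):
        """Swap each confusable word for its alternatives; flag the pair if
        the alternative string is nearly as likely under the language model."""
        words = result.split()
        flagged = []
        for i, w in enumerate(words):
            for cset in CONFUSABLE_SETS:
                if w in cset:
                    base = score(words)
                    for alt in cset - {w}:
                        candidate = words[:i] + [alt] + words[i + 1:]
                        if score(candidate) >= base - margin:
                            flagged.append((w, alt))
        return flagged
    ```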
  • Publication number: 20140006025
    Abstract: This disclosure includes, for example, methods and computer systems for providing audio-activated resource access for user devices. The computer systems may store instructions to cause the processor to perform operations, comprising capturing audio at a user device. The operations may also comprise using a speaker recognition system to identify a speaker in the transmitted audio and/or using a speech-to-text converter to identify text in the captured audio. The speaker identity or a condensed version of the speaker identity or other metadata along with the speaker identity may be transmitted to a server system to determine a corresponding speaker identity entry. The operations may also comprise receiving a resource corresponding to the identified speaker entry in the server system.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Harshini Ramnath Krishnan, Andrew Fregly
  • Publication number: 20130346066
    Abstract: Joint decoding of words and tags may be provided. Upon receiving an input from a user comprising a plurality of elements, the input may be decoded into a word lattice comprising a plurality of words. A tag may be assigned to each of the plurality of words and a most-likely sequence of word-tag pairs may be identified. The most-likely sequence of word-tag pairs may be evaluated to identify an action request from the user.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Anoop Kiran Deoras, Dilek Zeynep Hakkani-Tur, Ruhi Sarikaya, Gokhan Tur
  • Publication number: 20130339027
    Abstract: A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
    Type: Application
    Filed: June 15, 2012
    Publication date: December 19, 2013
    Inventors: Tarek El Dokor, James Holmes, Jordan Cluster, Stuart Yamamoto, Pedram Vaghefinazari
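    The depth-based command pruning in the abstract above reduces to selecting a command set by context. The depth threshold, context names, and command sets below are invented for illustration.

    ```python
    # One verbal command set per gesture context (invented examples).
    COMMANDS_BY_CONTEXT = {
        "near_screen": {"zoom in", "zoom out", "select"},
        "at_wheel": {"answer call", "next track"},
    }

    def context_from_depth(hand_depth_m):
        """Map the hand's depth (from the depth camera) to a command context."""
        return "near_screen" if hand_depth_m < 0.4 else "at_wheel"

    def active_commands(hand_depth_m):
        """Only this subset is handed to the recognizer, improving accuracy."""
        return COMMANDS_BY_CONTEXT[context_from_depth(hand_depth_m)]
    ```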
  • Publication number: 20130339021
    Abstract: Techniques, an apparatus and an article of manufacture are provided for identifying one or more utterances that are likely to carry the intent of a speaker, from a conversation between two or more parties. A method includes obtaining an input of a set of utterances in chronological order from a conversation between two or more parties, computing an intent confidence value of each utterance by summing intent confidence scores from each of the constituent words of the utterance, wherein intent confidence scores capture each word's influence on the subsequent utterances in the conversation based on (i) the uniqueness of the word in the conversation and (ii) the number of times the word subsequently occurs in the conversation, and generating a ranked order of the utterances from highest to lowest intent confidence value, wherein the highest intent value corresponds to the utterance which is most likely to carry intent of the speaker.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Sachindra Joshi, Saurabh Saket, Ashish Verma
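    The scoring in the abstract above is concrete enough to sketch: a word's score combines its uniqueness in the conversation with how often it recurs later, and an utterance's score sums over its words. The exact weighting (log-inverse frequency times an echo count) is an assumption, not the patent's formula.

    ```python
    import math

    def rank_by_intent(utterances):
        """Rank utterances from most to least likely to carry speaker intent."""
        tokenized = [u.lower().split() for u in utterances]
        n = len(tokenized)

        def word_score(word, idx):
            in_utts = sum(word in t for t in tokenized)
            uniqueness = math.log(n / in_utts)                  # rarer => higher
            later = sum(t.count(word) for t in tokenized[idx + 1:])
            return uniqueness * (1 + later)                     # reinforced by echoes

        scored = [
            (sum(word_score(w, i) for w in t), u)
            for i, (u, t) in enumerate(zip(utterances, tokenized))
        ]
        return [u for s, u in sorted(scored, reverse=True)]
    ```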
  • Publication number: 20130339018
    Abstract: A system and method of verifying the identity of an authorized user in an authorized user group through a voice user interface for enabling secure access to one or more services via a mobile device includes receiving first voice information from a speaker through the voice user interface of the mobile device, calculating a confidence score based on a comparison of the first voice information with a stored voice model associated with the authorized user and specific to the authorized user, interpreting the first voice information as a specific service request, identifying a minimum confidence score for initiating the specific service request, determining whether or not the confidence score exceeds the minimum confidence score, and initiating the specific service request if the confidence score exceeds the minimum confidence score.
    Type: Application
    Filed: July 27, 2012
    Publication date: December 19, 2013
    Applicant: SRI INTERNATIONAL
    Inventors: Nicolas Scheffer, Yun Lei, Douglas A. Bercow
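    The per-request threshold logic in the abstract above can be sketched in a few lines. The confidence score would come from comparing the voice input against the stored speaker model; here it is passed in directly, and the per-service minimums are invented.

    ```python
    # Minimum confidence required per service request (invented values):
    # higher-risk requests demand a stronger voice match.
    MIN_CONFIDENCE = {
        "check_balance": 0.60,
        "transfer_funds": 0.90,
    }

    def authorize(service_request, confidence_score):
        """Initiate the request only if the voice match clears the service's bar."""
        minimum = MIN_CONFIDENCE.get(service_request, 1.0)  # unknown => deny
        return confidence_score >= minimum
    ```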
  • Publication number: 20130332147
    Abstract: The technology of the present application provides a method and apparatus to allow for dynamically updating a language model across a large number of similarly situated users. The system identifies individual changes to user profiles and evaluates the change for broader application, such as a dialect correction for a speech recognition engine. An administrator for the system identifies similarly situated user profiles and downloads the profile change to effect a dynamic change to the language model of similarly situated users.
    Type: Application
    Filed: June 11, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventor: Charles Corfield
  • Publication number: 20130325459
    Abstract: Computationally implemented methods and systems include receiving indication of initiation of a speech-facilitated transaction between a party and a target device, and receiving adaptation data correlated to the party. The receiving is facilitated by a particular device associated with the party. The adaptation data is at least partly based on previous adaptation data derived at least in part from one or more previous speech interactions of the party. The methods and systems also include applying the received adaptation data correlated to the party to the target device, and processing speech from the party using the target device to which the received adaptation data has been applied. In addition to the foregoing, other aspects are described in the claims, drawings, and text.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Inventors: Royce A. Levien, Richard T. Lord, Robert W. Lord, Mark A. Malamud, John D. Rinaldo, JR.
  • Publication number: 20130317823
    Abstract: Systems, methods, and computer-readable media that may be used to modify a voice action system to include voice actions provided by advertisers or users are provided. One method includes receiving electronic voice action bids from advertisers to modify the voice action system to include a specific voice action (e.g., a triggering phrase and an action). One or more bids may be selected. The method includes, for each of the selected bids, modifying data associated with the voice action system to include the voice action associated with the bid, such that the action associated with the respective voice action is performed when voice input from a user is received that the voice action system determines to correspond to the triggering phrase associated with the respective voice action.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 28, 2013
    Inventor: Pedro J. Moreno Mengibar
  • Publication number: 20130317820
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
  • Publication number: 20130304468
    Abstract: A method for contextual voice query dilation in a Spoken Web search includes determining a context in which a voice query is created, generating a set of multiple voice query terms based on the context and information derived by a speech recognizer component pertaining to the voice query, and processing the set of query terms with at least one dilation operator to produce a dilated set of queries. A method for performing a search on a voice query is also provided, including generating a set of multiple query terms based on information derived by a speech recognizer component processing a voice query, processing the set with multiple dilation operators to produce multiple dilated sub-sets of query terms, selecting at least one query term from each dilated sub-set to compose a query set, and performing a search on the query set.
    Type: Application
    Filed: August 8, 2012
    Publication date: November 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nitendra Rajput, Kundan Shrivastava
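    The dilation operators in the abstract above expand a recognizer's query terms into broader sub-sets. The two operators below (clipped-ending tolerance and a confusion table) are invented examples of what such operators might look like.

    ```python
    def drop_last_char(term):
        """Tolerate a clipped final phoneme by also matching the stem."""
        return {term, term[:-1]} if len(term) > 3 else {term}

    def common_confusions(term):
        """Tolerate known acoustic confusions (toy table)."""
        table = {"wheat": {"weed"}, "rate": {"late"}}
        return {term} | table.get(term, set())

    def dilate(query_terms, operators):
        """Apply every operator to every term, producing one dilated
        sub-set of query terms per operator."""
        return [set().union(*(op(t) for t in query_terms)) for op in operators]
    ```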
  • Publication number: 20130297306
    Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
    Type: Application
    Filed: May 4, 2012
    Publication date: November 7, 2013
    Applicant: QNX Software Systems Limited
    Inventors: Phillip Alan Hetherington, Xueman Li
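    The curve-weighting step in the abstract above amounts to blending two predetermined per-band target curves according to the intelligibility measurement. The linear blend below is an assumption; the patent does not disclose the exact combination rule.

    ```python
    def weighted_curve(curve_low, curve_high, intelligibility):
        """Blend two per-band long-term average speech curves.

        intelligibility in [0, 1]: low intelligibility pulls the target
        toward curve_high (more spectral emphasis); high intelligibility
        leaves it near curve_low.
        """
        w = max(0.0, min(1.0, 1.0 - intelligibility))
        return [(1 - w) * lo + w * hi for lo, hi in zip(curve_low, curve_high)]
    ```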
  • Publication number: 20130297316
    Abstract: A method, system, and computer program product for voice entry of information are provided in the illustrative embodiments. A conversion rule is applied to a voice input. An entry field input is generated, wherein the conversion rule allows the voice input to be distinct from the entry field input, and wherein the voice input obfuscates the entry field input. The entry field input is provided to an application, wherein the entry field input is usable to populate a data entry field in the application.
    Type: Application
    Filed: May 3, 2012
    Publication date: November 7, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian John Cragun, Marc Kevin Johlic
  • Publication number: 20130282371
    Abstract: A method is disclosed herein for recognizing a repeated utterance in a mobile computing device via a processor. A first utterance is detected being spoken into a first mobile computing device. Likewise, a second utterance is detected being spoken into a second mobile computing device within a predetermined time period. The second utterance substantially matches the first spoken utterance and the first and second mobile computing devices are communicatively coupled to each other. The processor enables capturing, at least temporarily, a matching utterance for performing a subsequent processing function. The performed subsequent processing function is based on a type of captured utterance.
    Type: Application
    Filed: April 20, 2012
    Publication date: October 24, 2013
    Applicant: Motorola Mobility, Inc.
    Inventors: Rachid M Alameh, Jiri Slaby, Hisashi D. Watanabe
  • Publication number: 20130275136
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for enhancing speech recognition accuracy. The method includes dividing a system dialog turn into segments based on timing of probable user responses, generating a weighted grammar for each segment, exclusively activating the weighted grammar generated for a current segment of the dialog turn during the current segment of the dialog turn, and recognizing user speech received during the current segment using the activated weighted grammar generated for the current segment. The method can further include assigning probability to the weighted grammar based on historical user responses and activating each weighted grammar based on the assigned probability. Weighted grammars can be generated based on a user profile. A weighted grammar can be generated for two or more segments.
    Type: Application
    Filed: April 13, 2012
    Publication date: October 17, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Michael Czahor
  • Publication number: 20130268270
    Abstract: A method is described for use with automatic speech recognition using discriminative criteria for speaker adaptation. An adaptation evaluation is performed of speech recognition performance data for speech recognition system users. Adaptation candidate users are identified based on the adaptation evaluation for whom an adaptation process is likely to improve system performance.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 10, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Dan Ning Jiang, Vaibhava Goel, Dimitri Kanevsky, Yong Qin
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Michael Lewis Seltzer, Alejandro Acero
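    The two-stage compensation in the abstract above can be sketched with plain affine transforms: select one transform per variability source (e.g. speaker and environment) and apply them in sequence to the feature vector. The transform tables and values are invented for the example.

    ```python
    def apply_affine(A, b, x):
        """y = A @ x + b, written out for plain lists."""
        return [sum(a * v for a, v in zip(row, x)) + bi for row, bi in zip(A, b)]

    # One linear transform per value of each variability source (toy values).
    SPEAKER_TRANSFORMS = {"male": ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0])}
    NOISE_TRANSFORMS = {"car": ([[2.0, 0.0], [0.0, 2.0]], [0.0, -1.0])}

    def compensate(x, speaker, noise):
        """Apply the first selected transform, then the second on the
        intermediate transformed data, as the abstract describes."""
        A1, b1 = SPEAKER_TRANSFORMS[speaker]
        intermediate = apply_affine(A1, b1, x)      # first variability source
        A2, b2 = NOISE_TRANSFORMS[noise]
        return apply_affine(A2, b2, intermediate)   # second, on the intermediate
    ```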
  • Publication number: 20130249783
    Abstract: The invention relates to a method and system for annotating image regions with specific concepts based on multimodal user input. The system (10) comprises an identification unit (11) for the identification of a region of interest on a multidimensional image; an automatic speech recognition unit (12) for recognizing speech input in a natural language; a natural language understanding unit (13) which interprets the speech input in the context of a specific application domain; a fusion unit (14) which combines the multimodal user input from the identification unit (11) and the natural language understanding unit (13); and an annotation unit (15) which annotates the result of the natural language understanding unit (13) on the image regions and optionally provides user feedback about the annotation process. Thus, the system advantageously facilitates a user's task to annotate specific image regions with standardized key concepts based on multimodal speech-based user input.
    Type: Application
    Filed: March 22, 2012
    Publication date: September 26, 2013
    Inventor: Daniel Sonntag
  • Publication number: 20130238326
    Abstract: In an environment including multiple electronic devices that are each capable of being controlled by a user's voice command, an individual device is able to distinguish a voice command intended particularly for the device from among other voice commands that are intended for other devices present in the common environment. The device is able to accomplish this distinction by identifying unique attributes belonging to the device itself from within a user's voice command. Thus, only voice commands that include attribute information supported by the device will be recognized by the device, and other voice commands that include attribute information not supported by the device may be effectively ignored for voice control purposes of the device.
    Type: Application
    Filed: March 8, 2012
    Publication date: September 12, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Yongsin KIM, Dami CHOE, Hyorim PARK
  • Publication number: 20130231932
    Abstract: Implementations include systems, methods and/or devices operable to detect voice activity in an audible signal by detecting glottal pulses. The dominant frequency of a series of glottal pulses is perceived as the intonation pattern or melody of natural speech, which is also referred to as the pitch. However, as noted above, spoken communication typically occurs in the presence of noise and/or other interference. In turn, the undulation of voiced speech is masked in some portions of the frequency spectrum associated with human speech by the noise and/or other interference. In some implementations, detection of voice activity is facilitated by dividing the frequency spectrum associated with human speech into multiple sub-bands in order to identify glottal pulses that dominate the noise and/or other interference in particular sub-bands. Additionally and/or alternatively, in some implementations the analysis is furthered to provide a pitch estimate of the detected voice activity.
    Type: Application
    Filed: August 20, 2012
    Publication date: September 5, 2013
    Inventors: Pierre Zakarauskas, Alexander Escott, Clarence S.H. Chu, Shawn E. Stevenson
  • Publication number: 20130226557
    Abstract: The present disclosure describes a teleconferencing system that may use a virtual participant processor to translate language content of the teleconference into each participant's spoken language without additional user inputs. The virtual participant processor may connect to the teleconference as do the other participants. All text or audio data previously exchanged between the participants may now be intercepted by the virtual participant processor. Upon obtaining a partial or complete language recognition result or making a language preference determination, the virtual participant processor may call a translation engine appropriate for each of the participants. The virtual participant processor may send the resulting translation to a teleconference management processor. The teleconference management processor may deliver the respective translated text or audio data to the appropriate participant.
    Type: Application
    Filed: April 30, 2012
    Publication date: August 29, 2013
    Applicant: Google Inc.
    Inventors: Jakob David Uszkoreit, Ashish Venugopal, Johan Schalkwyk, Joshua James Estelle
  • Publication number: 20130225240
    Abstract: An electronic device is configured to receive data from a keypad key, wherein the key is associated with first and second alphanumeric characters. The device includes a keypad interface and a data entry processor. The keypad interface is configured to determine the first and second alphanumeric characters when the key is pressed. The data entry processor is configured to select the first alphanumeric character from among the first and second alphanumeric characters when a speech recognizer determines that a spoken entry identifies the first alphanumeric character.
    Type: Application
    Filed: February 29, 2012
    Publication date: August 29, 2013
    Applicant: NVIDIA Corporation
    Inventors: Henry P. Largey, Gabriel Rivera
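The selection rule the abstract describes (a key carries two characters, and the spoken entry picks one) can be sketched as follows; the helper name and the exact-match rule are assumptions for illustration:

```python
def select_character(key_chars, recognized_text):
    """Select the character on a multi-character key that the speech
    recognizer's result identifies; fall back to the key's first
    character when the spoken entry matches neither."""
    spoken = recognized_text.strip().lower()
    for ch in key_chars:
        if spoken == ch.lower():
            return ch
    return key_chars[0]

# A key carrying both '2' and 'A': saying "a" selects the letter.
choice = select_character(("2", "A"), "a")
```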
  • Publication number: 20130195285
    Abstract: A speech from a speaker proximate to one or more microphones within an environment can be received. The microphones can be a directional microphone or an omni-directional microphone. The speech can be processed to produce an utterance, which can be used to determine the identity of the speaker. The identity of the speaker can be associated with a voiceprint. The identity can be associated with a user's credentials of a computing system. The credentials can uniquely identify the user within the computing system. The utterance can be analyzed to establish a zone in which the speaker is present. The zone can be a bounded region within the environment. The zone can be mapped within the environment to determine a location of the speaker. The location can be a relative or an absolute location.
    Type: Application
    Filed: January 30, 2012
    Publication date: August 1, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: STEPHANIE DE LA FUENTE, GREGORY S. JONES, JOHN S. PANNELL
  • Publication number: 20130197906
    Abstract: Techniques to normalize names for name-based speech recognition grammars are described. Some embodiments are particularly directed to techniques to normalize names for name-based speech recognition grammars more efficiently by caching, and on a per-culture basis. A technique may comprise receiving a name for normalization, during name processing for a name-based speech grammar generating process. A normalization cache may be examined to determine if the name is already in the cache in a normalized form. When the name is not already in the cache, the name may be normalized and added to the cache. When the name is in the cache, the normalization result may be retrieved and passed to the next processing step. Other embodiments are described and claimed.
    Type: Application
    Filed: January 27, 2012
    Publication date: August 1, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Mini Varkey, Bernardo Sana, Victor Boctor, Diego Carlomagno
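The caching flow described above (look up first; normalize and store on a miss; key per culture) might look like this in outline. The normalization rule here (lowercase, strip punctuation) is a stand-in, since the real pipeline is culture-specific:

```python
# Cache keyed by (culture, name) so the same name can normalize
# differently under different cultures.
normalization_cache = {}

def normalize_name(name, culture):
    """Return a speech-grammar-friendly form of `name`, consulting the
    per-culture cache before doing any work."""
    key = (culture, name)
    if key in normalization_cache:
        return normalization_cache[key]          # cache hit: reuse result
    # Cache miss: normalize (stand-in rule) and store for next time.
    normalized = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    normalized = " ".join(normalized.split())
    normalization_cache[key] = normalized
    return normalized
```

Repeated names, which are common in large address books, then cost one dictionary lookup instead of a full normalization pass.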
  • Publication number: 20130179164
    Abstract: A vehicle voice interface system calibration method comprising electronically convolving voice command data with voice impulse response data, electronically convolving audio system output data with feedback impulse response data, and calibrating the vehicle voice interface system. The voice command data is electronically convolved with voice impulse response data representing a voice acoustic signal path between an artificial mouth simulator and a first microphone, to simulate a voice acoustic transfer function pertaining to the passenger compartment. The audio system output data is convolved with feedback impulse response data representing a feedback acoustic signal path between a vehicle audio system output and a second microphone, to simulate a feedback acoustic transfer function pertaining to the passenger compartment. The voice interface system is calibrated to recognize voice commands represented by the voice command data based on the simulated voice and feedback acoustic transfer functions.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 11, 2013
    Applicant: Nissan North America, Inc.
    Inventor: Patrick Dennis
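The two convolution steps in the abstract can be sketched with NumPy; the impulse-response argument names and the simple summing-microphone model are assumptions:

```python
import numpy as np

def simulate_cabin_paths(voice_cmd, audio_out, voice_ir, feedback_ir):
    """Convolve dry voice-command and audio-system output signals with
    measured impulse responses to simulate the two in-cabin acoustic
    transfer functions; the microphone 'hears' the sum of both paths."""
    voice_at_mic = np.convolve(voice_cmd, voice_ir)
    feedback_at_mic = np.convolve(audio_out, feedback_ir)
    # Sum the two simulated paths, padding to the longer result.
    n = max(len(voice_at_mic), len(feedback_at_mic))
    mix = np.zeros(n)
    mix[:len(voice_at_mic)] += voice_at_mic
    mix[:len(feedback_at_mic)] += feedback_at_mic
    return mix
```

With a unit impulse response the path is transparent, which gives a quick sanity check that calibration data passes through unchanged.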
  • Publication number: 20130179162
    Abstract: An inventive system and method for touch free operation of a device is presented. The system can comprise a depth sensor for detecting a movement, motion software to receive the detected movement from the depth sensor, deduce a gesture based on the detected movement, and filter the gesture to accept an applicable gesture, and client software to receive the applicable gesture at a client computer for performing a task in accordance with client logic based on the applicable gesture. The client can be a mapping device and the task can be one of various mapping operations. The system can also comprise hardware for making the detected movement an applicable gesture. The system can also comprise voice recognition providing voice input for enabling the client to perform the task based on the voice input in conjunction with the applicable gesture. The applicable gesture can be a movement authorized using facial recognition.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Applicant: BIOSENSE WEBSTER (ISRAEL), LTD.
    Inventors: Asaf Merschon, Assaf Govari, Andres Claudio Altmann, Yitzhack Schwartz
  • Patent number: 8483540
    Abstract: A method, apparatus and system for synchronizing between two recording modes includes identifying a common event in the two recording modes. The event in time is recognized for a higher accuracy mode of the two modes. The event is predicted in a lower accuracy mode of the two modes by determining a time when the event occurred between frames in the lower accuracy mode. The event in the higher accuracy mode is synchronized to the lower accuracy mode to provide sub-frame accuracy alignment between the two modes. In one embodiment of the invention, the common event includes the closing of a clap slate, and the two modes include audio and video recording modes.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: July 9, 2013
    Assignee: Thomson Licensing
    Inventors: Ingo Doser, Ana Belen Benitez, Dong-Qing Zhang
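Predicting where the event falls between frames of the lower-accuracy mode reduces to locating a fractional frame index. A sketch, under the assumption that both modes share a common time origin:

```python
def subframe_event_time(event_time_hi, frame_rate_lo):
    """Given the event time measured in the higher-accuracy mode (e.g. a
    clap-slate close located in 48 kHz audio) and the lower-accuracy
    frame rate (e.g. 24 fps video), return the frame just before the
    event and the sub-frame offset into the next frame."""
    frame_pos = event_time_hi * frame_rate_lo   # fractional frame index
    before = int(frame_pos)                     # frame just before the event
    fraction = frame_pos - before               # sub-frame offset in [0, 1)
    return before, fraction

# A clap located at 1.05 s in the audio track, against 24 fps video:
before, fraction = subframe_event_time(1.05, 24)
```

The fractional part is what provides the sub-frame alignment the abstract claims: the video alone could only say "between frames 25 and 26".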
  • Publication number: 20130173264
    Abstract: An apparatus for utilizing textual data and acoustic data corresponding to speech data to detect sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including evaluating textual data and acoustic data corresponding to voice data associated with captured speech content. The computer program code may further cause the apparatus to analyze the textual data and the acoustic data to detect whether the textual data or the acoustic data includes one or more words indicating at least one sentiment of a user that spoke the speech content. The computer program code may further cause the apparatus to assign at least one predefined sentiment to at least one of the words in response to detecting that the word(s) indicates the sentiment of the user. Corresponding methods and computer program products are also provided.
    Type: Application
    Filed: January 3, 2012
    Publication date: July 4, 2013
    Applicant: NOKIA CORPORATION
    Inventors: Imre Attila Kiss, Joseph Polifroni, Francois Mairesse, Mark Adler
  • Publication number: 20130173701
    Abstract: A system, method, and computer-readable medium, is described that implements a domain name registration suggestion tool that receives one or more inputs, extracts information from the inputs into a submission string, submits the submission string to a domain name suggestion tool, and receives domain name suggestions based on the submission string. Input types may include images, audio clips, and metadata. The input sources may be processed to extract information related to the image source to build the submission string.
    Type: Application
    Filed: December 30, 2011
    Publication date: July 4, 2013
    Inventors: Neel Goyal, Vincent Raemy, Harshini Ramnath Krishnan
  • Publication number: 20130173268
    Abstract: A method for verifying that a person is registered to use a telemedical device includes identifying an unprompted trigger phrase in words spoken by a person and received by the telemedical device. The telemedical device prompts the person to state a name of a registered user and optionally prompts the person to state health tips for the person. The telemedical device verifies that the person is the registered user using utterance data generated from the unprompted trigger phrase, name of the registered user, and health tips.
    Type: Application
    Filed: December 29, 2011
    Publication date: July 4, 2013
    Applicant: Robert Bosch GmbH
    Inventors: Fuliang Weng, Taufiq Hasan, Zhe Feng
  • Publication number: 20130173269
    Abstract: An apparatus for generating a review based in part on detected sentiment may include a processor and memory storing executable computer code causing the apparatus to at least perform operations including determining a location(s) of the apparatus and a time(s) that the location(s) was determined responsive to capturing voice data of speech content associated with spoken reviews of entities. The computer program code may further cause the apparatus to analyze textual and acoustic data corresponding to the voice data to detect whether the textual or acoustic data includes words indicating a sentiment(s) of a user speaking the speech content. The computer program code may further cause the apparatus to generate a review of an entity corresponding to a spoken review(s) based on assigning a predefined sentiment to a word(s) responsive to detecting that the word indicates the sentiment of the user. Corresponding methods and computer program products are also provided.
    Type: Application
    Filed: January 3, 2012
    Publication date: July 4, 2013
    Applicant: NOKIA CORPORATION
    Inventors: Mark Adler, Imre Attila Kiss, Francois Mairesse, Joseph Polifroni
  • Publication number: 20130158997
    Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, a speech recognition system is provided. The system includes a processing unit configured to divide a received audio signal into consecutive frames having respective frame vectors, an acoustic processing unit (APU), and a data bus that couples the processing unit and the APU. The APU includes a local, non-volatile memory that stores a plurality of senones, a memory buffer coupled to the memory, the acoustic processing unit being configured to load at least one Gaussian probability distribution vector stored in the memory into the memory buffer, and a scoring unit configured to simultaneously compare a plurality of dimensions of a Gaussian probability distribution vector loaded into the memory buffer with respective dimensions of a frame vector received from the processing unit and to output a corresponding score to the processing unit.
    Type: Application
    Filed: June 6, 2012
    Publication date: June 20, 2013
    Applicant: Spansion LLC
    Inventors: Venkataraman Natarajan, Stephan Rosner
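The per-dimension comparison the APU performs can be illustrated with a diagonal-covariance Gaussian log-likelihood, accumulated dimension by dimension. This is a sketch of the standard computation, not the patented hardware; real senone scores combine many weighted Gaussians, and the max-approximation used below is one common simplification:

```python
import math

def gaussian_log_score(frame, mean, var):
    """Diagonal-covariance Gaussian log-likelihood of one frame vector,
    accumulated one dimension at a time."""
    score = 0.0
    for x, m, v in zip(frame, mean, var):
        score += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return score

def senone_score(frame, gaussians):
    """Score a senone as the best (max) Gaussian log-likelihood among its
    mixture components, a common approximation to the weighted log-sum."""
    return max(gaussian_log_score(frame, m, v) for m, v in gaussians)
```

A hardware scorer like the one described would evaluate the per-dimension terms of the inner loop in parallel rather than sequentially.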
  • Publication number: 20130158977
    Abstract: Systems and methods are provided for detecting and analyzing speech spoken in the vicinity of a user. The detected speech may be analyzed to determine the quality, volume, complexity, language, and other attributes. A value metric may be calculated for the received speech, such as to inform parents of a child's progress related to learning to speak, or to provide feedback to a foreign language learner. A corresponding device may display the number of words, the value metric, or other information about speech received by the device.
    Type: Application
    Filed: June 14, 2011
    Publication date: June 20, 2013
    Inventor: Andrew Senior
  • Publication number: 20130158996
    Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. The apparatus can include a senone scoring unit (SSU) control module, a distance calculator, and an addition module. The SSU control module can be configured to receive a feature vector. The distance calculator can be configured to receive a plurality of Gaussian probability distributions via a data bus having a width of at least one Gaussian probability distribution and the feature vector from the SSU control module. The distance calculator can include a plurality of arithmetic logic units to calculate a plurality of dimension distance scores and an accumulator to sum the dimension distance scores to generate a Gaussian distance score. Further, the addition module is configured to sum a plurality of Gaussian distance scores to generate a senone score.
    Type: Application
    Filed: June 6, 2012
    Publication date: June 20, 2013
    Applicant: Spansion LLC
    Inventors: Richard Fastow, Jens Olson
  • Publication number: 20130144616
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing speech. A system configured to practice the method monitors user utterances to generate a conversation context. Then the system receives a current user utterance independent of non-natural language input intended to trigger speech processing. The system compares the current user utterance to the conversation context to generate a context similarity score, and if the context similarity score is above a threshold, incorporates the current user utterance into the conversation context. If the context similarity score is below the threshold, the system discards the current user utterance. The system can compare the current user utterance to the conversation context based on an n-gram distribution, a perplexity score, and a perplexity threshold. Alternately, the system can use a task model to compare the current user utterance to the conversation context.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Srinivas BANGALORE
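The accept-or-discard decision based on a perplexity threshold can be sketched with a smoothed unigram model standing in for the n-gram and task models the abstract mentions; the function names and add-one smoothing are assumptions:

```python
import math
from collections import Counter

def unigram_model(context_words, alpha=1.0):
    """Add-alpha smoothed unigram distribution over the conversation context."""
    counts = Counter(context_words)
    vocab = len(counts) + 1          # +1 reserves mass for unseen words
    total = sum(counts.values())
    return lambda w: (counts[w] + alpha) / (total + alpha * vocab)

def perplexity(model, words):
    """Per-word perplexity of an utterance under the context model; low
    perplexity means the utterance fits the conversation so far."""
    log_prob = sum(math.log(model(w)) for w in words)
    return math.exp(-log_prob / len(words))

def accept_utterance(context, utterance, threshold):
    """Incorporate the utterance only if its perplexity against the
    conversation context is below the threshold; otherwise discard it
    as out-of-conversation speech."""
    model = unigram_model(context)
    return perplexity(model, utterance) < threshold
```

An utterance reusing words from the context scores a lower perplexity than one made of unseen words, which is the similarity signal the threshold acts on.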
  • Publication number: 20130144627
    Abstract: A control circuit employed in an electronic device includes a microphone, a level conversion circuit, and a voice processing circuit. The voice processing circuit includes a voice operated switch connected between the microphone and the level conversion circuit. The microphone picks up voice commands; the voice operated switch receives the voice commands from the microphone and outputs a high voltage signal when the volume of the voice commands is greater than or equal to a predetermined volume threshold or falls within a predetermined volume range; and the level conversion circuit converts the high voltage signal into a low voltage signal for turning on the electronic device.
    Type: Application
    Filed: March 9, 2012
    Publication date: June 6, 2013
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD.
    Inventor: JIE LI
  • Publication number: 20130144623
    Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to determine and present speaker-related information based on speaker utterances. In one embodiment, the AEFS receives data that represents an utterance of a speaker received by a hearing device of the user, such as a hearing aid, smart phone, media player/device, or the like. The AEFS identifies the speaker based on the received data, such as by performing speaker recognition. The AEFS determines speaker-related information associated with the identified speaker, such as by determining an identifier (e.g., name or title) of the speaker, by locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs the user of the speaker-related information, such as by presenting the speaker-related information on a display of the hearing device or some other device accessible to the user.
    Type: Application
    Filed: December 13, 2011
    Publication date: June 6, 2013
    Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor