Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20110238419
    Abstract: A binaural configuration and an associated method utilize first and second hearing devices for the voice control of the hearing devices by voice commands. The configuration contains a first voice recognition module in the first hearing device and a second voice recognition module in the second hearing device. The second voice recognition module uses information data from the first voice recognition module for recognition of the voice commands. This advantageously reduces the rate of erroneously recognized voice commands (“false alarms”).
    Type: Application
    Filed: March 24, 2011
    Publication date: September 29, 2011
    Applicant: SIEMENS MEDICAL INSTRUMENTS PTE. LTD.
    Inventor: Roland Barthel
  • Publication number: 20110230229
    Abstract: Techniques for organizing information in a user-interactive system based on user interest are provided. In one aspect, a method for operating a system having a plurality of resources through which a user can navigate is provided. The method includes the following steps. When the user accesses the system, the resources are presented to the user in a particular order. Interests of the user in the resources presented are determined. The interests of the user are compared to interests of other users to find one or more subsets of users to which the user belongs by virtue of having similar interests. Upon one or more subsequent accesses to the system by the user, the order in which the resources are presented to the user is based on interests common to the one or more subsets of users to which the user belongs.
    Type: Application
    Filed: March 20, 2010
    Publication date: September 22, 2011
    Applicant: International Business Machines Corporation
    Inventors: Rajarshi Das, Robert George Farrell, Nitendra Rajput
  • Publication number: 20110231188
    Abstract: The system and method described herein may provide an acoustic grammar to dynamically sharpen speech interpretation. In particular, the acoustic grammar may be used to map one or more phonemes identified in a user verbalization to one or more syllables or words, wherein the acoustic grammar may have one or more linking elements to reduce a search space associated with mapping the phonemes to the syllables or words. As such, the acoustic grammar may be used to generate one or more preliminary interpretations associated with the verbalization, wherein one or more post-processing techniques may then be used to sharpen accuracy associated with the preliminary interpretations. For example, a heuristic model may assign weights to the preliminary interpretations based on context, user profiles, or other knowledge and a probable interpretation may be identified based on confidence scores associated with one or more candidate interpretations generated with the heuristic model.
    Type: Application
    Filed: June 1, 2011
    Publication date: September 22, 2011
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Robert A. Kennewick, Min Ke, Michael Tjalve, Philippe Di Cristo
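The heuristic weighting step in this abstract, where candidate interpretations are re-scored using context before the most probable one is selected, could be sketched as follows. The function name, the multiplicative combination of confidence and weight, and the example phrases are all illustrative assumptions, not the patent's actual model:

```python
def pick_interpretation(candidates, context_weights):
    """Combine base confidence scores with heuristic context weights and
    return the most probable interpretation.

    `candidates` maps candidate text to a base confidence score;
    `context_weights` maps candidate text to a heuristic weight derived
    from context or user profiles (default weight 1.0). The real heuristic
    model described in the abstract uses richer knowledge than this.
    """
    scored = {text: conf * context_weights.get(text, 1.0)
              for text, conf in candidates.items()}
    return max(scored, key=scored.get)
```

With no context weights the highest-confidence candidate wins; a strong contextual weight can promote a lower-confidence candidate instead.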
  • Publication number: 20110224987
    Abstract: A method for identifying end of voiced speech within an audio stream of a noisy environment employs a speech discriminator. The discriminator analyzes each window of the audio stream, producing an output corresponding to the window. The output is used to classify the window in one of several classes, for example, (1) speech, (2) silence, or (3) noise. A state machine processes the window classifications, incrementing counters as each window is classified: speech counter for speech windows, silence counter for silence, and noise counter for noise. If the speech counter indicates a predefined number of windows, the state machine clears all counters. Otherwise, the state machine appropriately weights the values in the silence and noise counters, adds the weighted values, and compares the sum to a limit imposed on the number of non-voice windows. When the non-voice limit is reached, the state machine terminates processing of the audio stream.
    Type: Application
    Filed: June 3, 2010
    Publication date: September 15, 2011
    Applicant: Applied Voice & Speech Technologies, Inc.
    Inventor: Karl Daniel Gierach
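The counter-based state machine in this abstract could be sketched as below. The class name, the specific weights, and the limits are illustrative assumptions; the abstract specifies only the counters, the speech-counter reset, the weighting of silence and noise counts, and the comparison against a non-voice limit:

```python
SPEECH, SILENCE, NOISE = "speech", "silence", "noise"

class EndOfSpeechDetector:
    """Sketch of the end-of-voiced-speech state machine (parameters assumed)."""

    def __init__(self, speech_reset=3, silence_weight=1.0,
                 noise_weight=0.5, non_voice_limit=10.0):
        self.speech_reset = speech_reset      # speech windows needed to clear counters
        self.silence_weight = silence_weight  # weight applied to the silence counter
        self.noise_weight = noise_weight      # weight applied to the noise counter
        self.non_voice_limit = non_voice_limit
        self.speech = self.silence = self.noise = 0

    def process(self, label):
        """Take one window classification; return True when the non-voice
        limit is reached and processing should terminate."""
        if label == SPEECH:
            self.speech += 1
            if self.speech >= self.speech_reset:
                # sustained speech: clear all counters and keep listening
                self.speech = self.silence = self.noise = 0
            return False
        if label == SILENCE:
            self.silence += 1
        else:
            self.noise += 1
        weighted = (self.silence * self.silence_weight +
                    self.noise * self.noise_weight)
        return weighted >= self.non_voice_limit
```

Each classified window is fed to `process`; a `True` return corresponds to the state machine terminating processing of the audio stream.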
  • Publication number: 20110224978
    Abstract: An information processing device includes an audio-based speech recognition processing unit which is input with audio information as observation information of a real space and executes an audio-based speech recognition process, thereby generating word information that is determined to have a high probability of being spoken; an image-based speech recognition processing unit which is input with image information as observation information of the real space and analyzes mouth movements of each user included in the input image, thereby generating mouth movement information; an audio-image-combined speech recognition score calculating unit which is input with the word information and the mouth movement information and executes a score setting process in which a mouth movement close to the word information is set with a high score; and an information integration processing unit which is input with the score and executes a speaker specification process.
    Type: Application
    Filed: March 1, 2011
    Publication date: September 15, 2011
    Inventor: Tsutomu SAWADA
  • Publication number: 20110224980
    Abstract: A speech recognition system according to the present invention includes a sound source separating section which separates mixed speeches from multiple sound sources from one another; a mask generating section which generates a soft mask which can take continuous values between 0 and 1 for each frequency spectral component of a separated speech signal using distributions of speech signal and noise against separation reliability of the separated speech signal; and a speech recognizing section which recognizes speeches separated by the sound source separating section using soft masks generated by the mask generating section.
    Type: Application
    Filed: March 10, 2011
    Publication date: September 15, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro Nakadai, Toru Takahashi, Hiroshi Okuno
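The soft mask in this abstract takes continuous values between 0 and 1 for each frequency spectral component, driven by the separation reliability of that component. A minimal sketch follows, using a logistic function of reliability as one plausible mapping; the patent instead derives the mask from distributions of speech and noise against separation reliability, which this sketch does not model, and the midpoint and steepness parameters are assumptions:

```python
import math

def soft_mask(reliabilities, midpoint=0.5, steepness=10.0):
    """Map per-frequency separation reliabilities to soft mask values in (0, 1).

    Higher reliability yields a mask value nearer 1 (component trusted for
    recognition); lower reliability yields a value nearer 0.
    """
    return [1.0 / (1.0 + math.exp(-steepness * (r - midpoint)))
            for r in reliabilities]
```

Unlike a hard binary mask, every spectral component keeps a graded contribution, which is the property the abstract emphasizes.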
  • Publication number: 20110224967
    Abstract: Method and apparatus for capturing a text based source image 5, 5A provided on an object 3 supported on a surface. Positioned above the object 3 is a camera 7 for capturing a view of the text based image 5, 5A. The camera 7, through lens 9 generates a focused image of at least part of the object 3 and transmits this image to a processor 11 for magnification of the image captured by the camera 7 to a size specified for display on a display device 15. In the processor 11 the magnification is effected to a rate that is controlled by the second predefined size for display 19, 19A of the font to appear on the display 15 and is independent of the first font size.
    Type: Application
    Filed: June 16, 2009
    Publication date: September 15, 2011
    Inventor: Michiel Jeroen Van Schaik
  • Publication number: 20110218806
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Application
    Filed: May 18, 2011
    Publication date: September 8, 2011
    Applicant: Nuance Communications, Inc.
    Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
  • Publication number: 20110218803
    Abstract: A method for assessing intelligibility of speech represented by a speech signal includes providing a speech signal and performing a feature extraction on at least one frame of the speech signal so as to obtain a feature vector for each of the at least one frame of the speech signal. The feature vector is input to a statistical machine learning model so as to obtain an estimated posterior probability of phonemes in the at least one frame as an output including a vector of phoneme posterior probabilities of different phonemes for each of the at least one frame of the speech signal. An entropy estimation is performed on the vector of phoneme posterior probabilities of the at least one frame of the speech signal so as to evaluate intelligibility of the at least one frame of the speech signal. An intelligibility measure is output for the at least one frame of the speech signal.
    Type: Application
    Filed: March 4, 2011
    Publication date: September 8, 2011
    Applicant: DEUTSCHE TELEKOM AG
    Inventors: Hamed Ketabdar, Juan-Pablo Ramirez
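The entropy step in this abstract, where a vector of phoneme posterior probabilities per frame is reduced to an intelligibility measure, could be sketched as follows. The normalization of entropy into a [0, 1] score is an assumption of this sketch; the abstract specifies only that entropy of the posterior vector is used to evaluate intelligibility:

```python
import math

def frame_entropy(posteriors):
    """Shannon entropy (bits) of a phoneme posterior vector for one frame.

    Low entropy (one phoneme clearly dominates) suggests the frame is
    intelligible; entropy near log2(N) means the model is uncertain.
    """
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)

def intelligibility_score(posteriors):
    """Normalize entropy into [0, 1], where 1 is most intelligible (a sketch)."""
    max_entropy = math.log2(len(posteriors))
    return 1.0 - frame_entropy(posteriors) / max_entropy
```

A frame whose posterior mass sits on a single phoneme scores 1; a uniform posterior over all phonemes scores 0.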
  • Publication number: 20110218804
    Abstract: A speech recognition method, the method involving: receiving a speech input from a known speaker of a sequence of observations; and determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model, the acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation, the acoustic model having been trained using first training data and adapted using second training data to said speaker, the speech recognition method also determining the likelihood of a sequence of observations occurring in a given language using a language model; and combining the likelihoods determined by the acoustic model and the language model and outputting a sequence of words identified from said speech input signal, wherein said acoustic model is context based for said speaker, said context based information being contained in said model using a plurality of decision trees, wherein the structure of said d
    Type: Application
    Filed: January 26, 2011
    Publication date: September 8, 2011
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Byung Ha Chun
  • Publication number: 20110218805
    Abstract: In a spoken term detection apparatus, processing performed by a processor includes a feature extraction process extracting an acoustic feature from speech data accumulated in an accumulation part and storing the extracted acoustic feature in an acoustic feature storage part, a first calculation process calculating a standard score from a similarity between an acoustic feature stored in the acoustic feature storage part and an acoustic model stored in an acoustic model storage part, a second calculation process comparing an acoustic model corresponding to an input keyword with the acoustic feature stored in the acoustic feature storage part to calculate a score of the keyword, and a retrieval process retrieving speech data including the keyword from speech data accumulated in the accumulation part based on the score of the keyword calculated by the second calculation process and the standard score stored in a standard score storage part.
    Type: Application
    Filed: March 3, 2011
    Publication date: September 8, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Nobuyuki Washio, Shouji Harada
  • Publication number: 20110213614
    Abstract: A method of analysing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analysing the audio signal, based on the determined property of the first output function.
    Type: Application
    Filed: September 11, 2009
    Publication date: September 1, 2011
    Applicant: NEWSOUTH INNOVATIONS PTY LIMITED
    Inventors: Wenliang Lu, Dipanjan Sen
  • Publication number: 20110213616
    Abstract: A speech recognition system includes a natural language processing component and an automated speech recognition component distinct from each other such that uncertainty in speech recognition is isolated from uncertainty in natural language understanding, wherein the natural language processing component and an automated speech recognition component communicate corresponding weighted meta-information representative of the uncertainty.
    Type: Application
    Filed: September 23, 2010
    Publication date: September 1, 2011
    Inventors: Robert E. Williams, John E. Keane
  • Publication number: 20110208521
    Abstract: A method, system and apparatus are shown for identifying non-language speech sounds in a speech or audio signal. An audio signal is segmented and feature vectors are extracted from the segments of the audio signal. The segment is classified using a hidden Markov model (HMM) that has been trained on sequences of these feature vectors. Post-processing components can be utilized to enhance classification. An embodiment is described in which the hidden Markov model is used to classify a segment as a language speech sound or one of a variety of non-language speech sounds. Another embodiment is described in which the hidden Markov model is trained using discriminative learning.
    Type: Application
    Filed: August 13, 2009
    Publication date: August 25, 2011
    Applicant: 21CT, INC.
    Inventor: Matthew McClain
  • Publication number: 20110208519
    Abstract: A method of operation of a real-time data-pattern analysis system includes: providing a memory module, a computational unit, and an integrated data transfer module arranged within an integrated circuit die; storing a data pattern within the memory module; transferring the data pattern from the memory module to the computational unit using the integrated data transfer module; and comparing processed data to the data pattern using the computational unit.
    Type: Application
    Filed: October 7, 2009
    Publication date: August 25, 2011
    Inventor: Richard M. Fastow
  • Publication number: 20110208604
    Abstract: System and method for delivering media assets to a user of an automobile is disclosed. The system comprises a voice input device and a voice recognition device. The system further comprises a head-up display device for displaying metadata of the media assets on a windshield of the automobile. After a user's input for one or a group of media assets is received, a list of metadata of the media assets is displayed on the windshield. The system plays back the selected media asset through the use of a media delivery unit after the selection is received by the voice input device. A driving route may also be displayed on the windshield.
    Type: Application
    Filed: February 20, 2010
    Publication date: August 25, 2011
    Inventor: Yang Pan
  • Publication number: 20110208524
    Abstract: This is directed to processing voice inputs received by an electronic device. In particular, this is directed to receiving a voice input and identifying the user providing the voice input. The voice input can be processed using a subset of words from a library used to identify the words or phrases of the voice input. The particular subset can be selected such that voice inputs provided by the user are more likely to include words from the subset. The subset of the library can be selected using any suitable approach, including for example based on the user's interests and words that relate to those interests. For example, the subset can include one or more words related to media items selected by the user for storage on the electronic device, names of the user's contacts, applications or processes used by the user, or any other words relating to the user's interactions with the device.
    Type: Application
    Filed: February 25, 2010
    Publication date: August 25, 2011
    Applicant: Apple Inc.
    Inventor: Allen P. Haughay
  • Publication number: 20110208526
    Abstract: A method for variable resolution and error control in spoken language understanding (SLU) allows arranging the categories of the SLU into a hierarchy of different levels of specificity. The pre-determined hierarchy is used to identify different types of errors such as high-cost errors and low-cost errors and trade, if necessary, high cost errors for low cost errors.
    Type: Application
    Filed: May 6, 2011
    Publication date: August 25, 2011
    Inventors: Roberto PIERACCINI, Krishna Dayanidhi
  • Publication number: 20110202350
    Abstract: A system for remotely and interactively controlling visual and multimedia content displayed on and rendered by a web browser using a telephony device. In particular, the system relates to receiving a voice input (e.g., dual-tone multi-frequency (DTMF) input, spoken input, etc.) from a telephony device (e.g., a landline, a cellular telephone, or other system with telephone functionality, etc.) via a wide-area network to an intermediary computer that is configured to control the rendering of one or more web pages (or other web data) by a standard web browser.
    Type: Application
    Filed: October 15, 2009
    Publication date: August 18, 2011
    Inventor: Troy Barnes
  • Publication number: 20110202338
    Abstract: Voice recognition technology is combined with external information sources and/or contextual information to enhance the quality of voice recognition results specifically for the use case of reading out or speaking an alphanumeric identifier. The alphanumeric identifier may be associated with a good, service, person, account, or other entity. For example, the identifier may be a vehicle license plate number.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 18, 2011
    Inventor: Philip INGHELBRECHT
  • Publication number: 20110202351
    Abstract: A system includes a hands free mobile communication device. Software stored on a machine readable storage device is executed to cause the hands free mobile communication device to communicate audibly with a field operator performing field operations. The operator receives instructions regarding operations to be performed. Oral communications are received from the operator and are processed automatically to provide further instructions in response to the received oral communications.
    Type: Application
    Filed: February 16, 2010
    Publication date: August 18, 2011
    Applicant: Honeywell International Inc.
    Inventors: Tom Plocher, Emmanuel Letsu-Dake, Robert E. De Mers, Paul Derby
  • Publication number: 20110196677
    Abstract: According to one illustrative embodiment, a method is provided for analyzing an audio interaction. At least one change in an emotion of a speaker in an audio interaction and at least one aspect of the audio interaction are identified. The at least one change in an emotion is analyzed in conjunction with the at least one aspect to determine a relationship between the at least one change in an emotion and the at least one aspect, and a result of the analysis is provided.
    Type: Application
    Filed: February 11, 2010
    Publication date: August 11, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Chitra Dorai, Shailesh Joshi, Maureen E. Rzasa, Ashish Verma, Karthik Visweswariah, Gary J. Wright, Sai Zeng
  • Publication number: 20110191104
    Abstract: A method for measuring a disparity between two speech samples is disclosed that may include determining upon a speech granularity level at which to compare the rhythm of a student speech sample and a reference speech sample; determining a duration disparity between a first speech unit and a second, non-adjacent speech unit in the student speech sample; determining a duration disparity between a first speech unit and a second, non-adjacent speech unit in the reference speech sample; and calculating the difference between the student speech-unit duration disparity and the reference speech-unit disparity.
    Type: Application
    Filed: January 29, 2010
    Publication date: August 4, 2011
    Applicant: Rosetta Stone, Ltd.
    Inventors: Joseph Tepperman, Theban Stanley, Kadri Hacioglu
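The disparity computation in this abstract could be sketched as below. Durations are assumed to be in seconds, the speech units (phonemes, syllables, or words, per the chosen granularity level) are represented simply as indexed duration lists, and taking the absolute difference is an assumption of this sketch:

```python
def duration_disparity(durations, i, j):
    """Duration disparity between two (possibly non-adjacent) speech units."""
    return durations[i] - durations[j]

def rhythm_difference(student_durations, reference_durations, i, j):
    """Difference between the student's and the reference's duration
    disparities for the same pair of speech units i and j."""
    student = duration_disparity(student_durations, i, j)
    reference = duration_disparity(reference_durations, i, j)
    return abs(student - reference)
```

A larger result indicates the student's rhythm between those two units deviates more from the reference speaker's.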
  • Publication number: 20110184724
    Abstract: Presented is a method and system for speech recognition. The method includes determining noise level in an environment, comparing the determined noise level with a predetermined noise level threshold value, using a first set of grammar for speech recognition, if the determined noise level is below the predetermined noise level threshold value, and using a second set of grammar for speech recognition, if the determined noise level is above the predetermined noise level threshold value.
    Type: Application
    Filed: April 6, 2010
    Publication date: July 28, 2011
    Inventor: Amit RANJAN
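The grammar-switching logic in this abstract reduces to a single threshold comparison. In the sketch below, the threshold value and the grammar names are illustrative assumptions; the abstract specifies only that one grammar set is used below the threshold and another above it:

```python
def select_grammar(noise_level_db, threshold_db=50.0,
                   quiet_grammar="large_vocabulary",
                   noisy_grammar="restricted_commands"):
    """Pick a grammar set by comparing the measured noise level to a
    predetermined threshold (all parameter values are assumptions)."""
    if noise_level_db < threshold_db:
        return quiet_grammar
    return noisy_grammar
```

A typical design rationale: in quiet conditions a large, flexible grammar is affordable, while in noise a smaller, more constrained grammar keeps recognition accuracy acceptable.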
  • Publication number: 20110184736
    Abstract: Automated methods are provided for recognizing inputted information items and selecting information items. The recognition and selection processes are performed by selecting category designations that the information items belong to. The category designations improve the accuracy and speed of the inputting and selection processes.
    Type: Application
    Filed: January 25, 2011
    Publication date: July 28, 2011
    Inventor: Benjamin SLOTZNICK
  • Publication number: 20110184737
    Abstract: A speech recognition apparatus includes a speech input unit that receives input speech, a phoneme recognition unit that recognizes phonemes of the input speech and generates a first phoneme sequence representing corrected speech, a matching unit that matches the first phoneme sequence with a second phoneme sequence representing original speech, and a phoneme correcting unit that corrects phonemes of the second phoneme sequence based on the matching result.
    Type: Application
    Filed: January 27, 2011
    Publication date: July 28, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Mikio NAKANO, Naoto IWAHASHI, Kotaro FUNAKOSHI, Taisuke SUMII
  • Publication number: 20110178801
    Abstract: A system for access to multimedia structures has telephone sets capable of connecting to a telephone network, a storage device capable of storing a plurality of multimedia structures representing messages and/or data and/or commands, and a network access server that can be associated with the telephone sets and is capable of selectively instantiating the multimedia structures via an interconnection network. There is also a voice-recognition and speech-synthesis system that can be associated with the network access server and that comprises modules for reading files in XML format and for processing the files so as to obtain files in a format that can be synthesized by a speech synthesizer.
    Type: Application
    Filed: January 14, 2011
    Publication date: July 21, 2011
    Applicants: TELECOM ITALIA S.P.A., LOQUENDO S.P.A.
    Inventors: Pierpaolo Anselmetti, Mauro Cociglio, Simone Toniolo, Diego Zanin, Nadia Zerba
  • Publication number: 20110179304
    Abstract: One example embodiment includes a method for providing multi-tenancy in a computing environment. The method includes receiving a script in a computing environment, where the script includes one or more actions to be completed by the computing environment. The method further includes providing one or more computing resources in the computing environment and building an action list for the one or more computing resources, where the action list is a data structure that contains a list of one or more actions to be executed by the one or more computing resources. The method further includes transmitting a first action to one of the one or more computing resources, where the first action is one of the one or more actions. The method further includes executing the first action in the one of the one or more computing resources and indicating to the action list the completion of the first action.
    Type: Application
    Filed: January 15, 2010
    Publication date: July 21, 2011
    Applicant: INCONTACT, INC.
    Inventor: David Owen Peterson
  • Patent number: 7983912
    Abstract: A speech recognition apparatus includes a generation unit generating a recognition candidate associated with a speech utterance and a likelihood; a storing unit storing the recognition candidate; a selecting unit selecting the recognition candidate as a recognition result of a first speech utterance; an utterance relation determining unit determining whether a second speech utterance which is input after the input of the first speech utterance is a speech re-utterance of a whole of the first speech utterance or a speech re-utterance of a part of the first speech utterance; a whole correcting unit correcting the recognition candidate of the whole of the first speech utterance when the second speech utterance is the whole of the first speech utterance; and a part correcting unit correcting the recognition candidate for the part of the first speech utterance when the second speech utterance is the part of the first speech utterance.
    Type: Grant
    Filed: March 15, 2006
    Date of Patent: July 19, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Hideki Hirakawa, Tetsuro Chino
  • Publication number: 20110168499
    Abstract: In a destination floor registration device 800, a boarding detection unit 804 detects a boarding of an elevator user; when no button operation by destination floor registration buttons 500 is performed after a predetermined period of time has elapsed since detection of the boarding, a voice destination floor registration unit 805 outputs from a voice output device 400 a message for prompting a passenger to pronounce a destination floor; a voice recognition unit 806 recognizes the destination floor pronounced by the passenger; and a voice destination floor registration unit 805 requests an elevator car control device 200 to register the destination floor recognized by the voice recognition unit 806.
    Type: Application
    Filed: October 3, 2008
    Publication date: July 14, 2011
    Applicant: MITSUBISHI ELECTRIC CORPORATION
    Inventor: Nobukazu Takeuchi
  • Publication number: 20110172999
    Abstract: A system, method and computer-readable medium for practicing a method of emotion detection during a natural language dialog between a human and a computing device are disclosed. The method includes receiving an utterance from a user in a natural language dialog, receiving contextual information regarding the natural language dialog which is related to changes of emotion over time in the dialog, and detecting an emotion of the user based on the received contextual information. Examples of contextual information include, for example, differential statistics, joint statistics and distance statistics.
    Type: Application
    Filed: March 21, 2011
    Publication date: July 14, 2011
    Applicant: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Jackson J. Liscombe, Guiseppe Riccardi
  • Publication number: 20110173000
    Abstract: A word category estimation apparatus (100) includes a word category model (5) which is formed from a probability model having a plurality of kinds of information about a word category as features, and includes information about an entire word category graph as at least one of the features. A word category estimation unit (4) receives the word category graph of a speech recognition hypothesis to be processed, computes scores by referring to the word category model for respective arcs that form the word category graph, and outputs a word category sequence candidate based on the scores.
    Type: Application
    Filed: December 19, 2008
    Publication date: July 14, 2011
    Inventors: Hitoshi Yamamoto, Miki Kiyokazu
  • Publication number: 20110173002
    Abstract: A storage unit stores a correspondence between a voice command and a display mode modification operation. When a control unit determines that a vehicle is traveling according to a traveling state of the vehicle obtained by a traveling state acquisition unit, when a voice recognition unit recognizes a voice, which is uttered by a user and received by a voice input unit, and when the control unit determines that the recognized voice corresponds to a voice command stored in the storage unit, the control unit performs a display mode change operation corresponding to the voice command and modifies a display mode of an icon indicated on an indication screen of an indication unit.
    Type: Application
    Filed: January 10, 2011
    Publication date: July 14, 2011
    Applicant: DENSO CORPORATION
    Inventors: Masahiro FUJII, Yuji SHINKAI
  • Publication number: 20110166857
    Abstract: A human voice distinguishing method and device are provided. The method involves: taking every n sampling points of the current frame of audio signals as one subsection, wherein n is a positive integer, judging whether two adjacent subsections have transition relative to a distinguishing threshold, wherein the sliding maximum absolute value of the two adjacent subsections is more and less than the distinguishing threshold respectively, if so, then determining the current frame to be human voice, where the sliding maximum absolute value of the subsection is obtained by the following method: taking the maximum value of absolute intensity of every sampling point in this subsection as the initial maximum absolute value of this subsection, and taking the maximum value of the initial maximum absolute value of this subsection and m subsections following this subsection as the sliding maximum absolute value of this subsection, wherein m is a positive integer.
    Type: Application
    Filed: September 15, 2009
    Publication date: July 7, 2011
    Applicant: ACTIONS SEMICONDUCTOR CO. LTD.
    Inventors: Xiangyong Xie, Zhan Chen
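The sliding-maximum computation in this abstract could be sketched as follows. The helper names and the strict-inequality threshold comparison are assumptions; the abstract's rule is that a frame is judged human voice when two adjacent subsections sit on opposite sides of the distinguishing threshold:

```python
def sliding_max_abs(samples, n, m):
    """Per-subsection sliding maximum absolute value: split the frame into
    subsections of n samples, take each subsection's maximum absolute sample
    value, then take the maximum over that subsection and the m following."""
    subsections = [samples[i:i + n] for i in range(0, len(samples), n)]
    initial = [max(abs(s) for s in sub) for sub in subsections]
    return [max(initial[i:i + m + 1]) for i in range(len(initial))]

def is_human_voice(samples, n, m, threshold):
    """Judge the frame as human voice when two adjacent subsections have a
    transition: one sliding maximum above the threshold, the other below."""
    sliding = sliding_max_abs(samples, n, m)
    for a, b in zip(sliding, sliding[1:]):
        if (a > threshold) != (b > threshold):
            return True
    return False
```

Intuitively, a frame containing a burst of energy against a quieter background produces such a transition, while uniformly quiet (or uniformly loud) frames do not.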
  • Publication number: 20110166860
    Abstract: Systems and methods are disclosed to operate a mobile device by capturing user input; transmitting the user input over a wireless channel to an engine, analyzing at the engine music clip or video in a multimedia data stream and sending an analysis wirelessly to the mobile device.
    Type: Application
    Filed: July 12, 2010
    Publication date: July 7, 2011
    Inventor: Bao Q. Tran
  • Publication number: 20110166862
    Abstract: A method and system for altering an operational mode of evaluating and responding to verbal input from a user to a mobile device if conditions make such evaluation incompatible with a favorable user experience. Automated speech recognition (ASR) evaluation of verbal input may be performed on a mobile platform to continue a flow of the user experience. Evaluation of the verbal input may continue at a backend when conditions allow for transmission of recorded input to the backend.
    Type: Application
    Filed: January 4, 2011
    Publication date: July 7, 2011
    Inventors: Eyal ESHED, Ariel Velikovsky, Sherrie Ellen Shammass
  • Publication number: 20110166855
    Abstract: In one embodiment the present invention includes a method comprising receiving an acoustic input signal and processing the acoustic input signal with a plurality of acoustic recognition processes configured to recognize the same target sound. Different acoustic recognition processes start processing different segments of the acoustic input signal at different time points in the acoustic input signal. In one embodiment, initial states in the recognition processes may be configured on each time step.
    Type: Application
    Filed: July 6, 2010
    Publication date: July 7, 2011
    Applicant: SENSORY, INCORPORATED
    Inventors: Pieter J. Vermeulen, Jonathan Shaw, Todd F. Mozer
  • Publication number: 20110161076
    Abstract: A smart phone senses audio, imagery, and/or other stimulus from a user's environment, and acts autonomously to fulfill inferred or anticipated user desires. In one aspect, the detailed technology concerns phone-based cognition of a scene viewed by the phone's camera. The image processing tasks applied to the scene can be selected from among various alternatives by reference to resource costs, resource constraints, other stimulus information (e.g., audio), task substitutability, etc. The phone can apply more or less resources to an image processing task depending on how successfully the task is proceeding, or based on the user's apparent interest in the task. In some arrangements, data may be referred to the cloud for analysis, or for gleaning. Cognition, and identification of appropriate device response(s), can be aided by collateral information, such as context. A great number of other features and arrangements are also detailed.
    Type: Application
    Filed: June 9, 2010
    Publication date: June 30, 2011
    Inventors: Bruce L. Davis, Tony F. Rodriguez, William Y. Conwell, Geoffrey B. Rhoads
  • Publication number: 20110161082
    Abstract: A method for assessing a performance of a speech recognition system may include determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor. An apparatus for assessing a performance of a speech recognition system may include a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and wherein the grade is based on a recognition rate and at least one recognition factor.
    Type: Application
    Filed: March 9, 2011
    Publication date: June 30, 2011
    Inventors: Keith Braho, Jeffrey Pike, Amro El-Jaroudi, Lori Pike, Michael Laughery
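A minimal sketch of the grading idea: combine a recognition rate with one recognition factor (here, the number of observed instances) into a grade. The thresholds and the choice of factor are assumptions; the patent leaves both open.

```python
# Hedged sketch: grade per-word recognition from rate plus one factor.
def grade_word(recognition_rate: float, n_instances: int,
               min_instances: int = 20) -> str:
    """Grade indicates performance level; too few instances is inconclusive."""
    if n_instances < min_instances:
        return "insufficient data"
    if recognition_rate >= 0.95:
        return "good"
    if recognition_rate >= 0.85:
        return "fair"
    return "poor"
```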
  • Publication number: 20110161075
    Abstract: A method and apparatus for implementation of real-time speech recognition using a handheld computing apparatus are provided. The handheld computing apparatus receives an audio signal, such as a user's voice. The handheld computing apparatus ultimately transmits the voice data to a remote or distal computing device with greater processing power and operating a speech recognition software application. The speech recognition software application processes the signal and outputs a set of instructions for implementation either by the computing device or the handheld apparatus. The instructions can include a variety of items including instructing the presentation of a textual representation of dictation, or a function or command to be executed by the handheld device (such as linking to a website, opening a file, cutting, pasting, saving, or other file menu type functionalities), or by the computing device itself.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 30, 2011
    Inventor: Eric Hon-Anderson
  • Publication number: 20110161083
    Abstract: A method for assessing a performance of a speech recognition system may include determining a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and the grade is based on a recognition rate and at least one recognition factor. An apparatus for assessing a performance of a speech recognition system may include a processor that determines a grade, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, wherein the grade indicates a level of the performance of the system and wherein the grade is based on a recognition rate and at least one recognition factor.
    Type: Application
    Filed: March 9, 2011
    Publication date: June 30, 2011
    Inventors: Keith Braho, Jeffrey Pike, Amro El-Jaroudi, Lori Pike, Michael Laughery
  • Publication number: 20110161077
    Abstract: A method of and system for accurately determining a caller response by processing speech-recognition results and returning that result to a directed-dialog application for further interaction with the caller. Multiple speech-recognition engines are provided that process the caller response in parallel. Returned speech-recognition results comprising confidence-score values and word-score values from each of the speech-recognition engines may be modified based on context information provided by the directed-dialog application and grammars associated with each speech-recognition engine. An optional context database may be used to further reduce or add weight to confidence-score values and word-score values, remove phrases and/or words, and add phrases and/or words to the speech-recognition engine results. In situations where a predefined threshold-confidence-score value is not exceeded, a new dynamic grammar may be created.
    Type: Application
    Filed: December 30, 2010
    Publication date: June 30, 2011
    Inventor: Gregory J. Bielby
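The parallel-engine flow above can be sketched as score combination: each engine returns a hypothesis with a confidence score, context information adjusts the scores, and a sub-threshold best result signals a fallback (the patent's new dynamic grammar). The boost and threshold values are assumptions.

```python
# Hedged sketch: merge parallel engine results under a context boost.
def pick_result(engine_results, context_words, boost=0.1, threshold=0.6):
    """engine_results: list of (phrase, confidence). Returns phrase or None."""
    best_phrase, best_score = None, -1.0
    for phrase, score in engine_results:
        if any(w in phrase.split() for w in context_words):
            score = min(1.0, score + boost)   # context adds weight
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_score < threshold:
        return None   # caller would build a new dynamic grammar here
    return best_phrase
```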
  • Publication number: 20110151782
    Abstract: The present invention relates to a system and a method for communication with a hands-free profile. The method includes the steps of: providing illumination devices on which microphones and speakers are respectively disposed; providing a detection mechanism for detecting the position of the specific illumination device corresponding to the position of a user; and, when a communication device receives an incoming telegram signal: sounding a preset ring through the speakers; and transmitting the sound from the communication device to the speaker of the specific illumination device and the sound received by the microphone of the specific illumination device back to the communication device, so that the communication is carried out hands-free.
    Type: Application
    Filed: August 18, 2010
    Publication date: June 23, 2011
    Inventor: Chun-I SUN
  • Publication number: 20110153323
    Abstract: A method and system are provided that control an external output function of a mobile device according to control interactions received via the microphone. The method includes activating a microphone according to preset optional information when the mobile device enters an external output mode, performing an external output operation in the external output mode, detecting an interaction based on sound information in the external output mode, and controlling the external output according to the interaction.
    Type: Application
    Filed: December 10, 2010
    Publication date: June 23, 2011
    Applicant: SAMSUNG ELECTRONICS CO. LTD.
    Inventors: Hee Woon KIM, Si Hak JANG
  • Publication number: 20110153328
    Abstract: Provided is an obscene content analysis apparatus and method. The obscene content analysis apparatus includes a content input unit that receives content, an input data buffering unit that buffers the received content, wherein buffering is performed on content corresponding to a length of a previously set analysis section or a length longer than the analysis section, an obscenity analysis determining unit that determines whether or not the analysis section of audio data extracted from the buffered content is obscene by using a previously generated audio-based obscenity determining model and marks the analysis section with an obscenity mark when the analysis section is determined as obscene, a reproduction data buffering unit that accumulates and stores content in which obscenity has been determined by the obscenity analysis determining unit, and a content reproducing unit that reproduces the content while blocking the analysis section marked with the obscenity mark.
    Type: Application
    Filed: November 17, 2010
    Publication date: June 23, 2011
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Jae Deok LIM, Seung Wan HAN, Byeong Cheol CHOI, Byung Ho CHUNG
  • Publication number: 20110153321
    Abstract: Systems and methods for detecting features in spoken speech and processing speech sounds based on the features are provided. One or more features may be identified in a speech sound. The speech sound may be modified to enhance or reduce the degree to which the feature affects the sound ultimately heard by a listener. Systems and methods according to embodiments of the invention may allow for automatic speech recognition devices that enhance detection and recognition of spoken sounds, such as by a user of a hearing aid or other device.
    Type: Application
    Filed: July 2, 2009
    Publication date: June 23, 2011
    Applicant: THE BOARD OF TRUSTEES OF THE UNIVERSITY OF ILLINOIS
    Inventors: Jont B. Allen, Feipeng LI
  • Publication number: 20110153326
    Abstract: A system and method for extracting acoustic features and speech activity on a device and transmitting them in a distributed voice recognition system. The distributed voice recognition system includes a local VR engine in a subscriber unit and a server VR engine on a server. The local VR engine comprises a feature extraction (FE) module that extracts features from a speech signal, and a voice activity detection module (VAD) that detects voice activity within a speech signal. The system includes filters, framing and windowing modules, power spectrum analyzers, a neural network, a nonlinear element, and other components to selectively provide an advanced front end vector including predetermined portions of the voice activity detection indication and extracted features from the subscriber unit to the server. The system also includes a module to generate additional feature vectors on the server from the received features using a feed-forward multilayer perceptron (MLP) and providing the same to the speech server.
    Type: Application
    Filed: February 9, 2011
    Publication date: June 23, 2011
    Applicant: QUALCOMM INCORPORATED
    Inventors: HARINATH GARUDADRI, HYNEK HERMANSKY, LUKAS BURGET, PRATIBHA JAIN, SACHIN KAJAREKAR, SUNIL SIVADAS, STEPHANE N. DUPONT, MARIA CARMEN BENITEZ ORTUZAR, NELSON H. MORGAN
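The front end described above can be sketched in miniature: frame the signal, compute a per-frame feature (log energy here, standing in for the patent's richer feature set), and flag voice activity so only useful frames are sent to the server. The frame length and VAD threshold are assumptions.

```python
# Hedged sketch: framing plus an energy-based voice activity flag.
import math

def frame_features(samples, frame_len=160, vad_threshold=1e-4):
    """Return per-frame feature dicts; 'voiced' frames would go to the server."""
    frames = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        frames.append({"log_energy": math.log(energy + 1e-12),
                       "voiced": energy > vad_threshold})
    return frames
```

The patent's actual front end uses filters, power spectrum analysis, and a neural network; this captures only the frame-then-gate structure.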
  • Publication number: 20110153322
    Abstract: A dialog management apparatus and method for processing an information-seeking dialogue with a user and providing a service to the user by prompting the user for a task-oriented dialogue may be provided. A hierarchical topic plan in which pieces of information are organized in a hierarchy according to topics corresponding to services may be used to prompt the user to change an information-seeking dialogue to a task-oriented dialogue, and the user may be provided with a service.
    Type: Application
    Filed: October 26, 2010
    Publication date: June 23, 2011
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Byung-Kwan KWAK, Jeong-Mi Cho
  • Publication number: 20110144978
    Abstract: A system and method for providing vocabulary information includes one or more computer processors that, for each of a plurality of words of a text, determine a relevance of the word to the text, and, for each of at least a subset of the plurality of words, output an indication of the respective determined relevance of the word to the text, where, for each of the plurality of words, the determination includes comparing a frequency of the word in the text to a frequency threshold.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Inventor: Marc TINKLER
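The core step of this abstract, comparing each word's frequency in the text to a threshold, is simple enough to sketch directly. The threshold policy and labels are assumptions; the patent leaves the comparison rule unspecified.

```python
# Hedged sketch: word relevance from frequency vs. a threshold.
from collections import Counter

def word_relevance(text, threshold=2):
    """Map each word to a relevance label based on its in-text frequency."""
    counts = Counter(text.lower().split())
    return {word: ("relevant" if count >= threshold else "incidental")
            for word, count in counts.items()}
```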
  • Publication number: 20110144995
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for performing a search. A system configured to practice the method first receives from an automatic speech recognition (ASR) system a word lattice based on a speech query and receives indexed documents from an information repository. The system composes, based on the word lattice and the indexed documents, at least one triple including a query word, a selected indexed document, and a weight. The system generates an N-best path through the word lattice based on the at least one triple and re-ranks the ASR output based on the N-best path. The system aggregates each weight across the query words to generate N-best listings and returns search results to the speech query based on the re-ranked ASR output and the N-best listings. The lattice can be a confusion network, the arc density of which can be adjusted for a desired performance level.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas BANGALORE, Taniya MISHRA
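The weight-aggregation step in this abstract can be sketched as summing per-(word, document) weights across query words to rank documents. The triple layout and scoring details here are assumptions; the patent additionally re-ranks the lattice paths themselves.

```python
# Hedged sketch: aggregate (query_word, doc, weight) triples into a ranking.
from collections import defaultdict

def rank_documents(triples):
    """triples: iterable of (query_word, doc_id, weight); returns docs by score."""
    scores = defaultdict(float)
    for _word, doc, weight in triples:
        scores[doc] += weight       # aggregate each weight across query words
    return sorted(scores, key=scores.get, reverse=True)
```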