Speech Recognition (epo) Patents (Class 704/E15.001)

  • Publication number: 20130144627
    Abstract: A control circuit employed in an electronic device includes a microphone, a level conversion circuit, and a voice processing circuit. The voice processing circuit includes a voice operated switch connected between the microphone and the level conversion circuit. The microphone picks up voice commands; the voice operated switch receives the voice commands from the microphone and outputs a high voltage signal when the volume of the voice commands is greater than or equal to a predetermined volume threshold or is within a predetermined volume range; and the level conversion circuit converts the high voltage signal into a low voltage signal for turning on the electronic device.
    Type: Application
    Filed: March 9, 2012
    Publication date: June 6, 2013
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD.
    Inventor: JIE LI
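A minimal sketch of the thresholding logic this entry describes, assuming digitized, normalized microphone samples; the names (`volume_rms`, `VOLUME_THRESHOLD`, `VOLUME_RANGE`) and the threshold values are illustrative, not from the patent:

```python
import math

VOLUME_THRESHOLD = 0.2     # assumed normalized RMS threshold
VOLUME_RANGE = (0.2, 0.9)  # assumed acceptable volume range

def volume_rms(samples):
    """Root-mean-square volume of a non-empty block of normalized samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def switch_output(samples):
    """Emit a 'high' signal when volume meets the threshold or falls in range."""
    vol = volume_rms(samples)
    in_range = VOLUME_RANGE[0] <= vol <= VOLUME_RANGE[1]
    return "high" if (vol >= VOLUME_THRESHOLD or in_range) else "low"
```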
  • Publication number: 20130144623
    Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to determine and present speaker-related information based on speaker utterances. In one embodiment, the AEFS receives data that represents an utterance of a speaker received by a hearing device of the user, such as a hearing aid, smart phone, media player/device, or the like. The AEFS identifies the speaker based on the received data, such as by performing speaker recognition. The AEFS determines speaker-related information associated with the identified speaker, such as by determining an identifier (e.g., name or title) of the speaker, by locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs the user of the speaker-related information, such as by presenting the speaker-related information on a display of the hearing device or some other device accessible to the user.
    Type: Application
    Filed: December 13, 2011
    Publication date: June 6, 2013
    Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor
  • Publication number: 20130138439
    Abstract: An interactive user interface is described for setting confidence score thresholds in a language processing system. There is a display of a first system confidence score curve characterizing system recognition performance associated with a high confidence threshold, a first user control for adjusting the high confidence threshold and an associated visual display highlighting a point on the first system confidence score curve representing the selected high confidence threshold, a display of a second system confidence score curve characterizing system recognition performance associated with a low confidence threshold, and a second user control for adjusting the low confidence threshold and an associated visual display highlighting a point on the second system confidence score curve representing the selected low confidence threshold. The operation of the second user control is constrained to require that the low confidence threshold must be less than or equal to the high confidence threshold.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 30, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Jeffrey N. Marcus, Amy E. Ulug, William Bridges Smith, JR.
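A sketch of the constraint described in the abstract above, where adjusting either control may never leave the low threshold above the high one; the class shape and clamping policy are assumptions:

```python
class ConfidenceThresholds:
    """Paired recognition thresholds with the invariant low <= high."""

    def __init__(self, low=0.3, high=0.7):
        assert low <= high
        self.low, self.high = low, high

    def set_high(self, value):
        self.high = value
        self.low = min(self.low, value)   # keep the invariant after moving high

    def set_low(self, value):
        self.low = min(value, self.high)  # the second control is constrained, not free
```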
  • Publication number: 20130132082
    Abstract: Methods and systems for recognition of concurrent, superimposed, or otherwise overlapping signals are described. A Markov Selection Model is introduced that, together with probabilistic decomposition methods, enables recognition of simultaneously emitted signals from various sources. For example, a signal mixture may include overlapping speech from different persons. In some instances, recognition may be performed without the need to separate signals or sources. As such, some of the techniques described herein may be useful in automatic transcription, noise reduction, teaching, electronic games, audio search and retrieval, medical and scientific applications, etc.
    Type: Application
    Filed: February 21, 2011
    Publication date: May 23, 2013
    Inventor: Paris Smaragdis
  • Publication number: 20130124193
    Abstract: One embodiment includes a computer implemented method of processing documents. The method includes generating a text analysis task object that includes instructions regarding a document processing pipeline and a document identifier. The method further includes accessing, by a worker system, the text analysis task object and generating the document processing pipeline according to the instructions. The method further includes performing text analysis using the document processing pipeline on a document identified by the document identifier.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 16, 2013
    Applicant: BUSINESS OBJECTS SOFTWARE LIMITED
    Inventor: Greg Holmberg
  • Publication number: 20130110511
    Abstract: A method for customized voice communication comprises receiving a speech signal, retrieving a user account including a user profile corresponding to an identifier of the caller producing the speech signal, and determining whether the user profile includes a speech profile with at least one dialect. If the user profile includes a speech profile, the method further comprises analyzing the speech signal with a speech analyzer to classify it into a classified dialect, comparing the classified dialect with each of the dialects in the user profile to select one of them, and using the selected dialect for subsequent voice communication with the user. The selected dialect can be used for subsequent recognition and response speech synthesis. Moreover, a method is described for storing a user's own pronunciation of names and addresses, whereby a user may be greeted by the communication device using their own specific pronunciation.
    Type: Application
    Filed: October 31, 2011
    Publication date: May 2, 2013
    Applicant: TELCORDIA TECHNOLOGIES, INC.
    Inventors: Murray Spiegel, John R. Wullert, II
  • Publication number: 20130103402
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for combining frame and segment level processing, via temporal pooling, for phonetic classification. A frame processor unit receives an input and extracts the time-dependent features from the input. A plurality of pooling interface units generates a plurality of feature vectors based on pooling the time-dependent features and selecting a plurality of time-dependent features according to a plurality of selection strategies. Next, a plurality of segmental classification units generates scores for the feature vectors. Each segmental classification unit (SCU) can be dedicated to a specific pooling interface unit (PIU) to form a PIU-SCU combination. Multiple PIU-SCU combinations can be further combined to form an ensemble of combinations, and the ensemble can be diversified by varying the pooling operations used by the PIU-SCU combinations.
    Type: Application
    Filed: October 25, 2011
    Publication date: April 25, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Sumit CHOPRA, Dimitrios Dimitriadis, Patrick Haffner
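A toy sketch of the temporal-pooling idea in the entry above: frame-level features are pooled several ways, and each pooled vector is scored by its own classifier, one pooling/classifier pair per PIU-SCU combination; the pooling strategies and scorer interface here are assumed for illustration:

```python
import numpy as np

POOLING = {
    "mean": lambda frames: frames.mean(axis=0),
    "max":  lambda frames: frames.max(axis=0),
    "std":  lambda frames: frames.std(axis=0),
}

def ensemble_scores(frames, classifiers):
    """frames: (time, features) array; classifiers: one scorer per pooling strategy.

    Varying the pooling operation across pairs diversifies the ensemble."""
    return {name: clf(POOLING[name](frames)) for name, clf in classifiers.items()}

# usage with dummy linear scorers
frames = np.random.rand(50, 13)  # 50 frames of 13-dim features
scorers = {name: (lambda v: float(v.sum())) for name in POOLING}
print(ensemble_scores(frames, scorers))
```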
  • Publication number: 20130090930
    Abstract: Various embodiments provide techniques for implementing speech recognition for context switching. In at least some embodiments, the techniques can enable a user to switch between different contexts and/or user interfaces of an application via speech commands. In at least some embodiments, a context menu is provided that lists available contexts for an application that may be navigated to via speech commands. In implementations, the contexts presented in the context menu are a subset of a larger set of contexts, filtered based on a variety of context filtering criteria. A user can speak one of the contexts presented in the context menu to cause navigation to a user interface associated with that context.
    Type: Application
    Filed: October 10, 2011
    Publication date: April 11, 2013
    Inventors: Matthew J. Monson, William P. Giese, Daniel J. Greenawalt
  • Publication number: 20130080159
    Abstract: This disclosure relates to systems and methods for proactively determining identification information for a plurality of audio segments within a plurality of broadcast media streams, and providing identification information associated with specific audio portions of a broadcast media stream automatically or upon request.
    Type: Application
    Filed: September 27, 2011
    Publication date: March 28, 2013
    Applicant: GOOGLE INC.
    Inventors: Matthew Sharifi, Ant Oztaskent, Yaroslav Volovich
  • Publication number: 20130080165
    Abstract: Online histogram equalization may be provided. Upon receiving a spoken phrase from a user, a histogram/frequency distribution may be estimated for the spoken phrase according to a prior distribution. The histogram distribution may be equalized and then provided to a spoken language understanding application.
    Type: Application
    Filed: September 24, 2011
    Publication date: March 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Shizen Wang, Yifan Gong
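A compact sketch of one standard reading of the abstract above: equalize the utterance's feature histogram to a prior distribution by quantile matching; the Gaussian prior and function names are assumptions, not from the patent:

```python
import numpy as np

def histogram_equalize(features, prior_samples):
    """Map each feature to the prior's value at the same quantile rank."""
    ranks = np.argsort(np.argsort(features))      # rank within the utterance
    quantiles = (ranks + 0.5) / len(features)
    return np.quantile(prior_samples, quantiles)

# usage: pull shifted/scaled noisy features toward a clean (prior) distribution
noisy = np.random.rand(200) * 3.0 + 1.0
prior = np.random.randn(10_000)                   # assumed Gaussian prior
equalized = histogram_equalize(noisy, prior)
```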
  • Publication number: 20130080169
    Abstract: An audio analysis system includes a terminal apparatus and a host system. The terminal apparatus acquires an audio signal of a sound containing utterances of a user and another person, discriminates between portions of the audio signal corresponding to the utterances of the user and the other person, detects an utterance feature based on the portion corresponding to the utterance of the user or the other person, and transmits utterance information including the discrimination and detection results to the host system. The host system detects a part corresponding to a conversation from the received utterance information, detects portions of the part of the utterance information corresponding to the user and the other person, compares a combination of plural utterance features corresponding to the portions of the part of the utterance information of the user and the other person with relation information to estimate an emotion, and outputs estimation information.
    Type: Application
    Filed: February 10, 2012
    Publication date: March 28, 2013
    Applicant: FUJI XEROX Co., Ltd.
    Inventors: Haruo HARADA, Hirohito YONEYAMA, Kei SHIMOTANI, Yohei NISHINO, Kiyoshi IIDA, Takao NAITO
  • Publication number: 20130080171
    Abstract: In one embodiment, a method receives an acoustic input signal at a speech recognizer configured to recognize the acoustic input signal in an always-on mode. A set of responses based on the recognized acoustic input signal is determined and ranked based on criteria. A computing device determines whether a response should be output based on its ranking. The method selects an output method from a plurality of output methods based on the ranking of the response and outputs the response using that method if it is determined that the response should be output.
    Type: Application
    Filed: September 27, 2011
    Publication date: March 28, 2013
    Applicant: SENSORY, INCORPORATED
    Inventors: Todd F. Mozer, Pieter J. Vermeulen
  • Publication number: 20130080161
    Abstract: According to one embodiment, a speech recognition apparatus includes the following units. The service estimation unit estimates a service being performed by a user, using non-speech information, and generates service information. The speech recognition unit performs speech recognition on speech information in accordance with a speech recognition technique corresponding to the service information. The feature quantity extraction unit extracts a feature quantity related to the service of the user from the speech recognition result. The service estimation unit re-estimates the service by using the feature quantity, and the speech recognition unit performs speech recognition based on the re-estimation result.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 28, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kenji IWATA, Kentaro TORRI, Naoshi UCHIHIRA, Tetsuro CHINO
  • Publication number: 20130066629
    Abstract: The present invention relates to means and methods of classifying speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a predetermined threshold value, and classifying the audio signal as either speech or music.
    Type: Application
    Filed: November 12, 2012
    Publication date: March 14, 2013
    Inventor: Alon Konchitsky
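A minimal sketch of threshold-based speech/music classification in the spirit of the abstract above; the abstract does not name the analyzed components, so the energy-modulation feature and the threshold value are assumptions:

```python
import numpy as np

def classify_audio(frame_energies, threshold=0.6):
    """Speech typically shows stronger frame-to-frame energy modulation than
    music (assumed heuristic); compare the modulation ratio to a threshold."""
    modulation = np.std(frame_energies) / (np.mean(frame_energies) + 1e-9)
    return "speech" if modulation > threshold else "music"
```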
  • Publication number: 20130066637
    Abstract: Information processor 1 includes display unit 30 for displaying an interface screen having function execution key unit 23 indicating a prescribed function for each function type, and interface screen change key unit 22 for switching each function type; interface screen control unit 20 for controlling display switching of the screen on the display unit 30 in response to an input operation signal; interface screen operation history recording unit 110 for recording, as continuous operation information, operation time and operation contents of the function execution key unit 23 or interface screen change key unit 22 in response to the input operation signal; likelihood value providing unit 120 for calculating and adding, to each function the function execution key unit 23 indicates, a likelihood value indicating a degree of a user desire from the continuous operation information recorded; priority recognition word setting unit 130 for outputting word information corresponding to the function whose likelihood value…
    Type: Application
    Filed: August 9, 2010
    Publication date: March 14, 2013
    Applicant: Mitsubishi Electric Corporation
    Inventors: Yusuke Seto, Tadashi Suzuki, Ryo Iwamiya
  • Publication number: 20130060571
    Abstract: A system for integrating local speech recognition with cloud-based speech recognition in order to provide an efficient natural user interface is described. In some embodiments, a computing device determines a direction associated with a particular person within an environment and generates an audio recording associated with the direction. The computing device then performs local speech recognition on the audio recording in order to detect a first utterance spoken by the particular person and to detect one or more keywords within the first utterance. The first utterance may be detected by applying voice activity detection techniques to the audio recording. The first utterance and the one or more keywords are subsequently transferred to a server which may identify speech sounds within the first utterance associated with the one or more keywords and adapt one or more speech recognition techniques based on the identified speech sounds.
    Type: Application
    Filed: September 2, 2011
    Publication date: March 7, 2013
    Applicant: Microsoft Corporation
    Inventors: Thomas M. Soemo, Leo Soong, Michael H. Kim, Chad R. Heinemann, Dax H. Hawkins
  • Publication number: 20130060566
    Abstract: This invention realizes a speech communication system and method, and a robot apparatus, capable of significantly improving entertainment value. A speech communication system with a function to converse with a conversation partner is provided with a speech recognition means for recognizing the speech of the conversation partner, a conversation control means for controlling the conversation with the partner based on the recognition result of the speech recognition means, an image recognition means for recognizing the face of the conversation partner, and a tracking control means for tracking the conversation partner based on one or both of the recognition results of the image recognition means and the speech recognition means. The conversation control means controls the conversation so that it continues in accordance with the tracking by the tracking control means.
    Type: Application
    Filed: November 2, 2012
    Publication date: March 7, 2013
    Inventors: Kazumi AOYAMA, Hideki Shimomura
  • Publication number: 20130060570
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for advanced turn-taking in an interactive spoken dialog system. A system configured according to this disclosure can incrementally process speech prior to completion of the speech utterance, and can communicate partial speech recognition results upon finding particular conditions. A first condition which, if found, allows the system to communicate partial speech recognition results, is that the most recent word found in the partial results is statistically likely to be the termination of the utterance, also known as a terminal node. A second condition is the determination that all search paths within a speech lattice converge to a common node, also known as a pinch node, before branching out again. Upon finding either condition, the system can communicate the partial speech recognition results. Stability and correctness probabilities can also determine which partial results are communicated.
    Type: Application
    Filed: September 1, 2011
    Publication date: March 7, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Jason WILLIAMS, Ethan Selfridge
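A sketch of the second condition in the abstract above (pinch nodes): treating the lattice as a directed graph, a node through which every start-to-end path must pass is exactly a node whose removal disconnects start from end; the graph encoding is assumed for illustration:

```python
from collections import defaultdict, deque

def pinch_nodes(edges, start, end):
    """Nodes every start->end lattice path must pass through."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)

    def end_reachable_without(blocked):
        # BFS from start, skipping the blocked node
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt != blocked and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return end in seen

    nodes = {n for e in edges for n in e} - {start, end}
    return [n for n in nodes if not end_reachable_without(n)]

# usage: two hypotheses converge at 'p' before branching again
edges = [("s", "a"), ("s", "b"), ("a", "p"), ("b", "p"),
         ("p", "c"), ("p", "d"), ("c", "e"), ("d", "e")]
print(pinch_nodes(edges, "s", "e"))  # -> ['p']
```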
  • Publication number: 20130060569
    Abstract: A voice authentication system using a removable voice ID card comprises: at server side, a voiceprint database for storing the voiceprints of all authorized users; a voiceprint updating means for updating the voiceprints in said voiceprint database; and a voiceprint digest generator for generating a voiceprint digest according to a request from a client; at client side, a voice ID card for storing the voiceprint of an authorized user; a validation means for validating the voiceprint in the voice ID card on the basis of the voiceprint digest from the server; an audio device for performing voice interaction with a user; and a voice authentication means for determining whether the voiceprint from said voice ID card is of the same speaker as the voice from said audio device.
    Type: Application
    Filed: March 2, 2012
    Publication date: March 7, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Guo Kang Fu, Feng F. Shi, Zhi Jun Wang, Yu Chen Zhou
  • Publication number: 20130054235
    Abstract: Embodiments of the present invention improve content manipulation systems and methods using speech recognition. In one embodiment, the present invention includes a method comprising configuring a recognizer to recognize utterances in the presence of a background audio signal having particular audio characteristics. A composite signal comprising a first audio signal and a spoken utterance of a user is received by the recognizer, where the first audio signal comprises the particular audio characteristics used to configure the recognizer so that the recognizer is desensitized to the first audio signal. The spoken utterance is recognized in the presence of the first audio signal when the spoken utterance is one of the predetermined utterances. An operation is performed on the first audio signal.
    Type: Application
    Filed: August 24, 2011
    Publication date: February 28, 2013
    Applicant: Sensory, Incorporated
    Inventors: Todd F. Mozer, Jeff Rogers, Pieter J. Vermeulen, Jonathan Shaw
  • Publication number: 20130054243
    Abstract: Provided are an electronic device and a control method that attain a simple interface for using voice recognition. A cellular phone (1) is provided with a voice recognition unit (30), an execution unit (40) that executes a prescribed application, and an OS (50) that controls the voice recognition unit (30) and the execution unit (40). Upon receiving an instruction from the OS (50) to start up the prescribed application, the execution unit (40) assesses whether the start-up instruction was based on a result of voice recognition conducted by the voice recognition unit (30), and selects the content to be processed according to the result of this assessment.
    Type: Application
    Filed: September 28, 2010
    Publication date: February 28, 2013
    Applicant: KYOCERA Corporation
    Inventor: Hajime Ichikawa
  • Publication number: 20130054242
    Abstract: Embodiments of the present invention improve methods of performing speech recognition. In one embodiment, the present invention includes a method comprising receiving a spoken utterance, processing the spoken utterance in a speech recognizer to generate a recognition result, determining consistencies of one or more parameters of component sounds of the spoken utterance, wherein the parameters are selected from the group consisting of duration, energy, and pitch, and wherein each component sound of the spoken utterance has a corresponding value of said parameter, and validating the recognition result based on the consistency of at least one of said parameters.
    Type: Application
    Filed: August 24, 2011
    Publication date: February 28, 2013
    Applicant: SENSORY, INCORPORATED
    Inventors: Jonathan Shaw, Pieter Vermeulen, Stephen Sutton, Robert Savoie
  • Publication number: 20130041665
    Abstract: There are disclosed an electronic device and a method of controlling the electronic device. The electronic device according to an aspect of the present invention includes a display unit, a voice input unit, and a control unit configured to output a plurality of contents through the electronic device, receive a voice command through the voice input unit for performing a command, determine which of the plurality of contents correspond to the received voice command, and perform the command on one or more of the plurality of contents that correspond to the received voice command. According to the present invention, multi-tasking performed in an electronic device can be efficiently controlled through a voice command.
    Type: Application
    Filed: September 23, 2011
    Publication date: February 14, 2013
    Inventors: Seokbok Jang, Jongse Park, Joonyup Lee, Jungkyo Choi
  • Publication number: 20130030802
    Abstract: Techniques for maintaining and supplying a plurality of speech models are provided. A plurality of speech models and metadata for each speech model are stored. A query for a speech model is received from a source. The query includes one or more conditions. The speech model whose metadata most closely matches the supplied conditions is determined and provided to the source. A refined speech model is then received from the source and stored.
    Type: Application
    Filed: July 9, 2012
    Publication date: January 31, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bin Jia, Ying Liu, E. Feng Lu, Jia Wu, Zhen Zhang
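A toy sketch of serving the query described above by picking the stored model whose metadata most closely matches the supplied conditions, scoring by the number of agreeing key/value pairs; the metadata schema is invented for illustration:

```python
def best_model(models, conditions):
    """models: list of (model_id, metadata dict); pick the closest metadata match."""
    def match_score(metadata):
        return sum(1 for k, v in conditions.items() if metadata.get(k) == v)
    return max(models, key=lambda m: match_score(m[1]))

# usage with hypothetical metadata
models = [
    ("model_a", {"language": "en-US", "domain": "automotive", "noise": "high"}),
    ("model_b", {"language": "en-US", "domain": "dictation", "noise": "low"}),
]
print(best_model(models, {"language": "en-US", "noise": "low"}))  # -> model_b
```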
  • Publication number: 20130030808
    Abstract: Systems and methods are provided for scoring non-native speech. Two or more speech samples are received, where each sample is of speech spoken by a non-native speaker in response to a distinct prompt. The two or more samples are concatenated to generate a concatenated response for the non-native speaker, based on the speech samples elicited using the distinct prompts. A concatenated speech proficiency metric is computed from the concatenated response and provided to a scoring model, which generates a speaking score based on the concatenated speech proficiency metric.
    Type: Application
    Filed: July 24, 2012
    Publication date: January 31, 2013
    Inventors: Klaus Zechner, Su-Youn Yoon, Lei Chen, Shasha Xie, Xiaoming Xi, Chaitanya Ramineni
  • Publication number: 20130030803
    Abstract: A microphone-array-based speech recognition system combines a noise-cancelling technique for cancelling noise in input speech signals from an array of microphones according to at least one input threshold. The system receives noise-cancelled speech signals outputted by a noise masking module through at least one speech model and at least one filler model, then computes a confidence measure score with the speech and filler models for each threshold and each noise-cancelled speech signal, and adjusts the threshold to continue the noise cancelling so as to achieve a maximum confidence measure score, thereby outputting the speech recognition result associated with the maximum confidence measure score.
    Type: Application
    Filed: October 12, 2011
    Publication date: January 31, 2013
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventor: Hsien-Cheng Liao
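A sketch of the outer loop the abstract above implies: sweep the noise-cancelling threshold and keep the setting that maximizes the confidence measure score; the `cancel_noise` and `confidence` callables are placeholders standing in for the noise masking module and the speech/filler model scoring:

```python
def best_threshold(signal, cancel_noise, confidence, thresholds):
    """Return the (threshold, score) pair with the highest confidence.

    cancel_noise(signal, t) -> cleaned signal; confidence(cleaned) -> score."""
    scored = ((t, confidence(cancel_noise(signal, t))) for t in thresholds)
    return max(scored, key=lambda pair: pair[1])
```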
  • Publication number: 20130021362
    Abstract: An apparatus includes an input unit, a microphone, a control unit, and a voice recognition unit. The input unit is configured to receive a first type input and a second type input. The microphone is configured to receive an input sound signal. The control unit is configured to control a display to display feedback according to a type of input. The voice recognition unit is configured to perform recognition processing on the input sound signal.
    Type: Application
    Filed: July 10, 2012
    Publication date: January 24, 2013
    Applicant: SONY CORPORATION
    Inventors: Akiko SAKURADA, Osamu Shigeta, Nariaki Sato, Yasuyuki Koga, Kazuyuki Yamamoto
  • Publication number: 20130024197
    Abstract: An electronic device and a method for controlling an electronic device are disclosed. The electronic device includes: a display unit; a voice input unit; and a controller displaying a plurality of contents on the display unit, receiving a voice command for controlling any one of the plurality of contents through the voice input unit, and controlling content corresponding to the received voice command. Multitasking performed by the electronic device can be effectively controlled through a voice command.
    Type: Application
    Filed: October 13, 2011
    Publication date: January 24, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Seokbok JANG, Jongse PARK, Joonyup LEE, Jungkyu CHOI
  • Publication number: 20130013308
    Abstract: An approach is provided for determining a user age range. An age estimator causes, at least in part, acquisition of voice data. Next, the age estimator calculates a first set of probability values, wherein each of the probability values represents a probability that the voice data is in a respective one of a plurality of predefined age ranges, and the predefined age ranges are segments of a lifespan. Then, the age estimator derives a second set of probability values by applying a correlation matrix to the first set of probability values, wherein the correlation matrix associates the first set of probability values with probabilities of the voice data matching individual ages over the lifespan. Then, the age estimator, for each of the predefined age ranges, calculates a sum of the probabilities in the second set of probability values corresponding to the individual ages within the respective predefined age ranges.
    Type: Application
    Filed: March 23, 2010
    Publication date: January 10, 2013
    Applicant: NOKIA CORPORATION
    Inventors: Yang Cao, Feng Ding, Jilei Tian
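A worked sketch of the two-stage calculation in the abstract above, using a hypothetical 100-year lifespan, four example age ranges, and a correlation matrix that simply spreads each range's probability uniformly over its ages (the patent's matrix would be learned, not uniform):

```python
import numpy as np

# hypothetical lifespan of 100 years split into four predefined ranges
ranges = [(0, 18), (18, 40), (40, 65), (65, 100)]
p_range = np.array([0.10, 0.45, 0.35, 0.10])  # first set: per-range probabilities

# assumed correlation matrix: each range's mass spread uniformly over its ages
C = np.zeros((100, len(ranges)))
for j, (lo, hi) in enumerate(ranges):
    C[lo:hi, j] = 1.0 / (hi - lo)

p_age = C @ p_range                            # second set: per-individual-age values
sums = [p_age[lo:hi].sum() for lo, hi in ranges]
print(ranges[int(np.argmax(sums))])            # most probable predefined range
```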
  • Publication number: 20130013289
    Abstract: Provided are a method of extracting an experience-revealing sentence from a blog document and a method of classifying verbs into activity verbs and state verbs in a sentence recorded in a blog document. The method of extracting an experience sentence from a blog document includes generating a sentence classifier using a machine learning algorithm based on grammatical features, and classifying experience sentences that represent actual experiences of users and non-experience sentences that represent no experience in the blog document using the sentence classifier. By classifying sentences in a blog document into experience sentences and non-experience sentences, it is possible to extract experiences that a user has actually had or that have actually happened to a user from the document.
    Type: Application
    Filed: July 7, 2011
    Publication date: January 10, 2013
    Applicant: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
    Inventors: Sung Hyon Myaeng, Keun Chan Park, Yoon Jae Jeong
  • Publication number: 20130013297
    Abstract: A message service method using speech recognition includes a message server recognizing speech transmitted from a transmission terminal and generating and transmitting a recognition result of the speech and N-best results based on a confusion network to the transmission terminal; when a message is selected through the recognition result and the N-best results and an evaluation result reflecting the accuracy of the message is decided, the transmission terminal transmitting the message and the evaluation result to a reception terminal; and the reception terminal displaying the message and the evaluation result.
    Type: Application
    Filed: July 5, 2012
    Publication date: January 10, 2013
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Hwa Jeon SONG, YunKeun Lee, Jeon Gue Park, Jong Jin Kim, Ki-Young Park, Hoon Chung, Hyung-Bae Jeon, Ho Young Jung, Euisok Chung, Jeom Ja Kang, Byung Ok Kang, Sang Kyu Park, Sung Joo Lee, Yoo Rhee Oh
  • Publication number: 20130013319
    Abstract: Methods and apparatus for initiating an action using a voice-controlled human interface. The interface provides a hands-free, voice-driven environment to control processes and applications. According to one embodiment, a method comprises electronically receiving first user input, parsing the first user input to determine whether it contains a command activation statement that cues a voice-controlled human interface to enter a command mode in which a second user input comprising a voice signal is processed to identify at least one executable command and, in response to determining that the first user input comprises the command activation statement, identifying the at least one executable command in the second user input.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Richard Grant, Pedro E. McGregor
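A minimal sketch of the command-activation flow described above: the first input is parsed for an activation statement, and only then is the second input searched for an executable command; the wake phrase and command set are examples, not from the patent:

```python
ACTIVATION = "computer listen"  # example activation statement

def handle(first_input, second_input, commands):
    """Enter command mode only if the first input contains the activation statement."""
    if ACTIVATION not in first_input.lower():
        return None  # stay in passive mode
    spoken = second_input.lower()
    return next((cmd for cmd in commands if cmd in spoken), None)

print(handle("hey computer listen", "please open mail",
             {"open mail", "close window"}))  # -> 'open mail'
```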
  • Patent number: 8352273
    Abstract: There is provided a device for performing interaction between a user and a machine. The device includes a plurality of domains corresponding to a plurality of stages in the interaction. Each of the domains has voice comprehension means which understands the content of the user's voice. The device includes: means for recognizing the user's voice; means for selecting the domain enabling the best voice comprehension results as the current domain; means for referencing task knowledge of the domain and extracting a task correlated to the voice comprehension result; means for obtaining a subtask sequence correlated to the extracted task; means for setting the first subtask of the subtask sequence as the current subtask and updating the domain to which the subtask belongs as the current domain; means for extracting a behavior or subtask end flag correlated to the voice comprehension result and the subtask; and means for causing the machine to execute the extracted behavior.
    Type: Grant
    Filed: July 26, 2006
    Date of Patent: January 8, 2013
    Assignee: Honda Motor Co., Ltd.
    Inventor: Mikio Nakano
  • Publication number: 20130006621
    Abstract: Methods, apparatus, and computer program products for providing a context-based grammar for automatic speech recognition, including creating by a multimodal application a context, the context comprising words associated with user activity in the multimodal application, and supplementing by the multimodal application a grammar for automatic speech recognition in dependence upon the context.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Charles W. Cross, JR., Frank L. Jania
  • Publication number: 20130006631
    Abstract: Environmental recognition systems may improve recognition accuracy by leveraging local and nonlocal features in a recognition target. A local decoder may be used to analyze local features, and a nonlocal decoder may be used to analyze nonlocal features. Local and nonlocal estimates may then be exchanged to improve the accuracy of the local and nonlocal decoders. Additional iterations of analysis and exchange may be performed until a predetermined threshold is reached. In some embodiments, the system may comprise extrinsic information extractors to prevent positive feedback loops from causing the system to adhere to erroneous previous decisions.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 3, 2013
    Applicant: UTAH STATE UNIVERSITY
    Inventors: Jacob Gunther, Todd Moon
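A skeleton of the iterative local/nonlocal exchange suggested by the abstract above, in a turbo-decoding style: each decoder re-estimates using the other's latest output until the change falls below a threshold; the decoder callables (which must accept a `None` prior on the first pass) and the stopping rule are placeholders:

```python
import numpy as np

def iterative_decode(signal, local_step, nonlocal_step, max_iters=10, tol=1e-3):
    """Alternate local/nonlocal decoding, exchanging estimates until they settle.

    local_step(signal, prior) and nonlocal_step(signal, prior) return
    probability vectors; tol is the predetermined stopping threshold."""
    local_est = nonlocal_est = None
    for _ in range(max_iters):
        new_local = local_step(signal, nonlocal_est)
        new_nonlocal = nonlocal_step(signal, new_local)
        if local_est is not None and np.abs(new_local - local_est).max() < tol:
            break
        local_est, nonlocal_est = new_local, new_nonlocal
    return local_est, nonlocal_est
```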
  • Publication number: 20130006620
    Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Stephane H. Maes, Ponani Gopalakrishnan
  • Publication number: 20130006625
    Abstract: A system, method, and computer program product for automatically analyzing multimedia data audio content are disclosed. Embodiments receive multimedia data, detect portions having specified audio features, and output a corresponding subset of the multimedia data and generated metadata. Audio content features including voices, non-voice sounds, and closed captioning, from downloaded or streaming movies or video clips are identified as a human probably would do, but in essentially real time. Particular speakers and the most meaningful content sounds and words and corresponding time-stamps are recognized via database comparison, and may be presented in order of match probability. Embodiments responsively pre-fetch related data, recognize locations, and provide related advertisements. The content features may be also sent to search engines so that further related content may be identified. User feedback and verification may improve the embodiments over time.
    Type: Application
    Filed: June 28, 2011
    Publication date: January 3, 2013
    Applicant: Sony Corporation
    Inventors: Priyan Gunatilake, Djung Nguyen, Abhishek Patil, Dipendu Saha
  • Publication number: 20130006629
    Abstract: The present invention relates to a searching device, searching method, and program whereby searching for a word string corresponding to input voice can be performed in a robust manner. A voice recognition unit 11 subjects an input voice to voice recognition. A matching unit 16 matches, for each of multiple search-result word strings (the word strings that are to serve as search results for the input voice), a search-result pronunciation symbol string, which is an array of pronunciation symbols expressing the pronunciation of that word string, against a recognition-result pronunciation symbol string, which is an array of pronunciation symbols expressing the pronunciation of the voice recognition result of the input voice.
    Type: Application
    Filed: December 2, 2010
    Publication date: January 3, 2013
    Applicant: SONY CORPORATION
    Inventors: Hitoshi Honda, Yoshinori Maeda, Satoshi Asakawa
  • Publication number: 20120330662
    Abstract: An input supporting system (1) includes a database (10) which accumulates data for a plurality of items therein, an extraction unit (104) which compares, with the data for the items in the database (10), input data which is obtained as a result of a speech recognition process on speech data (D0), and extracts data similar to the input data from the database, and a presentation unit (106) which presents the extracted data as candidates to be registered in the database (10).
    Type: Application
    Filed: January 17, 2011
    Publication date: December 27, 2012
    Applicant: NEC CORPORATION
    Inventor: Masahiro Saikou
  • Publication number: 20120328086
    Abstract: A method and system for improving user satisfaction with a computer system that includes a computer. The computer prompts a user at a user machine to select a language usage pattern preference from at least two language usage pattern preference choices respectively including at least two text passages, each text passage expressing different text. After the prompting, the computer receives from the user machine a language usage pattern preference selected by the user from the at least two language usage pattern preference choices. The computer stores, in a user profile of the user located in a database accessible to the computer, a flag indicative of the selected language usage pattern preference.
    Type: Application
    Filed: September 5, 2012
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventors: Nathan Raymond Hughes, Nishant Srinath Rao, Michelle Ann Uretsky
  • Publication number: 20120330654
    Abstract: A computer implemented method, system, and/or computer program product generates an audio cohort. Audio data from a set of audio sensors is received by an audio analysis engine. The audio data, which is associated with a plurality of objects, comprises a set of audio patterns. The audio data is processed to identify audio attributes associated with the plurality of objects to form digital audio data. This digital audio data comprises metadata that describes the audio attributes of the set of objects. A set of audio cohorts is generated using the audio attributes associated with the digital audio data and cohort criteria, where each audio cohort in the set of audio cohorts is a cohort of accompanied customers in a store, and where processing the audio data identifies a type of zoological creature that is accompanying each of the accompanied customers.
    Type: Application
    Filed: September 6, 2012
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: ROBERT LEE ANGELL, ROBERT R. FRIEDLANDER, JAMES R. KRAEMER
  • Publication number: 20120323576
    Abstract: Event audio data that is based on verbal utterances associated with a pharmaceutical event for a patient may be received. Medical history information associated with the patient may be obtained, based on information included in a medical history repository. At least one text string that matches at least one interpretation of the event audio data may be obtained, based on information included in a pharmaceutical speech repository, information included in a speech accent repository, and a drug matching function, the at least one text string being associated with a pharmaceutical drug. One or more adverse drug event (ADE) alerts may be determined based on matching the at least one text string and medical history attributes associated with the patient against ADE attributes obtained from an ADE repository. An ADE alert report may be generated based on the determined one or more ADE alerts.
    Type: Application
    Filed: June 17, 2011
    Publication date: December 20, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Tao Wang, Bin Zhou
  • Publication number: 20120323577
    Abstract: Methods of automatic speech recognition that accommodate premature enunciation. In one method, a) a user is prompted to input speech, then b) a listening period is initiated to monitor audio via a microphone, such that there is no pause between the end of step a) and the beginning of step b), and then a begin-speaking audible indicator is communicated to the user during the listening period. In another method, a) at least one audio file is played including both a prompt for a user to input speech and a begin-speaking audible indicator, b) a microphone is activated to monitor audio after playing the prompt but before playing the begin-speaking audible indicator in step a), and c) speech is received from the user via the microphone.
    Type: Application
    Filed: June 16, 2011
    Publication date: December 20, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: John J. Correia, Rathinavelu Chengalvarayan, Gaurav Talwar, Xufang Zhao
  • Publication number: 20120323573
    Abstract: A method for scoring non-native speech includes receiving a speech sample spoken by a non-native speaker and performing automatic speech recognition and metric extraction on the speech sample to generate a transcript of the speech sample and a speech metric associated with the speech sample. The method further includes determining whether the speech sample is scorable or non-scorable based upon the transcript and speech metric, where the determination is based on an audio quality of the speech sample, an amount of speech of the speech sample, a degree to which the speech sample is off-topic, whether the speech sample includes speech from an incorrect language, or whether the speech sample includes plagiarized material. When the sample is determined to be non-scorable, an indication of non-scorability is associated with the speech sample. When the sample is determined to be scorable, the sample is provided to a scoring model for scoring.
    Type: Application
    Filed: March 23, 2012
    Publication date: December 20, 2012
    Inventors: Su-Youn Yoon, Derrick Higgins, Klaus Zechner, Shasha Xie, Je Hun Jeon, Keelan Evanini
  • Publication number: 20120316875
    Abstract: Embodiments of the invention provide systems and methods for speech signal handling. Speech handling according to one embodiment of the present invention can be performed via a hosted architecture. An electrical signal representing human speech can be analyzed with an Automatic Speech Recognizer (ASR) hosted on a different server from the media server or other server hosting a service utilizing speech input. Neither server need be located at the same location as the user. The spoken sounds can be accepted as input to, and handled by, a media server which identifies the parts of the electrical signal that contain a representation of speech. This architecture can serve any user who has a web browser and Internet access, on a PC, PDA, cell phone, tablet, or any other computing device.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 13, 2012
    Applicant: Red Shift Company, LLC
    Inventors: JOEL NYQUIST, Matthew Robinson
  • Publication number: 20120316870
    Abstract: A communication unit, a voice input unit, a storage unit, and a processor are included in a communication device. The communication unit enables communication between the device and other communication devices. The voice input unit receives voice signals, which may correspond to a stored speech command and a related operation. The processor detects a match and executes the desired operation. A related communication method is also provided.
    Type: Application
    Filed: August 24, 2011
    Publication date: December 13, 2012
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: YING-CHUAN YU, YING-XIONG HUANG
  • Publication number: 20120316877
    Abstract: A dynamic exponential, feature-based, language model is continually adjusted per utterance by a user, based on the user's usage history. This adjustment of the model is done incrementally per user, over a large number of users, each with a unique history. The user history can include previously recognized utterances, text queries, and other user inputs. The history data for a user is processed to derive features. These features are then added into the language model dynamically for that user.
    Type: Application
    Filed: June 12, 2011
    Publication date: December 13, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Geoffrey Zweig, Shuangyu Chang
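A small sketch of a feature-based exponential (log-linear) language model with a per-user history feature added dynamically, as the abstract above describes; the feature set, weights, and names here are illustrative:

```python
import math

def lm_score(word, prev, weights, history):
    """Log-linear score; 'in_history' is the dynamic per-user feature."""
    feats = {
        ("unigram", word): 1.0,
        ("bigram", prev, word): 1.0,
        ("in_history", word): 1.0 if history.get(word, 0) > 0 else 0.0,
    }
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def lm_probs(vocab, prev, weights, history):
    """Normalize exp(scores) into a distribution over the vocabulary."""
    exp_scores = {w: math.exp(lm_score(w, prev, weights, history)) for w in vocab}
    z = sum(exp_scores.values())
    return {w: s / z for w, s in exp_scores.items()}

# usage: 'call' is boosted because it appears in this user's history
weights = {("unigram", "play"): 0.2, ("in_history", "call"): 1.5}
print(lm_probs(["play", "pause", "call"], "<s>", weights, {"call": 3}))
```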
  • Publication number: 20120316876
    Abstract: A display system, a display device, a control method for the display device, and a voice recognition system are disclosed. A display device according to one embodiment of the present invention can carry out voice recognition on a voice received from at least one speaker through at least one voice input device and display the voice recognition result on the display unit. Accordingly, effective voice recognition is made possible for TV environments, which involve constraints different from those of mobile terminal environments.
    Type: Application
    Filed: September 23, 2011
    Publication date: December 13, 2012
    Inventors: Seokbok Jang, Jongse Park, Joonyup Lee, Jungkyu Choi
  • Publication number: 20120316871
    Abstract: An automatic speech recognition system includes an audio capture component, a speech recognition processing component, and a result processing component which are distributed among two or more logical devices and/or two or more physical devices. In particular, the audio capture component may be located on a different logical device and/or physical device from the result processing component. For example, the audio capture component may be on a computer connected to a microphone into which a user speaks, while the result processing component may be on a terminal server which receives speech recognition results from a speech recognition processing server.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 13, 2012
    Inventors: Detlef Koll, Michael Finke
  • Publication number: 20120316868
    Abstract: Methods and systems are described for changing a communication quality of a communication session based on a meaning of speech data. Speech data exchanged between clients participating in a communication session is parsed. A meaning of the parsed speech data is determined to determine a communication quality of the communication session. An action is performed to change the communication quality of the communication session based on the meaning of the parsed speech data.
    Type: Application
    Filed: August 27, 2012
    Publication date: December 13, 2012
    Inventor: Mona Singh