Speech Recognition Depending On Application Context, E.g., In A Computer, Etc. (EPO) Patents (Class 704/E15.044)
  • Patent number: 9542486
    Abstract: A computer-implemented technique can include receiving a media feed from a speaker computing device representing speech of a speaker user captured by the speaker computing device. The technique can include receiving a plurality of translation requests, each translation request being received from a listener computing device associated with a listener user and corresponding to a request to obtain a translated version of the media feed into a preferred language of the listener user. The technique can include determining the preferred language for each listener user. The technique can include obtaining a machine translated media feed for each of the translation requests, the machine translated media feed corresponding to a translation of the media feed from the source language to the preferred language of the listener user associated with the translation request. The technique can also include outputting the machine translated media feeds to the listener computing devices.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: January 10, 2017
    Assignee: Google Inc.
    Inventors: Alexander Jay Cuthbert, Joshua James Estelle
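The fan-out described above, where one source feed serves many listeners, only requires one machine translation per distinct preferred language. A minimal Python sketch of that grouping step (all function and field names are illustrative, not from the patent):

```python
from collections import defaultdict

def plan_translations(requests):
    """Group listener translation requests by preferred language so the
    source media feed is machine-translated once per distinct language,
    then fanned out to every listener who requested that language.

    `requests` is a list of (listener_id, preferred_language) pairs.
    """
    by_language = defaultdict(list)
    for listener_id, language in requests:
        by_language[language].append(listener_id)
    return dict(by_language)
```

With three listeners, two of whom prefer the same language, the planner yields two translation jobs rather than three.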
  • Patent number: 9398099
    Abstract: An information processing apparatus includes a generation unit configured to generate a first identifier and a second identifier for identifying an external device, a storage unit configured to store the generated first and second identifiers in association with each other, a first processing unit configured to execute processing based on a received first processing request, a transmission unit configured to transmit the first identifier together with a processing result of the first processing unit to the external device, and a second processing unit configured to, if a reception unit receives a second processing request including the first identifier transmitted from the transmission unit, identify the external device by the second identifier corresponding to the first identifier, and execute the processing based on the second processing request.
    Type: Grant
    Filed: September 19, 2011
    Date of Patent: July 19, 2016
    Assignee: Canon Kabushiki Kaisha
    Inventor: Kunimasa Fujisawa
  • Publication number: 20140108010
    Abstract: A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands, initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields and to a set of voice commands.
    Type: Application
    Filed: October 11, 2012
    Publication date: April 17, 2014
    Applicant: INTERMEC IP CORP.
    Inventors: Paul Maltseff, Roger Byford, Jim Logan
  • Publication number: 20130325464
    Abstract: The disclosure provides a method for displaying words. In the method, a speech signal is received. A pitch contour and an energy contour of the speech signal are extracted. Speech recognition is performed on the speech signal to recognize a plurality of words corresponding to the speech signal and determine time alignment information of each of the plurality of words. At least one display parameter of each of the plurality of words is determined according to the pitch contour, the energy contour and the time alignment information of each of the plurality of words. Thus, the plurality of words is integrated into a sentence according to the at least one display parameter of each of the plurality of words. Then, the sentence is displayed on at least one display device.
    Type: Application
    Filed: September 14, 2012
    Publication date: December 5, 2013
    Inventors: Yu-Chen Huang, Che-Kuang Lin
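One plausible reading of the display-parameter step above is that each word's slice of the pitch and energy contours, located via its time alignment, drives how the word is rendered: for instance, higher pitch scales font size and above-average energy toggles a bold flag. The mapping below is an illustrative assumption, not the patent's formula:

```python
def display_params(words, pitch, energy, base_size=12):
    """Assign per-word display parameters from a speech signal's pitch
    and energy contours, using each word's time alignment.

    `words` is a list of (word, start_frame, end_frame) tuples; `pitch`
    and `energy` map integer frame indices to contour values. The size
    and bold rules are hypothetical examples of display parameters.
    """
    styled = []
    max_pitch = max(pitch.values()) or 1
    mean_energy = sum(energy.values()) / len(energy)
    for word, start, end in words:
        frames = range(start, end)
        p = sum(pitch[f] for f in frames) / len(frames)   # mean pitch
        e = sum(energy[f] for f in frames) / len(frames)  # mean energy
        size = round(base_size * (1 + p / max_pitch))
        styled.append({"word": word, "size": size, "bold": e > mean_energy})
    return styled
```

The styled words would then be concatenated into a sentence and sent to the display device.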
  • Publication number: 20130304453
    Abstract: Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.
    Type: Application
    Filed: May 22, 2009
    Publication date: November 14, 2013
    Inventors: Juergen Fritsch, Michael Finke, Detlef Koll, Monika Woszczyna, Girija Yegnanarayanan
  • Publication number: 20130262106
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection of documents terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Eyal HURVITZ, Ezra Daya, Oren Pereg, Moshe Wasserblat
  • Publication number: 20130170752
    Abstract: A system, method, and computer-readable medium are described that implement a resource navigation links tool that receives one or more inputs, extracts information from the inputs into a submission string, submits the submission string to a resource navigation links tool, and receives resource navigation links based on the submission string. Input types may include images, audio clips, and metadata. The input sources may be processed to extract information related to the image source to build the submission string.
    Type: Application
    Filed: December 30, 2011
    Publication date: July 4, 2013
    Inventors: Harshini Ramnath Krishnan, Neel Goyal, Vincent Raemy
  • Publication number: 20130080164
    Abstract: This specification describes technologies relating to recognition of text in various media. In general, one aspect of the subject matter described in this specification can be embodied in methods that include receiving an input signal including data representing one or more words and passing the input signal to a text recognition system that generates a recognized text string based on the input signal. The methods may further include receiving the recognized text string from the text recognition system. The methods may further include presenting the recognized text string to a user and receiving a corrected text string based on input from the user. The methods may further include checking if an edit distance between the corrected text string and the recognized text string is below a threshold. If the edit distance is below the threshold, the corrected text string may be passed to the text recognition system for training purposes.
    Type: Application
    Filed: September 26, 2012
    Publication date: March 28, 2013
    Inventors: Luca Zanolin, Marcus A. Foster, Richard Z. Cohen
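The edit-distance gate in the abstract above is a concrete, implementable check: a user correction is only fed back for training when it stays close to the recognizer's hypothesis, which filters out wholesale rewrites. A sketch using the standard Levenshtein distance (the threshold value is an illustrative assumption):

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def accept_for_training(recognized, corrected, threshold=3):
    """Accept the corrected string as a training pair only when its edit
    distance from the recognized string is below the threshold."""
    return edit_distance(recognized, corrected) < threshold
```

A small typo fix passes the gate, while a complete rewrite of the sentence does not.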
  • Publication number: 20130054238
    Abstract: Input context for a statistical dialog manager may be provided. Upon receiving a spoken query from a user, the query may be categorized according to at least one context clue. The spoken query may then be converted to text according to a statistical dialog manager associated with the category of the query and a response to the spoken query may be provided to the user.
    Type: Application
    Filed: August 29, 2011
    Publication date: February 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Michael Bodell, John Bain, Robert Chambers, Karen M. Cross, Michael Kim, Nick Gedge, Daniel Frederick Penn, Kunal Patel, Edward Mark Tecot, Jeremy C. Waltmunson
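The categorize-then-convert flow above can be pictured as a simple dispatch: a context clue selects which statistical dialog manager handles the spoken query. The clue used here (the foreground application) and all names are illustrative assumptions, not details from the patent:

```python
def pick_dialog_model(context, models, default="general"):
    """Select the statistical dialog model for a spoken query from at
    least one context clue. Here the clue is the foreground application;
    the chosen model would then convert the speech to text and produce
    a response. The field names and model table are hypothetical.
    """
    clue = context.get("foreground_app")
    return models.get(clue, models[default])
```

Queries issued while a mapping application is active would thus be decoded by a model biased toward place names, while other queries fall back to a general model.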
  • Publication number: 20130041664
    Abstract: A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment and the text-segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content.
    Type: Application
    Filed: October 17, 2012
    Publication date: February 14, 2013
  • Publication number: 20130013306
    Abstract: A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 10, 2013
    Applicant: eScription Inc.
    Inventors: Roger S. Zimmerman, Paul Egerman, George Zavaliagkos
  • Publication number: 20120330659
    Abstract: An information processing device includes a display data creating unit configured to create display data including characters representing the content of an utterance based on a sound and a symbol surrounding the characters and indicating a first direction, and an image combining unit configured to determine the position of the display data based on a display position of an image representing a sound source of the utterance, and to combine the display data and the image of the sound source so that an orientation in which the sound is radiated is matched with the first direction.
    Type: Application
    Filed: June 21, 2012
    Publication date: December 27, 2012
    Applicant: HONDA MOTOR CO., LTD.
    Inventor: Kazuhiro NAKADAI
  • Publication number: 20120310623
    Abstract: An item of information (212) is transmitted to a distal computer (220), translated to a different sense modality and/or language (222) in substantially real time, and the translation (222) is transmitted back to the location (211) from which the item was sent. The device sending the item is preferably a wireless device, and more preferably a cellular or other telephone (210). The device receiving the translation is also preferably a wireless device, and more preferably a cellular or other telephone, and may advantageously be the same device as the sending device. The item of information (212) preferably comprises a sentence of human speech having at least ten words, and the translation is a written expression of the sentence. All of the steps of transmitting the item of information, executing the program code, and transmitting the translated information preferably occur in less than 60 seconds of elapsed time.
    Type: Application
    Filed: August 10, 2012
    Publication date: December 6, 2012
    Inventor: Robert D. Fish
  • Publication number: 20120226499
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted into the corresponding fields based on the data identifiers. Scripts run only on the telnet client, without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user may also provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.
    Type: Application
    Filed: May 9, 2012
    Publication date: September 6, 2012
  • Publication number: 20120212337
    Abstract: An original text that is a representation of a narration of a patient encounter provided by a clinician may be received and re-formatted to produce a formatted text. One or more clinical facts may be extracted from the formatted text. A first fact of the clinical facts may be extracted from a first portion of the formatted text, and the first portion of the formatted text may be a formatted version of a first portion of the original text. A linkage may be maintained between the first fact and the first portion of the original text.
    Type: Application
    Filed: February 18, 2011
    Publication date: August 23, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Frank Montyne, David Decraene, Joeri Van der Vloet, Johan Raedemaeker, Ignace Desimpel, Frederik Coppens, Tom Deray, James R. Flanagan, Mariana Casella dos Santos, Marnix Holvoet, Maria van Gurp, David Hellman, Girija Yegnanarayanan, Karen Anne Doyle
  • Publication number: 20120150538
    Abstract: A textual representation of a voice message is provided to a communication device, such as a mobile phone, for example, when the mobile phone is operating in a silent mode. The voice message is input by a caller and converted to phonemes. A text representation of the voice message is transmitted to the mobile phone. The representation includes characters based on the phonemes, with well-known words represented in an easily understood shorthand format.
    Type: Application
    Filed: February 20, 2012
    Publication date: June 14, 2012
    Applicant: Xerox Corporation
    Inventors: Denys M. Proux, Eric H. Cheminot
  • Publication number: 20120143606
    Abstract: A method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals. Video and audio portions of a video signal are acquired during a time period in which a closed caption occurs. A first text string is extracted from a text portion of a video image, while a second text string is extracted from speech content in the audio portion. A degree of matching between the strings is evaluated against a threshold to determine when a caption error occurs. Various operations may be performed when a caption error occurs, including logging caption error data and sending notifications of the caption error.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 7, 2012
    Inventor: Hung John Pham
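The matching step above can be illustrated with a word-overlap score between the caption text extracted from the video frame and the text recognized from the audio track; the Jaccard-style score and threshold below are one plausible reading, not the patent's stated formula:

```python
def caption_match(ocr_text, asr_text, threshold=0.8):
    """Estimate word-level agreement between the caption text read from
    the video image (OCR) and the text recognized from the audio (ASR).
    Returns (score, error_flag): the flag is set when the score falls
    below the threshold, signalling a likely caption error."""
    ocr_words = set(ocr_text.lower().split())
    asr_words = set(asr_text.lower().split())
    if not ocr_words and not asr_words:
        return 1.0, False
    score = len(ocr_words & asr_words) / len(ocr_words | asr_words)
    return score, score < threshold
```

When the flag is set, the monitor could log the error data and send a notification, as the abstract describes.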
  • Publication number: 20120059651
    Abstract: A mobile communications device includes a network interface for communicating over a wide-area network, an input/output interface for communicating over a PAN, and a display. The communication device also includes one or more processors for executing machine-executable instructions and one or more machine-readable storage media for storing the machine-executable instructions. The instructions, when executed by the one or more processors, implement a voice proximity component, a speech-to-text component, and a user interface. The voice proximity component is configured to select a first user's voice from among a plurality of user voices. The first user's voice belongs to the user who is in closest proximity to the mobile communication device. The speech-to-text component is configured to convert speech received from the first user, but not the other users, to text in real time. The user interface is arranged to display the text on the display as it is received over the PAN from the other mobile communication devices.
    Type: Application
    Filed: September 7, 2010
    Publication date: March 8, 2012
    Inventors: Jonathan Delgado, Alfredo Alvarez Lamela
  • Publication number: 20110320197
    Abstract: The method comprises analyzing the audio content of multimedia files and performing a speech-to-text transcription thereof automatically by means of an ASR process, and selecting acoustic and language models adapted for the ASR process at least before the latter processes the multimedia file, i.e. "a priori". The method is particularly applicable to the automatic indexing, aggregation and clustering of news from different sources and from different types of files, including text, audio and audiovisual documents, without any manual annotation.
    Type: Application
    Filed: April 27, 2011
    Publication date: December 29, 2011
    Applicant: TELEFONICA S.A.
    Inventors: David CONEJERO, Helenca DUXANS, Gregorio ESCALADA
  • Publication number: 20110301951
    Abstract: A questionnaire is presented to a user in a more efficient manner in which the user is more likely to participate. The questionnaire is sent electronically to the user's vehicle and presented audibly to the user. The user responds audibly to the questions in the questionnaire. The user's responses are converted to text and sent back to the provider server for tallying.
    Type: Application
    Filed: June 7, 2011
    Publication date: December 8, 2011
    Inventor: Otman A. Basir
  • Publication number: 20110288861
    Abstract: Disclosed are techniques and systems to provide a narration of a text.
    Type: Application
    Filed: May 18, 2010
    Publication date: November 24, 2011
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman, Lucy Gibson
  • Publication number: 20110246196
    Abstract: A triple factor authentication in one step method and system is disclosed. According to one embodiment, an Integrated Voice Biometrics Cloud Security Gateway (IVCS Gateway) intercepts an access request to a resource server from a user using a user device. IVCS Gateway then authenticates the user by placing a call to the user device and sending a challenge message prompting the user to respond by voice. After receiving the voice sample of the user, the voice sample is compared against a stored voice biometrics record for the user. The voice sample is also converted into a text phrase and compared against a stored secret text phrase. In an alternative embodiment, an IVCS Gateway that is capable of making non-binary access decisions and associating multiple levels of access with a single user or group is described.
    Type: Application
    Filed: March 30, 2011
    Publication date: October 6, 2011
  • Publication number: 20110231379
    Abstract: Techniques described herein generally relate to real time inference based systems. Example embodiments may set forth devices, methods, and computer programs related to search engine inference based virtual assistance. One example method may include a computing device adapted to receive text as input and a computer processor arranged to determine at least one inference regarding subject matter of the text based on one or more web searches of one or more terms within the text. The inference(s) may then be automatically displayed upon the inference(s) being determined. The text may be automatically received as input from a voice-to-text converter as voice-to-text conversion producing the text is occurring.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Inventor: Ezekiel Kruglick
  • Publication number: 20110112832
    Abstract: A media archive comprising a plurality of media resources associated with events that occurred during a time interval is processed to synchronize the media resources. Sequences of patterns are identified in each media resource of the media archive. Elements of the sequences associated with different media resources are correlated such that a set of correlated elements is associated with the same event that occurred in the given time interval. The synchronization information of the processed media resources is represented in a flexible and extensible data format. The synchronization information is used to correct errors occurring in the media resources of a media archive and to enhance processes that identify information in media resources, for example by transcription of audio resources or by optical character recognition of images.
    Type: Application
    Filed: September 30, 2010
    Publication date: May 12, 2011
    Inventors: Michael F. Prorock, Thomas J. Prorock
  • Publication number: 20110029316
    Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing the occurrence of distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.
    Type: Application
    Filed: October 18, 2010
    Publication date: February 3, 2011
    Applicant: Nuance Communications, Inc.
    Inventors: Richard Grant, Pedro E. McGregor
  • Publication number: 20100332226
    Abstract: A mobile terminal and a controlling method thereof are disclosed, by which a specific content, and another content associated with the specific content, can be quickly searched for using a user's voice. The present invention includes inputting a voice for a search for a specific content provided to the mobile terminal via a microphone, analyzing a meaning of the inputted voice, searching a memory for at least one content to which a voice name having a meaning associated with the analyzed voice is tagged, and displaying the found at least one content.
    Type: Application
    Filed: June 30, 2010
    Publication date: December 30, 2010
    Applicant: LG ELECTRONICS INC.
    Inventors: In Jik Lee, Jong Keun Youn, Dae Sung Jung, Jae Min Joh, Sun Hwa Cha, Seung Heon Yang, Jae Hoon Yu
  • Publication number: 20100324894
    Abstract: Technologies are generally described for voice to text to voice processing. An audio signal can be preprocessed and translated into text prior to being processed in the textual domain. The text domain processing or subsequent text to voice regeneration can seek to improve clarity, correct grammar, adjust vocabulary level, remove profanity, correct slang, alter dialect, alter accent, or provide other modifications of various oral communication characteristics. The processed text may be translated back into the audio domain for delivery to a listener. The processing at each stage may be driven by a set of objectives and constraints set by the speaker, the listener, a third party, or any combination of explicit or implicit participants. The voice processing may translate the voice content from a specific human language to the same human language with various improvements. The processing may also involve translation into one or more other languages.
    Type: Application
    Filed: June 17, 2009
    Publication date: December 23, 2010
    Inventor: Miodrag Potkonjak
  • Publication number: 20100299135
    Abstract: Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.
    Type: Application
    Filed: May 22, 2009
    Publication date: November 25, 2010
    Inventors: Juergen Fritsch, Michael Finke, Detlef Koll, Monika Woszczyna, Girija Yegnanarayanan
  • Publication number: 20100179811
    Abstract: Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence, which is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to the keywords by increasing the likelihoods of the keywords in the language model. For each potential occurrence of the keywords detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keywords.
    Type: Application
    Filed: January 13, 2010
    Publication date: July 15, 2010
    Applicant: CRIM
    Inventors: Vishwa Nath GUPTA, Gilles BOULIANNE
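The language-model adaptation described above, increasing the likelihoods of the keywords, can be sketched for a simple unigram model as multiply-and-renormalize; the boost factor is an illustrative assumption, not a value from the patent:

```python
def boost_keywords(unigram_probs, keywords, factor=10.0):
    """Adapt a unigram language model to a keyword list by multiplying
    each keyword's probability by `factor` and renormalizing, so the
    recognizer becomes more likely to hypothesize the keywords."""
    boosted = {w: p * (factor if w in keywords else 1.0)
               for w, p in unigram_probs.items()}
    total = sum(boosted.values())
    return {w: p / total for w, p in boosted.items()}
```

A real keyword spotter would apply the same idea to n-gram probabilities, but the renormalization step is the same in spirit.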
  • Publication number: 20100088100
    Abstract: An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
    Type: Application
    Filed: October 2, 2008
    Publication date: April 8, 2010
    Inventor: Aram M. Lindahl
  • Publication number: 20090300003
    Abstract: A keyword input supporting apparatus includes a document acquisition unit that acquires a document having a plurality of components containing text data, a main component selection unit that selects a component having many characters in the text data as a main component, a part-of-speech analysis unit that analyzes the part of speech of the text data contained in the main component and adds a semantic attribute to each word of the text data, a specific name extraction unit that extracts, as a specific name, a word having a predetermined semantic attribute or part of speech from the words, a specific name storage that stores the specific name together with the corresponding semantic attribute, a keyword candidate classification unit that classifies the specific name from the storage as a keyword candidate based on the semantic attribute, and a keyword candidate presentation unit that presents the keyword candidate to a user.
    Type: Application
    Filed: May 27, 2009
    Publication date: December 3, 2009
    Inventors: Masaru Suzuki, Satoshi Kinoshita, Hideo Umeki, Wataru Nakano
  • Publication number: 20090110158
    Abstract: A computer implemented method, apparatus, and computer usable program product for managing a communications session. The process monitors a bandwidth of the communications device in response to detecting an exchange of audio-based messages on a communications device. In response to detecting the bandwidth below a threshold, the process converts a subsequent outgoing audio-based message into an outgoing text-based message and associates a low bandwidth indicator to the outgoing text-based message to form a distinguished text-based message. The process then transmits the distinguished text-based message for receipt by an intended recipient.
    Type: Application
    Filed: October 25, 2007
    Publication date: April 30, 2009
    Inventors: Yen-Fu Chen, Fabian F. Morgan, Keith Raymond Walker, Sarah Vijoya White Eagle
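The bandwidth-triggered conversion above is a small piece of routing logic: below a threshold, the outgoing audio message becomes a text message tagged with a low-bandwidth indicator. In this sketch the threshold, the field names, and the stand-in transcriber are all illustrative assumptions:

```python
def route_message(audio_message, bandwidth_kbps, threshold_kbps=64,
                  transcribe=lambda audio: audio.upper()):
    """Route an outgoing audio-based message. When the monitored
    bandwidth falls below the threshold, convert it to a text-based
    message and attach a low-bandwidth indicator so the recipient can
    distinguish it; otherwise send the audio unchanged. `transcribe`
    stands in for a real speech-to-text call."""
    if bandwidth_kbps < threshold_kbps:
        return {"type": "text", "body": transcribe(audio_message),
                "low_bandwidth": True}
    return {"type": "audio", "body": audio_message, "low_bandwidth": False}
```

The indicator travels with the message, matching the abstract's "distinguished text-based message".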
  • Publication number: 20090055174
    Abstract: Provided are a method and apparatus for automatically completing a text input using speech recognition. The method includes: receiving a first part of a text from a user through a text input device; recognizing a speech of the user, which corresponds to the text; and completing a remaining part of the text based on the first part of the text and the recognized speech. Therefore, accuracy of the text input and convenience of the speech recognition can be ensured, and a non-input part of the text can be easily input based on the input part of the text and the recognized speech at a high speed.
    Type: Application
    Filed: December 21, 2007
    Publication date: February 26, 2009
    Inventors: Ick-sang Han, Joong-mi Cho, Yoon-kyung Song, Byung-kwan Kwak, Nam-hoon Kim, Ji-yeun Kim
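The completion step above can be pictured as anchoring the typed prefix inside the recognized utterance and appending the remainder. This simplified sketch ignores partial-word matches and recognition errors, and all names are illustrative:

```python
def complete_text(typed_prefix, recognized_words):
    """Complete the remaining part of a text input from the typed
    prefix and the words recognized from the user's speech: if the
    recognized utterance starts with the prefix, append the rest of
    the utterance; otherwise keep only what the user typed."""
    recognized = " ".join(recognized_words)
    if recognized.lower().startswith(typed_prefix.lower()):
        return typed_prefix + recognized[len(typed_prefix):]
    return typed_prefix  # no match: fall back to the typed text
```

Typing a few characters while speaking the full sentence is enough to recover the whole input.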
  • Publication number: 20090043587
    Abstract: A speech recognition system and method are provided to correctly distinguish among multiple interpretations of an utterance. This system is particularly useful when the set of possible interpretations is large, changes dynamically, and/or contains items that are not phonetically distinctive. The speech recognition system extends the capabilities of mobile wireless communication devices that are voice operated after their initial activation.
    Type: Application
    Filed: October 17, 2008
    Publication date: February 12, 2009
    Applicant: Vocera Communications, Inc.
    Inventor: Robert E. Shostak
  • Publication number: 20090030681
    Abstract: A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information.
    Type: Application
    Filed: July 23, 2007
    Publication date: January 29, 2009
    Inventors: Ashutosh K. Sureka, Sathish K. Subramanian, Sidhartha Basu, Indivar Verma
  • Publication number: 20090024389
    Abstract: A system in one embodiment includes a server associated with a unified messaging system (UMS). The server records speech of a user as an audio data file, translates the audio data file into a text data file, and maps each word within the text data file to a corresponding segment of audio data in the audio data file. A graphical user interface (GUI) of a message editor running on an endpoint associated with the user displays the text data file on the endpoint and allows the user to identify a portion of the text data file for replacement. The server is further operable to record new speech of the user as new audio data and to replace one or more segments of the audio data file corresponding to the portion of the text with the new audio data.
    Type: Application
    Filed: July 20, 2007
    Publication date: January 22, 2009
    Applicant: Cisco Technology, Inc.
    Inventors: Joseph F. Khouri, Laurent Philonenko, Mukul Jain, Shmuel Shaffer
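The word-to-audio mapping described above can be sketched as follows. This is not Cisco's implementation; the timestamps and splice logic are invented to show how a word-level alignment lets edited text drive audio replacement.

```python
# Illustrative sketch: keep a word-level alignment of transcript text to
# audio segments, then splice re-recorded audio in place of the segments
# corresponding to an edited span of text.

# Each entry: (word, start_ms, end_ms) -- timestamps are invented.
alignment = [("please", 0, 400), ("call", 400, 700),
             ("bob", 700, 1000), ("tomorrow", 1000, 1600)]

def replace_span(alignment, first, last, new_words, new_audio_len):
    """Replace words [first..last] with re-recorded audio of new_words."""
    start = alignment[first][1]
    shift = new_audio_len - (alignment[last][2] - start)
    head = alignment[:first]
    # Lay the new words evenly across the new audio segment.
    step = new_audio_len // len(new_words)
    mid = [(w, start + i * step, start + (i + 1) * step)
           for i, w in enumerate(new_words)]
    # Shift everything after the edit by the change in duration.
    tail = [(w, s + shift, e + shift) for (w, s, e) in alignment[last + 1:]]
    return head + mid + tail

updated = replace_span(alignment, 2, 2, ["alice"], 500)
print([w for w, _, _ in updated])  # -> ['please', 'call', 'alice', 'tomorrow']
```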
  • Publication number: 20090024392
    Abstract: A speech recognition dictionary creation support system for efficiently creating and updating a speech recognition dictionary and language model with reduced speech recognition errors, using text data available at low cost. The system comprises a recognition dictionary storage section (105), a language model storage section (106), and an acoustic model storage section (107). A virtual speech recognition section (102) creates virtual speech recognition result text data from the analyzed text data produced by a text analysis section (101), with reference to the recognition dictionary, language model, and acoustic model, and compares the virtual recognition result text data with the original analyzed text data. An updating section (103) updates the recognition dictionary and language model so that the differences between the two texts are reduced.
    Type: Application
    Filed: February 2, 2007
    Publication date: January 22, 2009
    Applicant: NEC CORPORATION
    Inventor: Takafumi Koshinaka
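The virtual-recognition-and-update loop above can be sketched in a few lines. All names here are invented, and real recognition errors are reduced to a single out-of-vocabulary case for illustration.

```python
# Minimal sketch: "virtually" recognize analyzed text against the current
# dictionary, compare the result with the original text, and update the
# dictionary so the differences shrink.

dictionary = {"weather", "is", "today"}

def virtual_recognize(words, dictionary):
    # Out-of-vocabulary words come back as the <unk> token.
    return [w if w in dictionary else "<unk>" for w in words]

def update_dictionary(words, dictionary):
    recognized = virtual_recognize(words, dictionary)
    # Every position that differs marks a word the dictionary is missing.
    for ref, hyp in zip(words, recognized):
        if ref != hyp:
            dictionary.add(ref)
    return dictionary

text = ["how", "is", "the", "weather", "today"]
update_dictionary(text, dictionary)
print(virtual_recognize(text, dictionary))  # no more <unk> tokens
```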
  • Publication number: 20080288249
    Abstract: A method and a system for a speech recognition system (1), comprising an electronic document, which is a speech based document comprising one or more sections of text recognized or transcribed from sections of speech, wherein said sections of speech are dictated by an author and processed by a speech recognizer (4) in the speech recognition system (1) into corresponding sections of text of said speech based document. The method comprises the steps of dynamically creating and adapting sub contexts by said speech recognizer and associating said sub contexts with said sections of text.
    Type: Application
    Filed: December 7, 2006
    Publication date: November 20, 2008
    Inventors: Gerhard Grobauer, Miklos Papai
  • Publication number: 20080275707
    Abstract: A method of providing voice based device management, comprising defining a set of one or more status queries for a device, defining for each of the status queries a respective set of status responses for the device corresponding to the instantaneous status of the device, mapping the status queries to corresponding voice format status queries, and mapping the status responses to corresponding voice format status responses.
    Type: Application
    Filed: May 2, 2005
    Publication date: November 6, 2008
    Inventor: Muthukumar Suriyanarayanan
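The query-to-response mapping above can be sketched as a simple table from voice-format queries to handlers that read the device's instantaneous state. The query phrases and device fields are invented for illustration.

```python
# Hypothetical sketch: define status queries and their possible responses
# for a device, then answer a voice-format query from current state.

device_state = {"power": "on", "toner": "low"}

# Voice-format status query -> handler producing a voice-format response.
status_queries = {
    "what is the power status": lambda s: f"power is {s['power']}",
    "what is the toner level":  lambda s: f"toner is {s['toner']}",
}

def answer(voice_query: str) -> str:
    handler = status_queries.get(voice_query.lower().rstrip("?"))
    return handler(device_state) if handler else "query not recognized"

print(answer("What is the toner level?"))  # -> toner is low
```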
  • Publication number: 20080195393
    Abstract: Dynamically defining a VoiceXML grammar of a multimodal application, implemented with the multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a VoiceXML interpreter, and the method includes loading the X+V page by the multimodal application, from a web server into the multimodal device for execution, the X+V page including one or more VoiceXML grammars in one or more VoiceXML dialogs, including at least one in-line grammar that is declared but undefined; retrieving by the multimodal application a grammar definition for the in-line grammar from the web server without reloading the X+V page; and defining by the multimodal application the in-line grammar with the retrieved grammar definition before executing the VoiceXML dialog containing the in-line grammar.
    Type: Application
    Filed: February 12, 2007
    Publication date: August 14, 2008
    Inventors: Charles W. Cross, Hilary A. Pike, Lisa A. Seacat, Marc T. White
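The deferred-definition idea above can be sketched as follows. A real X+V page would carry the grammar in VoiceXML markup; here a dict stands in for the web server, and all names are invented.

```python
# Hypothetical sketch: a dialog declares an in-line grammar with no body,
# and the application fetches the grammar definition from the server just
# before the dialog runs, without reloading the page.

web_server = {"/grammars/drinks.grxml": "coffee | tea | milk"}

dialogs = {"order-drink": {"grammar": None}}  # declared but undefined

def define_inline_grammar(dialog_name, grammar_url):
    if dialogs[dialog_name]["grammar"] is None:   # still undefined?
        dialogs[dialog_name]["grammar"] = web_server[grammar_url]
    return dialogs[dialog_name]["grammar"]

print(define_inline_grammar("order-drink", "/grammars/drinks.grxml"))
```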
  • Publication number: 20080177547
    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
    Type: Application
    Filed: January 19, 2007
    Publication date: July 24, 2008
    Applicant: Microsoft Corporation
    Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
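The joint association score above can be sketched numerically. The weights and candidate scores below are made-up numbers; the point is only the shape of the combination: weighted acoustic, language-model, and semantic-classification terms scored jointly per (word sequence, class) pair.

```python
# Illustrative sketch of a joint association score combining acoustic,
# language-model, and semantic-classification evidence. All values invented.
import math

def joint_score(acoustic, lm, class_given_words,
                w_ac=1.0, w_lm=0.8, w_cls=1.2):
    return w_ac * acoustic + w_lm * lm + w_cls * math.log(class_given_words)

candidates = [
    # (word sequence, semantic class, acoustic, lm, P(class | words))
    (("book", "a", "flight"), "TRAVEL", -4.0, -2.0, 0.9),
    (("cook", "a", "bite"),   "TRAVEL", -3.8, -5.0, 0.1),
]

# The target pair outscores its competitor under the joint criterion.
best = max(candidates, key=lambda c: joint_score(c[2], c[3], c[4]))
print(best[0])  # -> ('book', 'a', 'flight')
```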
  • Publication number: 20080162136
    Abstract: Methods, apparatus, and computer program products are described for automatic speech recognition (‘ASR’) that include accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar; providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar; receiving, by the multimodal application from the grammar interpreter, interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
    Type: Application
    Filed: January 3, 2007
    Publication date: July 3, 2008
    Inventors: Ciprian Agapi, Soonthorn Ativanichayaphong, Charles W. Cross, Gerald M. McCobb
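The token-driven select/deselect decision above reduces to a small sketch. The token values are taken from the abstract's description; the list items are invented.

```python
# Minimal sketch: interpretation results carry matched words plus a
# semantic interpretation token saying whether to select or deselect
# the corresponding items in the selection list.

selection = set()

def apply_interpretation(matched_words, token, selection):
    """token is 'select' or 'deselect', per the semantic token."""
    if token == "select":
        selection.update(matched_words)
    elif token == "deselect":
        selection.difference_update(matched_words)
    return selection

apply_interpretation(["apples", "pears"], "select", selection)
apply_interpretation(["pears"], "deselect", selection)
print(sorted(selection))  # -> ['apples']
```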
  • Publication number: 20080154593
    Abstract: The present invention can include a speech processing method for providing dictation capabilities to a voice server. The method can include a step of establishing a real-time voice communication session involving a voice interface. Speech for the communication session can be streamed to a remotely located voice server. A real-time stream of text can be received from the voice server. The stream of text can include text that has been speech-to-text converted by the voice server from the streamed speech. The voice server can use an MRCP-based non-halting interface to receive the real-time stream of speech and a delivery interface to deliver real-time text to a designated endpoint.
    Type: Application
    Filed: December 22, 2006
    Publication date: June 26, 2008
  • Publication number: 20080147406
    Abstract: The present solution includes a method for dynamically switching modalities in a dialogue session involving a voice server. In the method, a dialogue session can be established between a user and a speech application. During the dialogue session, the user can interact using an original modality, which is either a speech modality, a text exchange modality, or a multi mode modality that includes a text exchange modality. The speech application can interact using a speech modality. A modality switch trigger can be detected that changes the original modality to a different modality. The modality transition to the second modality can be transparent to the speech application. The speech application can be a standard VoiceXML based speech application that lacks an inherent text exchange capability.
    Type: Application
    Filed: December 19, 2006
    Publication date: June 19, 2008
  • Publication number: 20080147407
    Abstract: The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialogue session involving a speech application. The method establishes a dialogue session between a user and the speech application. During the dialogue session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialogue session. The original modality and the second modality can be different modalities; one including a text exchange modality and another including a speech modality.
    Type: Application
    Filed: December 19, 2006
    Publication date: June 19, 2008
  • Publication number: 20080147395
    Abstract: The present solution includes an automated response method. The method can receive user interactions entered through a real-time text exchange interface. These user interactions with the speech application can be dynamically and automatically converted as necessary into a format consumable by a voice server. A text input API of a voice server can be used to allow the voice server to directly accept text input. Further, automated interactions can be received from the voice server, which are dynamically and automatically converted into a format accepted by the text exchange interface. The text exchange interface can be an off-the-shelf unmodified interface. The speech application can be a VoiceXML based application that lacks an inherent text exchange capability.
    Type: Application
    Filed: December 19, 2006
    Publication date: June 19, 2008
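The conversion layer described above can be sketched in both directions. The field names here are invented; a real system would use the voice server's text input API and deliver prompts through the text exchange interface.

```python
# Hypothetical sketch: typed text from an off-the-shelf chat interface is
# wrapped into input the voice side consumes, and the speech application's
# reply is unwrapped back into plain chat text.

def chat_to_voice_input(chat_text: str) -> dict:
    # Present typed text as if it were a recognition result.
    return {"type": "recognition_result",
            "utterance": chat_text.strip(),
            "confidence": 1.0}  # typed input is unambiguous

def voice_to_chat_output(voice_msg: dict) -> str:
    # A prompt that would have been spoken is delivered as chat text.
    return voice_msg["prompt_text"]

msg = chat_to_voice_input("transfer fifty dollars")
reply = voice_to_chat_output({"prompt_text": "Which account?"})
print(msg["utterance"], "/", reply)  # -> transfer fifty dollars / Which account?
```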
  • Patent number: 7389213
    Abstract: A computer software product is used to create applications for enabling a dialogue between a human and a computer. The software product provides a programming tool that insulates software developers from time-consuming, technically-challenging programming tasks by enabling the developer to specify generalized instructions to a Dialogue Flow Interpreter, which invokes functions to implement a speech application, automatically populating a library with dialogue objects that are available to other applications. The speech applications created through the DFI may be implemented as COM (component object model) objects, and so the applications can be easily integrated into a variety of different platforms. In addition, “translator” object classes are provided to handle specific types of data, such as currency, numeric data, dates, times, string variables, etc. These translator object classes have utility either as part of the DFI library or as a sub-library separate from dialogue implementation.
    Type: Grant
    Filed: January 3, 2006
    Date of Patent: June 17, 2008
    Assignee: Unisys Corporation
    Inventors: Karl Wilmer Scholz, James S. Irwin, Samir Tamri
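The translator object classes mentioned above can be sketched as follows. The class and method names are invented for illustration; the patent's translators are COM objects within the DFI library, not Python classes.

```python
# Illustrative sketch: typed "translator" classes that render dialogue
# data (currency, dates, ...) into spoken form.

class CurrencyTranslator:
    def to_spoken(self, amount: float) -> str:
        dollars, cents = int(amount), round((amount - int(amount)) * 100)
        return f"{dollars} dollars and {cents} cents"

class DateTranslator:
    def to_spoken(self, iso_date: str) -> str:
        year, month, day = iso_date.split("-")
        months = ["January", "February", "March", "April", "May", "June",
                  "July", "August", "September", "October", "November",
                  "December"]
        return f"{months[int(month) - 1]} {int(day)}, {year}"

print(CurrencyTranslator().to_spoken(3.50))      # -> 3 dollars and 50 cents
print(DateTranslator().to_spoken("2008-06-17"))  # -> June 17, 2008
```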
  • Publication number: 20080140418
    Abstract: A method and device for performing some preprocessing on voice transmissions depending upon the intended destination of the transmission. The device includes a receiving component configured to receive a voice signal from a source over a network. The device also includes a processing component configured to determine a destination address associated with the received signal, determine a signal processing algorithm from a plurality of signal processing algorithms based on the determined address, and process the voice signal according to the specified algorithm. The device further includes a delivery component configured to send the processed signal to the associated address.
    Type: Application
    Filed: October 30, 2007
    Publication date: June 12, 2008
    Inventor: Gilad Odinak
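The destination-based dispatch above can be sketched as a routing table from addresses to processing algorithms. The algorithms and hostnames are invented placeholders.

```python
# Illustrative sketch: pick a signal-processing routine based on the
# destination address of the voice transmission.

def noise_suppress(signal):
    return [0.9 * s for s in signal]

def amplify(signal):
    return [2.0 * s for s in signal]

# Destination address -> preprocessing algorithm.
routing_table = {
    "ivr.example.com":   noise_suppress,
    "human.example.com": amplify,
}

def process_for_destination(signal, destination):
    # Unknown destinations pass through unchanged.
    algorithm = routing_table.get(destination, lambda s: s)
    return algorithm(signal)

print(process_for_destination([1.0, 2.0], "human.example.com"))  # -> [2.0, 4.0]
```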
  • Publication number: 20080134020
    Abstract: A method and system for Extensible Markup Language (XML) application transformation. Specifically, in one embodiment, a method is disclosed for the generation of markup language applications (e.g., a VXML application) for a voice interface process. First, a call flow diagram, which describes the voice interface process, is converted into an XML format. Next, a lookup table of entries in XML is created by mapping a plurality of audio files and their corresponding textual representations to audio states in the call flow. Then, an intermediate application is created in the XML format by merging corresponding entries in the lookup table with the audio states. Finally, the intermediate application is transformed into a second application of a second markup language format that is a static representation of the call flow diagram.
    Type: Application
    Filed: October 23, 2007
    Publication date: June 5, 2008
    Inventor: Ramy M. Adeeb
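The merge step above can be sketched as a join between audio states and lookup-table entries. All identifiers and filenames are invented; a real implementation would operate on XML documents rather than dicts.

```python
# Minimal sketch: audio states from the call-flow XML are joined with a
# lookup table mapping prompt ids to audio files and their textual
# representations, producing intermediate-application entries.

lookup_table = {
    "welcome": {"audio": "welcome.wav", "text": "Welcome to the service."},
    "goodbye": {"audio": "goodbye.wav", "text": "Goodbye."},
}

call_flow_states = ["welcome", "goodbye"]

def merge(states, table):
    """Produce one intermediate-application entry per audio state."""
    return [{"state": s, **table[s]} for s in states if s in table]

intermediate = merge(call_flow_states, lookup_table)
print(intermediate[0]["audio"])  # -> welcome.wav
```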
  • Patent number: RE42868
    Abstract: A method and apparatus accesses a database where entries are linked to at least two sets of patterns. One or more patterns of a first set of patterns are recognized within a received signal. The recognized patterns are used to identify entries and compile a list of patterns in a second set of patterns to which those entries are also linked. The list is then used to recognize a second received signal. The received signals may, for example, be voice signals or signals indicating the origin or destination of the received signals.
    Type: Grant
    Filed: October 25, 1995
    Date of Patent: October 25, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: David J. Attwater, Steven J. Whittaker, Francis J. Scahill, Alison D. Simons
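The two-stage lookup above can be sketched with invented data: recognizing a pattern from the first set selects database entries, and the patterns linked to those entries become the grammar for recognizing the second signal.

```python
# Illustrative sketch: database entries linked to two pattern sets,
# here (surname, first-name) pairs. Recognizing a surname compiles the
# first-name grammar used for the second recognition pass.

entries = [
    ("smith", "john"), ("smith", "jane"), ("jones", "john"),
]

def second_grammar(recognized_surname):
    """Compile the second pattern set from entries matching the first."""
    return sorted({first for surname, first in entries
                   if surname == recognized_surname})

print(second_grammar("smith"))  # -> ['jane', 'john']
```

Narrowing the second grammar this way is what makes the second recognition pass tractable when the full database is large.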