Speech Recognition Depending On Application Context, E.g., In A Computer, Etc. (EPO) Patents (Class 704/E15.044)
  • Patent number: 9398099
    Abstract: An information processing apparatus includes a generation unit configured to generate a first identifier and a second identifier for identifying an external device, a storage unit configured to store the generated first and second identifiers in association with each other, a first processing unit configured to execute processing based on a received first processing request, a transmission unit configured to transmit the first identifier together with a processing result of the first processing unit to the external device, and a second processing unit configured to, if a reception unit receives a second processing request including the first identifier transmitted from the transmission unit, identify the external device by the second identifier corresponding to the first identifier, and execute the processing based on the second processing request.
    Type: Grant
    Filed: September 19, 2011
    Date of Patent: July 19, 2016
    Assignee: Canon Kabushiki Kaisha
    Inventor: Kunimasa Fujisawa
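The identifier indirection described above amounts to handing the external device an opaque token while keeping the real device identifier server-side. A minimal sketch of that pattern, with all class and method names hypothetical:

```python
import secrets

class RequestRouter:
    """Pairs an opaque public token (first identifier) with an internal
    device identifier (second identifier) kept in storage."""

    def __init__(self):
        self._token_to_device = {}

    def begin_session(self, device_id: str) -> str:
        """First processing request: mint a token, store it against the
        internal device id, and return it with the processing result."""
        token = secrets.token_hex(8)
        self._token_to_device[token] = device_id
        return token

    def handle_followup(self, token: str) -> str:
        """Second processing request carrying the token: resolve the
        internal device id and execute the requested processing."""
        device_id = self._token_to_device[token]
        return f"processing follow-up request for device {device_id}"

router = RequestRouter()
token = router.begin_session("printer-42")  # returned with the first result
print(router.handle_followup(token))
```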
  • Publication number: 20140108010
    Abstract: A voice-enabled document system facilitates execution of service delivery operations by eliminating the need for manual or visual interaction during information retrieval by an operator. Access to voice-enabled documents can facilitate operations for mobile vendors, on-site or field-service repairs, medical service providers, food service providers, and the like. Service providers can access the voice-enabled documents by using a client device to retrieve the document, display it on a screen, and, via voice commands, initiate playback of selected audio files containing information derived from text data objects selected from the document. Data structures that are components of a voice-enabled document include audio playback files and a logical association that links the audio playback files to user-selectable fields, and to a set of voice commands.
    Type: Application
    Filed: October 11, 2012
    Publication date: April 17, 2014
    Applicant: INTERMEC IP CORP.
    Inventors: Paul Maltseff, Roger Byford, Jim Logan
  • Publication number: 20130325464
    Abstract: The disclosure provides a method for displaying words. In the method, a speech signal is received. A pitch contour and an energy contour of the speech signal are extracted. Speech recognition is performed on the speech signal to recognize a plurality of words corresponding to the speech signal and determine time alignment information of each of the plurality of words. At least one display parameter of each of the plurality of words is determined according to the pitch contour, the energy contour and the time alignment information of each of the plurality of words. Thus, the plurality of words is integrated into a sentence according to the at least one display parameter of each of the plurality of words. Then, the sentence is displayed on at least one display device.
    Type: Application
    Filed: September 14, 2012
    Publication date: December 5, 2013
    Applicant: QUANTA COMPUTER INC.
    Inventors: Yu-Chen Huang, Che-Kuang Lin
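The Quanta abstract above ties each word's display parameters to the pitch contour, energy contour, and time alignment. A minimal sketch of one such mapping, with the specific scaling rules invented here for illustration (the patent does not prescribe them):

```python
def display_params(words, pitches, energies):
    """Map each word's pitch/energy to illustrative display parameters:
    font size from relative energy, a high/low tone tag from pitch."""
    max_e, min_e = max(energies), min(energies)
    mean_p = sum(pitches) / len(pitches)
    styled = []
    for word, p, e in zip(words, pitches, energies):
        scale = (e - min_e) / (max_e - min_e or 1.0)
        size = 12 + round(12 * scale)          # 12pt..24pt
        tone = "high" if p > mean_p else "low"
        styled.append({"word": word, "font_size": size, "tone": tone})
    return styled

# Per-word values would come from the pitch/energy contours sampled over
# each word's time alignment; the numbers here are illustrative.
print(display_params(["I", "really", "mean", "it"],
                     [180, 220, 260, 190],      # pitch, Hz
                     [0.4, 0.9, 1.0, 0.5]))     # normalized energy
```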
  • Publication number: 20130304453
    Abstract: Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.
    Type: Application
    Filed: May 22, 2009
    Publication date: November 14, 2013
    Inventors: Juergen Fritsch, Michael Finke, Detlef Koll, Monika Woszczyna, Girija Yegnanarayanan
  • Publication number: 20130262106
    Abstract: A system and method for adapting a language model to a specific environment by receiving interactions captured in the specific environment, generating a collection of documents from documents retrieved from external resources, detecting in the collection of documents terms related to the environment that are not included in an initial language model, and adapting the initial language model to include the detected terms.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Eyal HURVITZ, Ezra Daya, Oren Pereg, Moshe Wasserblat
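The adaptation step above reduces, at its core, to finding frequent terms in the environment-specific document collection that the initial language model's vocabulary lacks. A minimal sketch under that reading, with the tokenizer and frequency threshold as illustrative assumptions:

```python
import re
from collections import Counter

def find_adaptation_terms(documents, lm_vocabulary, min_count=3):
    """Detect environment-specific terms a background language model does
    not cover: count tokens across the retrieved document collection and
    keep frequent ones missing from the LM vocabulary."""
    counts = Counter()
    for doc in documents:
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return {term: n for term, n in counts.items()
            if term not in lm_vocabulary and n >= min_count}

docs = ["the chargeback was disputed", "chargeback codes differ",
        "another chargeback case"]
base_vocab = {"the", "was", "codes", "differ", "another", "case", "disputed"}
print(find_adaptation_terms(docs, base_vocab))   # {'chargeback': 3}
```

A full system would then re-estimate language-model probabilities around the detected terms rather than merely extending the vocabulary.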
  • Publication number: 20130170752
    Abstract: A system, method, and computer-readable medium are described that implement a resource navigation links tool that receives one or more inputs, extracts information from the inputs into a submission string, submits the submission string to a resource navigation links tool, and receives resource navigation links based on the submission string. Input types may include images, audio clips, and metadata. The input sources may be processed to extract information related to the image source to build the submission string.
    Type: Application
    Filed: December 30, 2011
    Publication date: July 4, 2013
    Inventors: Harshini Ramnath Krishnan, Neel Goyal, Vincent Raemy
  • Publication number: 20130080164
    Abstract: This specification describes technologies relating to recognition of text in various media. In general, one aspect of the subject matter described in this specification can be embodied in methods that include receiving an input signal including data representing one or more words and passing the input signal to a text recognition system that generates a recognized text string based on the input signal. The methods may further include receiving the recognized text string from the text recognition system. The methods may further include presenting the recognized text string to a user and receiving a corrected text string based on input from the user. The methods may further include checking if an edit distance between the corrected text string and the recognized text string is below a threshold. If the edit distance is below the threshold, the corrected text string may be passed to the text recognition system for training purposes.
    Type: Application
    Filed: September 26, 2012
    Publication date: March 28, 2013
    Inventors: Luca Zanolin, Marcus A. Foster, Richard Z. Cohen
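The gating step in the abstract above — accept a user correction for training only when its edit distance to the recognized string falls below a threshold — is straightforward to sketch. The threshold value here is an arbitrary placeholder:

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def accept_for_training(recognized: str, corrected: str, threshold: int = 5):
    """Forward the user's correction to the recognizer for training only
    when it stays close to the original hypothesis."""
    return edit_distance(recognized, corrected) < threshold

print(accept_for_training("recieve the package", "receive the package"))  # True
print(accept_for_training("recieve the package", "return to sender"))     # False
```

The intuition behind the gate: small distances suggest a genuine correction of a recognition error, while large distances suggest the user rewrote the text and it should not be fed back as training data.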
  • Publication number: 20130054238
    Abstract: Input context for a statistical dialog manager may be provided. Upon receiving a spoken query from a user, the query may be categorized according to at least one context clue. The spoken query may then be converted to text according to a statistical dialog manager associated with the category of the query and a response to the spoken query may be provided to the user.
    Type: Application
    Filed: August 29, 2011
    Publication date: February 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Michael Bodell, John Bain, Robert Chambers, Karen M. Cross, Michael Kim, Nick Gedge, Daniel Frederick Penn, Kunal Patel, Edward Mark Tecot, Jeremy C. Waltmunson
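Roughly, the Microsoft abstract above routes each spoken query to a category-specific statistical dialog manager chosen from context clues. A toy sketch of the routing shell, with the clue logic and stub recognizers purely illustrative:

```python
# Hypothetical per-category recognizers keyed by query category.
RECOGNIZERS = {
    "navigation": lambda audio: "directions to the airport",
    "media":      lambda audio: "play some jazz",
    "general":    lambda audio: "what time is it",
}

def categorize(context_clues: dict) -> str:
    """Pick a category from simple context clues (foreground app here)."""
    app = context_clues.get("foreground_app", "")
    if app == "maps":
        return "navigation"
    if app == "music":
        return "media"
    return "general"

def handle_spoken_query(audio: bytes, context_clues: dict) -> str:
    category = categorize(context_clues)
    # Convert speech to text with the category-specific model.
    return RECOGNIZERS[category](audio)

print(handle_spoken_query(b"...", {"foreground_app": "maps"}))
```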
  • Publication number: 20130041664
    Abstract: A method and apparatus is provided for annotating video content with metadata generated using speech recognition technology. The method begins by rendering video content on a display device. A segment of speech is received from a user such that the speech segment annotates a portion of the video content currently being rendered. The speech segment is converted to a text-segment and the text-segment is associated with the rendered portion of the video content. The text segment is stored in a selectively retrievable manner so that it is associated with the rendered portion of the video content.
    Type: Application
    Filed: October 17, 2012
    Publication date: February 14, 2013
    Applicant: GENERAL INSTRUMENT CORPORATION
    Inventor: GENERAL INSTRUMENT CORPORATION
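The annotation scheme above boils down to storing speech-derived text keyed by the playback position at which it was spoken, in a selectively retrievable form. A minimal sketch, with the class design hypothetical:

```python
import bisect

class VideoAnnotator:
    """Associates speech-derived text with the playback position at which
    it was spoken, and retrieves it selectively later."""

    def __init__(self):
        self._times = []   # sorted playback timestamps (seconds)
        self._notes = []   # parallel list of text annotations

    def annotate(self, playback_time: float, speech_text: str):
        i = bisect.bisect(self._times, playback_time)
        self._times.insert(i, playback_time)
        self._notes.insert(i, speech_text)

    def notes_between(self, start: float, end: float):
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._notes[lo:hi]))

v = VideoAnnotator()
v.annotate(12.5, "great establishing shot")   # text from a speech recognizer
v.annotate(95.0, "check this transition")
print(v.notes_between(0, 60))   # [(12.5, 'great establishing shot')]
```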
  • Publication number: 20130013306
    Abstract: A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 10, 2013
    Applicant: eScription Inc.
    Inventors: Roger S. Zimmerman, Paul Egerman, George Zavaliagkos
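The eScription abstract above looks for an indicating phrase and then tests whether the proximate text is of the desired type before storing it. A minimal sketch using regular expressions, with the phrase table and patterns invented for illustration:

```python
import re

# Hypothetical indicating phrases mapped to the form the proximate
# text must satisfy for the data to be of the desired type.
EXTRACTORS = {
    "blood pressure": re.compile(r"blood pressure(?: is| of)?\s+(\d{2,3}/\d{2,3})"),
    "temperature":    re.compile(r"temperature(?: is| of)?\s+(\d{2,3}(?:\.\d)?)"),
}

def extract_clinical_data(transcription: str) -> dict:
    """Scan a transcription for indicating phrases and keep the nearby
    value only when it matches the expected form for that data type."""
    found = {}
    text = transcription.lower()
    for label, pattern in EXTRACTORS.items():
        m = pattern.search(text)
        if m:
            found[label] = m.group(1)
    return found

note = "Patient alert. Blood pressure is 120/80, temperature 98.6, no distress."
print(extract_clinical_data(note))
# {'blood pressure': '120/80', 'temperature': '98.6'}
```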
  • Publication number: 20120330659
    Abstract: An information processing device includes a display data creating unit configured to create display data including characters representing the content of an utterance based on a sound and a symbol surrounding the characters and indicating a first direction, and an image combining unit configured to determine the position of the display data based on a display position of an image representing a sound source of the utterance, and to combine the display data and the image of the sound source so that an orientation in which the sound is radiated is matched with the first direction.
    Type: Application
    Filed: June 21, 2012
    Publication date: December 27, 2012
    Applicant: HONDA MOTOR CO., LTD.
    Inventor: Kazuhiro NAKADAI
  • Publication number: 20120310623
    Abstract: An item of information (212) is transmitted to a distal computer (220), translated to a different sense modality and/or language (222) in substantially real time, and the translation (222) is transmitted back to the location (211) from which the item was sent. The device sending the item is preferably a wireless device, and more preferably a cellular or other telephone (210). The device receiving the translation is also preferably a wireless device, and more preferably a cellular or other telephone, and may advantageously be the same device as the sending device. The item of information (212) preferably comprises a sentence of human speech having at least ten words, and the translation is a written expression of the sentence. All of the steps of transmitting the item of information, executing the program code, and transmitting the translated information preferably occur in less than 60 seconds of elapsed time.
    Type: Application
    Filed: August 10, 2012
    Publication date: December 6, 2012
    Inventor: Robert D. Fish
  • Publication number: 20120226499
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.
    Type: Application
    Filed: May 9, 2012
    Publication date: September 6, 2012
    Applicant: WAVELINK CORPORATION
    Inventors: LAMAR JOHN VAN WAGENEN, BRANT DAVID THOMSEN, SCOTT ALLEN CADDES
  • Publication number: 20120212337
    Abstract: An original text that is a representation of a narration of a patient encounter provided by a clinician may be received and re-formatted to produce a formatted text. One or more clinical facts may be extracted from the formatted text. A first fact of the clinical facts may be extracted from a first portion of the formatted text, and the first portion of the formatted text may be a formatted version of a first portion of the original text. A linkage may be maintained between the first fact and the first portion of the original text.
    Type: Application
    Filed: February 18, 2011
    Publication date: August 23, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Frank Montyne, David Decraene, Joeri Van der Vloet, Johan Raedemaeker, Ignace Desimpel, Frederik Coppens, Tom Deray, James R. Flanagan, Mariana Casella dos Santos, Marnix Holvoet, Maria van Gurp, David Hellman, Girija Yegnanarayanan, Karen Anne Doyle
  • Publication number: 20120150538
    Abstract: A textual representation of a voice message is provided to a communication device, such as a mobile phone, for example, when the mobile phone is operating in a silent mode. The voice message is input by a caller and converted to phonemes. A text representation of the voice message is transmitted to the mobile phone. The representation includes characters based on the phonemes, with well-known words represented in an easily understood shorthand format.
    Type: Application
    Filed: February 20, 2012
    Publication date: June 14, 2012
    Applicant: Xerox Corporation
    Inventors: Denys M. Proux, Eric H. Cheminot
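The Xerox abstract above renders a voice message as text built from phonemes, with well-known words collapsed into shorthand. A toy sketch of the shorthand substitution step only, with the table invented for illustration (the phoneme recognition itself is out of scope here):

```python
# Hypothetical shorthand table for well-known words; anything not in it
# would be rendered from its phonemes (approximated here by the word).
SHORTHAND = {"see": "c", "you": "u", "are": "r", "later": "l8r",
             "tonight": "2nite", "before": "b4"}

def to_text_representation(words):
    """Render a voice message compactly: well-known words in shorthand,
    the rest spelled out."""
    return " ".join(SHORTHAND.get(w.lower(), w) for w in words)

# `words` would come from phoneme recognition of the caller's message.
print(to_text_representation("See you tonight before eight".split()))
# -> "c u 2nite b4 eight"
```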
  • Publication number: 20120143606
    Abstract: A method and system for monitoring video assets provided by a multimedia content distribution network includes testing closed captions provided in output video signals. A video and audio portion of a video signal are acquired during a time period that a closed caption occurs. A first text string is extracted from a text portion of a video image, while a second text string is extracted from speech content in the audio portion. A degree of matching between the strings is evaluated based on a threshold to determine when a caption error occurs. Various operations may be performed when the caption error occurs, including logging caption error data and sending notifications of the caption error.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 7, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventor: Hung John Pham
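The caption test above compares two text strings — one extracted from the video image, one recognized from the audio — against a match threshold. A minimal sketch using a generic similarity ratio, with the threshold value an assumption:

```python
import difflib

def caption_error(ocr_text: str, asr_text: str, threshold: float = 0.8):
    """Compare the caption text recovered from the video image against
    the text recognized from the audio portion; flag a caption error
    when the degree of matching falls below the threshold."""
    ratio = difflib.SequenceMatcher(None, ocr_text.lower(),
                                    asr_text.lower()).ratio()
    return ratio < threshold, ratio

is_error, score = caption_error("Welcome back to the evening news",
                                "welcome back to the evening news")
print(is_error, round(score, 2))   # False 1.0

is_error, score = caption_error("Welcome back to the evening news",
                                "sports scores coming up next")
if is_error:
    # In the patent's terms, this is where error data would be logged
    # and notifications sent.
    print(f"caption error logged, similarity={score:.2f}")
```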
  • Publication number: 20120059651
    Abstract: A mobile communications device includes a network interface for communicating over a wide-area network, an input/output interface for communicating over a PAN and a display. The communication device also includes one or more processors for executing machine-executable instructions and one or more machine-readable storage media for storing the machine-executable instructions. The instructions, when executed by the one or more processors, implement a voice proximity component, a speech-to-text component and a user interface. The voice proximity component is configured to select a first user's voice from among a plurality of user voices. The first user voice belongs to a user who is in closest proximity to the mobile communication device. The speech-to-text component is configured to convert to text, in real time, speech received from the first user but not the other users. The user interface is arranged for displaying the text on the display as it is received over the PAN from the other mobile communication devices.
    Type: Application
    Filed: September 7, 2010
    Publication date: March 8, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Jonathan Delgado, Alfredo Alvarez Lamela
  • Publication number: 20110320197
    Abstract: It comprises analyzing the audio content of multimedia files and performing a speech-to-text transcription thereof automatically by means of an ASR process, and selecting acoustic and language models adapted for the ASR process at least before the latter processes the multimedia file, i.e. “a priori”. The method is particularly applicable to the automatic indexing, aggregation and clustering of news from different sources and from different types of files, including text, audio and audiovisual documents without any manual annotation.
    Type: Application
    Filed: April 27, 2011
    Publication date: December 29, 2011
    Applicant: TELEFONICA S.A.
    Inventors: David CONEJERO, Helenca DUXANS, Gregorio ESCALADA
  • Publication number: 20110301951
    Abstract: A questionnaire is presented to a user in a more efficient manner in which the user is more likely to participate. The questionnaire is sent electronically to the user's vehicle and presented audibly to the user. The user responds audibly to the questions in the questionnaire. The user's responses are converted to text and sent back to the provider server for tallying.
    Type: Application
    Filed: June 7, 2011
    Publication date: December 8, 2011
    Inventor: Otman A. Basir
  • Publication number: 20110288861
    Abstract: Disclosed are techniques and systems to provide a narration of a text.
    Type: Application
    Filed: May 18, 2010
    Publication date: November 24, 2011
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman, Lucy Gibson
  • Publication number: 20110246196
    Abstract: A triple factor authentication in one step method and system is disclosed. According to one embodiment, an Integrated Voice Biometrics Cloud Security Gateway (IVCS Gateway) intercepts an access request to a resource server from a user using a user device. IVCS Gateway then authenticates the user by placing a call to the user device and sending a challenge message prompting the user to respond by voice. After receiving the voice sample of the user, the voice sample is compared against a stored voice biometrics record for the user. The voice sample is also converted into a text phrase and compared against a stored secret text phrase. In an alternative embodiment, an IVCS Gateway that is capable of making non-binary access decisions and associating multiple levels of access with a single user or group is described.
    Type: Application
    Filed: March 30, 2011
    Publication date: October 6, 2011
    Inventor: SAJIT BHASKARAN
  • Publication number: 20110231379
    Abstract: Techniques described herein generally relate to real-time inference-based systems. Example embodiments may set forth devices, methods, and computer programs related to search-engine inference-based virtual assistance. One example system may include a computing device adapted to receive text as input and a computer processor arranged to determine at least one inference regarding the subject matter of the text based on one or more web searches of one or more terms within the text. The inference(s) may then be automatically displayed upon being determined. The text may be automatically received as input from a voice-to-text converter while the voice-to-text conversion producing the text is occurring.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 22, 2011
    Inventor: Ezekiel Kruglick
  • Publication number: 20110112832
    Abstract: A media archive comprising a plurality of media resources associated with events that occurred during a time interval is processed to synchronize the media resources. Sequences of patterns are identified in each media resource of the media archive. Elements of the sequences associated with different media resources are correlated such that a set of correlated elements is associated with the same event that occurred in the given time interval. The synchronization information of the processed media resources is represented in a flexible and extensible data format. The synchronization information is used for correcting errors occurring in the media resources of a media archive and for enhancing processes that identify information in media resources, for example by transcription of audio resources or by optical character recognition of images.
    Type: Application
    Filed: September 30, 2010
    Publication date: May 12, 2011
    Applicant: ALTUS LEARNING SYSTEMS, INC.
    Inventors: Michael F. Prorock, Thomas J. Prorock
  • Publication number: 20110029316
    Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands-free, voice-driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.
    Type: Application
    Filed: October 18, 2010
    Publication date: February 3, 2011
    Applicant: Nuance Communications, Inc.
    Inventors: Richard Grant, Pedro E. McGregor
  • Publication number: 20100332226
    Abstract: A mobile terminal and controlling method thereof are disclosed, by which a specific content and another content associated with the specific content can be quickly searched using a user's voice. The present invention includes inputting a voice for a search for a specific content provided to the mobile terminal via a microphone, analyzing a meaning of the inputted voice, searching a memory for at least one content to which a voice name having a meaning associated with the analyzed voice is tagged, and displaying the searched at least one content.
    Type: Application
    Filed: June 30, 2010
    Publication date: December 30, 2010
    Applicant: LG ELECTRONICS INC.
    Inventors: In Jik Lee, Jong Keun Youn, Dae Sung Jung, Jae Min Joh, Sun Hwa Cha, Seung Heon Yang, Jae Hoon Yu
  • Publication number: 20100324894
    Abstract: Technologies are generally described for voice to text to voice processing. An audio signal can be preprocessed and translated into text prior to being processed in the textual domain. The text domain processing or subsequent text to voice regeneration can seek to improve clarity, correct grammar, adjust vocabulary level, remove profanity, correct slang, alter dialect, alter accent, or provide other modifications of various oral communication characteristics. The processed text may be translated back into the audio domain for delivery to a listener. The processing at each stage may be driven by a set of objectives and constraints set by the speaker, the listener, a third party, or any combination of explicit or implicit participants. The voice processing may translate the voice content from a specific human language to the same human language with various improvements. The processing may also involve translation into one or more other languages.
    Type: Application
    Filed: June 17, 2009
    Publication date: December 23, 2010
    Inventor: Miodrag Potkonjak
  • Publication number: 20100299135
    Abstract: Techniques are disclosed for automatically generating structured documents based on speech, including identification of relevant concepts and their interpretation. In one embodiment, a structured document generator uses an integrated process to generate a structured textual document (such as a structured textual medical report) based on a spoken audio stream. The spoken audio stream may be recognized using a language model which includes a plurality of sub-models arranged in a hierarchical structure. Each of the sub-models may correspond to a concept that is expected to appear in the spoken audio stream. Different portions of the spoken audio stream may be recognized using different sub-models. The resulting structured textual document may have a hierarchical structure that corresponds to the hierarchical structure of the language sub-models that were used to generate the structured textual document.
    Type: Application
    Filed: May 22, 2009
    Publication date: November 25, 2010
    Inventors: Juergen Fritsch, Michael Finke, Detlef Koll, Monika Woszczyna, Girija Yegnanarayanan
  • Publication number: 20100179811
    Abstract: Occurrences of one or more keywords in audio data are identified using a speech recognizer employing a language model to derive a transcript of the keywords. The transcript is converted into a phoneme sequence. The phonemes of the phoneme sequence are mapped to the audio data to derive a time-aligned phoneme sequence that is searched for occurrences of keyword phoneme sequences corresponding to the phonemes of the keywords. Searching includes computing a confusion matrix. The language model used by the speech recognizer is adapted to keywords by increasing the likelihoods of the keywords in the language model. For each potential occurrence of a keyword detected, a corresponding subset of the audio data may be played back to an operator to confirm whether the potential occurrence corresponds to an actual occurrence of the keyword.
    Type: Application
    Filed: January 13, 2010
    Publication date: July 15, 2010
    Applicant: CRIM
    Inventors: Vishwa Nath GUPTA, Gilles BOULIANNE
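Stripped to its core, the search described above slides each keyword's phoneme sequence over a time-aligned phoneme sequence and reports the matching time spans for operator playback. A minimal sketch with exact matching only; the patent's confusion-matrix scoring, which tolerates confusable phoneme substitutions, is omitted here:

```python
def find_keyword_occurrences(aligned_phonemes, keyword_phonemes):
    """Scan a time-aligned phoneme sequence of (phoneme, start, end)
    tuples for a keyword's phoneme sequence; return the matching time
    spans so an operator can play them back for confirmation."""
    spans, k = [], len(keyword_phonemes)
    symbols = [p for p, _, _ in aligned_phonemes]
    for i in range(len(symbols) - k + 1):
        if symbols[i:i + k] == keyword_phonemes:
            spans.append((aligned_phonemes[i][1],
                          aligned_phonemes[i + k - 1][2]))
    return spans

# Time-aligned phonemes covering "... agent" (ARPAbet-style, times illustrative).
aligned = [("DH", 0.0, 0.1), ("AH", 0.1, 0.2), ("EY", 0.5, 0.6),
           ("JH", 0.6, 0.7), ("AH", 0.7, 0.8), ("N", 0.8, 0.9), ("T", 0.9, 1.0)]
print(find_keyword_occurrences(aligned, ["EY", "JH", "AH", "N", "T"]))
# -> [(0.5, 1.0)]
```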
  • Publication number: 20100088100
    Abstract: An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
    Type: Application
    Filed: October 2, 2008
    Publication date: April 8, 2010
    Inventor: Aram M. Lindahl
  • Publication number: 20090300003
    Abstract: A keyword input supporting apparatus includes a document acquisition unit that acquires a document having a plurality of components containing text data, a main component selection unit that selects a component having many characters in the text data as a main component, a part-of-speech analysis unit that analyzes the part-of-speech of the text data contained in the main component, and adds a semantic attribute to each of words of the text data, a specific name extraction unit that extracts as a specific name a word, having a predetermined semantic attribute or part of speech, from the words, a specific name storage that stores the specific name together with the corresponding semantic attribute, a keyword candidate classification unit that performs classification of the specific name from the storage as a keyword candidate based on the semantic attribute, and a keyword candidate presentation unit that presents the keyword candidate to a user.
    Type: Application
    Filed: May 27, 2009
    Publication date: December 3, 2009
    Inventors: Masaru Suzuki, Satoshi Kinoshita, Hideo Umeki, Wataru Nakano
  • Publication number: 20090110158
    Abstract: A computer implemented method, apparatus, and computer usable program product for managing a communications session. The process monitors a bandwidth of the communications device in response to detecting an exchange of audio-based messages on a communications device. In response to detecting the bandwidth below a threshold, the process converts a subsequent outgoing audio-based message into an outgoing text-based message and associates a low bandwidth indicator to the outgoing text-based message to form a distinguished text-based message. The process then transmits the distinguished text-based message for receipt by an intended recipient.
    Type: Application
    Filed: October 25, 2007
    Publication date: April 30, 2009
    Inventors: Yen-Fu Chen, Fabian F. Morgan, Keith Raymond Walker, Sarah Vijoya White Eagle
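The bandwidth-triggered conversion above can be sketched as a simple branch: below the threshold, transcribe the outgoing audio and flag the result with a low-bandwidth indicator. All constants and field names here are hypothetical:

```python
LOW_BANDWIDTH_THRESHOLD = 64_000  # bits/sec, illustrative

def prepare_outgoing_message(audio_message: bytes, bandwidth_bps: int,
                             speech_to_text=lambda a: "<transcript>"):
    """Below the bandwidth threshold, convert the outgoing audio message
    to text and attach a low-bandwidth indicator to form a distinguished
    text-based message; otherwise send the audio unchanged."""
    if bandwidth_bps < LOW_BANDWIDTH_THRESHOLD:
        return {"type": "text",
                "body": speech_to_text(audio_message),
                "low_bandwidth": True}    # the distinguishing indicator
    return {"type": "audio", "body": audio_message, "low_bandwidth": False}

print(prepare_outgoing_message(b"\x00\x01", bandwidth_bps=32_000))
```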
  • Publication number: 20090055174
    Abstract: Provided are a method and apparatus for automatically completing a text input using speech recognition. The method includes: receiving a first part of a text from a user through a text input device; recognizing a speech of the user, which corresponds to the text; and completing a remaining part of the text based on the first part of the text and the recognized speech. Therefore, accuracy of the text input and convenience of the speech recognition can be ensured, and a non-input part of the text can be easily input based on the input part of the text and the recognized speech at a high speed.
    Type: Application
    Filed: December 21, 2007
    Publication date: February 26, 2009
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ick-sang Han, Joong-mi Cho, Yoon-kyung Song, Byung-kwan Kwak, Nam-hoon Kim, Ji-yeun Kim
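One plausible reading of the completion step above: take the recognizer's hypotheses for the spoken text and keep the first one consistent with the characters the user already typed. A minimal sketch under that assumption (the patent may combine the two signals differently):

```python
def complete_text(typed_prefix: str, recognition_hypotheses):
    """Complete a partially typed text using speech recognition output:
    among the recognizer's hypotheses for the same utterance, return the
    first whose spelling starts with what the user already typed."""
    prefix = typed_prefix.lower()
    for hypothesis in recognition_hypotheses:   # best-first order
        if hypothesis.lower().startswith(prefix):
            return hypothesis
    return typed_prefix   # no consistent hypothesis; keep the user's input

# N-best list for the user's utterance, best first (illustrative).
nbest = ["thermodynamics", "the dynamics", "their mechanics"]
print(complete_text("thermo", nbest))   # -> "thermodynamics"
```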
  • Publication number: 20090043587
    Abstract: A speech recognition system and method are provided to correctly distinguish among multiple interpretations of an utterance. This system is particularly useful when the set of possible interpretations is large, changes dynamically, and/or contains items that are not phonetically distinctive. The speech recognition system extends the capabilities of mobile wireless communication devices that are voice operated after their initial activation.
    Type: Application
    Filed: October 17, 2008
    Publication date: February 12, 2009
    Applicant: Vocera Communications, Inc.
    Inventor: Robert E. Shostak
  • Publication number: 20090030681
    Abstract: A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information.
    Type: Application
    Filed: July 23, 2007
    Publication date: January 29, 2009
    Inventors: Ashutosh K. Sureka, Sathish K. Subramanian, Sidhartha Basu, Indivar Verma
  • Publication number: 20090024389
    Abstract: A system in one embodiment includes a server associated with a unified messaging system (UMS). The server records speech of a user as an audio data file, translates the audio data file into a text data file, and maps each word within the text data file to a corresponding segment of audio data in the audio data file. A graphical user interface (GUI) of a message editor running on an endpoint associated with the user displays the text data file on the endpoint and allows the user to identify a portion of the text data file for replacement. The server is further operable to record new speech of the user as new audio data and to replace one or more segments of the audio data file corresponding to the portion of the text with the new audio data.
    Type: Application
    Filed: July 20, 2007
    Publication date: January 22, 2009
    Applicant: Cisco Technology, Inc.
    Inventors: Joseph F. Khouri, Laurent Philonenko, Mukul Jain, Shmuel Shaffer
  • Publication number: 20090024392
    Abstract: A speech recognition dictionary-making support system for efficiently making/updating a speech recognition dictionary/language model with reduced speech recognition errors by using text data available at low cost. The system comprises a recognition dictionary storage section (105), a language model storage section (106), and a sound model storage section (107). A virtual speech recognizing section (102) creates virtual speech recognition result text data for analyzed text data created by a text analyzing section (101), with reference to the recognition dictionary, language model, and sound model, and compares the virtual speech recognition result text data with the original analyzed text data. An updating section (103) updates the recognition dictionary and language model so that the differences between the two sets of text data are lessened.
    Type: Application
    Filed: February 2, 2007
    Publication date: January 22, 2009
    Applicant: NEC CORPORATION
    Inventor: Takafumi Koshinaka
  • Publication number: 20080288249
    Abstract: A method and a system for a speech recognition system (1), comprising an electronic document, which is a speech based document comprising one or more sections of text recognized or transcribed from sections of speech, wherein said sections of speech are dictated by an author and processed by a speech recognizer (4) in the speech recognition system (1) into corresponding sections of text of said speech based document. The method comprises the steps of dynamically creating and adapting sub contexts by said speech recognizer and associating said sub contexts with said sections of text.
    Type: Application
    Filed: December 7, 2006
    Publication date: November 20, 2008
    Applicant: KONINKLIJKE PHILIPS ELECTRONICS, N.V.
    Inventors: Gerhard Grobauer, Miklos Papai
  • Publication number: 20080275707
    Abstract: A method of providing voice based device management, comprising defining a set of one or more status queries for a device, defining for each of the status queries a respective set of status responses for the device corresponding to the instantaneous status of the device, mapping the status queries to corresponding voice format status queries, and mapping the status responses to corresponding voice format status responses.
    Type: Application
    Filed: May 2, 2005
    Publication date: November 6, 2008
    Inventor: Muthukumar Suriyanarayanan
  • Publication number: 20080195393
    Abstract: Dynamically defining a VoiceXML grammar of a multimodal application, implemented with the multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a VoiceXML interpreter, and the method includes loading the X+V page by the multimodal application, from a web server into the multimodal device for execution, the X+V page including one or more VoiceXML grammars in one or more VoiceXML dialogs, including at least one in-line grammar that is declared but undefined; retrieving by the multimodal application a grammar definition for the in-line grammar from the web server without reloading the X+V page; and defining by the multimodal application the in-line grammar with the retrieved grammar definition before executing the VoiceXML dialog containing the in-line grammar.
    Type: Application
    Filed: February 12, 2007
    Publication date: August 14, 2008
    Inventors: Charles W. Cross, Hilary A. Pike, Lisa A. Seacat, Marc T. White
  • Publication number: 20080177547
    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
    Type: Application
    Filed: January 19, 2007
    Publication date: July 24, 2008
    Applicant: Microsoft Corporation
    Inventors: Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alejandro Acero
  • Publication number: 20080162136
    Abstract: Methods, apparatus, and computer program products are described for automatic speech recognition (‘ASR’) that include accepting by the multimodal application speech input and visual input for selecting or deselecting items in a selection list, the speech input enabled by a speech recognition grammar; providing, from the multimodal application to the grammar interpreter, the speech input and the speech recognition grammar; receiving, by the multimodal application from the grammar interpreter, interpretation results including matched words from the grammar that correspond to items in the selection list and a semantic interpretation token that specifies whether to select or deselect items in the selection list; and determining, by the multimodal application in dependence upon the value of the semantic interpretation token, whether to select or deselect items in the selection list that correspond to the matched words.
    Type: Application
    Filed: January 3, 2007
    Publication date: July 3, 2008
    Inventors: Ciprian Agapi, Soonthorn Ativanichayaphong, Charles W. Cross, Gerald M. McCobb
  • Publication number: 20080154593
    Abstract: The present invention can include a speech processing method for providing dictation capabilities to a voice server. The method can include a step of establishing a real-time voice communication session involving a voice interface. Speech for the communication session can be streamed to a remotely located voice server. A real-time stream of text can be received from the voice server. The stream of text can include text that has been speech-to-text converted by the voice server from the streamed speech. The voice server can use an MRCP-based non-halting interface to receive the real-time stream of speech and a delivery interface to deliver real-time text to a designated endpoint.
    Type: Application
    Filed: December 22, 2006
    Publication date: June 26, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: WILLIAM V. DA PALMA, BRIEN H. MUSCHETT, WENDI L. NUSBICKEL, RONALD D. SWAN
  • Publication number: 20080147406
    Abstract: The present solution includes a method for dynamically switching modalities in a dialogue session involving a voice server. In the method, a dialogue session can be established between a user and a speech application. During the dialogue session, the user can interact using an original modality, which is either a speech modality, a text exchange modality, or a multi-mode modality that includes a text exchange modality. The speech application can interact using a speech modality. A modality switch trigger can be detected that changes the original modality to a different modality. The modality transition to the second modality can be transparent to the speech application. The speech application can be a standard VoiceXML based speech application that lacks an inherent text exchange capability.
    Type: Application
    Filed: December 19, 2006
    Publication date: June 19, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: WILLIAM V. DA PALMA, BAIJU D. MANDALIA, VICTOR S. MOORE, WENDI L. NUSBICKEL
  • Publication number: 20080147395
    Abstract: The present solution includes an automated response method. The method can receive user interactions entered through a real-time text exchange interface. These user interactions with the speech application can be dynamically and automatically converted as necessary into a format consumable by a voice server. A text input API of a voice server can be used to allow the voice server to directly accept text input. Further, automated interactions can be received from the voice server, which are dynamically and automatically converted into a format accepted by the text exchange interface. The text exchange interface can be an off-the-shelf unmodified interface. The speech application can be a VoiceXML based application that lacks an inherent text exchange capability.
    Type: Application
    Filed: December 19, 2006
    Publication date: June 19, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: WILLIAM V. DA PALMA, BAIJU D. MANDALIA, VICTOR S. MOORE, WENDI L. NUSBICKEL
  • Publication number: 20080147407
    Abstract: The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialogue session involving a speech application. The method establishes a dialogue session between a user and the speech application. During the dialogue session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialogue session. The original modality and the second modality can be different modalities; one including a text exchange modality and another including a speech modality.
    Type: Application
    Filed: December 19, 2006
    Publication date: June 19, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: WILLIAM V. DA PALMA, BAIJU D. MANDALIA, VICTOR S. MOORE, WENDI L. NUSBICKEL
  • Patent number: 7389213
    Abstract: A computer software product is used to create applications for enabling a dialogue between a human and a computer. The software product provides a programming tool that insulates software developers from time-consuming, technically-challenging programming tasks by enabling the developer to specify generalized instructions to a Dialogue Flow Interpreter, which invokes functions to implement a speech application, automatically populating a library with dialogue objects that are available to other applications. The speech applications created through the DFI may be implemented as COM (component object model) objects, and so the applications can be easily integrated into a variety of different platforms. In addition, “translator” object classes are provided to handle specific types of data, such as currency, numeric data, dates, times, string variables, etc. These translator object classes have utility either as part of the DFI library or as a sub-library separate from dialogue implementation.
    Type: Grant
    Filed: January 3, 2006
    Date of Patent: June 17, 2008
    Assignee: Unisys Corporation
    Inventors: Karl Wilmer Scholz, James S. Irwin, Samir Tamri
  • Publication number: 20080140418
    Abstract: A method and device for performing some preprocessing on voice transmissions depending upon the intended destination of the transmission. The device includes a receiving component configured to receive a voice signal from a source over a network. The device also includes a processing component configured to determine a destination address associated with the received signal, determine a signal processing algorithm from a plurality of signal processing algorithms based on the determined address, and process the voice signal according to the specified algorithm. The device further includes a delivery component configured to send the processed signal to the associated address.
    Type: Application
    Filed: October 30, 2007
    Publication date: June 12, 2008
    Inventor: Gilad Odinak
  • Publication number: 20080134020
    Abstract: A method and system for Extensible Markup Language (XML) application transformation. Specifically, in one embodiment, a method is disclosed for the generation of markup language applications (e.g., a VXML application) for a voice interface process. First, a call flow diagram is converted into an XML format. The call flow diagram describes the voice interface process. Next, a lookup table of entries in XML is created by mapping a plurality of audio files and their corresponding textual representations to audio states in the call flow. Then, an intermediate application is created in the XML format from the call flow by merging corresponding entries in the lookup table with the audio states. Finally, the intermediate application is transformed into a second application of a second markup language format that is a static representation of the call flow diagram.
    Type: Application
    Filed: October 23, 2007
    Publication date: June 5, 2008
    Inventor: Ramy M. Adeeb
  • Publication number: 20080126097
    Abstract: A method in an information processing system manages a user transaction request associated with a domain name based on an authentication status of a user. The method comprises receiving a request from a user for a transaction associated with a domain name. A voice confirmation user interface is presented to the user and user authentication information, such as spoken user authentication information from the user, is collected. The user authentication information is communicated to an information processing system. An authentication status associated with the user is received from the information processing system. The method also includes determining whether to allow the request from the user for the transaction associated with the domain name based at least in part on the received authentication status.
    Type: Application
    Filed: November 27, 2006
    Publication date: May 29, 2008
    Applicant: Ashantiplc Limited
    Inventors: Sahar Sarid, Kishore Bhavnanie
  • Patent number: RE42868
    Abstract: A method and apparatus accesses a database where entries are linked to at least two sets of patterns. One or more patterns of a first set of patterns are recognized within a received signal. The recognized patterns are used to identify entries and compile a list of patterns in a second set of patterns to which those entries are also linked. The list is then used to recognize a second received signal. The received signals may, for example, be voice signals or signals indicating the origin or destination of the received signals.
    Type: Grant
    Filed: October 25, 1995
    Date of Patent: October 25, 2011
    Assignee: Cisco Technology, Inc.
    Inventors: David J. Attwater, Steven J. Whittaker, Francis J. Scahill, Alison D. Simons