Patents by Inventor Soonthorn Ativanichayaphong

Soonthorn Ativanichayaphong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20120065982
    Abstract: Dynamically generating a vocal help prompt in a multimodal application that includes detecting a help-triggering event for an input element of a VoiceXML dialog, where the detecting is implemented with a multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application is operatively coupled to a VoiceXML interpreter, and the multimodal application has no static help text. Dynamically generating a vocal help prompt in a multimodal application according to embodiments of the present invention typically also includes retrieving, by the VoiceXML interpreter from a source of help text, help text for an element of a speech recognition grammar, forming by the VoiceXML interpreter the help text into a vocal help prompt, and presenting by the multimodal application the vocal help prompt through a computer user interface to a user.
    Type: Application
    Filed: November 23, 2011
    Publication date: March 15, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., David Jaramillo, Yan Li
  • Patent number: 8086463
    Abstract: Dynamically generating a vocal help prompt in a multimodal application that includes detecting a help-triggering event for an input element of a VoiceXML dialog, where the detecting is implemented with a multimodal application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application is operatively coupled to a VoiceXML interpreter, and the multimodal application has no static help text. Dynamically generating a vocal help prompt in a multimodal application according to embodiments of the present invention typically also includes retrieving, by the VoiceXML interpreter from a source of help text, help text for an element of a speech recognition grammar, forming by the VoiceXML interpreter the help text into a vocal help prompt, and presenting by the multimodal application the vocal help prompt through a computer user interface to a user.
    Type: Grant
    Filed: September 12, 2006
    Date of Patent: December 27, 2011
    Assignees: Nuance Communications, Inc., International Business Machines Corporation
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., David Jaramillo, Yan Li
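
The two entries above describe a dynamic help-prompt mechanism: when a help-triggering event fires on a dialog input element that has no static help text, the interpreter fetches help text for a grammar element and turns it into a vocal prompt. A minimal Python sketch of that flow follows; the dictionary-backed help source, element names, and prompt wording are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of the dynamic help-prompt idea: on a help-triggering
# event, look up help text for a grammar element and form a vocal prompt.
# The dict-based help source and all names here are illustrative only.

HELP_TEXT_SOURCE = {
    "city": "Please say the name of a city, for example Boca Raton.",
    "date": "Please say a date, for example March fifteenth.",
}

def generate_vocal_help_prompt(element_name: str) -> str:
    """Retrieve help text for a grammar element and form it into a prompt."""
    help_text = HELP_TEXT_SOURCE.get(
        element_name, "No help is available for this field."
    )
    # Forming the help text into a vocal help prompt (here, a TTS-ready string).
    return f"Help: {help_text}"

if __name__ == "__main__":
    # A "help" utterance on the "city" input element would trigger this path.
    print(generate_vocal_help_prompt("city"))
```
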
  • Patent number: 8073692
    Abstract: Enabling grammars in web page frames, including receiving, in a multimodal application on a multimodal device, a frameset document, where the frameset document includes markup defining web page frames; obtaining by the multimodal application content documents for display in each of the web page frames, where the content documents include navigable markup elements; generating by the multimodal application, for each navigable markup element in each content document, a segment of markup defining a speech recognition grammar, including inserting in each such grammar markup identifying content to be displayed when words in the grammar are matched and markup identifying a frame where the content is to be displayed; and enabling by the multimodal application all the generated grammars for speech recognition.
    Type: Grant
    Filed: November 2, 2010
    Date of Patent: December 6, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
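
As a concrete illustration of the frame-grammar generation described in the entry above, the sketch below emits one speech-recognition grammar fragment per navigable element found in a frame's content document, recording which content to load and into which frame. The markup shape and attribute names are assumptions made for the example, not the patented format.

```python
# Illustrative sketch only: generate one grammar fragment per navigable element
# (e.g. a hyperlink), with markup identifying the content to display on a match
# and the frame where that content should appear.

from dataclasses import dataclass

@dataclass
class NavigableElement:
    frame: str      # frame where the matched content should be displayed
    words: str      # link text the user may speak
    href: str       # content to display when the words are matched

def generate_grammar(element: NavigableElement) -> str:
    return (
        f'<grammar root="link">'
        f'<rule id="link">{element.words}'
        f'<tag>href="{element.href}"; frame="{element.frame}"</tag>'
        f"</rule></grammar>"
    )

if __name__ == "__main__":
    elements = [
        NavigableElement("main", "sports news", "sports.html"),
        NavigableElement("sidebar", "weather", "weather.html"),
    ]
    # "Enabling" all generated grammars is modelled here as collecting them.
    enabled_grammars = [generate_grammar(e) for e in elements]
    for grammar in enabled_grammars:
        print(grammar)
```
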
  • Patent number: 8073698
    Abstract: Methods, apparatus, and computer program products are described for enabling global grammars for a particular multimodal application according to the present invention by loading a multimodal web page; determining whether the loaded multimodal web page is one of a plurality of multimodal web pages of the particular multimodal application. If the loaded multimodal web page is one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes loading any currently unloaded global grammars of the particular multimodal application identified in the multimodal web page and maintaining any previously loaded global grammars. If the loaded multimodal web page is not one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes unloading any currently loaded global grammars.
    Type: Grant
    Filed: August 31, 2010
    Date of Patent: December 6, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
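
A minimal sketch of the global-grammar lifecycle in the entry above: keep and extend the loaded set while navigating within the same multimodal application, and unload everything when a page outside the application is loaded. The set-based bookkeeping and function names are hypothetical stand-ins for the browser's state.

```python
# Sketch of the global-grammar lifecycle; names and data structures are assumed.

loaded_global_grammars: set[str] = set()

def on_page_load(page_app_id: str, page_global_grammars: set[str],
                 current_app_id: str) -> None:
    """Maintain global grammars when a new multimodal web page is loaded."""
    global loaded_global_grammars
    if page_app_id == current_app_id:
        # Same application: load any not-yet-loaded global grammars the page
        # identifies, and keep everything already loaded.
        loaded_global_grammars |= page_global_grammars
    else:
        # Page belongs to another application: unload all global grammars.
        loaded_global_grammars = set()

if __name__ == "__main__":
    on_page_load("shop", {"navigation.grxml"}, current_app_id="shop")
    on_page_load("shop", {"checkout.grxml"}, current_app_id="shop")
    print(loaded_global_grammars)   # both grammars remain loaded
    on_page_load("other", set(), current_app_id="shop")
    print(loaded_global_grammars)   # emptied when leaving the application
```
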
  • Patent number: 7966188
    Abstract: A method for enhancing voice interactions within a portable multimodal computing device using visual messages. A multimodal interface can be provided that includes an audio interface and a visual interface. A speech input can then be received and a voice recognition task can be performed upon at least a portion of the speech input. At least one message within the multimodal interface can be visually presented, wherein the message is a prompt for the speech input and/or a confirmation of the speech input.
    Type: Grant
    Filed: May 20, 2003
    Date of Patent: June 21, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, David Jaramillo, Gerald McCobb, Leslie R. Wilson
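
The entry above pairs a voice interaction with visual messages: a visual prompt before the speech input and/or a visual confirmation after recognition. The sketch below reduces the visual interface to printed messages and stubs out the recognizer; it is only meant to show that pairing, with all names invented for the example.

```python
# Sketch only: the recognizer is stubbed and the "visual interface" is print().

def recognize_speech(audio: bytes) -> str:
    # Placeholder for the device's voice recognition task.
    return "two tickets to Miami"

def voice_interaction_with_visual_messages(audio: bytes) -> str:
    print("[visual prompt] Please say your travel request.")    # prompt message
    result = recognize_speech(audio)
    print(f"[visual confirmation] You said: {result}")           # confirmation message
    return result

if __name__ == "__main__":
    voice_interaction_with_visual_messages(b"\x00\x01")
```
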
  • Patent number: 7962343
    Abstract: A method (200) of building a grammar with baseforms generated dynamically from user utterances can include the steps of recording (205) a user utterance, generating (210) a baseform using the user utterance, creating or adding to (215) a grammar rule using the baseform, and binding (230) the grammar rule in a grammar document of a voice extensible markup language program. Generating a baseform can optionally include introducing a new element to VoiceXML with attributes that enable generating the baseform from a referenced recording such as the user utterance. In one embodiment, the method can be used to create (235) a phonebook and a grammar to access the phonebook by repeatedly visiting a form containing the grammar rule with attributes that enable generating the baseform from the referenced recording.
    Type: Grant
    Filed: November 21, 2008
    Date of Patent: June 14, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Brien H. Muschett
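
The entry above builds a grammar from baseforms generated at run time from user utterances, for example to grow a spoken phonebook. In the hedged sketch below, baseform generation from audio is stubbed (a real system would derive a phonetic spelling from the recording), and the grammar "document" is simply a list of accumulated rules; all names are illustrative.

```python
# Sketch: dynamically build grammar rules from baseforms derived from recordings.

def generate_baseform(recording: bytes) -> str:
    # Placeholder for acoustic-to-phonetic conversion of the user utterance.
    return "JH AA N S M IH TH"

class DynamicGrammar:
    def __init__(self) -> None:
        self.rules: list[str] = []

    def add_rule_from_utterance(self, recording: bytes, entry_name: str) -> None:
        baseform = generate_baseform(recording)
        # Create or add to a grammar rule using the generated baseform.
        self.rules.append(f'{entry_name} = "{baseform}"')

    def bind(self) -> str:
        # Bind the accumulated rules into a single grammar document.
        return "\n".join(self.rules)

if __name__ == "__main__":
    phonebook_grammar = DynamicGrammar()
    phonebook_grammar.add_rule_from_utterance(b"...", "john_smith")
    print(phonebook_grammar.bind())
```
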
  • Patent number: 7953597
    Abstract: A computer-implemented method and system are provided for filling a graphic-based form field in response to a speech utterance. The computer-implemented method includes generating a grammar corresponding to the form field, the grammar being based on a user profile and comprising a semantic interpretation string. The method further includes creating an auto-fill event based upon the at least one grammar and responsive to the speech utterance, the auto-fill event causing the filling of the form field with data corresponding to the user profile. The system includes a grammar-generating module for generating a grammar corresponding to the form field, the grammar being based on a user profile and comprising a semantic interpretation string. The system also includes an event module for creating an auto-fill event based upon the at least one grammar and responsive to the speech utterance, the event causing the filling of the form field with data corresponding to the user profile.
    Type: Grant
    Filed: August 9, 2005
    Date of Patent: May 31, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
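
A sketch of the auto-fill idea in the entry above, under stated assumptions: the user profile is a plain dictionary, the grammar is reduced to the phrases it would accept plus a semantic-interpretation key naming the profile entry, and the "auto-fill event" is a returned (field, value) pair. All names are hypothetical.

```python
# Sketch: fill a form field from a user profile when a matching utterance arrives.

USER_PROFILE = {"home_address": "123 Main Street, Boca Raton, FL"}

def build_field_grammar(field: str) -> dict:
    # Grammar for the field: accepted phrases plus a semantic interpretation
    # string that names the profile entry to fill from.
    return {"phrases": {"my home address", "home address"}, "interpretation": field}

def auto_fill_event(grammar: dict, utterance: str):
    if utterance.lower() in grammar["phrases"]:
        field = grammar["interpretation"]
        return field, USER_PROFILE[field]     # fill the field from the profile
    return None

if __name__ == "__main__":
    grammar = build_field_grammar("home_address")
    print(auto_fill_event(grammar, "my home address"))
```
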
  • Publication number: 20110047452
    Abstract: Enabling grammars in web page frames, including receiving, in a multimodal application on a multimodal device, a frameset document, where the frameset document includes markup defining web page frames; obtaining by the multimodal application content documents for display in each of the web page frames, where the content documents include navigable markup elements; generating by the multimodal application, for each navigable markup element in each content document, a segment of markup defining a speech recognition grammar, including inserting in each such grammar markup identifying content to be displayed when words in the grammar are matched and markup identifying a frame where the content is to be displayed; and enabling by the multimodal application all the generated grammars for speech recognition.
    Type: Application
    Filed: November 2, 2010
    Publication date: February 24, 2011
    Applicant: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
  • Patent number: 7870000
    Abstract: The present disclosure relates to prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: January 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Gerald M. McCobb, Paritosh D. Patel, Marc White
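
The entry above fills data fields partially from a single multi-element utterance: values with element-level confidence above a threshold are stored, and the remaining elements are re-prompted. The sketch below shows that split; the recognizer output, field names, and thresholds are invented for the example.

```python
# Sketch: element-level confidence scores decide which values are stored
# and which fields go into the follow-up prompt.

ELEMENT_THRESHOLD = 0.6

def partial_fill(recognized: dict[str, tuple[str, float]]) -> tuple[dict, list]:
    """recognized maps each element to (derived value, element-level confidence)."""
    filled, to_reprompt = {}, []
    for element, (value, confidence) in recognized.items():
        if confidence >= ELEMENT_THRESHOLD:
            filled[element] = value          # store in the mapped data field
        else:
            to_reprompt.append(element)      # include in the follow-up prompt
    return filled, to_reprompt

if __name__ == "__main__":
    # "Fly from Boca Raton to Boston tomorrow" recognized with mixed confidence.
    recognized = {
        "origin": ("Boca Raton", 0.91),
        "destination": ("Austin", 0.42),     # low confidence: re-prompt
        "date": ("tomorrow", 0.88),
    }
    filled, to_reprompt = partial_fill(recognized)
    print(filled)
    print(f"Please repeat: {', '.join(to_reprompt)}")
```
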
  • Publication number: 20100324889
    Abstract: Methods, apparatus, and computer program products are described for enabling global grammars for a particular multimodal application according to the present invention by loading a multimodal web page; determining whether the loaded multimodal web page is one of a plurality of multimodal web pages of the particular multimodal application. If the loaded multimodal web page is one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes loading any currently unloaded global grammars of the particular multimodal application identified in the multimodal web page and maintaining any previously loaded global grammars. If the loaded multimodal web page is not one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes unloading any currently loaded global grammars.
    Type: Application
    Filed: August 31, 2010
    Publication date: December 23, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
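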
  • Patent number: 7840409
    Abstract: Ordering recognition results produced by an automatic speech recognition (‘ASR’) engine for a multimodal application implemented with a grammar of the multimodal application in the ASR engine, with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to the ASR engine through a VoiceXML interpreter, includes: receiving, in the VoiceXML interpreter from the multimodal application, a voice utterance; determining, by the VoiceXML interpreter using the ASR engine, a plurality of recognition results in dependence upon the voice utterance and the grammar; determining, by the VoiceXML interpreter according to semantic interpretation scripts of the grammar, a weight for each recognition result; and sorting, by the VoiceXML interpreter, the plurality of recognition results in dependence upon the weight for each recognition result.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: November 23, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Igor R. Jablokov, Gerald McCobb
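
The entry above weights each recognition result via the grammar's semantic-interpretation scripts and sorts the results by weight. In the compact sketch below the weights are simply supplied so the sorting step can be shown; in the patented setting they would come from the grammar itself.

```python
# Sketch: sort recognition results by a per-result weight, highest first.

def order_recognition_results(results: list[tuple[str, float]]) -> list[str]:
    """Return the recognized texts ordered by weight, highest first."""
    return [text for text, _ in sorted(results, key=lambda r: r[1], reverse=True)]

if __name__ == "__main__":
    results = [("brown", 0.2), ("Braun", 0.7), ("crown", 0.1)]
    print(order_recognition_results(results))   # ['Braun', 'brown', 'crown']
```
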
  • Patent number: 7827033
    Abstract: Enabling grammars in web page frames, including receiving, in a multimodal application on a multimodal device, a frameset document, where the frameset document includes markup defining web page frames; obtaining by the multimodal application content documents for display in each of the web page frames, where the content documents include navigable markup elements; generating by the multimodal application, for each navigable markup element in each content document, a segment of markup defining a speech recognition grammar, including inserting in each such grammar markup identifying content to be displayed when words in the grammar are matched and markup identifying a frame where the content is to be displayed; and enabling by the multimodal application all the generated grammars for speech recognition.
    Type: Grant
    Filed: December 6, 2006
    Date of Patent: November 2, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
  • Patent number: 7809575
    Abstract: Methods, apparatus, and computer program products are described for enabling global grammars for a particular multimodal application according to the present invention by loading a multimodal web page; determining whether the loaded multimodal web page is one of a plurality of multimodal web pages of the particular multimodal application. If the loaded multimodal web page is one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes loading any currently unloaded global grammars of the particular multimodal application identified in the multimodal web page and maintaining any previously loaded global grammars. If the loaded multimodal web page is not one of the plurality of multimodal web pages of the particular multimodal application, enabling global grammars typically includes unloading any currently loaded global grammars.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: October 5, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
  • Patent number: 7739117
    Abstract: A computer-implemented method and system are provided for filling a graphic-based form field in response to a speech utterance. The computer-implemented method includes generating a grammar corresponding to the form field, the grammar being based on a user profile and comprising a semantic interpretation string. The method further includes creating an auto-fill event based upon the at least one grammar and responsive to the speech utterance, the auto-fill event causing the filling of the form field with data corresponding to the user profile. The system includes a grammar-generating module for generating a grammar corresponding to the form field, the grammar being based on a user profile and comprising a semantic interpretation string. The system also includes an event module for creating an auto-fill event based upon the at least one grammar and responsive to the speech utterance, the event causing the filling of the form field with data corresponding to the user profile.
    Type: Grant
    Filed: September 20, 2004
    Date of Patent: June 15, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Gerald M. McCobb
  • Publication number: 20090076818
    Abstract: A method (200) of building a grammar with baseforms generated dynamically from user utterances can include the steps of recording (205) a user utterance, generating (210) a baseform using the user utterance, creating or adding to (215) a grammar rule using the baseform, and binding (230) the grammar rule in a grammar document of a voice extensible markup language program. Generating a baseform can optionally include introducing a new element to VoiceXML with attributes that enable generating the baseform from a referenced recording such as the user utterance. In one embodiment, the method can be used to create (235) a phonebook and a grammar to access the phonebook by repeatedly visiting a form containing the grammar rule with attributes that enable generating the baseform from the referenced recording.
    Type: Application
    Filed: November 21, 2008
    Publication date: March 19, 2009
    Applicant: International Business Machines Corporation
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Brien H. Muschett
  • Patent number: 7487085
    Abstract: A method (200) of building a grammar with baseforms generated dynamically from user utterances can include the steps of recording (205) a user utterance, generating (210) a baseform using the user utterance, creating or adding to (215) a grammar rule using the baseform, and binding (230) the grammar rule in a grammar document of a voice extensible markup language program. Generating a baseform can optionally include introducing a new element to VoiceXML with attributes that enable generating the baseform from a referenced recording such as the user utterance. In one embodiment, the method can be used to create (235) a phonebook and a grammar to access the phonebook by repeatedly visiting a form containing the grammar rule with attributes that enable generating the baseform from the referenced recording.
    Type: Grant
    Filed: August 24, 2004
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Jr., Brien H. Muschett
  • Publication number: 20080255851
    Abstract: Speech-enabled content navigation and control of a distributed multimodal browser is disclosed, the browser providing an execution environment for a multimodal application, the browser including a graphical user agent (‘GUA’) and a voice user agent (‘VUA’), the GUA operating on a multimodal device, the VUA operating on a voice server, that includes: transmitting, by the GUA, a link message to the VUA, the link message specifying voice commands that control the browser and an event corresponding to each voice command; receiving, by the GUA, a voice utterance from a user, the voice utterance specifying a particular voice command; transmitting, by the GUA, the voice utterance to the VUA for speech recognition by the VUA; receiving, by the GUA, an event message from the VUA, the event message specifying a particular event corresponding to the particular voice command; and controlling, by the GUA, the browser in dependence upon the particular event.
    Type: Application
    Filed: April 12, 2007
    Publication date: October 16, 2008
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Gerald M. McCobb
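
The entry above splits the multimodal browser into a graphical user agent (GUA) on the device and a voice user agent (VUA) on a voice server, linked by messages that map voice commands to browser events. The rough, in-process sketch below models the link message, recognition, and event message as plain function calls, whereas the patent places the VUA on a separate server; all command and event names are invented.

```python
# In-process stand-in for the GUA/VUA exchange; real deployments are distributed.

LINK_MESSAGE = {          # voice command -> browser event
    "scroll down": "browser.scroll_down",
    "go back": "browser.history_back",
}

def vua_recognize(utterance: str) -> str:
    # The VUA matches the utterance against the linked commands and returns
    # an event message naming the corresponding event.
    return LINK_MESSAGE.get(utterance, "browser.no_match")

def gua_handle_utterance(utterance: str) -> None:
    event = vua_recognize(utterance)          # event message from the VUA
    print(f"controlling browser with event: {event}")

if __name__ == "__main__":
    gua_handle_utterance("scroll down")
```
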
  • Publication number: 20080249782
    Abstract: Web service support for a multimodal client processing a multimodal application, the multimodal client providing an execution environment for the application and operating on a multimodal device supporting multiple modes of user interaction including a voice mode and one or more non-voice modes, the application stored on an application server, includes: receiving, by the server, an application request from the client that specifies the application and device characteristics; determining, by a multimodal adapter of the server, modality requirements for the application; selecting, by the adapter, a modality web service in dependence upon the modality requirements and the characteristics for the device; determining, by the adapter, whether the device supports VoIP in dependence upon the characteristics; providing, by the server, the application to the client; and providing, by the adapter to the client in dependence upon whether the device supports VoIP, access to the modality web service for processing the application.
    Type: Application
    Filed: April 4, 2007
    Publication date: October 9, 2008
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Gerald M. McCobb
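
In the entry above, a multimodal adapter on the server picks a modality web service from the application's modality requirements and the device's characteristics, including whether the device supports VoIP. The sketch below reduces that selection to a dictionary lookup plus a VoIP check; the service endpoints and the shape of the device characteristics are assumptions made for the example.

```python
# Sketch: choose a modality web service from the required modality and VoIP support.

MODALITY_SERVICES = {
    ("voice", True): "voip-voice-service.example.com",
    ("voice", False): "telephone-voice-service.example.com",
}

def select_modality_service(modality_required: str, device: dict) -> str:
    supports_voip = bool(device.get("voip", False))
    return MODALITY_SERVICES[(modality_required, supports_voip)]

if __name__ == "__main__":
    device_characteristics = {"screen": "320x240", "voip": False}
    print(select_modality_service("voice", device_characteristics))
```
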
  • Publication number: 20080243502
    Abstract: The invention discloses prompting for a spoken response that provides input for multiple elements. A single spoken utterance including content for multiple elements can be received, where each element is mapped to a data field. The spoken utterance can be speech-to-text converted to derive values for each of the multiple elements. An utterance level confidence score can be determined, which can fall below an associated certainty threshold. Element-level confidence scores for each of the derived elements can then be ascertained. A first set of the multiple elements can have element-level confidence scores above an associated certainty threshold and a second set can have scores below. Values can be stored in data fields mapped to the first set. A prompt for input for the second set can be played. Accordingly, data fields are partially filled in based upon the original speech utterance, where a second prompt for unfilled fields is played.
    Type: Application
    Filed: March 28, 2007
    Publication date: October 2, 2008
    Applicant: International Business Machines Corporation
    Inventors: Soonthorn Ativanichayaphong, Gerald M. McCobb, Paritosh D. Patel, Marc White
  • Publication number: 20080208586
    Abstract: Enabling natural language understanding using an X+V page of a multimodal application implemented with a statistical language model (‘SLM’) grammar of the multimodal application in an automatic speech recognition (‘ASR’) engine, with the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to the ASR engine through a VoiceXML interpreter, including: receiving, in the ASR engine from the multimodal application, a voice utterance; generating, by the ASR engine according to the SLM grammar, at least one recognition result for the voice utterance; determining, by an action classifier for the VoiceXML interpreter, an action identifier in dependence upon the recognition result, the action identifier specifying an action to be performed by the multimodal application; and interpreting, by the VoiceXML interpreter, the multimodal application in dependence upon the action identifier.
    Type: Application
    Filed: February 27, 2007
    Publication date: August 28, 2008
    Inventors: Soonthorn Ativanichayaphong, Charles W. Cross, Gerald M. McCobb
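
The last entry routes a free-form recognition result from an SLM-based recognizer through an action classifier, which produces an action identifier that drives the interpreter. The toy sketch below stands in keyword matching for a trained classifier; the action identifiers and example utterance are invented.

```python
# Toy sketch of the action-classifier step: map a recognition result to an
# action identifier. Keyword matching stands in for a trained classifier.

def classify_action(recognition_result: str) -> str:
    text = recognition_result.lower()
    if "flight" in text or "fly" in text:
        return "action.book_flight"
    if "hotel" in text:
        return "action.book_hotel"
    return "action.unknown"

if __name__ == "__main__":
    # The recognition result would come from the ASR engine's SLM grammar.
    result = "I'd like to fly to Boston next Tuesday"
    action_id = classify_action(result)
    print(f"interpreting application with {action_id}")  # drives the dialog
```
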