Speech To Text Systems (epo) Patents (Class 704/E15.043)
  • Publication number: 20120035907
    Abstract: A method, performed on a server, of translating between languages includes receiving first audio data for a first language from a mobile device, translating the first audio data to second audio data for a second language, receiving an indication that the mobile device has moved between two locations, and sending the second audio data to the mobile device in response to the indication.
    Type: Application
    Filed: August 5, 2010
    Publication date: February 9, 2012
    Inventors: Michael J. Lebeau, John Nicholas Jitkoff
  • Publication number: 20120035923
    Abstract: The disclosed invention provides a system and apparatus for providing a telematics system user with an improved texting experience. A messaging experience engine database enables voice avatar/personality selection, acronym conversion, shorthand conversion, and custom audio and video mapping. As an interpreter of the messaging content that is passed through the telematics system, the system eliminates the need for a user to manually manipulate a texting device, or to read such a device. The system recognizes functional content and executes actions based on the identified functional content.
    Type: Application
    Filed: August 9, 2010
    Publication date: February 9, 2012
    Applicant: General Motors LLC
    Inventor: Kevin R. Krause
  • Publication number: 20120029918
    Abstract: Systems for recording, searching for, and sharing media files among a plurality of users are disclosed. The systems include a server that is configured to receive, index, and store a plurality of media files, which are received by the server from a plurality of sources, within at least one database in communication with the server. In addition, the server is configured to make one or more of the media files accessible to one or more persons—other than the original sources of such media files. Still further, the server is configured to transcribe the media files into text; receive and publish comments associated with the media files within a graphical user interface of a website; and allow users to query and playback excerpted portions of such media files.
    Type: Application
    Filed: October 11, 2011
    Publication date: February 2, 2012
    Inventor: Walter Bachtiger
  • Publication number: 20120029917
    Abstract: A system that incorporates teachings of the present disclosure may include, for example, a server including a controller to receive audio signals and content identification information from a media processor, generate text representing a voice message based on the audio signals, determine an identity of media content based on the content identification information, generate an enhanced message having text and additional content where the additional content is obtained by the controller based on the identity of the media content, and transmit the enhanced message to the media processor for presentation on the display device, where the enhanced message is accessible by one or more communication devices that are associated with a social network and remote from the media processor. Other embodiments are disclosed.
    Type: Application
    Filed: August 2, 2010
    Publication date: February 2, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: HISAO CHANG, BERNARD S. RENGER
  • Publication number: 20120022868
    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method for correcting words in transcribed text including receiving speech audio data from a microphone. The method further includes sending the speech audio data to a transcription system. The method further includes receiving a word lattice transcribed from the speech audio data by the transcription system. The method further includes presenting one or more transcribed words from the word lattice. The method further includes receiving a user selection of at least one of the presented transcribed words. The method further includes presenting one or more alternate words from the word lattice for the selected transcribed word. The method further includes receiving a user selection of at least one of the alternate words. The method further includes replacing the selected transcribed word in the presented transcribed words with the selected alternate word.
    Type: Application
    Filed: September 30, 2011
    Publication date: January 26, 2012
    Inventors: Michael J. LeBeau, William J. Byrne, John Nicholas Jitkoff, Brandon M. Ballinger, Trausti Kristjansson
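The correction flow described in this abstract can be sketched as follows. The lattice representation and function names are illustrative assumptions, not the patent's actual data model.

```python
# Hypothetical word lattice: each position in the transcript maps to a
# ranked list of word hypotheses from the transcription system.
word_lattice = {
    0: ["recognize", "wreck a nice"],
    1: ["speech", "beach"],
}

def best_transcription(lattice):
    """Present the top hypothesis at each position."""
    return [hyps[0] for _, hyps in sorted(lattice.items())]

def alternates(lattice, position):
    """Alternate words the user can pick for a selected word."""
    return lattice[position][1:]

def replace_word(words, position, alternate):
    """Swap the selected transcribed word for the chosen alternate."""
    corrected = list(words)
    corrected[position] = alternate
    return corrected

words = best_transcription(word_lattice)
fixed = replace_word(words, 1, alternates(word_lattice, 1)[0])
```

Keeping the full lattice (rather than only the one-best string) is what makes single-tap correction possible: the alternates are already ranked and aligned to each word position.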
  • Publication number: 20120022865
    Abstract: A system and method for efficiently reducing transcription error using hybrid voice transcription is provided. A voice stream is parsed from a call into utterances. An initial transcribed value and corresponding recognition score are assigned to each utterance. A transcribed message is generated for the call and includes the initial transcribed values. A threshold is applied to the recognition scores to identify those utterances with recognition scores below the threshold as questionable utterances. At least one questionable utterance is compared to other questionable utterances from other calls and a group of similar questionable utterances is formed. One or more of the similar questionable utterances is selected from the group. A common manual transcription value is received for the selected similar questionable utterances. The common manual transcription value is assigned to the remaining similar questionable utterances in the group.
    Type: Application
    Filed: July 20, 2010
    Publication date: January 26, 2012
    Inventor: David Milstein
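The triage step in this abstract can be sketched as below; the threshold value, score scale, and grouping-by-identical-text rule are assumptions for illustration.

```python
# Utterances with tentative transcriptions and recognition scores.
utterances = [
    {"id": 1, "text": "account balance", "score": 0.92},
    {"id": 2, "text": "acount ballance", "score": 0.41},
    {"id": 3, "text": "acount ballance", "score": 0.38},
]

THRESHOLD = 0.6  # assumed cutoff separating confident from questionable

def questionable(utts, threshold=THRESHOLD):
    """Utterances whose recognition score falls below the threshold."""
    return [u for u in utts if u["score"] < threshold]

def group_similar(utts):
    """Group questionable utterances by their tentative transcription."""
    groups = {}
    for u in utts:
        groups.setdefault(u["text"], []).append(u)
    return groups

def apply_manual_value(group, manual_text):
    """One manual transcription is propagated to the whole group."""
    for u in group:
        u["text"] = manual_text
    return group

groups = group_similar(questionable(utterances))
apply_manual_value(groups["acount ballance"], "account balance")
```

The point of grouping is cost reduction: a human transcribes one representative utterance, and the value is reused for every similar questionable utterance across calls.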
  • Publication number: 20120022867
    Abstract: Methods, computer program products and systems are described for speech-to-text conversion. A voice input is received from a user of an electronic device and contextual metadata is received that describes a context of the electronic device at a time when the voice input is received. Multiple base language models are identified, where each base language model corresponds to a distinct textual corpus of content. Using the contextual metadata, an interpolated language model is generated based on contributions from the base language models. The contributions are weighted according to a weighting for each of the base language models. The interpolated language model is used to convert the received voice input to a textual output. The voice input is received at a computer server system that is remote to the electronic device. The textual output is transmitted to the electronic device.
    Type: Application
    Filed: September 29, 2011
    Publication date: January 26, 2012
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen, Michael D. Riley
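The interpolation step can be sketched as a weighted mixture of base-model probabilities. The unigram model form and the concrete weights are illustrative assumptions; real systems interpolate full n-gram models.

```python
# Two base language models, each trained on a distinct corpus of content.
base_models = {
    "email":  {"meeting": 0.30, "pizza": 0.05},
    "search": {"meeting": 0.10, "pizza": 0.25},
}

def interpolate(models, weights):
    """P(w) = sum_i w_i * P_i(w), with the weights summing to 1."""
    vocab = set().union(*(m.keys() for m in models.values()))
    return {
        w: sum(weights[name] * models[name].get(w, 0.0) for name in models)
        for w in vocab
    }

# Contextual metadata (e.g. "the user is composing an email") determines
# how heavily each base model contributes.
weights = {"email": 0.8, "search": 0.2}
lm = interpolate(base_models, weights)
```

With these weights, words typical of email dictation receive higher probability than they would under the search-query model alone.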
  • Publication number: 20120022853
    Abstract: A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.
    Type: Application
    Filed: September 29, 2011
    Publication date: January 26, 2012
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, William J. Byrne, Gudmundur Hafsteinsson, Michael J. LeBeau
  • Publication number: 20120022866
    Abstract: Methods, computer program products and systems are described for converting speech to text. Sound information is received at a computer server system from an electronic device, where the sound information is from a user of the electronic device. A context identifier indicates a context within which the user provided the sound information. The context identifier is used to select, from among multiple language models, a language model appropriate for the context. Speech in the sound information is converted to text using the selected language model. The text is provided for use by the electronic device.
    Type: Application
    Filed: September 29, 2011
    Publication date: January 26, 2012
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, Cyril Georges Luc Allauzen
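In contrast to interpolation, this variant picks a single model per context. A minimal sketch, assuming a simple identifier-to-model mapping and a fallback default (both assumptions):

```python
# Hypothetical mapping from context identifiers to language models.
language_models = {
    "voice_search": "lm-search",
    "dictation": "lm-dictation",
}
DEFAULT_MODEL = "lm-general"

def select_model(context_id):
    """Choose the language model appropriate for the reported context."""
    return language_models.get(context_id, DEFAULT_MODEL)
```

The server then decodes the received sound with `select_model(context_id)` rather than a one-size-fits-all model.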
  • Publication number: 20120016671
    Abstract: A system and methods for transcribing text from audio and video files including a set of transcription hosts and an automatic speech recognition system. ASR word-lattices are dynamically selected from either a text box or word-lattice graph wherein the most probable text sequences are presented to the transcriptionist. Secure transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices for transcription by a plurality of transcriptionists. No one transcriptionist is aware of the final transcribed text, only small portions of transcribed text. Secure and high quality transcriptions may be accomplished by segmenting a digital audio file into a set of audio slices, sending them serially to a set of transcriptionists and updating the acoustic and language models at each step to improve the word-lattice accuracy.
    Type: Application
    Filed: July 15, 2010
    Publication date: January 19, 2012
    Inventors: Pawan Jaggi, Abhijeet Sangwan
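The security property here comes from segmentation plus distribution. A minimal sketch, treating audio as a list of samples and using round-robin assignment (the slice length and assignment policy are assumptions):

```python
def slice_audio(samples, slice_len):
    """Segment a digital audio file into fixed-length slices."""
    return [samples[i:i + slice_len] for i in range(0, len(samples), slice_len)]

def assign_slices(slices, transcriptionists):
    """Round-robin assignment: each worker sees only disjoint fragments,
    so no single transcriptionist can reconstruct the full text."""
    assignments = {t: [] for t in transcriptionists}
    for i, s in enumerate(slices):
        assignments[transcriptionists[i % len(transcriptionists)]].append(s)
    return assignments

slices = slice_audio(list(range(10)), 3)
work = assign_slices(slices, ["alice", "bob"])
```

For the serial variant in the abstract, the slices would instead be sent one at a time, with model updates between steps.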
  • Publication number: 20120010876
    Abstract: A voice integration platform and method provide for integration of a voice interface with a data system that includes stored data. The voice integration platform comprises one or more generic software components, the generic software components being configured to enable development of a specific voice user interface that is designed to interact with the data system in order to present the stored data to a user.
    Type: Application
    Filed: September 22, 2011
    Publication date: January 12, 2012
    Applicant: Ben Franklin Patent Holding LLC
    Inventors: Andrew G. Smolenski, Steven Markman, Pericles Haleftiras, Jon Thomas Layton, Lizanne Kaiser, Gregory S. Kluthe, Michael W. Achenbach
  • Publication number: 20120010883
    Abstract: A computer program product, for performing data determination from medical record transcriptions, resides on a computer-readable medium and includes computer-readable instructions for causing a computer to obtain a medical transcription of a dictation, the dictation being from medical personnel and concerning a patient, analyze the transcription for an indicating phrase associated with a type of data desired to be determined from the transcription, the type of desired data being relevant to medical records, determine whether data indicated by text disposed proximately to the indicating phrase is of the desired type, and store an indication of the data if the data is of the desired type.
    Type: Application
    Filed: February 8, 2011
    Publication date: January 12, 2012
    Applicant: eScription, Inc.
    Inventors: Roger S. Zimmerman, Paul Egerman, George Zavaliagkos
  • Publication number: 20120004910
    Abstract: Systems and methods for processing speech from a user are disclosed. In the system of the present invention, the user's speech is received as an input audio stream. The input audio stream is converted to text that corresponds to the input audio stream. The converted text is converted to an echo audio stream. Then, the echo audio stream is sent to the user. This process is performed in real time. Accordingly, the user is able to determine whether or not the speech-to-text conversion was correct, i.e., that his or her speech was correctly converted to text. If the conversion was incorrect, the user is able to correct the conversion process by using editing commands. The corresponding text is then analyzed to determine the operation which it demands. Then, the operation is performed on the corresponding text.
    Type: Application
    Filed: November 24, 2009
    Publication date: January 5, 2012
    Inventors: Romulo De Guzman Quidilig, Kenneth Nakagawa, Michiyo Manning
  • Publication number: 20120004911
    Abstract: A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering and non-linear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which is later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.
    Type: Application
    Filed: June 30, 2010
    Publication date: January 5, 2012
    Inventor: Ronald Quan
  • Publication number: 20120005701
    Abstract: A system for identification of video content in a video signal is provided via a sound track audio signal. The audio signal is processed with filtering, frequency translation, and/or non-linear transformations to extract voice signals from the sound track channel. The extracted voice signals are coupled to a speech recognition system to provide, in text form, the words of the video content, which is later compared with a reference library of words or dialog from known video programs or movies. Other attributes of the video signal or transport stream may be combined with closed caption data or closed caption text for identification purposes. Example attributes include DVS/SAP information, time code information, histograms, and/or rendered video or pictures.
    Type: Application
    Filed: June 30, 2010
    Publication date: January 5, 2012
    Inventor: Ronald Quan
  • Publication number: 20110321008
    Abstract: GUI form code comprising a set of GUI elements can be imported. A user interface description can be generated from the GUI form code that has an element corresponding to each GUI element. For each user interface element converted from a corresponding GUI element, a user interface element type can be determined, as can temporal associations between the user interface elements. Conversation User Interface (CUI) code corresponding to the GUI form code can be created from the user interface description. When creating the CUI code for each of the user interface elements, different rules to convert the user interface element into CUI code can be used depending on the user interface element type of the element being converted. When creating the CUI code, the user interface elements can be temporally ordered based on the pre-determined spatio-temporal associations between the graphical user interface (GUI) elements.
    Type: Application
    Filed: June 28, 2010
    Publication date: December 29, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: ALBEE JHONEY, PRAVEEN K. VAIDYANATHAN
  • Publication number: 20110320199
    Abstract: According to one embodiment, an apparatus for fusing voiced phoneme units in Text-To-Speech includes a reference unit selection module configured to select a reference unit from the plurality of units based on pitch cycle information of each unit and the number of pitch cycles of the target segment. The apparatus includes a template creation module configured to create a template based on the reference unit selected by the reference unit selection module and the number of pitch cycles of the target segment, wherein the number of pitch cycles of the template is the same as that of the target segment. The apparatus includes a pitch cycle alignment module configured to align the pitch cycles of each unit of the plurality of units, except the reference unit, with the pitch cycles of the template by using a dynamic programming algorithm.
    Type: Application
    Filed: July 15, 2011
    Publication date: December 29, 2011
    Inventors: Jian Luan, Jian Li
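The dynamic-programming alignment in this abstract resembles a minimal dynamic time warping. The sketch below aligns one unit's pitch-cycle sequence to the template's; representing cycles as scalar values and using absolute difference as the local cost are illustrative assumptions, since the patent does not state its cost function here.

```python
def align(unit, template):
    """Minimal DTW: cumulative cost of warping `unit`'s pitch cycles
    onto `template`'s pitch cycles."""
    n, m = len(unit), len(template)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(unit[i - 1] - template[j - 1])  # local mismatch cost
            # extend the cheapest of: skip in unit, skip in template, match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

A perfectly matching unit aligns at zero cost; units with extra or missing pitch cycles absorb the mismatch through the insertion/deletion moves.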
  • Publication number: 20110320198
    Abstract: One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user.
    Type: Application
    Filed: June 27, 2011
    Publication date: December 29, 2011
    Inventor: Randall Lee THREEWITS
  • Publication number: 20110313764
    Abstract: A system and method is provided for reducing latency for automatic speech recognition. In one embodiment, intermediate results produced by multiple search passes are used to update a display of transcribed text.
    Type: Application
    Filed: August 27, 2011
    Publication date: December 22, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Michiel Adriaan Unico Bacchiani, Brian Scott Amento
  • Publication number: 20110305326
    Abstract: This invention involves additional details and uses for the invention described in U.S. Pat. No. 7,047,192, Simultaneous Multi-User Real-Time Speech Recognition System, filed by Poirier. The patent granted to Poirier teaches a platform based on audio events on which larger applications can be built to solve problems of capturing and transcribing human conversations. U.S. Pat. No. 7,047,192 also explains and teaches that indexing, cataloging, editing, and searching audio is possible using a browser to find specific content within text which is then directly linked with the related audio event. More specifically, it describes how this patent can be used as a building-block approach to provide functionality for real-time automatic speech recognition systems that are scalable from a single user to hundreds and potentially thousands of users having conversations.
    Type: Application
    Filed: March 20, 2011
    Publication date: December 15, 2011
    Inventors: JAMEY POIRIER, MARK HANEGRAAFF, DARRELL POIRIER
  • Publication number: 20110307254
    Abstract: A system and method of speech recognition involving a mobile device. Speech input is received (202) on a mobile device (102) and converted (204) to a set of phonetic symbols. Data relating to the phonetic symbols is transferred (206) from the mobile device over a communications network (104) to a remote processing device (106) where it is used (208) to identify at least one matching data item from a set of data items (114). Data relating to the at least one matching data item is transferred (210) from the remote processing device to the mobile device and presented (214) thereon.
    Type: Application
    Filed: December 10, 2009
    Publication date: December 15, 2011
    Inventors: Melvyn Hunt, John Bridle
  • Publication number: 20110300833
    Abstract: A visual voicemail system can convert visual voicemail message content to an alternate format based on the location of the recipient device, whether and how the recipient device is in motion, a priority of the message content, user preferences, or other criteria. Alternately, a recipient wireless device may also convert content to an alternate format based on similar criteria. Content may be presented automatically to a user on the recipient device based on such criteria. Content may be converted from audio to text, text to audio, or from any format to any other format. Location, motion data, user preferences, etc. may be obtained from a location-based service system, a recipient wireless device, or any other source.
    Type: Application
    Filed: June 8, 2010
    Publication date: December 8, 2011
    Applicant: AT&T MOBILITY II LLC
    Inventor: Venson M. Shaw
  • Publication number: 20110301952
    Abstract: The present invention provides a speech recognition processing system in which speech recognition processing is executed in parallel by plural speech recognizing units. Before text data as the speech recognition result is output from each of the speech recognizing units, information indicating each speaker is displayed in parallel on a display in the order in which each utterance was spoken. When the text data is output from each of the speech recognizing units, the text data is associated with the information indicating each speaker and the text data is displayed.
    Type: Application
    Filed: March 25, 2010
    Publication date: December 8, 2011
    Applicant: NEC CORPORATION
    Inventors: Takafumi Koshinaka, Masahiko Hamanaka
  • Publication number: 20110295458
    Abstract: An exemplary method includes a vehicle operation safety system detecting an operating parameter of a vehicle, detecting that a mobile access device is located within a predefined vicinity of the vehicle, and disabling one or more features of the mobile access device in response to the detecting of the operating parameter of the vehicle and the detecting that the mobile access device is located within the predefined vicinity of the vehicle. Corresponding methods and systems are also disclosed.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Applicant: VERIZON VIRGINIA
    Inventor: Nicole Ma Ellen Halsey-Fenderson
  • Publication number: 20110295623
    Abstract: The present invention relates to a workers compensation related insurance complaint management, processing and reporting system. According to some embodiments, an insurance complaint is received at a conversion system for processing. The received information may be converted to a specified format to populate an interactive web form. The web form may be augmented by various third party data sources and information may be additionally pre-filled through the conversion process. Automated complaint management reminders are generated by the system.
    Type: Application
    Filed: May 26, 2010
    Publication date: December 1, 2011
    Applicant: Hartford Fire Insurance Company
    Inventors: Jill Rich Behringer, Marlene G. Chickerella, JulieeAnn McCollum
  • Publication number: 20110288863
    Abstract: Voice stream augmented note taking may be provided. An audio stream associated with at least one speaker may be recorded and converted into text chunks. A text entry may be received from a user, such as in an electronic document. The text entry may be compared to the text chunks to identify matches, and the matching text chunks may be displayed to the user for selection.
    Type: Application
    Filed: May 21, 2010
    Publication date: November 24, 2011
    Applicant: Microsoft Corporation
    Inventor: David John Rasmussen
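The matching step in this abstract can be sketched as below. The all-words-present matching rule is an assumption; the patent does not specify how a text entry is compared to the chunks.

```python
# Text chunks converted from the recorded audio stream.
text_chunks = [
    "the quarterly budget is due friday",
    "action item send the slides to dana",
    "budget review moved to thursday",
]

def matching_chunks(entry, chunks):
    """Suggest chunks that contain every word the user has typed so far."""
    words = entry.lower().split()
    return [c for c in chunks if all(w in c for w in words)]

suggestions = matching_chunks("budget", text_chunks)
```

As the user keeps typing, the word list grows and the suggestion set narrows, which is what makes the matches useful for accelerating note taking.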
  • Publication number: 20110288862
    Abstract: Methods and systems for performing audio synchronization with a corresponding textual transcription and determining confidence values of the timing synchronization are provided. Audio and a corresponding text (e.g., transcript) may be synchronized in a forward and reverse direction using speech recognition to output time-annotated audio-lyrics synchronized data. Metrics can be computed to quantify and/or qualify a confidence of the synchronization. Based on the metrics, example embodiments describe methods for enhancing an automated synchronization process by adapting Hidden Markov Models (HMMs) to the synchronized audio for use during the speech recognition. Other examples describe methods for selecting an appropriate HMM for use.
    Type: Application
    Filed: May 18, 2010
    Publication date: November 24, 2011
    Inventor: Ognjen Todic
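One way to turn the forward/reverse dual alignment into a confidence metric is to measure how often the two passes agree on a word's timestamp. The agreement-fraction metric and the tolerance value below are assumptions for illustration, not the patent's specific metric.

```python
def sync_confidence(forward_times, reverse_times, tolerance=0.25):
    """Fraction of words whose forward-pass and reverse-pass timestamps
    (in seconds) agree to within `tolerance`."""
    agree = sum(
        1 for f, r in zip(forward_times, reverse_times)
        if abs(f - r) <= tolerance
    )
    return agree / len(forward_times)

# Word start times from the forward and reverse alignment passes.
conf = sync_confidence([0.0, 1.2, 2.5], [0.1, 1.1, 3.4])
```

A low score flags regions where the alignment is unreliable, which is where model adaptation or a different HMM would be tried.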
  • Publication number: 20110282664
    Abstract: Methods and systems for providing services and/or computing resources are provided. A method may include converting voice data into text data and tagging at least one portion of the text data in the text conversion file with at least one tag, the at least one tag indicating that the at least one portion of the text data includes a particular type of data. The method may also include displaying the text data on a display such that the at least one portion of text data is displayed with at least one associated graphical element indicating that the at least one portion of text data is associated with the at least one tag. The at least one portion of text data may be a selectable item on the display allowing a user interfacing with the display to select the at least one portion of text data in order to apply the at least one portion of text data to an application.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: FUJITSU LIMITED
    Inventors: Hideaki Tanioka, Daisuke Ito, Hidenobu Ito
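The tagging step in this abstract can be sketched with pattern matching over the converted text. The data types and regular expressions below are illustrative assumptions; the patent does not specify how portions are recognized.

```python
import re

# Hypothetical patterns for particular types of data in the text.
TAG_PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def tag_text(text):
    """Return (matched_text, tag) pairs for recognized data types."""
    tags = []
    for tag, pattern in TAG_PATTERNS.items():
        for m in pattern.finditer(text):
            tags.append((m.group(), tag))
    return tags

tags = tag_text("call 555-1234 before 2012-01-26")
```

Each tagged span would then be rendered as a selectable item, so the user can hand, say, the phone number directly to a dialer application.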
  • Publication number: 20110282687
    Abstract: An automated system updates electronic medical records (EMRs) based on dictated reports, without requiring manual data entry into on-screen forms. A dictated report is transcribed by an automatic speech recognizer, and facts are extracted from the report and stored in encoded form. Information from a patient's report is also stored in encoded form. The resulting encoded information from the report and EMR are reconciled with each other, and changes to be made to the EMR are identified based on the reconciliation. The identified changes are made to the EMR automatically, without requiring manual data entry into the EMR.
    Type: Application
    Filed: February 28, 2011
    Publication date: November 17, 2011
    Inventor: Detlef Koll
  • Publication number: 20110282522
    Abstract: A system for converting audible air traffic control instructions for pilots operating from an air facility to textual format. The system may comprise a processor connected to a jack of the standard pilot headset and a separate portable display screen connected to the processor. The processor may have a language converting functionality which can recognize traffic control nomenclature and display messages accordingly. Displayed text may be limited to information intended for a specific aircraft. The display may show hazardous discrepancies between authorized altitudes and headings and actual altitudes and headings. The display may be capable of correction by the user, and may utilize Global Positioning System (GPS) to obtain appropriate corrections. The system may date and time stamp communications and hold the same in memory. The system may have computer style user functions such as scrollability and operating prompts.
    Type: Application
    Filed: May 13, 2010
    Publication date: November 17, 2011
    Inventors: Robert S. Prus, Konrad Robert Sliwowski
  • Publication number: 20110276327
    Abstract: A method including receiving a vocal input including words spoken by a user; determining vocal characteristics associated with the vocal input; mapping the vocal characteristics to textual characteristics; and generating voice-to-expressive text that includes, in addition to text corresponding to the words spoken by the user, a textual representation of the vocal characteristics based on the mapping.
    Type: Application
    Filed: May 28, 2010
    Publication date: November 10, 2011
    Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB
    Inventor: Eral Foxenland
  • Publication number: 20110276328
    Abstract: An application server for reducing ambiance noise in an auscultation signal, and for recording comments while auscultating a patient with an electronic stethoscope. This application server (AS) comprises: means (SPH) for receiving samples of a raw auscultation signal representing auscultation sounds mixed with ambiance sounds, this raw auscultation signal being transmitted by a first microphone (M1) embedded in a stethoscope (ES); means (SPH) for receiving samples of an ambiance signal transmitted by a second microphone (M2) in a phone (IPP1); means (ASE) for processing the samples of the auscultation signal and the samples of the ambiance signal to generate an auscultation signal without ambiance sounds; means (LBM) for sending the auscultation signal without ambiance sounds to at least the headset of said stethoscope (ES); and means (VRM) for recognizing vocal sounds in the ambiance signal and converting these vocal sounds into text for storing comments in a database (DB).
    Type: Application
    Filed: July 10, 2009
    Publication date: November 10, 2011
    Inventors: Raymond Gass, Michel Le Creff
  • Publication number: 20110276326
    Abstract: A method and system for operational improvements in a dispatch console in a multi-source environment includes receiving (310) a plurality of audio streams simultaneously from a plurality of mobile devices, transcribing received audio streams by means of speech-to-text conversion, presenting real-time transcriptions to the user, and determining (320) if a first keyword is present in at least one of the plurality of audio and/or text streams. Upon determining the presence of the first keyword, the dispatch console automatically performs (330) at least one predefined dispatch console operation from a list of predefined dispatch console operations. The dispatch console further receives (340) a second keyword based on determining the presence of the first keyword and checks (350) for the presence of the second keyword within the audio and/or text streams, thereby enabling additional automated dispatch console operations.
    Type: Application
    Filed: May 6, 2010
    Publication date: November 10, 2011
    Applicant: Motorola, Inc.
    Inventors: Arthur L. Fumarolo, Mark Shahaf
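The two-stage keyword trigger described above can be sketched as follows. The keywords, the mapped console operations, and the token-based matching are illustrative assumptions.

```python
# Hypothetical keyword → console-operation mappings.
PRIMARY_KEYWORDS = {"emergency": "open_incident_panel"}
SECONDARY_KEYWORDS = {"fire": "dispatch_fire_unit"}

def scan(transcript, keywords):
    """Return console operations for keywords present in the transcript."""
    tokens = set(transcript.lower().split())
    return [action for kw, action in keywords.items() if kw in tokens]

def process_stream(transcript):
    """Second-stage keywords are only checked after a primary-keyword hit."""
    actions = scan(transcript, PRIMARY_KEYWORDS)
    if actions:
        actions += scan(transcript, SECONDARY_KEYWORDS)
    return actions

result = process_stream("Emergency reported possible fire on main street")
```

Gating the second scan on the first keyword keeps routine traffic (which may still contain words like "fire" in benign contexts) from triggering dispatch operations.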
  • Publication number: 20110276325
    Abstract: According to certain embodiments, training a transcription system includes accessing recorded voice data of a user from one or more sources. The recorded voice data comprises voice samples. A transcript of the recorded voice data is accessed. The transcript comprises text representing one or more words of each voice sample. The transcript and the recorded voice data are provided to a transcription system to generate a voice profile for the user. The voice profile comprises information used to convert a voice sample to corresponding text.
    Type: Application
    Filed: May 5, 2010
    Publication date: November 10, 2011
    Applicant: Cisco Technology, Inc.
    Inventors: Todd C. Tatum, Michael A. Ramalho, Paul M. Dunn, Shantanu Sarkar, Tyrone T. Thorsen, Alan D. Gatzke
  • Publication number: 20110267419
    Abstract: Techniques for recording and replay of a live conference while still attending the live conference are described. A conferencing system includes a user interface generator, a live conference processing module, and a replay processing module. The user interface generator is configured to generate a user interface that includes a replay control panel and one or more output panels. The live conference processing module is configured to extract information included in received conferencing data that is associated with one or more conferencing modalities, and to display the information in the one or more output panels in a live manner (e.g., as a live conference). The replay processing module is configured to enable information associated with the one or more conferencing modalities corresponding to a time in the conference session prior to the live point to be presented at a desired rate, possibly different from the real-time rate, if a replay mode is selected in the replay control panel.
    Type: Application
    Filed: April 30, 2010
    Publication date: November 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Kori Inkpen Quinn, Rajesh Hegde, Zhengyou Zhang, John Tang, Sasa Junuzovic, Christopher Brooks
  • Publication number: 20110271194
    Abstract: This specification describes technologies relating to content presentation. In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of presenting a content item to a user; receiving a user input indicating a voice interaction; receiving a voice input from the user; transmitting the voice input to a content system; receiving a command responsive to the voice input; and executing, using one or more processors, the command including modifying the content item. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
    Type: Application
    Filed: April 29, 2010
    Publication date: November 3, 2011
    Applicant: GOOGLE INC.
    Inventors: Jennifer W. Lin, Ping Wu
  • Publication number: 20110270609
    Abstract: Various embodiments of systems, methods, and computer programs are disclosed for providing real-time resources to participants in an audio conference session. One embodiment is a method for providing real-time resources to participants in an audio conference session via a communication network. One such method comprises: a conferencing system establishing an audio conference session between a plurality of computing devices via a communication network, each computing device generating a corresponding audio stream comprising a speech signal; and in real-time during the audio conference session, a server: receiving and processing the audio streams to determine the speech signals; extracting words from the speech signals; analyzing the extracted words to determine a relevant keyword being discussed in the audio conference session; identifying a resource related to the relevant keyword; and providing the resource to one or more of the computing devices.
    Type: Application
    Filed: April 30, 2010
    Publication date: November 3, 2011
    Applicant: American Teleconferencing Services, Ltd.
    Inventors: Boland T. Jones, David Michael Guthrie, Laurence Schaefer, J. Douglas Martin
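The keyword-to-resource step described in the abstract above (extract words, pick a relevant keyword, look up a related resource) can be sketched roughly as follows. This is an illustrative sketch only: the names `STOPWORDS`, `RESOURCES`, `relevant_keyword`, and `resource_for`, and the example URL, are assumptions, not from the patent.

```python
from collections import Counter

# Illustrative data; real systems would use speech-recognition output
# and a resource index instead of these hard-coded values.
STOPWORDS = {"the", "a", "to", "and", "of", "we", "is"}
RESOURCES = {"budget": "https://example.com/budget-report"}

def relevant_keyword(transcribed_words):
    """Pick the most frequent non-stopword as the topic being discussed."""
    counts = Counter(w.lower() for w in transcribed_words
                     if w.lower() not in STOPWORDS)
    keyword, _ = counts.most_common(1)[0]
    return keyword

def resource_for(transcribed_words):
    """Map the relevant keyword to a resource, if one is indexed."""
    return RESOURCES.get(relevant_keyword(transcribed_words))

words = "we need to review the budget and the budget forecast".split()
print(relevant_keyword(words))  # most frequent content word
```

A production system would run this continuously over the recognized audio streams and push the resource to the participants' devices during the session.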
  • Publication number: 20110269429
    Abstract: A communication device may participate in telephone calls. The communication device may allow a user to request transcription of a telephone call by prompting the user when the telephone call is completed. The communication device may display a call history user interface and, in response to a selection of a telephone call from the call history user interface, may request transcription of the selected telephone call. The communication device may include a dedicated transcription button that, when pressed, causes audio content of a telephone call to be sent to a transcription service. The communication device may display a preferences user interface via which a user may elect to have all incoming and outgoing telephone calls transcribed, all incoming and outgoing telephone calls to/from selected contacts transcribed, and/or have the communication device prompt the user about transcription when each telephone call is completed.
    Type: Application
    Filed: November 23, 2009
    Publication date: November 3, 2011
    Applicant: SPEECHINK, INC.
    Inventor: Konstantin Othmer
  • Publication number: 20110264451
    Abstract: A method and apparatus useful to train speech recognition engines is provided. Many of today's speech recognition engines require training to particular individuals to accurately convert speech to text. For certain applications, this training requires significant resources. To alleviate the resource demand, a trainer is provided with the text transcription and the audio file. The trainer updates the text based on the audio file. The changes are provided to the speech recognition engine to train it and update the user profile. In certain aspects, the training is reversible, as it is possible to overtrain the system such that the trained system is actually less proficient.
    Type: Application
    Filed: April 21, 2011
    Publication date: October 27, 2011
    Applicant: nVoq Incorporated
    Inventors: Jeffrey Hoepfinger, David Mondragon
  • Publication number: 20110265004
    Abstract: An interactive media device and method allows a user to select the format by which stored text of a book, stored audio of a book, or combinations thereof are presented to the user. For example, if the user wishes to read the book, the user may select display of the text on a screen for reading. Alternatively, the user may select audio of the text, resulting in the text being converted so that audio sound of the text is provided for listening. Interactive bookmarking, or place marking, allows the user, via a device interface, to mark a place within the book. Upon returning, the user may continue from that place, either by reading displayed text or by listening to audio. Place marking, and presentation of the content either visually or audibly, may continue, at the user's selection, until the book is completed.
    Type: Application
    Filed: April 24, 2010
    Publication date: October 27, 2011
    Inventor: Anthony G. Sitko
  • Publication number: 20110261941
    Abstract: A system and method for monitoring telephone activity and conversation content in a correctional facility comprises providing a communicative connection between a caller and a recipient, alerting at least one of the caller and the recipient that the communications may be recorded, delivering the conversation between the caller and the recipient over the communicative connection and storing the conversation into a call record memory. After the communicative connection has been terminated, speech recognition software is executed to identify a plurality of conversation words within the call record memory. By comparing the conversation words with a database of trigger words, a determination can be made as to whether the conversation is of interest to the correctional facility. Based on that comparison step, a detection response is executed.
    Type: Application
    Filed: April 27, 2010
    Publication date: October 27, 2011
    Inventors: Jay Walters, Randy L. Reeves, William L. Pope
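The comparison step in the abstract above (recognized conversation words checked against a database of trigger words) reduces to a set intersection. A minimal sketch, assuming illustrative names (`TRIGGER_WORDS`, `flag_conversation`) and toy data not drawn from the patent:

```python
# Illustrative trigger-word set; a real deployment would load this
# from the facility's database.
TRIGGER_WORDS = {"escape", "contraband", "weapon"}

def flag_conversation(recognized_words):
    """Return the trigger words found in the call record, so a
    detection response can be executed when the set is non-empty."""
    return {w.lower() for w in recognized_words} & TRIGGER_WORDS

hits = flag_conversation(["plan", "the", "Escape"])
if hits:
    print("conversation of interest:", hits)
```

In the described system this runs after the call ends, over the words that speech recognition identified within the call record memory.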
  • Publication number: 20110257973
    Abstract: A control system for mounting in a vehicle and for providing information to a portable electronic device for processing by the portable electronic device is shown and described. The control system includes a first interface for communicating with the portable electronic device and a memory device. The control system also includes a processing circuit communicably coupled to the first interface and the memory device, the processing circuit configured to extract information from the memory device and to provide the information to the first interface so that the first interface communicates the information to the portable electronic device. The processing circuit is further configured to determine the capabilities of the portable electronic device based on data received from the portable electronic device via the first interface and to determine whether or not to communicate the information to the portable electronic device based on the determined capabilities.
    Type: Application
    Filed: January 14, 2011
    Publication date: October 20, 2011
    Inventors: Richard J. Chutorash, Elisabet Anderson, Rodger W. Eich, Jeffrey Golden, Philip J. Vanderwall, Michael J. Sims
  • Publication number: 20110257971
    Abstract: Methods, systems, and articles are described herein for receiving an audio input and a facial image sequence for a period of time, in which the audio input includes speech input from multiple speakers. The audio input is filtered based on the received facial image sequence to extract the speech input of a particular speaker.
    Type: Application
    Filed: April 14, 2010
    Publication date: October 20, 2011
    Applicant: T-MOBILE USA, INC.
    Inventor: Andrew R. Morrison
  • Publication number: 20110251843
    Abstract: A method, system, and computer program product for compensation of intra-speaker variability in speaker diarization are provided. The method includes: dividing a speech session into segments of duration less than the average duration between speaker changes; parameterizing each segment by a time-dependent probability density function supervector, for example, using a Gaussian Mixture Model; computing a difference between successive segment supervectors; and computing a scatter measure, such as a covariance matrix of the differences, as an estimate of intra-speaker variability. The method further includes compensating the speech session for intra-speaker variability using the estimate of intra-speaker variability.
    Type: Application
    Filed: April 8, 2010
    Publication date: October 13, 2011
    Applicant: International Business Machines Corporation
    Inventor: Hagai Aronowitz
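The scatter-estimation steps in the abstract above (segment supervectors, successive differences, covariance of the differences) can be sketched numerically. A minimal sketch, assuming the supervectors have already been extracted from a GMM; the function name and toy data are illustrative:

```python
import numpy as np

def intra_speaker_variability(supervectors):
    """Estimate intra-speaker variability from segment supervectors.

    supervectors: (n_segments, dim) array, one supervector per short
    segment of the session. Because successive short segments usually
    share a speaker, differences of successive supervectors cancel the
    speaker identity and leave intra-speaker variation.
    """
    diffs = np.diff(supervectors, axis=0)   # successive-segment differences
    return np.cov(diffs, rowvar=False)      # scatter (covariance) estimate

# Toy usage: 6 segments of 3-dimensional supervectors.
S = np.random.default_rng(0).normal(size=(6, 3))
cov = intra_speaker_variability(S)
print(cov.shape)  # (3, 3)
```

The resulting matrix would then be used to compensate (e.g., whiten) the session's supervectors before clustering them by speaker.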
  • Publication number: 20110251971
    Abstract: An embodiment of the invention comprises a real-time collaborative technical support (RTCTS) system that may automatically generate and/or maintain social networks that may be dynamically evolving. The social networks may be based on the output of at least one multi-modal classification algorithm. These outputs may be occurring in real-time.
    Type: Application
    Filed: April 8, 2010
    Publication date: October 13, 2011
    Applicant: International Business Machines Corporation
    Inventors: Timothy J. Bethea, Neil H. Boyette, Isaac K. Cheng, Vikas Krishna, Yolanda A. Rankin, Yongshin Yu
  • Publication number: 20110246194
    Abstract: A client station having access to an application is provided. The application has at least one indicia having a first configuration and a second configuration different from the first configuration. The second configuration indicates that the application is able to accept input.
    Type: Application
    Filed: March 30, 2010
    Publication date: October 6, 2011
    Applicant: nVoq Incorporated
    Inventors: Rebecca Heins, Edward Kizhner
  • Publication number: 20110243311
    Abstract: Methods and systems for automatic phone call tracking and analysis of the content and outcomes of a call are provided. These systems may provide businesses with the ability to track and view analytics of the number and various outcomes of calls, thereby providing up-to-date real-time analysis of the automatically-generated results of client interactions with staff answering the phones. Methods and systems in accordance with the present invention quantitatively and objectively analyze staff performance and marketing return on investment (ROI), and track patient demand across various procedures. This may automatically provide information on the number of calls with various outcomes, e.g., the customer booked an appointment, the customer hung up while on hold, the customer was connected with voicemail, the customer left a message on voicemail, the customer is an existing client, etc. Other automatically-detected aspects of phone call contents are provided.
    Type: Application
    Filed: March 30, 2010
    Publication date: October 6, 2011
    Inventor: Grant L. Aldrich
  • Publication number: 20110246197
    Abstract: A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering.
    Type: Application
    Filed: June 17, 2011
    Publication date: October 6, 2011
    Applicant: Nuance Communications, Inc.
    Inventors: Rafael Graniello Cabezas, Jason Eric Moore, Elizabeth Salvia
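The validity check in the abstract above (speak the text only if the message's digest checks out against the authenticated profile) can be sketched with a keyed digest. This is a simplified illustration: the patent describes digests or digital signatures tied to a certified voice profile, whereas the sketch below uses a shared HMAC key, and all names (`profile_key`, `speak_if_valid`) are assumptions:

```python
import hashlib
import hmac

def digest(message, profile_key):
    """Keyed digest of the text message under the profile's key."""
    return hmac.new(profile_key, message.encode(), hashlib.sha256).hexdigest()

def speak_if_valid(message, mac, profile_key, tts=print):
    """Synthesize the message only if it is free of tampering."""
    if hmac.compare_digest(digest(message, profile_key), mac):
        tts(message)   # would render with the profile's prosodic traits
        return True
    return False       # tampered or unauthenticated: refuse to speak

key = b"profile-secret"
mac = digest("Meet at noon", key)
speak_if_valid("Meet at noon", mac, key)
```

A full implementation would instead verify a public-key signature chained to the certification authority that issued the voice profile.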
  • Publication number: 20110246195
    Abstract: A dictation system that allows using trainable code phrases is provided. The dictation system operates by receiving audio and recognizing the audio as text. The text/audio may contain code phrases that are identified by a comparator that matches the text/audio and replaces the code phrase with a standard clause associated with the code phrase. The database or memory containing the code phrases is loaded with matched standard clauses and may be organized as a hierarchical system, such that certain code phrases have multiple meanings depending on the user.
    Type: Application
    Filed: March 21, 2011
    Publication date: October 6, 2011
    Applicant: nVoq Incorporated
    Inventors: Charles Corfield, Brian Marquette, David Mondragon, Rebecca Heins
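The substitution scheme in the abstract above (code phrase matched and replaced by its standard clause, with per-user overrides giving one phrase multiple meanings) can be sketched as a layered dictionary lookup. The mapping contents and the names `CLAUSES` and `expand` are illustrative, not from the patent:

```python
# Illustrative code-phrase store: a shared base layer plus per-user
# overrides, so the same phrase can expand differently by user.
CLAUSES = {
    "default": {"standard closing": "Sincerely yours,"},
    "dr_smith": {"standard closing": "Best regards, Dr. Smith"},
}

def expand(text, user="default"):
    """Replace any recognized code phrase with its standard clause,
    preferring the user's own mapping over the shared defaults."""
    mapping = {**CLAUSES["default"], **CLAUSES.get(user, {})}
    for phrase, clause in mapping.items():
        text = text.replace(phrase, clause)
    return text

print(expand("standard closing", user="dr_smith"))
```

In the described system the comparator would run this matching over the recognized dictation text (or audio) before returning the final transcript.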
  • Publication number: 20110230159
    Abstract: A vehicle communication system includes a computer processor in communication with a memory circuit, a transceiver in communication with the processor and operable to communicate with one or more wireless devices, and one or more storage locations storing one or more pieces of emergency contact information. In this illustrative system, the processor is operable to establish communication with a first wireless device through the transceiver. Upon detection of an emergency event by at least one vehicle-based sensor system, the vehicle communication system is operable to contact an emergency operator. The vehicle communication system is further operable to display the one or more pieces of emergency contact information in a selectable manner. Upon selection of one of the pieces of emergency contact information, the vehicle computing system places a call to a phone number associated with the selected emergency contact.
    Type: Application
    Filed: March 19, 2010
    Publication date: September 22, 2011
    Applicant: FORD GLOBAL TECHNOLOGIES, LLC
    Inventor: David Anthony Hatton