Speech To Text Systems (epo) Patents (Class 704/E15.043)
  • Publication number: 20120209606
    Abstract: Obtaining information from audio interactions associated with an organization. The information may comprise entities, relations or events. The method comprises: receiving a corpus comprising audio interactions; performing audio analysis on audio interactions of the corpus to obtain text documents; performing linguistic analysis of the text documents; matching the text documents with one or more rules to obtain one or more matches; and unifying or filtering the matches.
    Type: Application
    Filed: February 14, 2011
    Publication date: August 16, 2012
    Applicant: Nice Systems Ltd.
    Inventors: Maya Gorodetsky, Ezra Daya, Oren Pereg
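A minimal Python sketch of the rule-matching and unification steps this abstract describes; the rule labels and regex patterns below are invented stand-ins for the patent's rules:

```python
import re

# Hypothetical rule set: each rule pairs an event label with a regex pattern.
RULES = [
    ("cancellation_event", re.compile(r"\bcancel(?:led|ling)? my (\w+)", re.I)),
    ("complaint_event", re.compile(r"\bnot happy with (\w+)", re.I)),
]

def match_rules(documents):
    """Match each transcribed document against the rule set."""
    matches = []
    for doc_id, text in enumerate(documents):
        for label, pattern in RULES:
            for m in pattern.finditer(text):
                matches.append({"doc": doc_id, "label": label, "span": m.group(0)})
    return matches

def unify(matches):
    """Drop duplicate matches (same document, label, and matched span)."""
    seen, unified = set(), []
    for m in matches:
        key = (m["doc"], m["label"], m["span"].lower())
        if key not in seen:
            seen.add(key)
            unified.append(m)
    return unified
```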
  • Publication number: 20120209607
    Abstract: A method and communication device are disclosed that include displaying a video on a display, converting voice audio data to textual data by applying voice-to-text conversion, and displaying the textual data as scrolling text along with the video on the display, either above, below, or across the video. The method may further include receiving a voice call indication from a network, providing the voice call indication to a user interface, where the voice call indication corresponds to an incoming voice call; and receiving a user input for receiving the voice call and displaying the voice call as scrolling text. In another embodiment, a method includes displaying application-related data on a display; converting voice audio data to textual data by applying voice-to-text conversion; converting the textual data to a video format; and displaying the textual data as scrolling text over the application-related data on the display.
    Type: Application
    Filed: April 13, 2012
    Publication date: August 16, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Dinesh Kumar Garg, Manish Poddar
  • Publication number: 20120203551
    Abstract: Embodiments of the present invention provide a method, system and computer program product for automated follow-up for e-meetings. In an embodiment of the invention, a method for automated follow-up for e-meetings is provided. The method includes monitoring content provided to an e-meeting managed by an e-meeting server executing in memory of a host computer. The method also includes applying a rule in a rules base to the monitored content. Finally, the method includes triggering generation of a follow up item in response to applying the rule to the monitored content.
    Type: Application
    Filed: February 4, 2011
    Publication date: August 9, 2012
    Applicant: International Business Machines Corporation
    Inventors: Geetika T. Lakshmanan, Martin Oberhofer
  • Publication number: 20120201362
    Abstract: Methods, systems, and computer program products are provided for generating and posting messages to social networks based on voice input. One example method includes receiving an audio signal that corresponds to spoken content, generating one or more representations of the spoken content, and causing the one or more representations of the spoken content to be posted to a social network.
    Type: Application
    Filed: February 3, 2012
    Publication date: August 9, 2012
    Applicant: GOOGLE INC.
    Inventors: Steve Crossan, Ujjwal Singh
  • Publication number: 20120203552
    Abstract: A device may receive over a network a digitized speech signal from a remote control that accepts speech. In addition, the device may convert the digitized speech signal into text, use the text to obtain command information applicable to a set-top box, and send the command information to the set-top box to control presentation of multimedia content on a television in accordance with the command information.
    Type: Application
    Filed: April 16, 2012
    Publication date: August 9, 2012
    Applicant: VERIZON DATA SERVICES INDIA PVT. LTD.
    Inventors: Ashutosh K. Sureka, Sathish K. Subramanian, Sidhartha Basu, Indivar Verma
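The text-to-command step can be sketched as a lookup from normalized recognized speech to set-top box command information; the command names and table below are invented for illustration:

```python
# Hypothetical mapping from recognized phrases to set-top box commands.
COMMAND_TABLE = {
    "volume up": {"action": "VOLUME", "delta": 1},
    "volume down": {"action": "VOLUME", "delta": -1},
    "next channel": {"action": "CHANNEL", "delta": 1},
}

def text_to_command(recognized_text):
    """Normalize the recognized text and look up the command information
    to send to the set-top box."""
    key = " ".join(recognized_text.lower().split())
    command = COMMAND_TABLE.get(key)
    if command is None:
        raise ValueError(f"no command for: {recognized_text!r}")
    return command
```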
  • Publication number: 20120197523
    Abstract: A mobile device communicates with an in-vehicle system to provide a network-based calendar and related features for viewing and/or editing within a vehicle. The mobile device executes a specialized application that retrieves calendar data from one or more calendar sources in a native calendar format, and converts the calendar data to a customized vehicle format designed specifically for convenient transfer and viewing within the vehicle. The user may record spoken voice notes that can be processed to automatically create new calendar entries. An alert feature schedules visual and/or audio alerts to notify the user in advance of scheduled calendar events. When a scheduled calendar event time is reached, the in-vehicle system may automatically place a call to an event invitee or generate a route to an event destination.
    Type: Application
    Filed: January 27, 2011
    Publication date: August 2, 2012
    Applicant: HONDA MOTOR CO., LTD.
    Inventor: David M. Kirsch
  • Publication number: 20120197640
    Abstract: A system and method are provided for combining active and unsupervised learning for automatic speech recognition. This process enables a reduction in the amount of human supervision required for training acoustic and language models and an increase in performance given the transcribed and un-transcribed data.
    Type: Application
    Filed: April 9, 2012
    Publication date: August 2, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Dilek Zeynep Hakkani-Tür, Giuseppe Riccardi
  • Publication number: 20120191451
    Abstract: In one embodiment, the invention provides a method, comprising providing a first communications channel to transmit digital content to a notes-access application for storage against a particular user, the first communications channel being selected from the group consisting of an SMS channel, an MMS channel, a fax channel, an e-mail channel, and an IM channel; responsive to receiving digital content from said user via the first communications channel, storing said digital content in the database associated with said notes-access application; and providing a second communications channel to the notes-access application whereby the digital content stored by the notes-access application against said user is provided to said user, the second communications channel being selected from the group consisting of an SMS channel, an MMS channel, a fax channel, an e-mail channel, and an IM channel.
    Type: Application
    Filed: March 29, 2012
    Publication date: July 26, 2012
    Inventor: Yue Fang
  • Publication number: 20120191452
    Abstract: Disclosed is a system for generating a representation of a group interaction, the system comprising: a transcription module adapted to generate a transcript of the group interaction from audio source data representing the group interaction, the transcript comprising a sequence of lines of text, each line corresponding to an audible utterance in the audio source data; and a labeling module adapted to generate a conversation path from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data; and generate the representation of the group interaction by associating the conversation path with a plurality of voice profiles, each voice profile corresponding to an identified speaker in the conversation path.
    Type: Application
    Filed: April 4, 2012
    Publication date: July 26, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Anand Krishnaswamy, Rajeev Palanki
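A small sketch of the labeling step described above, assuming speaker identification has already produced one identifier per transcript line; the profile values are placeholders:

```python
def build_conversation_path(lines, speaker_ids):
    """Pair each transcript line with the identifier of its speaker."""
    if len(lines) != len(speaker_ids):
        raise ValueError("one speaker identifier is required per line")
    return [{"speaker": s, "text": t} for s, t in zip(speaker_ids, lines)]

def group_representation(path, profiles):
    """Associate the conversation path with the voice profiles of the
    speakers who actually appear in it."""
    present = {entry["speaker"] for entry in path}
    return {"path": path,
            "profiles": {s: profiles[s] for s in present if s in profiles}}
```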
  • Publication number: 20120185240
    Abstract: An embodiment provides a system and method for generating and sending a simplified message using speech recognition. The system provides speech recognition software that may be utilized for receiving audio, converting audio to text derived from audio, comparing text derived from audio to match fields to find matches, replacing matched text with contents of replacement fields associated with the match fields, generating an output message incorporating the replacement text into the text derived from audio, transmitting the output message to a messaging system, and redistributing the output message to recipients.
    Type: Application
    Filed: January 13, 2012
    Publication date: July 19, 2012
    Inventors: Michael D. Goller, Stuart E. Goller
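The match-field/replacement-field substitution above can be sketched roughly as follows; the field contents are invented examples:

```python
# Hypothetical match fields mapped to their replacement-field contents.
FIELDS = {
    "eta": "estimated time of arrival",
    "hq": "headquarters",
}

def expand_message(text_from_audio):
    """Replace any word that matches a match field with the contents of
    its replacement field, then return the output message."""
    words = text_from_audio.split()
    replaced = [FIELDS.get(w.lower(), w) for w in words]
    return " ".join(replaced)
```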
  • Publication number: 20120185250
    Abstract: A distributed dictation/transcription system is provided. The system provides a client station, dictation manager, and dictation server networked such that the dictation manager selects a dictation server to transcribe audio from the client station. The dictation manager selects one of a plurality of dictation servers based on conventional load balancing and on a determination of whether the user profile is already uploaded to a dictation server. While a dictation server is being selected or a profile uploaded, the client may begin dictating; the audio is stored in a buffer of the dictation manager until a dictation server is selected or available. The user may receive, in real time or near real time, a display of the textual data, which may be corrected by the user to update the user profile.
    Type: Application
    Filed: March 16, 2012
    Publication date: July 19, 2012
    Applicant: NVOQ INCORPORATED
    Inventors: Richard Beach, Christopher Butler, Jon Ford, Brian Marquette, Christopher Omland
  • Publication number: 20120185249
    Abstract: A method and a system of history tracking corrections in a speech based document. The speech based document comprises one or more sections of text recognized or transcribed from sections of speech, wherein the sections of speech are dictated by a user and processed by a speech recognizer in a speech recognition system into corresponding sections of text of the speech based document. The method comprises associating at least one speech attribute to each section of text in the speech based document, said speech attribute comprising information related to said section of text, respectively; presenting said speech based document on a presenting unit; detecting an action being performed within any of said sections of text; and updating information of said speech attributes related to the kind of action detected on one of said sections of text for updating said speech based document.
    Type: Application
    Filed: February 3, 2012
    Publication date: July 19, 2012
    Applicant: Nuance Communications Austria GMBH
    Inventors: Gerhard Grobauer, Miklos Papai
  • Publication number: 20120179465
    Abstract: Audio content is converted to text using speech recognition software. The text is then associated with a distinct voice or a generic placeholder label if no distinction can be made. From the text and voice information, a word cloud is generated based on key words and key speakers. A visualization of the cloud displays as it is being created. Words grow in size in relation to their dominance. When it is determined that the predominant words or speakers have changed, the word cloud is complete. That word cloud continues to be displayed statically and a new word cloud display begins based upon a new set of predominant words or a new predominant speaker or set of speakers. This process may continue until the meeting is concluded. At the end of the meeting, the completed visualization may be saved to a storage device, sent to selected individuals, removed, or any combination of the preceding.
    Type: Application
    Filed: January 10, 2011
    Publication date: July 12, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Susan Marie Cox, Janani Janakiraman, Fang Lu, Loulwa Salem
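A rough sketch of the dominance computation behind such a word cloud, assuming display size grows linearly with occurrence count; the stop-word list and sizing constants are invented:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "and", "to", "of", "is"}

def word_cloud(transcript, base_size=10, step=4):
    """Count key words in a transcript and assign each a display size
    that grows with its dominance (occurrence count)."""
    words = [w.lower().strip(".,") for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return {word: base_size + step * (count - 1)
            for word, count in counts.items()}
```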
  • Publication number: 20120179466
    Abstract: A speech to text converting device includes a display, a voice receiving module, a voice recognition module, an identity recognition module, and a control module. The voice receiving module receives a voice signal. The voice recognition module converts the voice signal to voice data and produces text data corresponding to the voice data. The identity recognition module receives the voice signal and establishes identity data corresponding to the voice signal. The control module displays the text data and the identity data together on the display.
    Type: Application
    Filed: August 8, 2011
    Publication date: July 12, 2012
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: YUAN-FU HUANG, TIEN-PING LIU, CHIEN-HUANG CHANG
  • Publication number: 20120172012
    Abstract: A method for controlling a mobile communications device while located in a mobile vehicle involves pairing the mobile communications device with a telematics unit via short range wireless communication. The method further involves receiving an incoming text message at the mobile device while the mobile device is paired with the telematics unit. Upon receiving the text message, a text messaging management strategy is implemented via the telematics unit and/or the mobile device, where the text messaging management strategy is executable via an application that is resident on the mobile device.
    Type: Application
    Filed: January 4, 2011
    Publication date: July 5, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Anthony J. Sumcad, Shawn F. Granda, Lawrence D. Cepuran, Steven Swanson
  • Publication number: 20120173236
    Abstract: A speech to text converting device includes a display, a voice receiving module, a voice recognition module, an input module, and a control module. The voice receiving module receives speech within a certain period of time. The voice recognition module converts the speech to voice data. The control module establishes text data corresponding to the voice data and displays the text data, any inputted words, and the relevant time period.
    Type: Application
    Filed: August 8, 2011
    Publication date: July 5, 2012
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: YUAN-FU HUANG, TIEN-PING LIU, CHIEN-HUANG CHANG
  • Publication number: 20120173235
    Abstract: One embodiment described herein may take the form of a system or method for generating subtitles (also known as “closed captioning”) of an audio component of a multimedia presentation automatically for one or more stored presentations. In general, the system or method may access one or more multimedia programs stored on a storage medium, either as an entire program or in portions. Upon retrieval, the system or method may perform an analysis of the audio component of the program and generate a subtitle text file that corresponds to the audio component. In one embodiment, the system or method may perform a speech recognition analysis on the audio component to generate the subtitle text file.
    Type: Application
    Filed: December 31, 2010
    Publication date: July 5, 2012
    Applicant: Eldon Technology Limited
    Inventor: Dale Llewelyn Mountain
  • Publication number: 20120166192
    Abstract: Systems, methods, and computer readable media providing a speech input interface. The interface can receive speech input and non-speech input from a user through a user interface. The speech input can be converted to text data and the text data can be combined with the non-speech input for presentation to a user.
    Type: Application
    Filed: November 18, 2011
    Publication date: June 28, 2012
    Applicant: APPLE INC.
    Inventor: Kazuhisa Yanagihara
  • Publication number: 20120166191
    Abstract: A method and system for providing text-to-audio conversion of an electronic book displayed on a viewer. A user selects a portion of displayed text and converts it into audio. The text-to-audio conversion may be performed via a header file and pre-recorded audio for each electronic book, via text-to-speech conversion, or other available means. The user may select manual or automatic text-to-audio conversion. The automatic text-to-audio conversion may be performed by automatically turning the pages of the electronic book or by the user manually turning the pages. The user may also select to convert the entire electronic book, or portions of it, into audio. The user may also select an option to receive an audio definition of a particular word in the electronic book. The present invention allows a user to control the system by selecting options from a screen or by entering voice commands.
    Type: Application
    Filed: November 17, 2011
    Publication date: June 28, 2012
    Applicant: ADREA LLC
    Inventors: John S. Hendricks, Michael L. Asmussen
  • Publication number: 20120166193
    Abstract: A visual toolkit for prioritizing speech transcription is provided. The toolkit can include a logger (102) for capturing information from a speech recognition system, a processor (104) for determining an accuracy rating of the information, and a visual display (106) for categorizing the information and prioritizing a transcription of the information based on the accuracy rating. The prioritizing identifies spoken utterances having a transcription priority in view of the recognized result. The visual display can include a transcription category (156) having a modifiable textbox entry with a text entry initially corresponding to a text of the recognized result, and an accept button (157) for validating a transcription of the recognized result. The categories can be automatically ranked by the accuracy rating in an ordered priority for increasing an efficiency of transcription.
    Type: Application
    Filed: January 19, 2012
    Publication date: June 28, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Jeffrey S. Kobal, Girish Dhanakshirur
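The accuracy-based prioritization might look like this minimal sketch, with an assumed acceptance threshold; each recognized utterance is a (text, accuracy) pair:

```python
def prioritize_utterances(results, threshold=0.85):
    """Rank recognized utterances so the lowest-accuracy ones are
    transcribed first; results at or above the threshold are accepted
    as-is. Each result is a (text, accuracy) pair."""
    needs_review = [r for r in results if r[1] < threshold]
    accepted = [r for r in results if r[1] >= threshold]
    # Lowest accuracy first: those transcriptions have highest priority.
    needs_review.sort(key=lambda r: r[1])
    return needs_review, accepted
```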
  • Publication number: 20120158405
    Abstract: A speech recognition device (1) processes speech data (SD) of a dictation and establishes recognized text information (ETI) and link information (LI) of the dictation. In a synchronous playback mode of the speech recognition device (1), during acoustic playback of the dictation a correction device (10) synchronously marks the word of the recognized text information (ETI) that relates, via the link information (LI), to the speech data (SD) just played back; the currently marked word indicates the position of an audio cursor (AC). When a user of the speech recognition device (1) recognizes an incorrect word, he positions a text cursor (TC) at the incorrect word and corrects it. Cursor synchronization means (15) make it possible to synchronize the text cursor (TC) with the audio cursor (AC), or the audio cursor (AC) with the text cursor (TC), so that positioning of the respective cursor (AC, TC) is simplified considerably.
    Type: Application
    Filed: February 13, 2012
    Publication date: June 21, 2012
    Applicant: Nuance Communications Austria GmbH
    Inventor: Wolfgang Gschwendtner
  • Publication number: 20120150537
    Abstract: Confidential information included in image and voice data is filtered in an apparatus that includes an extraction unit for extracting a character string from an image frame, and a conversion unit for converting audio data to a character string. The apparatus also includes a determination unit for determining, in response to contents of a database, whether at least one of the image frame and the audio data includes confidential information. The apparatus also includes a masking unit for concealing contents of the image frame by masking the image frame in response to determining that the image frame includes confidential information, and for making the audio data inaudible by masking the audio data in response to determining that the audio data includes confidential information. The apparatus further includes a playback unit for playing back the image frame and the audio data.
    Type: Application
    Filed: December 5, 2011
    Publication date: June 14, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Seiji Abe, Mitsuru Shioya, Shigeki Takeuchi, Daisuke Tomoda
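A toy sketch of the confidential-content determination and masking for the audio-derived text; the term list and account-number pattern below are invented stand-ins for the database:

```python
import re

# Illustrative confidential-content database: literal terms plus a
# pattern for digit runs that look like account identifiers.
CONFIDENTIAL_TERMS = {"project titan"}
ACCOUNT_PATTERN = re.compile(r"\b\d{6,}\b")

def is_confidential(text):
    """Decide, against the database, whether the text is confidential."""
    lowered = text.lower()
    return any(t in lowered for t in CONFIDENTIAL_TERMS) or \
        bool(ACCOUNT_PATTERN.search(text))

def mask(text):
    """Return a masked form if the text contains confidential
    information, or the text unchanged otherwise."""
    return "[REDACTED]" if is_confidential(text) else text
```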
  • Publication number: 20120143605
    Abstract: In one implementation, a collaboration server is a conference bridge or other network device configured to host an audio and/or video conference among a plurality of conference participants. The collaboration server sends conference data and a media stream including speech to a speech recognition engine. The conference data may include the conference roster or text extracted from documents or other files shared in the conference. The speech recognition engine updates a default language model according to the conference data and transcribes the speech in the media stream based on the updated language model. In one example, the performance of the default language model, the updated language model, or both may be tested using a confidence interval or submitted for approval by a conference participant.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 7, 2012
    Applicant: Cisco Technology, Inc.
    Inventors: Tyrone Terry Thorsen, Alan Darryl Gatzke
  • Publication number: 20120143607
    Abstract: The present invention provides a speech recognition system combined with one or more alternate input modalities to ensure efficient and accurate text input. The speech recognition system achieves less than perfect accuracy due to limited processing power, environmental noise, and/or natural variations in speaking style. The alternate input modalities use disambiguation or recognition engines to compensate for reduced keyboards, sloppy input, and/or natural variations in writing style. The ambiguity remaining in the speech recognition process is mostly orthogonal to the ambiguity inherent in the alternate input modality, such that the combination of the two modalities resolves the recognition errors efficiently and accurately. The invention is especially well suited for mobile devices with limited space for keyboards or touch-screen input.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 7, 2012
    Inventors: Michael LONGÉ, Richard Eyraud, Keith C. Hullfish
  • Publication number: 20120130714
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media relating to speaker verification. In one aspect, a system receives a first user identity from a second user, and, based on the identity, accesses voice characteristics. The system randomly generates a challenge sentence according to a rule and/or grammar, based on the voice characteristics, and prompts the second user to speak the challenge sentence. The system verifies that the second user is the first user if the spoken challenge sentence matches the voice characteristics. In an enrollment aspect, the system constructs an enrollment phrase that covers a minimum threshold of unique speech sounds based on speaker-distinctive phonemes, phoneme clusters, and prosody. The user then utters the enrollment phrase, and the system extracts voice characteristics for the user from the uttered phrase.
    Type: Application
    Filed: November 24, 2010
    Publication date: May 24, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Ilija Zeljkovic, Taniya Mishra, Amanda Stent, Ann K. Syrdal, Jay Wilpon
  • Publication number: 20120123778
    Abstract: A method and apparatus for providing security control of short messaging service (SMS) messages and multimedia messaging service (MMS) messages in a unified messaging (UM) system are disclosed. An SMS or MMS message directed to a recipient mailbox in a UM system is received. It is determined that the recipient mailbox is a secondary mailbox associated with a primary mailbox in the UM system. The message is audited according to an audit policy associated with the recipient mailbox.
    Type: Application
    Filed: November 11, 2010
    Publication date: May 17, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Mehrad Yasrebi, James Jackson, Cheryl Lockett
  • Publication number: 20120120446
    Abstract: A document generation method and system using speech data, and an image forming apparatus including the document generation system. The method includes setting document editing information including at least one of document form information and sentence pattern information for editing a document when the speech data is generated as the document; converting the speech data into text; and generating the text as the document based on the document editing information.
    Type: Application
    Filed: November 14, 2011
    Publication date: May 17, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Hyun-sub KIL, Mok-hwa Lim
  • Publication number: 20120123779
    Abstract: Devices, methods, and computer program products are provided for facilitating enhanced social interactions using a mobile device. A method for facilitating an enhanced social interaction using a mobile device includes receiving an audio input at the mobile device, determining a salient portion of the audio input, receiving relevant information associated with the salient portion, and presenting the relevant information via the mobile device.
    Type: Application
    Filed: November 15, 2010
    Publication date: May 17, 2012
    Inventors: James Pratt, Steven Belz, Marc Sullivan
  • Publication number: 20120114233
    Abstract: A system, method, and computer program product for automatically analyzing multimedia data are disclosed. Embodiments receive multimedia data, detect portions having specified features, and output a corresponding subset of the multimedia data. Content features from downloaded or streaming movies or video clips are identified as a human probably would do, but in essentially real time. Embodiments then generate an index or menu based on individual consumer preferences. Consumers can peruse the index, or produce customized trailers, or edit and tag content with metadata as desired. The tool can categorize and cluster content by feature, to assemble a library of scenes or scene clusters according to user-selected criteria.
    Type: Application
    Filed: June 28, 2011
    Publication date: May 10, 2012
    Applicant: Sony Corporation
    Inventor: Priyan Gunatilake
  • Publication number: 20120109648
    Abstract: A communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal. The speech analyzer is configured to extract paralinguistic characteristics from the speech signal. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.
    Type: Application
    Filed: October 30, 2011
    Publication date: May 3, 2012
    Inventor: Fathy Yassa
  • Publication number: 20120109628
    Abstract: A communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal. The speech analyzer is configured to extract paralinguistic characteristics from the speech signal. In addition, the communication system includes a voice analyzer configured to receive the speech signal. The voice analyzer is configured to generate one or more phonemes based on the speech signal. The communication system includes a speech output device coupled with the automatic speech recognizer, the speech analyzer, and the voice analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics and said one or more phonemes.
    Type: Application
    Filed: October 30, 2011
    Publication date: May 3, 2012
    Inventor: Fathy Yassa
  • Publication number: 20120109627
    Abstract: A networked communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal from a client over a network and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal. The speech analyzer is configured to extract paralinguistic characteristics from the speech signal. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.
    Type: Application
    Filed: October 30, 2011
    Publication date: May 3, 2012
    Inventor: Fathy Yassa
  • Patent number: 8165879
    Abstract: A voice output device, includes: a compound word voice data storage unit that stores voice data in association with each of a plurality of compound words, each of which is formed of a plurality of words; a text display unit that displays text containing a plurality of words; a word designation unit that designates any of the words in the text displayed by the text display unit as a designated word based on a user's operation; a compound word detection unit that detects a compound word whose voice data is stored in the compound word voice data storage unit, from among the plurality of words in the text containing the designated word; and a voice output unit that outputs voice data corresponding to the compound word detected by the compound word detection unit as a voice.
    Type: Grant
    Filed: January 3, 2008
    Date of Patent: April 24, 2012
    Assignee: Casio Computer Co., Ltd.
    Inventors: Takatoshi Abe, Takuro Abe, Takashi Kojo
  • Publication number: 20120089395
    Abstract: A method of operating a communication system includes generating a transcript of at least a portion of a conversation between a plurality of users. The transcript includes a plurality of subsets of characters. The method further includes displaying the transcript on a plurality of communication devices, identifying an occurrence of at least one selected subset of characters from the plurality of subsets of characters, and querying a definition source for at least one definition for the selected subset of characters. The definition for the selected subset of characters is displayed on the plurality of communication devices.
    Type: Application
    Filed: October 7, 2010
    Publication date: April 12, 2012
    Applicant: Avaya, Inc.
    Inventors: David L. Chavez, Larry J. Hardouin
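A minimal sketch of spotting selected character subsets in the transcript and querying a definition source; the small in-memory glossary below stands in for that source:

```python
# Hypothetical definition source for selected terms.
GLOSSARY = {
    "latency": "the delay before a transfer of data begins",
    "codec": "a program that encodes or decodes a data stream",
}

def annotate_transcript(lines, selected_terms):
    """For each transcript line, look up any selected term it contains
    and return (line, definitions-found) pairs for display."""
    annotated = []
    for line in lines:
        words = {w.lower().strip(".,?") for w in line.split()}
        hits = {t: GLOSSARY[t] for t in selected_terms
                if t in words and t in GLOSSARY}
        annotated.append((line, hits))
    return annotated
```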
  • Publication number: 20120089394
    Abstract: Techniques involving visual display of information related to matching user utterances against graph patterns are described. In one or more implementations, an utterance of a user is obtained that has been indicated as corresponding to a graph pattern through linguistic analysis. The utterance is displayed in a user interface as a representation of the graph pattern.
    Type: Application
    Filed: October 6, 2010
    Publication date: April 12, 2012
    Applicant: VirtuOz SA
    Inventors: Dan Teodosiu, Elizabeth Ireland Powers, Pierre Serge Vincent LeRoy, Sebastien Jean-Marie Christian Saunier
  • Publication number: 20120084086
    Abstract: Disclosed herein are systems, methods and non-transitory computer-readable media for performing speech recognition across different applications or environments without model customization or prior knowledge of the domain of the received speech. The disclosure includes recognizing received speech with a collection of domain-specific speech recognizers, determining a speech recognition confidence for each of the speech recognition outputs, selecting speech recognition candidates based on a respective speech recognition confidence for each speech recognition output, and combining selected speech recognition candidates to generate text based on the combination.
    Type: Application
    Filed: September 30, 2010
    Publication date: April 5, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Mazin GILBERT, Srinivas Bangalore, Patrick Haffner, Robert Bell
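The combination step in the abstract above can be sketched roughly as follows. This is not the patented implementation; the recognizer outputs, confidence values, and the simple confidence-weighted word vote are all invented for illustration:

```python
def combine_recognizer_outputs(outputs, threshold=0.5):
    """outputs: list of (text, confidence) pairs, one per domain-specific
    recognizer. Keep candidates above a confidence threshold, then combine
    them with a confidence-weighted vote per word position."""
    candidates = [(text, conf) for text, conf in outputs if conf >= threshold]
    if not candidates:
        return ""
    max_len = max(len(text.split()) for text, _ in candidates)
    words = []
    for i in range(max_len):
        votes = {}
        for text, conf in candidates:
            tokens = text.split()
            if i < len(tokens):
                votes[tokens[i]] = votes.get(tokens[i], 0.0) + conf
        words.append(max(votes, key=votes.get))
    return " ".join(words)

outputs = [
    ("call mom tonight", 0.9),  # telephony-domain recognizer
    ("call mom tonight", 0.7),  # general-domain recognizer
    ("tall mom tonight", 0.4),  # mismatched domain, low confidence
]
print(combine_recognizer_outputs(outputs))
```

The low-confidence output is filtered out before the vote, so the combined text follows the two recognizers that agree.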
  • Publication number: 20120078626
    Abstract: Methods and systems for converting speech to text are disclosed. One method includes analyzing multimedia content to determine the presence of closed captioning data. The method includes, upon detecting closed captioning data, indexing the closed captioning data as associated with the multimedia content. The method also includes, upon failure to detect closed captioning data in the multimedia content, extracting audio data from multimedia content, the audio data including speech data, performing a plurality of speech to text conversions on the speech data to create a plurality of transcripts of the speech data, selecting text from one or more of the plurality of transcripts to form an amalgamated transcript, and indexing the amalgamated transcript as associated with the multimedia content.
    Type: Application
    Filed: September 27, 2010
    Publication date: March 29, 2012
    Inventors: Johney Tsai, Matthew Miller, David Strong
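The control flow described above (prefer closed captions, otherwise amalgamate several speech-to-text transcripts) can be sketched as below. The per-position majority vote is an invented stand-in for the patent's selection step, and the sample data is illustrative:

```python
from collections import Counter

def amalgamate(transcripts):
    """Pick, at each word position, the word most transcripts agree on."""
    split = [t.split() for t in transcripts]
    length = max(len(s) for s in split)
    words = []
    for i in range(length):
        counts = Counter(s[i] for s in split if i < len(s))
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)

def index_for_search(media):
    """Index closed captions when present; otherwise fall back to an
    amalgamated transcript built from multiple STT conversions."""
    if media.get("captions"):
        return media["captions"]
    return amalgamate(media["transcripts"])

media = {"captions": None,
         "transcripts": ["the quick brown fox",
                         "the quick brawn fox",
                         "the quick brown box"]}
print(index_for_search(media))
```

Each transcript misrecognizes a different word, so the position-wise vote recovers the common text.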
  • Publication number: 20120078629
    Abstract: According to one embodiment, a meeting support apparatus includes a storage unit, a determination unit, and a generation unit. The storage unit is configured to store storage information for each of a set of words, the storage information indicating a word, pronunciation information on the word, and a pronunciation recognition frequency. The determination unit is configured to generate emphasis determination information including an emphasis level that represents whether a first word should be highlighted and represents a degree of highlighting determined in accordance with the pronunciation recognition frequency of a second word when the first word is highlighted, based on whether the storage information includes a second set corresponding to a first set and, when the second set is included, based on the pronunciation recognition frequency of the second word. The generation unit is configured to generate an emphasis character string based on the emphasis determination information when the first word is highlighted.
    Type: Application
    Filed: March 25, 2011
    Publication date: March 29, 2012
    Inventors: Tomoo Ikeda, Nobuhiro Shimogori, Kouji Ueno
  • Publication number: 20120078628
    Abstract: The head-mounted text display system for the hearing impaired is a speech-to-text system, in which spoken words are converted into a visual textual display and displayed to the user in passages containing a selected number of words. The system includes a head-mounted visual display, such as eyeglass-type dual liquid crystal displays or the like, and a controller. The controller includes an audio receiver, such as a microphone or the like, for receiving spoken language and converting the spoken language into electrical signals. The controller further includes a speech-to-text module for converting the electrical signals representative of the spoken language to a textual data signal representative of individual words. A transmitter associated with the controller transmits the textual data signal to a receiver associated with the head-mounted display. The textual data is then displayed to the user in passages containing a selected number of individual words.
    Type: Application
    Filed: September 28, 2010
    Publication date: March 29, 2012
    Inventor: MAHMOUD M. GHULMAN
  • Publication number: 20120065969
    Abstract: An embodiment of the invention includes methods and systems for contextual social network communications during a phone conversation. A telephone conversation between a first user and at least one second user is monitored. More specifically, a monitor identifies terms spoken by the first user and the second user during the telephone conversation. The terms spoken are translated into textual keywords by a translating module. One or more of the second user's web applications are searched by a processor for portion(s) of the second user's web applications that include at least one of the keywords. The processor also searches one or more of the first user's web applications for portion(s) of the first user's web applications that include at least one of the keywords. The portion(s) of the second user's web applications and the portion(s) of the first user's web applications are displayed to the first user during the telephone conversation.
    Type: Application
    Filed: September 13, 2010
    Publication date: March 15, 2012
    Applicant: International Business Machines Corporation
    Inventors: Lisa Seacat DeLuca, Pamela A. Nesbitt
  • Publication number: 20120065970
    Abstract: A system and method for providing a discussion, including receiving by a processor text related to a discussion; converting by the processor the text to voice; storing by the processor in a memory the converted voice; receiving by the processor voice related to the discussion; storing by the processor in the memory the received voice; receiving by the processor a request to play voice related to at least part of the discussion; and transmitting by the processor audio containing the voice identified by the request related to the at least part of the discussion.
    Type: Application
    Filed: September 15, 2010
    Publication date: March 15, 2012
    Applicant: Sequent, Inc.
    Inventors: Charanjit SINGH, Mukesh Sehgal
  • Publication number: 20120059652
    Abstract: A method for transcribing a spoken communication includes acts of receiving a spoken first communication from a first sender to a first recipient, obtaining information relating to a second communication, which is different from the first communication, from a second sender to a second recipient, using the obtained information to obtain a language model, and using the language model to transcribe the spoken first communication.
    Type: Application
    Filed: August 30, 2011
    Publication date: March 8, 2012
    Inventors: Jeffrey P. Adams, Kenneth Basye, Ryan Thomas
  • Publication number: 20120053936
    Abstract: A method for generating a transcription of a videoconference includes matching human speech of a videoconference to writable symbols. The human speech is encoded in audio data of the videoconference. The writable symbols are parsed into a plurality of statements. For each statement of the plurality of statements, user profile data stored in computer-readable memory is used to determine which participant of a plurality of participants of the videoconference is most likely the source of the statement. A transcription of the videoconference is generated that identifies for each statement the determination of which participant of the plurality of participants of the videoconference is most likely the source of the statement.
    Type: Application
    Filed: August 31, 2010
    Publication date: March 1, 2012
    Applicant: Fujitsu Limited
    Inventor: David L. Marvit
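The speaker-attribution step described above can be approximated very simply. The word-overlap score below is a naive, invented stand-in for the patent's use of stored user profile data, and the names and profiles are fabricated examples:

```python
def most_likely_speaker(statement, profiles):
    """profiles: {participant: set of characteristic words drawn from that
    participant's profile data}. Score each participant by word overlap
    with the parsed statement and return the best match."""
    words = set(statement.lower().split())
    return max(profiles, key=lambda name: len(words & profiles[name]))

profiles = {"alice": {"budget", "finance", "quarter"},
            "bob": {"deploy", "server", "release"}}
print(most_likely_speaker("we should deploy the release tonight", profiles))
```

A transcript generator would run this per statement and prefix each line with the attributed participant.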
  • Publication number: 20120053937
    Abstract: A text content summary is created from speech content. A "focus more" signal is issued by a user while receiving the speech content. The focus more signal is associated with a time window, and the time window is associated with a part of the speech content. Whether to use the part of the speech content associated with the time window in the text content summary is determined based on the number of focus more signals associated with that window. The user may thereby express the relative significance of different portions of the speech content, so as to generate a personal text content summary.
    Type: Application
    Filed: August 22, 2011
    Publication date: March 1, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: BAO HUA CAO, LE HE, XING JIN, QING BO WANG, XIN ZHOU
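The windowing logic described above can be sketched as follows. The fixed window size, signal threshold, and timestamps are assumptions for illustration, not values from the patent:

```python
def summary_windows(signal_times, window_size=10.0, min_signals=2):
    """Group 'focus more' signal timestamps (seconds) into fixed-size time
    windows and keep the windows that received at least min_signals signals;
    the speech content in those windows would feed the summary."""
    counts = {}
    for t in signal_times:
        w = int(t // window_size)
        counts[w] = counts.get(w, 0) + 1
    return sorted(w for w, c in counts.items() if c >= min_signals)

# Signals at 3s, 5s, 14s, 31s, 33s, 35s: windows 0 and 3 clear the threshold.
print(summary_windows([3, 5, 14, 31, 33, 35]))
```

Windows with a single stray signal (here, the one at 14 s) are dropped, so only passages the listener repeatedly flagged reach the summary.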
  • Publication number: 20120053938
    Abstract: In one embodiment, a communication request from a remote requester is intercepted at the computing device. Based on the intercepted communication request, one or more voicemail features are enabled at the computing device, independent of carrier voicemail support. The remote requester may be, for example, a caller or a voicemail server, and the intercepted communication request may be a phone call or a voicemail notification, respectively. In another embodiment, a system at a computing device coupled to a network includes a communication request handler and a voicemail manager. The communication request handler intercepts a communication request from a remote requester at the computing device. The intercepted communication request may be a voicemail notification from a network server or a phone call from a caller. The voicemail manager enables one or more voicemail features at the computing device, independent of carrier voicemail support, based on the intercepted communication request.
    Type: Application
    Filed: September 30, 2011
    Publication date: March 1, 2012
    Applicant: Google Inc.
    Inventor: Jean-Michel Trivi
  • Publication number: 20120041758
    Abstract: A method and system for synchronizing words in an input text of a speech with a continuous recording of the speech. A received input text includes previously recorded content of the speech to be reproduced. A synthetic speech corresponding to the received input text is generated. Ratio data, comprising the ratio between the respective pronunciation times, in the generated synthetic speech, of the words included in the received text, is computed. The ratio data is used to determine an association between erroneously recognized words of the received text and a time at which to reproduce each erroneously recognized word. The association is output to a recording medium and/or displayed on a display device.
    Type: Application
    Filed: October 24, 2011
    Publication date: February 16, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Noriko Imoto, Tetsuya Uda, Takatoshi Watanabe
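The ratio computation above amounts to distributing the recording's duration over the words in proportion to their durations in the synthetic speech. A sketch under that reading, with invented example durations:

```python
def allocate_times(words, synth_durations, total_recording_time):
    """Assign each word a (start, end) span in the recording, proportional
    to its pronunciation time in the synthetic speech (illustrative values)."""
    total = sum(synth_durations)
    start, spans = 0.0, []
    for word, d in zip(words, synth_durations):
        length = total_recording_time * d / total
        spans.append((word, round(start, 3), round(start + length, 3)))
        start += length
    return spans

# Synthetic-speech durations of 0.2s / 0.3s / 0.5s mapped onto a 2.0 s recording.
for span in allocate_times(["good", "morning", "everyone"], [0.2, 0.3, 0.5], 2.0):
    print(span)
```

Because only the ratios matter, the same allocation works whatever the absolute speed of the original recording.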
  • Publication number: 20120035925
    Abstract: Automatic capture and population of task and list items in an electronic task or list surface via voice or audio input through an audio recording-capable mobile computing device is provided. A voice or audio task or list item may be captured for entry into a task application interface or into a list authoring surface interface for subsequent use as task items, reminders, “to do” items, list items, agenda items, work organization outlines, and the like. Captured voice or audio content may be transcribed locally or remotely, and transcribed content may be populated into a task or list authoring surface user interface that may be displayed on the capturing device (e.g., mobile telephone), or that may be stored remotely and subsequently displayed in association with a number of applications on a number of different computing devices.
    Type: Application
    Filed: October 12, 2011
    Publication date: February 9, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Ned B. Friend, Kanav Arora, Marta Rey-Babarro, David De La Brena Valderrama, Erez Kikin-Gil, Matthew J. Kotler, Charles W. Parker, Maya Rodrig, Igor Zaika
  • Publication number: 20120035924
    Abstract: In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device.
    Type: Application
    Filed: July 20, 2011
    Publication date: February 9, 2012
    Applicant: GOOGLE INC.
    Inventors: John Nicholas JITKOFF, Michael J. LEBEAU
  • Publication number: 20120035926
    Abstract: A method, apparatus, and system are directed towards employing machine representations of phonemes to generate and manage domain names and/or messaging addresses. A user of a computing device may provide an audio input signal, such as one obtained from human language sounds. The audio input signal is received at a phoneme encoder that converts the sounds into machine representations using a phoneme representation viewable as a sequence of alpha-numeric values. The sequence of alpha-numeric values may then be combined with a host name, or the like, to generate a URI, a message address, or the like. The generated URI or message address may then be used to communicate over a network.
    Type: Application
    Filed: October 18, 2011
    Publication date: February 9, 2012
    Applicant: Demand Media, Inc.
    Inventor: Christopher J. Ambler
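The encoding step above (phoneme sequence to alpha-numeric values, combined with a host name into a URI) can be sketched as follows. The phoneme-to-code table is invented; a real encoder would cover a full phoneme inventory such as ARPAbet:

```python
# Hypothetical phoneme-to-code table; codes are illustrative only.
PHONEME_CODES = {"HH": "h1", "EH": "e2", "L": "l1", "OW": "o3"}

def phonemes_to_uri(phonemes, host="example.com"):
    """Encode a phoneme sequence as a sequence of alpha-numeric values
    and combine it with a host name to form a URI."""
    encoded = "".join(PHONEME_CODES[p] for p in phonemes)
    return "http://{}/{}".format(host, encoded)

print(phonemes_to_uri(["HH", "EH", "L", "OW"]))
```

The same alpha-numeric sequence could instead be combined with a domain to form a message address, as the abstract notes.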
  • Publication number: 20120033794
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for selectively transcribing messages. Five general approaches are disclosed herein. The first approach is directed to checking for a transcription capable client, which transcribes messages when a client device is capable of receiving transcriptions. The second and third approaches are platform-controlled and user-controlled predefined selective transcription. One aspect of this approach is driven by transcription rules. The fourth approach is user-controlled on-demand selective transcription before the message is stored or deposited for transcription. An example of this is a user transferring an incoming caller to voicemail and indicating that the voicemail be transcribed. The fifth approach is user-controlled on-demand selective transcription after the message is stored. In one embodiment of this approach, a user must specifically request that a stored message be transcribed.
    Type: Application
    Filed: August 6, 2010
    Publication date: February 9, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: James JACKSON, Philip Cunetto, Mehrad Yasrebi