Speech To Text Systems (epo) Patents (Class 704/E15.043)
  • Publication number: 20130132079
    Abstract: A first plurality of audio features associated with a first utterance may be obtained. A first text result associated with a first speech-to-text translation of the first utterance may be obtained based on an audio signal analysis associated with the audio features, the first text result including at least one first word. A first set of audio features correlated with at least a first portion of the first speech-to-text translation associated with the at least one first word may be obtained. A display of at least a portion of the first text result that includes the at least one first word may be initiated. A selection indication may be received, indicating an error in the first speech-to-text translation, the error associated with the at least one first word.
    Type: Application
    Filed: November 17, 2011
    Publication date: May 23, 2013
    Applicant: Microsoft Corporation
    Inventors: Muhammad Shoaib B. Sehgal, Mirza Muhammad Raza
  • Publication number: 20130132080
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for crowd-sourced data labeling. The system requests a respective response from each of a set of entities. The set of entities includes crowd workers. Next, the system incrementally receives a number of responses from the set of entities until an accuracy threshold is reached or m responses are received, wherein the accuracy threshold is based on characteristics of the number of responses. Finally, the system generates an output response based on the number of responses.
    Type: Application
    Filed: November 18, 2011
    Publication date: May 23, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Jason Williams, Tirso Alonso, Barbara B. Hollister, Ilya Dan Melamed
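The incremental stopping rule in the abstract above can be sketched as a short loop: accept crowd responses one at a time and stop as soon as either condition fires. The majority-vote aggregation, the agreement-fraction definition of "accuracy", and all names below are illustrative assumptions, not taken from the patent:

```python
from collections import Counter

def collect_labels(responses, m, accuracy_threshold):
    """Accept responses one at a time until either the leading label's
    share of responses reaches accuracy_threshold (with at least two
    responses seen) or m responses have been received.

    Assumes at least one response is available. Returns the winning
    label and how many responses were consumed."""
    received = []
    for label in responses:
        received.append(label)
        top_label, top_count = Counter(received).most_common(1)[0]
        if len(received) >= 2 and top_count / len(received) >= accuracy_threshold:
            return top_label, len(received)
        if len(received) >= m:
            return top_label, len(received)
    # Stream ended before either condition fired: best label seen so far.
    top_label, _ = Counter(received).most_common(1)[0]
    return top_label, len(received)
```

With a 0.6 agreement threshold, two matching responses suffice; otherwise collection runs until the m-response cap.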
  • Publication number: 20130124202
    Abstract: Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that match consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including the corresponding subsets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
    Type: Application
    Filed: May 28, 2010
    Publication date: May 16, 2013
    Inventor: Walter W. Chang
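The anchor-and-partition idea above can be illustrated in a few lines: treat any exact match of k consecutive words as a hard alignment point, then split both word sequences at adjacent anchors so each piece can be aligned independently. The choice of k, the exhaustive search, and all names are assumptions made for this sketch, not details from the patent:

```python
def hard_alignment_points(script, dialogue, k=3):
    """Anchor points: positions (i, j) where k consecutive script words
    exactly match k consecutive dialogue words."""
    return [(i, j)
            for i in range(len(script) - k + 1)
            for j in range(len(dialogue) - k + 1)
            if script[i:i + k] == dialogue[j:j + k]]

def partition(script, dialogue, points):
    """Split both word lists into sub-ranges bounded by adjacent anchors;
    each sub-range is then a much smaller alignment problem."""
    bounds = sorted(set([(0, 0)] + points + [(len(script), len(dialogue))]))
    return [(script[i0:i1], dialogue[j0:j1])
            for (i0, j0), (i1, j1) in zip(bounds, bounds[1:])]
```

Any residual disagreement (a word the actor changed, say) is confined to one small sub-matrix between two anchors.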
  • Publication number: 20130124204
    Abstract: Example methods and systems for displaying one or more indications that indicate (i) the direction of a source of sound and (ii) the intensity level of the sound are disclosed. A method may involve receiving audio data corresponding to sound detected by a wearable computing system. Further, the method may involve analyzing the audio data to determine both (i) a direction from the wearable computing system of a source of the sound and (ii) an intensity level of the sound. Still further, the method may involve causing the wearable computing system to display one or more indications that indicate (i) the direction of the source of the sound and (ii) the intensity level of the sound.
    Type: Application
    Filed: April 17, 2012
    Publication date: May 16, 2013
    Applicant: GOOGLE INC.
    Inventors: Adrian Wong, Xiaoyu Miao
  • Publication number: 20130124189
    Abstract: A system and methodology that provides a network-based, e.g., cloud-based, background expert for predicting and/or accomplishing a user's goals is disclosed herein. Moreover, the system monitors, in the background, user generated data and/or publicly available data to determine and/or infer a user's goal, with or without an active indication/request from the user. Typically, the user-generated data can include user conversations, such as, but not limited to, speech data in a voice call, text messages, chat dialogues, etc. Further, the system identifies an action or task that facilitates accomplishment of the user goal in real-time. Moreover, the system can automatically perform the action/task and/or request user authorization prior to performing the action/task.
    Type: Application
    Filed: November 10, 2011
    Publication date: May 16, 2013
    Applicant: AT&T INTELLECTUAL PROPERTY I, LP
    Inventor: CHRISTOPHER BALDWIN
  • Publication number: 20130124203
    Abstract: Provided in some embodiments is a computer implemented method that includes providing script data including script words indicative of dialogue words to be spoken, providing recorded dialogue audio data corresponding to at least a portion of the dialogue words to be spoken, wherein the recorded dialogue audio data includes timecodes associated with recorded audio dialogue words, matching at least some of the script words to corresponding recorded audio dialogue words to determine alignment points, determining that a set of unmatched script words are accurate based on the matching of at least some of the script words matched to corresponding recorded audio dialogue words, generating time-aligned script data including the script words and their corresponding timecodes and the set of unmatched script words determined to be accurate based on the matching of at least some of the script words matched to corresponding recorded audio dialogue words.
    Type: Application
    Filed: May 28, 2010
    Publication date: May 16, 2013
    Inventors: Jerry R. Scoggins, II, Walter W. Chang, David A. Kuspa
  • Publication number: 20130117019
    Abstract: A remote laboratory gateway enables a plurality of students to access and control a laboratory experiment remotely. Access is provided by an experimentation gateway, which is configured to provide secure access to the experiment via a network-centric, web-enabled graphical user interface. Experimental hardware is directly controlled by an experiment controller, which is communicatively coupled to the experimentation gateway and which may be a software application, a standalone computing device, or a virtual machine hosted on the experimentation gateway. The remote laboratory of the present specification may be configured for a software-as-a-service business model.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 9, 2013
    Inventors: David Akopian, Arsen Melkonyan, Murillo Pontual, Grant Huang, Andreas Robert Gampe
  • Publication number: 20130117021
    Abstract: A method and system uses an integration application to extract an information feature from a message and to provide the information feature to a vehicle interface device which acts on the information feature to provide a service. The extracted information feature may be automatically acted upon, or may be outputted for review, editing, and/or selection before being acted on. The vehicle interface device may include a navigation system, infotainment system, telephone, and/or a head unit. The message may be received by the vehicle interface device or from a portable or remote device in linked communication with the vehicle interface device. The message may be a voice-based or text-based message. The service may include placing a call, sending a message, or providing navigation instructions using the information feature. An off-board or back-end service provider in communication with the integration application may extract and/or transcribe the information feature and/or provide a service.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 9, 2013
    Applicant: GM Global Technology Operations LLC
    Inventor: GM Global Technology Operations LLC
  • Publication number: 20130117018
    Abstract: A method, computer program product, and system for voice content transcription during collaboration sessions is described. A method may comprise receiving an indication to provide one or more real-time voice content-to-text content transcriptions to a first collaboration session participant. The one or more real-time voice content-to-text content transcriptions may correspond to voice content of a second collaboration session participant in one or more collaboration sessions including the first collaboration session participant and the second collaboration session participant.
    Type: Application
    Filed: November 3, 2011
    Publication date: May 9, 2013
    Applicant: International Business Machines Corporation
    Inventors: Patrick Joseph O'Sullivan, Edith Helen Stern, Barry E. Willner, Hong Bing Zhang
  • Publication number: 20130110510
    Abstract: A natural language call router forwards an incoming call from a caller to an appropriate destination. The call router has a speech recognition mechanism responsive to words spoken by a caller for producing recognized text corresponding to the spoken words. A robust parsing mechanism is responsive to the recognized text for detecting a class of words in the recognized text. The class is defined as a group of words having a common attribute. An interpreting mechanism is responsive to the detected class for determining the appropriate destination for routing the call.
    Type: Application
    Filed: October 28, 2011
    Publication date: May 2, 2013
    Applicant: Cellco Partnership d/b/a Verizon Wireless
    Inventors: Veronica Klein, Deborah Washington Brown
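The class-based routing described in the abstract above reduces to a small lookup: group words by a shared attribute, then map any detected class to a destination. The word classes and destination names below are invented for illustration and are not from the patent:

```python
# Illustrative word classes (groups of words sharing a common attribute)
# and the destination each class maps to -- both are assumptions.
WORD_CLASSES = {
    "billing": {"bill", "invoice", "charge", "payment"},
    "support": {"broken", "error", "outage", "repair"},
}
DESTINATIONS = {"billing": "billing_department", "support": "tech_support"}

def route_call(recognized_text, default="operator"):
    """Detect a word class in the recognized text and pick a destination;
    fall back to a default destination when no class is detected."""
    words = set(recognized_text.lower().split())
    for cls, members in WORD_CLASSES.items():
        if words & members:
            return DESTINATIONS[cls]
    return default
```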
  • Publication number: 20130110509
    Abstract: A particular method includes receiving, at a representational state transfer endpoint device, a first user input related to a first speech to text conversion performed by a speech to text transcription service. The method also includes receiving, at the representational state transfer endpoint device, a second user input related to a second speech to text conversion performed by the speech to text transcription service. The method includes processing of the first user input and the second user input at the representational state transfer endpoint device to generate speech to text adjustment information.
    Type: Application
    Filed: October 28, 2011
    Publication date: May 2, 2013
    Applicant: Microsoft Corporation
    Inventors: Jeremy Edward Cath, Timothy Edwin Harris, Marc Mercuri, James Oliver Tisdale, III
  • Publication number: 20130110508
    Abstract: An electronic device and a control method are provided. The electronic device includes a voice receiver which receives a voice of a user; a signal processor which performs signal processing on the received voice; a communicator which communicates with a first external device; and a controller which determines a text corresponding to the received voice of the user, and controls the communicator to transmit the signal processed voice and the determined text to the first external device.
    Type: Application
    Filed: September 5, 2012
    Publication date: May 2, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Eun-sang BAK, Ju-rack CHAE, Jae-hwan KIM, Yu LIU
  • Publication number: 20130103397
    Abstract: Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.
    Type: Application
    Filed: October 21, 2011
    Publication date: April 25, 2013
    Applicant: WAL-MART STORES, INC.
    Inventors: Dion Almaer, Bernard Paul Cousineau, Ben Galbraith
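Turning an unbroken speech-to-text stream into list items, as the abstract above describes, can be sketched as a greedy longest-n-gram match against a known lexicon. The lexicon, the maximum n-gram length, and the fallback to single-word items are all assumptions for this sketch:

```python
# Hypothetical lexicon of known multi-word items -- not from the patent.
PRODUCT_LEXICON = {"whole wheat bread", "milk", "eggs", "peanut butter"}

def to_list_items(text, max_n=3):
    """Greedily take the longest known n-gram at each position of the
    transcribed text; words not in the lexicon become single-word items."""
    words = text.lower().split()
    items, i = [], 0
    while i < len(words):
        for n in range(max_n, 0, -1):
            gram = " ".join(words[i:i + n])
            if gram in PRODUCT_LEXICON or n == 1:
                items.append(gram)
                i += n
                break
    return items
```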
  • Publication number: 20130103399
    Abstract: Aspects relate to machine recognition of human voices in live or recorded audio content, and delivering text derived from such live or recorded content as real time text, with contextual information derived from characteristics of the audio. For example, volume information can be encoded as larger and smaller font sizes. Speaker changes can be detected and indicated through text additions, or color changes to the font. A variety of other context information can be detected and encoded in graphical rendition commands available through RTT, or by extending the information provided with RTT packets, and processing that extended information accordingly for modifying the display of the RTT text content.
    Type: Application
    Filed: October 21, 2011
    Publication date: April 25, 2013
    Applicant: RESEARCH IN MOTION LIMITED
    Inventor: Scott Peter GAMMON
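The volume-to-font-size encoding mentioned in the abstract above is essentially a clamped linear map from a measured level to a size range. The dBFS range and size limits below are illustrative defaults, not values from the patent:

```python
def font_size_for_volume(volume_db, min_size=12, max_size=28,
                         quiet_db=-40.0, loud_db=0.0):
    """Map a measured volume (in dBFS, 0 = full scale) linearly onto a
    font-size range, clamping outside [quiet_db, loud_db]."""
    frac = (volume_db - quiet_db) / (loud_db - quiet_db)
    frac = max(0.0, min(1.0, frac))
    return round(min_size + frac * (max_size - min_size))
```

Other contextual cues (speaker changes, for example) would map similarly onto color or text markers.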
  • Publication number: 20130096916
    Abstract: A multichannel security system is disclosed, which system is for granting and denying access to a host computer in response to a demand from an access-seeking individual and computer. The access-seeker has a peripheral device operative within an authentication channel to communicate with the security system. The access-seeker initially presents identification and password data over an access channel which is intercepted and transmitted to the security computer. The security computer then communicates with the access-seeker. A biometric analyzer—a voice or fingerprint recognition device—operates upon instructions from the authentication program to analyze the monitored parameter of the individual. In the security computer, a comparator matches the biometric sample with stored data, and, upon obtaining a match, provides authentication. The security computer instructs the host computer to grant access and communicates the same to the access-seeker, whereupon access is initiated over the access channel.
    Type: Application
    Filed: December 1, 2010
    Publication date: April 18, 2013
    Applicant: NETLABS.COM, INC.
    Inventor: Ram Pemmaraju
  • Publication number: 20130085754
    Abstract: A method for providing suggestions includes capturing audio that includes speech and receiving textual content from a speech recognition engine. The speech recognition engine performs speech recognition on the audio signal to obtain the textual content, which includes one or more passages. The method also includes receiving a selection of a portion of a first word in a passage in the textual content, wherein the passage includes multiple words, and retrieving a set of suggestions that can potentially replace the first word. At least one suggestion from the set of suggestions provides a multi-word suggestion for potentially replacing the first word. The method further includes displaying, on a display device, the set of suggestions, and highlighting a portion of the textual content, as displayed on the display device, for potentially changing to one of the suggestions from the set of suggestions.
    Type: Application
    Filed: September 14, 2012
    Publication date: April 4, 2013
    Applicant: Google Inc.
    Inventors: Richard Z. Cohen, Marcus A. Foster, Luca Zanolin
  • Publication number: 20130085755
    Abstract: The present application describes systems, articles of manufacture, and methods for continuous speech recognition for mobile computing devices. One embodiment includes determining whether a mobile computing device is receiving operating power from an external power source or a battery power source, and activating a trigger word detection subroutine in response to determining that the mobile computing device is receiving power from the external power source. In some embodiments, the trigger word detection subroutine operates continually while the mobile computing device is receiving power from the external power source. The trigger word detection subroutine includes determining whether a plurality of spoken words received via a microphone includes one or more trigger words, and in response to determining that the plurality of spoken words includes at least one trigger word, launching an application corresponding to the at least one trigger word included in the plurality of spoken words.
    Type: Application
    Filed: September 15, 2012
    Publication date: April 4, 2013
    Applicant: GOOGLE INC.
    Inventors: Bjorn Erik Bringert, Pawel Pietryka, Peter John Hodgson, Simon Tickner, Henrique Penha, Richard Zarek Cohen, Luca Zanolin, Dave Burke
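The power-gated trigger-word subroutine above can be sketched as a simple check-then-scan: only run detection on external power, then launch whichever application the first trigger word maps to. The power-source encoding and the trigger-to-application mapping are assumptions for this sketch:

```python
def on_external_power(power_source):
    """Assumed encoding: 'external' vs. 'battery'."""
    return power_source == "external"

def detect_trigger(spoken_words, trigger_actions, power_source):
    """Run trigger-word detection only while on external power. Returns
    the application mapped to the first trigger word found, else None."""
    if not on_external_power(power_source):
        return None  # subroutine inactive on battery to save power
    for word in spoken_words:
        app = trigger_actions.get(word.lower())
        if app is not None:
            return app
    return None
```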
  • Publication number: 20130080163
    Abstract: According to an embodiment, an information processing apparatus includes a storage unit, a detector, an acquisition unit, and a search unit. The storage unit is configured to store therein voice indices, each of which associates a character string included in voice text data obtained from a voice recognition process with voice positional information, the voice positional information indicating a temporal position in the voice data and corresponding to the character string. The acquisition unit acquires reading information, which is at least a part of a character string representing a reading of a phrase to be transcribed from the played-back voice data. The search unit specifies, as search targets, character strings whose associated voice positional information is included in the played-back section information among the character strings included in the voice indices, and retrieves a character string including the reading represented by the reading information from among the specified character strings.
    Type: Application
    Filed: June 26, 2012
    Publication date: March 28, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Nobuhiro Shimogori, Tomoo Ikeda, Kouji Ueno, Osamu Nishiyama, Hirokazu Suzuki, Manabu Nagao
  • Publication number: 20130080162
    Abstract: Query history expansion may be provided. Upon receiving a spoken query from a user, an adapted language model may be applied to convert the spoken query to text. The adapted language model may comprise a plurality of queries interpolated from the user's previous queries and queries associated with other users. The spoken query may be executed and the results of the spoken query may be provided to the user.
    Type: Application
    Filed: September 23, 2011
    Publication date: March 28, 2013
    Applicant: Microsoft Corporation
    Inventors: Shuangyu Chang, Michael Levit, Bruce Melvin Buntschuh
  • Publication number: 20130066630
    Abstract: A system for correcting errors in automatically generated audio transcriptions includes an audio recorder, a computerized transcription generator, a voice recording, a collection of link data, transcription text, an audio player, a system of cross linking, and a text editor including a text display with a cursor. The system permits a user to correct transcription errors using techniques of jump to position; show position; and track playback.
    Type: Application
    Filed: September 9, 2012
    Publication date: March 14, 2013
    Inventor: Kenneth D. Roe
  • Publication number: 20130058471
    Abstract: Presented are systems and methods for creating a transcription of a conference call. The system joins an audio conference call with a device associated with a participant of a plurality of participants joined to the conference through one or more associated devices. The system then creates a speech audio file corresponding to a portion of the participant's speech during the conference and contemporaneously converts, at the device, the speech audio file to a local partial transcript. The system then acquires a plurality of partial transcripts from at least one of the associated devices, so that the device can provide a complete transcript.
    Type: Application
    Filed: September 1, 2011
    Publication date: March 7, 2013
    Inventor: Juan Martin Garcia
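The assembly step in the abstract above, where one device pools per-device partial transcripts into a complete transcript, can be sketched as a sort-by-time merge. The tuple layout (start time, speaker, text) is an assumption for this sketch:

```python
def merge_transcripts(device_partials):
    """device_partials: one list per device of (start_seconds, speaker,
    text) tuples. Pool all partials and order them by start time to
    produce the complete conference transcript."""
    pooled = sorted(seg for partials in device_partials for seg in partials)
    return [f"{speaker}: {text}" for _, speaker, text in pooled]
```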
  • Publication number: 20130060568
    Abstract: Using structured communications within an organization or retail environment, the users establish a fabric of communications that allows external users of devices or applications to integrate in a way that is non-disruptive, measured and structured. An observation platform may be used for performing structured communications. A signal is received from a first communication device at a second communication device associated with a computer system, wherein the computer system is associated with an organization, wherein a first characteristic of the signal corresponds to an audible source and a second characteristic of the signal corresponds to information indicative of a geographic position of the first communication device.
    Type: Application
    Filed: October 31, 2012
    Publication date: March 7, 2013
    Inventors: Steven Paul Russell, Guy R. VanBuskirk, Andrew W. Kittler
  • Publication number: 20130054237
    Abstract: A communications system includes a first communications device cooperating with a second communications device. The first communications device multiplexes a digital speech message and a corresponding text message into a multiplexed signal, and wirelessly transmits the multiplexed signal. The second communications device wirelessly receives the multiplexed signal, de-multiplexes the multiplexed signal into the digital speech message and the corresponding text message, decodes the speech message for an audio output transducer, and operates a text processor on the corresponding text message for display. The corresponding text message is displayed in synchronization with the speech message output by the audio output transducer. A memory is coupled to the text processor for storing the text message, and the text processor is configured to display the stored text message.
    Type: Application
    Filed: August 25, 2011
    Publication date: February 28, 2013
    Applicant: Harris Corporation of the State of Delaware
    Inventors: William N. Furman, John W. Nieto, Marcelo De Risio
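The multiplex/de-multiplex round trip above can be illustrated with a minimal length-prefixed framing scheme. The 4-byte big-endian length prefixes and the byte layout are assumptions for this sketch, not the patent's signal format:

```python
import struct

def multiplex(speech, text):
    """Frame a digital speech payload (bytes) and its UTF-8 text with
    4-byte big-endian length prefixes so a receiver can split them."""
    t = text.encode("utf-8")
    return (struct.pack(">I", len(speech)) + speech +
            struct.pack(">I", len(t)) + t)

def demultiplex(frame):
    """Recover (speech, text) from a multiplexed frame."""
    n = struct.unpack_from(">I", frame, 0)[0]
    speech = frame[4:4 + n]
    m = struct.unpack_from(">I", frame, 4 + n)[0]
    text = frame[8 + n:8 + n + m].decode("utf-8")
    return speech, text
```

Because both messages arrive in one frame, the receiver can trivially display the text in synchronization with the decoded speech.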
  • Publication number: 20130054239
    Abstract: A system, method and computer program product for authoring and presenting discrete data elements and datasets on any computing device are described. Said datasets can comprise typed, entered, or speech-converted text, numbers, images, and sounds. Said system and method feature a user-controlled timer that can be set in intervals of one or more milliseconds and can be used to display said data elements in said dataset in succession. Another feature described is a randomizer which can present said data elements in said dataset in an unpredictable and random order.
    Type: Application
    Filed: August 20, 2012
    Publication date: February 28, 2013
    Inventor: Benjamin Z. Levy
  • Publication number: 20130054240
    Abstract: An apparatus and a method for recognizing a voice by using a lip image are provided. The apparatus includes: a voice recognizer which recognizes a voice of a user and outputs text information based on the recognized voice; a lip shape detector which detects a lip shape of the user; and a voice recognition result verifier which determines whether the text information output by the voice recognizer is correct, by using a result of the detection by the lip shape detector.
    Type: Application
    Filed: August 27, 2012
    Publication date: February 28, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jong-hyuk JANG, Hee-seob RYU, Kyung-mi PARK, Seung-kwon PARK, Jae-hyun BAE
  • Publication number: 20130054241
    Abstract: A method and system for producing and working with transcripts according to the invention eliminates time inefficiencies. By dispersing a source recording to a transcription team in small segments, so that team members transcribe segments in parallel, a rapid transcription process delivers a fully edited transcript within minutes. Clients can view accurate, grammatically correct, proofread and fact-checked documents that shadow live proceedings by mere minutes. The rapid transcript includes time coding, speaker identification and summary. A viewer application allows a client to view a video recording side-by-side with a transcript. Clicking on a word in the transcript locates the corresponding recorded content; advancing a recording to a particular point locates and displays the corresponding spot in the transcript. The recording is viewed using common video features, and may be downloaded. The client can edit the transcript and insert comments. Any number of colleagues can view and edit simultaneously.
    Type: Application
    Filed: October 30, 2012
    Publication date: February 28, 2013
    Inventor: Adam Michael GOLDBERG
  • Publication number: 20130046537
    Abstract: Some embodiments disclosed herein store a target application and a dictation application. The target application may be configured to receive input from a user. The dictation application interface may include a full overlay mode option, where in response to selection of the full overlay mode option, the dictation application interface is automatically sized and positioned over the target application interface to fully cover a text area of the target application interface to appear as if the dictation application interface is part of the target application interface. The dictation application may be further configured to receive an audio dictation from the user, convert the audio dictation into text, provide the text in the dictation application interface and, in response to receiving a first user command to complete the dictation, automatically copy the text from the dictation application interface and insert the text into the target application interface.
    Type: Application
    Filed: August 19, 2011
    Publication date: February 21, 2013
    Applicant: DOLBEY & COMPANY, INC.
    Inventors: Curtis A. Weeks, Aaron G. Weeks, Stephen E. Barton
  • Publication number: 20130041662
    Abstract: A device and method to control applications using voice data. In one embodiment, a method includes detecting voice data from a user, converting the voice data to text data, matching the text data to an identifier, the identifier associated with a list of identifiers for controlling operation of the application, and controlling the application based on the identifier matched with the text data. In another embodiment, voice data may be received from a control device.
    Type: Application
    Filed: August 8, 2011
    Publication date: February 14, 2013
    Inventor: Sriram Sampathkumaran
  • Publication number: 20130041661
    Abstract: A device may include a communication interface configured to receive audio signals associated with audible communications from a user; an output device; and logic. The logic may be configured to determine one or more audio qualities associated with the audio signals, map the one or more audio qualities to at least one value, generate audio-related information based on the mapping, and provide, via the output device during the audible communications, the audio-related information to the user.
    Type: Application
    Filed: August 8, 2011
    Publication date: February 14, 2013
    Applicants: CELLCO PARTNERSHIP, VERIZON NEW JERSEY INC.
    Inventors: Woo Beum Lee, Arvind Basra
  • Publication number: 20130041663
    Abstract: A communication application configured to support a conversation among participants over a communication network. The communication application is configured to (i) support one or more media types within the context of the conversation, (ii) interleave the one or more media types in a time-indexed order within the context of the conversation, (iii) enable the participants to render the conversation including the interleaved one or more media types in either a real-time rendering mode or time-shifted rendering mode, and (iv) seamlessly transition the conversation between the two modes so that the conversation may take place substantially live when in the real-time rendering mode or asynchronously when in the time-shifted rendering mode.
    Type: Application
    Filed: October 12, 2012
    Publication date: February 14, 2013
    Applicant: VOXER IP LLC
    Inventor: VOXER IP LLC
  • Publication number: 20130041646
    Abstract: In accordance with the embodiments of the present invention, a system and method for enabling preview, editing, and transmission of emergency notification messages is provided. The system includes a controller, a microphone, and a speech-to-text engine for receiving an audio message input to the microphone and for converting the audio message to a text message. The resulting text message is displayed on a local display, where a user can edit the message via a text editor. Text and/or audio notification devices are provided for displaying the edited text data as a text message. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: August 10, 2011
    Publication date: February 14, 2013
    Applicant: SIMPLEXGRINNELL LP
    Inventors: Daniel G. Farley, Matthew Farley, John R. Haynes
  • Publication number: 20130035937
    Abstract: A system and method for efficiently transcribing verbal messages to text is provided. Verbal messages are received and at least one of the verbal messages is divided into segments. Automatically recognized text is determined for each of the segments by performing speech recognition and a confidence rating is assigned to the automatically recognized text for each segment. A threshold is applied to the confidence ratings and those segments with confidence ratings that fall below the threshold are identified. The segments that fall below the threshold are assigned to one or more human agents starting with those segments that have the lowest confidence ratings. Transcription from the human agents is received for the segments assigned to that agent. The transcription is assembled with the automatically recognized text of the segments not assigned to the human agents as a text message for the at least one verbal message.
    Type: Application
    Filed: August 6, 2012
    Publication date: February 7, 2013
    Inventors: Mike O. Webb, Bruce J. Peterson, Janet S. Kaseda
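The hybrid workflow above, thresholding per-segment ASR confidence and assigning the weakest segments to human agents first, can be sketched as a split followed by a merge. The (text, confidence) segment representation and all names are assumptions for this sketch:

```python
def split_by_confidence(segments, threshold):
    """segments: list of (auto_text, confidence). Returns the segments
    kept as-is plus a human-work queue of segment indices ordered
    lowest-confidence first."""
    auto = [(i, t) for i, (t, c) in enumerate(segments) if c >= threshold]
    human_queue = sorted(
        (i for i, (t, c) in enumerate(segments) if c < threshold),
        key=lambda i: segments[i][1])
    return auto, human_queue

def assemble(segments, human_results):
    """Merge human transcriptions (index -> text) over the ASR text to
    form the final text message."""
    return " ".join(human_results.get(i, t)
                    for i, (t, c) in enumerate(segments))
```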
  • Publication number: 20130035936
    Abstract: A transcription system is applicable to transcription for a language in which there is limited pronunciation and/or acoustic data. A transcription station is configured using pronunciation data and acoustic data for use with the language. The pronunciation data and/or the acoustic data is initially from another dialect of a language, another language from a language group, or is universal (e.g., not specific to any particular language). A partial transcription of the audio recording is accepted via the transcription station (e.g., from a transcriptionist). One or more repetitions of one or more portions of the partial transcription are identified in the audio recording, and can be accepted during transcription. The pronunciation data and/or the acoustic data is updated in a bootstrapping manner during transcription, thereby improving the efficiency of the transcription process.
    Type: Application
    Filed: August 1, 2012
    Publication date: February 7, 2013
    Applicant: Nexidia Inc.
    Inventors: Jacob B. Garland, Marsal Gavalda
  • Publication number: 20130030805
    Abstract: According to one embodiment, a transcription support system supports transcription work to convert voice data to text. The system includes a first storage unit configured to store therein the voice data; a playback unit configured to play back the voice data; a second storage unit configured to store therein voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, the voice positional information being indicative of a temporal position in the voice data and corresponding to the character string; a text creating unit that creates the text in response to an operation input of a user; and an estimation unit configured to estimate, based on the voice indices, already-transcribed voice positional information indicative of the position in the voice data at which the creation of the text is completed.
    Type: Application
    Filed: March 15, 2012
    Publication date: January 31, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Hirokazu Suzuki, Nobuhiro Shimogori, Tomoo Ikeda, Kouji Ueno, Osamu Nishiyama, Manabu Nagao
  • Publication number: 20130030804
    Abstract: A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words. The replacement words in the personal vocabulary are obtained from personal data associated with a user. A transcription is received of an audio recording. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data is received that is associated with the transcribed word. A replacement word from the personal vocabulary is identified, which is used to re-score the transcription and replace the transcribed word.
    Type: Application
    Filed: July 26, 2011
    Publication date: January 31, 2013
    Inventors: George Zavaliagkos, William F. Ganong, III, Uwe H. Jost, Shreedhar Madhavapeddi, Gary B. Clayton
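    The replacement step this abstract describes (swapping an ASR-transcribed word for a close match from a user's personal vocabulary) can be sketched as follows. This is a hypothetical illustration, not the patented implementation; the function names and the string-similarity heuristic are assumptions.

    ```python
    # Sketch: re-score a transcription against a personal vocabulary,
    # replacing transcribed words with sufficiently similar personal words
    # (e.g. contact names a generic ASR vocabulary misheard).
    from difflib import SequenceMatcher

    def rescore(transcript, personal_vocabulary, threshold=0.8):
        out = []
        for word in transcript.split():
            best, best_score = word, threshold
            for candidate in personal_vocabulary:
                score = SequenceMatcher(None, word.lower(),
                                        candidate.lower()).ratio()
                if score > best_score:
                    best, best_score = candidate, score
            out.append(best)
        return " ".join(out)

    print(rescore("call jon", ["John", "Johanna"]))  # → "call John"
    ```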
  • Publication number: 20130030807
    Abstract: The wireless voice recognition system for data retrieval comprises a server, a database and an input/output device, operably connected to the server. When the user speaks, the voice transmission is converted into a data stream using a specialized user interface. The input/output device and the server exchange the data stream. The server uses a programming interface having an engine to match and compare the stream of audible data to a data element of selected searchable information. A data element of recognized information is generated and transferred to the input/output device for user verification.
    Type: Application
    Filed: September 28, 2012
    Publication date: January 31, 2013
    Inventors: Stephen S. Burns, Mickey W. Kowitz, Michael F. Bell
  • Publication number: 20130030806
    Abstract: In an embodiment, a transcription support system includes: a first storage, a playback unit, a second storage, a text creating unit, an estimating unit, and a setting unit. The first storage stores the voice data therein; the playback unit plays back the voice data; and the second storage stores voice indices, each of which associates a character string obtained from a voice recognition process with voice positional information, for which the voice positional information is indicative of a temporal position in the voice data and corresponds to the character string. The text creating unit creates text; the estimating unit estimates already-transcribed voice positional information based on the voice indices; and the setting unit sets a playback starting position that indicates a position at which playback is started in the voice data based on the already-transcribed voice positional information.
    Type: Application
    Filed: March 15, 2012
    Publication date: January 31, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kouji Ueno, Nobuhiro Shimogori, Tomoo Ikeda, Osamu Nishiyama, Hirokazu Suzuki, Manabu Nagao
  • Publication number: 20130024195
    Abstract: A method for facilitating the updating of a language model includes receiving, at a client device, via a microphone, an audio message corresponding to speech of a user; communicating the audio message to a first remote server; receiving, at the client device, a result, transcribed at the first remote server using an automatic speech recognition system (“ASR”), from the audio message; receiving, at the client device from the user, an affirmation of the result; storing, at the client device, the result in association with an identifier corresponding to the audio message; and communicating, to a second remote server, the stored result together with the identifier.
    Type: Application
    Filed: September 15, 2012
    Publication date: January 24, 2013
    Inventors: Marc White, Igor Roditis Jablokov, Victor Roditis Jablokov
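    The client-side flow in this abstract (transcribe remotely, let the user affirm, store the affirmed result with an identifier, forward it to a second server) can be sketched as below. All names are assumptions and the two servers are stubbed; this is an illustration of the protocol, not the patented system.

    ```python
    # Sketch: handle one utterance end-to-end on the client.
    import uuid

    def handle_utterance(audio, asr_server, lm_server, confirm):
        message_id = str(uuid.uuid4())         # identifier for the audio message
        result = asr_server.transcribe(audio)  # first remote server (ASR)
        if confirm(result):                    # user affirms the transcription
            stored = {"id": message_id, "text": result}
            lm_server.submit(stored)           # second server updates the LM
            return stored
        return None                            # unaffirmed results are dropped
    ```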
  • Publication number: 20130018655
    Abstract: A method of providing speech transcription performance indication includes receiving, at a user device, data representing text transcribed from an audio stream by an ASR system, and data representing a metric associated with the audio stream; displaying, via the user device, said text; and via the user device, providing, in user-perceptible form, an indicator of said metric. Another method includes displaying, by a user device, text transcribed from an audio stream by an ASR system; and via the user device, providing, in user-perceptible form, an indicator of a level of background noise of the audio stream. Another method includes receiving data representing an audio stream; converting said data representing an audio stream to text via an ASR system; determining a metric associated with the audio stream; transmitting data representing said text to a user device; and transmitting data representing said metric to the user device.
    Type: Application
    Filed: September 15, 2012
    Publication date: January 17, 2013
    Inventors: James Richard Terrell, II, Marc White, Igor Roditis Jablokov
  • Publication number: 20130018656
    Abstract: A method for facilitating mobile phone messaging, such as text messaging and instant messaging, includes receiving audio data communicated from the mobile communication device, the audio data representing an utterance that is intended to be at least a portion of the text of the message that is to be sent from the mobile phone to a recipient; transcribing the utterance to text based on the received audio data to generate a transcription; and applying a filter to the transcribed text to generate a filtered transcription, the text of which is intended to mimic language patterns of mobile device messaging that is performed manually by users. The method may also be applied to the audio data of a voicemail, with the filtered, transcribed text being communicated to a mobile phone as, for example, an SMS text message.
    Type: Application
    Filed: September 15, 2012
    Publication date: January 17, 2013
    Inventors: Marc White, Cliff Strohofer
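    The filtering step in this abstract (making transcribed text mimic manual mobile-messaging language) can be sketched with a simple substitution table. The table and function names here are assumptions for illustration, not the patent's actual filter.

    ```python
    # Sketch: map formal transcribed text onto common texting shorthand.
    import re

    SMS_FILTER = {
        "talk to you later": "ttyl",
        "be right back": "brb",
        "see you": "cu",
        "you": "u",
        "are": "r",
    }

    def to_sms_style(text):
        # Apply longer phrases first so multi-word shorthand wins.
        for phrase in sorted(SMS_FILTER, key=len, reverse=True):
            text = re.sub(r"\b" + re.escape(phrase) + r"\b",
                          SMS_FILTER[phrase], text, flags=re.IGNORECASE)
        return text

    print(to_sms_style("talk to you later, are you home"))  # → "ttyl, r u home"
    ```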
  • Publication number: 20130013307
    Abstract: Differential dynamic content delivery including providing a session document for a presentation, wherein the session document includes a session grammar and a session structured document; selecting from the session structured document a classified structural element in dependence upon user classifications of a user participant in the presentation; presenting the selected structural element to the user; streaming presentation speech to the user including individual speech from at least one user participating in the presentation; converting the presentation speech to text; detecting whether the presentation speech contains simultaneous individual speech from two or more users; and displaying the text if the presentation speech contains simultaneous individual speech from two or more users.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: William Kress Bodin, Michael John Burkhart, Daniel G. Eisenhauer, Daniel Mark Schumacher, Thomas J. Watson
  • Publication number: 20130013305
    Abstract: Various embodiments of the present invention include concept-service components of content-search-service systems which employ ontologies and vocabularies prepared for particular categories of content at particular times in order to score transcripts prepared from content items to enable a search-service component of a content-search-service system to assign estimates of the relatedness of portions of a content item to search criteria in order to render search results to clients of the content-search-service system. The concept-service component processes a search request to generate lists of related terms, and then employs the lists of related terms to process transcripts in order to score transcripts based on information contained in the ontologies.
    Type: Application
    Filed: June 15, 2012
    Publication date: January 10, 2013
    Applicant: Limelight Networks, Inc.
    Inventors: Jonathan Thompson, Vijay Chemburkar, David Bargeron, Soam Acharya
  • Publication number: 20130006626
    Abstract: A voice-based telecommunications login system which includes a login process controller; a speech recognition module; a speaker verification module; a speech synthesis module; and a user database. Responsive to a user-provided first verbal answer to a first verbal question, the first verbal answer is converted to text and compared with data previously stored in the user database. The speech synthesis module provides a second question to the user, and responsive to a user-provided second verbal answer to the second question, the speaker verification module compares the second verbal answer with a voice print of the user previously stored in the user database and validates that the second verbal answer matches a voice print of the user previously stored in the user database. Also disclosed is a method of logging in to the telecommunications system and a computer program product for logging in to the telecommunications system.
    Type: Application
    Filed: June 29, 2011
    Publication date: January 3, 2013
    Applicant: International Business Machines Corporation
    Inventors: Chandrasekara Aiyer, Brent W. Bennet, Elizabeth J. Carey, Chuanfeng Li, Faisal Mansoor, Duncan E. Russell, Aditi Sharma
  • Publication number: 20130006625
    Abstract: A system, method, and computer program product for automatically analyzing multimedia data audio content are disclosed. Embodiments receive multimedia data, detect portions having specified audio features, and output a corresponding subset of the multimedia data and generated metadata. Audio content features including voices, non-voice sounds, and closed captioning, from downloaded or streaming movies or video clips are identified as a human probably would do, but in essentially real time. Particular speakers and the most meaningful content sounds and words and corresponding time-stamps are recognized via database comparison, and may be presented in order of match probability. Embodiments responsively pre-fetch related data, recognize locations, and provide related advertisements. The content features may be also sent to search engines so that further related content may be identified. User feedback and verification may improve the embodiments over time.
    Type: Application
    Filed: June 28, 2011
    Publication date: January 3, 2013
    Applicant: Sony Corporation
    Inventors: Priyan Gunatilake, Djung Nguyen, Abhishek Patil, Dipendu Saha
  • Publication number: 20130006628
    Abstract: A transcript of a group interaction is generated from audio source data representing the group interaction. The transcript includes a sequence of lines of text, each line corresponding to an audible utterance in the audio source data. A conversation path is generated from the transcript by labeling each transcript line with an identifier identifying the speaker of the corresponding utterance in the audio source data. A representation of the group interaction is generated by associating the conversation path with a set of voice profiles, each voice profile corresponding to an identified speaker in the conversation path.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 3, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Anand Krishnaswamy, Rajeev Palanki
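    The representation this abstract describes (a transcript whose lines are labeled with speaker identifiers, associated with a set of voice profiles) can be sketched as a small data structure. The class and field names are assumptions for illustration only.

    ```python
    # Sketch: a group interaction as a speaker-labeled conversation path
    # plus a mapping from speaker identifiers to voice profiles.
    from dataclasses import dataclass, field

    @dataclass
    class Utterance:
        speaker_id: str   # label assigned from the conversation path
        text: str         # one transcript line (one audible utterance)

    @dataclass
    class GroupInteraction:
        conversation_path: list           # ordered Utterance objects
        voice_profiles: dict = field(default_factory=dict)  # id -> profile

        def lines_by(self, speaker_id):
            return [u.text for u in self.conversation_path
                    if u.speaker_id == speaker_id]

    meeting = GroupInteraction(
        conversation_path=[
            Utterance("spk1", "Let's review the agenda."),
            Utterance("spk2", "I have one addition."),
            Utterance("spk1", "Go ahead."),
        ],
        voice_profiles={"spk1": {"name": "Chair"}, "spk2": {"name": "Guest"}},
    )
    print(meeting.lines_by("spk1"))
    ```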
  • Publication number: 20130006627
    Abstract: A method of communicating between a sender and a recipient via a personalized message is disclosed comprising: (a) identifying text, via the user interface of a communication device, of a desired lyric phrase from within a pre-existing audio recording; (b) extracting audio substantially associated with the desired lyric phrase from the pre-existing recording into a desired audio clip; (c) inputting personalized text via the user interface; (d) creating the personalized message with the sender identification, the personalized text and access to the desired audio clip; (e) sending an electronic message to the electronic address of the recipient, wherein the electronic message may be an SMS/EMS/MMS message, instant message or email message including a link to the personalized message or an EMS/MMS or email message including the personalized message. An associated method of earning money from the communication along with associated systems are also disclosed.
    Type: Application
    Filed: January 23, 2012
    Publication date: January 3, 2013
    Applicant: Rednote LLC
    Inventors: Scott Guthery, Richard van den Bosch
  • Publication number: 20120330660
    Abstract: A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user.
    Type: Application
    Filed: September 5, 2012
    Publication date: December 27, 2012
    Applicant: International Business Machines Corporation
    Inventor: Peeyush Jaiswal
  • Publication number: 20120330661
    Abstract: An electronic device may capture a voice command from a user. The electronic device may store contextual information about the state of the electronic device when the voice command is received. The electronic device may transmit the voice command and the contextual information to computing equipment such as a desktop computer or a remote server. The computing equipment may perform a speech recognition operation on the voice command and may process the contextual information. The computing equipment may respond to the voice command. The computing equipment may also transmit information to the electronic device that allows the electronic device to respond to the voice command.
    Type: Application
    Filed: September 5, 2012
    Publication date: December 27, 2012
    Inventor: Aram M. Lindahl
  • Publication number: 20120330658
    Abstract: Systems and methods to process and/or present information relating to voice messages for a user that are received from other persons. In one embodiment, a method implemented in a data processing system includes: receiving first data associated with prior communications or activities for a first user on a mobile device; receiving a voice message for the first user; transcribing the voice message using the first data to provide a transcribed message; and sending the transcribed message to the mobile device for display to the user.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 27, 2012
    Applicant: XOBNI, INC.
    Inventor: Jeffrey Bonforte
  • Publication number: 20120323572
    Abstract: An automatic speech recognizer is used to produce a structured document representing the contents of human speech. A best practice is applied to the structured document to produce a conclusion, such as a conclusion that required information is missing from the structured document. Content is inserted into the structured document based on the conclusion, thereby producing a modified document. The inserted content may be obtained by prompting a human user for the content and receiving input representing the content from the human user.
    Type: Application
    Filed: June 19, 2012
    Publication date: December 20, 2012
    Inventors: Detlef Koll, Juergen Fritsch, Michael Finke
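    The flow in this last abstract (apply a best practice to an ASR-produced structured document, conclude that required information is missing, prompt the user, and insert the answer) can be sketched as follows. The required-section rule and all names are assumptions; the abstract's example domain is unspecified, so a clinical-note-style rule is used purely for illustration.

    ```python
    # Sketch: check a structured document against a "best practice" rule
    # and fill gaps with content obtained by prompting a human user.

    REQUIRED_SECTIONS = ["allergies", "medications", "assessment"]

    def find_missing(document):
        """Return the required sections absent or empty in the document."""
        return [s for s in REQUIRED_SECTIONS if not document.get(s)]

    def complete_document(document, prompt=input):
        # Prompt the user for each missing section; insert the response.
        for section in find_missing(document):
            document[section] = prompt(f"Please dictate the {section} section: ")
        return document

    doc = {"assessment": "stable", "medications": ""}
    print(find_missing(doc))  # → ['allergies', 'medications']
    ```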