Speech To Text Systems (EPO) Patents (Class 704/E15.043)
  • Publication number: 20150149167
    Abstract: Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data.
    Type: Application
    Filed: September 30, 2011
    Publication date: May 28, 2015
    Applicant: GOOGLE INC.
    Inventors: Françoise Beaufays, Johan Schalkwyk, Vincent Olivier Vanhoucke, Petar Stanisa Aleksic
  • Publication number: 20140372115
    Abstract: In one aspect, this application describes a computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations that include receiving, from a user of a computing device, a spoken input that includes a note and an activation phrase that indicates an intent to record the note. The operations also include determining a target address based at least in part on an identifier associated with a registered user of the computing device, wherein the target address is determined without receiving, from the user, an input indicating the target address when the spoken input is received. The operations also include defining a communication that includes a machine-generated transcript of the note, and sending the communication to the target address.
    Type: Application
    Filed: September 30, 2011
    Publication date: December 18, 2014
    Applicant: GOOGLE, INC.
    Inventors: Michael J. LeBeau, John Nicholas Jitkoff
  • Publication number: 20140372114
    Abstract: In one aspect, this application describes a computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations that include receiving, from a user of a computing device, a spoken input that includes a note and an activation phrase that indicates an intent to record the note. The operations also include determining a target address based at least in part on an identifier associated with a registered user of the computing device, wherein the target address is determined without receiving, from the user, an input indicating the target address when the spoken input is received. The operations also include defining a communication that includes a machine-generated transcript of the note, and sending the communication to the target address.
    Type: Application
    Filed: August 5, 2011
    Publication date: December 18, 2014
    Applicant: GOOGLE INC.
    Inventors: Michael J. LeBeau, John Nicholas Jitkoff
  • Publication number: 20140122069
    Abstract: A mechanism is provided for utilizing content analytics to automate corrections and improve speech recognition accuracy. A set of current corrected content elements is identified within a transcribed corrected media. Each current corrected content element in the set of current corrected content elements is weighted with an assigned weight based on one or more predetermined weighting conditions and a context of the transcribed corrected media. A confidence level is associated with each corrected content element based on the assigned weight. The set of current corrected content elements and the confidence level associated with each current corrected content element in a set of corrected elements is stored in a storage device for use in a subsequent transcription correction.
    Type: Application
    Filed: October 30, 2012
    Publication date: May 1, 2014
    Applicant: International Business Machines Corporation
    Inventors: Seth E. Bravin, Brian J. Cragun, Robert A. Foyle, Ali Sobhi
  • Publication number: 20140122059
    Abstract: Voice-based input is used to operate a media device and/or to search for media content. Voice input is received by a media device via one or more audio input devices and is translated into a textual representation of the voice input. The textual representation of the voice input is used to search one or more cache mappings between input commands and one or more associated device actions and/or media content queries. One or more natural language processing techniques may be applied to the translated text and the resulting text may be transmitted as a query to a media search service. A media search service returns results comprising one or more content item listings and the results may be presented on a display to a user.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 1, 2014
    Applicant: TiVo Inc.
    Inventors: Mukesh Patel, Lu Silverstein, Srinivas Jandhyala
  • Publication number: 20140122070
    Abstract: A system for converting audible air traffic control instructions for pilots operating from an air facility to textual format. The system may comprise a processor connected to a jack of the standard pilot headset and a separate portable display screen connected to the processor. The processor may have a language converting functionality which can recognize traffic control nomenclature and display messages accordingly. Displayed text may be limited to information intended for a specific aircraft. The display may show hazardous discrepancies between authorized altitudes and headings and actual altitudes and headings. The display may be capable of correction by the user, and may utilize Global Positioning System (GPS) to obtain appropriate corrections. The system may date and time stamp communications and hold the same in memory. The system may have computer style user functions such as scrollability and operating prompts.
    Type: Application
    Filed: October 30, 2012
    Publication date: May 1, 2014
    Inventors: Robert S. Prus, Konrad Robert Sliwowski
  • Publication number: 20140095158
    Abstract: An apparatus, system and method for continuously capturing ambient voice and using it to update content delivered to a user of an electronic device are provided. Subsets of words are continuously extracted from speech and used to deliver content relevant to the subsets of words.
    Type: Application
    Filed: October 2, 2012
    Publication date: April 3, 2014
    Inventor: Matthew VROOM
  • Publication number: 20140088961
    Abstract: Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into homogeneous segments of content with regard to speakers and background sounds. For the at least one segment, a speaker providing speech in an audio track of the at least one segment is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the generated speech profile, and an automatic speech recognition engine is dynamically configured for operation on the at least one segment based on the acoustic profile. Automatic speech recognition operations are performed on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker.
    Type: Application
    Filed: September 26, 2012
    Publication date: March 27, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Elizabeth V. Woodward, Shunguo Yan
  • Publication number: 20140086395
    Abstract: In an embodiment, a system maintains a database of a plurality of persons. The database includes an audio clip of a pronunciation of a name of a first person in the database. The system determines from a calendar database that a second person has an event in common with the first person, and transmits to a device associated with the second person an indication that the database includes the pronunciation of the name of the first person.
    Type: Application
    Filed: September 25, 2012
    Publication date: March 27, 2014
    Applicant: LinkedIn Corporation
    Inventors: Jonathan Redfern, Manish Mohan Sharma, Seth McLaughlin
  • Publication number: 20140074465
    Abstract: A system and method for generating an acoustic database for a particular narrator that does not require said narrator to recite a pre-defined script. The system and method generate an acoustic database by using voice recognition or speech-to-text algorithms to automatically generate a text script of a voice message while simultaneously or near-simultaneously sampling the voice message to create the acoustic database. The acoustic database may be associated with the narrator of the voice message by an identifier, such as a telephone number. The acoustic database may then be used by a text-to-speech processor to read a text message when the narrator is identified as the sender of the text message, providing an audio output of the contents of the text message with a simulation of the sender's voice. The user of the system may be provided an audio message that sounds like the originator of the text message.
    Type: Application
    Filed: September 11, 2012
    Publication date: March 13, 2014
    Applicant: DELPHI TECHNOLOGIES, INC.
    Inventors: Christopher A. Hedges, Bradley S. Coon
  • Publication number: 20140074466
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding an utterance and environmental data, obtaining a transcription of the utterance, identifying an entity using the environmental data, submitting a query to a natural language query processing engine, wherein the query includes at least a portion of the transcription and data that identifies the entity, and obtaining one or more results of the query.
    Type: Application
    Filed: September 25, 2012
    Publication date: March 13, 2014
    Applicant: Google Inc.
    Inventors: Matthew Sharifi, Gheorghe Postelnicu
  • Publication number: 20140074464
    Abstract: Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.
    Type: Application
    Filed: September 12, 2012
    Publication date: March 13, 2014
    Applicant: International Business Machines Corporation
    Inventor: Scott H. Berens
  • Publication number: 20140058727
    Abstract: A multimedia recording system is provided. The multimedia recording system includes a storage module, a recognition module, and a tagging module. The storage module stores a multimedia file corresponding to multimedia data with audio content, wherein the multimedia data is received through a computer network. The recognition module converts the audio content of the multimedia data into text. The tagging module produces tag information according to the text, wherein the tag information corresponds to portion(s) of the multimedia file. The disclosure further provides a multimedia recording method.
    Type: Application
    Filed: August 28, 2012
    Publication date: February 27, 2014
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: TAI-MING GOU, YI-WEN CAI, CHUN-MING CHEN
  • Publication number: 20140046660
    Abstract: A computer-implemented method for voice based mood analysis includes receiving an acoustic speech of a plurality of words from a user in response to the user utilizing speech to text mode. The computer-implemented method also includes analyzing the acoustic speech to distinguish voice patterns. Further, the computer-implemented method includes measuring a plurality of tone parameters from the voice patterns, wherein the tone parameters comprise voice decibel, timbre and pitch. Furthermore, the computer-implemented method includes identifying a mood of the user based on the plurality of tone parameters. Moreover, the computer-implemented method includes streaming appropriate web content to the user based on the mood of the user.
    Type: Application
    Filed: August 10, 2012
    Publication date: February 13, 2014
    Applicant: Yahoo! Inc.
    Inventor: Gaurav KAMDAR
  • Publication number: 20140039888
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance are disclosed. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance, and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and the respective weights of the selected one or more models. The actions also include generating a transcription of the utterance using the composite model.
    Type: Application
    Filed: October 15, 2012
    Publication date: February 6, 2014
    Inventors: Gabriel Taubman, Brian Strope
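The weighting step in the abstract above (publication 20140039888) can be illustrated with a minimal sketch. It assumes each location-specific model is a simple word-probability table and that the composite model is a weighted linear interpolation of those tables; the abstract does not specify either choice, and the names `composite_model`, `best_word`, and the room labels are hypothetical.

```python
from typing import Dict, Iterable

WordModel = Dict[str, float]  # hypothetical: word -> probability


def composite_model(models: Dict[str, WordModel],
                    weights: Dict[str, float]) -> WordModel:
    """Linearly interpolate the selected models using normalized weights.

    Assumption: interpolation is only one plausible way to combine models;
    the abstract says only that a composite is generated from the weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    vocab = set()
    for name in weights:
        vocab.update(models[name])
    return {
        word: sum(weights[name] / total * models[name].get(word, 0.0)
                  for name in weights)
        for word in vocab
    }


def best_word(candidates: Iterable[str], model: WordModel) -> str:
    """Pick the candidate word the composite model scores highest."""
    return max(candidates, key=lambda w: model.get(w, 0.0))


# Hypothetical location indicia: the speaker is most likely in the kitchen.
models = {
    "kitchen": {"oven": 0.75, "tv": 0.25},
    "living_room": {"oven": 0.25, "tv": 0.75},
}
weights = {"kitchen": 3.0, "living_room": 1.0}
combined = composite_model(models, weights)
```

With these illustrative weights, the composite model favors kitchen vocabulary, so an acoustically ambiguous word would be resolved toward "oven".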
  • Publication number: 20140019126
    Abstract: Speech-to-text recognition of non-dictionary words by an electronic device having a speech-to-text recognizer and a global positioning system (GPS) includes receiving a user's speech and attempting to convert the speech to text using at least a word dictionary; in response to a portion of the speech being unrecognizable, determining if the speech contains a location-based phrase that contains a term relating to any combination of a geographic origin or destination, a current location, and a route; retrieving from a global positioning system location data that are within geographical proximity to the location-based phrase, wherein the location data include any combination of street names, business names, places of interest, and municipality names; updating the word dictionary by temporarily adding words from the location data to the word dictionary; and using the updated word dictionary to convert the previously unrecognized portion of the speech to text.
    Type: Application
    Filed: July 13, 2012
    Publication date: January 16, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Zachary W. Abrams, Paula Besterman, Pamela S. Ross, Eric Woods
  • Publication number: 20140006021
    Abstract: Systems and methods for adjusting a discrete acoustic model complexity in an automatic speech recognition system. In some cases, the systems and methods include a discrete acoustic model, a pronunciation dictionary, and optionally a language model or a grammar model. In some cases, the methods include providing a speech database comprising multiple pairs, each pair including a speech recording called a waveform and an orthographic transcription of the waveform; constructing the discrete acoustic model by converting the orthographic transcription into a phonetic transcription; parameterizing the speech database by transforming the waveforms into a sequence of feature vectors and normalizing the sequences of the feature vectors; and training the acoustic model with the normalized sequences of the feature vectors, wherein the complexity PI of the discrete acoustic model is further adjusted through a procedure that uses a given generalization coefficient N. Other implementations are described.
    Type: Application
    Filed: August 6, 2012
    Publication date: January 2, 2014
    Applicant: VOICE LAB SP. Z O.O.
    Inventor: Marcin Kuropatwinski
  • Publication number: 20140006020
    Abstract: A transcription method, apparatus and computer program product are provided to permit a transcripted text report to be efficiently reviewed. In the context of a method, an audio file and a transcripted text report corresponding to the audio file may be received. For each of a plurality of positions within the transcripted text report, the method correlates the respective position within the transcripted text report with a corresponding position within the audio file. The method also augments the transcripted text report to include a plurality of selectable elements. Each selectable element is associated with a respective position within the transcripted text report. The selectable elements are responsive to user actuation in order to cause the audio file to move to the corresponding position. A corresponding apparatus and computer program product are also provided.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: MCKESSON FINANCIAL HOLDINGS
    Inventors: Tak M. Ko, Dragan Zigic
  • Publication number: 20130346075
    Abstract: Embodiments of apparatus, computer-implemented methods, systems, devices, and computer-readable media are described herein for facilitation of concurrent consumption of media content by a first user of a first computing device and a second user of a second computing device. In various embodiments, facilitation may include superimposition of an animation of the second user over the media content presented on the first computing device, based on captured visual data of the second user received from the second computing device. In various embodiments, the animation may be visually emphasized on determination of the first user's interest in the second user. In various embodiments, facilitation may include conditional alteration of captured visual data of the first user based at least in part on whether the second user has been assigned a trusted status, and transmittal of the altered or unaltered visual data of the first user to the second computing device.
    Type: Application
    Filed: June 25, 2012
    Publication date: December 26, 2013
    Inventors: Paul I. Felkai, Annie Harper, Ratko Jagodic, Rajiv K. Mongia, Garth Shoemaker
  • Publication number: 20130339014
    Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
    Type: Application
    Filed: June 13, 2012
    Publication date: December 19, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventors: Yun Tang, Venkatesh Nagesha
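The save, update, and roll-back sequence in the abstract above (publication 20130339014) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the CMN function is a running per-dimension mean updated by exponential smoothing, and the class `RollbackCMN`, the parameter `alpha`, and the `recognize` callback are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]  # one cepstral feature vector


class RollbackCMN:
    """Running cepstral mean that is restored when recognition fails."""

    def __init__(self, dim: int, alpha: float = 0.9) -> None:
        self.alpha = alpha              # assumed smoothing factor
        self.mean: List[float] = [0.0] * dim
        self.previous: List[float] = list(self.mean)

    def process(self, frames: Sequence[Frame],
                recognize: Callable[[List[List[float]]], Optional[str]]
                ) -> Optional[str]:
        if not frames:
            raise ValueError("frames must not be empty")
        # 1. Store the current CMN state as the previous state.
        self.previous = list(self.mean)
        # 2. Update the CMN state from the current audio input.
        n = len(frames)
        utt_mean = [sum(f[i] for f in frames) / n
                    for i in range(len(self.mean))]
        self.mean = [self.alpha * m + (1.0 - self.alpha) * u
                     for m, u in zip(self.mean, utt_mean)]
        # 3. Normalize the input with the updated state.
        normalized = [[x - m for x, m in zip(f, self.mean)] for f in frames]
        # 4. Recognize; None stands for "not recognized as text".
        text = recognize(normalized)
        # 5. On failure, discard the update and restore the previous state.
        if text is None:
            self.mean = list(self.previous)
        return text


cmn = RollbackCMN(dim=2, alpha=0.5)
first = cmn.process([[2.0, 4.0], [4.0, 8.0]], lambda feats: "ok")
mean_after_success = list(cmn.mean)
second = cmn.process([[10.0, 10.0]], lambda feats: None)
mean_after_failure = list(cmn.mean)
```

The rollback keeps unrecognized input, such as background noise, from contaminating the normalization statistics used for later utterances.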
  • Publication number: 20130332158
    Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines, or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine or mode is used to match audio to text, which may be words or phrases. The matched audio and text are used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the speech recognition engine from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 12, 2013
    Applicant: NVOQ INCORPORATED
    Inventors: Charles Corfield, Brian Marquette
  • Publication number: 20130325463
    Abstract: An apparatus, system, and method for identifying related content based on eye movements are disclosed. The system, apparatus, and method display content to a user concurrently in two or more windows, identify areas in each of the windows where a user's eyes focus, extract keywords from the areas where the user's eyes focus in each of the windows, search a communications network for related content using the keywords, and notify the user of the related content by displaying it concurrently with the two or more windows. The keywords are extracted from one or more locations in the two or more windows in which the user's eyes pause for a predetermined amount of time or, when the user's eyes pause on an image, from at least one of the text adjacent to and the metadata associated with that image.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Applicant: CA, INC.
    Inventors: Steven L. GREENSPAN, Carrie E. GATES
  • Publication number: 20130325462
    Abstract: A system and method for assigning one or more tags to an image file. In one aspect, a server computer receives an image file captured by a client device. In one embodiment, the image file includes an audio component embedded therein by the client device, where the audio component was spoken by a user of the client device as a tag of the image file. The server computer determines metadata associated with the image file and identifies a dictionary of potential textual tags from the metadata. The server computer determines a textual tag from the audio component and from the dictionary of potential textual tags. The server computer then associates the textual tag with the image file as additional metadata.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 5, 2013
    Applicant: Yahoo! Inc.
    Inventors: Oren Somekh, Nadav Golbandi, Liran Katzir, Ronny Lempel, Yoelle Maarek
  • Publication number: 20130325461
    Abstract: According to some embodiments, a user device may receive business enterprise information from a remote enterprise server. The user device may then automatically convert at least some of the business enterprise information into speech output provided to a user of the user device. Speech input from the user may be received via and converted by the user device. The user device may then interact with the remote enterprise server in accordance with the converted speech input and the business enterprise information.
    Type: Application
    Filed: May 30, 2012
    Publication date: December 5, 2013
    Inventors: Guy Blank, Guy Soffer
  • Publication number: 20130317817
    Abstract: Computer models are powerful resources that can be accessed by remote users. Models can be copied without authorization or can become an out-of-date version. A model with a signature, referred to herein as a “signed” model, can indicate the signature without affecting usage by users who are unaware that the model contains the signature. The signed model can respond to an input in a steganographic way such that only the designer of the model knows that the signature is embedded in the model. The response is a way to check the source or other characteristics of the model. The signed model can include embedded signatures of various degrees of detectability to respond to select steganographic inputs with steganographic outputs. In this manner, a designer of signed models can prove whether an unauthorized copy of the signed model is being used by a third party while using publicly available user interfaces.
    Type: Application
    Filed: May 22, 2012
    Publication date: November 28, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: William F. Ganong, III, Paul J. Vozila, Puming Zhan
  • Publication number: 20130311177
    Abstract: A methodology may be provided that automatically annotates web conference application sharing (e.g., sharing scenes and/or slides) based on voice and/or web conference data. In one specific example, a methodology may be provided that threads the annotations and assigns authorship to the correct resources.
    Type: Application
    Filed: May 16, 2012
    Publication date: November 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Paul R. Bastide, Robert E. Loredo, Matthew E. Broomhall, Ralph E. LeBlanc, JR., Fang Lu
  • Publication number: 20130304451
    Abstract: Processes capable of accepting linguistic input in one or more languages are generated by re-using existing linguistic components associated with a different anchor language, together with machine translation components that translate between the anchor language and the one or more languages. Linguistic input is directed to machine translation components that translate such input from its language into the anchor language. Those existing linguistic components are then utilized to initiate responsive processing and generate output. Optionally, the output is directed through the machine translation components. A language identifier can initially receive linguistic input and identify the language within which such linguistic input is provided to select an appropriate machine translation component.
    Type: Application
    Filed: May 10, 2012
    Publication date: November 14, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Ruhi Sarikaya, Daniel Boies, Fethiye Asli Celikyilmaz, Anoop K. Deoras, Dustin Rigg Hillard, Dilek Z. Hakkani-Tur, Gokhan Tur, Fileno A. Alleva
  • Publication number: 20130297308
    Abstract: The present disclosure may provide an electronic device capable of audio recording. The electronic device may include a recording function unit configured to record an external sound to store it as an audio file, a conversion unit configured to convert a voice contained in the sound into a text based on a speech-to-text (STT) conversion, and a controller configured to detect a core keyword from the text, and set the detected core keyword to at least part of a file name for the audio file.
    Type: Application
    Filed: October 22, 2012
    Publication date: November 7, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Bon Joon KOO, Yireun KIM
  • Publication number: 20130297307
    Abstract: A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text, as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or unmodified by the user using the marking menu) to adjust one or more models used by the dictation model to interpret the user's utterance.
    Type: Application
    Filed: May 1, 2012
    Publication date: November 7, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Timothy S. Paek, Bongshin Lee, Bo-June Hsu
  • Publication number: 20130289983
    Abstract: An electronic device and a method of controlling the electronic device are provided. According to an embodiment, the electronic device may recognize a first sound signal output from at least one external device connectable through a communication unit and control a sound output of at least one of the at least one external device or the sound output unit when a second sound signal is output through the sound output unit.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Hyorim PARK, Jihyun KIM, Jihwan KIM, Seijun LIM
  • Publication number: 20130262103
    Abstract: A device and method are disclosed for testing the intelligibility of audio announcement systems. The device may include a microphone, a translation engine, a processor, a memory associated with the processor, and a display. The microphone of the analyzer may be coupled to the translation engine, which in turn may be coupled to the processor, which in turn may be coupled to the memory and the display. The translation engine can convert audio speech input from the microphone into data output. The processor can receive the data output and can apply a scoring algorithm thereto. The algorithm can compare the received data against data that is stored in the memory of the analyzer and calculate the accuracy of the received data. The algorithm may translate the calculated accuracy into a standardized STI intelligibility score that is then presented on the display of the analyzer.
    Type: Application
    Filed: March 28, 2012
    Publication date: October 3, 2013
    Applicant: SIMPLEXGRINNELL LP
    Inventor: Rodger Reiswig
  • Publication number: 20130262104
    Abstract: A procurement system may include a first interface configured to receive a query from a user, a command module configured to parameterize the query, an intelligent search and match engine configured to compare the parameterized query with stored queries in a historical knowledge base and, in the event the parameterized query does not match a stored query within the historical knowledge base, search for a match in a plurality of knowledge models, and a response solution engine configured to receive a system response ID from the intelligent search and match engine, the response solution engine being configured to initiate a system action by interacting with sub-system and related databases to generate a system response.
    Type: Application
    Filed: March 28, 2012
    Publication date: October 3, 2013
    Inventors: Subhash Makhija, Santosh Katakol, Dhananlay Nagalkar, Siddhaarth Iyer, Ravi Mevcha
  • Publication number: 20130262107
    Abstract: The disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database.
    Type: Application
    Filed: October 18, 2012
    Publication date: October 3, 2013
    Inventor: David E. Bernard
  • Publication number: 20130262105
    Abstract: Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
    Type: Application
    Filed: March 28, 2012
    Publication date: October 3, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Jian Luan, Linfang Wang, Hairong Xia, Sheng Zhao, Daniela Braga
  • Publication number: 20130246063
    Abstract: A system and methods are disclosed which provide simple and rapid animated content creation, particularly for more life-like synthesis of voice segments associated with an animated element. A voice input tool enables quick creation of spoken language segments for animated characters. Speech is converted to text. That text may be reconverted to speech with prosodic elements added. The text, prosodic elements, and voice may be edited.
    Type: Application
    Filed: April 7, 2011
    Publication date: September 19, 2013
    Applicant: GOOGLE INC.
    Inventor: Eric Teller
  • Publication number: 20130246041
    Abstract: Systems and methods for information and action management including managing and communicating critical and non-critical information relating to certain emergency services events or incidents as well as other applications. More specifically, systems and methods for information and action management including a plurality of mobile interface units that include one or more of a language translation sub-system, an action receipt sub-system, a voice-to-text conversion sub-system, a media management sub-system, a revision management sub-system that restricts the abilities of some users, and a report generation sub-system that creates reports operatively coupled to the language translation sub-system, the action receipt sub-system, the voice-to-text conversion sub-system, the media management sub-system, and the revision management sub-system to auto-populate report fields.
    Type: Application
    Filed: August 3, 2012
    Publication date: September 19, 2013
    Inventors: Marc Alexander Costa, Robert Edward Clark, JR., Akshay Ender Sura
  • Publication number: 20130239018
    Abstract: An embodiment of the present invention is directed to controlling a graphical display responsive to voice input. A voice input is received from a live telephone conversation between two or more parties. The voice input is then analyzed for the presence of a product model number. The product model number is then extracted from the voice input. The product model number is then displayed in a model number listing on the graphical display while the conversation is still ongoing.
    Type: Application
    Filed: March 12, 2012
    Publication date: September 12, 2013
    Applicant: W.W. GRAINGER, INC.
    Inventor: Geoffry A. Westphal
  • Publication number: 20130238329
    Abstract: Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated.
    Type: Application
    Filed: March 8, 2012
    Publication date: September 12, 2013
    Applicant: Nuance Communications, Inc.
    Inventor: Mariana Casella dos Santos
  • Publication number: 20130231930
    Abstract: A computer implemented method and apparatus for automatically filtering an audio input to make a filtered recording comprising: identifying words used in an audio input, determining whether each identified word is contained in a dictionary of banned words, and creating a filtered recording as an audio output, wherein each word identified in the audio input that is found in the dictionary of banned words, is automatically deleted or replaced in the audio output used to make the filtered recording.
    Type: Application
    Filed: March 1, 2012
    Publication date: September 5, 2013
    Applicant: Adobe Systems Inc.
    Inventor: Antonio Sanso
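The banned-word filtering described in this abstract can be illustrated with a minimal sketch. It assumes the audio has already been recognized into a list of words, so it works on text as a stand-in for audio segments; the function name `filter_banned`, the `[bleep]` marker, and the punctuation handling are hypothetical and are not taken from the application.

```python
from typing import Iterable, List, Optional, Set


def filter_banned(words: Iterable[str], banned: Set[str],
                  replacement: Optional[str] = "[bleep]") -> List[str]:
    """Return the words with banned entries replaced, or deleted if replacement is None."""
    banned_lower = {b.lower() for b in banned}
    output: List[str] = []
    for word in words:
        # Compare case-insensitively and ignore surrounding punctuation (an assumption).
        key = word.strip(".,!?;:\"'").lower()
        if key in banned_lower:
            if replacement is not None:
                output.append(replacement)
            # With replacement=None the banned word is simply dropped.
        else:
            output.append(word)
    return output
```

In a real audio pipeline each recognized word would also carry start and end times, so the same decision could be used to mute or overwrite the matching span of samples.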
  • Publication number: 20130211833
    Abstract: Techniques for overlaying a custom interface onto an existing kiosk interface are provided. An event is detected that triggers a kiosk to process an agent that overlays, and without modifying, the kiosk's existing interface. The agent alters screen features and visual presentation of the existing interface and provides additional alternative operations for navigating and executing features defined in the existing interface. In an embodiment, the agent provides a custom interface overlaid onto the existing interface to provide a customer-facing interface for individuals that are sight impaired.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Applicant: NCR Corporation
    Inventors: Thomas V. Edwards, Daniel Francis Matteo
  • Publication number: 20130197908
    Abstract: Systems and methods for speech processing in telecommunication networks are described. In some embodiments, a method may include receiving speech transmitted over a network, causing the speech to be converted to text, and identifying the speech as predetermined speech in response to the text matching a stored text associated with the predetermined speech. The stored text may have been obtained, for example, by subjecting the predetermined speech to a network impairment condition. The method may further include identifying terms within the text that match terms within the stored text (e.g., despite not being identical to each other), calculating a score between the text and the stored text, and determining that the text matches the stored text in response to the score meeting a threshold value. In some cases, the method may also identify one of a plurality of speeches based on a selected one of a plurality of stored texts.
    Type: Application
    Filed: February 16, 2012
    Publication date: August 1, 2013
    Applicant: TEKTRONIX, INC.
    Inventors: Jihao Zhong, Sylvain Plante, Chunchun Jonina Chan, Jiping Xie
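The score-and-threshold matching in this abstract might look roughly like the sketch below. It assumes a simple scoring rule (the fraction of stored terms that have a close match in the recognized text) and uses Python's standard `difflib.SequenceMatcher` for the near-match test; the function names and the `0.8` and `0.7` cutoffs are illustrative assumptions, not values from the application.

```python
from difflib import SequenceMatcher


def terms_match(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Treat two terms as matching when they are similar, even if not identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= min_ratio


def match_score(text: str, stored_text: str, min_ratio: float = 0.8) -> float:
    """Fraction of stored terms that have a near match somewhere in the text."""
    terms = text.split()
    stored_terms = stored_text.split()
    if not stored_terms:
        return 0.0
    hits = sum(
        1 for stored in stored_terms
        if any(terms_match(term, stored, min_ratio) for term in terms)
    )
    return hits / len(stored_terms)


def is_predetermined(text: str, stored_text: str, threshold: float = 0.7) -> bool:
    """Declare a match when the score meets the (assumed) threshold."""
    return match_score(text, stored_text) >= threshold
```

To pick one of several predetermined speeches, the same score could be computed against each stored text and the highest-scoring entry above the threshold selected.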
  • Publication number: 20130197903
    Abstract: An exemplary recording method receives the personal information of a speaker transmitted from an RFID tag through an RFID reader. Then the method receives the voice of the speaker through a microphone. The method next receives the personal information of the speaker and the identifier of the audio input device transmitted from the audio input device, and associates the personal information of the speaker with the received identifier of the audio input device. Then, the method receives the voice and the identifier of the audio input device transmitted from the audio input device. The method further converts the received voice to text. The method determines the personal information corresponding to the identifier of the audio input device received with the voice, and associates the converted text with the determined personal information to generate a record.
    Type: Application
    Filed: August 2, 2012
    Publication date: August 1, 2013
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD
    Inventor: WEN-HUI ZHANG
  • Publication number: 20130177143
    Abstract: Methods, systems, computer readable media, and apparatuses for voice data transcription are provided. Packets that include voice data may be received and for each time that a threshold amount of voice data is received, a segment of voice data may be created and transcribed to text. A message that includes the transcribed text may then be transmitted to an intended recipient of the voice data, to allow the intended recipient to view the message prior to transcription of the remaining segments of the voice data. Additionally, subsequent to transmission of the message, another message may be transmitted that includes text transcribed from a different segment of the voice data.
    Type: Application
    Filed: January 9, 2012
    Publication date: July 11, 2013
    Applicant: COMCAST CABLE COMMUNICATIONS, LLC
    Inventors: Tong Zhou, Matthew Sirawsky
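The segment-by-segment transcription in this abstract can be sketched as a generator that emits one message each time a threshold amount of voice data has arrived. This is only an illustration under stated assumptions: `transcribe` is a hypothetical callable standing in for a real speech recognizer, the threshold is measured in bytes, and any leftover data is flushed as a final segment.

```python
from typing import Callable, Iterable, Iterator


def transcribe_in_segments(packets: Iterable[bytes], threshold_bytes: int,
                           transcribe: Callable[[bytes], str]) -> Iterator[str]:
    """Yield transcribed text for each segment as soon as enough voice data has arrived."""
    buffer = bytearray()
    for packet in packets:
        buffer.extend(packet)
        if len(buffer) >= threshold_bytes:
            # A full segment is available: transcribe it and emit a message
            # before the remaining voice data has been received.
            yield transcribe(bytes(buffer))
            buffer.clear()
    if buffer:
        # Flush whatever is left once the stream ends (an assumption).
        yield transcribe(bytes(buffer))
```

Because the function is a generator, a caller can send each yielded message to the recipient immediately rather than waiting for the whole recording.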
  • Publication number: 20130173265
    Abstract: Speech-to-text software, sometimes known as dictation software, is software that lets you talk to the computer in some form and have the computer react appropriately to what you are saying. This is totally different to text-to-speech software, which is software that can read out text already in the computer. Speech-to-online-text software allows you to speak words into the webpage of an Internet capable device. Speech-to-online-text software will also support the capabilities provided by speech-to-text software. The hardware required to support this technology is an Internet capable device and a compatible microphone. This capability will be especially useful for communicating in different languages and dialects around the world.
    Type: Application
    Filed: January 3, 2012
    Publication date: July 4, 2013
    Inventors: Chiaka Chukwuma Okoroh, Nathaniel A. Moore
  • Publication number: 20130166292
    Abstract: A system for accessing content maintains a set of content selections associated with a first user. The system receives first original content from a first content source associated with a first one of the content selections associated with the first user. The system applies, to the first original content, a first rule (such as a parsing rule) that is specific to the first one of the content selections, to produce first derived content. The system changes the state of at least one component of a human-machine dialogue system (such as a text-to-act engine, a dialogue manager, or an act-to-text engine) based on the first derived content. The system may apply a second rule (such as a dialogue rule) to the first derived content to produce rule output and change the state of the human-machine dialogue system based on the rule output.
    Type: Application
    Filed: December 23, 2011
    Publication date: June 27, 2013
    Inventors: James D. Van Sciver, Christopher Bader, Michael Anthony Aliotti, David Carl Bong
  • Publication number: 20130166285
    Abstract: This specification describes technologies relating to multi core processing for parallel speech-to-text processing. In some implementations, a computer-implemented method is provided that includes the actions of receiving an audio file; analyzing the audio file to identify portions of the audio file as corresponding to one or more audio types; generating a time-ordered classification of the identified portions, the time-ordered classification indicating the one or more audio types and position within the audio file of each portion; generating a queue using the time-ordered classification, the queue including a plurality of jobs where each job includes one or more identifiers of a portion of the audio file classified as belonging to the one or more speech types; distributing the jobs in the queue to a plurality of processors; performing speech-to-text processing on each portion to generate a corresponding text file; and merging the corresponding text files to generate a transcription file.
    Type: Application
    Filed: December 10, 2008
    Publication date: June 27, 2013
    Applicant: Adobe Systems Incorporated
    Inventors: Walter Chang, Michael J. Welch
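The queue-and-merge flow in this abstract can be sketched as follows. The sketch assumes each classified portion is a `(start_seconds, audio_type, payload)` tuple and that `speech_to_text` is a hypothetical recognizer callable; it uses a thread pool from Python's standard library as a stand-in for distribution across processor cores, which a real system might do with separate processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Portion = Tuple[float, str, str]  # (start_seconds, audio_type, payload), an assumed layout


def transcribe_parallel(portions: List[Portion],
                        speech_to_text: Callable[[str], str],
                        workers: int = 4) -> str:
    """Transcribe the speech portions in parallel and merge them in time order."""
    # Build the job queue from the time-ordered classification, keeping speech only.
    jobs = sorted((p for p in portions if p[1] == "speech"), key=lambda p: p[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so the merged
        # transcript stays time-ordered even if jobs finish out of order.
        texts = list(pool.map(lambda job: speech_to_text(job[2]), jobs))
    return " ".join(texts)
```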
  • Publication number: 20130158992
    Abstract: An exemplary speech processing method includes extracting voice features from the stored audio files. Next, the method extracts speech(s) of a speaker from one or more audio files that contain voice features matching one selected voice model, to form a single audio file, implements a speech-to-text algorithm to create a textual file based on the single audio file, and further records time point(s). The method then associates each of the words in the converted text with corresponding recorded time points. Next, the method searches for an input keyword in the converted textual file. The method further obtains a time point associated with a word first appearing in the textual file that matches the keyword, and further controls an audio play device to play the single audio file at the determined time point.
    Type: Application
    Filed: December 30, 2011
    Publication date: June 20, 2013
    Applicants: HON HAI PRECISION INDUSTRY CO., LTD., FU TAI HUA INDUSTRY (SHENZHEN) CO., LTD.
    Inventor: XI LIN
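The keyword lookup in this abstract reduces to finding the first word in a time-stamped transcript that matches the keyword. A minimal sketch, assuming the transcript is a list of `(word, time_in_seconds)` pairs and that matching is case-insensitive (both are assumptions for illustration):

```python
from typing import List, Optional, Tuple


def first_time_point(timed_words: List[Tuple[str, float]],
                     keyword: str) -> Optional[float]:
    """Return the time point of the first word matching the keyword, or None."""
    target = keyword.lower()
    for word, time_point in timed_words:
        if word.lower() == target:
            return time_point
    return None
```

The returned time point is what a player would seek to before starting playback of the single audio file.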
  • Publication number: 20130158991
    Abstract: Methods and systems are provided for communicating information from an aircraft to a computer system at a ground location. One exemplary method involves obtaining an audio input from an audio input device onboard the aircraft, generating text data comprising a textual representation of the one or more words of the audio input, and communicating the text data to the computer system at the ground location.
    Type: Application
    Filed: December 20, 2011
    Publication date: June 20, 2013
    Applicant: HONEYWELL INTERNATIONAL INC.
    Inventors: Xian Qin Dong, Ben Dong, Yunsong Gao
  • Publication number: 20130144619
    Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system (“AEFS”) configured to enhance voice conferencing among multiple speakers. In one embodiment, the AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs a user of the speaker-related information, such as by presenting the speaker-related information on a display of a conferencing device associated with the user.
    Type: Application
    Filed: January 23, 2012
    Publication date: June 6, 2013
    Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor
  • Publication number: 20130144610
    Abstract: An automated technique is disclosed for processing audio data and generating one or more actions in response thereto. In particular embodiments, the audio data can be obtained during a phone conversation and post-call actions can be provided to the user with contextually relevant entry points for completion by an associated application. Audio transcription services available on a remote server can be leveraged. The entry points can be generated based on keyword recognition in the transcription and passed to the application in the form of parameters.
    Type: Application
    Filed: December 5, 2011
    Publication date: June 6, 2013
    Applicant: Microsoft Corporation
    Inventors: Clif Gordon, Kerry D. Woolsey