Speech To Text Systems (epo) Patents (Class 704/E15.043)
-
Publication number: 20150149167
Abstract: Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data.
Type: Application
Filed: September 30, 2011
Publication date: May 28, 2015
Applicant: GOOGLE INC.
Inventors: Françoise Beaufays, Johan Schalkwyk, Vincent Olivier Vanhoucke, Petar Stanisa Aleksic
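A minimal sketch of the selection idea in this abstract, assuming a table of feature-space transforms keyed by (device, acoustic condition) and stand-in decode/transform functions; all names are illustrative, not taken from the application:

```python
import random

def decode(features):
    # Stand-in for an ASR decoder: returns (word_string, confidence).
    return (" ".join(features), random.random())

def apply_transform(transform, speech_data):
    # Stand-in for a feature-space transform (e.g., an fMLLR-style warp).
    return [transform + token for token in speech_data]

def best_transcript(speech_data, device_id, condition, transform_table):
    # Use transforms specific to this device and acoustic condition,
    # falling back to a generic transform when none is registered.
    transforms = transform_table.get((device_id, condition), ["generic:"])
    hypotheses = [decode(apply_transform(t, speech_data)) for t in transforms]
    # Estimate which word string better represents the input: keep the max.
    return max(hypotheses, key=lambda h: h[1])
```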
-
Publication number: 20140372114
Abstract: In one aspect, this application describes a computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations that include receiving, from a user of a computing device, a spoken input that includes a note and an activation phrase that indicates an intent to record the note. The operations also include determining a target address based at least in part on an identifier associated with a registered user of the computing device, wherein the target address is determined without receiving, from the user, an input indicating the target address when the spoken input is received. The operations also include defining a communication that includes a machine-generated transcript of the note, and sending the communication to the target address.
Type: Application
Filed: August 5, 2011
Publication date: December 18, 2014
Applicant: GOOGLE INC.
Inventors: Michael J. LeBeau, John Nicholas Jitkoff
-
Publication number: 20140372115
Abstract: In one aspect, this application describes a computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations that include receiving, from a user of a computing device, a spoken input that includes a note and an activation phrase that indicates an intent to record the note. The operations also include determining a target address based at least in part on an identifier associated with a registered user of the computing device, wherein the target address is determined without receiving, from the user, an input indicating the target address when the spoken input is received. The operations also include defining a communication that includes a machine-generated transcript of the note, and sending the communication to the target address.
Type: Application
Filed: September 30, 2011
Publication date: December 18, 2014
Applicant: GOOGLE, INC.
Inventors: Michael J. LeBeau, John Nicholas Jitkoff
-
Publication number: 20140122069
Abstract: A mechanism is provided for utilizing content analytics to automate corrections and improve speech recognition accuracy. A set of current corrected content elements is identified within a transcribed corrected media. Each current corrected content element in the set of current corrected content elements is weighted with an assigned weight based on one or more predetermined weighting conditions and a context of the transcribed corrected media. A confidence level is associated with each corrected content element based on the assigned weight. The set of current corrected content elements and the confidence level associated with each current corrected content element in a set of corrected elements is stored in a storage device for use in a subsequent transcription correction.
Type: Application
Filed: October 30, 2012
Publication date: May 1, 2014
Applicant: International Business Machines Corporation
Inventors: Seth E. Bravin, Brian J. Cragun, Robert A. Foyle, Ali Sobhi
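A toy rendering of the weighting-and-confidence step: the conditions shown here (recurrence of a correction, matching context) and the thresholds are invented stand-ins for the abstract's "predetermined weighting conditions":

```python
from dataclasses import dataclass

@dataclass
class Correction:
    text: str       # the corrected content element
    context: str    # e.g., "medical", "legal"

def weight_correction(correction, past_corrections, media_context):
    weight = 1.0
    weight += past_corrections.count(correction.text)  # recurring fixes weigh more
    if correction.context == media_context:
        weight += 2.0                                  # in-context fixes weigh more
    return weight

def confidence_level(weight, hi=3.0, lo=1.5):
    # Map the assigned weight onto a confidence level stored with the element.
    if weight >= hi:
        return "high"
    return "medium" if weight >= lo else "low"
```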
-
Publication number: 20140122059
Abstract: Voice-based input is used to operate a media device and/or to search for media content. Voice input is received by a media device via one or more audio input devices and is translated into a textual representation of the voice input. The textual representation of the voice input is used to search one or more cache mappings between input commands and one or more associated device actions and/or media content queries. One or more natural language processing techniques may be applied to the translated text and the resulting text may be transmitted as a query to a media search service. A media search service returns results comprising one or more content item listings and the results may be presented on a display to a user.
Type: Application
Filed: October 31, 2012
Publication date: May 1, 2014
Applicant: TiVo Inc.
Inventors: Mukesh Patel, Lu Silverstein, Srinivas Jandhyala
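A minimal sketch of the cache-then-search flow described above; the cache is modeled as a plain dict and search_service is a stand-in for the remote media search call:

```python
def handle_voice_input(transcript, command_cache, search_service):
    # First consult the cache mapping known spoken commands to device
    # actions or stored content queries ("pause", "play channel 5", ...).
    action = command_cache.get(transcript.lower())
    if action is not None:
        return action
    # Cache miss: treat the translated text as a media content query.
    results = search_service(transcript)   # returns content-item listings
    return {"type": "search_results", "items": results}
```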
-
Publication number: 20140122070
Abstract: A system for converting audible air traffic control instructions for pilots operating from an air facility to textual format. The system may comprise a processor connected to a jack of the standard pilot headset and a separate portable display screen connected to the processor. The processor may have a language converting functionality which can recognize traffic control nomenclature and display messages accordingly. Displayed text may be limited to information intended for a specific aircraft. The display may show hazardous discrepancies between authorized altitudes and headings and actual altitudes and headings. The display may be capable of correction by the user, and may utilize Global Positioning System (GPS) to obtain appropriate corrections. The system may date and time stamp communications and hold the same in memory. The system may have computer style user functions such as scrollability and operating prompts.
Type: Application
Filed: October 30, 2012
Publication date: May 1, 2014
Inventors: Robert S. Prus, Konrad Robert Sliwowski
-
Publication number: 20140095158
Abstract: An apparatus, system and method for continuously capturing ambient voice and using it to update content delivered to a user of an electronic device are provided. Subsets of words are continuously extracted from speech and used to deliver content relevant to the subsets of words.
Type: Application
Filed: October 2, 2012
Publication date: April 3, 2014
Inventor: Matthew VROOM
-
Publication number: 20140086395
Abstract: In an embodiment, a system maintains a database of a plurality of persons. The database includes an audio clip of a pronunciation of a name of a first person in the database. The system determines from a calendar database that a second person has an event in common with the first person, and transmits to a device associated with the second person an indication that the database includes the pronunciation of the name of the first person.
Type: Application
Filed: September 25, 2012
Publication date: March 27, 2014
Applicant: LinkedIn Corporation
Inventors: Jonathan Redfern, Manish Mohan Sharma, Seth McLaughlin
-
Publication number: 20140088961
Abstract: Mechanisms for performing dynamic automatic speech recognition on a portion of multimedia content are provided. Multimedia content is segmented into homogeneous segments of content with regard to speakers and background sounds. For the at least one segment, a speaker providing speech in an audio track of the at least one segment is identified using information retrieved from a social network service source. A speech profile for the speaker is generated using information retrieved from the social network service source, an acoustic profile for the segment is generated based on the generated speech profile, and an automatic speech recognition engine is dynamically configured for operation on the at least one segment based on the acoustic profile. Automatic speech recognition operations are performed on the audio track of the at least one segment to generate a textual representation of speech content in the audio track corresponding to the speaker.
Type: Application
Filed: September 26, 2012
Publication date: March 27, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Elizabeth V. Woodward, Shunguo Yan
-
Publication number: 20140074466
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data encoding an utterance and environmental data, obtaining a transcription of the utterance, identifying an entity using the environmental data, submitting a query to a natural language query processing engine, wherein the query includes at least a portion of the transcription and data that identifies the entity, and obtaining one or more results of the query.
Type: Application
Filed: September 25, 2012
Publication date: March 13, 2014
Applicant: Google Inc.
Inventors: Matthew Sharifi, Gheorghe Postelnicu
-
Publication number: 20140074464
Abstract: Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.
Type: Application
Filed: September 12, 2012
Publication date: March 13, 2014
Applicant: International Business Machines Corporation
Inventor: Scott H. Berens
-
Publication number: 20140074465
Abstract: A system and method for generating an acoustic database for a particular narrator that does not require said narrator to recite a pre-defined script. The system and method generate an acoustic database by using voice recognition or speech-to-text algorithms to automatically generate a text script of a voice message while simultaneously or near-simultaneously sampling the voice message to create the acoustic database. The acoustic database may be associated with the narrator of the voice message by an identifier, such as a telephone number. The acoustic database may then be used by a text-to-speech processor to read a text message when the narrator is identified as the sender of the text message, providing an audio output of the contents of the text message with a simulation of the sender's voice. The user of the system may be provided an audio message that sounds like the originator of the text message.
Type: Application
Filed: September 11, 2012
Publication date: March 13, 2014
Applicant: DELPHI TECHNOLOGIES, INC.
Inventors: Christopher A. Hedges, Bradley S. Coon
-
Publication number: 20140058727
Abstract: A multimedia recording system is provided. The multimedia recording system includes a storage module, a recognition module, and a tagging module. The storage module stores a multimedia file corresponding to multimedia data with audio content, wherein the multimedia data is received through a computer network. The recognition module converts the audio content of the multimedia data into text. The tagging module produces tag information according to the text, wherein the tag information corresponds to portion(s) of the multimedia file. The disclosure further provides a multimedia recording method.
Type: Application
Filed: August 28, 2012
Publication date: February 27, 2014
Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
Inventors: TAI-MING GOU, YI-WEN CAI, CHUN-MING CHEN
-
Publication number: 20140046660
Abstract: A computer-implemented method for voice based mood analysis includes receiving an acoustic speech of a plurality of words from a user in response to the user utilizing speech to text mode. The computer-implemented method also includes analyzing the acoustic speech to distinguish voice patterns. Further, the computer-implemented method includes measuring a plurality of tone parameters from the voice patterns, wherein the tone parameters comprise voice decibel, timbre and pitch. Furthermore, the computer-implemented method includes identifying mood of the user based on the plurality of tone parameters. Moreover, the computer-implemented method includes streaming appropriate web content to the user based on the mood of the user.
Type: Application
Filed: August 10, 2012
Publication date: February 13, 2014
Applicant: Yahoo! Inc.
Inventor: Gaurav KAMDAR
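A rough sketch of measuring two of the named tone parameters (loudness in decibels and pitch) with numpy, assuming a mono float sample array; timbre is omitted for brevity, and the mood thresholds are invented for illustration:

```python
import numpy as np

def tone_parameters(samples, rate=16000):
    # samples: mono float numpy array of the spoken words
    rms = float(np.sqrt(np.mean(samples ** 2)))
    db = 20 * np.log10(rms + 1e-12)                  # voice decibel (relative)
    # Crude pitch estimate: autocorrelation peak in the 50-400 Hz lag range.
    ac = np.correlate(samples, samples, mode="full")[len(samples):]
    lo, hi = rate // 400, rate // 50
    lag = lo + int(np.argmax(ac[lo:hi]))
    return db, rate / lag                            # (decibels, pitch in Hz)

def mood_of(db, pitch_hz):
    if db > -20 and pitch_hz > 200:
        return "excited"                             # loud and high-pitched
    return "calm" if db < -35 else "neutral"
```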
-
Publication number: 20140039888
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for performing speech recognition using models that are based on where, within a building, a speaker makes an utterance are disclosed. The methods, systems, and apparatus include actions of receiving data corresponding to an utterance, and obtaining location indicia for an area within a building where the utterance was spoken. Further actions include selecting one or more models for speech recognition based on the location indicia, wherein each of the selected one or more models is associated with a weight based on the location indicia. Additionally, the actions include generating a composite model using the selected one or more models and the respective weights of the selected one or more models. And the actions also include generating a transcription of the utterance using the composite model.
Type: Application
Filed: October 15, 2012
Publication date: February 6, 2014
Inventors: Gabriel Taubman, Brian Strope
-
Publication number: 20140019126
Abstract: Speech-to-text recognition of non-dictionary words by an electronic device having a speech-to-text recognizer and a global positioning system (GPS) includes receiving a user's speech and attempting to convert the speech to text using at least a word dictionary; in response to a portion of the speech being unrecognizable, determining if the speech contains a location-based phrase that contains a term relating to any combination of a geographic origin or destination, a current location, and a route; retrieving from a global positioning system location data that are within geographical proximity to the location-based phrase, wherein the location data include any combination of street names, business names, places of interest, and municipality names; updating the word dictionary by temporarily adding words from the location data to the word dictionary; and using the updated word dictionary to convert the previously unrecognized portion of the speech to text.
Type: Application
Filed: July 13, 2012
Publication date: January 16, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Zachary W. Abrams, Paula Besterman, Pamela S. Ross, Eric Woods
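A small sketch of the temporary dictionary update, assuming the GPS lookup yields nearby street/business/place names as strings and that retry_recognize is a stand-in for the second decoding pass over the unrecognized portion:

```python
def recognize_with_location(audio, dictionary, nearby_names, retry_recognize):
    # Split multi-word place names ("Main Street") into dictionary entries.
    added = {w for name in nearby_names for w in name.lower().split()
             if w not in dictionary}
    dictionary |= added                       # temporarily extend the lexicon
    try:
        return retry_recognize(audio, dictionary)
    finally:
        dictionary -= added                   # restore the original dictionary
```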
-
Publication number: 20140006021
Abstract: Systems and methods for adjusting a discrete acoustic model complexity in an automatic speech recognition system. In some cases, the systems and methods include a discrete acoustic model, a pronunciation dictionary, and optionally a language model or a grammar model. In some cases, the methods include providing a speech database comprising multiple pairs, each pair including a speech recording called a waveform and an orthographic transcription of the waveform; constructing the discrete acoustic model by converting the orthographic transcription into a phonetic transcription; parameterizing the speech database by transforming the waveforms into a sequence of feature vectors and normalizing the sequences of the feature vectors; and training the acoustic model with the normalized sequences of the feature vectors, wherein the complexity PI of the discrete acoustic model is further adjusted through a procedure that uses a given generalization coefficient N. Other implementations are described.
Type: Application
Filed: August 6, 2012
Publication date: January 2, 2014
Applicant: VOICE LAB SP. Z O.O.
Inventor: Marcin Kuropatwinski
-
Publication number: 20140006020
Abstract: A transcription method, apparatus and computer program product are provided to permit a transcripted text report to be efficiently reviewed. In the context of a method, an audio file and a transcripted text report corresponding to the audio file may be received. For each of a plurality of positions within the transcripted text report, the method correlates the respective position within the transcripted text report with a corresponding position within the audio file. The method also augments the transcripted text report to include a plurality of selectable elements. Each selectable element is associated with a respective position within the transcripted text report. The selectable elements are responsive to user actuation in order to cause the audio file to move to the corresponding position. A corresponding apparatus and computer program product are also provided.
Type: Application
Filed: June 29, 2012
Publication date: January 2, 2014
Applicant: MCKESSON FINANCIAL HOLDINGS
Inventors: Tak M. Ko, Dragan Zigic
-
Publication number: 20130346075
Abstract: Embodiments of apparatus, computer-implemented methods, systems, devices, and computer-readable media are described herein for facilitation of concurrent consumption of media content by a first user of a first computing device and a second user of a second computing device. In various embodiments, facilitation may include superimposition of an animation of the second user over the media content presented on the first computing device, based on captured visual data of the second user received from the second computing device. In various embodiments, the animation may be visually emphasized on determination of the first user's interest in the second user. In various embodiments, facilitation may include conditional alteration of captured visual data of the first user based at least in part on whether the second user has been assigned a trusted status, and transmittal of the altered or unaltered visual data of the first user to the second computing device.
Type: Application
Filed: June 25, 2012
Publication date: December 26, 2013
Inventors: Paul I. Felkai, Annie Harper, Ratko Jagodic, Rajiv K. Mongia, Garth Shoemaker
-
Publication number: 20130339014
Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
Type: Application
Filed: June 13, 2012
Publication date: December 19, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Yun Tang, Venkatesh Nagesha
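A minimal sketch of that update-and-rollback loop, assuming per-utterance cepstral feature arrays and a recognizer that returns None on failure; the exponential update rule is an illustrative choice, not the patented formula:

```python
import numpy as np

def recognize_with_cmn(utterances, recognize, alpha=0.9, dim=13):
    cmn = np.zeros(dim)                    # current CMN function (running cepstral mean)
    for cepstra in utterances:             # cepstra: (frames, dim) feature array
        previous = cmn.copy()              # store the current function as "previous"
        cmn = alpha * cmn + (1 - alpha) * cepstra.mean(axis=0)  # update from current audio
        text = recognize(cepstra - cmn)    # process the input with the updated function
        if text is None:                   # input not recognized as representative text:
            cmn = previous                 # fall back to the previous CMN function
        yield text
```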
-
Publication number: 20130332158
Abstract: The technology of the present application provides a speech recognition system with at least two different speech recognition engines or a single speech recognition engine with at least two different modes of operation. The first speech recognition engine or mode is used to match audio to text, where the text may be words or phrases. The matched audio and text is used by a training module to train a user profile for a natural language speech recognition engine, which is at least one of the two different speech recognition engines or modes. An evaluation module evaluates when the user profile is sufficiently trained to convert the speech recognition engine from the first speech recognition engine or mode to the natural language speech recognition engine or mode.
Type: Application
Filed: June 8, 2012
Publication date: December 12, 2013
Applicant: NVOQ INCORPORATED
Inventors: Charles Corfield, Brian Marquette
-
Publication number: 20130325462
Abstract: A system and method for assigning one or more tags to an image file. In one aspect, a server computer receives an image file captured by a client device. In one embodiment, the image file includes an audio component embedded therein by the client device, where the audio component was spoken by a user of the client device as a tag of the image file. The server computer determines metadata associated with the image file and identifies a dictionary of potential textual tags from the metadata. The server computer determines a textual tag from the audio component and from the dictionary of potential textual tags. The server computer then associates the textual tag with the image file as additional metadata.
Type: Application
Filed: May 31, 2012
Publication date: December 5, 2013
Applicant: Yahoo! Inc.
Inventors: Oren Somekh, Nadav Golbandi, Liran Katzir, Ronny Lempel, Yoelle Maarek
-
Publication number: 20130325461
Abstract: According to some embodiments, a user device may receive business enterprise information from a remote enterprise server. The user device may then automatically convert at least some of the business enterprise information into speech output provided to a user of the user device. Speech input from the user may be received via and converted by the user device. The user device may then interact with the remote enterprise server in accordance with the converted speech input and the business enterprise information.
Type: Application
Filed: May 30, 2012
Publication date: December 5, 2013
Inventors: Guy Blank, Guy Soffer
-
Publication number: 20130325463
Abstract: An apparatus, system, and method for identifying related content based on eye movements are disclosed. The system, apparatus, and method display content to a user concurrently in two or more windows, identify areas in each of the windows where a user's eyes focus, extract keywords from the areas where the user's eyes focus in each of the windows, search a communications network for related content using the keywords, and notify the user of the related content by displaying it concurrently with the two or more windows. The keywords are extracted from one or more locations in the two or more windows in which the user's eyes pause for a predetermined amount of time or, when the user's eyes pause on an image, from at least one of the text adjacent to and the metadata associated with that image.
Type: Application
Filed: May 31, 2012
Publication date: December 5, 2013
Applicant: CA, INC.
Inventors: Steven L. GREENSPAN, Carrie E. GATES
-
Publication number: 20130317817
Abstract: Computer models are powerful resources that can be accessed by remote users. Models can be copied without authorization or can become an out-of-date version. A model with a signature, referred to herein as a "signed" model, can indicate the signature without affecting usage by users who are unaware that the model contains the signature. The signed model can respond to an input in a steganographic way such that only the designer of the model knows that the signature is embedded in the model. The response is a way to check the source or other characteristics of the model. The signed model can include embedded signatures of various degrees of detectability to respond to select steganographic inputs with steganographic outputs. In this manner, a designer of signed models can prove whether an unauthorized copy of the signed model is being used by a third party while using publicly available user interfaces.
Type: Application
Filed: May 22, 2012
Publication date: November 28, 2013
Applicant: Nuance Communications, Inc.
Inventors: William F. Ganong, III, Paul J. Vozila, Puming Zhan
-
Publication number: 20130311177
Abstract: A methodology may be provided that automatically annotates web conference application sharing (e.g., sharing scenes and/or slides) based on voice and/or web conference data. In one specific example, a methodology may be provided that threads the annotations and assigns authorship to the correct resources.
Type: Application
Filed: May 16, 2012
Publication date: November 21, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Paul R. Bastide, Robert E. Loredo, Matthew E. Broomhall, Ralph E. LeBlanc, JR., Fang Lu
-
Publication number: 20130304451
Abstract: Processes capable of accepting linguistic input in one or more languages are generated by re-using existing linguistic components associated with a different anchor language, together with machine translation components that translate between the anchor language and the one or more languages. Linguistic input is directed to machine translation components that translate such input from its language into the anchor language. Those existing linguistic components are then utilized to initiate responsive processing and generate output. Optionally, the output is directed through the machine translation components. A language identifier can initially receive linguistic input and identify the language within which such linguistic input is provided to select an appropriate machine translation component.
Type: Application
Filed: May 10, 2012
Publication date: November 14, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Ruhi Sarikaya, Daniel Boies, Fethiye Asli Celikyilmaz, Anoop K. Deoras, Dustin Rigg Hillard, Dilek Z. Hakkani-Tur, Gokhan Tur, Fileno A. Alleva
-
Publication number: 20130297308
Abstract: The present disclosure may provide an electronic device capable of audio recording. The electronic device may include a recording function unit configured to record an external sound to store it as an audio file, a conversion unit configured to convert a voice contained in the sound into a text based on a speech-to-text (STT) conversion, and a controller configured to detect a core keyword from the text, and set the detected core keyword to at least part of a file name for the audio file.
Type: Application
Filed: October 22, 2012
Publication date: November 7, 2013
Applicant: LG ELECTRONICS INC.
Inventors: Bon Joon KOO, Yireun KIM
-
Publication number: 20130297307
Abstract: A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text, as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or unmodified by the user using the marking menu) to adjust one or more models used by the dictation module to interpret the user's utterance.
Type: Application
Filed: May 1, 2012
Publication date: November 7, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Timothy S. Paek, Bongshin Lee, Bo-June Hsu
-
Publication number: 20130289983
Abstract: An electronic device and a method of controlling the electronic device are provided. According to an embodiment, the electronic device may recognize a first sound signal output from at least one external device connectable through a communication unit and control a sound output of at least one of the external device or the sound output unit when a second sound signal is output through the sound output unit.
Type: Application
Filed: April 26, 2012
Publication date: October 31, 2013
Inventors: Hyorim PARK, Jihyun KIM, Jihwan KIM, Seijun LIM
-
Publication number: 20130262107
Abstract: The disclosure provides a natural language query system and method for processing and analyzing multimodally-originated queries, including voice and proximity-based queries. The natural language query system includes a Web-enabled device including a speech input module for receiving a voice-based query in natural language form from a user and a location/proximity module for receiving location/proximity information from a location/proximity device. The query system also includes a speech conversion module for converting the voice-based query in natural language form to text in natural language form and a natural language processing module for converting the text in natural language form to text in searchable form. The query system further includes a semantic engine module for converting the text in searchable form to a formal database query and a database-look-up module for using the formal database query to obtain a result related to the voice-based query in natural language form from a database.
Type: Application
Filed: October 18, 2012
Publication date: October 3, 2013
Inventor: David E. Bernard
-
Publication number: 20130262104
Abstract: A procurement system may include a first interface configured to receive a query from a user, a command module configured to parameterize the query, an intelligent search and match engine configured to compare the parameterized query with stored queries in a historical knowledge base and, in the event the parameterized query does not match a stored query within the historical knowledge base, search for a match in a plurality of knowledge models, and a response solution engine configured to receive a system response ID from the intelligent search and match engine, the response solution engine being configured to initiate a system action by interacting with sub-system and related databases to generate a system response.
Type: Application
Filed: March 28, 2012
Publication date: October 3, 2013
Inventors: Subhash Makhija, Santosh Katakol, Dhananjay Nagalkar, Siddhaarth Iyer, Ravi Mevcha
-
Publication number: 20130262105
Abstract: Dynamic features are utilized with CRFs to handle long-distance dependencies of output labels. The dynamic features present a probability distribution involved in explicit distance from/to a special output label that is pre-defined according to each application scenario. Besides the number of units in the segment (from the previous special output label to the current unit), the dynamic features may also include the sum of any basic features of units in the segment. Since the added dynamic features are involved in the distance from the previous specific label, the searching lattice associated with Viterbi searching is expanded to distinguish the nodes with various distances. The dynamic features may be used in a variety of different applications, such as Natural Language Processing, Text-To-Speech and Automatic Speech Recognition. For example, the dynamic features may be used to assist in prosodic break and pause prediction.
Type: Application
Filed: March 28, 2012
Publication date: October 3, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Jian Luan, Linfang Wang, Hairong Xia, Sheng Zhao, Daniela Braga
-
Publication number: 20130262103
Abstract: A device and method are disclosed for testing the intelligibility of audio announcement systems. The device may include a microphone, a translation engine, a processor, a memory associated with the processor, and a display. The microphone of the analyzer may be coupled to the translation engine, which in turn may be coupled to the processor, which in turn may be coupled to the memory and the display. The translation engine can convert audio speech input from the microphone into data output. The processor can receive the data output and can apply a scoring algorithm thereto. The algorithm can compare the received data against data that is stored in the memory of the analyzer and calculates the accuracy of the received data. The algorithm may translate the calculated accuracy into a standardized STI intelligibility score that is then presented on the display of the analyzer.
Type: Application
Filed: March 28, 2012
Publication date: October 3, 2013
Applicant: SIMPLEXGRINNELL LP
Inventor: Rodger Reiswig
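A toy version of the scoring step: compare the recognized words against the stored reference phrase and report an accuracy on a 0-1 scale. Real STI scoring is a defined acoustic measurement; the simple word-overlap mapping below is only an illustration of the compare-and-score idea:

```python
def word_accuracy(recognized, reference):
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    hits = sum(1 for w in hyp if w in ref)   # reference words recovered by the recognizer
    return hits / max(len(ref), 1)

def intelligibility_score(recognized, reference):
    # Present the calculated accuracy as a 0.00-1.00 display value.
    return round(word_accuracy(recognized, reference), 2)
```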
-
Publication number: 20130246063
Abstract: A system and methods are disclosed which provide simple and rapid animated content creation, particularly for more life-like synthesis of voice segments associated with an animated element. A voice input tool enables quick creation of spoken language segments for animated characters. Speech is converted to text. That text may be reconverted to speech with prosodic elements added. The text, prosodic elements, and voice may be edited.
Type: Application
Filed: April 7, 2011
Publication date: September 19, 2013
Applicant: GOOGLE INC.
Inventor: Eric Teller
-
Publication number: 20130246041
Abstract: Systems and methods for information and action management including managing and communicating critical and non-critical information relating to certain emergency services events or incidents as well as other applications. More specifically, systems and methods for information and action management including a plurality of mobile interface units that include one or more of a language translation sub-system, an action receipt sub-system, a voice-to-text conversion sub-system, a media management sub-system, a revision management sub-system that restricts the abilities of some users, and a report generation sub-system that creates reports, operatively coupled to the language translation sub-system, the action receipt sub-system, the voice-to-text conversion sub-system, the media management sub-system, and the revision management sub-system to auto-populate report fields.
Type: Application
Filed: August 3, 2012
Publication date: September 19, 2013
Inventors: Marc Alexander Costa, Robert Edward Clark, Jr., Akshay Ender Sura
-
Publication number: 20130239018
Abstract: An embodiment of the present invention is directed to controlling a graphical display responsive to voice input. A voice input is received from a live telephone conversation between two or more parties. The voice input is then analyzed for the presence of a product model number. The product model number is then extracted from the voice input. The product model number is then displayed in a model number listing on the graphical display while the conversation is still ongoing.
Type: Application
Filed: March 12, 2012
Publication date: September 12, 2013
Applicant: W.W. GRAINGER, INC.
Inventor: Geoffry A. Westphal
-
Publication number: 20130238329
Abstract: Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated.
Type: Application
Filed: March 8, 2012
Publication date: September 12, 2013
Applicant: Nuance Communications, Inc.
Inventor: Mariana Casella dos Santos
-
Publication number: 20130231930
Abstract: A computer implemented method and apparatus for automatically filtering an audio input to make a filtered recording comprising: identifying words used in an audio input, determining whether each identified word is contained in a dictionary of banned words, and creating a filtered recording as an audio output, wherein each word identified in the audio input that is found in the dictionary of banned words, is automatically deleted or replaced in the audio output used to make the filtered recording.
Type: Application
Filed: March 1, 2012
Publication date: September 5, 2013
Applicant: Adobe Systems Inc.
Inventor: Antonio Sanso
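A compact sketch of the filtering step, assuming the recognizer supplies each identified word with start/end sample offsets into the audio input; muting with zeros stands in for "deleted or replaced":

```python
def filter_recording(samples, recognized_words, banned, beep=0):
    # recognized_words: iterable of (word, start_sample, end_sample)
    out = list(samples)
    for word, start, end in recognized_words:
        if word.lower() in banned:
            out[start:end] = [beep] * (end - start)   # mute the banned span
    return out   # audio output used to make the filtered recording

# Example: mute samples 2..5, which the recognizer attributed to "darn".
filtered = filter_recording([1, 2, 3, 4, 5, 6], [("darn", 2, 5)], {"darn"})
```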
-
Publication number: 20130211833
Abstract: Techniques for overlaying a custom interface onto an existing kiosk interface are provided. An event is detected that triggers a kiosk to process an agent that overlays, and without modifying, the kiosk's existing interface. The agent alters screen features and visual presentation of the existing interface and provides additional alternative operations for navigating and executing features defined in the existing interface. In an embodiment, the agent provides a custom interface overlaid onto the existing interface to provide a customer-facing interface for individuals that are sight impaired.
Type: Application
Filed: February 9, 2012
Publication date: August 15, 2013
Applicant: NCR Corporation
Inventors: Thomas V. Edwards, Daniel Francis Matteo
-
Publication number: 20130197908
Abstract: Systems and methods for speech processing in telecommunication networks are described. In some embodiments, a method may include receiving speech transmitted over a network, causing the speech to be converted to text, and identifying the speech as predetermined speech in response to the text matching a stored text associated with the predetermined speech. The stored text may have been obtained, for example, by subjecting the predetermined speech to a network impairment condition. The method may further include identifying terms within the text that match terms within the stored text (e.g., despite not being identical to each other), calculating a score between the text and the stored text, and determining that the text matches the stored text in response to the score meeting a threshold value. In some cases, the method may also identify one of a plurality of speeches based on a selected one of a plurality of stored texts.
Type: Application
Filed: February 16, 2012
Publication date: August 1, 2013
Applicant: TEKTRONIX, INC.
Inventors: Jihao Zhong, Sylvain Plante, Chunchun Jonina Chan, Jiping Xie
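A minimal sketch of the matching logic: count terms shared between the transcribed text and each stored (impairment-derived) reference, score the overlap, and accept a match above a threshold. The Jaccard score is an illustrative choice, not the scoring method claimed:

```python
def match_speech(text, stored_texts, threshold=0.6):
    # stored_texts: {speech_id: reference text captured under impairment}
    terms = set(text.lower().split())
    best = None
    for speech_id, stored in stored_texts.items():
        stored_terms = set(stored.lower().split())
        union = terms | stored_terms
        score = len(terms & stored_terms) / len(union) if union else 0.0
        if score >= threshold and (best is None or score > best[1]):
            best = (speech_id, score)
    return best   # (speech id, score), or None when nothing meets the threshold
```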
-
Publication number: 20130197903
Abstract: An exemplary recording method receives the personal information of a speaker transmitted from a RFID tag through a RFID reader. Then the method receives the voice of the speaker through a microphone. The method next receives the personal information of the speaker and the identifier of the audio input device transmitted from the audio input device, and associates the personal information of the speaker with the received identifier of the audio input device. Then, the method receives the voice and the identifier of the audio input device transmitted from the audio input device. The method further converts the received voice to text. The method determines the personal information corresponding to the identifier of the audio input device received with the voice, and associates the converted text with the determined personal information to generate a record.
Type: Application
Filed: August 2, 2012
Publication date: August 1, 2013
Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (ShenZhen) CO., LTD.
Inventor: WEN-HUI ZHANG
-
Publication number: 20130177143
Abstract: Methods, systems, computer readable media, and apparatuses for voice data transcription are provided. Packets that include voice data may be received and for each time that a threshold amount of voice data is received, a segment of voice data may be created and transcribed to text. A message that includes the transcribed text may then be transmitted to an intended recipient of the voice data, to allow the intended recipient to view the message prior to transcription of the remaining segments of the voice data. Additionally, subsequent to transmission of the message, another message may be transmitted that includes text transcribed from a different segment of the voice data.
Type: Application
Filed: January 9, 2012
Publication date: July 11, 2013
Applicant: COMCAST CABLE COMMUNICATIONS, LLC
Inventors: Tong Zhou, Matthew Sirawsky
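A small sketch of the threshold-driven segmentation, with transcribe and send standing in for the speech-to-text engine and the message delivery step; the byte threshold is an arbitrary illustrative value:

```python
def stream_transcripts(voice_packets, transcribe, send, threshold=32000):
    buffer = bytearray()
    for packet in voice_packets:               # packets carrying voice data
        buffer.extend(packet)
        if len(buffer) >= threshold:           # threshold reached: cut a segment
            send(transcribe(bytes(buffer)))    # recipient sees this part right away
            buffer.clear()
    if buffer:                                 # flush the final partial segment
        send(transcribe(bytes(buffer)))
```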
-
Publication number: 20130173265
Abstract: Speech-to-text software, sometimes known as dictation software, is software that lets you talk to the computer in some form and have the computer react appropriately to what you are saying. This is different from text-to-speech software, which is software that can read out text already in the computer. Speech-to-online-text software allows you to speak words into the webpage of an Internet-capable device. Speech-to-online-text software will also support the capabilities provided by speech-to-text software. The hardware required to support this technology is an Internet-capable device and a compatible microphone. This capability will be especially useful for communicating in different languages and dialects around the world.
Type: Application
Filed: January 3, 2012
Publication date: July 4, 2013
Inventors: Chiaka Chukwuma Okoroh, Nathaniel A. Moore
-
Publication number: 20130166285
Abstract: This specification describes technologies relating to multi core processing for parallel speech-to-text processing. In some implementations, a computer-implemented method is provided that includes the actions of receiving an audio file; analyzing the audio file to identify portions of the audio file as corresponding to one or more audio types; generating a time-ordered classification of the identified portions, the time-ordered classification indicating the one or more audio types and position within the audio file of each portion; generating a queue using the time-ordered classification, the queue including a plurality of jobs where each job includes one or more identifiers of a portion of the audio file classified as belonging to the one or more speech types; distributing the jobs in the queue to a plurality of processors; performing speech-to-text processing on each portion to generate a corresponding text file; and merging the corresponding text files to generate a transcription file.
Type: Application
Filed: December 10, 2008
Publication date: June 27, 2013
Applicant: Adobe Systems Incorporated
Inventors: Walter Chang, Michael J. Welch
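A minimal sketch of the queue-distribute-merge pipeline using a process pool; the classifier output is assumed to be a time-ordered list of typed portions, and stt() is a stand-in for the per-portion recognizer:

```python
from multiprocessing import Pool

def stt(job):
    position, portion_id = job
    return position, "transcript of %s" % portion_id   # stand-in recognizer call

def transcribe_parallel(classified_portions, workers=4):
    # classified_portions: time-ordered [(audio_type, portion_id), ...]
    jobs = [(i, pid) for i, (kind, pid) in enumerate(classified_portions)
            if kind == "speech"]                       # queue only speech portions
    with Pool(workers) as pool:
        results = pool.map(stt, jobs)                  # distribute across processors
    # Merge per-portion texts back into one transcription in time order.
    return "\n".join(text for _, text in sorted(results))

if __name__ == "__main__":
    print(transcribe_parallel([("speech", "a"), ("music", "b"), ("speech", "c")]))
```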
-
Publication number: 20130166292
Abstract: A system for accessing content maintains a set of content selections associated with a first user. The system receives first original content from a first content source associated with a first one of the content selections associated with the first user. The system applies, to the first original content, a first rule (such as a parsing rule) that is specific to the first one of the content selections, to produce first derived content. The system changes the state of at least one component of a human-machine dialogue system (such as a text-to-act engine, a dialogue manager, or an act-to-text engine) based on the first derived content. The system may apply a second rule (such as a dialogue rule) to the first derived content to produce rule output and change the state of the human-machine dialogue system based on the rule output.
Type: Application
Filed: December 23, 2011
Publication date: June 27, 2013
Inventors: James D. Van Sciver, Christopher Bader, Michael Anthony Aliotti, David Carl Bong
-
Publication number: 20130158991
Abstract: Methods and systems are provided for communicating information from an aircraft to a computer system at a ground location. One exemplary method involves obtaining an audio input from an audio input device onboard the aircraft, generating text data comprising a textual representation of the one or more words of the audio input, and communicating the text data to the computer system at the ground location.
Type: Application
Filed: December 20, 2011
Publication date: June 20, 2013
Applicant: HONEYWELL INTERNATIONAL INC.
Inventors: Xian Qin Dong, Ben Dong, Yunsong Gao
-
Publication number: 20130158992
Abstract: An exemplary speech processing method includes extracting voice features from the stored audio files. Next, the method extracts speech(s) of a speaker from one or more audio files that contains voice feature matching one selected voice model, to form a single audio file, implements a speech-to-text algorithm to create a textual file based on the single audio file, and further records time point(s). The method then associates each of the words in the converted text with corresponding recorded time points recorded. Next, the method searches for an input keyword in the converted textual file. The method further obtains a time point associated with a word first appearing in the textual file that matches the keyword, and further controls an audio play device to play the single audio file at the determined time point.
Type: Application
Filed: December 30, 2011
Publication date: June 20, 2013
Applicants: HON HAI PRECISION INDUSTRY CO., LTD., FU TAI HUA INDUSTRY (SHENZHEN) CO., LTD.
Inventor: XI LIN
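A compact sketch of the word/time-point association and keyword seek; player.seek is a hypothetical playback control on the audio play device, and the transcript is assumed to arrive as (word, seconds) pairs:

```python
def index_words(words_with_times):
    # words_with_times: [(word, seconds), ...] in transcript order
    index = {}
    for word, t in words_with_times:
        index.setdefault(word.lower(), t)   # keep only the first appearance
    return index

def play_from_keyword(keyword, index, player):
    t = index.get(keyword.lower())
    if t is not None:
        player.seek(t)                      # start playback at the matched word
    return t
```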
-
Publication number: 20130144610
Abstract: An automated technique is disclosed for processing audio data and generating one or more actions in response thereto. In particular embodiments, the audio data can be obtained during a phone conversation and post-call actions can be provided to the user with contextually relevant entry points for completion by an associated application. Audio transcription services available on a remote server can be leveraged. The entry points can be generated based on keyword recognition in the transcription and passed to the application in the form of parameters.
Type: Application
Filed: December 5, 2011
Publication date: June 6, 2013
Applicant: Microsoft Corporation
Inventors: Clif Gordon, Kerry D. Woolsey
-
Publication number: 20130144619
Abstract: Techniques for ability enhancement are described. Some embodiments provide an ability enhancement facilitator system ("AEFS") configured to enhance voice conferencing among multiple speakers. In one embodiment, the AEFS receives data that represents utterances of multiple speakers who are engaging in a voice conference with one another. The AEFS then determines speaker-related information, such as by identifying a current speaker, locating an information item (e.g., an email message, document) associated with the speaker, or the like. The AEFS then informs a user of the speaker-related information, such as by presenting the speaker-related information on a display of a conferencing device associated with the user.
Type: Application
Filed: January 23, 2012
Publication date: June 6, 2013
Inventors: Richard T. Lord, Robert W. Lord, Nathan P. Myhrvold, Clarence T. Tegreene, Roderick A. Hyde, Lowell L. Wood, JR., Muriel Y. Ishikawa, Victoria Y.H. Wood, Charles Whitmer, Paramvir Bahl, Douglas C. Burger, Ranveer Chandra, William H. Gates, III, Paul Holman, Jordin T. Kare, Craig J. Mundie, Tim Paek, Desney S. Tan, Lin Zhong, Matthew G. Dyor