Text Analysis, Generation Of Parameters For Speech Synthesis Out Of Text, E.g., Grapheme To Phoneme Translation, Prosody Generation, Stress, Or Intonation Determination, Etc. (epo) Patents (Class 704/E13.011)
-
Patent number: 12136286
Abstract: Keypoints are extracted from images of documents. Existing keypoint extraction mechanisms use different approaches, so the number of keypoints extracted and the related parameters vary. Disclosed herein is a method and system for keypoint extraction from images of one or more documents. In this method, a reference image and a test image of a document are collected as input. During the keypoint extraction, a plurality of words are extracted based on the types of characters present in the document images. Further, all connected components in each of the extracted words are identified. Further, it is decided whether keypoints are to be searched for in the first component or in the last component of the identified connected components, and the method accordingly searches for and extracts at least four keypoints from the test image and the corresponding four keypoints from the reference image.
Type: Grant
Filed: September 6, 2020
Date of Patent: November 5, 2024
Assignee: Tata Consultancy Services Limited
Inventors: Kushagra Mahajan, Monika Sharma, Lovekesh Vig
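The abstract above hinges on identifying connected components within word images and picking the first or last one. As an illustration only, not the patented method, here is a minimal pure-Python sketch of 4-connected component labeling on a binarized word image (a grid of 0/1 pixels), with components ordered left to right so a "first" and "last" component can be chosen:

```python
from collections import deque

def connected_components(grid):
    """Label 4-connected components of foreground (1) pixels in a binary grid.

    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill
                    cr, cc = queue.popleft()
                    comp.append((cr, cc))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = cr + dr, cc + dc
                        if 0 <= nr < rows and 0 <= nc < cols \
                                and grid[nr][nc] == 1 and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                components.append(comp)
    return components

def first_and_last_components(grid):
    """Order components left-to-right by their leftmost column."""
    comps = sorted(connected_components(grid),
                   key=lambda comp: min(c for _, c in comp))
    return comps[0], comps[-1]
```

A production system would label components with OpenCV or similar on real document scans; this sketch only shows the component-ordering idea.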
-
Patent number: 12062358
Abstract: Systems and methods are disclosed herein for detecting dubbed speech in a media asset and receiving metadata corresponding to the media asset. The systems and methods may determine a plurality of scenes in the media asset based on the metadata, retrieve a portion of the dubbed speech corresponding to a first scene, and process the retrieved portion of the dubbed speech to identify a speech characteristic of a character featured in the first scene. Further, the systems and methods may determine whether the speech characteristic of the character featured in the first scene matches the context of the first scene and, if the match fails, perform a function to adjust the portion of the dubbed speech so that the speech characteristic matches the context of the first scene.
Type: Grant
Filed: April 19, 2023
Date of Patent: August 13, 2024
Assignee: Rovi Guides, Inc.
Inventors: Mario Sanchez, Ashleigh Miller, Paul T. Stathacopoulos
-
Patent number: 12051412
Abstract: A control device includes at least one memory, and at least one processor configured to detect a voice segment from sound data, the sound data being detected while a controlled object operates, and to stop the controlled object based on the following conditions: a speaking speed is a predetermined speed threshold or greater, the speaking speed being calculated based on a portion of the sound data in the voice segment; and a length of the voice segment is a predetermined length threshold or less.
Type: Grant
Filed: August 20, 2021
Date of Patent: July 30, 2024
Assignee: Preferred Networks, Inc.
Inventors: Kenta Yonekura, Hirochika Asai, Kota Nabeshima, Manabu Nagao
-
Patent number: 12050839
Abstract: Systems and methods are provided for generating a soundmoji for output. A content item is generated for output at a computing device, and a first input associated with the selection of a soundmoji menu is received. One or more soundmojis are generated for output, and a second input associated with the selection of a first soundmoji of the one or more soundmojis is received. A first timestamp of the content item associated with the selection of the first soundmoji is identified. An indication of a second timestamp of the content item and a second soundmoji is received, and a user interface element associated with the content item is updated to indicate the second soundmoji when the content item is being generated for output at the second timestamp.
Type: Grant
Filed: September 9, 2022
Date of Patent: July 30, 2024
Assignee: Rovi Guides, Inc.
Inventor: Serhad Doken
-
Patent number: 12032911
Abstract: A system and method for training and using a text embedding model may include creating structured phrases from an input text; creating turn input samples from the input text, each turn input sample based only on input from a single turn within the text and formed by removing structure from the structured phrases; and training an embedding model using the structured phrases and turn input samples. Call input samples may be created based on input from more than one turn within the text. At each level of resolution (e.g., phrase, speaker, call), a different level of resolution may be used to create input samples. At inference, an embedding may be based on a weighted combination of the sub-terms within an input phrase, each weight being based on an inverse document frequency measure for the sub-term associated with the weight.
Type: Grant
Filed: January 8, 2021
Date of Patent: July 9, 2024
Assignee: Nice Ltd.
Inventor: Stephen Lauber
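The inference step described last, a weighted combination of sub-term embeddings with inverse-document-frequency weights, can be sketched as follows. This is an illustrative reconstruction, not Nice Ltd.'s implementation; the function names are hypothetical, and the smoothed-IDF formula is one common choice:

```python
import math

def idf_weights(documents):
    """Compute a smoothed IDF weight for every term across documents."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for term in set(doc.split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    # Smoothed IDF: rarer terms get larger weights.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1
            for t, df in doc_freq.items()}

def phrase_embedding(phrase, term_vectors, idf):
    """Weighted average of sub-term vectors, each weight being the term's IDF."""
    dim = len(next(iter(term_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term in phrase.split():
        if term in term_vectors:
            w = idf.get(term, 1.0)
            weight_sum += w
            for i, v in enumerate(term_vectors[term]):
                total[i] += w * v
    return [v / weight_sum for v in total] if weight_sum else total
```

With one-hot toy vectors, a rarer sub-term contributes more to the combined embedding than a common one, which is the intent of IDF weighting.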
-
Patent number: 11947752
Abstract: Aspects of the present invention facilitate the customization of binary application user interfaces without modification of the underlying source code. Embodiments described herein can provide for hands-free operation or accessibility services, such as touchless operation of compiled applications, with minimal intervention. Described embodiments can automatically interface with the touch-based operating system to generate hands-free commands that, when detected (e.g., voice detection), can cause corresponding touch-based commands to be executed. Embodiments can utilize audio inputs, by way of example, to facilitate hands-free interaction with the touch-based operating system and applications executing thereon.
Type: Grant
Filed: August 3, 2022
Date of Patent: April 2, 2024
Assignee: RealWear, Inc.
Inventor: Christopher Iain Parkinson
-
Patent number: 11941323
Abstract: A meme creation method and apparatus are provided, and relate to the terminal field, to enrich forms and content of memes, and improve user experience. The method includes: displaying a first interface, where the first interface includes a speech input button; receiving, in response to an operation of triggering the speech input button by a user, a speech input by the user; recognizing the speech in a preset manner, where recognition in the preset manner includes at least content recognition, and if the speech includes a target keyword, recommending a first image meme set to the user; obtaining, in response to an operation of selecting one image meme from the first image meme set by the user, a target meme based on the image meme selected by the user and the speech or semantics corresponding to the speech; and sending the target meme.
Type: Grant
Filed: June 9, 2022
Date of Patent: March 26, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Meng Wang, Zhuo Wang, Fan Fan, Lelin Wang
-
Patent number: 11922923
Abstract: A system and method for emotion-enhanced natural speech using dilated convolutional neural networks, wherein an audio processing server receives a raw audio waveform from a dilated convolutional artificial neural network, associates text-based emotion content markers with portions of the raw audio waveform to produce an emotion-enhanced audio waveform, and provides the emotion-enhanced audio waveform to the dilated convolutional artificial neural network for use as a new input data set.
Type: Grant
Filed: April 30, 2020
Date of Patent: March 5, 2024
Assignee: VONAGE BUSINESS LIMITED
Inventors: Alan McCord, Ashley Unitt, Brian Galvin
-
Patent number: 11907324
Abstract: Systems and methods are disclosed herein for generating and modifying a workflow comprising a series of webpages based on an online document. A document management system accesses an online document selected by a user and classifies each field of the online document into one of a set of categories. For each category, the system generates a form webpage comprising questions corresponding to each field classified as the category and combines the generated webpages to create a workflow. The system may modify the workflow by generating and adding one or more additional form webpages based on one or more answers provided by an entity completing the form webpage. In response to the entity completing the modified generated workflow, the system generates a completed document based on the online document and the answers provided by the entity.
Type: Grant
Filed: April 29, 2022
Date of Patent: February 20, 2024
Assignee: DocuSign, Inc.
Inventors: Gustavo Both Bitdinger, Mangesh Prabhakar Bhandarkar, Nipun Dureja, Vasudevan Sampath, Robert Sherwin, Duane Robert Wald, Mark Spencer Seabourne, Claire Marie Small, David Minoru Hirotsu, Dia A. Abulzahab, Li Xu, Brent Weston Robinett, Jerome Levadoux, Ellis David Berner, Jun Gao, Andrew James Ashlock, Jacob Scott Mitchell
-
Patent number: 11886771
Abstract: A customizable communication system and method of use are described for providing dialect and language options for users to employ during interactions between the user and a third-party application, thereby enhancing user experience. In some embodiments, the system allows a user to select a plurality of dialect and language preferences while interacting with a third-party application offering voice command technology. The selected dialect and language preference is used during the interaction between the user and the third-party application.
Type: Grant
Filed: November 25, 2020
Date of Patent: January 30, 2024
Inventors: Joseph Byers, Corey Blevins, Michael Orr
-
Patent number: 11881205
Abstract: The present disclosure relates to a speech synthesis method and device, and a computer-readable storage medium, and relates to the field of computer technology. The method of the present disclosure includes: dividing a text into a plurality of segments according to a language category to which each of the segments belongs; converting each of the segments into a phoneme corresponding to the segment to generate a phoneme sequence of the text according to the language category to which each of the segments belongs; inputting the phoneme sequence into a speech synthesis model trained in advance and converting the phoneme sequence into a vocoder characteristic parameter; and inputting the vocoder characteristic parameter into a vocoder to generate a speech.
Type: Grant
Filed: March 30, 2020
Date of Patent: January 23, 2024
Assignees: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY CO., LTD., BEIJING JINGDONG CENTURY TRADING CO., LTD.
Inventors: Zhizheng Wu, Zhengchen Zhang, Wei Song, Yonghui Rao, Zhihang Xie, Guanghui Xu, Shuyong Liu, Bosen Ma, Shuangwen Qiu, Junmin Lin
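The first step of the method above, dividing a text into segments by language category, can be approximated by grouping characters by Unicode script. The sketch below is a rough illustration under the assumption that only CJK ideographs and Latin-like text appear; a real system needs full script and language identification:

```python
def script_of(ch):
    """Very rough script classifier: 'cjk' for CJK Unified Ideographs, else 'latin'."""
    return 'cjk' if '\u4e00' <= ch <= '\u9fff' else 'latin'

def split_by_script(text):
    """Divide text into maximal runs of characters sharing the same script tag."""
    segments = []
    for ch in text:
        tag = script_of(ch)
        if segments and segments[-1][0] == tag:
            # Extend the current run.
            segments[-1] = (tag, segments[-1][1] + ch)
        else:
            # Start a new run when the script changes.
            segments.append((tag, ch))
    return segments
```

Each resulting segment could then be routed to the grapheme-to-phoneme converter for its language category before the phoneme sequences are concatenated.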
-
Patent number: 11837216
Abstract: A method for training a generative adversarial network (GAN)-based text-to-speech (TTS) model and a speech recognition model in unison includes obtaining a plurality of training text utterances. At each of a plurality of output steps for each training text utterance, the method also includes generating, for output by the GAN-based TTS model, a synthetic speech representation of the corresponding training text utterance, and determining, using an adversarial discriminator of the GAN, an adversarial loss term indicative of an amount of acoustic noise disparity in one of the non-synthetic speech representations selected from the set of spoken training utterances relative to the corresponding synthetic speech representation of the corresponding training text utterance. The method also includes updating parameters of the GAN-based TTS model based on the adversarial loss term determined at each of the plurality of output steps for each training text utterance of the plurality of training text utterances.
Type: Grant
Filed: February 14, 2023
Date of Patent: December 5, 2023
Assignee: Google LLC
Inventors: Zhehuai Chen, Andrew M. Rosenberg, Bhuvana Ramabhadran, Pedro J. Moreno Mengibar
-
Patent number: 11837214
Abstract: Various embodiments of the present disclosure evaluate transcription accuracy. In some implementations, the system normalizes a first transcription of an audio file and a baseline transcription of the audio file. The baseline transcription can be used as an accurate transcription of the audio file. The system can further determine an error rate of the first transcription by aligning each portion of the first transcription with the portion of the baseline transcription, and assigning a label to each portion based on a comparison of the portion of the first transcription with the portion of the baseline transcription.
Type: Grant
Filed: October 29, 2020
Date of Patent: December 5, 2023
Assignee: United Services Automobile Association (USAA)
Inventors: Michael J. Szentes, Carlos Chavez, Robert E. Lewis, Nicholas S. Walker
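Aligning a transcription against a baseline and deriving an error rate is conventionally done with an edit-distance alignment. A standard word-error-rate sketch, illustrative rather than USAA's actual implementation:

```python
def word_error_rate(hypothesis, reference):
    """Word error rate: (substitutions + insertions + deletions) / reference
    length, computed with a standard edit-distance dynamic program."""
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref) if ref else 0.0
```

Backtracking through the same table would recover the per-word labels (correct, substituted, inserted, deleted) that the abstract describes assigning to each portion.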
-
Patent number: 11823657
Abstract: An example embodiment may involve receiving, from a client device, a selection of text-based articles from newsfeeds. The selection may specify that the text-based articles have been flagged for audible playout. The example embodiment may also involve, possibly in response to receiving the selection of the text-based articles, retrieving text-based articles from the newsfeeds. The example embodiment may also involve causing the text-based articles to be converted into audio files. The example embodiment may also involve receiving a request to stream the audio files to the client device or another device. The example embodiment may also involve causing the audio files to be streamed to the client device or the other device.
Type: Grant
Filed: December 30, 2022
Date of Patent: November 21, 2023
Assignee: Gracenote Digital Ventures, LLC
Inventor: Venkatarama Anilkumar Panguluri
-
Patent number: 11647130
Abstract: An information processing system is configured such that a plurality of speakers are coordinated with one information processing apparatus, and is capable of coping with a state of competition between instructions to the speakers. An information processing system includes an image forming apparatus, a first voice control device that receives a voice instruction to the image forming apparatus, and a second voice control device that receives a voice instruction to the image forming apparatus. The image forming apparatus is controlled to execute a job based on information indicating which of the first voice control device and the second voice control device is to be preferentially used.
Type: Grant
Filed: June 1, 2022
Date of Patent: May 9, 2023
Assignee: CANON KABUSHIKI KAISHA
Inventors: Hiroshi Sekine, Masato Fukuda
-
Patent number: 11636430
Abstract: A computerized system for summarizing agreements between two or more parties comprises one or more processors. The processors may be configured to capture data relating to the agreement, such as agent screen data during an interaction with a customer. The data may be captured in successive capture operations, each in response to an event such as an agent key press or data entry. The captured data may be used to prepare a continuous text summarizing the agreement. An audio summary of the agreement may be derived from the text and played to at least one of the parties.
Type: Grant
Filed: October 8, 2020
Date of Patent: April 25, 2023
Assignee: NICE LTD.
Inventors: David Geffen, Eshay Livne, Omer Abramovich, Eyal Eshel
-
Patent number: 11600252
Abstract: A performance analysis method according to the present invention includes generating information related to a performance tendency of a user, from observed performance information relating to a performance of a musical piece by the user and inferred performance information that occurs when the musical piece is performed based on a specific tendency.
Type: Grant
Filed: January 24, 2020
Date of Patent: March 7, 2023
Assignee: YAMAHA CORPORATION
Inventor: Akira Maezawa
-
Patent number: 11551723
Abstract: In one aspect, an example method includes (i) receiving a first group of video content items; (ii) identifying, from among the first group of video content items, a second group of video content items having a threshold extent of similarity with each other; (iii) determining a quality score for each video content item of the second group; (iv) identifying, from among the second group of video content items, a third group of video content items each having a quality score that exceeds a quality score threshold; and (v) based on the identifying of the third group, transmitting at least a portion of at least one video content item of the identified third group to a digital video-effect (DVE) system, wherein the system is configured for using the at least the portion of the at least one video content item of the identified third group to generate a video content item.
Type: Grant
Filed: May 12, 2021
Date of Patent: January 10, 2023
Assignee: Gracenote, Inc.
Inventors: Dale T. Roberts, Michael Gubman
-
Patent number: 11450313
Abstract: Systems and methods of determining phonetic relationships are provided. For instance, data indicative of an input text phrase input by a user can be received. An audio output corresponding to a spoken rendering of the input text phrase can be determined. A text transcription of the audio output of the input text phrase can be determined. The text transcription can be a textual representation of the audio output. The text transcription can be compared against a plurality of test phrases to identify a match between the text transcription and at least one test phrase.
Type: Grant
Filed: April 9, 2020
Date of Patent: September 20, 2022
Assignee: GOOGLE LLC
Inventors: Nikhil Chandru Rao, Saisuresh Krishnakumaran
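The final step, comparing a text transcription of synthesized audio against a set of test phrases, typically normalizes both sides before comparison. A minimal sketch; the exact matching logic in the patent may differ:

```python
import re

def normalize(text):
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r'\s+', ' ', re.sub(r'[^a-z0-9\s]', '', text.lower())).strip()

def match_test_phrase(transcription, test_phrases):
    """Return the first test phrase whose normalized form equals the
    normalized transcription, or None if nothing matches."""
    norm = normalize(transcription)
    for phrase in test_phrases:
        if normalize(phrase) == norm:
            return phrase
    return None
```

Exact matching after normalization is the simplest criterion; a fuzzier system could substitute an edit-distance threshold in place of equality.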
-
Patent number: 8886538
Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
Type: Grant
Filed: September 26, 2003
Date of Patent: November 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
-
Publication number: 20140074478
Abstract: A speech replication system including a speech generation unit having a program running in a memory of the speech generation unit, the program executing the steps of receiving an audio stream, identifying words within the audio stream, analyzing each word to determine the audio characteristics of the speaker's voice, storing the audio characteristics of the speaker's voice in the memory, receiving text information, converting the text information into an output audio stream using the audio characteristics of the speaker stored in the memory, and playing the output audio stream.
Type: Application
Filed: September 7, 2012
Publication date: March 13, 2014
Applicant: ISPEECH CORP.
Inventors: Heath Ahrens, Florencio Isaac Martin, Tyler A.R. Auten
-
Publication number: 20140067395
Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform the text of a response into speech that is to be played to the audience. The conversational advertising system follows an advertisement script to guide the audience in a conversation.
Type: Application
Filed: August 28, 2012
Publication date: March 6, 2014
Applicant: Nuance Communications, Inc.
Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
-
Publication number: 20140067397
Abstract: Techniques disclosed herein include systems and methods that improve audible emotional characteristics used when synthesizing speech from a text source. Systems and methods herein use emoticons identified from a source text to provide contextual text-to-speech expressivity. In general, techniques herein analyze text and identify emoticons included within the text. The source text is then tagged with corresponding mood indicators. For example, if the system identifies an emoticon at the end of a sentence, then the system can infer that this sentence has a specific tone or mood associated with it. Depending on whether the emoticon is a smiley face, angry face, sad face, laughing face, etc., the system can infer the tone or mood from the various emoticons and then change or modify the expressivity of the TTS output, such as by changing intonation, prosody, speed, pauses, and other expressivity characteristics.
Type: Application
Filed: August 29, 2012
Publication date: March 6, 2014
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Carey Radebaugh
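Inferring mood from a trailing emoticon can be sketched as a lookup that tries longer emoticons first, so that `>:(` is not mistaken for `:(`. The emoticon-to-mood table below is hypothetical and far smaller than a production one:

```python
# Hypothetical emoticon-to-mood table; a real system would be much richer.
MOODS = {':)': 'happy', ':(': 'sad', '>:(': 'angry', ':D': 'laughing'}

def mood_of(sentence):
    """Return (text_without_emoticon, mood) for a single sentence, using a
    trailing emoticon if present; 'neutral' otherwise."""
    s = sentence.rstrip()
    # Try longer emoticons first so '>:(' wins over its suffix ':('.
    for emoticon, label in sorted(MOODS.items(), key=lambda kv: -len(kv[0])):
        if s.endswith(emoticon):
            return s[: -len(emoticon)].rstrip(), label
    return s, 'neutral'
```

The returned mood label would then drive expressivity changes (intonation, prosody, speed) in the TTS front end.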
-
Publication number: 20140025381
Abstract: Instead of relying on humans to subjectively evaluate speech intelligibility of a subject, a system objectively evaluates the speech intelligibility. The system receives speech input and calculates confidence scores at multiple different levels using a Template Constrained Generalized Posterior Probability algorithm. One or multiple intelligibility classifiers are utilized to classify the desired entities on an intelligibility scale. A specific intelligibility classifier utilizes features such as the various confidence scores. The scale of the intelligibility classification can be adjusted to suit the application scenario. Based on the confidence score distributions and the intelligibility classification results at multiple levels, an overall objective intelligibility score is calculated. The objective intelligibility scores can be used to rank different subjects or systems being assessed according to their intelligibility levels. The speech that is below a predetermined intelligibility (e.g.
Type: Application
Filed: July 20, 2012
Publication date: January 23, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Linfang Wang, Yan Teng, Lijuan Wang, Frank Kao-Ping Soong, Zhe Geng, William Brad Waller, Mark Tillman Hanson
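Combining per-word recognition confidence scores into an overall score and a coarse intelligibility class can be sketched as below. The simple averaging and the threshold values are illustrative assumptions, not the Template Constrained Generalized Posterior Probability algorithm the abstract names:

```python
def intelligibility_score(confidences, thresholds=(0.4, 0.7)):
    """Map a list of per-word confidence scores in [0, 1] to an overall score
    and a coarse class on a 3-point scale. Thresholds are illustrative."""
    if not confidences:
        return 0.0, 'unintelligible'
    overall = sum(confidences) / len(confidences)
    low, high = thresholds
    if overall >= high:
        label = 'intelligible'
    elif overall >= low:
        label = 'partially intelligible'
    else:
        label = 'unintelligible'
    return overall, label
```

The number of classes (here three) can be widened or narrowed to suit the application scenario, as the abstract suggests.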
-
Publication number: 20140025366
Abstract: TXTVOICETRANS can pronounce the written word in the same language or in another language. TXTVOICETRANS is a Machine Translation computer system that can translate the source text into another language and, at the same time, pronounce the translated text, word by word, preserving fully the accent and the stress of the spoken word and the intonation of a sequence of words. The pronunciation is based on whole words. The computer system can pronounce the most used synonym of the word or the concept the translated word belongs to, instead of the translated word, displayed with the translation.
Type: Application
Filed: July 20, 2012
Publication date: January 23, 2014
Inventors: Hristo Tzanev Georgiev, Maria Theresia Georgiev(-Good)
-
Publication number: 20140019135
Abstract: A method of speech synthesis including receiving a text input sent by a sender, processing the text input responsive to at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender, and communicating the synthesized speech to a recipient user of the system.
Type: Application
Filed: July 16, 2012
Publication date: January 16, 2014
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Xufang Zhao, Ron M. Hecht
-
Publication number: 20130238338
Abstract: A method and apparatus are described for improved approaches to uttering the spelling of words and phrases over a communication session. The method includes determining a character to produce a first audio signal representing a phonetic utterance of the character, determining a code word that starts with a code word character identical to the character, and generating a second audio signal representing an utterance of the code word, wherein the first audio signal and the second audio signal are provided over a communication session for detection of the character.
Type: Application
Filed: March 6, 2012
Publication date: September 12, 2013
Applicant: Verizon Patent and Licensing, Inc.
Inventors: Manish G. Kharod, Bhaskar R. Gudlavenkatasiva, Nityanand Sharma, Sutap Chatterjee, Ganesh Bhathivi
-
Publication number: 20130166915
Abstract: A method for secure text-to-speech conversion of text using speech or voice synthesis that prevents the originator's voice from being used or distributed inappropriately or in an unauthorized manner is described. Security controls authenticate the sender of the message, and optionally the recipient, and ensure that the message is read in the originator's voice, not the voice of another person. Such controls permit an originator's voiceprint file to be publicly accessible, but limit its use for voice synthesis to text-based content created by the sender, or sent to a trusted recipient. In this way a person can be assured that their voice cannot be used for content they did not write.
Type: Application
Filed: December 22, 2011
Publication date: June 27, 2013
Applicant: RESEARCH IN MOTION LIMITED
Inventors: Simon Peter DESAI, Neil Patrick ADAMS
-
Publication number: 20130144624
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
Type: Application
Filed: December 1, 2011
Publication date: June 6, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. CONKIE, Mark Charles Beutnagel, Taniya Mishra
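The caching scheme described, indexing synthesized audio by a unique identifier so identical text is never re-synthesized, can be sketched with a hash-keyed store. `TTSCache` and its normalization are illustrative; a real deployment would also bound the cache size and plug in an actual TTS back end:

```python
import hashlib

class TTSCache:
    """Cache synthesized audio, keyed by a hash of the normalized phrase, so
    repeated phrases skip re-synthesis. `synthesize` stands in for a real
    TTS back end that returns audio bytes."""

    def __init__(self, synthesize):
        self._synthesize = synthesize
        self._store = {}
        self.hits = 0  # number of requests served from cache

    @staticmethod
    def key(phrase):
        """Unique identifier: SHA-256 of the normalized phrase."""
        return hashlib.sha256(phrase.strip().lower().encode()).hexdigest()

    def get_audio(self, phrase):
        k = self.key(phrase)
        if k in self._store:
            self.hits += 1
        else:
            self._store[k] = self._synthesize(phrase)
        return self._store[k]
```

Normalizing before hashing means trivially different requests ("Hello" vs. "hello ") share one cache entry, which is the latency win the abstract describes.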
-
Publication number: 20130132087
Abstract: Methods, systems, and apparatus are generally described for providing an audio interface.
Type: Application
Filed: November 21, 2011
Publication date: May 23, 2013
Applicant: EMPIRE TECHNOLOGY DEVELOPMENT LLC
Inventors: Noriaki Kuwahara, Tsutomu Miyasato, Yasuyuki Sumi
-
Publication number: 20130102295
Abstract: A mobile voice platform for providing a user speech interface to computer-based services includes a mobile device having a processor, communication circuitry that provides access to the computer-based services, an operating system, and one or more applications that are run using the operating system and that utilize one or more of the computer-based services via the communication circuitry.
Type: Application
Filed: September 27, 2012
Publication date: April 25, 2013
Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventor: GM Global Technology Operations LLC
-
Publication number: 20130085758
Abstract: A telecare and/or telehealth communication method is described. The method comprises providing predetermined voice messages configured to ask questions to or to give instructions to an assisted individual, providing an algorithm configured to communicate with the assisted individual, and communicating at least one of the predetermined voice messages configured to ask questions to or to give instructions to the assisted individual. The method further comprises analyzing a responsiveness and/or compliance characteristics of the assisted individual, and providing the assisted individual with voice messages in a form most acceptable and effective for the assisted individual on the basis of the analyzed responsiveness and/or the analyzed compliance characteristics.
Type: Application
Filed: September 28, 2012
Publication date: April 4, 2013
Applicant: GENERAL ELECTRIC COMPANY
Inventors: Csenge CSOMA, Akos ERDOS, Alan DAVIES
-
Publication number: 20130080175
Abstract: According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit, and a presentation unit. The acquisition unit acquires feature amounts for respective tags, each of the tags being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for respective character strings, a variance of the feature amounts of the tags which are assigned to the character string in a markup text. The detection unit detects a first character string, assigned a first tag having a variance not less than a first threshold value, as a first candidate including the tag to be corrected. The presentation unit presents the first candidate.
Type: Application
Filed: September 24, 2012
Publication date: March 28, 2013
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Kouichirou Mori, Masahiro Morita
-
Publication number: 20130073288
Abstract: An email system for mobile devices, such as cellular phones and PDAs, is disclosed which allows email messages to be played back on the mobile device as voice messages on demand by way of a media player, thus eliminating the need for a unified messaging system. Email messages are received by the mobile device in a known manner. In accordance with an important aspect of the invention, the email messages are identified by the mobile device as they are received. After the message is identified, the mobile device sends the email message in text format to a server for conversion to speech or voice format. After the message is converted to speech format, the server sends the messages back to the user's mobile device and notifies the user of the email message and then plays the message back to the user through a media player upon demand.
Type: Application
Filed: November 15, 2012
Publication date: March 21, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventor: Nuance Communications, Inc.
-
Publication number: 20130073287
Abstract: A method, computer program product, and system for voice pronunciation for text communication is described. A selected portion of a text communication is determined. A prompt to record a pronunciation relating to the selected portion of the text communication is provided at a first computing device. The recorded pronunciation is associated with the selected portion of the text communication. A visual indicator, relating to the selected portion of the text communication and the recorded pronunciation, is displayed.
Type: Application
Filed: September 20, 2011
Publication date: March 21, 2013
Applicant: International Business Machines Corporation
Inventors: Kristina Beckley, Vincent Burckhardt, Alexis Yao Pang Song, Smriti Talwar
-
Publication number: 20130066632
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for modifying the prosody of synthesized speech based on an associated speech act. A system configured according to the method embodiment (1) receives text, (2) performs an analysis of the text to determine and assign a speech act label to the text, and (3) converts the text to speech, where the speech prosody is based on the speech act label. The analysis performed compares the text to a corpus of previously tagged utterances to find a close match, determines a confidence score from a correlation of the text and the close match, and, if the confidence score is above a threshold value, retrieves the speech act label of the close match and assigns it to the text.
Type: Application
Filed: September 14, 2011
Publication date: March 14, 2013
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar, Ann K. Syrdal
-
Patent number: 8391544
Abstract: An image processing apparatus includes: a storage module configured to store a plurality of pieces of comment data; an analyzing module configured to analyze an expression of a person contained in image data; a generating module configured to select a target comment data from among the comment data stored in the storage module based on the expression of the person analyzed by the analyzing module, and to generate voice data using the target comment data; and an output module configured to output reproduction data to be used for displaying the image data together with the voice data generated by the generating module.
Type: Grant
Filed: June 1, 2010
Date of Patent: March 5, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kousuke Imoji, Yuki Kaneko, Junichi Takahashi
-
Publication number: 20130046541Abstract: An apparatus for assisting visually impaired persons includes a headset. A camera is mounted on the headset. A microprocessor communicates with the camera for receiving an optically read code captured by the camera and converting the optically read code to an audio signal as a function of a trigger contained within the optical code. A speaker communicating with the processor outputs the audio signal.Type: ApplicationFiled: August 3, 2012Publication date: February 21, 2013Inventors: Ronald L. Klein, James A. Kutsch, JR.
-
Publication number: 20130041668Abstract: A voice learning apparatus includes a learning-material voice storage unit that stores learning material voice data including example sentence voice data; a learning text storage unit that stores a learning material text including an example sentence text; a learning-material text display controller that displays the learning material text; a learning-material voice output controller that performs voice output based on the learning material voice data; an example sentence specifying unit that specifies the example sentence text during the voice output; an example-sentence voice output controller that performs voice output based on the example sentence voice data associated with the specified example sentence text; and a learning-material voice output restart unit that restarts the voice output from a position where the voice output is stopped last time, after the voice output is performed based on the example sentence voice data.Type: ApplicationFiled: August 7, 2012Publication date: February 14, 2013Applicant: Casio Computer Co., LtdInventor: Daisuke NAKAJIMA
-
Publication number: 20130041669Abstract: A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score.Type: ApplicationFiled: October 17, 2012Publication date: February 14, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: International Business Machines Corporation
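The core idea of altering speech parameters proportionally to a confidence score can be sketched as follows. This is a minimal sketch under assumed conventions: the parameter names (`rate`, `volume`), the scaling direction, and the `strength` factor are illustrative choices, not taken from the patent.

```python
def modify_segment(params, confidence, strength=0.3):
    """Scale prosody parameters in proportion to (1 - confidence),
    so low-confidence segments sound slower and quieter.

    params: dict with baseline 'rate' and 'volume' (1.0 = unchanged).
    confidence: score in [0, 1], where 1.0 means fully confident.
    """
    uncertainty = max(0.0, min(1.0, 1.0 - confidence))
    factor = 1.0 - strength * uncertainty
    return {
        "rate": params["rate"] * factor,
        "volume": params["volume"] * factor,
    }

baseline = {"rate": 1.0, "volume": 1.0}
print(modify_segment(baseline, confidence=1.0))  # unchanged
print(modify_segment(baseline, confidence=0.0))  # maximally attenuated
```

The proportional scaling means the listener gets a graded audible cue rather than a binary confident/unsure distinction.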
-
Publication number: 20130030810Abstract: The present invention provides a frugal method for extracting speech data and associated transcriptions from a plurality of web resources (the internet) for speech corpus creation, characterized by automation of the speech corpus creation and by cost reduction. An existing speech corpus is integrated with the speech data and transcriptions extracted from the web resources to build an aggregated, rich speech corpus that is effective and easy to adapt for generating acoustic and language models for Automatic Speech Recognition (ASR) systems.Type: ApplicationFiled: June 26, 2012Publication date: January 31, 2013Applicant: Tata Consultancy Services LimitedInventors: Sunil Kumar Kopparapu, Imran Ahmed Sheikh
-
Publication number: 20130018658Abstract: A prompt generation engine operates to dynamically extend prompts of a multimodal application. The prompt generation engine receives a media file having a metadata container. The prompt generation engine operates on a multimodal device that supports a voice mode and a non-voice mode for interacting with the multimodal device. The prompt generation engine retrieves from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application. The prompt generation engine modifies the multimodal application to include the speech prompt.Type: ApplicationFiled: September 12, 2012Publication date: January 17, 2013Applicant: International Business Machines CorporationInventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, JR.
-
Publication number: 20130006620Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.Type: ApplicationFiled: September 11, 2012Publication date: January 3, 2013Applicant: Nuance Communications, Inc.Inventors: Stephane H. Maes, Ponani Gopalakrishnan
-
Publication number: 20120330667Abstract: Included in a speech synthesizer, a natural language processing unit divides text data, input from a text input unit, into a plurality of components (particularly, words). An importance prediction unit estimates an importance level of each component according to how much the component contributes to a listener's understanding of the synthesized speech. The speech synthesizer then determines a processing load based on the device state at synthesis time and the importance level. Included in the speech synthesizer, a synthesizing control unit and a wave generation unit reduce the processing time for a phoneme with a low importance level by curtailing its processing load (relatively degrading its sound quality), allocate part of the processing time made available by this reduction to phonemes with a high importance level, and generate synthesized speech in which important words are easily audible.Type: ApplicationFiled: June 20, 2012Publication date: December 27, 2012Inventors: Qinghua Sun, Kenji Nagamatsu, Yusuke Fujita
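The budget-reallocation idea in this abstract can be sketched as dividing a fixed processing budget among words in proportion to their importance. A minimal sketch under an assumed weighting scheme (simple proportional allocation), not the patented load-control mechanism:

```python
def allocate_budget(words, total_budget):
    """Split a fixed processing budget among words in proportion to
    importance: low-importance words get less synthesis effort, and
    the effort saved goes to high-importance words.

    words: list of (word, importance) pairs with importance > 0.
    Returns a dict of per-word budgets summing to total_budget.
    """
    total_importance = sum(imp for _, imp in words)
    return {w: total_budget * imp / total_importance for w, imp in words}

budgets = allocate_budget([("the", 0.1), ("deadline", 0.9)], total_budget=100.0)
print(budgets)  # "deadline" receives most of the budget
```

In the patent's terms, a word's budget would translate into per-phoneme processing load, with high-budget phonemes synthesized at higher quality.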
-
Publication number: 20120330665Abstract: A system is configured to read a prescription label and output audio information corresponding to prescription information present on or linked to the prescription label. The system may have knowledge about prescription labels and prescription information, and use this knowledge to present the audio information in a structured form to the user.Type: ApplicationFiled: June 4, 2012Publication date: December 27, 2012Applicant: Labels That Talk, LTDInventor: Kenneth Berkun
-
Publication number: 20120330666Abstract: A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real time vocal alerts thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.Type: ApplicationFiled: June 6, 2012Publication date: December 27, 2012Inventors: Anthony Verna, Luis M. Ortiz
-
Publication number: 20120330668Abstract: A customized live tile application module can be configured in association with the mobile communication device in order to automatically vocalize the real-time information preselected by a user in a multitude of languages. A text-to-speech application module can be integrated with the customized live tile application module to automatically vocalize the preselected real-time information. The real-time information can be obtained from a tile and/or a website integrated with a remote server and announced after a text to speech conversion process without opening the tile, if the tiles are selected for announcement of information by the device. Such an approach automatically and instantly pushes a vocal alert with respect to the user-selected real-time information on the mobile communication device thereby permitting the user to continue multitasking. Information from tiles can also be rendered on second screens from a mobile device.Type: ApplicationFiled: August 15, 2012Publication date: December 27, 2012Inventors: Anthony Verna, Luis M. Ortiz
-
Publication number: 20120323578Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal, and starts or stops outputting the text-to-speech item based on a request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about the program currently selected by the user's operation of the remote controller with the corresponding item of information about the program selected just before that operation. If an item of the currently selected program information is the same as the corresponding item of the previously selected program information, and text-to-speech processing has already been completed for the item since its last change, the sound control section (114) stops outputting the item to the sound generation section (115).Type: ApplicationFiled: February 23, 2011Publication date: December 20, 2012Applicant: PANASONIC CORPORATIONInventor: Koumei Kubota
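The comparison logic described above can be sketched as a per-item diff between the current and previous program information, suppressing items that are both unchanged and already announced. An illustrative sketch: the field names (`title`, `channel`) and data shapes are assumptions, not from the patent.

```python
def items_to_speak(current, previous, already_spoken):
    """Return the program-information item names that should be sent
    to text-to-speech: items that changed since the previous program
    selection, or that were never fully spoken.

    current/previous: dicts mapping item name -> item text.
    already_spoken: set of item names fully spoken since their last change.
    """
    out = []
    for name, text in current.items():
        if previous.get(name) == text and name in already_spoken:
            continue  # same content, already announced: stay silent
        out.append(name)
    return out

prev = {"title": "Evening News", "channel": "7"}
curr = {"title": "Evening News", "channel": "9"}
print(items_to_speak(curr, prev, already_spoken={"title", "channel"}))
# → ['channel']  (the title is unchanged and was already spoken)
```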
-
Publication number: 20120316881Abstract: A normalized spectrum storage unit 204 prestores normalized spectra calculated based on a random number series. A voiced sound generating unit 201 generates voiced sound waveforms based on a plurality of segments of voiced sounds corresponding to an inputted text and the normalized spectra stored in the normalized spectrum storage unit 204. An unvoiced sound generating unit 202 generates unvoiced sound waveforms based on a plurality of segments of unvoiced sounds corresponding to the inputted text. A synthesized speech generating unit 203 generates a synthesized speech based on the voiced sound waveforms generated by the voiced sound generating unit 201 and the unvoiced sound waveforms generated by the unvoiced sound generating unit 202.Type: ApplicationFiled: March 23, 2011Publication date: December 13, 2012Applicant: NEC CORPORATIONInventor: Masanori Kato
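The voiced/unvoiced split in this abstract can be illustrated with a toy source model: periodic (sinusoidal) samples for voiced segments, random noise for unvoiced ones, concatenated into one output. This is a heavily simplified sketch of the general voiced/unvoiced decomposition only; it does not implement NEC's normalized-spectrum method, and all frequencies, amplitudes, and segment lengths are made up for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sample rate (Hz)

def voiced_waveform(freq, n):
    """Periodic waveform standing in for a voiced sound segment."""
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]

def unvoiced_waveform(n, seed=0):
    """Low-amplitude noise standing in for an unvoiced sound segment."""
    rng = random.Random(seed)
    return [rng.uniform(-0.3, 0.3) for _ in range(n)]

def synthesize(segments):
    """Concatenate voiced ('v', freq, n) and unvoiced ('u', n) segments
    into one synthesized waveform."""
    out = []
    for seg in segments:
        if seg[0] == "v":
            out += voiced_waveform(seg[1], seg[2])
        else:
            out += unvoiced_waveform(seg[1])
    return out

speech = synthesize([("v", 120, 400), ("u", 200), ("v", 180, 400)])
print(len(speech))  # → 1000 samples
```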
-
Publication number: 20120310643Abstract: Techniques for presenting data input as a plurality of data chunks including a first data chunk and a second data chunk. The techniques include converting the plurality of data chunks to a textual representation comprising a plurality of text chunks, including a first text chunk corresponding to the first data chunk and a second text chunk corresponding to the second data chunk, and providing a presentation of at least part of the textual representation such that the first text chunk is presented differently than the second text chunk, assisting a user in proofing the textual representation.Type: ApplicationFiled: May 23, 2012Publication date: December 6, 2012Applicant: Nuance Communications, Inc.Inventors: Martin Labsky, Jan Kleindienst, Tomas Macek, David Nahamoo, Jan Curin, Lars Koenig, Holger Quast
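The differential presentation of adjacent chunks can be sketched with a trivial styling scheme. A minimal sketch: alternating upper and lower case here stands in for whatever visual styling (color, font, highlight) the actual system uses, so a user proofing dictated input can see where one recognized chunk ends and the next begins.

```python
def present_chunks(chunks):
    """Render consecutive text chunks with alternating styling
    (uppercase vs. lowercase as a stand-in for visual formatting)."""
    styled = []
    for i, chunk in enumerate(chunks):
        styled.append(chunk.upper() if i % 2 == 0 else chunk.lower())
    return " ".join(styled)

print(present_chunks(["call John", "at five", "tomorrow"]))
# → "CALL JOHN at five TOMORROW"
```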