Text Analysis, Generation of Parameters for Speech Synthesis out of Text, e.g., Grapheme-to-Phoneme Translation, Prosody Generation, Stress, or Intonation Determination, etc. (EPO) Patents (Class 704/E13.011)
  • Patent number: 8886538
    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
    Type: Grant
    Filed: September 26, 2003
    Date of Patent: November 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
  • Publication number: 20140074478
    Abstract: A speech replication system including a speech generation unit having a program running in a memory of the speech generation unit, the program executing the steps of receiving an audio stream, identifying words within the audio stream, analyzing each word to determine the audio characteristics of the speaker's voice, storing the audio characteristics of the speaker's voice in the memory, receiving text information, converting the text information into an output audio stream using the audio characteristics of the speaker stored in the memory, and playing the output audio stream.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: ISPEECH CORP.
    Inventors: Heath Ahrens, Florencio Isaac Martin, Tyler A.R. Auten
  • Publication number: 20140067395
    Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform the text of a response into speech that is played to the audience. The conversational advertising system follows an advertisement script to guide the audience through a conversation.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
  • Publication number: 20140067397
    Abstract: Techniques disclosed herein include systems and methods that improve audible emotional characteristics used when synthesizing speech from a text source. Systems and methods herein use emoticons identified from a source text to provide contextual text-to-speech expressivity. In general, techniques herein analyze text and identify emoticons included within the text. The source text is then tagged with corresponding mood indicators. For example, if the system identifies an emoticon at the end of a sentence, then the system can infer that this sentence has a specific tone or mood associated with it. Depending on whether the emoticon is a smiley face, angry face, sad face, laughing face, etc., the system can infer use or mood from the various emoticons and then change or modify the expressivity of the TTS output such as by changing intonation, prosody, speed, pauses, and other expressivity characteristics.
    Type: Application
    Filed: August 29, 2012
    Publication date: March 6, 2014
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Carey Radebaugh
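The emoticon-to-mood inference described in publication 20140067397 can be illustrated with a minimal sketch. The mood names, emoticon table, and return format below are invented for the example; the publication itself does not specify them.

```python
# Hypothetical sketch of emoticon-based mood tagging for TTS expressivity:
# scan a sentence for a trailing emoticon and attach an inferred mood tag
# that a downstream synthesizer could use to adjust intonation or prosody.

EMOTICON_MOODS = {
    ":)": "happy",
    ":(": "sad",
    ">:(": "angry",
    ":D": "laughing",
}

def tag_sentence_mood(sentence: str) -> tuple[str, str]:
    """Return (text without the emoticon, inferred mood) for one sentence."""
    stripped = sentence.rstrip()
    # Check longer emoticons first so ">:(" matches before ":(".
    for emoticon in sorted(EMOTICON_MOODS, key=len, reverse=True):
        if stripped.endswith(emoticon):
            text = stripped[: -len(emoticon)].rstrip()
            return text, EMOTICON_MOODS[emoticon]
    return stripped, "neutral"

print(tag_sentence_mood("We won the game! :D"))
# -> ('We won the game!', 'laughing')
```

A real system would then map each mood tag to concrete expressivity changes (intonation, speed, pauses) as the abstract describes.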
  • Publication number: 20140025366
    Abstract: TXTVOICETRANS can pronounce the written word in the same language or in another language. TXTVOICETRANS is a machine translation computer system that can translate the source text into another language and, at the same time, pronounce the translated text word by word, fully preserving the accent and stress of each spoken word and the intonation of a sequence of words. The pronunciation is based on whole words. The computer system can pronounce the most-used synonym of the word, or the concept the translated word belongs to, instead of the translated word displayed with the translation.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Inventors: Hristo Tzanev Georgiev, Maria Theresia Georgiev(-Good)
  • Publication number: 20140025381
    Abstract: Instead of relying on humans to subjectively evaluate speech intelligibility of a subject, a system objectively evaluates the speech intelligibility. The system receives speech input and calculates confidence scores at multiple different levels using a Template Constrained Generalized Posterior Probability algorithm. One or multiple intelligibility classifiers are utilized to classify the desired entities on an intelligibility scale. A specific intelligibility classifier utilizes features such as the various confidence scores. The scale of the intelligibility classification can be adjusted to suit the application scenario. Based on the confidence score distributions and the intelligibility classification results at multiple levels an overall objective intelligibility score is calculated. The objective intelligibility scores can be used to rank different subjects or systems being assessed according to their intelligibility levels. The speech that is below a predetermined intelligibility (e.g.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Linfang Wang, Yan Teng, Lijuan Wang, Frank Kao-Ping Soong, Zhe Geng, William Brad Waller, Mark Tillman Hanson
  • Publication number: 20140019135
    Abstract: A method of speech synthesis including receiving a text input sent by a sender, processing the text input responsive to at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender, and communicating the synthesized speech to a recipient user of the system.
    Type: Application
    Filed: July 16, 2012
    Publication date: January 16, 2014
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Xufang Zhao, Ron M. Hecht
  • Publication number: 20130238338
    Abstract: A method and apparatus providing improved approaches for uttering the spelling of words and phrases over a communication session are described. The method includes determining a character to produce a first audio signal representing a phonetic utterance of the character, determining a code word that starts with a character identical to that character, and generating a second audio signal representing an utterance of the code word, wherein the first audio signal and the second audio signal are provided over a communication session for detection of the character.
    Type: Application
    Filed: March 6, 2012
    Publication date: September 12, 2013
    Applicant: Verizon Patent and Licensing, Inc.
    Inventors: Manish G. Kharod, Bhaskar R. Gudlavenkatasiva, Nityanand Sharma, Sutap Chatterjee, Ganesh Bhathivi
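The character-plus-code-word scheme in publication 20130238338 is the familiar "A as in Alpha" pattern. A minimal sketch follows; a real system would synthesize two audio signals per character, while this example only builds the prompt text. The NATO alphabet is one common code-word choice, not one mandated by the publication.

```python
# Illustrative spelling helper: for each character, pair the character
# with a code word that starts with it, as in "C as in Charlie".

NATO = {
    "a": "Alpha", "b": "Bravo", "c": "Charlie", "d": "Delta", "e": "Echo",
    "f": "Foxtrot", "g": "Golf", "h": "Hotel", "i": "India", "j": "Juliett",
    "k": "Kilo", "l": "Lima", "m": "Mike", "n": "November", "o": "Oscar",
    "p": "Papa", "q": "Quebec", "r": "Romeo", "s": "Sierra", "t": "Tango",
    "u": "Uniform", "v": "Victor", "w": "Whiskey", "x": "X-ray",
    "y": "Yankee", "z": "Zulu",
}

def spell_with_code_words(word: str) -> list[str]:
    """Produce one 'C as in Charlie' prompt per letter of `word`."""
    return [f"{ch.upper()} as in {NATO[ch.lower()]}"
            for ch in word if ch.lower() in NATO]

print(spell_with_code_words("cab"))
# -> ['C as in Charlie', 'A as in Alpha', 'B as in Bravo']
```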
  • Publication number: 20130166915
    Abstract: A method for secure text-to-speech conversion of text using speech or voice synthesis that prevents the originator's voice from being used or distributed inappropriately or in an unauthorized manner is described. Security controls authenticate the sender of the message, and optionally the recipient, and ensure that the message is read in the originator's voice, not the voice of another person. Such controls permit an originator's voiceprint file to be publicly accessible, but limit its use for voice synthesis to text-based content created by the sender, or sent to a trusted recipient. In this way a person can be assured that their voice cannot be used for content they did not write.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Applicant: RESEARCH IN MOTION LIMITED
    Inventors: Simon Peter DESAI, Neil Patrick ADAMS
  • Publication number: 20130144624
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
    Type: Application
    Filed: December 1, 2011
    Publication date: June 6, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. CONKIE, Mark Charles Beutnagel, Taniya Mishra
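The caching scheme in publication 20130144624 (audio indexed by a unique identifier, re-synthesis skipped for identical text) can be sketched as follows. The hash choice and the `synthesize` callable are stand-ins for this example, not details from the publication.

```python
# Minimal sketch of a TTS audio cache: synthesized audio is stored under
# a unique identifier derived from the text, so identical phrases are
# served from the cache instead of being re-synthesized.

import hashlib

class TTSCache:
    def __init__(self, synthesize):
        self._synthesize = synthesize   # callable: text -> audio bytes
        self._cache = {}                # identifier -> audio bytes
        self.misses = 0                 # count of actual synthesis calls

    def get_audio(self, phrase: str) -> bytes:
        key = hashlib.sha256(phrase.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._synthesize(phrase)
        return self._cache[key]

# Fake synthesizer standing in for a TTS server call.
tts = TTSCache(lambda text: text.upper().encode())
tts.get_audio("hello world")
tts.get_audio("hello world")   # second call is served from the cache
print(tts.misses)  # -> 1
```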
  • Publication number: 20130132087
    Abstract: Methods, systems, and apparatus are generally described for providing an audio interface.
    Type: Application
    Filed: November 21, 2011
    Publication date: May 23, 2013
    Applicant: EMPIRE TECHNOLOGY DEVELOPMENT LLC
    Inventors: Noriaki Kuwahara, Tsutomu Miyasato, Yasuyuki Sumi
  • Publication number: 20130102295
    Abstract: A mobile voice platform for providing a user speech interface to computer-based services includes a mobile device having a processor, communication circuitry that provides access to the computer-based services, an operating system, and one or more applications that are run using the operating system and that utilize one or more of the computer-based services via the communication circuitry.
    Type: Application
    Filed: September 27, 2012
    Publication date: April 25, 2013
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventor: GM Global Technology Operations LLC
  • Publication number: 20130085758
    Abstract: A telecare and/or telehealth communication method is described. The method comprises providing predetermined voice messages configured to ask questions of, or give instructions to, an assisted individual, providing an algorithm configured to communicate with the assisted individual, and communicating at least one of the predetermined voice messages to the assisted individual. The method further comprises analyzing the responsiveness and/or compliance characteristics of the assisted individual, and providing the assisted individual with voice messages in the form most acceptable and effective for that individual on the basis of the analyzed responsiveness and/or compliance characteristics.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 4, 2013
    Applicant: GENERAL ELECTRIC COMPANY
    Inventors: Csenge CSOMA, Akos ERDOS, Alan DAVIES
  • Publication number: 20130080175
    Abstract: According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit, and a presentation unit. The acquisition unit acquires a feature amount for each tag, each tag being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for each character string, the variance of the feature amounts of the tags assigned to that character string in the markup text. The detection unit detects a first character string, assigned a first tag whose variance is not less than a first threshold value, as a first candidate containing a tag to be corrected. The presentation unit presents the first candidate.
    Type: Application
    Filed: September 24, 2012
    Publication date: March 28, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kouichirou Mori, Masahiro Morita
  • Publication number: 20130073287
    Abstract: A method, computer program product, and system for voice pronunciation for text communication is described. A selected portion of a text communication is determined. A prompt to record a pronunciation relating to the selected portion of the text communication is provided at a first computing device. The recorded pronunciation is associated with the selected portion of the text communication. A visual indicator, relating to the selected portion of the text communication and the recorded pronunciation, is displayed.
    Type: Application
    Filed: September 20, 2011
    Publication date: March 21, 2013
    Applicant: International Business Machines Corporation
    Inventors: Kristina Beckley, Vincent Burckhardt, Alexis Yao Pang Song, Smriti Talwar
  • Publication number: 20130073288
    Abstract: An email system for mobile devices, such as cellular phones and PDAs, is disclosed which allows email messages to be played back on the mobile device as voice messages on demand by way of a media player, thus eliminating the need for a unified messaging system. Email messages are received by the mobile device in a known manner. In accordance with an important aspect of the invention, the email messages are identified by the mobile device as they are received. After the message is identified, the mobile device sends the email message in text format to a server for conversion to speech or voice format. After the message is converted to speech format, the server sends the messages back to the user's mobile device and notifies the user of the email message and then plays the message back to the user through a media player upon demand.
    Type: Application
    Filed: November 15, 2012
    Publication date: March 21, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Nuance Communications, Inc.
  • Publication number: 20130066632
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for modifying the prosody of synthesized speech based on an associated speech act. A system configured according to the method embodiment (1) receives text, (2) performs an analysis of the text to determine and assign a speech act label to the text, and (3) converts the text to speech, where the speech prosody is based on the speech act label. The analysis performed compares the text to a corpus of previously tagged utterances to find a close match, determines a confidence score from a correlation of the text and the close match, and, if the confidence score is above a threshold value, retrieving the speech act label of the close match and assigning it to the text.
    Type: Application
    Filed: September 14, 2011
    Publication date: March 14, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar, Ann K. Syrdal
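The matching step in publication 20130066632 (find the closest tagged utterance, assign its speech-act label only if a confidence score clears a threshold) can be sketched as below. `difflib` string similarity stands in for whatever correlation measure the actual system uses, and the corpus, labels, and threshold are invented for the example.

```python
# Hedged sketch of speech-act assignment by nearest match with a
# confidence threshold: below the threshold, fall back to "neutral".

import difflib

TAGGED_CORPUS = [
    ("what time is it", "question"),
    ("please close the door", "request"),
    ("that is wonderful news", "exclamation"),
]

def assign_speech_act(text: str, threshold: float = 0.6) -> str:
    best_label, best_score = "neutral", 0.0
    for utterance, label in TAGGED_CORPUS:
        score = difflib.SequenceMatcher(None, text.lower(), utterance).ratio()
        if score > best_score:
            best_label, best_score = label, score
    # Only trust the close match if its confidence clears the threshold.
    return best_label if best_score >= threshold else "neutral"

print(assign_speech_act("What time is it"))  # -> question
print(assign_speech_act("zzzz"))             # -> neutral
```

The assigned label would then drive prosody selection during synthesis, as the abstract describes.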
  • Patent number: 8391544
    Abstract: An image processing apparatus includes: a storage module configured to store a plurality of pieces of comment data; an analyzing module configured to analyze an expression of a person contained in image data; a generating module configured to select a target comment data from among the comment data stored in the storage module based on the expression of the person analyzed by the analyzing module, and to generate voice data using the target comment data; and an output module configured to output reproduction data to be used for displaying the image data together with the voice data generated by the generating module.
    Type: Grant
    Filed: June 1, 2010
    Date of Patent: March 5, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kousuke Imoji, Yuki Kaneko, Junichi Takahashi
  • Publication number: 20130046541
    Abstract: An apparatus for assisting visually impaired persons includes a headset. A camera is mounted on the headset. A microprocessor communicates with the camera for receiving an optically read code captured by the camera and converting the optically read code to an audio signal as a function of a trigger contained within the optical code. A speaker communicating with the processor outputs the audio signal.
    Type: Application
    Filed: August 3, 2012
    Publication date: February 21, 2013
    Inventors: Ronald L. Klein, James A. Kutsch, JR.
  • Publication number: 20130041669
    Abstract: A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score.
    Type: Application
    Filed: October 17, 2012
    Publication date: February 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
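Publication 20130041669's core idea, altering synthesis parameters in proportion to a confidence score, can be shown with a toy mapping. The specific parameters and ranges below are invented for illustration; the publication only says parameters are altered proportionally to the score.

```python
# Illustrative confidence-to-parameter mapping: low-confidence segments
# are rendered quieter, slower, and lower-pitched so the uncertainty is
# audible to the listener.

def confidence_to_params(confidence: float) -> dict:
    """Map a 0..1 confidence score to hypothetical TTS parameters."""
    c = max(0.0, min(1.0, confidence))          # clamp to [0, 1]
    return {
        "volume": 0.5 + 0.5 * c,       # quieter when unsure
        "rate": 0.8 + 0.2 * c,         # slightly slower when unsure
        "pitch_shift": -2.0 * (1 - c)  # drop pitch for doubtful segments
    }

print(confidence_to_params(0.0)["volume"])  # -> 0.5
```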
  • Publication number: 20130041668
    Abstract: A voice learning apparatus includes a learning-material voice storage unit that stores learning material voice data including example sentence voice data; a learning text storage unit that stores a learning material text including an example sentence text; a learning-material text display controller that displays the learning material text; a learning-material voice output controller that performs voice output based on the learning material voice data; an example sentence specifying unit that specifies the example sentence text during the voice output; an example-sentence voice output controller that performs voice output based on the example sentence voice data associated with the specified example sentence text; and a learning-material voice output restart unit that restarts the voice output from a position where the voice output is stopped last time, after the voice output is performed based on the example sentence voice data.
    Type: Application
    Filed: August 7, 2012
    Publication date: February 14, 2013
    Applicant: Casio Computer Co., Ltd
    Inventor: Daisuke NAKAJIMA
  • Publication number: 20130030810
    Abstract: The present invention provides a frugal method for extracting speech data and associated transcriptions from a plurality of web resources (the internet) for speech corpus creation, characterized by automation of the corpus creation and by cost reduction. Existing speech corpora are integrated with speech data and transcriptions extracted from the web resources to build an aggregated, rich speech corpus that is effective and easy to adapt for generating acoustic and language models for automatic speech recognition (ASR) systems.
    Type: Application
    Filed: June 26, 2012
    Publication date: January 31, 2013
    Applicant: Tata Consultancy Services Limited
    Inventors: Sunil Kumar Kopparapu, Imran Ahmed Sheikh
  • Publication number: 20130018658
    Abstract: A prompt generation engine operates to dynamically extend prompts of a multimodal application. The prompt generation engine receives a media file having a metadata container. The prompt generation engine operates on a multimodal device that supports a voice mode and a non-voice mode for interacting with the multimodal device. The prompt generation engine retrieves from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application. The prompt generation engine modifies the multimodal application to include the speech prompt.
    Type: Application
    Filed: September 12, 2012
    Publication date: January 17, 2013
    Applicant: International Business Machines Corporation
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, JR.
  • Publication number: 20130006620
    Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Stephane H. Maes, Ponani Gopalakrishnan
  • Publication number: 20120330667
    Abstract: Included in a speech synthesizer, a natural language processing unit divides text data, input from a text input unit, into a plurality of components (particularly, words). An importance prediction unit estimates an importance level for each component according to how much that component contributes to understanding when a listener hears the synthesized speech. The speech synthesizer then determines a processing load based on the device state at synthesis time and the importance level. Included in the speech synthesizer, a synthesizing control unit and a wave generation unit reduce the processing time for a phoneme with a low importance level by curtailing its processing load (relatively degrading its sound quality), allocate part of the processing time made available by this reduction to phonemes with high importance levels, and generate synthesized speech in which important words are easily audible.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 27, 2012
    Inventors: Qinghua Sun, Kenji Nagamatsu, Yusuke Fujita
  • Publication number: 20120330665
    Abstract: A system is configured to read a prescription label and output audio information corresponding to prescription information present on or linked to the prescription label. The system may have knowledge about prescription labels and prescription information, and use this knowledge to present the audio information in a structured form to the user.
    Type: Application
    Filed: June 4, 2012
    Publication date: December 27, 2012
    Applicant: Labels That Talk, LTD
    Inventor: Kenneth Berkun
  • Publication number: 20120330666
    Abstract: A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real time vocal alerts thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.
    Type: Application
    Filed: June 6, 2012
    Publication date: December 27, 2012
    Inventors: Anthony Verna, Luis M. Ortiz
  • Publication number: 20120330668
    Abstract: A customized live tile application module can be configured in association with the mobile communication device in order to automatically vocalize the real-time information preselected by a user in a multitude of languages. A text-to-speech application module can be integrated with the customized live tile application module to automatically vocalize the preselected real-time information. The real-time information can be obtained from a tile and/or a website integrated with a remote server and announced after a text to speech conversion process without opening the tile, if the tiles are selected for announcement of information by the device. Such an approach automatically and instantly pushes a vocal alert with respect to the user-selected real-time information on the mobile communication device thereby permitting the user to continue multitasking. Information from tiles can also be rendered on second screens from a mobile device.
    Type: Application
    Filed: August 15, 2012
    Publication date: December 27, 2012
    Inventors: Anthony Verna, Luis M. Ortiz
  • Publication number: 20120323578
    Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal; and starts or stops outputting the text-to-speech item, based on request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about a program currently selected by user's operation of the remote controller, with each item of information about the previous program selected just before the user's operation. If an item of the currently selected program information is the same as the corresponding item of the operation-prior program information, and text-to-speech processing has been already completed for the item after the last change in the item, the sound control section (114) stops outputting the item to the sound generation section (115).
    Type: Application
    Filed: February 23, 2011
    Publication date: December 20, 2012
    Applicant: PANASONIC CORPORATION
    Inventor: Koumei Kubota
  • Publication number: 20120316881
    Abstract: A normalized spectrum storage unit 204 prestores normalized spectra calculated based on a random number series. A voiced sound generating unit 201 generates voiced sound waveforms based on a plurality of segments of voiced sounds corresponding to an inputted text and the normalized spectra stored in the normalized spectrum storage unit 204. An unvoiced sound generating unit 202 generates unvoiced sound waveforms based on a plurality of segments of unvoiced sounds corresponding to the inputted text. A synthesized speech generating unit 203 generates a synthesized speech based on the voiced sound waveforms generated by the voiced sound generating unit 201 and the unvoiced sound waveforms generated by the unvoiced sound generating unit 202.
    Type: Application
    Filed: March 23, 2011
    Publication date: December 13, 2012
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Publication number: 20120310642
    Abstract: Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether or not the mapping is created automatically or manually. A mapping may be used for bookmark switching where a bookmark established in one version of a digital work is used to identify a corresponding location with another version of the digital work. Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context (e.g., text).
    Type: Application
    Filed: October 6, 2011
    Publication date: December 6, 2012
    Applicant: APPLE INC.
    Inventors: Xiang Cao, Alan C. Cannistraro, Gregory S. Robbin, Casey M. Dougherty
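The bookmark-switching use of the audio-to-text mapping in publication 20120310642 amounts to a lookup over anchor points. The anchor table and lookup policy below (nearest preceding anchor) are invented for the example; the publication covers richer uses such as highlighting and annotations.

```python
# Sketch of bookmark switching: given anchors pairing an audio timestamp
# with a text offset, find the e-book position corresponding to a
# bookmark set in the audio-book version.

import bisect

# (audio position in seconds, character offset in the e-book text)
ANCHORS = [(0.0, 0), (12.5, 220), (31.0, 575), (58.4, 1040)]

def audio_to_text_offset(seconds: float) -> int:
    """Return the text offset of the nearest preceding anchor."""
    times = [t for t, _ in ANCHORS]
    i = bisect.bisect_right(times, seconds) - 1
    return ANCHORS[max(i, 0)][1]

print(audio_to_text_offset(40.0))  # -> 575
```

Denser anchor tables (e.g. one per word) would make the switch correspondingly more precise.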
  • Publication number: 20120310643
    Abstract: Techniques for presenting data input as a plurality of data chunks including a first data chunk and a second data chunk. The techniques include converting the plurality of data chunks to a textual representation comprising a plurality of text chunks including a first text chunk corresponding to the first data chunk and a second text chunk corresponding to the second data chunk, respectively, and providing a presentation of at least part of the textual representation such that the first text chunk is presented differently than the second text chunk to, when presented, assist a user in proofing the textual representation.
    Type: Application
    Filed: May 23, 2012
    Publication date: December 6, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Jan Kleindienst, Tomas Macek, David Nahamoo, Jan Curin, Lars Koenig, Holger Quast
  • Publication number: 20120303371
    Abstract: Techniques for disambiguating at least one text segment from at least one acoustically similar word and/or phrase. The techniques include identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 29, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Jan Kleindienst, Tomas Macek, David Nahamoo, Jan Curin, William F. Ganong, III
  • Publication number: 20120290304
    Abstract: A book support and optical scanner assembly for converting printed text to an audio output includes a support for holding an open book and a pair of optical scanners adapted to scan opposite pages. The assembly also includes means for moving the scanners from the top to the bottom of a page. Further, both scanners can be rotated off of the book for turning a page. In addition, the assembly includes a text-to-audio converter for converting the scanned text into spoken words and, in one embodiment, a translator to translate the scanned text into a pre-selected language.
    Type: Application
    Filed: May 9, 2011
    Publication date: November 15, 2012
    Inventor: Khaled Jafar Al-Hasan
  • Publication number: 20120284028
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises detecting a text portion of a media stream including a video stream, the text portion not being consumable by a blind person, retrieving text associated with the text portion of the media stream, and converting the text to a first audio stream based on a first type of a first program in the media stream, and converting the text to a second audio stream based on a second type of a second program in the media stream.
    Type: Application
    Filed: July 19, 2012
    Publication date: November 8, 2012
    Inventors: Hisao M. Chang, Horst Schroeter
  • Publication number: 20120278081
    Abstract: A text-to-speech method for use in a plurality of languages, including: inputting text in a selected language; dividing the inputted text into a sequence of acoustic units; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein the model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio in the selected language. A parameter of a predetermined type of each probability distribution in the selected language is expressed as a weighted sum of language independent parameters of the same type. The weighting used is language dependent, such that converting the sequence of acoustic units to a sequence of speech vectors includes retrieving the language dependent weights for the selected language.
    Type: Application
    Filed: June 10, 2009
    Publication date: November 1, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Byung Ha Chun, Sacha Krstulovic
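    The weighted-sum formulation in the abstract above lends itself to a small illustration. The following is a minimal sketch, not the patent's implementation: the names, the cluster count, and all numbers are invented. A parameter of a given type for the selected language is formed by retrieving that language's weights and combining shared, language-independent parameters.

    ```python
    # Hedged sketch of a language-dependent parameter expressed as a weighted
    # sum of language-independent parameters. All values are illustrative.

    SHARED_MEANS = [0.2, -1.1, 0.7]      # language-independent parameters (one per shared component)

    LANG_WEIGHTS = {                     # language-dependent weights, retrieved per selected language
        "en": [0.6, 0.3, 0.1],
        "fr": [0.2, 0.5, 0.3],
    }

    def language_mean(lang: str) -> float:
        """Retrieve the weights for the selected language and form the weighted sum."""
        weights = LANG_WEIGHTS[lang]
        return sum(w * m for w, m in zip(weights, SHARED_MEANS))
    ```

    Because only the small weight vector is language-specific, adding a language under this scheme means estimating new weights rather than a full new parameter set.
    
    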
  • Publication number: 20120265532
    Abstract: Embodiments of the invention include a system for providing a natural language objective assessment of relative color quality between a reference and a source image. The system may include a color converter that receives a difference measurement between the reference image and source image and determines a color attribute change based on the difference measurement. The color attributes may include hue shift, saturation changes, and color variation, for instance. Additionally, a magnitude index facility determines a magnitude of the determined color attribute change. Further, a natural language selector maps the color attribute change and the magnitude of the change to natural language and generates a report of the color attribute change and the magnitude of the color attribute change. The output can then be communicated to a user in either text or audio form, or in both text and audio forms.
    Type: Application
    Filed: April 15, 2011
    Publication date: October 18, 2012
    Applicant: TEKTRONIX, INC.
    Inventor: KEVIN M. FERGUSON
  • Publication number: 20120253816
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formant or articulatory text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Application
    Filed: June 12, 2012
    Publication date: October 4, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Publication number: 20120239405
    Abstract: A system and method for generating audio content. Content is automatically retrieved from a website. The content is converted to audio files. The audio files are associated with a hierarchy. The hierarchy is determined from the website. One or more audio files are communicated to an electronic device utilized by a user in response to a request from the user.
    Type: Application
    Filed: May 30, 2012
    Publication date: September 20, 2012
    Inventors: William C. O'Conor, Nathan T. Bradley
  • Publication number: 20120203553
    Abstract: A recognition dictionary creating device includes a user dictionary, in which a phoneme label string of an inputted voice is registered, and an interlanguage acoustic data mapping table, in which a correspondence between phoneme labels in different languages is defined. The device refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary, expressed in the language set at the time of creating the user dictionary, into a phoneme label string expressed in another language to which the device has switched.
    Type: Application
    Filed: January 22, 2010
    Publication date: August 9, 2012
    Inventor: Yuzo Maruta
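    The core of the mapping-table conversion above can be sketched in a few lines. This is a toy illustration, not the patent's method: the table entries are invented, and a real system would map acoustically similar phonemes between the two languages.

    ```python
    # Hypothetical interlanguage phoneme-label mapping (entries invented).
    EN_TO_JA = {"æ": "a", "ɪ": "i", "t": "t", "k": "k"}

    def convert_labels(labels, table, fallback="?"):
        """Convert a registered phoneme label string into the switched-to
        language, substituting a placeholder when no mapping exists."""
        return [table.get(p, fallback) for p in labels]
    ```
    
    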
  • Publication number: 20120191457
    Abstract: Techniques for predicting prosody in speech synthesis may make use of a data set of example text fragments with corresponding aligned spoken audio. To predict prosody for synthesizing an input text, the input text may be compared with the data set of example text fragments to select a best matching sequence of one or more example text fragments, each example text fragment in the sequence being paired with a portion of the input text. The selected example text fragment sequence may be aligned with the input text, e.g., at the word level, such that prosody may be extracted from the audio aligned with the example text fragments, and the extracted prosody may be applied to the synthesis of the input text using the alignment between the input text and the example text fragments.
    Type: Application
    Filed: January 24, 2011
    Publication date: July 26, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Stephen Minnis, Andrew P. Breen
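    The fragment-matching idea in the abstract above can be illustrated with a greedy toy version. This sketch is an assumption-laden simplification: the example data, the greedy longest-match strategy, and the word-level prosody values are all invented for demonstration, not taken from the patent.

    ```python
    # Invented example data: text fragments paired with per-word pitch (Hz)
    # extracted from audio aligned with each fragment.
    EXAMPLES = {
        "good morning": [220.0, 180.0],
        "everyone": [200.0],
    }

    def predict_prosody(text: str):
        """Greedily cover the input with the longest matching example fragments,
        then concatenate the prosody aligned with each matched fragment."""
        words = text.lower().split()
        prosody, i = [], 0
        while i < len(words):
            for j in range(len(words), i, -1):   # try longest fragment first
                frag = " ".join(words[i:j])
                if frag in EXAMPLES:
                    prosody.extend(EXAMPLES[frag])
                    i = j
                    break
            else:
                prosody.append(None)             # no example covers this word
                i += 1
        return prosody
    ```

    A production system would score candidate fragment sequences rather than match greedily, but the data flow is the same: select fragments, align them to the input at the word level, and carry the aligned prosody over to synthesis.
    
    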
  • Publication number: 20120184201
    Abstract: A compact display unit including a mobile application server. The mobile application server is mounted in the vehicle for receipt and transmission of communications. The mobile application server is operatively connected to the compact display unit. The compact display unit presents to a vehicle operator a plurality of pre-selected permissive communications for intercommunication between the vehicle and a remote base station located outside of the vehicle. A touch screen connects to the monitor and presents the programmed permissible communication options to the vehicle operator. The permissible communications options are capable of being masked by the remote base station, and cannot be unmasked by the vehicle operator.
    Type: Application
    Filed: March 20, 2012
    Publication date: July 19, 2012
    Applicant: QUALCOMM INCORPORATED
    Inventors: Michael Joseph Contour, Daniel Alexander Van Oosten Slingeland, Paul Michael Banasik, Marquis D. Doyle, III
  • Publication number: 20120185253
    Abstract: Embodiments are disclosed that relate to converting markup content to an audio output. For example, one disclosed embodiment provides, in a computing device a method including partitioning a markup document into a plurality of content panels, and forming a subset of content panels by filtering the plurality of content panels based upon geometric and/or location-based criteria of each panel relative to an overall organization of the markup document. The method further includes determining a document object model (DOM) analysis value for each content panel of the subset of content panels, identifying a set of content panels determined to contain text body content by filtering the subset of content panels based upon the DOM analysis value of each of the content panels of the subset of content panels, and converting text in a selected content panel determined to contain text body content to an audio output.
    Type: Application
    Filed: January 18, 2011
    Publication date: July 19, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Chundong Wang, Philomena Lobo, Rui Zhou
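    The two-stage filtering described above (geometric/location criteria, then a DOM-analysis value) can be sketched as a pair of filters. The field names and thresholds below are invented placeholders, not the patent's actual criteria.

    ```python
    # Hedged sketch of panel filtering for text-body detection.
    # "area_frac" and "dom_score" are hypothetical per-panel measurements.

    def body_text_panels(panels, min_area=0.1, min_score=0.5):
        """Keep panels that pass a geometric filter relative to the whole
        document, then those whose DOM-analysis value suggests body text."""
        subset = [p for p in panels if p["area_frac"] >= min_area]
        return [p for p in subset if p["dom_score"] >= min_score]
    ```
    
    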
  • Publication number: 20120143605
    Abstract: In one implementation, a collaboration server is a conference bridge or other network device configured to host an audio and/or video conference among a plurality of conference participants. The collaboration server sends conference data and a media stream including speech to a speech recognition engine. The conference data may include the conference roster or text extracted from documents or other files shared in the conference. The speech recognition engine updates a default language model according to the conference data and transcribes the speech in the media stream based on the updated language model. In one example, the performance of the default language model, the updated language model, or both may be tested using a confidence interval or submitted for approval of the conference participant.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 7, 2012
    Applicant: Cisco Technology, Inc.
    Inventors: Tyrone Terry Thorsen, Alan Darryl Gatzke
  • Publication number: 20120130718
    Abstract: A prompt collecting tool (190) for an interactive voice response system (100) includes a voice enabled application server (150), a voice simulator coupled to the voice enabled application server, and a processor coupled to the voice simulator. The processor can be programmed to execute (202) a voice application having a plurality of audio prompts, play (206) audio if a pre-stored audio is available for a particular prompt, capture (208) text when no pre-stored audio is available, and forward (210) the captured text to the prompt collecting tool. The voice simulator can include a VoiceXML browser (160), a text to speech service (170), and a text based recognition service (180), for example.
    Type: Application
    Filed: January 27, 2012
    Publication date: May 24, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Girish Dhanakshirur, James R. Lewis
  • Publication number: 20120123765
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting alternative translations. In one aspect, a method includes receiving source language text; receiving translated text corresponding to the source language text from a machine translation system; receiving segmentation data for the translated text, wherein the segmentation data includes a first segmentation of the translated text, the first segmentation dividing the translated text into two or more segments; receiving one or more alternative translations for each of the two or more segments; presenting the source text and the translated text to a user in a user interface; and in response to a user selection of a first portion of the translated text, displaying, in the user interface, one or more alternative translations for a first segment to which the first portion of translated text corresponds according to the first segmentation.
    Type: Application
    Filed: March 11, 2011
    Publication date: May 17, 2012
    Applicant: GOOGLE INC.
    Inventors: Joshua Estelle, Shankar Kumar, Wolfgang Macherey, Franz Josef Och, Peng Xu, Awaneesh Verma
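    The selection step in the abstract above (mapping a user's selection in the translated text back to a segment, then surfacing that segment's alternatives) can be illustrated with a minimal sketch. The segmentation data and translations below are invented for demonstration.

    ```python
    # Hypothetical segmentation of a translated text: (start, end) character
    # offsets paired with alternative translations for each segment.
    SEGMENTS = [
        (0, 7, ["Bonjour", "Salut"]),
        (8, 13, ["monde", "terre"]),
    ]

    def alternatives_at(offset: int):
        """Find the segment containing the selected offset per the
        segmentation, and return that segment's alternative translations."""
        for start, end, alts in SEGMENTS:
            if start <= offset < end:
                return alts
        return []
    ```
    
    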
  • Publication number: 20120114130
    Abstract: A cognitive load reduction system comprises a sound source position decision engine configured to receive one or more audio signals from a corresponding one or more signal generators, wherein the sound source position decision engine is further configured to identify two or more discrete sound sources within at least one of the one or more audio signals. The cognitive load reduction system further comprises an environmental assessment engine configured to assess environmental sounds within an environment. The cognitive load reduction system further comprises a sound location engine configured to output one or more audio signals configured to cause a plurality of speakers to change a perceived location of at least one of the discrete sound sources within the environment responsive to locations of other sounds within the environment.
    Type: Application
    Filed: November 9, 2010
    Publication date: May 10, 2012
    Applicant: Microsoft Corporation
    Inventor: Andrew Lovitt
  • Publication number: 20120109648
    Abstract: A communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal and to extract paralinguistic characteristics from it. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.
    Type: Application
    Filed: October 30, 2011
    Publication date: May 3, 2012
    Inventor: Fathy Yassa
  • Publication number: 20120089402
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Application
    Filed: October 12, 2011
    Publication date: April 12, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20120065979
    Abstract: A system and method for text-to-speech conversion. The method of performing text-to-speech conversion on a portable device includes identifying a portion of text for conversion to speech format, wherein the identifying includes performing a prediction based on information associated with a user. While the portable device is connected to a power source, a text-to-speech conversion is performed on the portion of text to produce converted speech. The converted speech is stored in a memory device of the portable device. A reader application is executed, through which a user request for narration of the portion of text is received.
    Type: Application
    Filed: September 14, 2010
    Publication date: March 15, 2012
    Applicant: SONY CORPORATION
    Inventors: Ling Jun Wong, True Xiong