Image To Speech Patents (Class 704/260)
  • Patent number: 9305542
    Abstract: A customized live tile application module can be configured in association with the mobile communication device in order to automatically vocalize the information preselected by a user in a multitude of languages. A text-to-speech application module can be integrated with the customized live tile application module to automatically vocalize the preselected information. The information can be obtained from a tile and/or a website integrated with a remote server and announced after a text-to-speech conversion process without opening the tile, if the tiles are selected for announcement of information by the device. The information can be obtained in real-time. Such an approach automatically and instantly pushes a vocal alert with respect to the user-selected information on the mobile communication device, thereby permitting the user to continue multitasking. Information from tiles can also be rendered on second screens from a mobile device.
    Type: Grant
    Filed: October 9, 2013
    Date of Patent: April 5, 2016
    Assignee: Verna IP Holdings, LLC
    Inventors: Anthony Verna, Luis M. Ortiz
  • Patent number: 9305286
    Abstract: Methods and systems for model-driven candidate sorting for evaluating digital evaluations are described. In one embodiment, a sorting tool selects a data set of digital evaluation data for sorting. The data set includes candidate data for evaluation candidates. The sorting tool analyzes the candidate data for the respective evaluation candidate to identify digital evaluation cues and applies the digital evaluation cues to a prediction model to predict an achievement index for the respective evaluation candidate. The list of evaluation candidates is sorted according to the predicted achievement indices and the sorted list is presented to the reviewer in a user interface.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: April 5, 2016
    Assignee: HireVue, Inc.
    Inventors: Loren Larsen, Benjamin Taylor
  • Patent number: 9292499
    Abstract: The present invention relates to an automatic translation and interpretation apparatus and method. The apparatus includes a speech input unit for receiving a speech signal in a first language. A text input unit receives text in the first language. A sentence recognition unit recognizes a sentence in the first language desired to be translated by extracting speech features from the speech signal received from the speech input unit or measuring a similarity of each word of the text received from the text input unit. A translation unit translates the recognized sentence in the first language into a sentence in a second language. A speech output unit outputs uttered sound of the translated sentence in the second language in speech. A text output unit converts the uttered sound of the translated sentence in the second language into text transcribed in the first language and outputs the text.
    Type: Grant
    Filed: January 22, 2014
    Date of Patent: March 22, 2016
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Soo-Jong Lee, Sang-Hun Kim, Jeong-Se Kim, Seung Yun, Min-Kyu Lee, Sang-Kyu Park
  • Patent number: 9286886
    Abstract: Techniques for predicting prosody in speech synthesis may make use of a data set of example text fragments with corresponding aligned spoken audio. To predict prosody for synthesizing an input text, the input text may be compared with the data set of example text fragments to select a best matching sequence of one or more example text fragments, each example text fragment in the sequence being paired with a portion of the input text. The selected example text fragment sequence may be aligned with the input text, e.g., at the word level, such that prosody may be extracted from the audio aligned with the example text fragments, and the extracted prosody may be applied to the synthesis of the input text using the alignment between the input text and the example text fragments.
    Type: Grant
    Filed: January 24, 2011
    Date of Patent: March 15, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Stephen Minnis, Andrew P. Breen
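    The fragment-matching idea in the abstract above can be sketched as follows. This is a minimal illustrative sketch, not the patented method: it scores example fragments by word overlap with the input text, picks the best match, and transfers that fragment's word-level prosody (here, made-up pitch/duration pairs) to the aligned input words. All names and the scoring function are assumptions.

    ```python
    # Hypothetical sketch of fragment-matching prosody prediction.
    # examples maps fragment text -> {word: (pitch_hz, duration_s)},
    # standing in for prosody extracted from aligned spoken audio.

    def word_overlap(a, b):
        """Fraction of words in `a` that also appear in `b`."""
        aw, bw = a.lower().split(), set(b.lower().split())
        return sum(w in bw for w in aw) / max(len(aw), 1)

    def predict_prosody(input_text, examples):
        """Pick the best-matching example fragment and transfer its
        word-level prosody to the matching words of the input."""
        best = max(examples, key=lambda frag: word_overlap(input_text, frag))
        prosody = examples[best]
        return {w: prosody[w] for w in input_text.lower().split() if w in prosody}

    examples = {
        "good morning everyone": {"good": (180, 0.2), "morning": (210, 0.4)},
        "see you later": {"see": (150, 0.2), "later": (140, 0.5)},
    }
    ```

    A real system would align at the word level against audio and interpolate prosody for unmatched words; the overlap score here only illustrates the selection step.
    
    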
  • Patent number: 9280967
    Abstract: According to one embodiment, an apparatus for supporting reading of a document includes a model storage unit, a document acquisition unit, a feature information extraction unit, and an utterance style estimation unit. The model storage unit is configured to store a model which has trained a correspondence relationship between first feature information and an utterance style. The first feature information is extracted from a plurality of sentences in a training document. The document acquisition unit is configured to acquire a document to be read. The feature information extraction unit is configured to extract second feature information from each sentence in the document to be read. The utterance style estimation unit is configured to compare the second feature information of a plurality of sentences in the document to be read with the model, and to estimate an utterance style of each sentence of the document to be read.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: March 8, 2016
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kosei Fume, Masaru Suzuki, Masahiro Morita, Kentaro Tachibana, Kouichirou Mori, Yuji Shimizu, Takehiko Kagoshima, Masatsune Tamura, Tomohiro Yamasaki
  • Patent number: 9274748
    Abstract: A contextual input device includes a plurality of tactually discernable keys disposed in a predetermined configuration which replicates a particular relationship among a plurality of items associated with a known physical object. The tactually discernable keys are typically labeled with Braille type. The known physical object is typically a collection of related items grouped together by some common relationship. A computer-implemented process determines whether an input signal represents a selection of an item from among a plurality of items or an attribute pertaining to an item among the plurality of items. Once the selected item or attribute pertaining to an item is determined, the computer-implemented process transforms a user's selection from the input signal into an analog audio signal which is then audibly output as human speech with an electro-acoustic transducer.
    Type: Grant
    Filed: July 3, 2013
    Date of Patent: March 1, 2016
    Assignee: Cal Poly Corporation
    Inventors: Dennis Fantin, C. Arthur MacCarley
  • Patent number: 9270916
    Abstract: A method for improving quality of video beaming of any content by a beaming application that is running on a computerized mobile device is provided herein. The method comprises the steps of: (i) selecting content for beaming; (ii) beaming by utilizing a beaming application; (iii) identifying in real time a pattern change in beamed video, wherein the pattern change signifies transition from a previous rate of a predefined size of chunk of delivered data to a higher rate of currently delivered data in content displaying pattern by the application which starts a set of critical frames; (iv) performing quality improvement of the video beaming of the set of critical frames based on the identified pattern changes; and (v) identifying in real time a second pattern change in display video to a lower delivered data rate which ends the set of critical frames.
    Type: Grant
    Filed: February 8, 2013
    Date of Patent: February 23, 2016
    Assignee: SCREENOVATE TECHNOLOGIES LTD.
    Inventor: Matan Shapira
  • Patent number: 9269346
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Grant
    Filed: February 16, 2015
    Date of Patent: February 23, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
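    The policy-driven selection described above, where a policy dictates which source voice supplies the units for each phonetic category, can be illustrated with a minimal sketch. The category function, data shapes, and unit labels are all assumptions for illustration; a real unit-selection synthesizer operates on recorded speech units, not strings.

    ```python
    # Illustrative sketch: a combined database holds units from two
    # TTS voices; a policy maps phonetic category -> source voice.

    def category(phone):
        """Toy phonetic categorization (assumption for this sketch)."""
        return "vowel" if phone in "aeiou" else "consonant"

    def select_units(combined, policy, phones):
        """combined: {voice_name: {phone: unit}}.
        For each phone, consult the policy for its category to decide
        which voice's unit database to draw from."""
        return [combined[policy[category(p)]][p] for p in phones]

    combined = {
        "voice_a": {"a": "a_A", "b": "b_A"},
        "voice_b": {"a": "a_B", "b": "b_B"},
    }
    # e.g. take vowels from voice_b, consonants from voice_a
    policy = {"vowel": "voice_b", "consonant": "voice_a"}
    ```

    Synthesis would then concatenate the selected units directly, which matches the abstract's point that no parameterization of either voice is required.
    
    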
  • Patent number: 9251717
    Abstract: A method and device for providing a language system. The method includes displaying a home screen including a first plurality of user selectable communication keys, wherein each communication key represents a first word. In response to receiving a first indication of a user selection of a first key, it is determined whether the first key is linked to a secondary screen. If the first key is not linked to a secondary screen, an audible signal representing the first word is output. Otherwise, a secondary screen is displayed including a second plurality of communication keys that are related to the first word and a communication key that represents the first word. The device includes appropriate hardware for performing the method. In both the method and device, the language system is configured to output an audible output of a user selected word after no more than two user selections.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: February 2, 2016
    Inventors: Heidi LoStracco, Renee Collender
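    The two-selection guarantee in the abstract above, where every word is reachable in at most two key presses because a key is either a word or a link to one secondary screen, can be sketched as a small lookup structure. Screen names, keys, and words are hypothetical.

    ```python
    # Illustrative sketch of the home-screen / secondary-screen flow.
    # A key maps either to a word (spoken immediately) or to a
    # ("screen", name) link that opens a secondary screen of words.

    def press(screens, selections):
        """Follow at most two user selections and return the word
        that would be spoken, mirroring the at-most-two-press rule."""
        screen = screens["home"]
        for key in selections:
            target = screen[key]
            if isinstance(target, tuple):   # linked to a secondary screen
                screen = screens[target[1]]
            else:                           # plain word: output audio here
                return target
        return None

    screens = {
        "home": {"hi": "hi", "eat": ("screen", "food")},
        "food": {"eat": "eat", "apple": "apple"},
    }
    ```
    
    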
  • Patent number: 9251791
    Abstract: A computer-implemented input-method editor process includes receiving a request from a user for an application-independent input method editor having written and spoken input capabilities, identifying that the user is about to provide spoken input to the application-independent input method editor, and receiving a spoken input from the user. The spoken input corresponds to input to an application and is converted to text that represents the spoken input. The text is provided as input to the application.
    Type: Grant
    Filed: June 9, 2014
    Date of Patent: February 2, 2016
    Assignee: Google Inc.
    Inventors: Brandon M. Ballinger, Johan Schalkwyk, Michael H. Cohen, William J. Byrne, Gudmundur Hafsteinsson, Michael J. LeBeau
  • Patent number: 9240180
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: January 19, 2016
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Mark Charles Beutnagel, Taniya Mishra
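    The caching scheme above, where synthesized audio is stored under a unique identifier so that repeated phrases skip re-synthesis, can be sketched in a few lines. The hash choice and the stub synthesizer are assumptions; in the described system the synthesis call would go to a TTS server and the cache would hold audio files.

    ```python
    import hashlib

    class PhraseCache:
        """Cache synthesized audio per phrase, indexed by a content hash,
        so identical text is served from cache instead of re-synthesized."""

        def __init__(self, synthesize):
            self.synthesize = synthesize   # stand-in for a TTS server call
            self.store = {}
            self.hits = 0

        def audio_for(self, phrase):
            key = hashlib.sha1(phrase.encode("utf-8")).hexdigest()
            if key in self.store:
                self.hits += 1             # cached: no re-synthesis
            else:
                self.store[key] = self.synthesize(phrase)
            return self.store[key]

    cache = PhraseCache(lambda p: f"<audio:{p}>")
    cache.audio_for("hello world")   # synthesized
    cache.audio_for("hello world")   # served from cache
    ```

    Keying on prosodically meaningful sections (intonational phrases) rather than whole pages is what makes cache hits frequent enough to reduce perceived latency.
    
    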
  • Patent number: 9240177
    Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
    Type: Grant
    Filed: March 4, 2014
    Date of Patent: January 19, 2016
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, Ann K. Syrdal, David Schulz
  • Patent number: 9229635
    Abstract: A wireless handheld device able to accept text input, including a display screen and a virtual keypad having a plurality of keys displayed on the display screen, wherein a first key of the virtual keypad is operable to display a first character associated with the first key in a text passage shown on the display screen in response to a first contact of a pointer with a first area on the display screen corresponding to the first key, the first contact including the pointer contacting and moving from the first area along a first direction while the pointer is in continual contact with the display screen, and wherein the first key of the virtual keypad is also operable to display a second character associated with the first key in the text passage shown on the display screen in response to a second contact of the pointer with the first area on the display screen corresponding to the first key, the second contact including the pointer contacting and moving from the first area along a second direction while the pointer is in continual contact with the display screen.
    Type: Grant
    Filed: September 16, 2008
    Date of Patent: January 5, 2016
    Assignee: Creative Technology Ltd
    Inventor: Wong Hoo Sim
  • Patent number: 9214154
    Abstract: A personalized text-to-speech (pTTS) system provides a method for converting text data to speech data utilizing a pTTS template representing the voice characteristics of an individual. A memory stores executable program code that converts text data to speech data. Text data represents a textual message directed to a system user and speech data represents a spoken form of text data having the characteristics of an individual's voice. A processor executes the program code, and a storage device stores a pTTS template and may store speech data. The pTTS system can be used to provide various services that provide immediate spoken presentation of the speech data converted from text data and/or combine stored speech data with generated speech data for spoken presentation.
    Type: Grant
    Filed: December 10, 2014
    Date of Patent: December 15, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Edmund Gale Acker, Frederick Murray Burg
  • Patent number: 9183849
    Abstract: System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The semantic information may be associated with audio signature data. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data that includes genre, instrumentation, style, acoustical dynamics, and emotive descriptors for the audio signal.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: November 10, 2015
    Assignee: The Nielsen Company (US), LLC
    Inventors: Alan Neuhauser, John Stavropoulos
  • Patent number: 9183831
    Abstract: A digital work of literature is vocalized using enhanced text-to-speech (TTS) controls by analyzing a digital work of literature using natural language processing to identify speaking character voice characteristics associated with context of each quote as extracted from the first work of literature; converting the character voice characteristics to audio metadata to control text-to-speech audio synthesis for each quote; transforming the audio metadata into text-to-speech engine commands, each quote being associated with audio synthesis control parameters for the TTS in the context of each of the quotes in the work of literature; and inputting the commands to a text-to-speech engine to cause vocalization of the work of literature according to the words of each quote, character voice characteristics corresponding to each quote, and context corresponding to each quote.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: November 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Donna Karen Byron, Alexander Pikovsky, Eric Woods
  • Patent number: 9172808
    Abstract: Systems and methods for authenticating callers are disclosed that may obtain identifying data and a voice print, and compare the voice print to one or more stored voice prints. Further, one or more initial scores may be calculated based on the data and the comparison, and a confidence interval score may also be calculated. The systems and methods may determine whether to authenticate based on the one or more initial scores and the confidence interval score.
    Type: Grant
    Filed: February 24, 2014
    Date of Patent: October 27, 2015
    Assignee: Verint Americas Inc.
    Inventors: Torsten Zeppenfeld, Joseph James Schmid, Manish Brijkishor Sharma, Lisa Marie Guerra, Richard Gutierrez, Mark Andrew Lazar, Vipul Niranjan Vyas
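    The decision step described above, combining one or more initial scores with a separate confidence-interval score, can be sketched as a simple gate. The aggregation (a mean) and the thresholds are assumptions for illustration; the patented system computes its scores from identifying data and voice-print comparisons.

    ```python
    # Illustrative sketch: authenticate only when the aggregated
    # initial scores AND the confidence-interval score both clear
    # their (hypothetical) thresholds.

    def authenticate(initial_scores, confidence,
                     score_threshold=0.7, conf_threshold=0.8):
        """initial_scores: per-comparison match scores in [0, 1].
        confidence: a confidence-interval score in [0, 1]."""
        combined = sum(initial_scores) / len(initial_scores)
        return combined >= score_threshold and confidence >= conf_threshold
    ```

    Gating on confidence separately from the match score lets the system refuse a decision when the voice-print comparison is ambiguous, rather than silently accepting a marginal match.
    
    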
  • Patent number: 9167038
    Abstract: Methods, systems, and computer-readable mediums for securing uploaded content are presented. User content can be uploaded from a user device to a computer system, where the user content is dissected into a number of content portions. After dissection, the computer system can transmit each of the content portions to a corresponding storage server. The storage servers may be independent from each other, independent from the user device, and independent from the computer system itself. Any portion of the user content can then be removed from the computer system, such that the computer system does not own or store the user content, and such that no single entity in the system can compromise the user content. In some cases, the storage servers can be operated by non-profit entities that are not privately owned.
    Type: Grant
    Filed: December 18, 2012
    Date of Patent: October 20, 2015
    Inventor: Arash Esmailzadeh
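    The dissection step above can be illustrated with a toy byte-interleaving split, where no single portion contains the original content on its own. This is an assumption-laden sketch of the idea only: the patent does not specify this scheme, and interleaving is not a cryptographic protection.

    ```python
    # Illustrative sketch: dissect content into n interleaved portions
    # (one per hypothetical independent storage server) and reassemble.

    def dissect(data, n):
        """Split bytes into n portions by stride; portion i holds
        bytes i, i+n, i+2n, ..."""
        return [data[i::n] for i in range(n)]

    def reassemble(portions):
        """Inverse of dissect: re-interleave the portions in order."""
        out = bytearray()
        for i in range(max(len(p) for p in portions)):
            for p in portions:
                if i < len(p):
                    out.append(p[i])
        return bytes(out)
    ```

    In the described system each portion would be sent to a different storage server and deleted from the coordinating computer system, so reassembly requires cooperation of all servers.
    
    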
  • Patent number: 9159314
    Abstract: In a text-to-speech (TTS) system, a database including sample speech units for unit selection may be configured for use by a local device. The local unit database may be created from a more comprehensive unit database. The local unit database may include units which provide sufficient TTS results for frequently input text. Speech synthesis may then be performed by concatenating locally available units with units from a remote device including the comprehensive unit database. Aspects of the speech synthesis may be performed by the remote device and/or the local device.
    Type: Grant
    Filed: January 14, 2013
    Date of Patent: October 13, 2015
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Lukasz M. Osowski, Michal T. Kaszczuk
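    The local/remote split above, where a device-resident subset of the unit database handles frequent text and the comprehensive remote database fills the gaps, can be sketched with a word-level stand-in for speech units. The data shapes and the "+"-joined result are assumptions; real unit selection concatenates audio.

    ```python
    # Illustrative sketch: concatenate locally cached units, falling
    # back to a (hypothetical) remote fetch for anything missing.

    def synthesize(text, local_units, fetch_remote):
        """local_units: {word: unit} subset kept on the device.
        fetch_remote: callable hitting the comprehensive database."""
        units = []
        for word in text.split():
            units.append(local_units.get(word) or fetch_remote(word))
        return "+".join(units)

    local = {"hello": "hello.loc"}      # frequent text covered locally
    remote_calls = []

    def fetch_remote(word):
        remote_calls.append(word)       # track what needed the network
        return f"{word}.rem"
    ```

    Keeping high-frequency units local bounds latency and network use, while the remote fallback preserves full coverage.
    
    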
  • Patent number: 9158759
    Abstract: The invention provides a computer system for interacting with a user. A set of concepts initially forms a target set of concepts. An input module receives a language input from the user. An analysis system executes a plurality of narrowing cycles until a concept packet having at least one concept has been identified. Each narrowing cycle includes identifying at least one portion of the language and determining a subset of concepts from the target set of concepts to form a new target subset. An action item identifier identifies an action item from the action items based on the concept packet. An action executer executes an action based on the action item that has been identified.
    Type: Grant
    Filed: November 21, 2012
    Date of Patent: October 13, 2015
    Assignee: Zero Labs, Inc.
    Inventors: Rajesh Pradhan, Amit Pradhan
  • Patent number: 9159313
    Abstract: A playback control apparatus includes a playback controller configured to control playback of first content and second content. The first content is to output first sound which is generated based on text information using speech synthesis processing. The second content is to output second sound which is generated not using the speech synthesis processing. The playback controller causes an attribute of content to be played back to be displayed on the screen, the attribute indicating whether or not the content is to output sound which is generated based on text information using speech synthesis processing.
    Type: Grant
    Filed: November 28, 2012
    Date of Patent: October 13, 2015
    Assignee: SONY CORPORATION
    Inventors: Takaaki Saeki, Yukiyoshi Hirose
  • Patent number: 9147393
    Abstract: Speech is modeled as a cognitively-driven sensory-motor activity where the form of speech is the result of categorization processes that any given subject recreates by focusing on creating sound patterns that are represented by syllables. These syllables are then combined in characteristic patterns to form words, which are in turn combined in characteristic patterns to form utterances. A speech recognition process first identifies syllables in an electronic waveform representing ongoing speech. The pattern of syllables is then deconstructed into a standard form that is used to identify words. The words are then concatenated to identify an utterance. Similarly, a speech synthesis process converts written words into patterns of syllables. The pattern of syllables is then processed to produce the characteristic rhythmic sound of naturally spoken words. The words are then assembled into an utterance which is also processed to produce natural-sounding speech.
    Type: Grant
    Filed: February 15, 2013
    Date of Patent: September 29, 2015
    Inventor: Boris Fridman-Mintz
  • Patent number: 9141867
    Abstract: Some examples include segmenting text of a content item to include a plurality of segments or words. For instance, a module may segment a content item into a plurality of segments using a context-based segmenter, identify segment boundary hints stored in the content item, and adjust segments of the plurality of segments based on the identified segment boundary hints. Some additional examples include inserting segment boundary hints into a content item. For instance, a module may segment the content item using a first segmenter and a second segmenter and insert segment boundary hints into the content item where the results of the first and second segmenter differ.
    Type: Grant
    Filed: December 6, 2012
    Date of Patent: September 22, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: Shinobu Matsuzuka, Patrick J. Stammerjohn, Venkata Krishnan Ramamoorthy, Christopher A. Suver, Lokesh Joshi, Robert Wai-Chi Chu
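    The hint-insertion rule above, marking boundaries only where two segmenters disagree, can be sketched by comparing the character offsets each segmenter produces. The offset representation is an assumption for illustration.

    ```python
    # Illustrative sketch: find positions where two segmentations of
    # the same text disagree; those are where boundary hints go.

    def boundaries(segments):
        """Character offsets at which each segment ends."""
        offs, pos = set(), 0
        for s in segments:
            pos += len(s)
            offs.add(pos)
        return offs

    def hint_positions(seg_a, seg_b):
        """Offsets present in one segmentation but not the other
        (symmetric difference): only disagreements need hints."""
        return sorted(boundaries(seg_a) ^ boundaries(seg_b))
    ```

    Storing hints only at disagreement points keeps the embedded metadata small: wherever the segmenters already agree, the downstream context-based segmenter needs no correction.
    
    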
  • Patent number: 9141606
    Abstract: Systems and methods for multi-engine machine translations are disclosed. Exemplary methods and systems involve normalizing and/or tokenizing a source string using user-specific translation data. The user-specific translation data may include glossary data, translation memory data, and rule data for use in customizing translations and sequestering sensitive data during the translation process. The disclosed methods and systems also involve using one or more machine translation engines to obtain a translation of the normalized and/or tokenized source string.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: September 22, 2015
    Assignee: Lionbridge Technologies, Inc.
    Inventors: James Peter Marciano, Dean Scott Blodgett
  • Patent number: 9131207
    Abstract: There is provided a video recording apparatus including a content accumulation part accumulating video content, a feature extraction processing part extracting an image or voice as a feature from the video content accumulated by the content accumulation part, and obtaining word information from the extracted image or the extracted voice, a word information acquisition part acquiring sorted word information obtained using clustering processing on word information identified from an image captured by a camera, and a content retrieval part retrieving relevant video content from the video content accumulated by the content accumulation part based on the sorted word information acquired by the word information acquisition part and the word information acquired by the feature extraction processing part.
    Type: Grant
    Filed: June 18, 2013
    Date of Patent: September 8, 2015
    Assignee: Sony Corporation
    Inventors: Tsuyoshi Takagi, Noboru Murabayashi
  • Patent number: 9129596
    Abstract: Apparatus for creating a dictionary for speech synthesis includes a sentence storage unit configured to store N sentences, a sentence display unit configured to selectively display a first sentence which is one of the N sentences, a recording unit configured to record each user speech, a necessity determination unit configured to make a determination of whether to create the dictionary, a dictionary creation unit configured to create the dictionary by utilizing the user speech, and a speech synthesis unit configured to convert a second sentence to a synthesized speech with the dictionary. The display unit is configured to stop displaying the currently displayed sentence according to an evaluation of a quality of its synthesis. The determination unit makes the determination under a condition that the recording unit records the user speech of M first sentences (M is less than N) and the determination is based on at least one of an instruction from the user, M and an amount of the recorded user speech.
    Type: Grant
    Filed: June 28, 2012
    Date of Patent: September 8, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Tachibana, Masahiro Morita, Takehiko Kagoshima
  • Patent number: 9129609
    Abstract: A speech speed conversion factor determining device has a physical index calculation unit including a sound/silence judgment unit that distinguishes between sound and silent intervals of an input signal, a fundamental frequency calculation unit that calculates a fundamental frequency of the signal in the sound intervals and determines stable and unstable intervals, a frequency smoothing unit that smoothes the fundamental frequency in the stable intervals, a pseudo fundamental frequency calculation unit that calculates, for the intervals, a pseudo fundamental frequency by interpolation, and a fundamental frequency general shape connection unit that connects the smoothed and pseudo frequencies to obtain sampled values of a general shape of the frequency, such that the sampled values are output as an index, based on which conversion factors are calculated.
    Type: Grant
    Filed: January 27, 2012
    Date of Patent: September 8, 2015
    Assignee: NIPPON HOSO KYOKAI
    Inventors: Tohru Takagi, Atsushi Imai, Nobumasa Seiyama, Reiko Saitou
  • Patent number: 9131062
    Abstract: A mobile terminal device able to automatically set suitable field break positions in accordance with the situation, able to realize a skip operation and back skip operation by specific operations, able to efficiently utilize a readout function, and able to improve convenience to a user is provided. It has an operation unit 19 for instructing a readout function, a memory 12 storing text, a text-to-speech unit 20 for converting text data stored in the memory 12 to speech data at the time of readout, an audio output unit 21 for outputting the speech data, and a control unit 26 for recognizing predetermined breaks in the text to be read out when outputting the speech data at the audio output unit 21 and, when there is a predetermined instruction from the operation unit 19, performing control so as to output as speech data, via the audio output unit, the words from a break position either before or after the readout target text at the time the instruction is input.
    Type: Grant
    Filed: June 29, 2005
    Date of Patent: September 8, 2015
    Assignee: KYOCERA Corporation
    Inventors: Tetsushi Wakasa, Takashi Kobiki, Kiyofumi Miyake
  • Patent number: 9111457
    Abstract: A method, computer program product, and system for voice pronunciation for text communication is described. A selected portion of a text communication is determined. A prompt to record a pronunciation relating to the selected portion of the text communication is provided at a first computing device. The recorded pronunciation is associated with the selected portion of the text communication. A visual indicator, relating to the selected portion of the text communication and the recorded pronunciation, is displayed.
    Type: Grant
    Filed: September 20, 2011
    Date of Patent: August 18, 2015
    Assignee: International Business Machines Corporation
    Inventors: Kristina Beckley, Vincent Burckhardt, Alexis Yao Pang Song, Smriti Talwar
  • Patent number: 9105262
    Abstract: Architecture for playing a document converted into an audio format to a user of an audio-output capable device. The user can interact with the device to control play of the audio document such as pause, rewind, forward, etc. In a more robust implementation, the audio-output capable device is a mobile device (e.g., cell phone) having a microphone for processing voice input. Voice commands can then be input to control play (“reading”) of the document audio file to pause, rewind, read paragraph, read next chapter, fast forward, etc. A communications server (e.g., email, attachments to email, etc.) transcodes text-based document content into an audio format by leveraging a text-to-speech (TTS) engine. The transcoded audio files are then transferred to mobile devices through viable transmission channels. Users can then play the audio-formatted document while freeing hand and eye usage for other tasks.
    Type: Grant
    Filed: January 9, 2012
    Date of Patent: August 11, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Sheng-Yao Shih, Yun-Chiang Kung, Chiwei Che, Chih-Chung Wang
  • Patent number: 9100457
    Abstract: Method and apparatus for framing in a wireless transmission system supporting broadcast transmissions. A framing format incorporates fields specific to a uni-directional transmission and reduces the overhead of the system. One embodiment employs a version of HDLC having a start of frame field and an error checking mechanism attached to the payload of each frame, wherein protocol information is not transmitted with each individual frame.
    Type: Grant
    Filed: August 20, 2001
    Date of Patent: August 4, 2015
    Assignee: QUALCOMM Incorporated
    Inventor: Raymond T. Hsu
  • Patent number: 9093067
    Abstract: The subject matter of this specification can be implemented in a computer-implemented method that includes receiving utterances and transcripts thereof. The method includes analyzing the utterances and transcripts to determine certain attributes, such as distances between prosodic contours for pairs of utterances. A model can be generated that can be used to estimate a distance between a determined prosodic contour for a received utterance and an unknown prosodic contour for a synthesized utterance when given a distance between attributes for text associated with the received utterance and the synthesized utterance.
    Type: Grant
    Filed: November 26, 2012
    Date of Patent: July 28, 2015
    Assignee: Google Inc.
    Inventors: Martin Jansche, Michael D. Riley, Andrew M. Rosenberg, Terry Tai
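The abstract above turns on measuring a distance between prosodic contours of two utterances. As a stand-in for whatever distance the patented model actually predicts, here is a minimal root-mean-square distance between two pitch (F0) contours resampled to a common length; the function name and resampling scheme are illustrative assumptions.

```python
def contour_distance(f0_a, f0_b):
    """RMS distance between two pitch (F0) contours, nearest-neighbor
    resampled to a common length. Illustrative only: the patented model
    estimates such a distance for a synthesized utterance from text features."""
    n = max(len(f0_a), len(f0_b))

    def resample(contour):
        # Nearest-neighbor stretch of the contour to n points.
        return [contour[min(len(contour) - 1, int(i * len(contour) / n))]
                for i in range(n)]

    a, b = resample(f0_a), resample(f0_b)
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / n) ** 0.5
```

Identical contours score zero; a uniformly offset contour scores the offset, which gives the model a well-behaved regression target.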
  • Patent number: 9092421
    Abstract: An example system and method elicits reviews and opinions from users via an online system or a web crawl. Opinions on topics are processed in real time to determine orientation. Each topic is analyzed sentence by sentence to find a central tendency of user orientation toward a given topic. Automatic topic orientation is used to provide a common comparable rating value between reviewers and potentially other systems on similar topics. Facets of the topics are extracted via a submission/acquisition process to determine the key variables of interest for users.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: July 28, 2015
    Inventors: Abdur Chowdhury, Gregory Scott Pass, Ajaipal Singh Virdy, Ophir Frieder
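The sentence-by-sentence orientation analysis described above can be sketched with a crude lexicon scorer whose per-sentence scores are averaged into a central tendency. The word lists and the scoring rule are placeholder assumptions; the patent does not specify the classifier.

```python
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentence_orientation(sentence):
    """Crude lexicon-based orientation score in [-1, 1] for one sentence."""
    words = sentence.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    hits = sum(w in POSITIVE or w in NEGATIVE for w in words)
    return score / hits if hits else 0.0

def topic_orientation(review):
    """Central tendency (mean) of per-sentence orientation toward the topic,
    yielding a comparable rating value across reviewers."""
    sentences = [s for s in review.split(".") if s.strip()]
    scores = [sentence_orientation(s) for s in sentences]
    return sum(scores) / len(scores) if scores else 0.0
```

A review mixing one positive and one negative sentence lands near zero, which is the "common comparable rating value" behavior the abstract describes.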
  • Patent number: 9087331
    Abstract: A system and method for serving contextually relevant advertisements is provided, including monitoring a media stream to identify an audio or video asset, extracting and storing corresponding text from the asset, retrieving stored text when the asset is selected by a user, analyzing the text and identifying relevant advertisements that are then displayed to a user as clickable text next to the playing video or audio asset. In further embodiments, the method may include the steps of retrieving a variable length portion of the text corresponding to the portion of the asset being played by the user, analyzing the portion of the text to identify advertisements relevant to the corresponding portion of the asset, displaying the advertisements during the playback of the portion of the asset, and then repeating the steps until the playback of the whole asset is completed.
    Type: Grant
    Filed: August 29, 2008
    Date of Patent: July 21, 2015
    Assignee: TVEyes Inc.
    Inventors: David J. Ives, David B. Seltzer
  • Patent number: 9087519
    Abstract: Systems and methods are provided for scoring speech. A speech sample is received, where the speech sample is associated with a script. The speech sample is aligned with the script. An event recognition metric of the speech sample is extracted, and locations of prosodic events are detected in the speech sample based on the event recognition metric. The locations of the detected prosodic events are compared with locations of model prosodic events, where the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script. A prosodic event metric is calculated based on the comparison, and the speech sample is scored using a scoring model based upon the prosodic event metric.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: July 21, 2015
    Assignee: Educational Testing Service
    Inventors: Klaus Zechner, Xiaoming Xi
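The comparison step above — detected prosodic event locations versus the expected locations of a fluent native speaker — can be sketched as a tolerance-based F1 score. This is a simplified stand-in for the patented prosodic event metric; the function signature and the choice of F1 are assumptions.

```python
def prosodic_event_metric(detected, model, tolerance=0):
    """F1-style agreement between detected prosodic event locations in a
    speech sample and the model (fluent-native) event locations, matching
    within `tolerance` positions. Illustrative stand-in for the patented metric."""
    matched = sum(any(abs(d - m) <= tolerance for d in detected) for m in model)
    recall = matched / len(model) if model else 1.0
    precision = (sum(any(abs(d - m) <= tolerance for m in model) for d in detected)
                 / len(detected)) if detected else 1.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

A sample that places every expected pitch accent scores 1.0; missing events pull the score down, and the scoring model can consume this value as one fluency feature.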
  • Patent number: 9087512
    Abstract: A speech synthesis method for an electronic system and a speech synthesis apparatus are provided. In the speech synthesis method, a speech signal file including text content is received. The speech signal file is analyzed to obtain prosodic information of the speech signal file. The text content and the corresponding prosodic information are automatically tagged to obtain a text tag file. A speech synthesis file is obtained by synthesizing a human voice profile and the text tag file.
    Type: Grant
    Filed: January 10, 2013
    Date of Patent: July 21, 2015
    Assignee: ASUSTeK COMPUTER INC.
    Inventors: Yu-Chieh Chen, Chih-Kai Yu, Sung-Shen Wu, Tai-Ming Parng
  • Patent number: 9066046
    Abstract: Apparatus and methods conforming to the present invention comprise a method of controlling playback of an audio signal through analysis of a corresponding closed caption signal in conjunction with analysis of the corresponding audio signal. Objectionable text or other specified text in the closed caption signal is identified through comparison with user identified objectionable text. Upon identification of the objectionable text, the audio signal is analyzed to identify the audio portion corresponding to the objectionable text. Upon identification of the audio portion, the audio signal may be controlled to mute the audible objectionable text.
    Type: Grant
    Filed: April 20, 2009
    Date of Patent: June 23, 2015
    Assignee: ClearPlay, Inc.
    Inventors: Matthew T. Jarman, William S. Meisel
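The caption-driven muting above can be sketched as follows. As a simplification, the whole caption cue span is muted when it contains an objectionable word; the patented method additionally analyzes the audio to localize the word within the cue. The cue tuple layout is a hypothetical choice.

```python
def mute_intervals(captions, objectionable):
    """captions: list of (start_sec, end_sec, text) closed-caption cues.
    Return the time spans whose caption text contains a user-identified
    objectionable word, so the player can mute the corresponding audio."""
    bad = {w.lower() for w in objectionable}
    return [(start, end) for start, end, text in captions
            if bad & {w.strip(".,!?").lower() for w in text.split()}]
```

A player would then zero the audio gain over each returned interval during playback.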
  • Publication number: 20150149180
    Abstract: A mobile terminal and a control method of the mobile terminal are provided. The mobile terminal includes: a memory configured to store event information; and a controller configured to retrieve at least one item of event information entered between specified points in time from the memory, create a frame screen for displaying the retrieved event information and a notepad for storing at least one keyword extracted from each item of the retrieved event information contained in the frame screen, and create a diary by interfacing the frame screen with the notepad.
    Type: Application
    Filed: May 13, 2014
    Publication date: May 28, 2015
    Applicant: LG ELECTRONICS INC.
    Inventor: Junseok LEE
  • Publication number: 20150149178
    Abstract: Systems, methods, and computer-readable storage media for text-to-speech processing having an improved intonation. The system first receives text to be converted to speech, the text having a first segment and a second segment. The system then compares the text to a database of stored utterances, identifying in the database a first utterance corresponding to the first segment and determining an intonation of the first utterance. When the database does not contain a second utterance corresponding to the second segment, the system generates the speech corresponding to the text by combining the first utterance with a generated second utterance corresponding to the second segment, the generated second utterance having the intonation matching, or based on, the first utterance. These actions lead to an improved, smoother, more human-like synthetic speech output from the system.
    Type: Application
    Filed: November 22, 2013
    Publication date: May 28, 2015
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Yeon-Jun KIM, Mark Charles BEUTNAGEL, Alistair D. CONKIE, Taniya MISHRA
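The hybrid strategy in this abstract — reuse a stored utterance where one exists, and generate the missing segment with an intonation matched to the stored one — can be sketched like this. The `tts_generate` backend, the database layout, and the intonation tags are all hypothetical.

```python
def synthesize(segments, utterance_db):
    """Sketch of the hybrid approach: segments found in the utterance database
    are reused verbatim (keeping their recorded intonation); missing segments
    are generated with the intonation of the preceding stored utterance,
    smoothing the joint between recorded and synthetic speech."""
    def tts_generate(segment, intonation):
        # Stand-in for a real TTS call that accepts an intonation target.
        return f"synth({segment}|{intonation})"

    output, intonation = [], "neutral"
    for segment in segments:
        if segment in utterance_db:
            audio, intonation = utterance_db[segment]  # stored clip + its intonation
            output.append(audio)
        else:
            output.append(tts_generate(segment, intonation))  # match prior intonation
    return output
```

Carrying the intonation tag forward is what yields the "generated second utterance having the intonation matching, or based on, the first utterance" behavior the abstract claims.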
  • Publication number: 20150149179
    Abstract: Methods and systems are described herein for generating an audible presentation of a communication received from a remote server. A presentation of a media asset on a user equipment device is generated for a first user. A textual-based communication is received, at the user equipment device from the remote server. The textual-based communication is transmitted to the remote server by a second user and the remote server transmits the textual-based communication to the user equipment device responsive to determining that the second user is on a list of users associated with the first user. An engagement level of the first user with the user equipment device is determined. Responsive to determining that the engagement level does not exceed a threshold value, a presentation of the textual-based communication is generated in audible form.
    Type: Application
    Filed: November 25, 2013
    Publication date: May 28, 2015
    Applicant: United Video Properties, Inc.
    Inventor: William Korbecki
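The decision rule above — read the communication aloud when the viewer's engagement with the screen does not exceed a threshold — reduces to a small branch. The function name, the default threshold, and the `tts` callable are illustrative assumptions.

```python
def deliver_message(message, engagement_level, threshold=0.5, tts=None):
    """If the first user's engagement with the device does not exceed the
    threshold, present the second user's textual communication in audible
    form; otherwise show it as text. `tts` stands in for a real TTS call."""
    if engagement_level <= threshold:
        speak = tts or (lambda text: f"spoken:{text}")
        return ("audio", speak(message))
    return ("text", message)
```

A distracted viewer (low engagement) hears the friend's message; an attentive one sees it on screen.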
  • Publication number: 20150149181
    Abstract: Method and system for generating audio signals (9) representative of a text (3) to be converted, the method includes the steps of: providing a database (1) of acoustic units, identifying a list of pre-calculated expressions (10), and recording, for each pre-calculated expression, an acoustic frame (7) corresponding to it being pronounced, decomposing, by virtue of correlation calculations, each recorded acoustic frame into a sequenced table (5) including a series of acoustic unit references modulated by amplitude (?(i)A) and temporal (?(i)T) form factors, identifying in the text the pre-calculated expressions and decomposing the rest (12) into phonemes, inserting in place of each pre-calculated expression the corresponding sequenced table, and preparing a concatenation of acoustic units (19) according to the text to be converted.
    Type: Application
    Filed: July 2, 2013
    Publication date: May 28, 2015
    Inventor: Vincent Delahaye
  • Patent number: 9043213
    Abstract: A speech recognition method including the steps of receiving a speech input from a known speaker of a sequence of observations and determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model. The acoustic model has a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation and has been trained using first training data and adapted using second training data to said speaker. The speech recognition method also determines the likelihood of a sequence of observations occurring in a given language using a language model, combines the likelihoods determined by the acoustic model and the language model, and outputs a sequence of words identified from said speech input signal. The acoustic model is context-based for the speaker, the context-based information being contained in the model using a plurality of decision trees, and the structure of the decision trees is based on the second training data.
    Type: Grant
    Filed: January 26, 2011
    Date of Patent: May 26, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Byung Ha Chun
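The step of combining the acoustic-model and language-model likelihoods is conventionally done in the log domain with a language-model weight. This sketch shows that standard combination, not the patented decision-tree adaptation; the candidate representation and scoring callables are assumptions.

```python
def decode(candidates, acoustic_score, language_score, lm_weight=1.0):
    """Pick the word sequence maximizing the combined log-likelihood
    log P(O|W) + lm_weight * log P(W) — the usual way an acoustic model
    and a language model are combined to output a recognized sequence."""
    def combined(words):
        return acoustic_score(words) + lm_weight * language_score(words)
    return max(candidates, key=combined)
```

With `lm_weight > 1`, the language model dominates, which is common in practice because acoustic log-likelihoods are computed per frame and tend to swamp the LM term otherwise.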
  • Publication number: 20150142444
    Abstract: A method includes loading text content into at least one user device; applying at least one reading order to at least one text section of the text content to change a presentation order; converting the at least one text section to an audio output based upon the presentation order; and playing the audio output using the presentation order on the at least one user device.
    Type: Application
    Filed: November 15, 2013
    Publication date: May 21, 2015
    Applicant: International Business Machines Corporation
    Inventors: Gregory Jensen Boss, Andrew R. Jones, Charles Steven Lignafelt, Kevin C. McConnell, John Elbert Moore, JR.
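The four claimed steps — load sections, apply a reading order, convert in that presentation order, play — can be sketched as building an audio playback queue. The `synthesize` callable is a stand-in for a real TTS conversion.

```python
def build_playlist(sections, reading_order, synthesize=lambda t: f"audio<{t}>"):
    """Apply a reading order to the text sections, convert each section to
    audio in the resulting presentation order, and return the playback queue.
    `synthesize` stands in for an actual text-to-speech call."""
    return [synthesize(sections[i]) for i in reading_order]
```

For example, a user might push a sidebar ahead of the body text simply by reordering the indices.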
  • Patent number: 9037466
    Abstract: Methods, systems, and computer program products are provided for email administration for rendering email on a digital audio player. Embodiments include retrieving an email message; extracting text from the email message; creating a media file; and storing the extracted text of the email message as metadata associated with the media file. Embodiments may also include storing the media file on a digital audio player and displaying the metadata describing the media file, the metadata containing the extracted text of the email message.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 19, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
  • Patent number: 9037467
    Abstract: A method of complementing a spoken text. The method including receiving text data representative of a natural language text, receiving effect control data including at least one effect control record, each effect control record being associated with a respective location in the natural language text, receiving a stream of audio data, analyzing the stream of audio data for natural language utterances that correlate with the natural language text at a respective one of the locations, and outputting, in response to a determination by the analyzing that a natural language utterance in the stream of audio data correlates with a respective one of the locations, at least one effect control signal based on the effect control record associated with the respective location.
    Type: Grant
    Filed: December 18, 2012
    Date of Patent: May 19, 2015
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Thomas H. Gnech, Steffen Koenig, Oliver Petrik
  • Publication number: 20150134338
    Abstract: Provided are a foreign language learning apparatus and method using a function of reading an input sentence aloud via a Text-To-Speech (TTS) engine. The foreign language learning apparatus and method correct pronunciation through sentence input. The foreign language learning apparatus includes a sentence input unit for receiving a first sentence from a user; a linked letter detection unit for detecting at least one letter corresponding to at least one linking rule; a linked letter removal unit for removing the letter and generating a second sentence by inserting a linking code; a partial waveform generation unit for generating one or more partial waveforms using the TTS engine; an input waveform generation unit for converting a voice corresponding to the first sentence into an input waveform; and a matching degree calculation unit for calculating a matching degree and a partial matching degree.
    Type: Application
    Filed: November 22, 2013
    Publication date: May 14, 2015
    Applicant: WEAVERSMIND INC.
    Inventor: SUNGEUN JUNG
  • Publication number: 20150134339
    Abstract: A device may determine a representation of text that includes a first linguistic term associated with a first set of speech sounds and a second linguistic term associated with a second set of speech sounds. The device may determine a plurality of joins between the first set and the second set. A given join may be indicative of concatenating a first speech sound from the first set with a second speech sound from the second set. A given local cost of the given join may correspond to a weighted sum of individual costs. A given individual cost may be weighted based on a variability of the given individual cost in the plurality of joins. The device may provide a sequence of speech sounds indicative of a pronunciation of the text based on a minimization of a sum of local costs of adjacent speech sounds in the sequence.
    Type: Application
    Filed: November 22, 2013
    Publication date: May 14, 2015
    Applicant: Google Inc
    Inventors: Ioannis Agiomyrgiannakis, Ibrahim Badr
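The "minimization of a sum of local costs of adjacent speech sounds" described above is the classic unit-selection search, solvable with a Viterbi-style dynamic program. This sketch omits the variability-based cost weighting that the abstract adds; the data layout and cost callable are assumptions.

```python
def best_sequence(candidate_sets, join_cost):
    """Pick one speech unit from each candidate set so that the summed join
    cost of adjacent units is minimal (Viterbi-style dynamic programming).
    candidate_sets: list of lists of units; join_cost(u, v): local cost
    of concatenating unit u with unit v."""
    # best[u] = (cost of cheapest path ending in u, that path)
    best = {u: (0.0, [u]) for u in candidate_sets[0]}
    for next_set in candidate_sets[1:]:
        best = {v: min(((cost + join_cost(u, v), path + [v])
                        for u, (cost, path) in best.items()),
                       key=lambda t: t[0])
                for v in next_set}
    return min(best.values(), key=lambda t: t[0])[1]
```

The search is O(positions × units²) rather than exponential in the sequence length, which is what makes exhaustive join-cost minimization practical in a synthesizer.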
  • Publication number: 20150127348
    Abstract: Workflows are provided that enable documents to be distributed, assented to, and otherwise interacted with on an aural and/or oral basis. Such workflows can be implemented so as to allow a recipient to receive, understand, and interact with a document using conventional components such as the microphone and speaker provided by a telephone. For instance, in one embodiment a document originator may send a document to a recipient with a request for an electronic signature. The document may include an audio version of the document terms. The recipient can listen to the audio version of the document terms and record an electronic signature that represents assent to such terms. An electronic signature server can record the recipient's electronic signature and incorporate it into the document, such that it forms part of the electronic document just as a traditional handwritten signature forms part of a signed paper document.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Adobe Systems Incorporated
    Inventor: Benjamin D. Follis
  • Patent number: 9026445
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formant or articulatory text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: March 20, 2013
    Date of Patent: May 5, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 9020821
    Abstract: An acquisition unit analyzes a text, and acquires phonemic and prosodic information. An editing unit edits a part of the phonemic and prosodic information. A speech synthesis unit converts the phonemic and prosodic information before editing the part to a first speech waveform, and converts the phonemic and prosodic information after editing the part to a second speech waveform. A period calculation unit calculates a contrast period corresponding to the part in the first speech waveform and the second speech waveform. A speech generation unit generates an output waveform by connecting a first partial waveform and a second partial waveform. The first partial waveform contains the contrast period of the first speech waveform. The second partial waveform contains the contrast period of the second speech waveform.
    Type: Grant
    Filed: September 19, 2011
    Date of Patent: April 28, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Osamu Nishiyama