Specialized Model Patents (Class 704/266)
  • Publication number: 20140222421
    Abstract: A speech-synthesizing device includes a hierarchical prosodic module, a prosody-analyzing device, and a prosody-synthesizing unit. The hierarchical prosodic module generates at least a first hierarchical prosodic model. The prosody-analyzing device receives a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generates at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model. The prosody-synthesizing unit synthesizes a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag.
    Type: Application
    Filed: January 30, 2014
    Publication date: August 7, 2014
    Applicant: National Chiao Tung University
    Inventors: Sin-Horng Chen, Yih-Ru Wang, Chen-Yu Chiang, Chiao-Hua Hsieh
  • Patent number: 8781835
    Abstract: Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
    Type: Grant
    Filed: May 2, 2011
    Date of Patent: July 15, 2014
    Assignee: Nokia Corporation
    Inventors: Jani Kristian Nurminen, Hanna Margareeta Silen, Elina Helander
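The hybrid scheme summarized above (statistical models steering the selection of pre-recorded units, with badly matching units swapped for model-generated parameters) can be sketched roughly as follows. This is an illustrative Python reading, not the patented method: the Euclidean-distance test, the threshold, and all names are assumptions.

```python
# Rough sketch of the hybrid idea in the abstract above: pre-recorded units whose
# parameter vectors stray too far from the statistical model's predicted trajectory
# are treated as "bad" and replaced by the model-generated parameters. The distance
# measure and threshold here are illustrative assumptions, not the patent's criterion.
import numpy as np

def repair_unit_sequence(unit_params, model_params, max_deviation=2.0):
    """unit_params, model_params: (T, D) arrays of per-frame parameter vectors."""
    repaired = unit_params.copy()
    deviation = np.linalg.norm(unit_params - model_params, axis=1)
    bad = deviation > max_deviation        # flag badly matching units
    repaired[bad] = model_params[bad]      # fall back to statistically generated parameters
    return repaired, bad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = rng.normal(size=(10, 5))                      # model-predicted trajectory
    units = model + rng.normal(scale=0.3, size=(10, 5))   # selected pre-recorded units
    units[3] += 5.0                                       # one badly mismatched unit
    fixed, flags = repair_unit_sequence(units, model)
    print("replaced frames:", np.flatnonzero(flags))
```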
  • Patent number: 8751236
    Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of a given cluster of the one or more clusters as the first speech sound when the first speech sound and the second speech sound are concatenated.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: June 10, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis
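As a rough illustration of the clustering step described above, the sketch below groups candidate pronunciations by their concatenation-feature vectors with plain k-means and returns the member nearest each centroid as the cluster's representative. The feature extraction, the value of k, and all function names are assumptions, not the patent's procedure.

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Plain Lloyd's algorithm over concatenation-feature vectors (illustrative only)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(features[:, None] - centroids[None], axis=2), axis=1)
        centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
    return centroids, labels

def cluster_representatives(features, k=3):
    """Return, for each cluster, the index of the member closest to its centroid."""
    centroids, labels = kmeans(features, k)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dists = np.linalg.norm(features[members] - centroids[j], axis=1)
        reps.append(int(members[np.argmin(dists)]))   # representative speech sound per cluster
    return reps, labels
```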
  • Patent number: 8751239
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: June 10, 2014
    Assignee: Core Wireless Licensing, S.a.r.l.
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
  • Publication number: 20140142946
    Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then using Laguerre functions to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Application
    Filed: September 24, 2012
    Publication date: May 22, 2014
    Inventor: Chengjun Julian Chen
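The phase-reconstruction step in the abstract above (amplitude spectrum to phase spectrum via the Kramers-Kronig relations) is, for a minimum-phase signal, equivalent to the standard cepstral/Hilbert-transform construction sketched below. The patent's Laguerre-function parameterization is not reproduced here; this is only an assumed minimal illustration of recovering a phase spectrum from an amplitude spectrum.

```python
import numpy as np

def minimum_phase_from_amplitude(amplitude):
    """Recover a minimum-phase phase spectrum from a full-length (even N) magnitude
    spectrum via the real cepstrum; this is the Hilbert-transform/Kramers-Kronig
    route in its usual discrete form."""
    n = len(amplitude)
    log_mag = np.log(np.maximum(amplitude, 1e-10))
    cepstrum = np.fft.ifft(log_mag).real
    folded = np.zeros(n)                      # fold to impose the minimum-phase condition
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]
    folded[n // 2] = cepstrum[n // 2]
    return np.imag(np.fft.fft(folded))        # imaginary part of the log spectrum = phase

def frame_to_waveform(amplitude):
    """Combine the given amplitude spectrum with the reconstructed phase and invert."""
    phase = minimum_phase_from_amplitude(amplitude)
    return np.fft.ifft(amplitude * np.exp(1j * phase)).real
```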
  • Patent number: 8731933
    Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.
    Type: Grant
    Filed: April 10, 2013
    Date of Patent: May 20, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
  • Patent number: 8719030
    Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then using Laguerre functions to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: May 6, 2014
    Inventor: Chengjun Julian Chen
  • Patent number: 8706497
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: April 22, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
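A much-simplified sketch of codebook-based bandwidth extension along the lines of the abstract above: up-sample the narrow-band frame, pick the wide-band candidate with minimum waveform distortion against it, and add only that candidate's out-of-band content. The filter order, cutoff, and the direct use of waveform candidates (rather than the patent's synthesis filter over phoneme and sound-source codebooks) are assumptions.

```python
import numpy as np
from scipy.signal import resample_poly, butter, filtfilt

def extend_bandwidth(narrow_frame, wideband_candidates, fs_wide=16000, cutoff=3800.0):
    """narrow_frame: 8 kHz samples; wideband_candidates: list of 16 kHz candidate frames."""
    up = resample_poly(narrow_frame, 2, 1)            # up-sample narrow-band input to 16 kHz
    n = len(up)
    distortions = [np.sum((cand[:n] - up) ** 2) for cand in wideband_candidates]
    best = wideband_candidates[int(np.argmin(distortions))][:n]
    b, a = butter(6, cutoff / (fs_wide / 2), btype="highpass")
    high_band = filtfilt(b, a, best)                  # keep only the missing high band
    return up + high_band                             # band synthesis: narrow band + high band
```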
  • Patent number: 8655664
    Abstract: According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: February 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Tachibana, Gou Hirabayashi, Takehiko Kagoshima
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8639511
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: January 28, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
  • Patent number: 8630857
    Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments, and includes a prosody change amount calculation unit that calculates prosody change amount of each candidate segment based on prosody information of candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount, a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: January 14, 2014
    Assignee: NEC Corporation
    Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
  • Patent number: 8630971
    Abstract: Systems, devices, and methods for using Multi-Pattern Viterbi Algorithm for joint decoding of multiple patterns are disclosed. An exemplary method may receive a plurality of sets of time-sequential signal observations for each of a number K of signal repetitions. Further, each set of signal observations is associated with a respective dimension of a K-dimensional time grid having time-indexed points. Moreover, at each of a plurality of the time-indexed points, a state cost metric is calculated with a processor for each state in a set of states of a hidden Markov model (HMM). In addition, for each state in the set of states and for a given time-indexed point, the state cost metric calculation provides a most-likely predecessor state and a corresponding most-likely predecessor time-indexed point. The exemplary method may also determine a sequence of states using the calculated state cost metrics and determine a corresponding cumulative probability measure for the HMM.
    Type: Grant
    Filed: January 5, 2010
    Date of Patent: January 14, 2014
    Assignee: Indian Institute of Science
    Inventors: Nishanth Ulhas Nair, Thippur Venkatanarasaiah Sreenivas
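The patent generalizes Viterbi decoding to a K-dimensional grid over repeated utterances; the recursion at each grid point (best predecessor plus a state cost metric) is the same as in the ordinary one-dimensional Viterbi algorithm sketched below, which is given only as an assumed baseline, not the multi-pattern extension itself.

```python
import numpy as np

def viterbi(log_trans, log_emit):
    """log_trans: (S, S) log transition matrix; log_emit: (T, S) log emission scores.
    Returns the most likely state path and its log score (uniform initial prior assumed)."""
    T, S = log_emit.shape
    cost = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    cost[0] = log_emit[0]
    for t in range(1, T):
        scores = cost[t - 1][:, None] + log_trans          # predecessor cost + transition
        back[t] = np.argmax(scores, axis=0)                # most likely predecessor per state
        cost[t] = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(cost[-1]))]
    for t in range(T - 1, 0, -1):                          # trace back through predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(cost[-1]))
```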
  • Patent number: 8600753
    Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech related characteristics for those prompts. A message is parsed to determine if any message portions have been recorded previously. If so, then speech related characteristics for those portions are retrieved. The arrangement generates speech related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair Conkie
  • Patent number: 8583437
    Abstract: A service architecture for providing textual information and related speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with extracted context information and for downloading the incremental database into the user terminal; a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine including the basic and the incremental databases of speech waveforms.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: November 12, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
  • Patent number: 8571849
    Abstract: Disclosed herein are systems, methods, and computer readable-media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step may be added of synthesizing speech in the target language based on the prosody enriched target language output text. An automatic prosody labeler can generate pitch accent labels. An automatic prosody labeler can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: October 29, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
  • Patent number: 8566099
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences, tabulating the set of triphone sequences using a plurality of contexts, where each context specific triphone sequence of the plurality of context specific triphone sequences has a top-N set of triphone units made of the triphone units having the lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context specific triphone sequences is selected based on the context. Input text is then synthesized using the context specific triphone sequence.
    Type: Grant
    Filed: July 16, 2012
    Date of Patent: October 22, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
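The tabulation step described above amounts to precomputing, per (triphone, context) pair, the N units with the lowest target cost so that synthesis becomes a table lookup. The sketch below assumes a caller-supplied target_cost function and illustrative data shapes; it is not the patented procedure.

```python
from collections import defaultdict

def build_triphone_table(units, contexts, target_cost, n_best=5):
    """units: iterable of (triphone, unit_id); target_cost(unit_id, context) -> float.
    Returns {(triphone, context): [n_best unit_ids with the lowest target cost]}."""
    by_triphone = defaultdict(list)
    for triphone, unit_id in units:
        by_triphone[triphone].append(unit_id)
    table = {}
    for context in contexts:
        for triphone, ids in by_triphone.items():
            ranked = sorted(ids, key=lambda u: target_cost(u, context))
            table[(triphone, context)] = ranked[:n_best]   # cache only the top N units
    return table

def lookup(table, triphone, context):
    """At synthesis time the context-specific candidates are a dictionary lookup."""
    return table.get((triphone, context), [])
```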
  • Patent number: 8560317
    Abstract: A vocabulary dictionary storing unit for storing a plurality of words in advance, a vocabulary dictionary managing unit for extracting recognition target words, a matching unit for calculating a degree of matching with the recognition target words based on an accepted voice, a result output unit for outputting, as a recognition result, a word having a best score from a result of calculating the degree of matching, and an extraction criterion information managing unit for changing extraction criterion information according to a result of monitoring by a monitor control unit are provided. The vocabulary dictionary storing unit further includes a scale information storing unit for storing scale information serving as a scale at the time of extracting the recognition target words, and an extraction criterion information storing unit for storing extraction criterion information indicating a criterion of the recognition target words at the time of extracting the recognition target words.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: October 15, 2013
    Assignee: Fujitsu Limited
    Inventor: Kenji Abe
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8543404
    Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: September 24, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Victor S. Moore, Wendi L. Nusbickel
  • Patent number: 8537164
    Abstract: Systems and methods are described, which create a mapping from a space of a source object (e.g., source facial expressions) to a space of a target object (e.g., target facial expressions). In certain implementations, the mapping is learned based on a training set composed of corresponding shapes (e.g. facial expressions) in each space. The user can create the training set by selecting expressions from, for example, captured source performance data, and by sculpting corresponding target expressions. Additional target shapes (e.g., target facial expressions) can be interpolated and extrapolated from the shapes in the training set to generate corresponding shapes for potential source shapes (e.g., facial expressions).
    Type: Grant
    Filed: October 10, 2011
    Date of Patent: September 17, 2013
    Assignee: Lucasfilm Entertainment Company Ltd.
    Inventors: Frederic P. Pighin, Cary Phillips, Steve Sullivan
  • Patent number: 8510112
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann Syrdal
  • Patent number: 8494849
    Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
    Type: Grant
    Filed: June 20, 2005
    Date of Patent: July 23, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
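A minimal sketch of the selective-transmission idea above: compute a per-frame activity value (a simple energy test here, purely as an assumption), group frames into multiframes, and mark a multiframe for transmission only when enough of its frames are active.

```python
import numpy as np

def multiframe_markers(signal, frame_len=160, frames_per_multiframe=10,
                       energy_threshold=1e-3, min_active_frames=3):
    """Return one transmit/skip marker per multiframe. The energy test stands in for
    the patent's voice activity value; thresholds are illustrative."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    active = (frames ** 2).mean(axis=1) > energy_threshold       # per-frame activity value
    markers = []
    for start in range(0, n_frames, frames_per_multiframe):
        count = int(active[start:start + frames_per_multiframe].sum())
        markers.append(count >= min_active_frames)               # voice activity marker
    return markers
```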
  • Patent number: 8494854
    Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance using optimized challenge items selected for their discrimination capability to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 23, 2013
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8489399
    Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 16, 2013
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8478595
    Abstract: A fundamental frequency pattern generation apparatus includes a first storage including representative vectors each corresponding to a prosodic control unit and having a section for changing the number of phonemes, a second storage unit including a rule to select a vector corresponding to an input context, a selection unit configured to select a vector from the representative vectors by applying the rule to the context and output the selected vector, a calculation unit configured to calculate an expansion/contraction ratio of the section of the selected vector in a time-axis direction based on a designated value for a specific feature amount related to a length of a fundamental frequency pattern to be generated, the designated value of the feature amount being required of the fundamental frequency pattern to be generated, and an expansion/contraction unit configured to expand/contract the selected vector based on the expansion/contraction ratio to generate the fundamental frequency pattern.
    Type: Grant
    Filed: September 5, 2008
    Date of Patent: July 2, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Nobuaki Mizutani
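The expansion/contraction of a representative vector's elastic section can be pictured as resampling that section to a target length, as in the assumed sketch below; the patent's selection rules and the way the ratio is derived from the designated feature value are not modeled here.

```python
import numpy as np

def expand_contract(contour, section, target_len):
    """contour: representative fundamental-frequency vector; section: (start, end)
    indices of the elastic region; target_len: its desired length after scaling."""
    start, end = section
    head, elastic, tail = contour[:start], contour[start:end], contour[end:]
    x_old = np.linspace(0.0, 1.0, num=len(elastic))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    stretched = np.interp(x_new, x_old, elastic)      # resample only the elastic section
    return np.concatenate([head, stretched, tail])
```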
  • Patent number: 8468020
    Abstract: An apparatus for synthesizing a speech including a waveform memory that stores a plurality of speech unit waveforms, an information memory that correspondingly stores speech unit information and an address of each of the speech unit waveforms, a selector that selects a speech unit sequence corresponding to the input phoneme sequence by referring to the speech unit information, a speech unit waveform acquisition unit that acquires a speech unit waveform corresponding to each speech unit of the speech unit sequence from the waveform memory by referring to the address, a speech unit concatenation unit that generates the speech by concatenating the speech unit waveform acquired.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 18, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
  • Patent number: 8438032
    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
    Type: Grant
    Filed: January 9, 2007
    Date of Patent: May 7, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Raimo Bakis, Ellen M. Eide, Roberto Pieraccini, Maria E. Smith, Jie Zeng
  • Patent number: 8428952
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: June 12, 2012
    Date of Patent: April 23, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 8423366
    Abstract: A method includes receiving, by a system, a voice recording associated with a user, transcribing, the voice recording into text that includes a group of words, and storing an association between a portion of each respective word and a corresponding portion of the voice recording. The corresponding portion of the voice recording is the portion of the voice recording from which the portion of the respective word was transcribed. The method may also include determining a modification to a speech synthesis voice associated with the user based at least in part on the association.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: April 16, 2013
    Assignee: Google Inc.
    Inventors: Marcus Alexander Foster, Richard Zarek Cohen
  • Patent number: 8412529
    Abstract: An approach is provided for enhancing verbal communication sessions. A verbal component of a communication session is converted into textual information. The converted textual information is scanned for a text string to trigger an application. The application is invoked to provide supplemental information about the textual information or to perform an action in response to the textual information for or on behalf of a party of the communication session. The supplemental information or a confirmation of the action is transmitted to the party.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: April 2, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Martin W. McKee, Paul T. Schultz, Robert A. Sartini
  • Patent number: 8407054
    Abstract: A speech synthesis device is provided with: a central segment selection unit for selecting a central segment from among a plurality of speech segments; a prosody generation unit for generating prosody information based on the central segment; a non-central segment selection unit for selecting a non-central segment, which is a segment outside of a central segment section, based on the central segment and the prosody information; and a waveform generation unit for generating a synthesized speech waveform based on the prosody information, the central segment, and the non-central segment. The speech synthesis device first selects a central segment that forms a basis for prosody generation and generates prosody information based on the central segment so that it is possible to sufficiently reduce both concatenation distortion and sound quality degradation accompanying prosody control in the section of the central segment.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: March 26, 2013
    Assignee: NEC Corporation
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Patent number: 8374873
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: August 11, 2009
    Date of Patent: February 12, 2013
    Assignee: Morphism, LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8370149
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: February 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Ryuki Tachibana, Masafumi Nishimura
  • Patent number: 8352272
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: January 8, 2013
    Assignee: Apple Inc.
    Inventors: Matthew Rogers, Kim Silverman, Devang Naik, Kevin Lenzo, Benjamin Rottler
  • Patent number: 8340965
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Grant
    Filed: December 2, 2009
    Date of Patent: December 25, 2012
    Assignee: Microsoft Corporation
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Patent number: 8315872
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: November 20, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
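Because, as the abstract notes, fewer than 1% of possible unit pairs occur in practice, concatenation costs can be precomputed for observed pairs and computed lazily otherwise. The cache layout and function names below are assumptions for illustration.

```python
def build_concat_cache(observed_pairs, concat_cost):
    """Pre-compute concatenation costs only for unit pairs seen in real speech."""
    return {(a, b): concat_cost(a, b) for a, b in observed_pairs}

def cached_concat_cost(cache, a, b, concat_cost):
    """Serve cached costs; compute and remember the rare unseen pair on demand."""
    key = (a, b)
    if key not in cache:
        cache[key] = concat_cost(a, b)
    return cache[key]
```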
  • Patent number: 8315871
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.
    Type: Grant
    Filed: June 4, 2009
    Date of Patent: November 20, 2012
    Assignee: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Patent number: 8275621
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: May 18, 2011
    Date of Patent: September 25, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Neal J. Alewine, Eric William Janke, Paul Sharp, Robert Sicconi
  • Patent number: 8255221
    Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create his own multi-voices interview from his computer. The method allows the user to enter a set of questions from a text file using a text editor. (Answers may also be entered from a text file although this is not the more preferred embodiment.) For each question, the user may select one particular interviewer voice among a plurality of predefined interviewer voices, and by using a text-to-speech module in a text-to-speech server, each question is converted into an audio question having the selected interviewer voice. Then, the user preferably records answers to each audio question using a telephone. And a questions/answers sequence in a podcast compliant format is generated.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
  • Patent number: 8249869
    Abstract: The method is suitable for dysorthographic or partially sighted persons, to facilitate the semantic, syntactic and/or lexical correction of an erroneous expression in a digital text input by a user. The method comprises the sequence of: a step (74) of transforming the digital text into a digital voice message, in which the graphemes of the erroneous textual expression are converted into phoneme(s) constituting an intelligible vocal expression, then a step (78) of processing the digital voice message obtained at the end of the transformation step (74), in which the phoneme or phonemes constituting the intelligible vocal expression are converted into grapheme(s) constituting a corrected textual expression of the erroneous textual expression, with the aid of pre-established writing rules.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: August 21, 2012
    Assignee: Logolexie
    Inventors: Gilles Vessiere, Joël Bachelerie
  • Patent number: 8249874
    Abstract: Speech is synthesized for a given text by determining a sequence of phonetic components based on the text, determining a sequence of target phonetic elements associated with the phonetic components, determining a sequence of target event types associated with the phonetic components and determining a sequence of speech units from a plurality of stored speech unit candidates by use of a cost function. The cost function comprises a unit cost, a concatenation cost, and an event type cost for each speech unit in the sequence of speech units. The unit cost of a speech unit is determined with respect to the corresponding target phonetic element, while the concatenation cost of a speech unit is determined with respect to adjacent speech units and the event type cost of each speech unit is determined with respect to the corresponding target event type.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: August 21, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Gregor Moehler, Andreas Zehnpfenning
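The cost function described above (unit cost plus concatenation cost plus event-type cost, minimized over a lattice of stored candidates) maps naturally onto a dynamic-programming search. The sketch below assumes caller-supplied cost functions and candidate lists; it illustrates the minimization, not the patented system.

```python
import numpy as np

def select_units(candidates, targets, event_types, unit_cost, concat_cost, event_cost):
    """candidates: per-position lists of stored speech-unit candidates.
    Minimizes unit cost + concatenation cost + event-type cost by dynamic programming."""
    best = [unit_cost(u, targets[0]) + event_cost(u, event_types[0])
            for u in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        new_best, new_back = [], []
        for u in candidates[t]:
            local = unit_cost(u, targets[t]) + event_cost(u, event_types[t])
            scores = [best[j] + concat_cost(candidates[t - 1][j], u)
                      for j in range(len(candidates[t - 1]))]
            j_min = int(np.argmin(scores))
            new_best.append(scores[j_min] + local)   # cheapest way to reach candidate u
            new_back.append(j_min)                   # remember its best predecessor
        best, back = new_best, back + [new_back]
    idx = int(np.argmin(best))
    path = [idx]
    for t in range(len(candidates) - 1, 0, -1):      # trace back the minimum-cost sequence
        idx = back[t - 1][idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]
```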
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
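One way to read the time-mediated averaging above is as a soft, probability-weighted mixture of the gender-dependent GMM likelihoods, sketched below with diagonal-covariance components. The probability p_female, the diagonal form, and the function names are assumptions.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Diagonal-covariance GMM log-likelihood of one feature vector x.
    weights: (K,), means and variances: (K, D)."""
    per_component = -0.5 * (np.log(2.0 * np.pi * variances)
                            + (x - means) ** 2 / variances).sum(axis=1)
    return float(np.log(np.sum(np.exp(np.log(weights) + per_component))))

def gender_averaged_likelihood(x, female_gmm, male_gmm, p_female):
    """Probability-weighted average of the class-dependent (female/male) models."""
    lf = gmm_log_likelihood(x, *female_gmm)
    lm = gmm_log_likelihood(x, *male_gmm)
    return p_female * np.exp(lf) + (1.0 - p_female) * np.exp(lm)
```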
  • Patent number: 8224645
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text, selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones and if the candidate phonemes are available in the triphone unit selection database, applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, which single phonemes are used in synthesis independent of a triphone structure. The method also includes synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the selected single phonemes for synthesis from the single phoneme approach.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: July 17, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
  • Patent number: 8224647
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: October 3, 2005
    Date of Patent: July 17, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 8214203
    Abstract: A method and an apparatus for recovering a line spectrum pair (LSP) parameter of a spectrum region when frame loss occurs during speech decoding and a speech decoding apparatus adopting the same are provided. The method of recovering an LSP parameter in speech decoding includes: if it is determined that a received speech packet has an erased frame, converting an LSP parameter of a previous good frame (PGF) of the erased frame or LSP parameters of the PGF and a next good frame (NGF) of the erased frame into a spectrum region and obtaining a spectrum envelope of the PGF or spectrum envelopes of the PGF and NGF; recovering a spectrum envelope of the erased frame using the spectrum envelope of the PGF or the spectrum envelopes of the PGF and NGF; and converting the recovered spectrum envelope of the erased frame into an LSP parameter of the erased frame.
    Type: Grant
    Filed: March 25, 2010
    Date of Patent: July 3, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hosang Sung, Seungho Choi, Kihyun Choo
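The recovery step can be approximated, for illustration, by interpolating between the parameters of the previous and next good frames; the patent performs the interpolation on spectrum envelopes rather than directly on the LSP frequencies, so the sketch below is a simplified stand-in.

```python
import numpy as np

def recover_lsp(lsp_prev, lsp_next=None, alpha=0.5):
    """Recover LSP parameters for an erased frame from the previous good frame (and,
    if available, the next good frame). lsp_prev/lsp_next: ascending LSP frequencies.
    Interpolating the LSPs directly is a simplification used here only for illustration."""
    if lsp_next is None:
        recovered = np.asarray(lsp_prev, dtype=float).copy()   # repeat the previous frame
    else:
        recovered = (1.0 - alpha) * np.asarray(lsp_prev) + alpha * np.asarray(lsp_next)
    return np.sort(recovered)                                  # preserve LSP ordering
```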
  • Patent number: 8175865
    Abstract: A method of text script generation for a corpus-based text-to-speech system includes searching in a source corpus having L sentences, selecting N sentences with a best integrated efficiency as N best cases, and setting iteration k to be 1; for each case n of the N best cases, selecting Mk+1 best sentences with the best integrated efficiency from the unselected sentences in the source corpus; keeping N best cases out of the total unselected sentences for the next iteration, and increasing iteration k by 1; and if a termination criterion is reached, setting the best case in the N traced cases as the text script, otherwise, returning to the (k+1)th iteration of searching in the unselected sentences for the (k+1)th sentence; wherein the best integrated efficiency depends on a function of combining the covering rate of the synthesis unit type, the hit rate of the synthesis unit type, and the text script size.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: May 8, 2012
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Jing-Yi Huang
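The iterative search described above is, at heart, a coverage-versus-size trade-off over candidate sentences. The sketch below collapses the patent's N-best beam and integrated-efficiency function into a single greedy pass, so it should be read only as an assumed illustration of the selection criterion.

```python
def select_script(sentences, all_unit_types, max_sentences=100):
    """sentences: {sentence_id: set of synthesis-unit types that sentence covers}.
    Greedy coverage-per-size selection; returns the chosen ids and the covering rate."""
    covered, script = set(), []
    remaining = dict(sentences)
    while remaining and len(script) < max_sentences:
        sid, units = max(remaining.items(),
                         key=lambda item: len(item[1] - covered) / max(len(item[1]), 1))
        if not units - covered:
            break                                   # termination: nothing new to cover
        script.append(sid)
        covered |= units
        del remaining[sid]
    covering_rate = len(covered) / max(len(all_unit_types), 1)
    return script, covering_rate
```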
  • Patent number: 8160880
    Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image, representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects to determine whether the point belongs in the class of objects corresponding to the centroid.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: April 17, 2012
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
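Nearest-centroid classification of an N-dimensional feature vector, as described above, reduces to comparing distances to stored class centroids; the optional acceptance threshold below is an added assumption.

```python
import numpy as np

def classify(feature_vector, centroids, max_distance=None):
    """centroids: {class_name: centroid array}. Returns the class whose centroid lies
    nearest to the N-dimensional feature vector, or None if the nearest centroid is
    beyond an optional acceptance threshold."""
    names = list(centroids)
    dists = np.array([np.linalg.norm(feature_vector - centroids[name]) for name in names])
    best = int(np.argmin(dists))
    if max_distance is not None and dists[best] > max_distance:
        return None
    return names[best]
```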
  • Patent number: 8160882
    Abstract: A temporary child set is generated. An elastic ratio of an elastic section of a model pattern is calculated. A temporary typical pattern of the set is generated by combining the pattern belonging to the set with the model pattern having the elastic pattern expanded or contracted. A distortion between the temporary typical pattern of the set and the pattern belonging to the set is calculated, and a child set is determined as the set when the distortion is below a threshold. A typical pattern as the temporary typical pattern of the child set is stored with a classification rule as the classification item of the context of the pattern belonging to the child set.
    Type: Grant
    Filed: January 23, 2009
    Date of Patent: April 17, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Nobuaki Mizutani