Specialized Model Patents (Class 704/266)
  • Publication number: 20140222421
    Abstract: A speech-synthesizing device includes a hierarchical prosodic module, a prosody-analyzing device, and a prosody-synthesizing unit. The hierarchical prosodic module generates at least a first hierarchical prosodic model. The prosody-analyzing device receives a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generates at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model. The prosody-synthesizing unit synthesizes a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag.
    Type: Application
    Filed: January 30, 2014
    Publication date: August 7, 2014
    Applicant: National Chiao Tung University
    Inventors: Sin-Horng Chen, Yih-Ru Wang, Chen-Yu Chiang, Chiao-Hua Hsieh
  • Patent number: 8781835
    Abstract: Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
    Type: Grant
    Filed: May 2, 2011
    Date of Patent: July 15, 2014
    Assignee: Nokia Corporation
    Inventors: Jani Kristian Nurminen, Hanna Margareeta Silen, Elina Helander
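The hybrid scheme summarized above (statistical models steering the selection of pre-recorded units, with badly matching units swapped for model-generated parameters) can be sketched roughly as follows. This is an illustrative Python reading, not the patented method: the Euclidean-distance test, the threshold, and all names are assumptions.

```python
# Rough sketch of the hybrid idea in the abstract above: pre-recorded units whose
# parameter vectors stray too far from the statistical model's predicted trajectory
# are treated as "bad" and replaced by the model-generated parameters. The distance
# measure and threshold here are illustrative assumptions, not the patent's criterion.
import numpy as np

def repair_unit_sequence(unit_params, model_params, max_deviation=2.0):
    """unit_params, model_params: (T, D) arrays of per-frame parameter vectors."""
    repaired = unit_params.copy()
    deviation = np.linalg.norm(unit_params - model_params, axis=1)
    bad = deviation > max_deviation        # flag badly matching units
    repaired[bad] = model_params[bad]      # fall back to statistically generated parameters
    return repaired, bad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = rng.normal(size=(10, 5))                      # model-predicted trajectory
    units = model + rng.normal(scale=0.3, size=(10, 5))   # selected pre-recorded units
    units[3] += 5.0                                       # one badly mismatched unit
    fixed, flags = repair_unit_sequence(units, model)
    print("replaced frames:", np.flatnonzero(flags))
```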
  • Patent number: 8751236
    Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of a given cluster of the one or more clusters as the first speech sound when the first speech sound and the second speech sound are concatenated.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: June 10, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis
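As a rough illustration of the clustering step described above, the sketch below groups candidate pronunciations by their concatenation-feature vectors with plain k-means and returns the member nearest each centroid as the cluster's representative. The feature extraction, the value of k, and all function names are assumptions, not the patent's procedure.

```python
import numpy as np

def kmeans(features, k, iters=50, seed=0):
    """Plain Lloyd's algorithm over concatenation-feature vectors (illustrative only)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(features[:, None] - centroids[None], axis=2), axis=1)
        centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)])
    return centroids, labels

def cluster_representatives(features, k=3):
    """Return, for each cluster, the index of the member closest to its centroid."""
    centroids, labels = kmeans(features, k)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dists = np.linalg.norm(features[members] - centroids[j], axis=1)
        reps.append(int(members[np.argmin(dists)]))   # representative speech sound per cluster
    return reps, labels
```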
  • Patent number: 8751239
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: June 10, 2014
    Assignee: Core Wireless Licensing, S.a.r.l.
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
  • Publication number: 20140142946
    Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then using Laguerre functions to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Application
    Filed: September 24, 2012
    Publication date: May 22, 2014
    Inventor: Chengjun Julian Chen
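The phase-reconstruction step in the abstract above (amplitude spectrum to phase spectrum via the Kramers-Kronig relations) is, for a minimum-phase signal, equivalent to the standard cepstral/Hilbert-transform construction sketched below. The patent's Laguerre-function parameterization is not reproduced here; this is only an assumed minimal illustration of recovering a phase spectrum from an amplitude spectrum.

```python
import numpy as np

def minimum_phase_from_amplitude(amplitude):
    """Recover a minimum-phase phase spectrum from a full-length (even N) magnitude
    spectrum via the real cepstrum; this is the Hilbert-transform/Kramers-Kronig
    route in its usual discrete form."""
    n = len(amplitude)
    log_mag = np.log(np.maximum(amplitude, 1e-10))
    cepstrum = np.fft.ifft(log_mag).real
    folded = np.zeros(n)                      # fold to impose the minimum-phase condition
    folded[0] = cepstrum[0]
    folded[1:n // 2] = 2.0 * cepstrum[1:n // 2]
    folded[n // 2] = cepstrum[n // 2]
    return np.imag(np.fft.fft(folded))        # imaginary part of the log spectrum = phase

def frame_to_waveform(amplitude):
    """Combine the given amplitude spectrum with the reconstructed phase and invert."""
    phase = minimum_phase_from_amplitude(amplitude)
    return np.fft.ifft(amplitude * np.exp(1j * phase)).real
```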
  • Patent number: 8731933
    Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.
    Type: Grant
    Filed: April 10, 2013
    Date of Patent: May 20, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
  • Patent number: 8719030
    Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then using Laguerre functions to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: May 6, 2014
    Inventor: Chengjun Julian Chen
  • Patent number: 8706497
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: April 22, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
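A much-simplified sketch of codebook-based bandwidth extension along the lines of the abstract above: up-sample the narrow-band frame, pick the wide-band candidate with minimum waveform distortion against it, and add only that candidate's out-of-band content. The filter order, cutoff, and the direct use of waveform candidates (rather than the patent's synthesis filter over phoneme and sound-source codebooks) are assumptions.

```python
import numpy as np
from scipy.signal import resample_poly, butter, filtfilt

def extend_bandwidth(narrow_frame, wideband_candidates, fs_wide=16000, cutoff=3800.0):
    """narrow_frame: 8 kHz samples; wideband_candidates: list of 16 kHz candidate frames."""
    up = resample_poly(narrow_frame, 2, 1)            # up-sample narrow-band input to 16 kHz
    n = len(up)
    distortions = [np.sum((cand[:n] - up) ** 2) for cand in wideband_candidates]
    best = wideband_candidates[int(np.argmin(distortions))][:n]
    b, a = butter(6, cutoff / (fs_wide / 2), btype="highpass")
    high_band = filtfilt(b, a, best)                  # keep only the missing high band
    return up + high_band                             # band synthesis: narrow band + high band
```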
  • Patent number: 8655664
    Abstract: According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: February 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Tachibana, Gou Hirabayashi, Takehiko Kagoshima
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8639511
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: January 28, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
  • Patent number: 8630857
    Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments, and includes a prosody change amount calculation unit that calculates prosody change amount of each candidate segment based on prosody information of candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount, a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: January 14, 2014
    Assignee: NEC Corporation
    Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
  • Patent number: 8630971
    Abstract: Systems, devices, and methods for using Multi-Pattern Viterbi Algorithm for joint decoding of multiple patterns are disclosed. An exemplary method may receive a plurality of sets of time-sequential signal observations for each of a number K of signal repetitions. Further, each set of signal observations is associated with a respective dimension of a K-dimensional time grid having time-indexed points. Moreover, at each of a plurality of the time-indexed points, a state cost metric is calculated with a processor for each state in a set of states of a hidden Markov model (HMM). In addition, for each state in the set of states and for a given time-indexed point, the state cost metric calculation provides a most-likely predecessor state and a corresponding most-likely predecessor time-indexed point. The exemplary method may also determine a sequence of states using the calculated state cost metrics and determine a corresponding cumulative probability measure for the HMM.
    Type: Grant
    Filed: January 5, 2010
    Date of Patent: January 14, 2014
    Assignee: Indian Institute of Science
    Inventors: Nishanth Ulhas Nair, Thippur Venkatanarasaiah Sreenivas
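The patent generalizes Viterbi decoding to a K-dimensional grid over repeated utterances; the recursion at each grid point (best predecessor plus a state cost metric) is the same as in the ordinary one-dimensional Viterbi algorithm sketched below, which is given only as an assumed baseline, not the multi-pattern extension itself.

```python
import numpy as np

def viterbi(log_trans, log_emit):
    """log_trans: (S, S) log transition matrix; log_emit: (T, S) log emission scores.
    Returns the most likely state path and its log score (uniform initial prior assumed)."""
    T, S = log_emit.shape
    cost = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    cost[0] = log_emit[0]
    for t in range(1, T):
        scores = cost[t - 1][:, None] + log_trans          # predecessor cost + transition
        back[t] = np.argmax(scores, axis=0)                # most likely predecessor per state
        cost[t] = scores[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(cost[-1]))]
    for t in range(T - 1, 0, -1):                          # trace back through predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(cost[-1]))
```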
  • Patent number: 8600753
    Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech related characteristics for those prompts. A message is parsed to determine if any message portions have been recorded previously. If so, then speech related characteristics for those portions are retrieved. The arrangement generates speech related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair Conkie
  • Patent number: 8583437
    Abstract: A service architecture for providing textual information and related speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with extracted context information and for downloading the incremental database into the user terminal; a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine including the basic and the incremental databases of speech waveforms.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: November 12, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
  • Patent number: 8571849
    Abstract: Disclosed herein are systems, methods, and computer readable-media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step may be added of synthesizing speech in the target language based on the prosody enriched target language output text. An automatic prosody labeler can generate pitch accent labels. An automatic prosody labeler can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: October 29, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
  • Patent number: 8566099
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences, tabulating the set of triphone sequences using a plurality of contexts, where each context specific triphone sequence of the plurality of context specific triphone sequences has a top-N set of triphone units made of the triphone units having the lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context specific triphone sequences is selected based on the context. Input text is then synthesized using the context specific triphone sequence.
    Type: Grant
    Filed: July 16, 2012
    Date of Patent: October 22, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
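The tabulation step described above amounts to precomputing, per (triphone, context) pair, the N units with the lowest target cost so that synthesis becomes a table lookup. The sketch below assumes a caller-supplied target_cost function and illustrative data shapes; it is not the patented procedure.

```python
from collections import defaultdict

def build_triphone_table(units, contexts, target_cost, n_best=5):
    """units: iterable of (triphone, unit_id); target_cost(unit_id, context) -> float.
    Returns {(triphone, context): [n_best unit_ids with the lowest target cost]}."""
    by_triphone = defaultdict(list)
    for triphone, unit_id in units:
        by_triphone[triphone].append(unit_id)
    table = {}
    for context in contexts:
        for triphone, ids in by_triphone.items():
            ranked = sorted(ids, key=lambda u: target_cost(u, context))
            table[(triphone, context)] = ranked[:n_best]   # cache only the top N units
    return table

def lookup(table, triphone, context):
    """At synthesis time the context-specific candidates are a dictionary lookup."""
    return table.get((triphone, context), [])
```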
  • Patent number: 8560317
    Abstract: A vocabulary dictionary storing unit for storing a plurality of words in advance, a vocabulary dictionary managing unit for extracting recognition target words, a matching unit for calculating a degree of matching with the recognition target words based on an accepted voice, a result output unit for outputting, as a recognition result, a word having a best score from a result of calculating the degree of matching, and an extraction criterion information managing unit for changing extraction criterion information according to a result of monitoring by a monitor control unit are provided. The vocabulary dictionary storing unit further includes a scale information storing unit for storing scale information serving as a scale at the time of extracting the recognition target words, and an extraction criterion information storing unit for storing extraction criterion information indicating a criterion of the recognition target words at the time of extracting the recognition target words.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: October 15, 2013
    Assignee: Fujitsu Limited
    Inventor: Kenji Abe
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8543404
    Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: September 24, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Victor S. Moore, Wendi L. Nusbickel
  • Patent number: 8537164
    Abstract: Systems and methods are described, which create a mapping from a space of a source object (e.g., source facial expressions) to a space of a target object (e.g., target facial expressions). In certain implementations, the mapping is learned based on a training set composed of corresponding shapes (e.g. facial expressions) in each space. The user can create the training set by selecting expressions from, for example, captured source performance data, and by sculpting corresponding target expressions. Additional target shapes (e.g., target facial expressions) can be interpolated and extrapolated from the shapes in the training set to generate corresponding shapes for potential source shapes (e.g., facial expressions).
    Type: Grant
    Filed: October 10, 2011
    Date of Patent: September 17, 2013
    Assignee: Lucasfilm Entertainment Company Ltd.
    Inventors: Frederic P. Pighin, Cary Phillips, Steve Sullivan
  • Patent number: 8510112
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann Syrdal
  • Patent number: 8494849
    Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
    Type: Grant
    Filed: June 20, 2005
    Date of Patent: July 23, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
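A minimal sketch of the selective-transmission idea above: compute a per-frame activity value (a simple energy test here, purely as an assumption), group frames into multiframes, and mark a multiframe for transmission only when enough of its frames are active.

```python
import numpy as np

def multiframe_markers(signal, frame_len=160, frames_per_multiframe=10,
                       energy_threshold=1e-3, min_active_frames=3):
    """Return one transmit/skip marker per multiframe. The energy test stands in for
    the patent's voice activity value; thresholds are illustrative."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    active = (frames ** 2).mean(axis=1) > energy_threshold       # per-frame activity value
    markers = []
    for start in range(0, n_frames, frames_per_multiframe):
        count = int(active[start:start + frames_per_multiframe].sum())
        markers.append(count >= min_active_frames)               # voice activity marker
    return markers
```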
  • Patent number: 8494854
    Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance using optimized challenge items selected for their discrimination capability to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 23, 2013
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8489399
    Abstract: An audible based electronic challenge system is used to control access to a computing resource by using a test to identify an origin of a voice. The test is based on analyzing a spoken utterance to determine if it was articulated by an unauthorized human or a text to speech (TTS) system.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: July 16, 2013
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8478595
    Abstract: A fundamental frequency pattern generation apparatus includes a first storage including representative vectors each corresponding to a prosodic control unit and having a section for changing the number of phonemes, a second storage unit including a rule to select a vector corresponding to an input context, a selection unit configured to select a vector from the representative vectors by applying the rule to the context and output the selected vector, a calculation unit configured to calculate an expansion/contraction ratio of the section of the selected vector in a time-axis direction based on a designated value for a specific feature amount related to a length of a fundamental frequency pattern to be generated, the designated value of the feature amount being required of the fundamental frequency pattern to be generated, and an expansion/contraction unit configured to expand/contract the selected vector based on the expansion/contraction ratio to generate the fundamental frequency pattern.
    Type: Grant
    Filed: September 5, 2008
    Date of Patent: July 2, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Nobuaki Mizutani
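The expansion/contraction of a representative vector's elastic section can be pictured as resampling that section to a target length, as in the assumed sketch below; the patent's selection rules and the way the ratio is derived from the designated feature value are not modeled here.

```python
import numpy as np

def expand_contract(contour, section, target_len):
    """contour: representative fundamental-frequency vector; section: (start, end)
    indices of the elastic region; target_len: its desired length after scaling."""
    start, end = section
    head, elastic, tail = contour[:start], contour[start:end], contour[end:]
    x_old = np.linspace(0.0, 1.0, num=len(elastic))
    x_new = np.linspace(0.0, 1.0, num=target_len)
    stretched = np.interp(x_new, x_old, elastic)      # resample only the elastic section
    return np.concatenate([head, stretched, tail])
```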
  • Patent number: 8468020
    Abstract: An apparatus for synthesizing a speech including a waveform memory that stores a plurality of speech unit waveforms, an information memory that correspondingly stores speech unit information and an address of each of the speech unit waveforms, a selector that selects a speech unit sequence corresponding to the input phoneme sequence by referring to the speech unit information, a speech unit waveform acquisition unit that acquires a speech unit waveform corresponding to each speech unit of the speech unit sequence from the waveform memory by referring to the address, a speech unit concatenation unit that generates the speech by concatenating the speech unit waveform acquired.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 18, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
  • Patent number: 8438032
    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
    Type: Grant
    Filed: January 9, 2007
    Date of Patent: May 7, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Raimo Bakis, Ellen M. Eide, Roberto Pieraccini, Maria E. Smith, Jie Zeng
  • Patent number: 8428952
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: June 12, 2012
    Date of Patent: April 23, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 8423366
    Abstract: A method includes receiving, by a system, a voice recording associated with a user, transcribing, the voice recording into text that includes a group of words, and storing an association between a portion of each respective word and a corresponding portion of the voice recording. The corresponding portion of the voice recording is the portion of the voice recording from which the portion of the respective word was transcribed. The method may also include determining a modification to a speech synthesis voice associated with the user based at least in part on the association.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: April 16, 2013
    Assignee: Google Inc.
    Inventors: Marcus Alexander Foster, Richard Zarek Cohen
  • Patent number: 8412529
    Abstract: An approach is provided for enhancing verbal communication sessions. A verbal component of a communication session is converted into textual information. The converted textual information is scanned for a text string to trigger an application. The application is invoked to provide supplemental information about the textual information or to perform an action in response to the textual information for or on behalf of a party of the communication session. The supplemental information or a confirmation of the action is transmitted to the party.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: April 2, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Martin W. McKee, Paul T. Schultz, Robert A. Sartini
  • Patent number: 8407054
    Abstract: A speech synthesis device is provided with: a central segment selection unit for selecting a central segment from among a plurality of speech segments; a prosody generation unit for generating prosody information based on the central segment; a non-central segment selection unit for selecting a non-central segment, which is a segment outside of a central segment section, based on the central segment and the prosody information; and a waveform generation unit for generating a synthesized speech waveform based on the prosody information, the central segment, and the non-central segment. The speech synthesis device first selects a central segment that forms a basis for prosody generation and generates prosody information based on the central segment so that it is possible to sufficiently reduce both concatenation distortion and sound quality degradation accompanying prosody control in the section of the central segment.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: March 26, 2013
    Assignee: NEC Corporation
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Patent number: 8374873
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: August 11, 2009
    Date of Patent: February 12, 2013
    Assignee: Morphism, LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8370149
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: February 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Ryuki Tachibana, Masafumi Nishimura
  • Patent number: 8352272
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: January 8, 2013
    Assignee: Apple Inc.
    Inventors: Matthew Rogers, Kim Silverman, Devang Naik, Kevin Lenzo, Benjamin Rottler
  • Patent number: 8340965
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Grant
    Filed: December 2, 2009
    Date of Patent: December 25, 2012
    Assignee: Microsoft Corporation
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Patent number: 8315872
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: November 20, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
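Because, as the abstract notes, fewer than 1% of possible unit pairs occur in practice, concatenation costs can be precomputed for observed pairs and computed lazily otherwise. The cache layout and function names below are assumptions for illustration.

```python
def build_concat_cache(observed_pairs, concat_cost):
    """Pre-compute concatenation costs only for unit pairs seen in real speech."""
    return {(a, b): concat_cost(a, b) for a, b in observed_pairs}

def cached_concat_cost(cache, a, b, concat_cost):
    """Serve cached costs; compute and remember the rare unseen pair on demand."""
    key = (a, b)
    if key not in cache:
        cache[key] = concat_cost(a, b)
    return cache[key]
```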
  • Patent number: 8315871
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.
    Type: Grant
    Filed: June 4, 2009
    Date of Patent: November 20, 2012
    Assignee: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Patent number: 8275621
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: May 18, 2011
    Date of Patent: September 25, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Neal J. Alewine, Eric William Janke, Paul Sharp, Robert Sicconi
  • Patent number: 8255221
    Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create his own multi-voices interview from his computer. The method allows the user to enter a set of questions from a text file using a text editor. (Answers may also be entered from a text file although this is not the more preferred embodiment.) For each question, the user may select one particular interviewer voice among a plurality of predefined interviewer voices, and by using a text-to-speech module in a text-to-speech server, each question is converted into an audio question having the selected interviewer voice. Then, the user preferably records answers to each audio question using a telephone. And a questions/answers sequence in a podcast compliant format is generated.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
  • Patent number: 8249869
    Abstract: The method is suitable for dysorthographic or partially sighted persons, to facilitate the semantic, syntactic and/or lexical correction of an erroneous expression in a digital text input by a user. The method comprises the sequence of: a step (74) of transforming the digital text into a digital voice message, in which the graphemes of the erroneous textual expression are converted into phoneme(s) constituting an intelligible vocal expression, then a step (78) of processing the digital voice message obtained at the end of the transformation step (74), in which the phoneme or phonemes constituting the intelligible vocal expression are converted into grapheme(s) constituting a corrected textual expression of the erroneous textual expression, with the aid of pre-established writing rules.
    Type: Grant
    Filed: June 15, 2007
    Date of Patent: August 21, 2012
    Assignee: Logolexie
    Inventors: Gilles Vessiere, Joël Bachelerie
  • Patent number: 8249874
    Abstract: Speech is synthesized for a given text by determining a sequence of phonetic components based on the text, determining a sequence of target phonetic elements associated with the phonetic components, determining a sequence of target event types associated with the phonetic components and determining a sequence of speech units from a plurality of stored speech unit candidates by use of a cost function. The cost function comprises a unit cost, a concatenation cost, and an event type cost for each speech unit in the sequence of speech units. The unit cost of a speech unit is determined with respect to the corresponding target phonetic element, while the concatenation cost of a speech unit is determined with respect to adjacent speech units and the event type cost of each speech unit is determined with respect to the corresponding target event type.
    Type: Grant
    Filed: February 25, 2008
    Date of Patent: August 21, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Gregor Moehler, Andreas Zehnpfenning
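The cost function described above (unit cost plus concatenation cost plus event-type cost, minimized over a lattice of stored candidates) maps naturally onto a dynamic-programming search. The sketch below assumes caller-supplied cost functions and candidate lists; it illustrates the minimization, not the patented system.

```python
import numpy as np

def select_units(candidates, targets, event_types, unit_cost, concat_cost, event_cost):
    """candidates: per-position lists of stored speech-unit candidates.
    Minimizes unit cost + concatenation cost + event-type cost by dynamic programming."""
    best = [unit_cost(u, targets[0]) + event_cost(u, event_types[0])
            for u in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        new_best, new_back = [], []
        for u in candidates[t]:
            local = unit_cost(u, targets[t]) + event_cost(u, event_types[t])
            scores = [best[j] + concat_cost(candidates[t - 1][j], u)
                      for j in range(len(candidates[t - 1]))]
            j_min = int(np.argmin(scores))
            new_best.append(scores[j_min] + local)   # cheapest way to reach candidate u
            new_back.append(j_min)                   # remember its best predecessor
        best, back = new_best, back + [new_back]
    idx = int(np.argmin(best))
    path = [idx]
    for t in range(len(candidates) - 1, 0, -1):      # trace back the minimum-cost sequence
        idx = back[t - 1][idx]
        path.append(idx)
    path.reverse()
    return [candidates[t][i] for t, i in enumerate(path)]
```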
  • Patent number: 8229744
    Abstract: A method, system, and computer program for class detection and time mediated averaging of class dependent models. A technique is described to take advantage of gender information in training data and how to obtain female, male, and gender independent models from this information. By using a probability value to average male and female Gaussian Mixture Models (GMMs), dramatic deterioration in cross gender decoding performance is avoided.
    Type: Grant
    Filed: August 26, 2003
    Date of Patent: July 24, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Satyanarayana Dharanipragada, Peder A. Olsen
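One way to read the time-mediated averaging above is as a soft, probability-weighted mixture of the gender-dependent GMM likelihoods, sketched below with diagonal-covariance components. The probability p_female, the diagonal form, and the function names are assumptions.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Diagonal-covariance GMM log-likelihood of one feature vector x.
    weights: (K,), means and variances: (K, D)."""
    per_component = -0.5 * (np.log(2.0 * np.pi * variances)
                            + (x - means) ** 2 / variances).sum(axis=1)
    return float(np.log(np.sum(np.exp(np.log(weights) + per_component))))

def gender_averaged_likelihood(x, female_gmm, male_gmm, p_female):
    """Probability-weighted average of the class-dependent (female/male) models."""
    lf = gmm_log_likelihood(x, *female_gmm)
    lm = gmm_log_likelihood(x, *male_gmm)
    return p_female * np.exp(lf) + (1.0 - p_female) * np.exp(lm)
```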
  • Patent number: 8224645
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text, selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones and if the candidate phonemes are available in the triphone unit selection database, applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, which single phonemes are used in synthesis independent of a triphone structure. The method also includes synthesizing speech using at least one of the set of phonemes from the candidate phonemes and the selected single phonemes for synthesis from the single phoneme approach.
    Type: Grant
    Filed: December 1, 2008
    Date of Patent: July 17, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
  • Patent number: 8224647
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: October 3, 2005
    Date of Patent: July 17, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 8214203
    Abstract: A method and an apparatus for recovering a line spectrum pair (LSP) parameter of a spectrum region when frame loss occurs during speech decoding and a speech decoding apparatus adopting the same are provided. The method of recovering an LSP parameter in speech decoding includes: if it is determined that a received speech packet has an erased frame, converting an LSP parameter of a previous good frame (PGF) of the erased frame or LSP parameters of the PGF and a next good frame (NGF) of the erased frame into a spectrum region and obtaining a spectrum envelope of the PGF or spectrum envelopes of the PGF and NGF; recovering a spectrum envelope of the erased frame using the spectrum envelope of the PGF or the spectrum envelopes of the PGF and NGF; and converting the recovered spectrum envelope of the erased frame into an LSP parameter of the erased frame.
    Type: Grant
    Filed: March 25, 2010
    Date of Patent: July 3, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hosang Sung, Seungho Choi, Kihyun Choo
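The recovery step can be approximated, for illustration, by interpolating between the parameters of the previous and next good frames; the patent performs the interpolation on spectrum envelopes rather than directly on the LSP frequencies, so the sketch below is a simplified stand-in.

```python
import numpy as np

def recover_lsp(lsp_prev, lsp_next=None, alpha=0.5):
    """Recover LSP parameters for an erased frame from the previous good frame (and,
    if available, the next good frame). lsp_prev/lsp_next: ascending LSP frequencies.
    Interpolating the LSPs directly is a simplification used here only for illustration."""
    if lsp_next is None:
        recovered = np.asarray(lsp_prev, dtype=float).copy()   # repeat the previous frame
    else:
        recovered = (1.0 - alpha) * np.asarray(lsp_prev) + alpha * np.asarray(lsp_next)
    return np.sort(recovered)                                  # preserve LSP ordering
```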
  • Patent number: 8175865
    Abstract: A method of text script generation for a corpus-based text-to-speech system includes searching in a source corpus having L sentences, selecting N sentences with a best integrated efficiency as N best cases, and setting iteration k to be 1; for each case n of the N best cases, selecting Mk+1 best sentences with the best integrated efficiency from the unselected sentences in the source corpus; keeping N best cases out of the total unselected sentences for the next iteration, and increasing iteration k by 1; and if a termination criterion is reached, setting the best case in the N traced cases as the text script, otherwise, returning to the (k+1)th iteration of searching in the unselected sentences for the (k+1)th sentence; wherein the best integrated efficiency depends on a function of combining the covering rate of the synthesis unit type, the hit rate of the synthesis unit type, and the text script size.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: May 8, 2012
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Jing-Yi Huang
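The iterative search described above is, at heart, a coverage-versus-size trade-off over candidate sentences. The sketch below collapses the patent's N-best beam and integrated-efficiency function into a single greedy pass, so it should be read only as an assumed illustration of the selection criterion.

```python
def select_script(sentences, all_unit_types, max_sentences=100):
    """sentences: {sentence_id: set of synthesis-unit types that sentence covers}.
    Greedy coverage-per-size selection; returns the chosen ids and the covering rate."""
    covered, script = set(), []
    remaining = dict(sentences)
    while remaining and len(script) < max_sentences:
        sid, units = max(remaining.items(),
                         key=lambda item: len(item[1] - covered) / max(len(item[1]), 1))
        if not units - covered:
            break                                   # termination: nothing new to cover
        script.append(sid)
        covered |= units
        del remaining[sid]
    covering_rate = len(covered) / max(len(all_unit_types), 1)
    return script, covering_rate
```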
  • Patent number: 8160880
    Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image, representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects to determine whether the point belongs in the class of objects corresponding to the centroid.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: April 17, 2012
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
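Nearest-centroid classification of an N-dimensional feature vector, as described above, reduces to comparing distances to stored class centroids; the optional acceptance threshold below is an added assumption.

```python
import numpy as np

def classify(feature_vector, centroids, max_distance=None):
    """centroids: {class_name: centroid array}. Returns the class whose centroid lies
    nearest to the N-dimensional feature vector, or None if the nearest centroid is
    beyond an optional acceptance threshold."""
    names = list(centroids)
    dists = np.array([np.linalg.norm(feature_vector - centroids[name]) for name in names])
    best = int(np.argmin(dists))
    if max_distance is not None and dists[best] > max_distance:
        return None
    return names[best]
```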
  • Patent number: 8160882
    Abstract: A temporary child set is generated. An elastic ratio of an elastic section of a model pattern is calculated. A temporary typical pattern of the set is generated by combining the pattern belonging to the set with the model pattern having the elastic pattern expanded or contracted. A distortion between the temporary typical pattern of the set and the pattern belonging to the set is calculated, and a child set is determined as the set when the distortion is below a threshold. A typical pattern as the temporary typical pattern of the child set is stored with a classification rule as the classification item of the context of the pattern belonging to the child set.
    Type: Grant
    Filed: January 23, 2009
    Date of Patent: April 17, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Nobuaki Mizutani