Specialized Model Patents (Class 704/266)
  • Patent number: 10979762
    Abstract: Systems and methods are described herein for a media guidance application that can cause a specific portion of a media asset to be stored based on a user command. For example, if the user requests the closing scene from a given movie, the media guidance application may detect the command, determine that it comprises an instruction to store a portion of a media asset, identify a source of the portion of the media asset, and cause the portion of the media asset to be stored. The media guidance application may also cause the entirety of the media asset to be stored and initiate playback at the start of the requested portion. This may allow users to store and watch portions of particular interest without requiring that the users seek through the entire media asset on their own.
    Type: Grant
    Filed: July 22, 2019
    Date of Patent: April 13, 2021
    Assignee: Rovi Guides, Inc.
    Inventors: Paul Maltar, Milan Patel, Yong Gong
  • Patent number: 10978067
    Abstract: A home appliance including an audio input unit including at least one microphone and into which a voice command formed of natural language is inputted; a communication unit which transmits the voice command to a voice recognition system as voice data, and receives a response signal from the voice recognition system; a controller which sets an operation corresponding to the response signal, outputs an operation state, and outputs a guidance announcement, according to a result of voice recognition of the voice recognition system; and an audio output unit which outputs the guidance announcement corresponding to the operation, wherein the controller determines whether it is possible to set the operation or to support a function for the operation in response to the response signal and an operation state of the home appliance, and performs the operation.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: April 13, 2021
    Assignee: LG ELECTRONICS INC.
    Inventors: Junhee An, Hyojeong Kang, Jaehoon Lee, Heungkyu Lee
  • Patent number: 10699695
    Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: June 30, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
  • Patent number: 10546062
    Abstract: A token is extracted from a Natural Language input. A phonetic pattern is computed corresponding to the token, the phonetic pattern including a sound pattern that represents a part of the token when the token is spoken. New data is created from data of the phonetic pattern, the new data including a syllable sequence corresponding to the phonetic pattern. A state of a data storage device is changed by storing the new data in a matrix of syllable sequences corresponding to the token. An option is selected that corresponds to the token by executing a fuzzy matching algorithm using a processor and a memory, the selecting of the option is based on a syllable sequence in the matrix.
    Type: Grant
    Filed: November 15, 2017
    Date of Patent: January 28, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sean M. Fuoco, John M. Ganci, Jr., Craig M. Trim, Jie Zeng
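The fuzzy matching over syllable-sequence matrices that this abstract describes can be sketched with a standard sequence-similarity measure. This is a minimal illustration, not the patent's algorithm: the syllable inventory, the option set, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions for the example.

```python
from difflib import SequenceMatcher

def best_option(token_syllables, options):
    """Pick the option whose syllable sequence best matches the token's.

    token_syllables: list of syllable strings derived from the token's
    phonetic pattern. options: dict mapping option names to their
    syllable sequences. Returns the best-scoring option name.
    """
    def score(a, b):
        # SequenceMatcher gives a similarity ratio in [0, 1] over the
        # two syllable sequences.
        return SequenceMatcher(None, a, b).ratio()

    return max(options, key=lambda name: score(token_syllables, options[name]))

# Hypothetical example: a misheard airport name matched against known options.
options = {
    "LaGuardia": ["luh", "gwar", "dee", "ah"],
    "Newark":    ["noo", "erk"],
}
print(best_option(["luh", "gar", "dee", "ah"], options))  # -> LaGuardia
```

Matching at the syllable level rather than the character level makes the comparison robust to spelling variation, which is the point of computing a phonetic pattern first.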
  • Patent number: 10467333
    Abstract: Methods, apparatuses, and computer program products are described herein that are configured to enable updating of an output text. In some example embodiments, a method is provided that comprises generating a new message for each updateable data element based on a predetermined indication. The method of this embodiment may also include determining a classification for each new message by comparing each new message with a corresponding message that describes the updateable data element. The method of this embodiment may also include generating an additional document plan tree that contains at least a portion of the new messages. The method of this embodiment may also include combining the additional document plan tree with an original document plan tree.
    Type: Grant
    Filed: April 7, 2016
    Date of Patent: November 5, 2019
    Assignee: ARRIA DATA2TEXT LIMITED
    Inventors: Alasdair James Logan, Ehud Baruch Reiter
  • Patent number: 10339465
    Abstract: During a training phase of a machine learning model, representations of at least some nodes of a decision tree are generated and stored on persistent storage in depth-first order. A respective predictive utility metric (PUM) value is determined for one or more nodes, indicating expected contributions of the nodes to a prediction of the model. A particular node is selected for removal from the tree based at least partly on its PUM value. A modified version of the tree, with the particular node removed, is stored for obtaining a prediction.
    Type: Grant
    Filed: August 19, 2014
    Date of Patent: July 2, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Robert Matthias Steele, Tarun Agarwal, Leo Parker Dirac, Jun Qian
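The pruning step this abstract describes, removing nodes whose predictive utility metric (PUM) falls below some bar, can be sketched as a recursive tree walk. The dict-based tree layout, the `min_pum` threshold, and collapsing a removed node to a leaf that keeps its stored prediction are illustrative assumptions, not the patent's data structures.

```python
def prune_tree(node, pum, min_pum):
    """Return a copy of the tree with low-utility nodes collapsed.

    node: dict with "id", "prediction", and optional "left"/"right"
    children. pum: dict mapping node id -> predictive utility value.
    A node whose PUM is below min_pum becomes a leaf, shrinking the
    persisted model while keeping a usable prediction at that point.
    """
    if node is None:
        return None
    if pum.get(node["id"], float("inf")) < min_pum:
        # Collapse: drop the subtree, keep the node's own prediction.
        return {"id": node["id"], "prediction": node["prediction"]}
    out = {"id": node["id"], "prediction": node["prediction"]}
    if "left" in node:
        out["left"] = prune_tree(node["left"], pum, min_pum)
    if "right" in node:
        out["right"] = prune_tree(node["right"], pum, min_pum)
    return out

# Toy tree: node 2 has low utility, so its subtree is collapsed.
tree = {"id": 0, "prediction": 0.5,
        "left": {"id": 1, "prediction": 0.2},
        "right": {"id": 2, "prediction": 0.8,
                  "left": {"id": 3, "prediction": 0.7},
                  "right": {"id": 4, "prediction": 0.9}}}
pruned = prune_tree(tree, {0: 10.0, 1: 10.0, 2: 0.1, 3: 5.0, 4: 5.0}, 1.0)
```

Because the abstract stores nodes in depth-first order on persistent storage, a pruned node's entire subtree is a contiguous region, which makes the removal cheap.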
  • Patent number: 10249289
    Abstract: Methods, systems, and computer-readable media for text-to-speech synthesis using an autoencoder. In some implementations, data indicating a text for text-to-speech synthesis is obtained. Data indicating a linguistic unit of the text is provided as input to an encoder. The encoder is configured to output speech unit representations indicative of acoustic characteristics based on linguistic information. A speech unit representation that the encoder outputs is received. A speech unit is selected to represent the linguistic unit, the speech unit being selected from among a collection of speech units based on the speech unit representation output by the encoder. Audio data for a synthesized utterance of the text that includes the selected speech unit is provided.
    Type: Grant
    Filed: July 13, 2017
    Date of Patent: April 2, 2019
    Assignee: Google LLC
    Inventors: Byung Ha Chun, Javier Gonzalvo, Chun-an Chan, Ioannis Agiomyrgiannakis, Vincent Ping Leung Wan, Robert Andrew James Clark, Jakub Vit
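The selection step, choosing a stored speech unit based on the encoder's speech-unit representation, can be sketched as a nearest-neighbor lookup in the embedding space. The Euclidean distance and the tiny unit inventory here are assumptions; the patent does not commit to this particular distance.

```python
import math

def select_unit(encoder_output, unit_inventory):
    """Choose the stored speech unit whose representation is closest to
    the encoder's output for a linguistic unit.

    encoder_output: vector produced by the encoder.
    unit_inventory: dict mapping unit ids to their embedding vectors.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(unit_inventory,
               key=lambda uid: dist(encoder_output, unit_inventory[uid]))
```

The selected units are then concatenated to produce the synthesized utterance, so a representation that captures acoustic characteristics (not just linguistic identity) directly improves join quality.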
  • Patent number: 9756446
    Abstract: An audio receiver that performs crosstalk cancellation using a speaker array is described. The audio receiver detects the location of a listener in a room and processes a piece of sound program content to be output through the speaker array using one or more beam pattern matrices. The beam pattern matrices are generated according to one or more constraints. The constraints may include increasing a right channel and decreasing a left channel at the right ear of the listener, increasing a left channel and decreasing a right channel at the left ear of the listener, and decreasing sound in all other areas of the room. These constraints cause the audio receiver to beam sound primarily towards the listener and not in other areas of the room such that crosstalk cancellation is achieved with minimal effects due to changes to the frequency response of the room. Other embodiments are also described.
    Type: Grant
    Filed: March 13, 2014
    Date of Patent: September 5, 2017
    Assignee: Apple Inc.
    Inventors: Martin E. Johnson, Ronald N. Isaac
  • Patent number: 9436891
    Abstract: A method for identifying synonymous expressions includes determining synonymous expression candidates for a target expression. A plurality of target images related to the target expression and a plurality of candidate images related to each of the synonymous expression candidates are identified. Features extracted from the plurality of target images are compared with features extracted from the plurality of candidate images using a processor to identify a synonymous expression of the target expression.
    Type: Grant
    Filed: July 30, 2013
    Date of Patent: September 6, 2016
    Assignee: GlobalFoundries, Inc.
    Inventors: Chiaki Oishi, Tetsuya Nasukawa, Shoko Suzuki
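The comparison of image features between the target expression and each candidate can be sketched with cosine similarity. Averaging pairwise similarities and the feature vectors themselves are illustrative assumptions; the patent does not specify the comparison function.

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def pick_synonym(target_features, candidate_features):
    """Score each synonym candidate by the mean cosine similarity between
    its image feature vectors and the target expression's, and return
    the best-scoring candidate.

    target_features: list of feature vectors from images of the target.
    candidate_features: dict candidate -> list of feature vectors.
    """
    def score(vecs):
        sims = [cosine(t, v) for t in target_features for v in vecs]
        return sum(sims) / len(sims)
    return max(candidate_features, key=lambda c: score(candidate_features[c]))
```

The intuition is that two expressions naming the same thing will retrieve visually similar images even when their surface strings share nothing.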
  • Patent number: 9275633
    Abstract: Technologies are described herein for providing validated text-to-speech correction hints from aggregated pronunciation corrections received from text-to-speech applications. A number of pronunciation corrections are received by a Web service. The pronunciation corrections may be provided by users of text-to-speech applications executing on a variety of user computer systems. Each of the plurality of pronunciation corrections includes a specification of a word or phrase and a suggested pronunciation provided by the user. The pronunciation corrections are analyzed to generate validated correction hints, and the validated correction hints are provided back to the text-to-speech applications to be used to correct pronunciation of words and phrases in the text-to-speech applications.
    Type: Grant
    Filed: January 9, 2012
    Date of Patent: March 1, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Edward Cath, Timothy Edwin Harris, James Oliver Tisdale, III
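The aggregation-and-validation pipeline can be sketched as counting user-submitted corrections per word and promoting a suggestion once enough users agree. The specific thresholds (`min_users`, `min_share`) are illustrative; the abstract does not state the validation rule.

```python
from collections import Counter, defaultdict

def validate_corrections(corrections, min_users=3, min_share=0.6):
    """Turn raw user pronunciation corrections into validated hints.

    corrections: iterable of (word, suggested_pronunciation) pairs
    submitted by users of TTS applications. A suggestion becomes a
    validated hint when at least min_users submitted it and it accounts
    for at least min_share of all suggestions for that word.
    Returns dict word -> validated pronunciation.
    """
    by_word = defaultdict(Counter)
    for word, pron in corrections:
        by_word[word][pron] += 1
    hints = {}
    for word, counts in by_word.items():
        pron, n = counts.most_common(1)[0]
        if n >= min_users and n / sum(counts.values()) >= min_share:
            hints[word] = pron
    return hints
```

The Web service then pushes the validated hints back to client TTS applications, so a correction made by one user benefits every other user's synthesis.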
  • Patent number: 9195312
    Abstract: An information processing apparatus includes a voice acquiring unit configured to acquire voice information remarked by a user; an image information acquiring unit configured to acquire image information of the user; a user specifying unit configured to specify the user having remarked the voice information based on the image information when the voice information is acquired; a word extracting unit configured to extract a word from the voice information; a priority calculating unit configured to increase priority of the user having remarked the word for operation of a display device when the extracted word matches any one of keywords; a gesture detecting unit configured to detect a gesture made by the user based on the image information; and an operation permission determining unit configured to permit the user to operate the display device when the priority of the user having made the gesture is highest.
    Type: Grant
    Filed: June 4, 2013
    Date of Patent: November 24, 2015
    Assignee: RICOH COMPANY, LIMITED
    Inventors: Akinori Itoh, Hidekuni Annaka
  • Patent number: 9154880
    Abstract: A first filter coefficient and a second filter coefficient are calculated by using spatial transfer characteristics from a first speaker and a second speaker to a first control point and a second control point, and a first sound increase ratio na at the first control point and a second sound increase ratio nb at the second control point, so that, when the first filter coefficient is a through characteristic, a first composite sound pressure from the first speaker and the second speaker to the first control point is na times a first sound pressure from the first speaker to the first control point, and a second composite sound pressure from the first speaker and the second speaker to the second control point is nb times a second sound pressure from the first speaker to the second control point.
    Type: Grant
    Filed: June 19, 2013
    Date of Patent: October 6, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takahiro Hiruma, Akihiko Enamito, Osamu Nishimura
  • Patent number: 9087507
    Abstract: Computer-based skimming and scrolling of aurally presented information is described. Different levels of skimming are achieved in aural presentations by allowing a user to navigate an aural presentation according to significant points identified within an information source. The significant points are identified using various indicia that suggest logical arrangements for the information contained within the source, such as semantics, syntax, typography, formatting, named entities, and markup tags. The identified significant points signal changes in playback mode for the audio presentation, such as different tones, pitches, volumes, or voices. Similar indicia may be used to generate identifying markers from the information source that can be aurally presented in lieu of the information source itself to allow for aural scrolling of the information.
    Type: Grant
    Filed: November 15, 2006
    Date of Patent: July 21, 2015
    Assignee: Yahoo! Inc.
    Inventor: Srinivasan H. Sengamedu
  • Patent number: 9043213
    Abstract: A speech recognition method including the steps of receiving a speech input from a known speaker of a sequence of observations and determining the likelihood of a sequence of words arising from the sequence of observations using an acoustic model. The acoustic model has a plurality of model parameters describing probability distributions which relate a word or part thereof to an observation and has been trained using first training data and adapted using second training data to said speaker. The speech recognition method also determines the likelihood of a sequence of observations occurring in a given language using a language model and combines the likelihoods determined by the acoustic model and the language model and outputs a sequence of words identified from said speech input signal. The acoustic model is context based for the speaker, the context based information being contained in the model using a plurality of decision trees and the structure of the decision trees is based on second training data.
    Type: Grant
    Filed: January 26, 2011
    Date of Patent: May 26, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Byung Ha Chun
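The final step of this abstract, combining the acoustic-model and language-model likelihoods to pick an output word sequence, is commonly done as a log-linear combination. The hypothesis representation and the language-model weight below are assumptions for illustration, not the patent's formulation.

```python
def best_hypothesis(hypotheses, lm_weight=0.8):
    """Combine acoustic and language model scores and return the
    highest-scoring word sequence.

    hypotheses: list of (words, acoustic_logprob, lm_logprob) tuples.
    The combined score is acoustic + lm_weight * language-model score,
    a standard log-linear interpolation used in ASR decoding.
    """
    def score(h):
        _, acoustic, lm = h
        return acoustic + lm_weight * lm
    return max(hypotheses, key=score)[0]
```

The language model rescues acoustically plausible but linguistically unlikely hypotheses: a slightly worse acoustic fit can still win when its word sequence is far more probable in the language.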
  • Publication number: 20150127349
    Abstract: A method and system is disclosed for cross-lingual voice conversion. A speech-to-speech system may include hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech. A cross-lingual HMM may be initially set to an output HMM trained with a voice of an output speaker in an output language. An auxiliary HMM may be trained with a voice of an auxiliary speaker in an input language. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the output HMM to an HMM state of the auxiliary HMM. The HMM states of the cross-lingual HMM may be replaced with the matched states. Transforms may be applied to adapt the cross-lingual HMM to the voices of the auxiliary speaker and of an input speaker. The cross-lingual HMM may be used for speech synthesis.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
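The state-matching procedure, pairing each output-HMM state with the closest auxiliary-HMM state under a speaker-compensating transform, can be sketched as a nearest-neighbor search over state means. Representing states by their mean vectors and passing the transform as a plain callable are simplifications for illustration.

```python
import math

def match_states(output_states, auxiliary_states, transform):
    """For each state of the output-language HMM, find the closest state
    of the auxiliary HMM after applying a speaker-compensating transform.

    output_states / auxiliary_states: dicts mapping state id -> mean
    vector (a stand-in for full emission distributions).
    transform: callable vector -> vector compensating speaker differences.
    Returns dict output-state id -> matched auxiliary-state id.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return {
        oid: min(auxiliary_states,
                 key=lambda aid: dist(omean, transform(auxiliary_states[aid])))
        for oid, omean in output_states.items()
    }
```

After matching, the cross-lingual HMM's states are replaced with their matches, which is what lets an output-language model borrow acoustics from an input-language speaker.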
  • Publication number: 20150127350
    Abstract: A method and system is disclosed for non-parametric speech conversion. A text-to-speech (TTS) synthesis system may include hidden Markov model (HMM) based speech modeling for synthesizing output speech. A converted HMM may be initially set to a source HMM trained with a voice of a source speaker. A parametric representation of speech may be extracted from speech of a target speaker to generate a set of target-speaker vectors. A matching procedure, carried out under a transform that compensates for speaker differences, may be used to match each HMM state of the source HMM to a target-speaker vector. The HMM states of the converted HMM may be replaced with the matched target-speaker vectors. Transforms may be applied to further adapt the converted HMM to the voice of the target speaker. The converted HMM may be used to synthesize speech with voice characteristics of the target speaker.
    Type: Application
    Filed: November 1, 2013
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventor: Ioannis Agiomyrgiannakis
  • Patent number: 9026438
    Abstract: A method for detecting barge-in in a speech dialog system comprising determining whether a speech prompt is output by the speech dialog system, and detecting whether speech activity is present in an input signal based on a time-varying sensitivity threshold of a speech activity detector and/or based on speaker information, where the sensitivity threshold is increased if output of a speech prompt is determined and decreased if no output of a speech prompt is determined. If speech activity is detected in the input signal, the speech prompt may be interrupted or faded out. A speech dialog system configured to detect barge-in is also disclosed.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: May 5, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Markus Buck, Franz Gerl, Tim Haulick, Tobias Herbig, Gerhard Uwe Schmidt, Matthias Schulz
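The time-varying sensitivity threshold this abstract describes, raised while a prompt plays and lowered otherwise, can be sketched as a frame-by-frame energy detector. The energy feature, the base threshold, and the boost factor are illustrative values, not the patent's detector.

```python
def detect_barge_in(frame_energies, prompt_active,
                    base_threshold=1.0, prompt_boost=2.0):
    """Frame-by-frame speech-activity decision with a time-varying
    sensitivity threshold.

    frame_energies: per-frame signal energies from the input channel.
    prompt_active: parallel list of booleans, True while the system's
    speech prompt is being output. While prompting, the threshold is
    raised so prompt echo is less likely to trigger a false detection.
    Returns the index of the first frame flagged as user speech, or None.
    """
    for i, (energy, prompting) in enumerate(zip(frame_energies, prompt_active)):
        threshold = base_threshold * (prompt_boost if prompting else 1.0)
        if energy > threshold:
            return i  # barge-in: the prompt could now be interrupted or faded out
    return None
```

In a real system the same decision could additionally be conditioned on speaker information, as the abstract notes, so that only the expected user's voice counts as barge-in.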
  • Patent number: 9026445
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: March 20, 2013
    Date of Patent: May 5, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 9020812
    Abstract: Disclosed is an audio signal processing method comprising the steps of: receiving an audio signal containing current frame data; generating a first temporary output signal for the current frame when an error occurs in the current frame data, by carrying out frame error concealment with respect to the current frame data using a random codebook; generating a parameter by carrying out one or more of short-term prediction, long-term prediction and a fixed codebook search based on the first temporary output signal; and updating a memory with the parameter for the next frame; wherein the parameter comprises one or more of pitch gain, pitch delay, fixed codebook gain and a fixed codebook.
    Type: Grant
    Filed: November 24, 2010
    Date of Patent: April 28, 2015
    Assignees: LG Electronics Inc., Industry-Academic Cooperation Foundation, Yonsei University
    Inventors: Hye Jeong Jeon, Dae Hwan Kim, Hong Goo Kang, Min Ki Lee, Byung Suk Lee, Gyu Hyeok Jeong
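The concealment-then-memory-update loop can be sketched as: synthesize a stand-in frame from a random-codebook excitation scaled by the last good gain, then re-derive parameters from that stand-in so the decoder state stays consistent. This is a toy sketch; the codebook, frame length, and energy-based gain update below replace the short-term/long-term prediction and fixed-codebook search the abstract names.

```python
import random

def conceal_frame(prev_gain, frame_len=8, seed=0):
    """Conceal a lost frame and produce the memory update for the next one.

    prev_gain: fixed-codebook gain from the last good frame.
    Returns (temporary_output, memory) where memory holds the parameters
    carried into the next frame's decoding.
    """
    rng = random.Random(seed)
    # Random-codebook excitation for the erased frame.
    excitation = [rng.uniform(-1.0, 1.0) for _ in range(frame_len)]
    temporary_output = [prev_gain * x for x in excitation]
    # Re-analyze the temporary output to keep decoder state consistent;
    # an RMS gain estimate stands in for the full parameter search here.
    new_gain = (sum(s * s for s in temporary_output) / frame_len) ** 0.5
    memory = {"fixed_codebook_gain": new_gain, "last_excitation": excitation}
    return temporary_output, memory
```

Updating memory from the concealed output, rather than freezing it, is what prevents the error from propagating as a state mismatch into later, correctly received frames.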
  • Patent number: 9002711
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Grant
    Filed: December 16, 2010
    Date of Patent: April 7, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Ryo Morinaka, Takehiko Kagoshima
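The mapping and interpolation steps can be sketched in two functions: a greedy formant correspondence using a cost on frequency and power differences, then linear interpolation of the matched parameters at a desired ratio. The weighted absolute-difference cost and the (frequency, power) pair representation are illustrative choices; the abstract does not specify its cost function.

```python
def map_formants(formants_a, formants_b, w_freq=1.0, w_power=0.1):
    """Greedily pair formants of two speakers by a frequency/power cost.

    Each formant is a (frequency_hz, power_db) pair. Returns a list of
    (index_a, index_b) correspondences; each b-formant is used once.
    """
    pairs, used_b = [], set()
    for ia, (fa, pa) in enumerate(formants_a):
        best = min(
            (ib for ib in range(len(formants_b)) if ib not in used_b),
            key=lambda ib: w_freq * abs(fa - formants_b[ib][0])
                         + w_power * abs(pa - formants_b[ib][1]),
        )
        used_b.add(best)
        pairs.append((ia, best))
    return pairs

def interpolate(formants_a, formants_b, pairs, ratio):
    """Linearly interpolate matched formant parameters.

    ratio 0.0 reproduces speaker A, 1.0 reproduces speaker B.
    """
    out = []
    for ia, ib in pairs:
        fa, pa = formants_a[ia]
        fb, pb = formants_b[ib]
        out.append(((1 - ratio) * fa + ratio * fb,
                    (1 - ratio) * pa + ratio * pb))
    return out
```

The full apparatus also interpolates formant phases and window functions the same way; only frequency and power are shown here to keep the sketch short.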
  • Publication number: 20150095035
    Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
    Type: Application
    Filed: September 30, 2013
    Publication date: April 2, 2015
    Applicant: International Business Machines Corporation
    Inventor: Slava Shechtman
  • Patent number: 8996384
    Abstract: Embodiments of the invention address the deficiencies of the prior art by providing a method, apparatus, and program product for converting components of a web page to voice prompts for a user. In some embodiments, the method comprises selectively determining at least one HTML component from a plurality of HTML components of a web page to transform into a voice prompt for a mobile system based upon a voice attribute file associated with the web page. The method further comprises transforming the at least one HTML component into parameterized data suitable for use by the mobile system based upon at least a portion of the voice attribute file associated with the at least one HTML component and transmitting the parameterized data to the mobile system.
    Type: Grant
    Filed: October 30, 2009
    Date of Patent: March 31, 2015
    Assignee: Vocollect, Inc.
    Inventors: Paul M. Funyak, Norman J. Connors, Paul E. Kolonay, Matthew Aaron Nichols
  • Patent number: 8990087
    Abstract: A method for providing text to speech from digital content in an electronic device is described. Digital content including a plurality of words and a pronunciation database is received. Pronunciation instructions are determined for the word using the digital content. Audio or speech is played for the word using the pronunciation instructions. As a result, the method provides text to speech on the electronic device based on the digital content.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: March 24, 2015
    Assignee: Amazon Technologies, Inc.
    Inventors: John Lattyak, John T. Kim, Robert Wai-Chi Chu, Laurent An Minh Nguyen
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Patent number: 8954328
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters, at least some of the characters having multiple associated moods, for use in document narration.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: February 10, 2015
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8898062
    Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yumiko Kato, Takahiro Kamai
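The amplitude modulation unit's "modulation including periodic amplitude fluctuation" can be sketched as applying a low-frequency oscillator to the samples of the designated phoneme span. The modulation rate and depth below are illustrative values; the patent does not fix them here.

```python
import math

def strained_rough(samples, sample_rate, start, end, mod_hz=80.0, depth=0.4):
    """Apply periodic amplitude fluctuation to one phoneme's samples.

    samples: waveform as a list of floats; [start, end) is the sample
    span the strained phoneme position designation unit selected. The
    rest of the waveform is left untouched.
    """
    out = list(samples)
    for n in range(start, end):
        lfo = 1.0 + depth * math.sin(2 * math.pi * mod_hz * n / sample_rate)
        out[n] = samples[n] * lfo
    return out
```

Restricting the modulation to designated phonemes is the key idea: the "strained rough" quality appears only where a forceful utterance would naturally produce it, rather than over the whole sentence.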
  • Patent number: 8888494
    Abstract: One or more embodiments present a script to a user in an interactive script environment. A digital representation of a manuscript is analyzed. This digital representation includes a set of roles and a set of information associated with each role in the set of roles. An active role in the set of roles that is associated with a given user is identified based on the analyzing. At least a portion of the manuscript is presented to the given user via a user interface. The portion includes at least a subset of information in the set of information. Information within the set of information that is associated with the active role is presented in a visually different manner than information within the set of information that is associated with a non-active role, which is a role that is associated with a user other than the given user.
    Type: Grant
    Filed: June 27, 2011
    Date of Patent: November 18, 2014
    Inventor: Randall Lee Threewits
  • Patent number: 8868418
    Abstract: Embodiments of the invention provide a communication device and methods for enhancing audio signals. A first audio signal buffer and a second audio signal buffer are acquired. Thereafter, the magnitude spectrum calculated from the Fast Fourier Transform (FFT) of the second audio signal is processed based on the Linear Predictive Coding (LPC) spectrum of the first audio signal to generate an enhanced second audio signal.
    Type: Grant
    Filed: November 20, 2010
    Date of Patent: October 21, 2014
    Inventors: Alon Konchitsky, Sandeep Kulakcherla, Alberto D. Berstein
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8868423
    Abstract: Systems and methods for controlling access to resources using spoken Completely Automatic Public Turing Tests To Tell Humans And Computers Apart (CAPTCHA) tests are disclosed. In these systems and methods, entities seeking access to resources are required to produce an input utterance that contains at least some audio. That utterance is compared with voice reference data for human and machine entities, and a determination is made as to whether the entity requesting access is a human or a machine. Access is then permitted or refused based on that determination.
    Type: Grant
    Filed: July 11, 2013
    Date of Patent: October 21, 2014
    Assignee: John Nicholas and Kristin Gross Trust
    Inventor: John Nicholas Gross
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8849669
    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
    Type: Grant
    Filed: April 3, 2013
    Date of Patent: September 30, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Raimo Bakis, Ellen Marie Eide, Roberto Pieraccini, Maria E. Smith, Jie Z. Zeng
  • Patent number: 8831950
    Abstract: Embodiments of the present invention provide a method, system and computer program product for the automated voice enablement of a Web page. In an embodiment of the invention, a method for voice enabling a Web page can include selecting an input field of a Web page for speech input, generating a speech grammar for the input field based upon terms in a core attribute of the input field, receiving speech input for the input field, posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into a document object model (DOM) for the Web page.
    Type: Grant
    Filed: April 7, 2008
    Date of Patent: September 9, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Victor S. Moore, Wendi L. Nusbickel
  • Patent number: 8825485
    Abstract: A text-to-speech method for use in a plurality of languages, including: inputting text in a selected language; dividing the inputted text into a sequence of acoustic units; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein the model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio in the selected language. A parameter of a predetermined type of each probability distribution in the selected language is expressed as a weighted sum of language independent parameters of the same type. The weighting used is language dependent, such that converting the sequence of acoustic units to a sequence of speech vectors includes retrieving the language dependent weights for the selected language.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: September 2, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Byung Ha Chun, Sacha Krstulovic
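The core idea, expressing each language-specific distribution parameter as a weighted sum of language-independent parameters with language-dependent weights, reduces to a dot product. The parameter values and weight tables below are made up for illustration.

```python
def language_parameter(language_independent, weights, language):
    """Compute one model parameter for the selected language.

    language_independent: list of shared parameter values of one type
    (e.g. component means), common to all languages.
    weights: dict language -> list of language-dependent weights, same
    length as language_independent.
    Returns the weighted sum used by the acoustic model for synthesis.
    """
    return sum(w * p for w, p in zip(weights[language], language_independent))
```

Because only the small weight vectors differ per language, adding a language means estimating new weights rather than a whole new acoustic model, which is what makes the scheme attractive for polyglot synthesis.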
  • Publication number: 20140222421
    Abstract: A speech-synthesizing device includes a hierarchical prosodic module, a prosody-analyzing device, and a prosody-synthesizing unit. The hierarchical prosodic module generates at least a first hierarchical prosodic model. The prosody-analyzing device receives a low-level linguistic feature, a high-level linguistic feature and a first prosodic feature, and generates at least a prosodic tag based on the low-level linguistic feature, the high-level linguistic feature, the first prosodic feature and the first hierarchical prosodic model. The prosody-synthesizing unit synthesizes a second prosodic feature based on the hierarchical prosodic module, the low-level linguistic feature and the prosodic tag.
    Type: Application
    Filed: January 30, 2014
    Publication date: August 7, 2014
    Applicant: National Chiao Tung University
    Inventors: Sin-Horng Chen, Yih-Ru Wang, Chen-Yu Chiang, Chiao-Hua Hsieh
  • Patent number: 8781835
    Abstract: Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
    Type: Grant
    Filed: May 2, 2011
    Date of Patent: July 15, 2014
    Assignee: Nokia Corporation
    Inventors: Jani Kristian Nurminen, Hanna Margareeta Silen, Elina Helander
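The hybrid scheme above (statistical models guiding unit selection, with "bad" units replaced by generated parameters) can be sketched as follows. The distance measure, threshold, and frame-level pairing are assumptions for illustration, not the patent's actual criteria.

```python
import math

def euclidean(a, b):
    """Distance between a model-predicted frame and a recorded unit's frame."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_select(model_params, unit_params, threshold=1.0):
    """For each frame, keep the pre-recorded unit if it is close to the
    statistical model's prediction; otherwise fall back to the
    model-generated parameters (the 'bad unit' replacement)."""
    out = []
    for target, unit in zip(model_params, unit_params):
        out.append(target if euclidean(target, unit) > threshold else unit)
    return out
```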
  • Patent number: 8751236
    Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of a given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: June 10, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis
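The clustering step above can be sketched with a toy one-dimensional example: each candidate pronunciation is reduced to a single concatenation feature (say, an acoustic distance at the join), the candidates are clustered, and the medoid of a cluster serves as its representative. The two-cluster k-means and medoid choice are simplifying assumptions, not the patent's specific method.

```python
def kmeans_1d(values, iters=10):
    """Tiny two-cluster k-means over scalar concatenation features."""
    centers = [min(values), max(values)]   # simple initialisation
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

def representative(cluster):
    """Medoid: the member minimising total distance to the others."""
    return min(cluster, key=lambda v: sum(abs(v - u) for u in cluster))
```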
  • Patent number: 8751239
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: June 10, 2014
    Assignee: Core Wireless Licensing, S.a.r.l.
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
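The substitution step above can be sketched with labelled segments keyed by word: segments identified as varying across dialects are replaced with the corresponding entries from the secondary database. The dictionaries and phone strings below are illustrative, not data from the patent.

```python
# Hypothetical labelled segments (word -> phone string).
primary = {"tomato": "t ah m ey t ow", "water": "w ao t er"}
secondary = {"tomato": "t ah m aa t ow"}
varying = {"tomato"}   # segments identified as having varying pronunciations

def enhance(primary, secondary, varying):
    """Substitute secondary-database segments for the identified
    primary-database segments; leave everything else untouched."""
    enhanced = dict(primary)
    for label in varying:
        if label in secondary:
            enhanced[label] = secondary[label]
    return enhanced
```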
  • Publication number: 20140142946
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming them into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using the Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Application
    Filed: September 24, 2012
    Publication date: May 22, 2014
    Inventor: Chengjun Julian Chen
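The analysis front end described above (non-overlapping frames cut at glottal closure instants, each converted to an amplitude spectrum) can be sketched as below. The naive DFT stands in for the Fourier analyzer; the Laguerre expansion and Kramers-Kronig phase reconstruction are omitted, and all names are illustrative.

```python
import cmath

def frames_from_gcis(signal, gcis):
    """Split the signal into non-overlapping frames between successive
    glottal closure instants (sample indices)."""
    return [signal[a:b] for a, b in zip(gcis, gcis[1:])]

def amplitude_spectrum(frame):
    """Naive DFT magnitude of one frame (stand-in for an FFT analyzer)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]
```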
  • Patent number: 8731933
    Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the acquired speech unit waveforms. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied into a buffer in a single access, wherein the data quantity of the at least two speech unit waveforms is less than or equal to the size of the buffer.
    Type: Grant
    Filed: April 10, 2013
    Date of Patent: May 20, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
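The single-access optimisation above can be sketched as coalescing adjacent waveform reads: when requested units lie in a contiguous region of the store and their combined size fits the buffer, they are fetched with one copy instead of one access per unit. The flat-list store and `(offset, length)` request format are assumptions for illustration; requests are assumed sorted by offset.

```python
def fetch_units(store, requests, buffer_size):
    """store: flat sample list; requests: sorted (offset, length) pairs.
    Contiguous runs that fit the buffer are read in a single access."""
    accesses = 0
    out = []
    i = 0
    while i < len(requests):
        start, length = requests[i]
        end = start + length
        j = i + 1
        # Extend the run while the next unit is adjacent and still fits.
        while (j < len(requests) and requests[j][0] == end
               and end + requests[j][1] - start <= buffer_size):
            end += requests[j][1]
            j += 1
        buf = store[start:end]          # one access for the whole run
        accesses += 1
        pos = 0
        for k in range(i, j):
            out.append(buf[pos:pos + requests[k][1]])
            pos += requests[k][1]
        i = j
    return out, accesses
```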
  • Patent number: 8719030
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming them into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using the Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary acoustic waves, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Grant
    Filed: December 3, 2012
    Date of Patent: May 6, 2014
    Inventor: Chengjun Julian Chen
  • Patent number: 8706497
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects the wide-band speech signal with the minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts the frequency component outside the narrow band of the wide-band speech signal, and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: April 22, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
  • Patent number: 8655664
    Abstract: According to an embodiment, a text presentation apparatus presenting text for a speaker to read aloud for voice recording includes: a text storing unit for storing first text; a presenting unit for presenting the first text; a determination unit for determining whether or not the first text needs to be replaced, on the basis of a speaker's input for the first text presented; a preliminary text storing unit for storing preliminary text; a select unit configured to select, if it is determined that the first text needs to be replaced, second text to replace the first text from among the preliminary text, the selecting being performed on the basis of attribute information describing an attribute of the first text and on the basis of at least one of attribute information describing pronunciation of the first text and attribute information describing a stress type of the first text; and a control unit configured to control the presenting unit so that the presenting unit presents the second text.
    Type: Grant
    Filed: August 11, 2011
    Date of Patent: February 18, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kentaro Tachibana, Gou Hirabayashi, Takehiko Kagoshima
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8639511
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is movably connected to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice generated by the voice generating unit. The voice generating unit may correct the generated voice based on a bearing of the movable unit relative to the body unit, the bearing being controlled by the driving control unit.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: January 28, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
  • Patent number: 8630857
    Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments; a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on prosody information of the candidate segments and the target segment environment; a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount; a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion; and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: January 14, 2014
    Assignee: NEC Corporation
    Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
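The narrow-then-search flow above can be sketched as follows: compute a prosody change amount per candidate, derive a selection criterion from those amounts (here simply a keep-ratio over the sorted list, an assumption for illustration), narrow the candidates, then search the remainder for the optimum.

```python
def prosody_change(candidate, target):
    """Illustrative change amount: L1 distance between prosody vectors
    (e.g. pitch and duration) of a candidate and the target environment."""
    return sum(abs(c - t) for c, t in zip(candidate, target))

def select_segment(candidates, target, keep_ratio=0.5):
    changes = [(prosody_change(c, target), c) for c in candidates]
    changes.sort(key=lambda x: x[0])
    kept = changes[:max(1, int(len(changes) * keep_ratio))]  # narrow down
    return min(kept, key=lambda x: x[0])[1]                  # optimum search
```

Narrowing first keeps the final search cheap while discarding candidates whose prosody would need heavy modification.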
  • Patent number: 8630971
    Abstract: Systems, devices, and methods for using a Multi-Pattern Viterbi Algorithm for joint decoding of multiple patterns are disclosed. An exemplary method may receive a plurality of sets of time-sequential signal observations for each of a number K of signal repetitions. Further, each set of signal observations is associated with a respective dimension of a K-dimensional time grid having time-indexed points. Moreover, at each of a plurality of the time-indexed points, a state cost metric is calculated with a processor for each state in a set of states of a hidden Markov model (HMM). In addition, for each state in the set of states at a given time-indexed point, the state cost metric calculation provides a most-likely predecessor state and a corresponding most-likely predecessor time-indexed point. The exemplary method may also determine a sequence of states using the calculated state cost metrics and determine a corresponding cumulative probability measure for the HMM.
    Type: Grant
    Filed: January 5, 2010
    Date of Patent: January 14, 2014
    Assignee: Indian Institute of Science
    Inventors: Nishanth Ulhas Nair, Thippur Venkatanarasaiah Sreenivas
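For orientation, the 1-D building block that the patent generalises is the ordinary Viterbi pass shown below; the patent extends it to a K-dimensional time grid so that K repetitions of an utterance are decoded jointly. The toy model parameters are illustrative, not from the patent.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Standard single-pattern Viterbi: returns the most likely state
    sequence for obs and its probability."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)   # cost metric + most-likely predecessor
    # Backtrack from the best final state.
    prob, last = max((V[-1][s][0], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path)), prob
```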
  • Patent number: 8600753
    Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech-related characteristics for those prompts. A message is parsed to determine whether any message portions have been recorded previously. If so, the speech-related characteristics for those portions are retrieved. The arrangement generates speech-related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair Conkie
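The parse-retrieve-generate-combine flow above can be sketched as a cache lookup over message portions: stored characteristics are reused for prerecorded prompts, and the rest are generated on the fly. The feature vectors and the stand-in generator below are illustrative assumptions.

```python
# Hypothetical stored speech-related characteristics for prerecorded prompts.
stored = {"your balance is": [0.9, 0.8], "dollars": [0.7, 0.6]}

def generate_characteristics(portion):
    """Stand-in for a real TTS front end (assumed behaviour)."""
    return [0.5, 0.5]

def characteristics_for(message_portions):
    """Retrieve stored characteristics where available, generate the
    rest, and combine them into one synthesizer input."""
    combined = []
    for portion in message_portions:
        combined.append(stored.get(portion) or generate_characteristics(portion))
    return combined
```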
  • Patent number: 8583437
    Abstract: A service architecture for providing textual information and related speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with the extracted context information and for downloading the incremental database into the user terminal; and a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine, comprising the basic and incremental databases of speech waveforms.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: November 12, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
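The database-manager step above can be sketched as a merge: the terminal's basic waveform database is enlarged with the incremental database selected for the extracted context. The dictionaries and context names below are illustrative, not from the patent.

```python
# Hypothetical waveform stores (unit label -> waveform bytes).
basic_db = {"hello": b"\x01", "world": b"\x02"}
incremental_dbs = {"finance": {"dividend": b"\x03"}}

def enlarged_database(context):
    """Compose the enlarged database used by the synthesis engine:
    basic waveforms plus the context-matched incremental ones."""
    enlarged = dict(basic_db)
    enlarged.update(incremental_dbs.get(context, {}))
    return enlarged
```

Keeping the basic database resident and shipping only small context-specific increments limits download size on the terminal.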