Transformation Patents (Class 704/269)
  • Patent number: 11240480
    Abstract: Multiple clients (viewers) are allowed to share their VR spaces for communication with one another. A server-distributed stream including a video stream obtained by encoding a background image is received from a server. A client-transmitted stream including representative image meta information for displaying a representative image of another client is received from another client apparatus. The video stream is decoded to obtain the background image. The image data of the representative image is generated on the basis of the representative image meta information. Display image data is obtained by synthesizing the representative image on the background image.
    Type: Grant
    Filed: April 19, 2018
    Date of Patent: February 1, 2022
    Assignee: SONY CORPORATION
    Inventor: Ikuo Tsukagoshi
  • Patent number: 11206374
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating avatars based on physical sensor measurements. One of the methods includes receiving, by a second user device from a video streaming platform system during a video streaming session between a first user device and the second user device, an updated physical sensor measurement of the first user device. An updated graphical representation of an avatar selected by a first user of the first user device is generated by the second user device based on the updated physical sensor measurement of the first user device. The updated graphical representation of the avatar is presented by the second user device on a display device of the second user device during a video streaming session between the first user device and the second user device.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: December 21, 2021
    Assignee: Twitter, Inc.
    Inventor: Tyler Hansen
  • Patent number: 11030407
    Abstract: A multilingual named-entity recognition system according to an embodiment includes an acquisition unit configured to acquire an annotated sample of a source language and a sample of a target language, a first generation unit configured to generate an annotated named-entity recognition model of the source language by applying Conditional Random Field sequence labeling to the annotated sample of the source language and obtaining an optimum weight for each annotated named entity of the source language, a calculation unit configured to calculate similarity between the annotated sample of the source language and the sample of the target language, and a second generation unit configured to generate a named-entity recognition model of the target language based on the annotated named-entity recognition model of the source language and the similarity.
    Type: Grant
    Filed: June 22, 2016
    Date of Patent: June 8, 2021
    Assignee: Rakuten, Inc.
    Inventors: Masato Hagiwara, Ayah Zirikly
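The similarity-weighted model transfer described in this abstract can be sketched roughly as follows; the token-overlap similarity and the feature names are invented stand-ins, not Rakuten's actual CRF features:

```python
# Hypothetical sketch of the similarity-weighted transfer idea (not the
# patented implementation): source-language entity weights, learned by a
# CRF, are projected to the target language scaled by sample similarity.

def jaccard(a, b):
    """Token-set similarity (a stand-in for the patent's similarity
    calculation unit)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def transfer_weights(source_weights, similarity):
    """Scale each source-language entity weight by the cross-lingual
    similarity to obtain the target-language model."""
    return {feat: w * similarity for feat, w in source_weights.items()}

source_sample = ["tokyo", "is", "in", "japan"]
target_sample = ["tokyo", "wa", "japan", "ni"]
sim = jaccard(source_sample, target_sample)

source_crf_weights = {"capitalized->LOC": 1.8, "in-gazetteer->LOC": 2.4}
target_model = transfer_weights(source_crf_weights, sim)
```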
  • Patent number: 10785451
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating avatars based on physical sensor measurements. One of the methods includes receiving, by a second user device from a video streaming platform system during a video streaming session between a first user device and the second user device, an updated physical sensor measurement of the first user device. An updated graphical representation of an avatar selected by a first user of the first user device is generated by the second user device based on the updated physical sensor measurement of the first user device. The updated graphical representation of the avatar is presented by the second user device on a display device of the second user device during a video streaming session between the first user device and the second user device.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: September 22, 2020
    Assignee: Twitter, Inc.
    Inventor: Tyler Hansen
  • Patent number: 10643636
    Abstract: There is provided an information processing apparatus capable of enhancing the possibility of outputting information with granularity desired by a user, the information processing apparatus including: a generation unit configured to generate second text data on a basis of first text data and information regarding a first user's auditory characteristics; and an output unit configured to output output information regarding the second text data. The generation unit controls granularity of the second text data on a basis of the information regarding the first user's auditory characteristics.
    Type: Grant
    Filed: May 23, 2016
    Date of Patent: May 5, 2020
    Assignee: SONY CORPORATION
    Inventors: Yuhei Taki, Yoko Ito, Shinichi Kawano
  • Patent number: 10586527
    Abstract: Creating and deploying a text-to-speech voice for a new language derived from the original phoneset of a known language, so that audio of the new language is output using a single TTS synthesizer. An end product message is determined in an original language n to be outputted as audio n by a text-to-speech engine, wherein the original language n includes an existing phoneset n including one or more phonemes n. Words and phrases of a new language n+1 are recorded, thereby forming audio file n+1. This new audio file is labeled into unique units, thereby defining one or more phonemes n+1. The new phonemes of the new language are added to the phoneset, thereby forming new phoneset n+1, as a result outputting the end product message as an audio n+1 language different from the original language n.
    Type: Grant
    Filed: October 25, 2017
    Date of Patent: March 10, 2020
    Assignee: Third Pillar, LLC
    Inventors: Patrick Dexter, Kevin Jeffries
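A minimal sketch of the phoneset-extension step; the phoneme labels and unit names are invented examples:

```python
# Illustrative sketch: extending a synthesizer's phoneset n with phonemes
# labeled from a new-language recording, so a single TTS voice can output
# the new language n+1.

def extend_phoneset(phoneset, new_phonemes):
    """Return phoneset n+1: the original phonemes plus the phonemes
    labeled from the audio file of the new language."""
    extended = dict(phoneset)
    for label, unit in new_phonemes.items():
        if label not in extended:          # keep existing units intact
            extended[label] = unit
    return extended

phoneset_n = {"AA": "unit_aa", "K": "unit_k"}
phonemes_n1 = {"KW'": "unit_kw_glottal", "K": "unit_k_other"}
phoneset_n1 = extend_phoneset(phoneset_n, phonemes_n1)
```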
  • Patent number: 9818081
    Abstract: A smart hook system for a store display including a hook configured to hang smart items having a resistor and a capacitor for display in a store. The hook includes at least one resistive electrical contact configured to come into electrical circuit contact with the resistor of the smart items hanging on the hook, and at least one capacitive electrical contact configured to come into electrical contact with the capacitor of the smart items that are hanging on the hook. The smart hook also includes a processor configured to measure the resistance and capacitance of the smart items that are hanging on the hook, and determine a quantity of the smart items hanging on the hook and the identity of the smart items hanging on the hook based on the measured resistance and capacitance.
    Type: Grant
    Filed: January 6, 2015
    Date of Patent: November 14, 2017
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Mohammad Raheel Khalid, Ji Hoon Kim, Manuel Enrique Caceres, Yuk Lun Li, SM Masudur Rahman
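A toy model of the quantity/identity inference, assuming (as one plausible reading, not stated explicitly in the abstract) that each item's identifying resistor lands in parallel across the hook's contacts, so n identical items of resistance R measure roughly R/n:

```python
# Catalog of known smart items keyed by their identifying resistance (ohms).
CATALOG = {10000.0: "gadget-A", 22000.0: "gadget-B"}

def identify_and_count(measured_ohms):
    """Find the catalog item whose parallel combination of n identical
    resistors (R/n) best explains the measured resistance, and return
    the item's name with the inferred quantity n."""
    best = None
    for r_item, name in CATALOG.items():
        n = round(r_item / measured_ohms)   # n identical R in parallel -> R/n
        if n < 1:
            continue
        err = abs(r_item / n - measured_ohms)
        if best is None or err < best[0]:
            best = (err, name, n)
    return best[1], best[2]

# Three 10 kOhm items in parallel measure about 3333 Ohm.
```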
  • Patent number: 9812119
    Abstract: A voice selection supporting device according to an embodiment of the present invention includes an acceptance unit that accepts input of a text, an analysis knowledge storage unit that stores therein text analysis knowledge to be used for characteristic analysis for the input text, an analysis unit that analyzes a characteristic of the text by referring to the text analysis knowledge, a voice attribute storage unit that stores therein a voice attribute of each voice dictionary, an evaluation unit that evaluates similarity between the voice attribute of the voice dictionary and the characteristic of the text, and a candidate presentation unit that presents, based on the similarity, a candidate for the voice dictionary suitable for the text.
    Type: Grant
    Filed: March 10, 2016
    Date of Patent: November 7, 2017
    Assignees: KABUSHIKI KAISHA TOSHIBA, TOSHIBA SOLUTIONS CORPORATION
    Inventors: Masaru Suzuki, Kaoru Hirano
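The similarity evaluation between text characteristics and voice attributes might look like this cosine-similarity ranking; the attribute names and values are invented:

```python
# Sketch of the evaluation/candidate-presentation step: score each voice
# dictionary's attribute vector against the analyzed text characteristic
# and present the best matches first.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_voices(text_traits, voice_attrs):
    """Return voice dictionary names ordered by similarity to the text."""
    return sorted(voice_attrs,
                  key=lambda name: cosine(text_traits, voice_attrs[name]),
                  reverse=True)

# Invented traits: (formality, brightness, calmness) of the analyzed text.
text = (0.9, 0.2, 0.8)
voices = {"newscaster": (0.95, 0.3, 0.7), "cartoon": (0.1, 0.9, 0.2)}
candidates = rank_voices(text, voices)
```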
  • Patent number: 9762198
    Abstract: Disclosed are examples of systems, apparatus, methods and computer-readable storage media for dynamically adjusting thresholds of a compressor. An input audio signal having a number of frequency band components is processed. Time-varying thresholds can be determined. A compressor performs, on each frequency band component, a compression operation having a corresponding time-varying threshold to produce gains. Each gain is applied to a delayed corresponding frequency band component to produce processed band components, which are summed to produce an output signal. In some implementations, a time-varying estimate of a perceived spectrum of the output signal and a time-varying estimate of a distortion spectrum induced by the perceived spectrum estimate are determined, for example, using a distortion audibility model. An audibility measure of the distortion spectrum estimate in the presence of the perceived spectrum estimate can be predicted and used to adjust the time-varying thresholds.
    Type: Grant
    Filed: April 14, 2014
    Date of Patent: September 12, 2017
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Alan J. Seefeldt
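A sketch of the per-band compression step with static example thresholds; the patent's distortion-audibility feedback that makes the thresholds time-varying is omitted, and the ratio is an invented value:

```python
# Downward compression applied independently per frequency band, each
# band carrying its own (in the patent, time-varying) threshold.

def band_gain(level_db, threshold_db, ratio=4.0):
    """Above the threshold the output rises at 1/ratio dB per input dB;
    below it, unity gain. Returns the gain in dB."""
    if level_db <= threshold_db:
        return 0.0
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)

def compress_bands(band_levels_db, thresholds_db):
    """One gain per frequency band, each with its own threshold."""
    return [band_gain(l, t) for l, t in zip(band_levels_db, thresholds_db)]

# Band 1 is 10 dB over its threshold; band 2 is under its threshold.
gains = compress_bands([-10.0, -30.0], [-20.0, -20.0])
```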
  • Patent number: 9009052
    Abstract: Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section 119 temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section 117 to define a spectral transform surface. A synthesized audio signal generating section 121 generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section 121 generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in a reference singing voice source data.
    Type: Grant
    Filed: July 19, 2011
    Date of Patent: April 14, 2015
    Assignee: National Institute of Advanced Industrial Science and Technology
    Inventors: Tomoyasu Nakano, Masataka Goto
  • Patent number: 9002711
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Grant
    Filed: December 16, 2010
    Date of Patent: April 7, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Patent number: 8990075
    Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
    Type: Grant
    Filed: July 9, 2012
    Date of Patent: March 24, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Eun-mi Oh, Ki-Hyun Choo, Jung-hoo Kim
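One common way to realize the low-band-to-high-band idea is spectral replication with coarse gains; this is an illustrative technique, not necessarily Samsung's exact scheme:

```python
# Toy bandwidth extension: the high band is reconstructed from the
# transmitted low-band spectrum shaped by a few cheap gain factors, so
# only the low band and the gains need to be coded.

def extend_bandwidth(low_band, gains):
    """Replicate the low-band magnitudes into the high band, scaled by
    coarse per-bin gains, and return the full-band spectrum."""
    high_band = [g * m for g, m in zip(gains, low_band)]
    return low_band + high_band

decoded = extend_bandwidth([1.0, 0.8, 0.6, 0.4], [0.5, 0.4, 0.3, 0.2])
```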
  • Patent number: 8972260
    Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
    Type: Grant
    Filed: April 19, 2012
    Date of Patent: March 3, 2015
    Assignee: Robert Bosch GmbH
    Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
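The frequency-based split of training utterances can be sketched directly; the threshold is an arbitrary example value:

```python
# Route frequent utterances to grammar-based model training and rare
# ones to statistical model training, as the abstract describes.
from collections import Counter

def split_by_frequency(utterances, threshold):
    """The abstract says 'exceeds' / 'is below' the threshold, so counts
    exactly at the threshold are left unassigned here as well."""
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]
    low = [u for u, c in counts.items() if c < threshold]
    return high, low

data = ["turn on radio"] * 5 + ["call home"] * 5 + ["play obscure song"]
grammar_train, statistical_train = split_by_frequency(data, 3)
```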
  • Patent number: 8949123
    Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
    Type: Grant
    Filed: April 11, 2012
    Date of Patent: February 3, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Aditi Garg, Kasthuri Jayachand Yadlapalli
  • Patent number: 8892436
    Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech by reflecting at least one frame from among the frames that are previously positioned with respect to a frame of the first speech.
    Type: Grant
    Filed: October 19, 2011
    Date of Patent: November 18, 2014
    Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation
    Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8825483
    Abstract: A method of audio processing comprises composing one or more transformation profiles for transforming audio characteristics of an audio recording and then generating for the or each transformation profile, a metadata set comprising transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; the or each metadata set is then stored in association with the corresponding recording. A corresponding method of audio reproduction comprises reading a recording and a meta-data set associated with that recording from storage, applying transformations to the recording data in accordance with the metadata set transformation profile; and then outputting the transformed recording.
    Type: Grant
    Filed: October 17, 2007
    Date of Patent: September 2, 2014
    Assignee: Sony Computer Entertainment Europe Limited
    Inventors: Daniele Giuseppe Bardino, Richard James Griffiths
  • Patent number: 8805687
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
    Type: Grant
    Filed: September 21, 2009
    Date of Patent: August 12, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Mark Beutnagel, Yeon-Jun Kim, Ann K. Syrdal
  • Patent number: 8781836
    Abstract: Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.
    Type: Grant
    Filed: February 22, 2011
    Date of Patent: July 15, 2014
    Assignee: Apple Inc.
    Inventors: Edwin W. Foo, Gregory F. Hughes
  • Patent number: 8751239
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: June 10, 2014
    Assignee: Core Wireless Licensing, S.a.r.l.
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
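The two-model chaining can be sketched with stand-in functions; real models would be trained statistical mappings, and the arithmetic here is purely illustrative:

```python
# Chained conversion: model 1 maps source speech into the synthetic-voice
# domain, model 2 maps the synthetic domain to the target voice, so
# source->target conversion needs no parallel source/target corpus.

def model1_source_to_synthetic(features):
    return [f * 0.9 for f in features]     # trained on (source, synthetic)

def model2_synthetic_to_target(features):
    return [f + 1.0 for f in features]     # trained on (synthetic, target)

def convert(source_features):
    """Feed model 1's output into model 2, as in the abstract."""
    return model2_synthetic_to_target(model1_source_to_synthetic(source_features))

target = convert([10.0, 20.0])
```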
  • Publication number: 20140156280
    Abstract: A method of deriving speech synthesis parameters from an audio signal, the method comprising: receiving an input speech signal; estimating the position of glottal closure incidents from said audio signal; deriving a pulsed excitation signal from the position of the glottal closure incidents; segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal; processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum; reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter; comparing said reconstructed speech signal with said input speech signal; and calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech signal.
    Type: Application
    Filed: November 26, 2013
    Publication date: June 5, 2014
    Applicant: Kabushiki Kaisha Toshiba
    Inventor: Maia Ranniery
  • Patent number: 8744854
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Grant
    Filed: September 24, 2012
    Date of Patent: June 3, 2014
    Inventor: Chengjun Julian Chen
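The analysis front end (pitch-synchronous framing to amplitude spectrum) can be sketched with a naive DFT; the patent's Laguerre-coefficient and Kramers-Kronig steps are not reproduced here:

```python
# Reduce one non-overlapping, glottal-cycle-aligned frame to its
# amplitude spectrum, the input to the timbre-vector computation.
import cmath, math

def amplitude_spectrum(frame):
    """Naive DFT magnitudes (bins 0..N/2) of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

# A synthetic frame: a pure tone completing exactly 2 cycles in 16 samples,
# so the energy should concentrate in bin 2.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
spec = amplitude_spectrum(frame)
```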
  • Patent number: 8744841
    Abstract: An adaptive time/frequency-based encoding mode determination apparatus including a time domain feature extraction unit to generate a time domain feature by analysis of a time domain signal of an input audio signal, a frequency domain feature extraction unit to generate a frequency domain feature corresponding to each frequency band generated by division of a frequency domain corresponding to a frame of the input audio signal into a plurality of frequency domains, by analysis of a frequency domain signal of the input audio signal, and a mode determination unit to determine any one of a time-based encoding mode and a frequency-based encoding mode, with respect to the each frequency band, by use of the time domain feature and the frequency domain feature.
    Type: Grant
    Filed: September 21, 2006
    Date of Patent: June 3, 2014
    Assignee: SAMSUNG Electronics Co., Ltd.
    Inventors: Eun Mi Oh, Ki Hyun Choo, Jung-Hoe Kim, Chang Yong Son
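A plausible mode decision using spectral flatness as the frequency-domain feature; the actual features and threshold in the patent may differ:

```python
# A flat/noise-like band favors time-based coding, while a peaky/tonal
# band favors frequency-based coding.
import math

def spectral_flatness(band_power):
    """Geometric over arithmetic mean of the band's power values:
    1.0 means flat (noise-like); near 0 means peaky (tonal)."""
    geo = math.exp(sum(math.log(p) for p in band_power) / len(band_power))
    arith = sum(band_power) / len(band_power)
    return geo / arith

def choose_mode(band_power, flatness_threshold=0.5):
    return "time" if spectral_flatness(band_power) > flatness_threshold \
        else "frequency"

tonal_band = [100.0, 1.0, 1.0, 1.0]   # one dominant spectral peak
noise_band = [1.0, 1.1, 0.9, 1.0]     # roughly flat spectrum
```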
  • Patent number: 8712074
    Abstract: A method estimates noise power spectral density (PSD) in an input sound signal to generate an output for noise reduction of the input sound signal. The method includes storing frames of a digitized version of the input signal, each frame having a predefined number N2 of samples corresponding to a frame length in time of L2 = N2/sampling frequency. It further includes performing a time-to-frequency transformation and deriving a periodogram comprising an energy content |Y|² from the corresponding spectrum Y, then applying a gain function G(k,m) = f(λ_s²(k,m), λ_w²(k,m−1), |Y(k,m)|²) to estimate a noise energy level |N̂|² in each frequency sample, where λ_s² is the speech PSD and λ_w² the noise PSD. It further includes dividing the spectra into a number of sub-bands, providing a first estimate |N̂|² of the noise PSD level in a sub-band, and providing a second, improved estimate of the noise PSD level in the sub-band by applying a bias compensation factor B to the first estimate.
    Type: Grant
    Filed: August 31, 2009
    Date of Patent: April 29, 2014
    Assignee: Oticon A/S
    Inventors: Richard C. Hendriks, Jesper Jensen, Ulrik Kjems, Richard Heusdens
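A highly simplified sketch of the two-stage sub-band estimate: minimum tracking as the first estimate, then a bias compensation factor B (an assumed constant here, not the patent's derived value) to correct its known underestimation:

```python
# First estimate: the minimum periodogram energy in each sub-band
# (minimum tracking underestimates the true noise level); second
# estimate: the first estimate scaled by a bias compensation factor.

def subband_noise_estimate(periodogram, band_slices, bias=1.5):
    """Return (first, second) noise-PSD estimates per sub-band."""
    first = [min(periodogram[s]) for s in band_slices]
    second = [bias * n for n in first]
    return first, second

pgram = [4.0, 2.0, 3.0, 10.0, 8.0, 12.0]     # |Y|^2 per frequency bin
bands = [slice(0, 3), slice(3, 6)]           # two sub-bands of 3 bins
first, second = subband_noise_estimate(pgram, bands)
```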
  • Patent number: 8600753
    Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech related characteristics for those prompts. A message is parsed to determine if any message portions have been recorded previously. If so then speech related characteristics for those portions are retrieved. The arrangement generates speech related characteristics for those parties not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: December 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair Conkie
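The retrieve-or-generate step might be sketched like this; the cache format and the portion delimiter are invented for illustration:

```python
# Parse the message into portions, retrieve stored speech-related
# characteristics for previously recorded portions, generate the rest,
# and combine the two for the synthesizer.

RECORDED = {"thank you for calling": "chars:prompt-017"}

def characteristics_for(message):
    out = []
    for portion in message.lower().split(", "):
        if portion in RECORDED:
            out.append(RECORDED[portion])               # retrieved
        else:
            out.append("chars:generated:" + portion)    # generated
    return out

combined = characteristics_for(
    "Thank you for calling, your balance is 12 dollars")
```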
  • Patent number: 8583442
    Abstract: A method for mimicking the auditory system's response to rhythm of an input signal having a time-varying structure, comprising the steps of receiving a time-varying input signal x(t) to a network of n nonlinear oscillators, each oscillator having a different natural frequency of oscillation and obeying a dynamical equation of the form
      ṙ = r(α + β₁|z|² + εβ₂|z|⁴/(1 − ε|z|²)) + c·x(t)·(cos φ − r)/√(r² − 2r·cos φ + 1)
      φ̇ = ω + δ₁r² + εδ₂r⁴/(1 − εr²) − c·x(t)·sin φ/√(r² − 2r·cos φ + 1)
      ω̇ = −k·x(t)·sin φ/√(r² − 2r·cos φ + 1)
    wherein ω represents the response frequency, r is the amplitude of the oscillator, and φ is the phase of the oscillator.
    Type: Grant
    Filed: January 28, 2011
    Date of Patent: November 12, 2013
    Assignees: Circular Logic, LLC, Florida Atlantic University Research Corporation
    Inventor: Edward W. Large
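One way to read the oscillator dynamics in this abstract is as coupled ODEs for amplitude r, phase φ, and adaptive frequency ω; the sketch below integrates that reading with a forward-Euler step (in polar form |z| = r), using arbitrary illustrative parameter values:

```python
# Forward-Euler integration of a single oscillator under a possible
# reading of the abstract's dynamical equations (a reconstruction, not
# a verified transcription of the patent).
import math

def step(r, phi, omega, x, dt=0.001, alpha=-0.1, beta1=0.0, beta2=0.0,
         delta1=0.0, delta2=0.0, eps=0.5, c=1.0, k=0.1):
    """One Euler update of amplitude, phase, and response frequency."""
    denom = math.sqrt(r * r - 2.0 * r * math.cos(phi) + 1.0)
    dr = (r * (alpha + beta1 * r**2 + eps * beta2 * r**4 / (1 - eps * r**2))
          + c * x * (math.cos(phi) - r) / denom)
    dphi = (omega + delta1 * r**2 + eps * delta2 * r**4 / (1 - eps * r**2)
            - c * x * math.sin(phi) / denom)
    domega = -k * x * math.sin(phi) / denom      # frequency adaptation
    return r + dt * dr, phi + dt * dphi, omega + dt * domega

# Drive a 2 Hz oscillator with a 3 Hz input for 0.2 s.
r, phi, omega = 0.1, 0.0, 2 * math.pi * 2.0
for n in range(200):
    x = math.sin(2 * math.pi * 3.0 * n * 0.001)
    r, phi, omega = step(r, phi, omega, x)
```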
  • Patent number: 8560320
    Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.
    Type: Grant
    Filed: March 14, 2008
    Date of Patent: October 15, 2013
    Assignee: Dolby Laboratories Licensing Corporation
    Inventor: Rongshan Yu
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8527281
    Abstract: Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic-database and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: September 3, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Peter Rutten, Paul A. Taylor
  • Patent number: 8433573
    Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
    Type: Grant
    Filed: February 11, 2008
    Date of Patent: April 30, 2013
    Assignee: Fujitsu Limited
    Inventors: Kentaro Murase, Nobuyuki Katae
  • Patent number: 8433584
    Abstract: Provided is a multi-channel audio decoding method and apparatus therefor, the method involving decoding filter bank coefficients of a plurality of bands from a bitstream having a predetermined format; performing frequency transformation on the decoded filter bank coefficients of the plurality of bands, with respect to each of the plurality of bands; compensating for a phase of each of the plurality of bands according to a predetermined phase compensation value, and serially band-synthesizing the frequency-transformed coefficients of each of the plurality of phase-compensated bands on a frequency domain; and decoding a multi-channel audio signal from the band-synthesized frequency-transformed coefficients.
    Type: Grant
    Filed: January 26, 2010
    Date of Patent: April 30, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hyun-wook Kim, Jong-hoon Jeong, Han-gil Moon
  • Patent number: 8386265
    Abstract: A computer program product for communicating across channels with emotion preservation includes a computer usable storage medium having computer useable program code embodied therewith, the computer usable program code including: computer usable program code to receive a first language communication comprising text marked up with emotion metadata; computer usable program code to translate the emotion metadata into second language emotion metadata; computer usable program code to translate the text to second language text; computer usable program code to analyze the second language emotion metadata for second language emotion information; and computer usable program code to combine the second language emotion information in first language communication with the second language text.
    Type: Grant
    Filed: April 4, 2011
    Date of Patent: February 26, 2013
    Assignee: International Business Machines Corporation
    Inventors: Balan Subramanian, Deepa Srinivasan, Mohamad Reza Salahshoor
  • Patent number: 8374873
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: August 11, 2009
    Date of Patent: February 12, 2013
    Assignee: Morphism, LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8332225
    Abstract: Techniques to create and share custom voice fonts are described. An apparatus may include a preprocessing component to receive voice audio data and a corresponding text script from a client and to process the voice audio data to produce prosody labels and a rich script. The apparatus may further include a verification component to automatically verify the voice audio data and the text script. The apparatus may further include a training component to train a custom voice font from the verified voice audio data and rich script and to generate custom voice font data usable by the TTS component. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 4, 2009
    Date of Patent: December 11, 2012
    Assignee: Microsoft Corporation
    Inventors: Sheng Zhao, Zhi Li, Shenghao Qin, Chiwei Che, Jingyang Xu, Binggong Ding
  • Patent number: 8321222
    Abstract: A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
    Type: Grant
    Filed: August 14, 2007
    Date of Patent: November 27, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Vincent Pollet, Andrew Breen
  • Patent number: 8321224
    Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
    Type: Grant
    Filed: January 10, 2012
    Date of Patent: November 27, 2012
    Assignee: Loquendo S.p.A.
    Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
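The phoneme-mapping step described above can be illustrated with a toy example: second-language phonemes with no direct first-language equivalent are mapped onto sets of first-language phonemes, and the mapped section is spliced into the first-language phoneme stream. The phoneme symbols and the mapping table are hypothetical; real systems map between full language-specific phone inventories.

```python
# Invented L2 -> L1 mapping: each foreign phoneme maps to a SET (list)
# of native phonemes, as in the abstract above.
L2_TO_L1 = {
    "TH": ["t"],        # e.g. English /th/ approximated by a single /t/
    "NG": ["n", "g"],   # e.g. English /ng/ split into a two-phoneme set
}

def map_phonemes(l2_phonemes):
    """Map second-language phonemes onto first-language phoneme sets."""
    stream = []
    for p in l2_phonemes:
        stream.extend(L2_TO_L1.get(p, [p.lower()]))
    return stream

def build_stream(l1_phonemes, l2_section):
    """Splice the mapped second-language section into the L1 stream."""
    return l1_phonemes + map_phonemes(l2_section)

stream = build_stream(["b", "w", "o", "n"], ["TH", "I", "NG"])
```

The resulting stream contains only first-language phonemes, so a monolingual speech-synthesis module can voice the mixed-language text.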
  • Patent number: 8315871
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.
    Type: Grant
    Filed: June 4, 2009
    Date of Patent: November 20, 2012
    Assignee: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Patent number: 8301279
    Abstract: A signal processing apparatus subjects an audio signal to musical pitch analysis using different analysis techniques for the higher and lower frequencies. When an audio signal is input, a first extractor extracts a high-frequency signal, and a second extractor extracts a low-frequency signal from the audio signal. A high-frequency processor extracts pitch components from the high-frequency signal by applying the short-time Fourier transform. A low-frequency processor extracts pitch components from the low-frequency signal by dividing the low-frequency signal into a plurality of octave components. A synthesizing unit then combines the pitch components thus extracted from the high-frequency signal and the low-frequency signal and outputs the analysis result.
    Type: Grant
    Filed: October 3, 2008
    Date of Patent: October 30, 2012
    Assignee: Sony Corporation
    Inventor: Yoshiyuki Kobayashi
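The split-band analysis above can be approximated in a few lines of NumPy: divide the signal at a crossover frequency, then locate the dominant pitch component of the high band with a short-time Fourier transform. The sample rate, crossover frequency, and frame length are arbitrary choices for this sketch, and the patent's octave decomposition of the low band is not reproduced here.

```python
import numpy as np

FS = 8000          # sample rate in Hz (assumed for the sketch)
CROSSOVER = 1000   # crossover frequency in Hz (assumed)

def split_bands(x):
    """Split a signal into low- and high-frequency parts by FFT masking."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    low_spec = np.where(freqs < CROSSOVER, spec, 0)
    high_spec = spec - low_spec
    return np.fft.irfft(low_spec, len(x)), np.fft.irfft(high_spec, len(x))

def stft_peak_hz(x, frame=256):
    """Dominant frequency of the first windowed STFT frame."""
    windowed = x[:frame] * np.hanning(frame)
    mag = np.abs(np.fft.rfft(windowed))
    return np.fft.rfftfreq(frame, d=1.0 / FS)[np.argmax(mag)]

t = np.arange(FS) / FS                       # one second of signal
sig = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 2000 * t)
low, high = split_bands(sig)
peak = stft_peak_hz(high)                    # pitch component of high band
```

With a 220 Hz tone below the crossover and a 2000 Hz tone above it, the high-band STFT peak lands on the 2000 Hz component.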
  • Patent number: 8253527
    Abstract: An alarm system and method for warning of emergencies are provided. The method predefines a sign language list, and stores the sign language list in a storage device of a terminal device connected to at least one video camera. The method can control the video camera to capture sign images of a person when the person warns of an emergency using sign language, and combine the sign images to create a combined image. In addition, the method analyzes each of the sign images of the combined image to generate a group of sign numbers according to the sign language list stored in the storage device, generates a sign event according to the group of sign numbers, and responds to the sign event using a corresponding alarm.
    Type: Grant
    Filed: January 28, 2010
    Date of Patent: August 28, 2012
    Assignee: Hon Hai Precision Industry Co., Ltd.
    Inventors: Tse Yang, Pi-Jye Tsaur
  • Patent number: 8239193
    Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
    Type: Grant
    Filed: September 17, 2009
    Date of Patent: August 7, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Eun-mi Oh, Ki-hyun Choo, Jung-hoe Kim
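The core idea above, regenerating the high band from the decoded low band plus a small amount of side information, can be sketched as follows. The band layout and the single-gain parameterization are simplifications invented for illustration; the actual codec transmits a richer parameter set.

```python
import numpy as np

def encode_gains(spectrum, half):
    """Encoder side: keep the low band plus one gain for the high band."""
    low, high = spectrum[:half], spectrum[half:]
    gain = np.linalg.norm(high) / (np.linalg.norm(low) + 1e-12)
    return low, gain

def decode_high_band(low, gain):
    """Decoder side: regenerate the high band from the low band and gain."""
    patched = low * gain          # translate the low band upward, scaled
    return np.concatenate([low, patched])

spec = np.array([1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.05, 0.05])
low, gain = encode_gains(spec, half=4)
rec = decode_high_band(low, gain)
```

Only the low band and one gain are stored, which is how the scheme achieves a small data size while keeping the high-band energy roughly correct.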
  • Patent number: 8224648
    Abstract: A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture-specific warping function is generated for each set of mixture mean pairs of the GMM model, and an overall warping function is generated based on a weighting of the mixture-specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.
    Type: Grant
    Filed: December 28, 2007
    Date of Patent: July 17, 2012
    Assignee: Nokia Corporation
    Inventors: Jilei Tian, Victor Popa, Jani Kristian Nurminen
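The posterior-weighted combination of mixture-specific warping functions can be illustrated with a one-dimensional GMM. The mixture parameters and the linear per-mixture warps below are invented stand-ins for trained models; real systems warp spectral envelopes, not scalar features.

```python
import numpy as np

means = np.array([0.0, 4.0])        # mixture means (illustrative)
variances = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])
warp_slopes = np.array([0.8, 1.2])  # one linear warping function per mixture

def posteriors(x):
    """Posterior probability of each GMM mixture for a scalar feature x."""
    lik = weights * np.exp(-0.5 * (x - means) ** 2 / variances) \
          / np.sqrt(2.0 * np.pi * variances)
    return lik / lik.sum()

def warp(x):
    """Posterior-weighted sum of the mixture-specific warping functions."""
    return float(np.dot(posteriors(x), warp_slopes * x))

converted = warp(-2.0)   # dominated by mixture 0, so close to 0.8 * (-2.0)
```

Near a mixture mean the conversion follows that mixture's warp; between mixtures the posteriors blend the warps smoothly, which is the motivation for the hybrid approach.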
  • Patent number: 8201074
    Abstract: A range of unified software authoring tools for creating a talking paper application for integration in an end user platform are described herein. The authoring tools are easy to use and are interoperable to provide an easy and cost-effective method of creating a talking paper application. The authoring tools provide a framework for creating audio content and image content and interactively linking the audio content and the image content. The authoring tools also provide for verifying the interactively linked audio and image content, reviewing the audio content, the image content and the interactive linking on a display device. Finally, the authoring tools provide for saving the audio content, the image content and the interactive linking for publication to a manufacturer for integration in an end user platform or talking paper platform.
    Type: Grant
    Filed: October 8, 2008
    Date of Patent: June 12, 2012
    Assignee: Microsoft Corporation
    Inventors: Kentaro Toyama, Gerald Chu, Ravin Balakrishnan
  • Patent number: 8190426
    Abstract: An audio enhancement refines a short-time spectrum. The refinement may reduce overlap between audio sub-bands. The sub-bands are transformed into sub-band short-time spectra. A portion of the spectra are time-delayed. The sub-band short-time spectrum and the time-delayed portion are filtered to obtain a refined sub-band short-time spectrum. The refined spectrum improves audio processing.
    Type: Grant
    Filed: November 30, 2007
    Date of Patent: May 29, 2012
    Assignee: Nuance Communications, Inc.
    Inventors: Mohamed Krini, Gerhard Uwe Schmidt
  • Patent number: 8160880
    Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image, representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects to determine whether the point belongs in the class of objects corresponding to the centroid.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: April 17, 2012
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
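The centroid comparison described above amounts to nearest-centroid classification: the features vector is a point in n-dimensional space, and it is assigned to the class whose cluster centroid lies closest. The three-dimensional feature vectors and class names below are invented for illustration.

```python
import numpy as np

# Hypothetical class centroids, each summarizing a cluster of training points.
centroids = {
    "text_block": np.array([0.9, 0.1, 0.2]),
    "photo":      np.array([0.1, 0.8, 0.7]),
}

def classify(features):
    """Return the class whose centroid is nearest in Euclidean distance."""
    return min(centroids,
               key=lambda c: np.linalg.norm(features - centroids[c]))

label = classify(np.array([0.85, 0.15, 0.25]))
```

A point near the "text_block" centroid is classified accordingly; a reading machine could then choose the right processing path for the detected object.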
  • Patent number: 8155966
    Abstract: To convert, with maximum accuracy, a signal of non-audible murmur obtained through an in-vivo conduction microphone into a speech signal that a receiving person can recognize with little risk of misrecognition.
    Type: Grant
    Filed: February 7, 2007
    Date of Patent: April 10, 2012
    Assignee: National University Corporation Nara Institute of Science and Technology
    Inventors: Tomoki Toda, Mikihiro Nakagiri, Hideki Kashioka, Kiyohiro Shikano
  • Patent number: 8145497
    Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data having a form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed. The user interface includes at least a text window on which a script to be converted into voice is written, and an icon to be selected for converting the script written on the text window into voice.
    Type: Grant
    Filed: July 10, 2008
    Date of Patent: March 27, 2012
    Assignee: LG Electronics Inc.
    Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee
  • Patent number: 8127302
    Abstract: A method for dynamically arranging DSP tasks. The method comprises receiving an audio bit stream, checking a remaining execution time as the DSP transforms the audio information into spectral information, simplifying the step of transforming the audio information when the DSP detects that the remaining execution time is shorter than a predetermined interval, and skipping one section of the audio information and decoding the remaining section when the execution time is less than a predetermined interval.
    Type: Grant
    Filed: January 4, 2011
    Date of Patent: February 28, 2012
    Assignee: Mediatek Inc.
    Inventors: Chih-Chiang Chuang, Pei-Yun Kuo
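The deadline-driven task arrangement above can be sketched as a decode loop that consults the remaining time budget before each section: decode fully while the budget allows, fall back to a simplified transform when it runs short, and skip sections once it is exhausted. The cost units and thresholds are illustrative, not from the patent.

```python
FULL_COST = 3     # cost units to decode one section fully (illustrative)
SIMPLE_COST = 1   # cost units for the simplified transform (illustrative)

def decode_frame(sections, budget):
    """Decode each section fully, simplified, or not at all, by budget."""
    decoded = []
    for s in sections:
        if budget >= FULL_COST:
            decoded.append((s, "full"))
            budget -= FULL_COST
        elif budget >= SIMPLE_COST:
            decoded.append((s, "simplified"))
            budget -= SIMPLE_COST
        else:
            decoded.append((s, "skipped"))
    return decoded

plan = decode_frame(["s0", "s1", "s2"], budget=4)
```

With a budget of 4 units, the first section is decoded fully, the second in simplified mode, and the third is skipped, mirroring the graceful degradation the method describes.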
  • Patent number: 8121841
    Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
    Type: Grant
    Filed: December 16, 2003
    Date of Patent: February 21, 2012
    Assignee: Loquendo S.p.A.
    Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
  • Patent number: 8121831
    Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
    Type: Grant
    Filed: October 26, 2007
    Date of Patent: February 21, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Eun-mi Oh, Ki-hyun Choo, Jung-hoe Kim
  • Patent number: 8095359
    Abstract: Perceptual audio codecs make use of filter banks and the MDCT in order to achieve a compact representation of the audio signal, removing redundancy and irrelevancy from the original audio signal. During quasi-stationary parts of the audio signal, a high frequency resolution of the filter bank is advantageous for achieving a high coding gain, but this high frequency resolution is coupled to a coarse temporal resolution, which becomes a problem during transient signal parts by producing audible pre-echo effects. The invention achieves improved coding/decoding quality by applying a second, non-uniform filter bank on top of the output of a first filter bank, i.e. a cascaded MDCT. The codec switches to an additional extension filter bank (or multi-resolution filter bank) in order to re-group the time-frequency representation during transient or fast-changing audio signal sections.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: January 10, 2012
    Assignee: Thomson Licensing
    Inventors: Johannes Boehm, Sven Kordon
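The resolution-switching idea can be illustrated with a simple transient detector that assigns a short transform length to frames whose energy jumps sharply, keeping the long, high-frequency-resolution transform for quasi-stationary frames. The frame sizes and the energy-ratio threshold are arbitrary choices for this sketch; the cascaded-MDCT machinery itself is not reproduced.

```python
import numpy as np

LONG, SHORT = 512, 64   # transform lengths in samples (illustrative)

def choose_block_sizes(signal, ratio=4.0):
    """Assign SHORT transforms to frames whose energy jumps sharply."""
    sizes, prev_energy = [], None
    for start in range(0, len(signal) - LONG + 1, LONG):
        energy = float(np.sum(signal[start:start + LONG] ** 2))
        transient = prev_energy is not None and energy > ratio * prev_energy
        sizes.append(SHORT if transient else LONG)
        prev_energy = energy
    return sizes

quiet = np.full(512, 0.01)
burst = np.full(512, 1.0)
sizes = choose_block_sizes(np.concatenate([quiet, quiet, burst]))
```

Two quiet frames keep the long transform; the sudden burst in the third frame triggers the switch to short blocks, trading frequency resolution for the temporal resolution needed to avoid pre-echo.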