Transformation Patents (Class 704/269)

Client apparatus, client apparatus processing method, server, and server processing method

Patent number: 11240480

Abstract: Multiple clients (viewers) are allowed to share their VR spaces for communication with one another. A server-distributed stream including a video stream obtained by encoding a background image is received from a server. A client-transmitted stream including representative image meta information for displaying a representative image of another client is received from another client apparatus. The video stream is decoded to obtain the background image. The image data of the representative image is generated on the basis of the representative image meta information. Display image data is obtained by synthesizing the representative image on the background image.

Type: Grant

Filed: April 19, 2018

Date of Patent: February 1, 2022

Assignee: SONY CORPORATION

Inventor: Ikuo Tsukagoshi
Low-bandwidth avatar animation

Patent number: 11206374

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating avatars based on physical sensor measurements. One of the methods includes receiving, by a second user device from a video streaming platform system during a video streaming session between a first user device and the second user device, an updated physical sensor measurement of the first user device. An updated graphical representation of an avatar selected by a first user of the first user device is generated by the second user device based on the updated physical sensor measurement of the first user device. The updated graphical representation of the avatar is presented by the second user device on a display device of the second user device during a video streaming session between the first user device and the second user device.

Type: Grant

Filed: September 11, 2020

Date of Patent: December 21, 2021

Assignee: Twitter, Inc.

Inventor: Tyler Hansen
Computer system, method and program for performing multilingual named entity recognition model transfer

Patent number: 11030407

Abstract: A multilingual named-entity recognition system according to an embodiment includes an acquisition unit configured to acquire an annotated sample of a source language and a sample of a target language, a first generation unit configured to generate an annotated named-entity recognition model of the source language by applying Conditional Random Field sequence labeling to the annotated sample of the source language and obtaining an optimum weight for each annotated named entity of the source language, a calculation unit configured to calculate similarity between the annotated sample of the source language and the sample of the target language, and a second generation unit configured to generate a named-entity recognition model of the target language based on the annotated named-entity recognition model of the source language and the similarity.

Type: Grant

Filed: June 22, 2016

Date of Patent: June 8, 2021

Assignee: Rakuten, Inc.

Inventors: Masato Hagiwara, Ayah Zirikly
Low-bandwidth avatar animation

Patent number: 10785451

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating avatars based on physical sensor measurements. One of the methods includes receiving, by a second user device from a video streaming platform system during a video streaming session between a first user device and the second user device, an updated physical sensor measurement of the first user device. An updated graphical representation of an avatar selected by a first user of the first user device is generated by the second user device based on the updated physical sensor measurement of the first user device. The updated graphical representation of the avatar is presented by the second user device on a display device of the second user device during a video streaming session between the first user device and the second user device.

Type: Grant

Filed: December 21, 2018

Date of Patent: September 22, 2020

Assignee: Twitter, Inc.

Inventor: Tyler Hansen
Information processing apparatus, information processing method, and program

Patent number: 10643636

Abstract: There is provided an information processing apparatus capable of enhancing the possibility of outputting information with granularity desired by a user, the information processing apparatus including: a generation unit configured to generate second text data on a basis of first text data and information regarding a first user's auditory characteristics; and an output unit configured to output output information regarding the second text data. The generation unit controls granularity of the second text data on a basis of the information regarding the first user's auditory characteristics.

Type: Grant

Filed: May 23, 2016

Date of Patent: May 5, 2020

Assignee: SONY CORPORATION

Inventors: Yuhei Taki, Yoko Ito, Shinichi Kawano
Text-to-speech process capable of interspersing recorded words and phrases

Patent number: 10586527

Abstract: Creating and deploying a voice from text-to-speech, with such voice being a new language derived from the original phoneset of a known language, and thus being audio of the new language outputted using a single TTS synthesizer. An end product message is determined in an original language n to be outputted as audio n by a text-to-speech engine, wherein the original language n includes an existing phoneset n including one or more phonemes n. Words and phrases of a new language n+1 are recorded, thereby forming audio file n+1. This new audio file is labeled into unique units, thereby defining one or more phonemes n+1. The new phonemes of the new language are added to the phoneset, thereby forming new phoneset n+1, as a result outputting the end product message as an audio n+1 language different from the original language n.

Type: Grant

Filed: October 25, 2017

Date of Patent: March 10, 2020

Assignee: Third Pillar, LLC

Inventors: Patrick Dexter, Kevin Jeffries
Smart hook for retail inventory tracking

Patent number: 9818081

Abstract: A smart hook system for a store display including a hook configured to hang smart items having a resistor and a capacitor for display in a store. The hook includes at least one resistive electrical contact configured to come into electrical circuit contact with the resistor of the smart items hanging on the hook, and at least one capacitive electrical contact configured to come into electrical contact with the capacitor of the smart items that are hanging on the hook. The smart hook also includes a processor configured to measure the resistance and capacitance of the smart items that are hanging on the hook, and determine a quantity of the smart items hanging on the hook and an identity of the smart items hanging on the hook based on the measured resistance and capacitance.

Type: Grant

Filed: January 6, 2015

Date of Patent: November 14, 2017

Assignee: Verizon Patent and Licensing Inc.

Inventors: Mohammad Raheel Khalid, Ji Hoon Kim, Manuel Enrique Caceres, Yuk Lun Li, SM Masudur Rahman
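One way to read the smart hook abstract above (patent 9818081) is sketched here in Python. It assumes identical items wired in parallel on the hook, which the abstract does not state: n such items would measure R_total = R/n and C_total = n*C, so the product R_total*C_total equals the per-item R*C regardless of n and can identify the item type, while C_total/C gives the count. The catalog values and the function name `identify_items` are hypothetical.

```python
# Hypothetical catalog: item name -> (resistance in ohms, capacitance in farads).
CATALOG = {
    "phone_case": (10_000.0, 1e-6),
    "charger": (22_000.0, 2.2e-6),
}


def identify_items(r_total, c_total, catalog=CATALOG, tolerance=0.05):
    """Return (item_name, quantity) for identical items assumed wired in parallel.

    Assumption (not from the abstract): n identical items in parallel give
    r_total = R / n and c_total = n * C, so r_total * c_total == R * C.
    """
    measured_rc = r_total * c_total
    for name, (r_item, c_item) in catalog.items():
        item_rc = r_item * c_item
        # The RC product identifies the item type independently of the count.
        if abs(measured_rc - item_rc) <= tolerance * item_rc:
            quantity = round(c_total / c_item)
            return name, quantity
    # No catalog entry matches the measured RC product.
    return None, 0
```

For example, `identify_items(11_000.0, 4.4e-6)` matches the hypothetical charger entry with a quantity of 2. A real hook would also have to handle mixed item types and contact resistance, which this sketch ignores.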
Voice selection supporting device, voice selection method, and computer-readable recording medium

Patent number: 9812119

Abstract: A voice selection supporting device according to an embodiment of the present invention includes an acceptance unit that accepts input of a text, an analysis knowledge storage unit that stores therein text analysis knowledge to be used for characteristic analysis for the input text, an analysis unit that analyzes a characteristic of the text by referring to the text analysis knowledge, a voice attribute storage unit that stores therein a voice attribute of each voice dictionary, an evaluation unit that evaluates similarity between the voice attribute of the voice dictionary and the characteristic of the text, and a candidate presentation unit that presents, based on the similarity, a candidate for the voice dictionary suitable for the text.

Type: Grant

Filed: March 10, 2016

Date of Patent: November 7, 2017

Assignees: KABUSHIKI KAISHA TOSHIBA, TOSHIBA SOLUTIONS CORPORATION

Inventors: Masaru Suzuki, Kaoru Hirano
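The evaluation and candidate presentation steps in the voice selection abstract above (patent 9812119) can be illustrated with a small ranking sketch. Cosine similarity is used here only as a stand-in, since the abstract does not name a similarity measure, and the three-element feature vectors (formality, cheerfulness, childlikeness) and all names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_voice_dictionaries(text_features, voice_attributes, top_n=2):
    """Return the names of the top_n voice dictionaries most similar to the text."""
    scored = [
        (name, cosine_similarity(text_features, attrs))
        for name, attrs in voice_attributes.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_n]]


# Hypothetical voice attributes: (formality, cheerfulness, childlikeness).
VOICES = {
    "newsreader": (0.9, 0.2, 0.0),
    "storyteller": (0.3, 0.8, 0.4),
    "child": (0.0, 0.7, 0.9),
}
```

With a formal input text characterized as `(0.8, 0.3, 0.1)`, `rank_voice_dictionaries((0.8, 0.3, 0.1), VOICES)` places the newsreader voice ahead of the storyteller voice.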
Frequency band compression with dynamic thresholds

Patent number: 9762198

Abstract: Disclosed are examples of systems, apparatus, methods and computer-readable storage media for dynamically adjusting thresholds of a compressor. An input audio signal having a number of frequency band components is processed. Time-varying thresholds can be determined. A compressor performs, on each frequency band component, a compression operation having a corresponding time-varying threshold to produce gains. Each gain is applied to a delayed corresponding frequency band component to produce processed band components, which are summed to produce an output signal. In some implementations, a time-varying estimate of a perceived spectrum of the output signal and a time-varying estimate of a distortion spectrum induced by the perceived spectrum estimate are determined, for example, using a distortion audibility model. An audibility measure of the distortion spectrum estimate in the presence of the perceived spectrum estimate can be predicted and used to adjust the time-varying thresholds.

Type: Grant

Filed: April 14, 2014

Date of Patent: September 12, 2017

Assignee: Dolby Laboratories Licensing Corporation

Inventor: Alan J. Seefeldt
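The per-band compression step in the abstract above (patent 9762198) can be sketched for a single frame as follows. This is a simplified illustration, not the patented method: it assumes a hard-knee compressor with a fixed ratio, takes the time-varying thresholds as plain per-frame inputs, and omits both the band delay and the distortion audibility model. All function and parameter names are hypothetical.

```python
def band_gain_db(level_db, threshold_db, ratio=4.0):
    """Hard-knee downward compression gain in dB for one frequency band."""
    if level_db <= threshold_db:
        return 0.0  # below threshold: leave the band untouched
    # Above threshold: reduce the overshoot according to the ratio.
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)


def process_frame(band_samples, band_levels_db, thresholds_db, ratio=4.0):
    """Apply each band's gain to its sample and sum the bands into one output sample.

    thresholds_db holds the thresholds for this frame only; a caller would
    supply new values every frame to make them time-varying.
    """
    output = 0.0
    for sample, level, threshold in zip(band_samples, band_levels_db, thresholds_db):
        gain_linear = 10.0 ** (band_gain_db(level, threshold, ratio) / 20.0)
        output += gain_linear * sample
    return output
```

A band at -10 dB with a -20 dB threshold and a 4:1 ratio receives a gain of -7.5 dB, while bands below their thresholds pass through unchanged.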
System and method for singing synthesis capable of reflecting voice timbre changes

Patent number: 9009052

Abstract: Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section 119 temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section 117 to define a spectral transform surface. A synthesized audio signal generating section 121 generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section 121 generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in a reference singing voice source data.

Type: Grant

Filed: July 19, 2011

Date of Patent: April 14, 2015

Assignee: National Institute of Advanced Industrial Science and Technology

Inventors: Tomoyasu Nakano, Masataka Goto
Speech synthesis apparatus and method

Patent number: 9002711

Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.

Type: Grant

Filed: December 16, 2010

Date of Patent: April 7, 2015

Assignee: Kabushiki Kaisha Toshiba

Inventors: Ryo Morinaka, Takehiko Kagoshima
Method, apparatus, and medium for bandwidth extension encoding and decoding

Patent number: 8990075

Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.

Type: Grant

Filed: July 9, 2012

Date of Patent: March 24, 2015

Assignee: Samsung Electronics Co., Ltd.

Inventors: Eun-mi Oh, Ki-Hyun Choo, Jung-hoo Kim
Speech recognition using multiple language models

Patent number: 8972260

Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.

Type: Grant

Filed: April 19, 2012

Date of Patent: March 3, 2015

Assignee: Robert Bosch GmbH

Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
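The frequency-based split described in the abstract above (patent 8972260) can be sketched as below. The function name and sample utterances are hypothetical. The abstract speaks only of utterances that exceed or fall below the threshold, so an utterance whose count equals the threshold lands in neither set in this sketch.

```python
from collections import Counter


def split_by_frequency(utterances, threshold):
    """Split distinct utterances into high- and low-frequency sets.

    High-frequency utterances would train the grammar-based language model,
    low-frequency ones the statistical language model.
    """
    counts = Counter(utterances)
    high = {u for u, c in counts.items() if c > threshold}
    low = {u for u, c in counts.items() if c < threshold}
    return high, low


# Hypothetical training data.
TRAINING = (
    ["call home"] * 5
    + ["play music"] * 3
    + ["navigate to the nearest open pharmacy"]
)
```

Calling `split_by_frequency(TRAINING, 3)` puts "call home" in the high-frequency set and the long navigation request in the low-frequency set.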
Display apparatus and voice conversion method thereof

Patent number: 8949123

Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.

Type: Grant

Filed: April 11, 2012

Date of Patent: February 3, 2015

Assignee: Samsung Electronics Co., Ltd.

Inventors: Aditi Garg, Kasthuri Jayachand Yadlapalli
Front-end processor for speech recognition, and speech recognizing apparatus and method using the same

Patent number: 8892436

Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech by reflecting at least one frame from among the frames that are previously positioned with respect to a frame of the first speech.

Type: Grant

Filed: October 19, 2011

Date of Patent: November 18, 2014

Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation

Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
Training and applying prosody models

Patent number: 8856008

Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

Type: Grant

Filed: September 18, 2013

Date of Patent: October 7, 2014

Assignee: Morphism LLC

Inventor: James H. Stephens, Jr.
Apparatus and method for transforming audio characteristics of an audio recording

Patent number: 8825483

Abstract: A method of audio processing comprises composing one or more transformation profiles for transforming audio characteristics of an audio recording and then generating for the or each transformation profile, a metadata set comprising transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; the or each metadata set is then stored in association with the corresponding recording. A corresponding method of audio reproduction comprises reading a recording and a meta-data set associated with that recording from storage, applying transformations to the recording data in accordance with the metadata set transformation profile; and then outputting the transformed recording.

Type: Grant

Filed: October 17, 2007

Date of Patent: September 2, 2014

Assignee: Sony Computer Entertainment Europe Limited

Inventors: Daniele Giuseppe Bardino, Richard James Griffiths
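The metadata set in the abstract above (patent 8825483) pairs transformation profile data with location data. A minimal sketch of that data structure and its use at playback follows. It assumes the profile is a simple gain factor and the location is a sample range, neither of which comes from the abstract, and the names `MetadataSet` and `apply_metadata` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetadataSet:
    gain: float  # hypothetical transformation profile data
    start: int   # location data: first sample index (inclusive)
    end: int     # location data: last sample index (exclusive)


def apply_metadata(recording, metadata_sets):
    """Return a transformed copy of the recording; the stored recording is not modified."""
    transformed = list(recording)
    for meta in metadata_sets:
        # Clamp the location to the bounds of the recording.
        for i in range(max(meta.start, 0), min(meta.end, len(transformed))):
            transformed[i] *= meta.gain
    return transformed
```

Because the metadata is stored alongside the recording and applied only on reproduction, the same recording can be played back with different profiles without re-encoding it.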
System and method for generalized preselection for unit selection synthesis

Patent number: 8805687

Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.

Type: Grant

Filed: September 21, 2009

Date of Patent: August 12, 2014

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Alistair D. Conkie, Mark Beutnagel, Yeon-Jun Kim, Ann K. Syrdal
Hearing assistance system for providing consistent human speech

Patent number: 8781836

Abstract: Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.

Type: Grant

Filed: February 22, 2011

Date of Patent: July 15, 2014

Assignee: Apple Inc.

Inventors: Edwin W. Foo, Gregory F. Hughes
Method, apparatus and computer program product for providing text independent voice conversion

Patent number: 8751239

Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.

Type: Grant

Filed: October 4, 2007

Date of Patent: June 10, 2014

Assignee: Core Wireless Licensing, S.a.r.l.

Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
SPEECH PROCESSING SYSTEM

Publication number: 20140156280

Abstract: A method of deriving speech synthesis parameters from an audio signal, the method comprising: receiving an input speech signal; estimating the position of glottal closure incidents from said audio signal; deriving a pulsed excitation signal from the position of the glottal closure incidents; segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal; processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum; reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter; comparing said reconstructed speech signal with said input speech signal; and calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstr

Type: Application

Filed: November 26, 2013

Publication date: June 5, 2014

Applicant: Kabushiki Kaisha Toshiba

Inventor: Maia Ranniery
Adaptive time and/or frequency-based encoding mode determination apparatus and method of determining encoding mode of the apparatus

Patent number: 8744841

Abstract: An adaptive time/frequency-based encoding mode determination apparatus including a time domain feature extraction unit to generate a time domain feature by analysis of a time domain signal of an input audio signal, a frequency domain feature extraction unit to generate a frequency domain feature corresponding to each frequency band generated by division of a frequency domain corresponding to a frame of the input audio signal into a plurality of frequency domains, by analysis of a frequency domain signal of the input audio signal, and a mode determination unit to determine any one of a time-based encoding mode and a frequency-based encoding mode, with respect to the each frequency band, by use of the time domain feature and the frequency domain feature.

Type: Grant

Filed: September 21, 2006

Date of Patent: June 3, 2014

Assignee: SAMSUNG Electronics Co., Ltd.

Inventors: Eun Mi Oh, Ki Hyun Choo, Jung-Hoe Kim, Chang Yong Son
System and method for voice transformation

Patent number: 8744854

Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.

Type: Grant

Filed: September 24, 2012

Date of Patent: June 3, 2014

Inventor: Chengjun Julian Chen
Noise spectrum tracking in noisy acoustical signals

Patent number: 8712074

Abstract: A method estimates noise power spectral density (PSD) in an input sound signal to generate an output for noise reduction of the input sound signal. The method includes storing frames of a digitized version of the input signal, each frame having a predefined number $N_2$ of samples corresponding to a frame length in time of $L_2 = N_2/\text{sampling frequency}$. It further includes performing a time to frequency transformation, deriving a periodogram comprising an energy content $|Y|^2$ from the corresponding spectrum $Y$, applying a gain function $G(k,m) = f(\sigma_s^2(k,m), \sigma_w^2(k,m-1), |Y(k,m)|^2)$ to estimate a noise energy level $|\hat{W}|^2$ in each frequency sample, where $\sigma_s^2$ is the speech PSD and $\sigma_w^2$ the noise PSD. It further includes dividing spectra into a number of sub-bands, and providing a first estimate $|\hat{N}|^2$ of the noise PSD level in a sub-band and a second, improved estimate $|\hat{N}|^2$ of the noise PSD level in a sub-band by applying a bias compensation factor $B$ to the first estimate.

Type: Grant

Filed: August 31, 2009

Date of Patent: April 29, 2014

Assignee: Oticon A/S

Inventors: Richard C. Hendriks, Jesper Jensen, Ulrik Kjems, Richard Heusdens
Method and apparatus for combining text to speech and recorded prompts

Patent number: 8600753

Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech related characteristics for those prompts. A message is parsed to determine if any message portions have been recorded previously. If so, then speech related characteristics for those portions are retrieved. The arrangement generates speech related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.

Type: Grant

Filed: December 30, 2005

Date of Patent: December 3, 2013

Assignee: AT&T Intellectual Property II, L.P.

Inventor: Alistair Conkie
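The parse, retrieve, generate, and combine flow in the abstract above (patent 8600753) can be sketched as follows. Greedy longest-match over word tokens is an assumption made for illustration, since the abstract does not say how the message is parsed, and the stored prompt table, the string placeholders for "speech related characteristics", and all function names are hypothetical.

```python
# Hypothetical store: word sequence -> speech-related characteristics of a recorded prompt.
STORED = {
    ("your", "balance", "is"): "recorded:your_balance_is",
    ("thank", "you"): "recorded:thank_you",
}


def generate_characteristics(word):
    """Placeholder for characteristics generated for text with no recording."""
    return f"generated:{word}"


def plan_message(message, stored=STORED):
    """Combine retrieved and generated characteristics in message order."""
    words = message.lower().split()
    max_len = max((len(key) for key in stored), default=0)
    plan = []
    i = 0
    while i < len(words):
        # Try the longest stored prompt that could start at this word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in stored:
                plan.append(stored[key])
                i += n
                break
        else:
            # No recorded prompt starts here: generate characteristics instead.
            plan.append(generate_characteristics(words[i]))
            i += 1
    return plan
```

The returned list stands in for the combined characteristics that would be passed to the speech synthesizer.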
Rhythm processing and frequency tracking in gradient frequency nonlinear oscillator networks

Patent number: 8583442

Abstract: A method for mimicking the auditory system's response to rhythm of an input signal having a time varying structure comprising the steps of receiving a time varying input signal x(t) to a network of n nonlinear oscillators, each oscillator having a different natural frequency of oscillation and obeying a dynamical equation of the form

$$\dot{r} = r\left(\alpha + \beta_1 |z|^2 + \frac{\epsilon \beta_2 |z|^4}{1 - \epsilon |z|^2}\right) + c\,x(t)\,\frac{\cos\phi - r\sqrt{\epsilon}}{\epsilon r^2 - 2\sqrt{\epsilon}\,r\cos\phi + 1}$$

$$\dot{\phi} = \omega + \delta_1 r^2 + \frac{\epsilon \delta_2 r^4}{1 - \epsilon r^2} - c\,x(t)\,\frac{\sin(\phi)}{\epsilon r^2 - 2\sqrt{\epsilon}\,r\cos(\phi) + 1}$$

$$\dot{\omega} = -k\,x(t)\,\frac{\sin\phi}{\epsilon r^2 - 2\sqrt{\epsilon}\,r\cos\phi + 1}$$

wherein $\omega$ represents the response frequency, $r$ is the amplitude of the oscillator and $\phi$ is the phase of the oscillator.

Type: Grant

Filed: January 28, 2011

Date of Patent: November 12, 2013

Assignees: Circular Logic, LLC, Florida Atlantic University Research Corporation

Inventor: Edward W. Large
Speech enhancement employing a perceptual model

Patent number: 8560320

Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.

Type: Grant

Filed: March 14, 2008

Date of Patent: October 15, 2013

Assignee: Dolby Laboratories Licensing Corporation

Inventor: Rongshan Yu
Training and applying prosody models

Patent number: 8554566

Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

Type: Grant

Filed: November 29, 2012

Date of Patent: October 8, 2013

Assignee: Morphism LLC

Inventor: James H. Stephens, Jr.
Method and apparatus for sculpting synthesized speech

Patent number: 8527281

Abstract: Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic-database and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.

Type: Grant

Filed: June 29, 2012

Date of Patent: September 3, 2013

Assignee: Nuance Communications, Inc.

Inventors: Peter Rutten, Paul A. Taylor
Prosody modification device, prosody modification method, and recording medium storing prosody modification program

Patent number: 8433573

Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody in

Type: Grant

Filed: February 11, 2008

Date of Patent: April 30, 2013

Assignee: Fujitsu Limited

Inventors: Kentaro Murase, Nobuyuki Katae
Multi-channel audio decoding method and apparatus therefor

Patent number: 8433584

Abstract: Provided is a multi-channel audio decoding method and apparatus therefor, the method involving decoding filter bank coefficients of a plurality of bands from a bitstream having a predetermined format; performing frequency transformation on the decoded filter bank coefficients of the plurality of bands, with respect to each of the plurality of bands; compensating for a phase of each of the plurality of bands according to a predetermined phase compensation value, and serially band-synthesizing the frequency-transformed coefficients of each of the plurality of phase-compensated bands on a frequency domain; and decoding a multi-channel audio signal from the band-synthesized frequency-transformed coefficients.

Type: Grant

Filed: January 26, 2010

Date of Patent: April 30, 2013

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hyun-wook Kim, Jong-hoon Jeong, Han-gil Moon
Language translation with emotion metadata

Patent number: 8386265

Abstract: A computer program product for communicating across channels with emotion preservation includes a computer usable storage medium having computer useable program code embodied therewith, the computer usable program code including: computer usable program code to receive a first language communication comprising text marked up with emotion metadata; computer usable program code to translate the emotion metadata into second language emotion metadata; computer usable program code to translate the text to second language text; computer usable program code to analyze the second language emotion metadata for second language emotion information; and computer usable program code to combine the second language emotion information in first language communication with the second language text.

Type: Grant

Filed: April 4, 2011

Date of Patent: February 26, 2013

Assignee: International Business Machines Corporation

Inventors: Balan Subramanian, Deepa Srinivasan, Mohamad Reza Salahshoor
Training and applying prosody models

Patent number: 8374873

Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.

Type: Grant

Filed: August 11, 2009

Date of Patent: February 12, 2013

Assignee: Morphism, LLC

Inventor: James H. Stephens, Jr.
Techniques to create a custom voice font

Patent number: 8332225

Abstract: Techniques to create and share custom voice fonts are described. An apparatus may include a preprocessing component to receive voice audio data and a corresponding text script from a client and to process the voice audio data to produce prosody labels and a rich script. The apparatus may further include a verification component to automatically verify the voice audio data and the text script. The apparatus may further include a training component to train a custom voice font from the verified voice audio data and rich script and to generate custom voice font data usable by the TTS component. Other embodiments are described and claimed.

Type: Grant

Filed: June 4, 2009

Date of Patent: December 11, 2012

Assignee: Microsoft Corporation

Inventors: Sheng Zhao, Zhi Li, Shenghao Qin, Chiwei Che, Jingyang Xu, Binggong Ding
Synthesis by generation and concatenation of multi-form segments

Patent number: 8321222

Abstract: A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.

Type: Grant

Filed: August 14, 2007

Date of Patent: November 27, 2012

Assignee: Nuance Communications, Inc.

Inventors: Vincent Pollet, Andrew Breen
Text-to-speech method and system, computer program product therefor

Patent number: 8321224

Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.

Type: Grant

Filed: January 10, 2012

Date of Patent: November 27, 2012

Assignee: Loquendo S.p.A.

Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
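
The mapping step in the abstract above (patent 8321224) can be illustrated with a minimal sketch. It assumes a plain lookup table from each second-language phoneme to a list of first-language phonemes; the phoneme labels, the table contents, and the function names `map_phonemes` and `build_stream` are hypothetical and are not taken from the patent. Phonemes without a table entry are passed through unchanged, which is one possible reading of "at least part of the phonemes".

```python
from typing import Dict, List, Sequence, Tuple


def map_phonemes(foreign: Sequence[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Replace each second-language phoneme with its set of first-language phonemes.

    Phonemes that have no entry in the (hypothetical) mapping table are kept as-is.
    """
    mapped: List[str] = []
    for phoneme in foreign:
        mapped.extend(mapping.get(phoneme, [phoneme]))
    return mapped


def build_stream(
    segments: Sequence[Tuple[bool, Sequence[str]]],
    mapping: Dict[str, List[str]],
) -> List[str]:
    """Merge first-language and mapped second-language phonemes into one stream.

    Each segment is (is_second_language, phonemes), in reading order.
    """
    stream: List[str] = []
    for is_second_language, phonemes in segments:
        if is_second_language:
            stream.extend(map_phonemes(phonemes, mapping))
        else:
            stream.extend(phonemes)
    return stream


# Illustrative table only: three second-language phonemes mapped onto
# one or two first-language phonemes each.
EXAMPLE_MAPPING = {"TH": ["t"], "AE": ["a"], "NG": ["n", "g"]}
```

The resulting single stream is what a speech-synthesis module for the first language would consume; the synthesis step itself is outside this sketch.
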
Hidden Markov model based text to speech systems employing rope-jumping algorithm

Patent number: 8315871

Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification and result in stable line frequency spectrum for the generated speech.

Type: Grant

Filed: June 4, 2009

Date of Patent: November 20, 2012

Assignee: Microsoft Corporation

Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
Signal processing apparatus, signal processing method, and program therefor

Patent number: 8301279

Abstract: A signal processing apparatus subjects an audio signal to musical pitch analysis using different analysis techniques for the higher and lower frequencies. When an audio signal is input, a first extractor extracts a high-frequency signal, and a second extractor extracts a low-frequency signal from the audio signal. A high-frequency processor extracts pitch components from the high-frequency signal by applying the short-time Fourier transform. A low-frequency processor extracts pitch components from the low-frequency signal by dividing the low-frequency signal into a plurality of octave components. A synthesizing unit then combines the pitch components thus extracted from the high-frequency signal and the low-frequency signal and outputs the analysis result.

Type: Grant

Filed: October 3, 2008

Date of Patent: October 30, 2012

Assignee: Sony Corporation

Inventor: Yoshiyuki Kobayashi
Alarm system and method for warning of emergencies

Patent number: 8253527

Abstract: An alarm system and method for warning of emergencies are provided. The method predefines a sign language list, and stores the sign language list in a storage device of a terminal device connected to at least one video camera. The method can control the video camera to capture sign images of a person when the person warns of an emergency using sign language, and combine the sign images to create a combined image. In addition, the method analyzes each of the sign images of the combined image to generate a group of sign numbers according to the sign language list stored in the storage device, generates a sign event according to the group of sign numbers, and responds to the sign event using a corresponding alarm.

Type: Grant

Filed: January 28, 2010

Date of Patent: August 28, 2012

Assignee: Hon Hai Precision Industry Co., Ltd.

Inventors: Tse Yang, Pi-Jye Tsaur
Method, apparatus, and medium for bandwidth extension encoding and decoding

Patent number: 8239193

Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.

Type: Grant

Filed: September 17, 2009

Date of Patent: August 7, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Eun-mi Oh, Ki-hyun Choo, Jung-hoe Kim
Hybrid approach in voice conversion

Patent number: 8224648

Abstract: A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture specific warping function is generated for each set of mixture mean pairs of the GMM model, and a warping function is generated based on a weighting of each of the mixture specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.

Type: Grant

Filed: December 28, 2007

Date of Patent: July 17, 2012

Assignee: Nokia Corporation

Inventors: Jilei Tian, Victor Popa, Jani Kristian Nurminen
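
The weighting of mixture-specific warping functions described in the abstract above (patent 8224648) can be sketched as follows. This is an illustration under stated assumptions, not the patented method: each mixture-specific warp is modeled here as a piecewise-linear map between source and target frequency anchors, and the combined warp is a normalized weighted sum. The anchor values, the weights, and the names `piecewise_linear` and `combined_warp` are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence

Warp = Callable[[float], float]


def piecewise_linear(xs: Sequence[float], ys: Sequence[float]) -> Warp:
    """Return f with f(xs[i]) == ys[i], linear in between, clamped outside.

    xs must be strictly increasing and have the same length as ys.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching anchor points")

    def warp(x: float) -> float:
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1  # index of the segment containing x
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return warp


def combined_warp(warps: Sequence[Warp], weights: Sequence[float]) -> Warp:
    """Blend mixture-specific warps using normalized weights (assumed scheme)."""
    total = sum(weights)
    if len(warps) != len(weights) or total <= 0:
        raise ValueError("weights must match warps and sum to a positive value")

    def warp(x: float) -> float:
        return sum(w * f(x) for f, w in zip(warps, weights)) / total

    return warp
```

In a GMM-based system the weights would plausibly be per-frame mixture posteriors, so the blended warp changes from frame to frame; that choice is an assumption of this sketch.
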
Talking paper authoring tools

Patent number: 8201074

Abstract: A range of unified software authoring tools for creating a talking paper application for integration in an end user platform are described herein. The authoring tools are easy to use and are interoperable to provide an easy and cost-effective method of creating a talking paper application. The authoring tools provide a framework for creating audio content and image content and interactively linking the audio content and the image content. The authoring tools also provide for verifying the interactively linked audio and image content, reviewing the audio content, the image content and the interactive linking on a display device. Finally, the authoring tools provide for saving the audio content, the video content and the interactive linking for publication to a manufacturer for integration in an end user platform or talking paper platform.

Type: Grant

Filed: October 8, 2008

Date of Patent: June 12, 2012

Assignee: Microsoft Corporation

Inventors: Kentaro Toyama, Gerald Chu, Ravin Balakrishnan
Spectral refinement system

Patent number: 8190426

Abstract: An audio enhancement refines a short-time spectrum. The refinement may reduce overlap between audio sub-bands. The sub-bands are transformed into sub-band short-time spectra. A portion of the spectra are time-delayed. The sub-band short-time spectrum and the time-delayed portion are filtered to obtain a refined sub-band short-time spectrum. The refined spectrum improves audio processing.

Type: Grant

Filed: November 30, 2007

Date of Patent: May 29, 2012

Assignee: Nuance Communications, Inc.

Inventors: Mohamed Krini, Gerhard Uwe Schmidt
Generalized object recognition for portable reading machine

Patent number: 8160880

Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image; representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector; and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects to determine whether the point belongs in the class of objects corresponding to the centroid.

Type: Grant

Filed: April 28, 2008

Date of Patent: April 17, 2012

Assignee: K-NFB Reading Technology, Inc.

Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
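
The centroid comparison in the abstract above (patent 8160880) can be illustrated with a short sketch. The abstract does not say how the point is compared to the centroid; Euclidean distance with a fixed threshold is assumed here, and the class labels, the threshold, and the function name `nearest_class` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def nearest_class(
    point: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the label of the closest centroid, or None if none is close enough.

    The point and every centroid must have the same number of features (N).
    Euclidean distance and the max_distance cutoff are assumptions of this sketch.
    """
    best_label: Optional[str] = None
    best_distance = math.inf
    for label, centroid in centroids.items():
        if len(centroid) != len(point):
            raise ValueError("point and centroid must have the same dimension")
        distance = math.dist(point, centroid)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label if best_distance <= max_distance else None
```

A point that is far from every centroid is reported as belonging to no class, which keeps the classifier from forcing a label onto an unfamiliar object.
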
Apparatus and method for producing an audible speech signal from a non-audible speech signal

Patent number: 8155966

Abstract: [Problems] To convert a signal of non-audible murmur obtained through an in-vivo conduction microphone into a signal of a speech that is recognizable for (hardly misrecognized by) a receiving person with maximum accuracy.

Type: Grant

Filed: February 7, 2007

Date of Patent: April 10, 2012

Assignee: National University Corporation Nara Institute of Science and Technology

Inventors: Tomoki Toda, Mikihiro Nakagiri, Hideki Kashioka, Kiyohiro Shikano
Media interface for converting voice to text

Patent number: 8145497

Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data having a form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed. The user interface includes at least a text window on which a script to be converted into voice is written, and an icon to be selected for converting the script written on the text window into voice.

Type: Grant

Filed: July 10, 2008

Date of Patent: March 27, 2012

Assignee: LG Electronics Inc.

Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee
Method for dynamically adjusting audio decoding process

Patent number: 8127302

Abstract: A method for dynamically arranging DSP tasks. The method comprises receiving an audio bit stream, checking a remaining execution time as the DSP transforms the audio information into spectral information, simplifying the step of transforming the audio information when the DSP detects that the remaining execution time is shorter than a predetermined interval, and skipping one section of the audio information and decoding the remaining section when the execution time is less than a predetermined interval.

Type: Grant

Filed: January 4, 2011

Date of Patent: February 28, 2012

Assignee: Mediatek Inc.

Inventors: Chih-Chiang Chuang, Pei-Yun Kuo
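
The time-budget check in the abstract above (patent 8127302) can be sketched as a small decision function. The abstract mentions "a predetermined interval" for both the simplify and skip cases; this sketch assumes two separate thresholds, with the skip threshold the stricter one. The enum, the parameter names, and the millisecond units are hypothetical.

```python
from enum import Enum


class DecodeMode(Enum):
    FULL = "full"                  # run the normal transform
    SIMPLIFIED = "simplified"      # run a cheaper version of the transform
    SKIP_SECTION = "skip_section"  # drop one section, decode the rest


def choose_decode_mode(
    remaining_ms: float,
    simplify_below_ms: float,
    skip_below_ms: float,
) -> DecodeMode:
    """Pick a decoding strategy from the remaining execution time.

    Assumes skip_below_ms <= simplify_below_ms, i.e. skipping is the last resort.
    """
    if skip_below_ms > simplify_below_ms:
        raise ValueError("skip threshold must not exceed simplify threshold")
    if remaining_ms < skip_below_ms:
        return DecodeMode.SKIP_SECTION
    if remaining_ms < simplify_below_ms:
        return DecodeMode.SIMPLIFIED
    return DecodeMode.FULL
```

A decoder loop would call this once per frame before the transform step and branch on the result.
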
Method, apparatus, and medium for bandwidth extension encoding and decoding

Patent number: 8121831

Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.

Type: Grant

Filed: October 26, 2007

Date of Patent: February 21, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Eun-mi Oh, Ki-hyun Choo, Jung-hoe Kim
Text-to-speech method and system, computer program product therefor

Patent number: 8121841

Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.

Type: Grant

Filed: December 16, 2003

Date of Patent: February 21, 2012

Assignee: Loquendo S.p.A.

Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
Method and apparatus for encoding and decoding an audio signal using adaptively switched temporal resolution in the spectral domain

Patent number: 8095359

Abstract: Perceptual audio codecs make use of filter banks and MDCT in order to achieve a compact representation of the audio signal, by removing redundancy and irrelevancy from the original audio signal. During quasi-stationary parts of the audio signal a high frequency resolution of the filter bank is advantageous in order to achieve a high coding gain, but this high frequency resolution is coupled to a coarse temporal resolution that becomes a problem during transient signal parts by producing audible pre-echo effects. The invention achieves improved coding/decoding quality by applying on top of the output of a first filter bank a second non-uniform filter bank, i.e. a cascaded MDCT. The inventive codec uses switching to an additional extension filter bank (or multi-resolution filter bank) in order to re-group the time-frequency representation during transient or fast changing audio signal sections.

Type: Grant

Filed: June 4, 2008

Date of Patent: January 10, 2012

Assignee: Thomson Licensing

Inventors: Johannes Boehm, Sven Kordon
