Transformation Patents (Class 704/269)
-
Patent number: 11240480
Abstract: Multiple clients (viewers) are allowed to share their VR spaces for communication with one another. A server-distributed stream including a video stream obtained by encoding a background image is received from a server. A client-transmitted stream including representative image meta information for displaying a representative image of another client is received from another client apparatus. The video stream is decoded to obtain the background image. The image data of the representative image is generated on the basis of the representative image meta information. Display image data is obtained by synthesizing the representative image on the background image.
Type: Grant
Filed: April 19, 2018
Date of Patent: February 1, 2022
Assignee: SONY CORPORATION
Inventor: Ikuo Tsukagoshi
-
Patent number: 11206374
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating avatars based on physical sensor measurements. One of the methods includes receiving, by a second user device from a video streaming platform system during a video streaming session between a first user device and the second user device, an updated physical sensor measurement of the first user device. An updated graphical representation of an avatar selected by a first user of the first user device is generated by the second user device based on the updated physical sensor measurement of the first user device. The updated graphical representation of the avatar is presented by the second user device on a display device of the second user device during a video streaming session between the first user device and the second user device.
Type: Grant
Filed: September 11, 2020
Date of Patent: December 21, 2021
Assignee: Twitter, Inc.
Inventor: Tyler Hansen
-
Patent number: 11030407
Abstract: A multilingual named-entity recognition system according to an embodiment includes an acquisition unit configured to acquire an annotated sample of a source language and a sample of a target language, a first generation unit configured to generate an annotated named-entity recognition model of the source language by applying Conditional Random Field sequence labeling to the annotated sample of the source language and obtaining an optimum weight for each annotated named entity of the source language, a calculation unit configured to calculate similarity between the annotated sample of the source language and the sample of the target language, and a second generation unit configured to generate a named-entity recognition model of the target language based on the annotated named-entity recognition model of the source language and the similarity.
Type: Grant
Filed: June 22, 2016
Date of Patent: June 8, 2021
Assignee: Rakuten, Inc.
Inventors: Masato Hagiwara, Ayah Zirikly
-
Patent number: 10785451
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating avatars based on physical sensor measurements. One of the methods includes receiving, by a second user device from a video streaming platform system during a video streaming session between a first user device and the second user device, an updated physical sensor measurement of the first user device. An updated graphical representation of an avatar selected by a first user of the first user device is generated by the second user device based on the updated physical sensor measurement of the first user device. The updated graphical representation of the avatar is presented by the second user device on a display device of the second user device during a video streaming session between the first user device and the second user device.
Type: Grant
Filed: December 21, 2018
Date of Patent: September 22, 2020
Assignee: Twitter, Inc.
Inventor: Tyler Hansen
-
Patent number: 10643636
Abstract: There is provided an information processing apparatus capable of enhancing the possibility of outputting information with granularity desired by a user, the information processing apparatus including: a generation unit configured to generate second text data on a basis of first text data and information regarding a first user's auditory characteristics; and an output unit configured to output output information regarding the second text data. The generation unit controls granularity of the second text data on a basis of the information regarding the first user's auditory characteristics.
Type: Grant
Filed: May 23, 2016
Date of Patent: May 5, 2020
Assignee: SONY CORPORATION
Inventors: Yuhei Taki, Yoko Ito, Shinichi Kawano
-
Patent number: 10586527
Abstract: Creating and deploying a voice from text-to-speech, with such voice being a new language derived from the original phoneset of a known language, and thus being audio of the new language outputted using a single TTS synthesizer. An end product message is determined in an original language n to be outputted as audio n by a text-to-speech engine, wherein the original language n includes an existing phoneset n including one or more phonemes n. Words and phrases of a new language n+1 are recorded, thereby forming audio file n+1. This new audio file is labeled into unique units, thereby defining one or more phonemes n+1. The new phonemes of the new language are added to the phoneset, thereby forming new phoneset n+1, as a result outputting the end product message as an audio n+1 language different from the original language n.
Type: Grant
Filed: October 25, 2017
Date of Patent: March 10, 2020
Assignee: Third Pillar, LLC
Inventors: Patrick Dexter, Kevin Jeffries
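The phoneset-extension idea in this abstract can be sketched as a simple merge of phoneme inventories, so that one synthesizer voice table covers both languages. All names and data structures below are hypothetical illustrations, not the patent's implementation.

```python
# Illustrative sketch: phonemes labeled from the new language n+1 recordings
# are merged into the phoneset of the original language n.

def extend_phoneset(phoneset_n, phonemes_n1):
    """Return phoneset n+1: the original phonemes plus the newly labeled ones."""
    merged = dict(phoneset_n)
    for phoneme, audio_units in phonemes_n1.items():
        # New-language phonemes must not silently overwrite existing entries.
        if phoneme in merged:
            raise ValueError(f"phoneme collision: {phoneme}")
        merged[phoneme] = audio_units
    return merged

# Original language n: a toy phoneset mapping phoneme labels to audio units.
phoneset_n = {"AH": ["ah_unit_01"], "T": ["t_unit_01"]}
# Units labeled from the newly recorded audio file n+1.
phonemes_n1 = {"LL": ["ll_unit_01"], "NY": ["ny_unit_01"]}

phoneset_n1 = extend_phoneset(phoneset_n, phonemes_n1)
print(sorted(phoneset_n1))   # ['AH', 'LL', 'NY', 'T']
```

A real system would also retrain or adapt the synthesizer's models over the enlarged phoneset; this sketch only shows the inventory bookkeeping.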
-
Patent number: 9818081
Abstract: A smart hook system for a store display including a hook configured to hang smart items having a resistor and a capacitor for display in a store. The hook includes at least one resistive electrical contact configured to come into electrical circuit contact with the resistor of the smart items hanging on the hook, and at least one capacitive electrical contact configured to come into electrical contact with the capacitor of the smart items that are hanging on the hook. The smart hook also includes a processor configured to measure the resistance and capacitance of the smart items that are hanging on the hook, and determine a quantity of the smart items hanging on the hook and the identity of the smart items hanging on the hook based on the measured resistance and capacitance.
Type: Grant
Filed: January 6, 2015
Date of Patent: November 14, 2017
Assignee: Verizon Patent and Licensing Inc.
Inventors: Mohammad Raheel Khalid, Ji Hoon Kim, Manuel Enrique Caceres, Yuk Lun Li, SM Masudur Rahman
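One plausible reading of how a single R/C measurement can yield both quantity and identity (the abstract does not specify the circuit topology): if items present parallel loads, capacitances add while identical resistances combine as R/n. The product catalog and tolerances below are invented for illustration.

```python
# Hypothetical model: n identical items in parallel give C_total = n * C_item
# and R_total = R_item / n, so quantity follows from the summed capacitance
# and identity from the resistance that is consistent with that quantity.

CATALOG = {                 # item type -> (resistance in ohms, capacitance in farads)
    "hat":   (1000.0, 1e-9),
    "cable": (2200.0, 4.7e-9),
}

def identify(r_measured, c_measured, tol=0.05):
    """Return (item_type, quantity) consistent with the measured R and C."""
    for name, (r_item, c_item) in CATALOG.items():
        n = round(c_measured / c_item)          # quantity from summed capacitance
        if n < 1:
            continue
        expected_r = r_item / n                 # n identical resistors in parallel
        if abs(expected_r - r_measured) <= tol * expected_r:
            return name, n
    return None, 0

# Three "hat" items: C = 3e-9 F, R = 1000/3 ohms.
print(identify(1000.0 / 3, 3e-9))   # ('hat', 3)
```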
-
Patent number: 9812119
Abstract: A voice selection supporting device according to an embodiment of the present invention includes an acceptance unit that accepts input of a text, an analysis knowledge storage unit that stores therein text analysis knowledge to be used for characteristic analysis for the input text, an analysis unit that analyzes a characteristic of the text by referring to the text analysis knowledge, a voice attribute storage unit that stores therein a voice attribute of each voice dictionary, an evaluation unit that evaluates similarity between the voice attribute of the voice dictionary and the characteristic of the text, and a candidate presentation unit that presents, based on the similarity, a candidate for the voice dictionary suitable for the text.
Type: Grant
Filed: March 10, 2016
Date of Patent: November 7, 2017
Assignees: KABUSHIKI KAISHA TOSHIBA, TOSHIBA SOLUTIONS CORPORATION
Inventors: Masaru Suzuki, Kaoru Hirano
-
Patent number: 9762198
Abstract: Disclosed are examples of systems, apparatus, methods and computer-readable storage media for dynamically adjusting thresholds of a compressor. An input audio signal having a number of frequency band components is processed. Time-varying thresholds can be determined. A compressor performs, on each frequency band component, a compression operation having a corresponding time-varying threshold to produce gains. Each gain is applied to a delayed corresponding frequency band component to produce processed band components, which are summed to produce an output signal. In some implementations, a time-varying estimate of a perceived spectrum of the output signal and a time-varying estimate of a distortion spectrum induced by the perceived spectrum estimate are determined, for example, using a distortion audibility model. An audibility measure of the distortion spectrum estimate in the presence of the perceived spectrum estimate can be predicted and used to adjust the time-varying thresholds.
Type: Grant
Filed: April 14, 2014
Date of Patent: September 12, 2017
Assignee: Dolby Laboratories Licensing Corporation
Inventor: Alan J. Seefeldt
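The per-band compression step described here can be sketched with a standard downward-compression gain law, where each band's threshold is supplied fresh every frame. The 4:1 ratio and the band levels are invented; the threshold-adaptation and distortion-audibility model from the abstract are beyond this sketch.

```python
# Minimal sketch of per-band gains with time-varying thresholds.

def band_gain(level_db, threshold_db, ratio=4.0):
    """Downward compression: above threshold, output rises 1/ratio dB per input dB."""
    if level_db <= threshold_db:
        return 1.0                           # below threshold: unity gain
    excess = level_db - threshold_db
    gain_db = -excess * (1.0 - 1.0 / ratio)  # attenuate the excess
    return 10.0 ** (gain_db / 20.0)          # dB -> linear gain

# Each frame supplies band levels and that frame's thresholds (time-varying).
band_levels_db = [-10.0, -30.0]
thresholds_db  = [-20.0, -20.0]              # per-band thresholds for this frame
gains = [band_gain(l, t) for l, t in zip(band_levels_db, thresholds_db)]
print([round(g, 3) for g in gains])          # first band compressed, second untouched
```

In the full scheme, each gain would then be applied to a delayed copy of its band before the bands are summed back together.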
-
Patent number: 9009052
Abstract: Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section 119 temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section 117 to define a spectral transform surface. A synthesized audio signal generating section 121 generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section 121 generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in a reference singing voice source data.
Type: Grant
Filed: July 19, 2011
Date of Patent: April 14, 2015
Assignee: National Institute of Advanced Industrial Science and Technology
Inventors: Tomoyasu Nakano, Masataka Goto
-
Patent number: 9002711
Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
Type: Grant
Filed: December 16, 2010
Date of Patent: April 7, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Ryo Morinaka, Takehiko Kagoshima
-
Patent number: 8990075
Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
Type: Grant
Filed: July 9, 2012
Date of Patent: March 24, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Eun-mi Oh, Ki-Hyun Choo, Jung-hoo Kim
-
Patent number: 8972260
Abstract: In accordance with one embodiment, a method of generating language models for speech recognition includes identifying a plurality of utterances in training data corresponding to speech, generating a frequency count of each utterance in the plurality of utterances, generating a high-frequency plurality of utterances from the plurality of utterances having a frequency that exceeds a predetermined frequency threshold, generating a low-frequency plurality of utterances from the plurality of utterances having a frequency that is below the predetermined frequency threshold, generating a grammar-based language model using the high-frequency plurality of utterances as training data, and generating a statistical language model using the low-frequency plurality of utterances as training data.
Type: Grant
Filed: April 19, 2012
Date of Patent: March 3, 2015
Assignee: Robert Bosch GmbH
Inventors: Fuliang Weng, Zhe Feng, Kui Xu, Lin Zhao
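The frequency-based split described above is easy to sketch: count utterances, then route those above the threshold to grammar-based training and those below it to statistical training. The threshold and the toy corpus are invented for illustration; the actual model training is not shown.

```python
# Sketch of partitioning training utterances by frequency count.
from collections import Counter

def split_by_frequency(utterances, threshold):
    counts = Counter(utterances)
    high = [u for u, c in counts.items() if c > threshold]   # -> grammar-based LM
    low  = [u for u, c in counts.items() if c < threshold]   # -> statistical LM
    return high, low

data = ["turn on the radio"] * 5 + ["play something mellow"] * 1
high, low = split_by_frequency(data, threshold=3)
print(high, low)   # ['turn on the radio'] ['play something mellow']
```

Note the abstract says "exceeds" and "is below" the threshold, so this sketch mirrors that and leaves utterances exactly at the threshold unassigned.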
-
Patent number: 8949123
Abstract: The voice conversion method of a display apparatus includes: in response to the receipt of a first video frame, detecting one or more entities from the first video frame; in response to the selection of one of the detected entities, storing the selected entity; in response to the selection of one of a plurality of previously-stored voice samples, storing the selected voice sample in connection with the selected entity; and in response to the receipt of a second video frame including the selected entity, changing a voice of the selected entity based on the selected voice sample and outputting the changed voice.
Type: Grant
Filed: April 11, 2012
Date of Patent: February 3, 2015
Assignee: Samsung Electronics Co., Ltd.
Inventors: Aditi Garg, Kasthuri Jayachand Yadlapalli
-
Patent number: 8892436
Abstract: A method of recognizing speech is provided. The method includes the operations of (a) dividing first speech that is input to a speech recognizing apparatus into frames; (b) converting the frames of the first speech into frames of second speech by applying conversion rules to the divided frames, respectively; and (c) recognizing, by the speech recognizing apparatus, the frames of the second speech, wherein (b) comprises converting the frames of the first speech into the frames of the second speech by reflecting at least one frame from among the frames that are previously positioned with respect to a frame of the first speech.
Type: Grant
Filed: October 19, 2011
Date of Patent: November 18, 2014
Assignees: Samsung Electronics Co., Ltd., Seoul National University Industry Foundation
Inventors: Ki-wan Eom, Chang-woo Han, Tae-gyoon Kang, Nam-soo Kim, Doo-hwa Hong, Jae-won Lee, Hyung-joon Lim
-
Patent number: 8856008
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: September 18, 2013
Date of Patent: October 7, 2014
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
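The train/apply loop described here can be sketched with a deliberately tiny count-based model: prosody-annotated (word, label) pairs from the recognizer train the model, which then predicts a label per word at synthesis time. The labels and data are invented; a real prosody model would be far richer.

```python
# Toy prosody model: per-word label counts, applied by majority vote.
from collections import Counter, defaultdict

def train_prosody_model(annotated):
    """annotated: iterable of (word, prosody_label) pairs from recognized speech."""
    model = defaultdict(Counter)
    for word, label in annotated:
        model[word][label] += 1
    return model

def apply_model(model, words, default="neutral"):
    """Predict one prosody label per input word at synthesis time."""
    return [model[w].most_common(1)[0][0] if w in model else default
            for w in words]

annotated = [("hello", "rise"), ("hello", "rise"), ("there", "fall")]
model = train_prosody_model(annotated)
print(apply_model(model, ["hello", "friend"]))   # ['rise', 'neutral']
```

Training several such models on differently annotated corpora would correspond to the abstract's multiple prosody styles.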
-
Patent number: 8825483
Abstract: A method of audio processing comprises composing one or more transformation profiles for transforming audio characteristics of an audio recording and then generating for the or each transformation profile, a metadata set comprising transformation profile data and location data indicative of where in the recording the transformation profile data is to be applied; the or each metadata set is then stored in association with the corresponding recording. A corresponding method of audio reproduction comprises reading a recording and a metadata set associated with that recording from storage, applying transformations to the recording data in accordance with the metadata set transformation profile; and then outputting the transformed recording.
Type: Grant
Filed: October 17, 2007
Date of Patent: September 2, 2014
Assignee: Sony Computer Entertainment Europe Limited
Inventors: Daniele Giuseppe Bardino, Richard James Griffiths
-
Patent number: 8805687
Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
Type: Grant
Filed: September 21, 2009
Date of Patent: August 12, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Mark Beutnagel, Yeon-Jun Kim, Ann K. Syrdal
-
Patent number: 8781836
Abstract: Broadly speaking, the embodiments disclosed herein describe an apparatus, system, and method that allows a user of a hearing assistance system to perceive consistent human speech. The consistent human speech can be based upon user specific preferences.
Type: Grant
Filed: February 22, 2011
Date of Patent: July 15, 2014
Assignee: Apple Inc.
Inventors: Edwin W. Foo, Gregory F. Hughes
-
Patent number: 8751239
Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
Type: Grant
Filed: October 4, 2007
Date of Patent: June 10, 2014
Assignee: Core Wireless Licensing, S.a.r.l.
Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
-
Publication number: 20140156280
Abstract: A method of deriving speech synthesis parameters from an audio signal, the method comprising: receiving an input speech signal; estimating the position of glottal closure incidents from said audio signal; deriving a pulsed excitation signal from the position of the glottal closure incidents; segmenting said audio signal on the basis of said glottal closure incidents, to obtain segments of said audio signal; processing the segments of the audio signal to obtain the complex cepstrum and deriving a synthesis filter from said complex cepstrum; reconstructing said speech audio signal to produce a reconstructed speech signal using an excitation model where the pulsed excitation signal is passed through said synthesis filter; comparing said reconstructed speech signal with said input speech signal; and calculating the difference between the reconstructed speech signal and the input speech signal and modifying either the pulsed excitation signal or the complex cepstrum to reduce the difference between the reconstructed speech signal and the input speech signal.
Type: Application
Filed: November 26, 2013
Publication date: June 5, 2014
Applicant: Kabushiki Kaisha Toshiba
Inventor: Maia Ranniery
-
Patent number: 8744854
Abstract: The present invention is a method and system to convert speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal thereof. The speech signal is first segmented into non-overlapping frames using the glottal closure instant information, each frame is converted into an amplitude spectrum using a Fourier analyzer, and then using Laguerre functions to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
Type: Grant
Filed: September 24, 2012
Date of Patent: June 3, 2014
Inventor: Chengjun Julian Chen
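The first two steps of the analysis chain are easy to sketch: cut the signal into non-overlapping pitch-synchronous frames at given glottal closure instants, then take each frame's Fourier amplitude spectrum. The Laguerre-function expansion and the Kramers-Kronig phase reconstruction are omitted here, and the toy signal and closure instants are invented.

```python
# Pitch-synchronous framing at glottal closure instants, then amplitude spectra.
import numpy as np

def amplitude_spectra(signal, gci):
    """gci: sample indices of glottal closure instants, in increasing order."""
    spectra = []
    for start, end in zip(gci[:-1], gci[1:]):
        frame = signal[start:end]               # one glottal cycle, no overlap
        spectra.append(np.abs(np.fft.rfft(frame)))
    return spectra

fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 100 * t)            # toy 100 Hz "voiced" signal
gci = list(range(0, fs + 1, 80))                # one closure per 100 Hz cycle
spectra = amplitude_spectra(signal, gci)
print(len(spectra), spectra[0].shape)           # 100 frames, 41 bins each
```

Each amplitude spectrum would then be projected onto a Laguerre-function basis to form the fixed-length timbre vector.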
-
Patent number: 8744841
Abstract: An adaptive time/frequency-based encoding mode determination apparatus including a time domain feature extraction unit to generate a time domain feature by analysis of a time domain signal of an input audio signal, a frequency domain feature extraction unit to generate a frequency domain feature corresponding to each frequency band generated by division of a frequency domain corresponding to a frame of the input audio signal into a plurality of frequency domains, by analysis of a frequency domain signal of the input audio signal, and a mode determination unit to determine any one of a time-based encoding mode and a frequency-based encoding mode, with respect to the each frequency band, by use of the time domain feature and the frequency domain feature.
Type: Grant
Filed: September 21, 2006
Date of Patent: June 3, 2014
Assignee: SAMSUNG Electronics Co., Ltd.
Inventors: Eun Mi Oh, Ki Hyun Choo, Jung-Hoe Kim, Chang Yong Son
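A toy version of the per-band mode decision: here one illustrative frequency-domain feature (spectral flatness) stands in for the patent's fuller time- and frequency-domain feature set, and the 0.5 cutoff is an invented decision rule.

```python
# Per-band mode decision sketch: tonal bands (low spectral flatness) go to
# frequency-based coding, noise-like bands (flatness near 1) to time-based.
import numpy as np

def spectral_flatness(mag):
    """Geometric mean over arithmetic mean of a band's magnitude spectrum."""
    mag = np.maximum(mag, 1e-12)
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)

def choose_mode(band_mag):
    return "frequency" if spectral_flatness(band_mag) < 0.5 else "time"

rng = np.random.default_rng(0)
tonal = np.zeros(64); tonal[5] = 1.0            # single sinusoid: very tonal
noisy = np.abs(rng.standard_normal(64)) + 1.0   # flat-ish spectrum
print(choose_mode(tonal), choose_mode(noisy))   # frequency time
```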
-
Patent number: 8712074
Abstract: A method estimates noise power spectral density (PSD) in an input sound signal to generate an output for noise reduction of the input sound signal. The method includes storing frames of a digitized version of the input signal, each frame having a predefined number N2 of samples corresponding to a frame length in time of L2 = N2/(sampling frequency). It further includes performing a time to frequency transformation, deriving a periodogram comprising an energy content |Y|² from the corresponding spectrum Y, and applying a gain function G(k,m) = f(σs²(k,m), σw²(k,m−1), |Y(k,m)|²) to estimate a noise energy level |N̂|² in each frequency sample, where σs² is the speech PSD and σw² the noise PSD. It further includes dividing spectra into a number of sub-bands, and providing a first estimate |N̂|² of the noise PSD level in a sub-band and a second, improved estimate of the noise PSD level in a sub-band by applying a bias compensation factor B to the first estimate.
Type: Grant
Filed: August 31, 2009
Date of Patent: April 29, 2014
Assignee: Oticon A/S
Inventors: Richard C. Hendriks, Jesper Jensen, Ulrik Kjems, Richard Heusdens
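The last two steps can be sketched directly: group periodogram bins into sub-bands, form a (negatively biased) minimum-based first noise estimate per sub-band, then multiply by a bias compensation factor B. The constant B below is a made-up stand-in; the abstract's gain-function stage and the derivation of B are not shown.

```python
# Sub-band noise PSD sketch: minimum tracking plus bias compensation.
import numpy as np

def noise_psd_estimate(periodograms, n_subbands, bias=2.0):
    """periodograms: (frames, bins) array of |Y|^2 values."""
    frames, bins = periodograms.shape
    bands = np.array_split(np.arange(bins), n_subbands)
    # First estimate: minimum over time per sub-band, which underestimates
    # the true noise level (hence the compensation that follows).
    first = np.array([periodograms[:, b].min() for b in bands])
    return bias * first          # second, bias-compensated estimate

rng = np.random.default_rng(1)
noise = rng.exponential(scale=1.0, size=(50, 32))   # white-noise periodograms
est = noise_psd_estimate(noise, n_subbands=4)
print(est.shape)    # (4,)
```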
-
Patent number: 8600753
Abstract: An arrangement provides for improved synthesis of speech arising from a message text. The arrangement stores prerecorded prompts and speech related characteristics for those prompts. A message is parsed to determine if any message portions have been recorded previously. If so, then speech related characteristics for those portions are retrieved. The arrangement generates speech related characteristics for those portions not previously stored. The retrieved and generated characteristics are combined. The combination of characteristics is then used as the input to a speech synthesizer.
Type: Grant
Filed: December 30, 2005
Date of Patent: December 3, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Alistair Conkie
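The retrieve-or-generate flow described here can be sketched as a simple cache lookup over message portions. The stored prompt, the splitter, and the placeholder "characteristics" strings are all invented for illustration.

```python
# Sketch: fetch stored characteristics for previously recorded portions,
# generate characteristics for the rest, and combine them in message order.

STORED = {"your balance is": "recorded:your balance is"}   # prerecorded prompts

def split_portions(message):
    # Trivial splitter: the known prompt prefix, then the remainder.
    for prompt in STORED:
        if message.startswith(prompt):
            return [prompt, message[len(prompt):].strip()]
    return [message]

def characteristics_for(message):
    combined = []
    for portion in split_portions(message):
        if portion in STORED:
            combined.append(STORED[portion])         # retrieved characteristics
        else:
            combined.append("generated:" + portion)  # synthesized on the fly
    return combined

print(characteristics_for("your balance is 42 dollars"))
# ['recorded:your balance is', 'generated:42 dollars']
```

The combined sequence would then drive the speech synthesizer, so recorded prompts keep their natural delivery while novel portions are synthesized.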
-
Patent number: 8583442
Abstract: A method for mimicking the auditory system's response to rhythm of an input signal having a time-varying structure, comprising the steps of receiving a time-varying input signal x(t) to a network of n nonlinear oscillators, each oscillator having a different natural frequency of oscillation and obeying a dynamical equation of the form

ṙ = r(α + β₁|z|² + εβ₂|z|⁴ / (1 − ε|z|²)) + c·x(t)·(cos φ − √ε·r) / (εr² − 2√ε·r·cos φ + 1)

φ̇ = ω + δ₁r² + εδ₂r⁴ / (1 − εr²) − c·x(t)·sin φ / (r·(εr² − 2√ε·r·cos φ + 1))

ω̇ = −k·x(t)·sin φ / (εr² − 2√ε·r·cos φ + 1)

wherein ω represents the response frequency, r is the amplitude of the oscillator, and φ is the phase of the oscillator.
Type: Grant
Filed: January 28, 2011
Date of Patent: November 12, 2013
Assignees: Circular Logic, LLC, Florida Atlantic University Research Corporation
Inventor: Edward W. Large
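A minimal numerical sketch of one such oscillator, integrating only the amplitude dynamics with zero input, where the equation reduces to r' = r(α + β₁r² + εβ₂r⁴/(1 − εr²)). The parameter values and step size are arbitrary; with α < 0 the quiescent oscillator relaxes toward rest.

```python
# Forward-Euler integration of the amplitude equation with x(t) = 0.

def simulate_amplitude(r0, alpha, beta1, beta2, eps, dt=1e-3, steps=5000):
    r = r0
    for _ in range(steps):
        drdt = r * (alpha + beta1 * r**2 + eps * beta2 * r**4 / (1 - eps * r**2))
        r += dt * drdt
    return r

r_end = simulate_amplitude(r0=0.5, alpha=-1.0, beta1=-1.0, beta2=-1.0, eps=0.5)
print(r_end < 0.5)   # True: with negative alpha the amplitude decays
```

A full network would integrate r, φ, and ω jointly for n oscillators with distinct natural frequencies, driven by the rhythmic input x(t).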
-
Patent number: 8560320
Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as "musical noise".
Type: Grant
Filed: March 14, 2008
Date of Patent: October 15, 2013
Assignee: Dolby Laboratories Licensing Corporation
Inventor: Rongshan Yu
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8527281
Abstract: Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic-database and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.
Type: Grant
Filed: June 29, 2012
Date of Patent: September 3, 2013
Assignee: Nuance Communications, Inc.
Inventors: Peter Rutten, Paul A. Taylor
-
Patent number: 8433573
Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
Type: Grant
Filed: February 11, 2008
Date of Patent: April 30, 2013
Assignee: Fujitsu Limited
Inventors: Kentaro Murase, Nobuyuki Katae
-
Patent number: 8433584
Abstract: Provided is a multi-channel audio decoding method and apparatus therefor, the method involving decoding filter bank coefficients of a plurality of bands from a bitstream having a predetermined format; performing frequency transformation on the decoded filter bank coefficients of the plurality of bands, with respect to each of the plurality of bands; compensating for a phase of each of the plurality of bands according to a predetermined phase compensation value, and serially band-synthesizing the frequency-transformed coefficients of each of the plurality of phase-compensated bands on a frequency domain; and decoding a multi-channel audio signal from the band-synthesized frequency-transformed coefficients.
Type: Grant
Filed: January 26, 2010
Date of Patent: April 30, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hyun-wook Kim, Jong-hoon Jeong, Han-gil Moon
-
Patent number: 8386265
Abstract: A computer program product for communicating across channels with emotion preservation includes a computer usable storage medium having computer useable program code embodied therewith, the computer usable program code including: computer usable program code to receive a first language communication comprising text marked up with emotion metadata; computer usable program code to translate the emotion metadata into second language emotion metadata; computer usable program code to translate the text to second language text; computer usable program code to analyze the second language emotion metadata for second language emotion information; and computer usable program code to combine the second language emotion information in first language communication with the second language text.
Type: Grant
Filed: April 4, 2011
Date of Patent: February 26, 2013
Assignee: International Business Machines Corporation
Inventors: Balan Subramanian, Deepa Srinivasan, Mohamad Reza Salahshoor
-
Patent number: 8374873
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: August 11, 2009
Date of Patent: February 12, 2013
Assignee: Morphism, LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8332225
Abstract: Techniques to create and share custom voice fonts are described. An apparatus may include a preprocessing component to receive voice audio data and a corresponding text script from a client and to process the voice audio data to produce prosody labels and a rich script. The apparatus may further include a verification component to automatically verify the voice audio data and the text script. The apparatus may further include a training component to train a custom voice font from the verified voice audio data and rich script and to generate custom voice font data usable by the TTS component. Other embodiments are described and claimed.
Type: Grant
Filed: June 4, 2009
Date of Patent: December 11, 2012
Assignee: Microsoft Corporation
Inventors: Sheng Zhao, Zhi Li, Shenghao Qin, Chiwei Che, Jingyang Xu, Binggong Ding
-
Patent number: 8321222
Abstract: A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
Type: Grant
Filed: August 14, 2007
Date of Patent: November 27, 2012
Assignee: Nuance Communications, Inc.
Inventors: Vincent Pollet, Andrew Breen
-
Patent number: 8321224
Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
Type: Grant
Filed: January 10, 2012
Date of Patent: November 27, 2012
Assignee: Loquendo S.p.A.
Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
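The mapping stage described in this abstract can be illustrated with a minimal Python sketch. The phoneme inventories and the mapping table below are invented for illustration only; the patent does not specify languages or a concrete table.

```python
# Sketch of the phoneme-mapping stage for mixed-language TTS: each
# second-language phoneme that has no first-language equivalent is
# mapped onto a set of first-language phonemes, while shared phonemes
# pass through unchanged. All entries here are hypothetical examples.

# Hypothetical mapping from English (second language) phonemes to
# Italian (first language) phoneme sequences.
EN_TO_IT = {
    "TH": ("t",),        # English /TH/ approximated by /t/
    "DH": ("d",),        # English /DH/ approximated by /d/
    "AX": ("a",),        # schwa approximated by /a/
    "JH": ("d", "Z"),    # one phoneme mapped onto a two-phoneme set
}

def map_phonemes(second_lang_phonemes, mapping):
    """Map second-language phonemes onto first-language phoneme sets,
    passing through any phoneme already shared by both inventories."""
    result = []
    for ph in second_lang_phonemes:
        result.extend(mapping.get(ph, (ph,)))
    return result

# "the" -> /DH AX/ becomes the first-language-inventory stream /d a/.
print(map_phonemes(["DH", "AX"], EN_TO_IT))
```

The resulting stream contains only first-language phonemes, so a single-language synthesis back end can voice the mixed text.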
-
Patent number: 8315871
Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model-based text-to-speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification, resulting in a stable line frequency spectrum for the generated speech.
Type: Grant
Filed: June 4, 2009
Date of Patent: November 20, 2012
Assignee: Microsoft Corporation
Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
-
Patent number: 8301279
Abstract: A signal processing apparatus subjects an audio signal to musical pitch analysis using different analysis techniques for the higher and lower frequencies. When an audio signal is input, a first extractor extracts a high-frequency signal, and a second extractor extracts a low-frequency signal from the audio signal. A high-frequency processor extracts pitch components from the high-frequency signal by applying the short-time Fourier transform. A low-frequency processor extracts pitch components from the low-frequency signal by dividing the low-frequency signal into a plurality of octave components. A synthesizing unit then combines the pitch components thus extracted from the high-frequency signal and the low-frequency signal and outputs the analysis result.
Type: Grant
Filed: October 3, 2008
Date of Patent: October 30, 2012
Assignee: Sony Corporation
Inventor: Yoshiyuki Kobayashi
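The two-path analysis described in this abstract can be sketched in a few lines of Python: an STFT-style frame analysis picks the dominant component above the band split, and the band below it is folded into octave components. The sample rate, frame size, 1 kHz band split, and 55 Hz octave base are illustrative assumptions, and the naive DFT stands in for an FFT-based STFT.

```python
import math

# Sketch of the two-path pitch analysis: a short-time Fourier frame for
# the high band, and octave folding for the low band. Frequencies,
# window sizes, and the 1 kHz band split are illustrative assumptions.

SR = 4000          # sample rate (Hz)
N = 200            # one analysis frame (50 ms)

def dft_magnitudes(frame):
    """Naive DFT magnitude spectrum (stand-in for an FFT-based STFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# Test signal: 220 Hz (low band) plus 1760 Hz (high band) sinusoids.
frame = [math.sin(2 * math.pi * 220 * t / SR) +
         math.sin(2 * math.pi * 1760 * t / SR) for t in range(N)]

mags = dft_magnitudes(frame)
split = int(1000 * N / SR)           # bin index of the 1 kHz band split

# High-frequency path: pick the strongest STFT component above 1 kHz.
hi_bin = max(range(split, len(mags)), key=lambda k: mags[k])
hi_freq = hi_bin * SR / N

# Low-frequency path: fold bins below 1 kHz into octave components
# relative to a 55 Hz base, then pick the dominant octave.
octaves = {}
for k in range(1, split):
    f = k * SR / N
    octave = int(math.log2(f / 55.0)) if f >= 55.0 else -1
    octaves[octave] = octaves.get(octave, 0.0) + mags[k]
lo_octave = max(octaves, key=octaves.get)

print(hi_freq, lo_octave)
```

With the test tones above, the high-frequency path recovers the 1760 Hz component and the low-frequency path places the 220 Hz component in its octave relative to the 55 Hz base.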
-
Patent number: 8253527
Abstract: An alarm system and method for warning of emergencies are provided. The method predefines a sign language list, and stores the sign language list in a storage device of a terminal device connected to at least one video camera. The method can control the video camera to capture sign images of a person when the person warns of an emergency using sign language, and combine the sign images to create a combined image. In addition, the method analyzes each of the sign images of the combined image to generate a group of sign numbers according to the sign language list stored in the storage device, generates a sign event according to the group of sign numbers, and responds to the sign event using a corresponding alarm.
Type: Grant
Filed: January 28, 2010
Date of Patent: August 28, 2012
Assignee: Hon Hai Precision Industry Co., Ltd.
Inventors: Tse Yang, Pi-Jye Tsaur
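The sign-to-event-to-alarm chain in this abstract reduces to table lookups, sketched below. The sign list, event codes, and alarm actions are invented for illustration; the patent does not enumerate them.

```python
# Sketch of the sign-to-alarm pipeline described above: each recognized
# sign image is mapped to a sign number via the predefined sign list,
# the number sequence forms an event code, and the event selects an
# alarm. Signs, codes, and alarms are hypothetical examples.

SIGN_LIST = {"open_palm": 1, "fist": 2, "two_fingers": 3}   # sign -> number
EVENTS = {(2, 1): "fire", (1, 3): "intruder"}               # code -> event
ALARMS = {"fire": "sound siren", "intruder": "call security"}

def respond(recognized_signs):
    """Translate a sequence of recognized signs into an alarm action."""
    numbers = tuple(SIGN_LIST[s] for s in recognized_signs)
    event = EVENTS.get(numbers)
    return ALARMS.get(event, "no alarm")

print(respond(["fist", "open_palm"]))  # -> "sound siren"
```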
-
Patent number: 8239193
Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
Type: Grant
Filed: September 17, 2009
Date of Patent: August 7, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Eun-mi Oh, Ki-hyun Choo, Jung-hoe Kim
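A common way to realize the idea in this abstract is bandwidth extension: the decoder patches a copy of the decoded low band into the high band and applies a few transmitted gains, so the high band costs almost no bits. The sketch below is a minimal illustration under that assumption, not the patented method itself; the 4-coefficient sub-bands and the example values are invented.

```python
# Sketch of bandwidth extension: instead of coding the high band
# directly, only per-sub-band gains relating it to the low band are
# transmitted, and the decoder rebuilds the high band by scaling a
# patched copy of the low-band spectrum. Values are illustrative.

def encode_high_band(low_band, high_band, sub_size=4):
    """Return per-sub-band gains relating the high band to the low band."""
    gains = []
    for i in range(0, len(high_band), sub_size):
        hb = high_band[i:i + sub_size]
        lb = low_band[i:i + sub_size]
        lb_energy = sum(x * x for x in lb) or 1e-12
        hb_energy = sum(x * x for x in hb)
        gains.append((hb_energy / lb_energy) ** 0.5)
    return gains

def decode_high_band(low_band, gains, sub_size=4):
    """Reconstruct the high band by scaling patched low-band coefficients."""
    high = []
    for j, g in enumerate(gains):
        for x in low_band[j * sub_size:(j + 1) * sub_size]:
            high.append(g * x)
    return high

low = [1.0, 0.5, -0.5, 0.25, 0.8, -0.2, 0.1, 0.05]
high = [0.5, 0.25, -0.25, 0.125, 0.08, -0.02, 0.01, 0.005]
gains = encode_high_band(low, high)   # only 2 numbers are transmitted
print(decode_high_band(low, gains))
```

Here the high band happens to be a scaled copy of the low band, so two gains reconstruct it exactly; real signals are only approximated, which is the sound-quality trade-off the abstract refers to.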
-
Patent number: 8224648
Abstract: A hybrid approach is described for combining frequency warping and Gaussian Mixture Modeling (GMM) to achieve better speaker identity and speech quality. To train the voice conversion GMM model, line spectral frequency and other features are extracted from a set of source sounds to generate a source feature vector and from a set of target sounds to generate a target feature vector. The GMM model is estimated based on the aligned source feature vector and the target feature vector. A mixture-specific warping function is generated for each set of mixture mean pairs of the GMM model, and a warping function is generated based on a weighting of each of the mixture-specific warping functions. The warping function can be used to convert sounds received from a source speaker to approximate speech of a target speaker.
Type: Grant
Filed: December 28, 2007
Date of Patent: July 17, 2012
Assignee: Nokia Corporation
Inventors: Jilei Tian, Victor Popa, Jani Kristian Nurminen
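The weighting step in this abstract can be sketched concretely: each mixture contributes its own warping function, and the final warp combines them using the mixture posteriors of the input feature. The 1-D two-mixture model and the linear per-mixture warps below are drastic simplifications for illustration only.

```python
import math

# Sketch of posterior-weighted mixture-specific frequency warping.
# Hypothetical 2-mixture model over a scalar source feature:
# (weight, mean, variance, per-mixture linear warp factor).
MIXTURES = [
    (0.5, 0.0, 1.0, 0.9),   # mixture 0 compresses frequencies
    (0.5, 3.0, 1.0, 1.2),   # mixture 1 stretches frequencies
]

def posterior(x):
    """Mixture posteriors p(m | x) for a scalar feature x."""
    likelihoods = [
        w * math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        for w, mu, var, _ in MIXTURES
    ]
    total = sum(likelihoods)
    return [l / total for l in likelihoods]

def warp(freq, x):
    """Warp a frequency using the posterior-weighted mixture warps."""
    post = posterior(x)
    return sum(p * a * freq for p, (_, _, _, a) in zip(post, MIXTURES))

# A feature near mixture 0's mean is warped mostly by its factor (0.9);
# near mixture 1's mean, mostly by factor 1.2.
print(warp(1000.0, 0.0), warp(1000.0, 3.0))
```

Because the posteriors vary smoothly with the input feature, the combined warp interpolates between the mixture-specific warps instead of switching abruptly between them.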
-
Patent number: 8201074
Abstract: A range of unified software authoring tools for creating a talking paper application for integration in an end user platform are described herein. The authoring tools are easy to use and are interoperable to provide an easy and cost-effective method of creating a talking paper application. The authoring tools provide a framework for creating audio content and image content and interactively linking the audio content and the image content. The authoring tools also provide for verifying the interactively linked audio and image content, and reviewing the audio content, the image content and the interactive linking on a display device. Finally, the authoring tools provide for saving the audio content, the image content and the interactive linking for publication to a manufacturer for integration in an end user platform or talking paper platform.
Type: Grant
Filed: October 8, 2008
Date of Patent: June 12, 2012
Assignee: Microsoft Corporation
Inventors: Kentaro Toyama, Gerald Chu, Ravin Balakrishnan
-
Patent number: 8190426
Abstract: An audio enhancement refines a short-time spectrum. The refinement may reduce overlap between audio sub-bands. The sub-bands are transformed into sub-band short-time spectra. A portion of the spectra are time-delayed. The sub-band short-time spectrum and the time-delayed portion are filtered to obtain a refined sub-band short-time spectrum. The refined spectrum improves audio processing.
Type: Grant
Filed: November 30, 2007
Date of Patent: May 29, 2012
Assignee: Nuance Communications, Inc.
Inventors: Mohamed Krini, Gerhard Uwe Schmidt
-
Patent number: 8160880
Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image; representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector; and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects, to determine whether the point belongs in the class of objects corresponding to the centroid.
Type: Grant
Filed: April 28, 2008
Date of Patent: April 17, 2012
Assignee: K-NFB Reading Technology, Inc.
Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
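The centroid comparison in this abstract is essentially nearest-centroid classification, sketched below. The feature names, centroid coordinates, and distance threshold are invented for illustration.

```python
import math

# Sketch of the centroid comparison described above: an image's feature
# vector is a point in n-dimensional space, and it is assigned to a
# class when it lies close enough to that class's cluster centroid.
# Classes, centroids, and the threshold are hypothetical examples.

CENTROIDS = {
    "text_page":  (0.9, 0.1, 0.2),   # e.g. edge density, color spread, ...
    "photograph": (0.2, 0.8, 0.7),
}

def classify(features, centroids, threshold=0.5):
    """Return the class of the nearest centroid, or None if every
    centroid is farther away than the threshold."""
    best_class, best_dist = None, float("inf")
    for label, c in centroids.items():
        dist = math.dist(features, c)   # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_class, best_dist = label, dist
    return best_class if best_dist <= threshold else None

print(classify((0.85, 0.15, 0.25), CENTROIDS))  # near the text_page centroid
```

The threshold lets the reading machine report "no known class" for a point far from every centroid rather than forcing a wrong label.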
-
Patent number: 8155966
Abstract: A signal of non-audible murmur obtained through an in-vivo conduction microphone is converted, with maximum accuracy, into a speech signal that is recognizable to (and hardly misrecognized by) a receiving person.
Type: Grant
Filed: February 7, 2007
Date of Patent: April 10, 2012
Assignee: National University Corporation Nara Institute of Science and Technology
Inventors: Tomoki Toda, Mikihiro Nakagiri, Hideki Kashioka, Kiyohiro Shikano
-
Patent number: 8145497
Abstract: Provided are a user interface for processing digital data, a method for processing a media interface, and a recording medium thereof. The user interface is used for converting a selected script into voice to generate digital data having a form of a voice file corresponding to the script, or for managing the generated digital data. In the method, the user interface is displayed. The user interface includes at least a text window on which a script to be converted into voice is written, and an icon to be selected for converting the script written on the text window into voice.
Type: Grant
Filed: July 10, 2008
Date of Patent: March 27, 2012
Assignee: LG Electronics Inc.
Inventors: Tae Hee Ahn, Sung Hun Kim, Dong Hoon Lee
-
Patent number: 8127302
Abstract: A method for dynamically arranging DSP tasks. The method comprises receiving an audio bit stream, checking a remaining execution time as the DSP transforms the audio information into spectral information, simplifying the step of transforming the audio information when the DSP detects that the remaining execution time is shorter than a predetermined interval, and skipping one section of the audio information and decoding the remaining section when the execution time is less than a predetermined interval.
Type: Grant
Filed: January 4, 2011
Date of Patent: February 28, 2012
Assignee: Mediatek Inc.
Inventors: Chih-Chiang Chuang, Pei-Yun Kuo
-
Patent number: 8121841
Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
Type: Grant
Filed: December 16, 2003
Date of Patent: February 21, 2012
Assignee: Loquendo S.p.A.
Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
-
Patent number: 8121831
Abstract: Provided are a method, apparatus, and medium for encoding/decoding a high frequency band signal by using a low frequency band signal corresponding to an audio signal or a speech signal. Accordingly, since the high frequency band signal is encoded and decoded by using the low frequency band signal, encoding and decoding can be carried out with a small data size while avoiding deterioration of sound quality.
Type: Grant
Filed: October 26, 2007
Date of Patent: February 21, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Eun-mi Oh, Ki-hyun Choo, Jung-hoe Kim
-
Patent number: 8095359
Abstract: Perceptual audio codecs make use of filter banks and MDCT in order to achieve a compact representation of the audio signal, by removing redundancy and irrelevancy from the original audio signal. During quasi-stationary parts of the audio signal a high frequency resolution of the filter bank is advantageous in order to achieve a high coding gain, but this high frequency resolution is coupled to a coarse temporal resolution that becomes a problem during transient signal parts by producing audible pre-echo effects. The invention achieves improved coding/decoding quality by applying a second, non-uniform filter bank on top of the output of a first filter bank, i.e. a cascaded MDCT. The inventive codec switches to an additional extension filter bank (or multi-resolution filter bank) in order to re-group the time-frequency representation during transient or fast-changing audio signal sections.
Type: Grant
Filed: June 4, 2008
Date of Patent: January 10, 2012
Assignee: Thomson Licensing
Inventors: Johannes Boehm, Sven Kordon