Frequency Element Patents (Class 704/268)
  • Patent number: 11869483
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: January 9, 2024
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
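A minimal sketch of the idea in this abstract: treat phoneme duration as a separate generative distribution and sample it at inference time, then length-regulate the phoneme sequence. The per-phoneme Gaussian parameterization, the phoneme symbols, and all numeric values below are invented for illustration; the patent does not specify this form.

```python
import random

# Hypothetical per-phoneme duration distributions: (mean frames, std frames).
DURATION_MODEL = {
    "HH": (3.0, 0.5),
    "AH": (5.0, 1.0),
    "L":  (4.0, 0.8),
}

def sample_durations(phonemes, rng, min_frames=1):
    """Draw one duration (in frames) per phoneme from its distribution."""
    out = []
    for p in phonemes:
        mean, std = DURATION_MODEL[p]
        out.append(max(min_frames, round(rng.gauss(mean, std))))
    return out

def expand_to_frames(phonemes, durations):
    """Length-regulate: repeat each phoneme for its sampled frame count."""
    frames = []
    for p, d in zip(phonemes, durations):
        frames.extend([p] * d)
    return frames

rng = random.Random(0)
phonemes = ["HH", "AH", "L"]
durations = sample_durations(phonemes, rng)
frames = expand_to_frames(phonemes, durations)
```

Because durations are sampled rather than fixed, repeated synthesis of the same text yields varied rhythm, which is the diversity the abstract refers to.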
  • Patent number: 11769481
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: September 26, 2023
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Patent number: 11195541
    Abstract: A method and system for providing Gaussian weighted self-attention for speech enhancement are herein provided. According to one embodiment, the method includes receiving an input noise signal, generating a score matrix based on the received input noise signal, and applying a Gaussian weighted function to the generated score matrix.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: December 7, 2021
    Inventors: JaeYoung Kim, Mostafa El-Khamy, Jungwon Lee
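A hedged sketch of the two steps named in this abstract: given a score matrix (a toy 4x4 here), weight each score by a Gaussian of the query-key distance so attention concentrates on nearby frames, then row-normalize with softmax. The sigma value and matrix contents are illustrative, not taken from the patent.

```python
import math

def gaussian_weight(n, sigma=1.5):
    """w[i][j] = exp(-(i - j)^2 / (2 * sigma^2)): decays with distance."""
    return [[math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) for j in range(n)]
            for i in range(n)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def gaussian_weighted_attention(scores, sigma=1.5):
    """Multiply the score matrix elementwise by the Gaussian weights,
    then normalize each row into an attention distribution."""
    n = len(scores)
    w = gaussian_weight(n, sigma)
    weighted = [[scores[i][j] * w[i][j] for j in range(n)] for i in range(n)]
    return [softmax(r) for r in weighted]

scores = [[1.0] * 4 for _ in range(4)]   # uniform raw scores
attn = gaussian_weighted_attention(scores)
```

With uniform raw scores the Gaussian weighting alone makes each row peak on its own position, which is the locality bias the weighting is meant to impose.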
  • Patent number: 11087771
    Abstract: A device includes an encoder and a transmitter. The encoder is configured to generate a first high-band portion of a first signal based on a left signal and a right signal. The encoder is also configured to generate a set of adjustment gain parameters based on a high-band non-reference signal. The high-band non-reference signal corresponds to one of a left high-band portion of the left signal or a right high-band portion of the right signal as a high-band non-reference signal. The transmitter is configured to transmit information corresponding to the first high-band portion of the first signal. The transmitter is also configured to transmit the set of adjustment gain parameters corresponding to the high-band non-reference signal.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: August 10, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Venkatraman Atti, Venkata Subrahmanyam Chandra Sekhar Chebiyyam
  • Patent number: 11011154
    Abstract: A method of performing speech synthesis, includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: May 18, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
  • Patent number: 10127916
    Abstract: A method and apparatus for enhancing audio processing between a transmit radio (230) and a receive radio (240) are provided. Digitized audio frames (202) are applied to parallel inputs of both a vocoder encoder (204) and a trill encoder (212) of the transmit radio (230). The vocoder encoder (204) generates voice bits which are communicated over a voice bits channel (206) to a vocoder decoder (208) of the receive radio (240). The trill encoder (212) generates signaling bits which are communicated over a signaling bits channel (214) to a trill decoder (216) of the receive radio (240) for recovery of trill information (218). At the receive radio (240), a decoded audio signal (209) generated from the vocoder decoder (208) and the recovered trill information (218) are both provided as inputs to a trill reconstructor stage (220) to generate a recovered audio signal (222) having a reconstructed trill.
    Type: Grant
    Filed: April 24, 2014
    Date of Patent: November 13, 2018
    Assignee: MOTOROLA SOLUTIONS, INC.
    Inventors: Yi Gao, Qing Yan
  • Patent number: 9978359
    Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.
    Type: Grant
    Filed: December 6, 2013
    Date of Patent: May 22, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Michal Tadeusz Kaszczuk, Jeffrey Penrod Adams, Adam Franciszek Nadolski
  • Patent number: 9916834
    Abstract: An approach is described that obtains spectrum coefficients for a replacement frame of an audio signal. A tonal component of the spectrum of the audio signal is detected based on a peak that exists in the spectra of frames preceding the replacement frame. For the tonal component of the spectrum, spectrum coefficients for the peak and its surroundings in the replacement frame are predicted; for the non-tonal component, a non-predicted spectrum coefficient for the replacement frame, or a corresponding spectrum coefficient of a frame preceding the replacement frame, is used.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: March 13, 2018
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Janine Sukowski, Ralph Sperschneider, Goran Markovic, Wolfgang Jaegers, Christian Helmrich, Bernd Edler, Ralf Geiger
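A simplified stand-in for the first step in this abstract: detect tonal components as spectral peaks in a preceding frame, so those bins (and their immediate surroundings) can be treated differently from the non-tonal rest when the replacement frame is built. The peak criterion and the ratio threshold are illustrative choices, not the patent's.

```python
def detect_tonal_bins(spectrum, ratio=3.0):
    """Return the set of bins belonging to a detected peak and its
    two immediate neighbors (a local maximum well above its neighbors)."""
    tonal = set()
    for i in range(1, len(spectrum) - 1):
        neighbors = (spectrum[i - 1] + spectrum[i + 1]) / 2
        if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1] \
                and spectrum[i] > ratio * neighbors:
            tonal.update({i - 1, i, i + 1})
    return tonal

prev_frame = [1.0, 1.0, 10.0, 1.0, 1.0, 1.0]   # one clear tonal peak at bin 2
tonal = detect_tonal_bins(prev_frame)
```

A concealment routine would then predict coefficients for the bins in `tonal` and fall back to the preceding frame's coefficients elsewhere.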
  • Patent number: 9711123
    Abstract: Provided is a voice synthesis device, including: a voice synthesis information acquisition unit configured to acquire voice synthesis information for specifying a sound generating character; a replacement unit configured to replace at least a part of sound generating characters specified by the voice synthesis information with an alternative sound generating character different from the sound generating character; and a voice synthesis unit configured to execute a second synthesis process for generating a voice signal of an utterance sound obtained by the replacing.
    Type: Grant
    Filed: November 6, 2015
    Date of Patent: July 18, 2017
    Assignee: YAMAHA CORPORATION
    Inventor: Motoki Ogasawara
  • Patent number: 9664518
    Abstract: One embodiment of an invention that computes a location-based alignment of two tracks over a set route. Once aligned, a comparison of performance statistics is made at each position along the track. Time and distance gap information is also computed at each position. The results are then displayed in a plot (17) so one can see where different performance statistics changed, including time gap information (19). The data is also linked to a map (8) so the locations can be visualized more clearly. It is also possible to compare multiple tracks (25) to one reference track (23) for greater insight.
    Type: Grant
    Filed: August 22, 2011
    Date of Patent: May 30, 2017
    Assignee: Strava, Inc.
    Inventor: Paul Mach
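A small sketch of the comparison step described above: given two tracks aligned by distance along a shared route, report the time gap at each reference position. Representing each track as a list of `(distance, time)` tuples is an assumption made for illustration; the patent describes the alignment more generally.

```python
def time_gaps(reference, other):
    """For each (distance, time) point on the reference track, find the
    other track's nearest point by distance along the route and return
    reference_time - other_time (negative: reference arrived earlier)."""
    gaps = []
    for dist, t in reference:
        nearest = min(other, key=lambda p: abs(p[0] - dist))
        gaps.append(t - nearest[1])
    return gaps

ref = [(0, 0), (100, 10), (200, 20)]      # metres, seconds
rival = [(0, 0), (100, 12), (200, 24)]
gaps = time_gaps(ref, rival)
```

Plotting `gaps` against distance is one way to produce the kind of position-indexed gap display the abstract describes.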
  • Patent number: 9653085
    Abstract: A method and system for reconstructing an original audio signal is disclosed. The original audio signal has a baseband up to a cutoff frequency and high-frequency components not included in the baseband above the cutoff frequency. The system includes a bitstream deformatter that extracts a representation of the baseband, an estimated spectral envelope, and noise-blending parameters from an audio bitstream. The system also includes a spectral component regenerator that copies or translates all or at least some of the baseband spectral components to non-overlapping frequency ranges of the high-frequency components not included in the baseband to generate regenerated spectral components. The system further includes a gain adjuster that modifies a spectral envelope of the regenerated spectral components based at least in part on the estimated spectral envelope and the noise-blending parameters to generate gain-adjusted regenerated spectral components.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: May 16, 2017
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Michael M. Truman, Mark S. Vinton
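A hedged sketch of the three decoder stages named in this abstract: translate baseband spectral coefficients into the missing high band, then gain-adjust the regenerated components toward an estimated spectral envelope using a noise-blending parameter. The mixing rule and all numeric values are illustrative, not the patent's.

```python
import random

def regenerate_highband(baseband, envelope, noise_blend, rng):
    """Return len(envelope) regenerated high-band coefficients."""
    out = []
    for i in range(len(envelope)):
        copied = baseband[i % len(baseband)]      # translate baseband upward
        noise = rng.uniform(-1.0, 1.0)            # noise component to blend in
        mixed = (1.0 - noise_blend) * copied + noise_blend * noise
        out.append(envelope[i] * mixed)           # impose estimated envelope
    return out

rng = random.Random(42)
baseband = [0.9, -0.4, 0.7, -0.2]
envelope = [0.5, 0.5, 0.25, 0.25, 0.1, 0.1]
high = regenerate_highband(baseband, envelope, noise_blend=0.2, rng=rng)
```

With `noise_blend=0` the output is a pure envelope-shaped copy of the baseband, which makes the gain-adjustment step easy to verify in isolation.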
  • Patent number: 9437194
    Abstract: A voice control method is applied in an electronic device. The electronic device includes a voice input unit, a play unit, and a storage unit storing a conversation database and an association table between different ranges of voice characteristics and styles of response voice. The method includes the following steps. Obtaining voice signals input via the voice input unit. Determining which content is input according to the obtained voice signals. Searching in the conversation database to find a response corresponding to the input content. Analyzing voice characteristics of the obtained voice signals. Comparing the voice characteristics of the obtained voice signals with the pre-stored ranges. Selecting the associated response voice. Finally, outputting the found response using the associated response voice via the play unit.
    Type: Grant
    Filed: May 21, 2013
    Date of Patent: September 6, 2016
    Assignees: Fu Tai Hua Industry (Shenzhen) Co., Ltd., HON HAI PRECISION INDUSTRY CO., LTD.
    Inventor: Ren-Wen Huang
  • Patent number: 9401138
    Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period that does not depend on the pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
    Type: Grant
    Filed: May 10, 2012
    Date of Patent: July 26, 2016
    Assignee: NEC Corporation
    Inventor: Masanori Kato
  • Patent number: 9368104
    Abstract: A system and method for realistic speech synthesis that converts text into synthetic human speech with qualities appropriate to the context, such as the language and dialect of the speaker, while also expanding a speaker's phonetic inventory to produce more natural-sounding speech.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: June 14, 2016
    Assignee: SRC, INC.
    Inventors: David Donald Eller, Steven Brian Morphet, Watson Brent Boyett
  • Patent number: 9350586
    Abstract: A signal decoder in a communication system is for decoding signal elements in a communication signal having interleaved carrier frequencies. The decoder receives antenna signals in a frequency domain, and has a multiplier for multiplying the antenna signals by a complex-valued mathematical sequence such as the Zadoff-Chu sequence, to generate multiplied antenna signals. An inverse frequency to time converter converts the multiplied antenna signals to time domain signals. A signal quality detector detects a signal quality from the time domain signals based on a subset of the carrier frequencies. The complex-valued mathematical sequence is provided with zero values corresponding to carrier frequencies that are not included in the subset, and the inverse frequency to time converter has a transform size corresponding to the multiplied antenna signals including all carrier frequencies.
    Type: Grant
    Filed: August 25, 2014
    Date of Patent: May 24, 2016
    Assignee: FREESCALE SEMICONDUCTOR, INC.
    Inventor: Vincent Pierre Martinez
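A compact numeric sketch of the decoder path in this abstract: generate a Zadoff-Chu reference sequence, multiply the frequency-domain antenna signal by it (here the conjugate, with the received signal simulated as the sequence itself), and inverse-transform to the time domain, where a clean match concentrates energy at lag zero. The parameters N=63, u=25 and the direct O(N^2) inverse DFT are illustrative simplifications.

```python
import cmath

def zadoff_chu(u, N):
    """Zadoff-Chu sequence for odd length N and root index u."""
    return [cmath.exp(-1j * cmath.pi * u * n * (n + 1) / N) for n in range(N)]

def idft(X):
    """Direct inverse DFT (fine for small toy sizes)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N, u = 63, 25
zc = zadoff_chu(u, N)
received = zc                                   # simulated ideal reception
product = [r * z.conjugate() for r, z in zip(received, zc)]
time_sig = idft(product)
mags = [abs(x) for x in time_sig]               # energy concentrates at lag 0
```

A signal-quality detector can then compare the lag-zero peak against the residual floor, which is essentially what the last assertion below checks.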
  • Patent number: 9286885
    Abstract: In a method of generating speech from text, the speech segments necessary to put together the text to be output as speech by a terminal are determined; it is checked which speech segments are already present in the terminal and which ones need to be transmitted from a server to the terminal; the segments to be transmitted to the terminal are indexed; the speech segments and the indices of segments to be output at the terminal are transmitted; an index sequence of speech segments to be put together to form the speech to be output is transmitted; and the segments are concatenated according to the index sequence. This method makes it possible to realize a distributed speech synthesis system requiring only low transmission capacity, a small memory, and low computational power in the terminal.
    Type: Grant
    Filed: April 6, 2004
    Date of Patent: March 15, 2016
    Assignee: Alcatel Lucent
    Inventors: Jürgen Sienel, Dieter Kopp
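A sketch of the transmission-saving idea above: the terminal keeps a cache of speech segments, fetches only the ones it is missing from the server, and concatenates segments according to the transmitted index sequence. The string "segments" stand in for waveform data; all names are illustrative.

```python
def synthesize_on_terminal(cache, server_db, index_sequence):
    """Return (concatenated output, number of segments transmitted)."""
    # Determine which indexed segments are absent from the terminal's cache.
    missing = [i for i in dict.fromkeys(index_sequence) if i not in cache]
    for i in missing:                  # only these cross the network
        cache[i] = server_db[i]
    # Concatenate segments according to the index sequence.
    speech = "".join(cache[i] for i in index_sequence)
    return speech, len(missing)

server_db = {0: "hel", 1: "lo ", 2: "wor", 3: "ld"}
cache = {0: "hel", 1: "lo "}           # already present on the terminal
speech, transmitted = synthesize_on_terminal(cache, server_db, [0, 1, 2, 3])
```

Only two of the four segments are transferred, which illustrates why the scheme needs low transmission capacity on repeated use.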
  • Patent number: 9275631
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: March 1, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Ryuki Tachibana, Masafumi Nishimura
  • Patent number: 9070365
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 10, 2014
    Date of Patent: June 30, 2015
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 9058811
    Abstract: According to one embodiment, a method and apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis to be fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and their probabilities, generating fuzzy context feature labels based on the candidate pronunciations and their probabilities, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters via a synthesizer as speech.
    Type: Grant
    Filed: February 22, 2012
    Date of Patent: June 16, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Xi Wang, Xiaoyan Lou, Jian Li
  • Patent number: 9026440
    Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a predetermined threshold value, and classifying the audio signal as either speech or music.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: May 5, 2015
    Inventor: Alon Konchitsky
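An illustrative version of the analyze/compare/classify pipeline in this abstract. The analyzed component chosen here, variation of short-term frame energy (which tends to be larger for speech, with its pauses and bursts, than for steady music), and the threshold value are assumptions for the sketch, not the patent's.

```python
import math

def energy_variation(signal, frame=8):
    """Coefficient of variation of per-frame energy."""
    energies = []
    for i in range(0, len(signal) - frame + 1, frame):
        chunk = signal[i:i + frame]
        energies.append(sum(s * s for s in chunk) / frame)
    mean = sum(energies) / len(energies)
    if mean == 0.0:
        return 0.0
    var = sum((e - mean) ** 2 for e in energies) / len(energies)
    return math.sqrt(var) / mean

def classify(signal, threshold=0.5):
    """Compare the analyzed component against a predetermined threshold."""
    return "speech" if energy_variation(signal) > threshold else "music"

speechy = [1.0, -1.0] * 4 + [0.01] * 8                        # burst then near-silence
musicy = [math.sin(2 * math.pi * n / 8) for n in range(16)]   # steady tone
```

Real systems combine several such components; a single feature is used here only to make the compare-against-threshold structure concrete.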
  • Publication number: 20150106102
    Abstract: A method includes determining, at a speech encoder, first gain shape parameters based on a harmonically extended signal and/or based on a high-band residual signal associated with a high-band portion of an audio signal. The method also includes determining second gain shape parameters based on a synthesized high-band signal and based on the high-band portion of the audio signal. The method further includes inserting the first gain shape parameters and the second gain shape parameters into an encoded version of the audio signal to enable gain adjustment during reproduction of the audio signal from the encoded version.
    Type: Application
    Filed: October 7, 2014
    Publication date: April 16, 2015
    Inventors: Venkata Subrahmanyam Chandra Sekhar Chebiyyam, Venkatraman S. Atti
  • Patent number: 9009052
    Abstract: Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section 119 temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section 117 to define a spectral transform surface. A synthesized audio signal generating section 121 generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section 121 generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in reference singing voice source data.
    Type: Grant
    Filed: July 19, 2011
    Date of Patent: April 14, 2015
    Assignee: National Institute of Advanced Industrial Science and Technology
    Inventors: Tomoyasu Nakano, Masataka Goto
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150025892
    Abstract: A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual.
    Type: Application
    Filed: March 6, 2013
    Publication date: January 22, 2015
    Applicant: Agency for Science, Technology and Research
    Inventors: Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong
  • Patent number: 8924204
    Abstract: Unlike sound based pressure waves that go everywhere, air turbulence caused by wind is usually a fairly local event. Therefore, in a system that utilizes two or more spatially separated microphones to pick up sound signals (e.g., speech), wind noise picked up by one of the microphones often will not be picked up (or at least not to the same extent) by the other microphone(s). Embodiments of methods and apparatuses that utilize this fact and others to effectively detect and suppress wind noise using multiple microphones that are spatially separated are described.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: December 30, 2014
    Assignee: Broadcom Corporation
    Inventors: Juin-Hwey Chen, Jes Thyssen, Xianxian Zhang, Huaiyu Zeng
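A sketch of the spatial observation in this abstract: acoustic pressure waves arrive coherently at both microphones, while local wind turbulence does not, so low inter-microphone correlation suggests wind noise. The zero-lag correlation measure and the threshold are illustrative assumptions; practical detectors use more robust statistics.

```python
import math

def wind_noise_detected(mic1, mic2, threshold=0.5):
    """True when the normalized zero-lag cross-correlation between the
    two microphone signals falls below the threshold."""
    num = sum(a * b for a, b in zip(mic1, mic2))
    den = math.sqrt(sum(a * a for a in mic1) * sum(b * b for b in mic2))
    corr = num / den if den else 1.0
    return corr < threshold

tone = [math.sin(2 * math.pi * n / 16) for n in range(32)]   # shared acoustic sound
gust = [2.0 * (-1) ** n for n in range(32)]                  # turbulence at one mic only
mic_clean = tone
mic_windy = [t + g for t, g in zip(tone, gust)]
```

A suppression stage could then attenuate or substitute the affected microphone's signal whenever the detector fires.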
  • Patent number: 8909538
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: November 11, 2013
    Date of Patent: December 9, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: James Mark Kondziela
  • Patent number: 8898057
    Abstract: Disclosed is an encoding apparatus that can efficiently encode a signal such as a broadband or extra-broadband signal, thereby improving the quality of the decoded signal. The encoding apparatus includes a band establishing unit (301) that generates, based on the characteristics of the input signal, band establishment information used to divide the band of the input signal into a first band part on the lower-frequency side and a second band part on the higher-frequency side; a lower-frequency encoding unit (302) for encoding, based on the band establishment information, the input signal of the first band part to generate encoded lower-frequency-part information; and a higher-frequency encoding unit (303) for encoding, based on the band establishment information, the input signal of the second band part to generate encoded higher-frequency-part information.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventor: Tomofumi Yamanashi
  • Patent number: 8898062
    Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yumiko Kato, Takahiro Kamai
  • Patent number: 8886539
    Abstract: The present invention discloses a parametric representation of prosody based on polynomial expansion coefficients of the pitch contour near the center of each syllable. These syllable pitch expansion coefficients are generated from a recorded speech database, read from a number of sentences by a reference speaker. By correlating the stress level and context information of each syllable in the text with the polynomial expansion coefficients of the corresponding spoken syllable, a correlation database is formed. To generate prosody for an input text, the stress level and context information of each syllable in the text are identified. The prosody is generated by using the correlation database to find the best set of pitch parameters for each syllable. By adding these to global pitch contours and using interpolation formulas, the complete pitch contour for the input text is generated. Duration and intensity profiles are generated using a similar procedure.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 11, 2014
    Inventor: Chengjun Julian Chen
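A compact sketch of representing a syllable's pitch contour by polynomial expansion coefficients near the syllable center. The contour is projected onto the mutually orthogonal basis {1, t, t^2 - mean(t^2)} over symmetric, centered sample times; this particular basis and the sample contour are illustrative choices, not the patent's parameterization.

```python
def pitch_poly_coeffs(contour):
    """Second-order polynomial expansion coefficients of a pitch contour,
    sampled symmetrically around the syllable center."""
    n = len(contour)
    k = (n - 1) / 2.0
    t = [i - k for i in range(n)]                    # centered time axis
    m2 = sum(x * x for x in t) / n
    # These three basis vectors are mutually orthogonal for symmetric t,
    # so each coefficient is an independent projection.
    basis = [[1.0] * n, t, [x * x - m2 for x in t]]
    coeffs = []
    for b in basis:
        num = sum(p * v for p, v in zip(contour, b))
        den = sum(v * v for v in b)
        coeffs.append(num / den)
    return coeffs

# A linearly rising contour: 100 Hz mean, +2 Hz per sample around the center.
contour = [100.0 + 2.0 * (i - 2) for i in range(5)]
c0, c1, c2 = pitch_poly_coeffs(contour)
```

For this contour the projection recovers the mean (c0 = 100), the slope (c1 = 2), and no curvature (c2 = 0), which is the kind of compact per-syllable description the abstract correlates with stress and context.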
  • Patent number: 8886538
    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
    Type: Grant
    Filed: September 26, 2003
    Date of Patent: November 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
  • Patent number: 8868432
    Abstract: A method for decoding an audio signal having a bandwidth that extends beyond a bandwidth of a CELP excitation signal in an audio decoder including a CELP-based decoder element. The method includes obtaining a second excitation signal having an audio bandwidth extending beyond the audio bandwidth of the CELP excitation signal, obtaining a set of signals by filtering the second excitation signal with a set of bandpass filters, scaling the set of signals using a set of energy-based parameters, and obtaining a composite output signal by combining the scaled set of signals with a signal based on the audio signal decoded by the CELP-based decoder element.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: October 21, 2014
    Assignee: Motorola Mobility LLC
    Inventors: Jonathan A. Gibbs, James P. Ashley, Udar Mittal
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of the reading of an inputted text that is a target to be registered and adds a reading with phonemes in the identified language to the target text. It also converts the reading of the target text from the phonemes in the identified language to phonemes in the language to be recognized, which is handled in voice recognition, to create a recognition dictionary in which the converted reading of the target text is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8868422
    Abstract: According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units, based on at least one of the phonologic information and the prosody information. The method can search for at least two speech units among the plurality of speech units in which at least one of the phonologic information and the prosody information is identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units as a representative speech unit in a memory.
    Type: Grant
    Filed: September 13, 2010
    Date of Patent: October 21, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Gou Hirabayashi, Takehiko Kagoshima
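A toy version of the storage-saving step above: group speech units whose prosody feature is identical or similar (within a tolerance) and keep a single representative waveform per group. Using one scalar feature per unit and string "waveforms" is a simplification for illustration; the patent compares phonologic and prosody information more generally.

```python
def deduplicate_units(units, tol=0.1):
    """units: list of (feature, waveform) pairs.
    Returns (memory, mapping): memory holds one representative per group,
    mapping gives each input unit's index into memory."""
    memory = []                      # representative (feature, waveform) pairs
    mapping = []
    for feat, wave in units:
        for idx, (f, _) in enumerate(memory):
            if abs(f - feat) <= tol:
                mapping.append(idx)  # similar enough: reuse representative
                break
        else:
            memory.append((feat, wave))
            mapping.append(len(memory) - 1)
    return memory, mapping

units = [(1.00, "wave_a"), (1.05, "wave_b"), (2.00, "wave_c")]
memory, mapping = deduplicate_units(units)
```

The second unit maps to the first group's representative, so only two of the three waveforms need to be stored.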
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8856012
    Abstract: A method of encoding an audio signal, where signals including two or more channel signals are downmixed to a mono signal, the mono signal is divided into a low-frequency signal and a high-frequency signal, the low-frequency signal is encoded through algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX), and the high-frequency signal is encoded using the low-frequency signal. A method of decoding of an audio signal, a low-frequency signal encoded through ACELP or TCX is decoded, a high-frequency signal is decoded using the low-frequency signal, the low-frequency signal and the high-frequency signal are combined to generate a mono signal, and the mono signal is upmixed by decoding spatial parameters regarding signals including two or more channel signals.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: October 7, 2014
    Assignee: SAMSUNG Electronics Co., Ltd.
    Inventors: Ho-sang Sung, Eun-mi Oh, Jung-hoe Kim, Ki-hyun Choo, Mi-young Kim
  • Publication number: 20140288938
    Abstract: To improve the intelligibility of speech for users with high-frequency hearing loss, the present systems and methods provide an improved frequency lowering system with enhancement of spectral features responsive to place-of-articulation of the input speech. High frequency components of speech, such as fricatives, may be classified based on one or more features that distinguish place of articulation, including spectral slope, peak location, relative amplitudes in various frequency bands, or a combination of these or other such features. Responsive to the classification of the input speech, a signal or signals may be added to the input speech in a frequency band audible to the hearing-impaired listener, said signal or signals having predetermined distinct spectral features corresponding to the classification, and allowing a listener to easily distinguish various consonants in the input.
    Type: Application
    Filed: November 1, 2012
    Publication date: September 25, 2014
    Inventor: Ying-Yee Kong
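    The classification step can be sketched as below. Using only the spectral peak location, and the 4500 Hz split point, are assumptions for illustration; the publication lists several distinguishing features (spectral slope, relative band amplitudes) that a real classifier would combine:

    ```python
    import numpy as np

    def classify_fricative(frame, sr, peak_split_hz=4500):
        """Toy place-of-articulation classifier: alveolar /s/ tends to carry
        its spectral peak higher in frequency than postalveolar /sh/."""
        mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
        peak_hz = freqs[np.argmax(mag)]
        return "alveolar" if peak_hz >= peak_split_hz else "postalveolar"
    ```

    The classifier's output would then select which predetermined low-frequency cue signal to add for the hearing-impaired listener.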
  • Patent number: 8812324
    Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising: a phase for locating analysis windows by means of an iterative process that determines the phase of the first sinusoidal component and compares the phase value of that component with a predetermined value; a phase for selecting analysis frames corresponding to an allophone and readjusting the duration and the fundamental frequency according to certain thresholds; and a phase for generating synthetic speech from synthesis frames, taking the information of the closest analysis frame as the spectral information of the synthesis frame and taking as many synthesis frames as there are periods in the synthetic signal. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants synchronously with the fundamental period.
    Type: Grant
    Filed: December 21, 2010
    Date of Patent: August 19, 2014
    Assignee: Telefonica, S.A.
    Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez de Vicuna
  • Publication number: 20140222434
    Abstract: An audio signal synthesizer generates a synthesis audio signal having a first frequency band and a second synthesized frequency band derived from the first frequency band and comprises a patch generator, a spectral converter, a raw signal processor and a combiner. The patch generator performs at least two different patching algorithms, each patching algorithm generating a raw signal. The patch generator is adapted to select one of the at least two different patching algorithms in response to a control information. The spectral converter converts the raw signal into a raw signal spectral representation. The raw signal processor processes the raw signal spectral representation in response to spectral domain spectral band replication parameters to obtain an adjusted raw signal spectral representation.
    Type: Application
    Filed: April 10, 2014
    Publication date: August 7, 2014
    Applicant: Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
    Inventors: Frederik NAGEL, Sascha DISCH, Nikolaus RETTELBACH, Max NEUENDORF, Bernhard GRILL, Ulrich KRAEMER, Stefan WABNIK
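    The patch generator's selection between patching algorithms can be sketched as follows. The two algorithms here (spectral copy-up and spectral mirroring) are stand-ins chosen for brevity; the patent's actual algorithms and control information differ:

    ```python
    import numpy as np

    def copy_up_patch(low_spec, target_bins):
        """'Copy-up': replicate low-band coefficients into the empty high band."""
        reps = -(-target_bins // len(low_spec))  # ceiling division
        return np.tile(low_spec, reps)[:target_bins]

    def mirror_patch(low_spec, target_bins):
        """Spectral mirroring as a second, alternative patching algorithm."""
        return low_spec[::-1][:target_bins]

    def generate_patch(low_spec, target_bins, control="copy"):
        """Select one of two patching algorithms in response to control
        information, as the abstract's patch generator does."""
        algo = copy_up_patch if control == "copy" else mirror_patch
        return algo(low_spec, target_bins)
    ```

    The raw signal produced here would then be spectrally converted and adjusted against the spectral band replication parameters before combination.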
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
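    The substitution step reduces to replacing identified segments in the primary database with their secondary-database counterparts. A minimal sketch, assuming segments are keyed by label (the real databases hold labeled audio files):

    ```python
    def enhance(primary, secondary, varying_labels):
        """primary/secondary: dicts mapping segment label -> audio payload.
        varying_labels: labels whose pronunciation differs between dialects.
        Returns the enhanced primary database with substituted segments."""
        enhanced = dict(primary)
        for label in varying_labels:
            if label in enhanced and label in secondary:
                enhanced[label] = secondary[label]
        return enhanced
    ```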
  • Patent number: 8738381
    Abstract: A prosody generation apparatus capable of suppressing distortion that occurs when generating prosodic patterns and therefore generating a natural prosody is provided. A prosody changing point extraction unit in this apparatus extracts a prosody changing point located at the beginning and the ending of a sentence, the beginning and the ending of a breath group, an accent position and the like. A selection rule and a transformation rule for a prosodic pattern including the prosody changing point are generated by means of a statistical or learning technique, and the thus generated rules are stored in a representative prosodic pattern selection rule table and a transformation rule table beforehand. A pattern selection unit selects a representative prosodic pattern from the representative prosodic pattern selection rule table according to the selection rule.
    Type: Grant
    Filed: January 17, 2007
    Date of Patent: May 27, 2014
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai
  • Patent number: 8706497
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: April 22, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
  • Patent number: 8706496
    Abstract: A sequence is received of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument). The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information. The sound transformation utilizes overlapping windows, and a computational cost function is determined which depends on the product of the number of pitch periods and the inverse of the minimum fundamental frequency within the window.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: April 22, 2014
    Assignee: Universitat Pompeu Fabra
    Inventor: Jordi Bonada Sanjaume
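    The per-window cost named in the abstract is the product of the pitch-period count and the reciprocal of the minimum fundamental frequency; a literal transcription (proportionality constant assumed to be 1):

    ```python
    def window_cost(num_pitch_periods, min_f0_hz):
        """Cost ~ (pitch periods in window) * (1 / minimum F0 in window)."""
        return num_pitch_periods * (1.0 / min_f0_hz)
    ```

    Intuitively, windows covering many periods of a low-pitched signal are long and therefore more expensive to process.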
  • Patent number: 8706493
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. And then the prosody re-estimation module re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: April 22, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
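    One plausible form of controllable re-estimation is shifting and rescaling a predicted F0 contour around its mean; the parameter names and the linear mapping below are assumptions for illustration, not the patent's exact formulation:

    ```python
    import statistics

    def reestimate_f0(f0, mean_shift=0.0, range_scale=1.0):
        """Produce new prosody information from a predicted F0 contour using
        two controllable parameters: a mean shift and a pitch-range scale."""
        mu = statistics.fmean(f0)
        return [mu + mean_shift + range_scale * (v - mu) for v in f0]
    ```

    The re-estimated contour would then be handed to the speech synthesis module in place of the original prediction.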
  • Patent number: 8682654
    Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: March 25, 2014
    Assignee: Cyberlink Corp.
    Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8655650
    Abstract: A method is provided for decoding data streams in a voice communication system. The method includes: receiving two or more data streams having voice data encoded therein; decoding each data stream into a set of speech coding parameters; forming a set of combined speech coding parameters by combining the sets of decoded speech coding parameters, where speech coding parameters of a given type are combined with speech coding parameters of the same type; and inputting the set of combined speech coding parameters into a speech synthesizer.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: February 18, 2014
    Assignee: Harris Corporation
    Inventor: Mark W. Chamberlain
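    Combining same-typed speech coding parameters across decoded streams can be sketched as below. Averaging is one simple combination rule assumed here for illustration; the patent does not commit to a specific operator:

    ```python
    def combine_params(streams):
        """streams: list of dicts {param_type: list of parameter values}.
        Combines parameters of the same type, element-wise, by averaging."""
        keys = streams[0].keys()
        return {k: [sum(vals) / len(vals)
                    for vals in zip(*(s[k] for s in streams))]
                for k in keys}
    ```

    The combined parameter set would then drive a single speech synthesizer, so N incoming voice streams produce one rendered output.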
  • Patent number: 8645140
    Abstract: A method of associating a voice font with a contact for text-to-speech conversion at an electronic device includes obtaining, at the electronic device, the voice font for the contact, and storing the voice font in association with a contact data record stored in a contacts database at the electronic device. The contact data record includes contact data for the contact.
    Type: Grant
    Filed: February 25, 2009
    Date of Patent: February 4, 2014
    Assignee: BlackBerry Limited
    Inventor: Yuriy Lobzakov
  • Patent number: 8645126
    Abstract: A method of encoding an audio signal, where signals including two or more channel signals are downmixed to a mono signal, the mono signal is divided into a low-frequency signal and a high-frequency signal, the low-frequency signal is encoded through algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX), and the high-frequency signal is encoded using the low-frequency signal. In a corresponding method of decoding an audio signal, a low-frequency signal encoded through ACELP or TCX is decoded, a high-frequency signal is decoded using the low-frequency signal, the low-frequency signal and the high-frequency signal are combined to generate a mono signal, and the mono signal is upmixed by decoding spatial parameters for signals including two or more channel signals.
    Type: Grant
    Filed: March 26, 2013
    Date of Patent: February 4, 2014
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Ho-sang Sung, Eun-mi Oh, Jung-hoe Kim, Ki-hyun Choo, Mi-young Kim
  • Patent number: 8634783
    Abstract: A communication device includes memory, an input interface, a processing module, and a transmitter. The processing module receives a digital signal from the input interface, wherein the digital signal includes a desired digital signal component and an undesired digital signal component. The processing module identifies one of a plurality of codebooks based on the undesired digital signal component. The processing module then identifies a codebook entry from the one of the plurality of codebooks based on the desired digital signal component to produce a selected codebook entry. The processing module then generates a coded signal based on the selected codebook entry, wherein the coded signal includes a substantially unattenuated representation of the desired digital signal component and an attenuated representation of the undesired digital signal component. The transmitter converts the coded signal into an outbound signal in accordance with a signaling protocol and transmits it.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: January 21, 2014
    Assignee: Broadcom Corporation
    Inventor: Nambirajan Seshadri
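    The two-stage selection (codebook from the undesired component, entry from the desired component) can be sketched as a toy scalar quantizer. The noise threshold, codebook names, and scalar entries are illustrative assumptions:

    ```python
    def encode(sample, codebooks, noise_level):
        """Pick a codebook from the noise estimate, then pick the nearest
        entry to the desired signal component (toy nearest-neighbour search)."""
        book = codebooks["noisy"] if noise_level > 0.5 else codebooks["clean"]
        return min(book, key=lambda e: abs(e - sample))
    ```

    Choosing the codebook from the noise estimate is what lets the coded signal keep the desired component largely unattenuated while attenuating the undesired one.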
  • Patent number: 8635065
    Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: January 21, 2014
    Assignee: Sony Deutschland GmbH
    Inventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato