Frequency Element Patents (Class 704/268)
  • Patent number: 11869483
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: January 9, 2024
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
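A minimal sketch of the idea in this abstract: treat phoneme duration as a separate generative distribution and sample it at inference time, then length-regulate the phoneme sequence. The per-phoneme Gaussian parameterization, the phoneme symbols, and all numeric values below are invented for illustration; the patent does not specify this form.

```python
import random

# Hypothetical per-phoneme duration distributions: (mean frames, std frames).
DURATION_MODEL = {
    "HH": (3.0, 0.5),
    "AH": (5.0, 1.0),
    "L":  (4.0, 0.8),
}

def sample_durations(phonemes, rng, min_frames=1):
    """Draw one duration (in frames) per phoneme from its distribution."""
    out = []
    for p in phonemes:
        mean, std = DURATION_MODEL[p]
        out.append(max(min_frames, round(rng.gauss(mean, std))))
    return out

def expand_to_frames(phonemes, durations):
    """Length-regulate: repeat each phoneme for its sampled frame count."""
    frames = []
    for p, d in zip(phonemes, durations):
        frames.extend([p] * d)
    return frames

rng = random.Random(0)
phonemes = ["HH", "AH", "L"]
durations = sample_durations(phonemes, rng)
frames = expand_to_frames(phonemes, durations)
```

Because durations are sampled rather than fixed, repeated synthesis of the same text yields varied rhythm, which is the diversity the abstract refers to.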
  • Patent number: 11769481
    Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
    Type: Grant
    Filed: October 7, 2021
    Date of Patent: September 26, 2023
    Assignee: Nvidia Corporation
    Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
  • Patent number: 11195541
    Abstract: A method and system for providing Gaussian weighted self-attention for speech enhancement are herein provided. According to one embodiment, the method includes receiving an input noise signal, generating a score matrix based on the received input noise signal, and applying a Gaussian weighted function to the generated score matrix.
    Type: Grant
    Filed: October 2, 2019
    Date of Patent: December 7, 2021
    Inventors: JaeYoung Kim, Mostafa El-Khamy, Jungwon Lee
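A hedged sketch of the two steps named in this abstract: given a score matrix (a toy 4x4 here), weight each score by a Gaussian of the query-key distance so attention concentrates on nearby frames, then row-normalize with softmax. The sigma value and matrix contents are illustrative, not taken from the patent.

```python
import math

def gaussian_weight(n, sigma=1.5):
    """w[i][j] = exp(-(i - j)^2 / (2 * sigma^2)): decays with distance."""
    return [[math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) for j in range(n)]
            for i in range(n)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def gaussian_weighted_attention(scores, sigma=1.5):
    """Multiply the score matrix elementwise by the Gaussian weights,
    then normalize each row into an attention distribution."""
    n = len(scores)
    w = gaussian_weight(n, sigma)
    weighted = [[scores[i][j] * w[i][j] for j in range(n)] for i in range(n)]
    return [softmax(r) for r in weighted]

scores = [[1.0] * 4 for _ in range(4)]   # uniform raw scores
attn = gaussian_weighted_attention(scores)
```

With uniform raw scores the Gaussian weighting alone makes each row peak on its own position, which is the locality bias the weighting is meant to impose.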
  • Patent number: 11087771
    Abstract: A device includes an encoder and a transmitter. The encoder is configured to generate a first high-band portion of a first signal based on a left signal and a right signal. The encoder is also configured to generate a set of adjustment gain parameters based on a high-band non-reference signal. The high-band non-reference signal corresponds to one of a left high-band portion of the left signal or a right high-band portion of the right signal as a high-band non-reference signal. The transmitter is configured to transmit information corresponding to the first high-band portion of the first signal. The transmitter is also configured to transmit the set of adjustment gain parameters corresponding to the high-band non-reference signal.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: August 10, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Venkatraman Atti, Venkata Subrahmanyam Chandra Sekhar Chebiyyam
  • Patent number: 11011154
    Abstract: A method of performing speech synthesis, includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: May 18, 2021
    Assignee: TENCENT AMERICA LLC
    Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
  • Patent number: 10127916
    Abstract: A method and apparatus for enhancing audio processing between a transmit radio (230) and a receive radio (240) are provided. Digitized audio frames (202) are applied to parallel inputs of both a vocoder encoder (204) and a trill encoder (212) of the transmit radio (230). The vocoder encoder (204) generates voice bits which are communicated over a voice bits channel (206) to a vocoder decoder (208) of the receive radio (240). The trill encoder (212) generates signaling bits which are communicated over a signaling bits channel (214) to a trill decoder (216) of the receive radio (240) for recovery of trill information (218). At the receive radio (240), a decoded audio signal (209) generated from the vocoder decoder (208) and the recovered trill information (218) are both provided as inputs to a trill reconstructor stage (220) to generate a recovered audio signal (222) having a reconstructed trill.
    Type: Grant
    Filed: April 24, 2014
    Date of Patent: November 13, 2018
    Assignee: MOTOROLA SOLUTIONS, INC.
    Inventors: Yi Gao, Qing Yan
  • Patent number: 9978359
    Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.
    Type: Grant
    Filed: December 6, 2013
    Date of Patent: May 22, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Michal Tadeusz Kaszczuk, Jeffrey Penrod Adams, Adam Franciszek Nadolski
  • Patent number: 9916834
    Abstract: An approach is described that obtains spectrum coefficients for a replacement frame of an audio signal. A tonal component of the spectrum of the audio signal is detected based on a peak that exists in the spectra of frames preceding the replacement frame. For the tonal component of the spectrum, spectrum coefficients for the peak and its surroundings in the replacement frame are predicted; for the non-tonal component, a non-predicted spectrum coefficient for the replacement frame, or a corresponding spectrum coefficient of a frame preceding the replacement frame, is used.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: March 13, 2018
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Janine Sukowski, Ralph Sperschneider, Goran Markovic, Wolfgang Jaegers, Christian Helmrich, Bernd Edler, Ralf Geiger
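A simplified stand-in for the first step in this abstract: detect tonal components as spectral peaks in a preceding frame, so those bins (and their immediate surroundings) can be treated differently from the non-tonal rest when the replacement frame is built. The peak criterion and the ratio threshold are illustrative choices, not the patent's.

```python
def detect_tonal_bins(spectrum, ratio=3.0):
    """Return the set of bins belonging to a detected peak and its
    two immediate neighbors (a local maximum well above its neighbors)."""
    tonal = set()
    for i in range(1, len(spectrum) - 1):
        neighbors = (spectrum[i - 1] + spectrum[i + 1]) / 2
        if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1] \
                and spectrum[i] > ratio * neighbors:
            tonal.update({i - 1, i, i + 1})
    return tonal

prev_frame = [1.0, 1.0, 10.0, 1.0, 1.0, 1.0]   # one clear tonal peak at bin 2
tonal = detect_tonal_bins(prev_frame)
```

A concealment routine would then predict coefficients for the bins in `tonal` and fall back to the preceding frame's coefficients elsewhere.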
  • Patent number: 9711123
    Abstract: Provided is a voice synthesis device, including: a voice synthesis information acquisition unit configured to acquire voice synthesis information for specifying a sound generating character; a replacement unit configured to replace at least a part of sound generating characters specified by the voice synthesis information with an alternative sound generating character different from the sound generating character; and a voice synthesis unit configured to execute a second synthesis process for generating a voice signal of an utterance sound obtained by the replacing.
    Type: Grant
    Filed: November 6, 2015
    Date of Patent: July 18, 2017
    Assignee: YAMAHA CORPORATION
    Inventor: Motoki Ogasawara
  • Patent number: 9664518
    Abstract: One embodiment of an invention that computes a location-based alignment of two tracks over a set route. Once aligned, a comparison of performance statistics is made at each position along the track. Time and distance gap information is also computed at each position. The results are then displayed in a plot (17) so one can see where different performance statistics changed, including time gap information (19). The data is also linked to a map (8) so the locations can be visualized more clearly. It is also possible to compare multiple tracks (25) to one reference track (23) for greater insight.
    Type: Grant
    Filed: August 22, 2011
    Date of Patent: May 30, 2017
    Assignee: Strava, Inc.
    Inventor: Paul Mach
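A small sketch of the comparison step described above: given two tracks aligned by distance along a shared route, report the time gap at each reference position. Representing each track as a list of `(distance, time)` tuples is an assumption made for illustration; the patent describes the alignment more generally.

```python
def time_gaps(reference, other):
    """For each (distance, time) point on the reference track, find the
    other track's nearest point by distance along the route and return
    reference_time - other_time (negative: reference arrived earlier)."""
    gaps = []
    for dist, t in reference:
        nearest = min(other, key=lambda p: abs(p[0] - dist))
        gaps.append(t - nearest[1])
    return gaps

ref = [(0, 0), (100, 10), (200, 20)]      # metres, seconds
rival = [(0, 0), (100, 12), (200, 24)]
gaps = time_gaps(ref, rival)
```

Plotting `gaps` against distance is one way to produce the kind of position-indexed gap display the abstract describes.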
  • Patent number: 9653085
    Abstract: A method and system for reconstructing an original audio signal is disclosed. The original audio signal has a baseband up to a cutoff frequency and high-frequency components not included in the baseband above the cutoff frequency. The system includes a bitstream deformatter that extracts a representation of the baseband, an estimated spectral envelope, and noise-blending parameters from an audio bitstream. The system also includes a spectral component regenerator that copies or translates all or at least some of the baseband spectral components to non-overlapping frequency ranges of the high-frequency components not included in the baseband to generate regenerated spectral components. The system further includes a gain adjuster that modifies a spectral envelope of the regenerated spectral components based at least in part on the estimated spectral envelope and the noise-blending parameters to generate gain-adjusted regenerated spectral components.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: May 16, 2017
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: Michael M. Truman, Mark S. Vinton
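A hedged sketch of the three decoder stages named in this abstract: translate baseband spectral coefficients into the missing high band, then gain-adjust the regenerated components toward an estimated spectral envelope using a noise-blending parameter. The mixing rule and all numeric values are illustrative, not the patent's.

```python
import random

def regenerate_highband(baseband, envelope, noise_blend, rng):
    """Return len(envelope) regenerated high-band coefficients."""
    out = []
    for i in range(len(envelope)):
        copied = baseband[i % len(baseband)]      # translate baseband upward
        noise = rng.uniform(-1.0, 1.0)            # noise component to blend in
        mixed = (1.0 - noise_blend) * copied + noise_blend * noise
        out.append(envelope[i] * mixed)           # impose estimated envelope
    return out

rng = random.Random(42)
baseband = [0.9, -0.4, 0.7, -0.2]
envelope = [0.5, 0.5, 0.25, 0.25, 0.1, 0.1]
high = regenerate_highband(baseband, envelope, noise_blend=0.2, rng=rng)
```

With `noise_blend=0` the output is a pure envelope-shaped copy of the baseband, which makes the gain-adjustment step easy to verify in isolation.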
  • Patent number: 9437194
    Abstract: A voice control method is applied in an electronic device. The electronic device includes a voice input unit, a play unit, and a storage unit storing a conversation database and an association table between different ranges of voice characteristics and styles of response voice. The method includes the following steps. Obtaining voice signals input via the voice input unit. Determining which content is input according to the obtained voice signals. Searching in the conversation database to find a response corresponding to the input content. Analyzing voice characteristics of the obtained voice signals. Comparing the voice characteristics of the obtained voice signals with the pre-stored ranges. Selecting the associated response voice. Finally, outputting the found response using the associated response voice via the play unit.
    Type: Grant
    Filed: May 21, 2013
    Date of Patent: September 6, 2016
    Assignees: Fu Tai Hua Industry (Shenzhen) Co., Ltd., HON HAI PRECISION INDUSTRY CO., LTD.
    Inventor: Ren-Wen Huang
  • Patent number: 9401138
    Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period that does not depend on the pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
    Type: Grant
    Filed: May 10, 2012
    Date of Patent: July 26, 2016
    Assignee: NEC Corporation
    Inventor: Masanori Kato
  • Patent number: 9368104
    Abstract: A system and method for realistic speech synthesis that converts text into synthetic human speech with qualities appropriate to the context, such as the language and dialect of the speaker, while also expanding a speaker's phonetic inventory to produce more natural-sounding speech.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: June 14, 2016
    Assignee: SRC, INC.
    Inventors: David Donald Eller, Steven Brian Morphet, Watson Brent Boyett
  • Patent number: 9350586
    Abstract: A signal decoder in a communication system is for decoding signal elements in a communication signal having interleaved carrier frequencies. The decoder receives antenna signals in a frequency domain, and has a multiplier for multiplying the antenna signals by a complex-valued mathematical sequence such as the Zadoff-Chu sequence, to generate multiplied antenna signals. An inverse frequency to time converter converts the multiplied antenna signals to time domain signals. A signal quality detector detects a signal quality from the time domain signals based on a subset of the carrier frequencies. The complex-valued mathematical sequence is provided with zero values corresponding to carrier frequencies that are not included in the subset, and the inverse frequency to time converter has a transform size corresponding to the multiplied antenna signals including all carrier frequencies.
    Type: Grant
    Filed: August 25, 2014
    Date of Patent: May 24, 2016
    Assignee: FREESCALE SEMICONDUCTOR, INC.
    Inventor: Vincent Pierre Martinez
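A compact numeric sketch of the decoder path in this abstract: generate a Zadoff-Chu reference sequence, multiply the frequency-domain antenna signal by it (here the conjugate, with the received signal simulated as the sequence itself), and inverse-transform to the time domain, where a clean match concentrates energy at lag zero. The parameters N=63, u=25 and the direct O(N^2) inverse DFT are illustrative simplifications.

```python
import cmath

def zadoff_chu(u, N):
    """Zadoff-Chu sequence for odd length N and root index u."""
    return [cmath.exp(-1j * cmath.pi * u * n * (n + 1) / N) for n in range(N)]

def idft(X):
    """Direct inverse DFT (fine for small toy sizes)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N, u = 63, 25
zc = zadoff_chu(u, N)
received = zc                                   # simulated ideal reception
product = [r * z.conjugate() for r, z in zip(received, zc)]
time_sig = idft(product)
mags = [abs(x) for x in time_sig]               # energy concentrates at lag 0
```

A signal-quality detector can then compare the lag-zero peak against the residual floor, which is essentially what the last assertion below checks.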
  • Patent number: 9286885
    Abstract: In a method of generating speech from text, the speech segments necessary to put together the text to be output as speech by a terminal are determined; it is checked which speech segments are already present in the terminal and which ones need to be transmitted from a server to the terminal; the segments to be transmitted to the terminal are indexed; the speech segments and the indices of segments to be output at the terminal are transmitted; an index sequence of speech segments to be put together to form the speech to be output is transmitted; and the segments are concatenated according to the index sequence. This method makes it possible to realize a distributed speech synthesis system requiring only low transmission capacity, a small memory, and low computational power in the terminal.
    Type: Grant
    Filed: April 6, 2004
    Date of Patent: March 15, 2016
    Assignee: Alcatel Lucent
    Inventors: Jürgen Sienel, Dieter Kopp
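A sketch of the transmission-saving idea above: the terminal keeps a cache of speech segments, fetches only the ones it is missing from the server, and concatenates segments according to the transmitted index sequence. The string "segments" stand in for waveform data; all names are illustrative.

```python
def synthesize_on_terminal(cache, server_db, index_sequence):
    """Return (concatenated output, number of segments transmitted)."""
    # Determine which indexed segments are absent from the terminal's cache.
    missing = [i for i in dict.fromkeys(index_sequence) if i not in cache]
    for i in missing:                  # only these cross the network
        cache[i] = server_db[i]
    # Concatenate segments according to the index sequence.
    speech = "".join(cache[i] for i in index_sequence)
    return speech, len(missing)

server_db = {0: "hel", 1: "lo ", 2: "wor", 3: "ld"}
cache = {0: "hel", 1: "lo "}           # already present on the terminal
speech, transmitted = synthesize_on_terminal(cache, server_db, [0, 1, 2, 3])
```

Only two of the four segments are transferred, which illustrates why the scheme needs low transmission capacity on repeated use.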
  • Patent number: 9275631
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: March 1, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Ryuki Tachibana, Masafumi Nishimura
  • Patent number: 9070365
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 10, 2014
    Date of Patent: June 30, 2015
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 9058811
    Abstract: According to one embodiment, a method and apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis to be fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and their probabilities, generating fuzzy context feature labels based on the candidate pronunciations and their probabilities, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters via a synthesizer as speech.
    Type: Grant
    Filed: February 22, 2012
    Date of Patent: June 16, 2015
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Xi Wang, Xiaoyan Lou, Jian Li
  • Patent number: 9026440
    Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of the analysis with a predetermined threshold value, and classifying the audio signal as either speech or music.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: May 5, 2015
    Inventor: Alon Konchitsky
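An illustrative version of the analyze/compare/classify pipeline in this abstract. The analyzed component chosen here, variation of short-term frame energy (which tends to be larger for speech, with its pauses and bursts, than for steady music), and the threshold value are assumptions for the sketch, not the patent's.

```python
import math

def energy_variation(signal, frame=8):
    """Coefficient of variation of per-frame energy."""
    energies = []
    for i in range(0, len(signal) - frame + 1, frame):
        chunk = signal[i:i + frame]
        energies.append(sum(s * s for s in chunk) / frame)
    mean = sum(energies) / len(energies)
    if mean == 0.0:
        return 0.0
    var = sum((e - mean) ** 2 for e in energies) / len(energies)
    return math.sqrt(var) / mean

def classify(signal, threshold=0.5):
    """Compare the analyzed component against a predetermined threshold."""
    return "speech" if energy_variation(signal) > threshold else "music"

speechy = [1.0, -1.0] * 4 + [0.01] * 8                        # burst then near-silence
musicy = [math.sin(2 * math.pi * n / 8) for n in range(16)]   # steady tone
```

Real systems combine several such components; a single feature is used here only to make the compare-against-threshold structure concrete.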
  • Publication number: 20150106102
    Abstract: A method includes determining, at a speech encoder, first gain shape parameters based on a harmonically extended signal and/or based on a high-band residual signal associated with a high-band portion of an audio signal. The method also includes determining second gain shape parameters based on a synthesized high-band signal and based on the high-band portion of the audio signal. The method further includes inserting the first gain shape parameters and the second gain shape parameters into an encoded version of the audio signal to enable gain adjustment during reproduction of the audio signal from the encoded version.
    Type: Application
    Filed: October 7, 2014
    Publication date: April 16, 2015
    Inventors: Venkata Subrahmanyam Chandra Sekhar Chebiyyam, Venkatraman S. Atti
  • Patent number: 9009052
    Abstract: Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section 119 temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section 117 to define a spectral transform surface. A synthesized audio signal generating section 121 generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section 121 generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in reference singing voice source data.
    Type: Grant
    Filed: July 19, 2011
    Date of Patent: April 14, 2015
    Assignee: National Institute of Advanced Industrial Science and Technology
    Inventors: Tomoyasu Nakano, Masataka Goto
  • Patent number: 8977552
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: May 28, 2014
    Date of Patent: March 10, 2015
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
  • Publication number: 20150025892
    Abstract: A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual.
    Type: Application
    Filed: March 6, 2013
    Publication date: January 22, 2015
    Applicant: Agency for Science, Technology and Research
    Inventors: Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong
  • Patent number: 8924204
    Abstract: Unlike sound based pressure waves that go everywhere, air turbulence caused by wind is usually a fairly local event. Therefore, in a system that utilizes two or more spatially separated microphones to pick up sound signals (e.g., speech), wind noise picked up by one of the microphones often will not be picked up (or at least not to the same extent) by the other microphone(s). Embodiments of methods and apparatuses that utilize this fact and others to effectively detect and suppress wind noise using multiple microphones that are spatially separated are described.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: December 30, 2014
    Assignee: Broadcom Corporation
    Inventors: Juin-Hwey Chen, Jes Thyssen, Xianxian Zhang, Huaiyu Zeng
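A sketch of the spatial observation in this abstract: acoustic pressure waves arrive coherently at both microphones, while local wind turbulence does not, so low inter-microphone correlation suggests wind noise. The zero-lag correlation measure and the threshold are illustrative assumptions; practical detectors use more robust statistics.

```python
import math

def wind_noise_detected(mic1, mic2, threshold=0.5):
    """True when the normalized zero-lag cross-correlation between the
    two microphone signals falls below the threshold."""
    num = sum(a * b for a, b in zip(mic1, mic2))
    den = math.sqrt(sum(a * a for a in mic1) * sum(b * b for b in mic2))
    corr = num / den if den else 1.0
    return corr < threshold

tone = [math.sin(2 * math.pi * n / 16) for n in range(32)]   # shared acoustic sound
gust = [2.0 * (-1) ** n for n in range(32)]                  # turbulence at one mic only
mic_clean = tone
mic_windy = [t + g for t, g in zip(tone, gust)]
```

A suppression stage could then attenuate or substitute the affected microphone's signal whenever the detector fires.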
  • Patent number: 8909538
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: November 11, 2013
    Date of Patent: December 9, 2014
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: James Mark Kondziela
  • Patent number: 8898057
    Abstract: Disclosed is an encoding apparatus that can efficiently encode a signal such as a broadband or extra-broadband signal, thereby improving the quality of the decoded signal. The encoding apparatus includes a band establishing unit (301) that generates, based on the characteristics of the input signal, band establishment information used to divide the band of the input signal into a first band part on the lower-frequency side and a second band part on the higher-frequency side; a lower-frequency encoding unit (302) for encoding, based on the band establishment information, the input signal of the first band part to generate encoded lower-frequency-part information; and a higher-frequency encoding unit (303) for encoding, based on the band establishment information, the input signal of the second band part to generate encoded higher-frequency-part information.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventor: Tomofumi Yamanashi
  • Patent number: 8898062
    Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: November 25, 2014
    Assignee: Panasonic Intellectual Property Corporation of America
    Inventors: Yumiko Kato, Takahiro Kamai
  • Patent number: 8886539
    Abstract: The present invention discloses a parametric representation of prosody based on polynomial expansion coefficients of the pitch contour near the center of each syllable. These syllable pitch expansion coefficients are generated from a recorded speech database, read from a number of sentences by a reference speaker. By correlating the stress level and context information of each syllable in the text with the polynomial expansion coefficients of the corresponding spoken syllable, a correlation database is formed. To generate prosody for an input text, the stress level and context information of each syllable in the text are identified. The prosody is generated by using the correlation database to find the best set of pitch parameters for each syllable. By adding these to global pitch contours and using interpolation formulas, the complete pitch contour for the input text is generated. Duration and intensity profiles are generated using a similar procedure.
    Type: Grant
    Filed: March 17, 2014
    Date of Patent: November 11, 2014
    Inventor: Chengjun Julian Chen
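A compact sketch of representing a syllable's pitch contour by polynomial expansion coefficients near the syllable center. The contour is projected onto the mutually orthogonal basis {1, t, t^2 - mean(t^2)} over symmetric, centered sample times; this particular basis and the sample contour are illustrative choices, not the patent's parameterization.

```python
def pitch_poly_coeffs(contour):
    """Second-order polynomial expansion coefficients of a pitch contour,
    sampled symmetrically around the syllable center."""
    n = len(contour)
    k = (n - 1) / 2.0
    t = [i - k for i in range(n)]                    # centered time axis
    m2 = sum(x * x for x in t) / n
    # These three basis vectors are mutually orthogonal for symmetric t,
    # so each coefficient is an independent projection.
    basis = [[1.0] * n, t, [x * x - m2 for x in t]]
    coeffs = []
    for b in basis:
        num = sum(p * v for p, v in zip(contour, b))
        den = sum(v * v for v in b)
        coeffs.append(num / den)
    return coeffs

# A linearly rising contour: 100 Hz mean, +2 Hz per sample around the center.
contour = [100.0 + 2.0 * (i - 2) for i in range(5)]
c0, c1, c2 = pitch_poly_coeffs(contour)
```

For this contour the projection recovers the mean (c0 = 100), the slope (c1 = 2), and no curvature (c2 = 0), which is the kind of compact per-syllable description the abstract correlates with stress and context.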
  • Patent number: 8886538
    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
    Type: Grant
    Filed: September 26, 2003
    Date of Patent: November 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
  • Patent number: 8868432
    Abstract: A method for decoding an audio signal having a bandwidth that extends beyond a bandwidth of a CELP excitation signal in an audio decoder including a CELP-based decoder element. The method includes obtaining a second excitation signal having an audio bandwidth extending beyond the audio bandwidth of the CELP excitation signal, obtaining a set of signals by filtering the second excitation signal with a set of bandpass filters, scaling the set of signals using a set of energy-based parameters, and obtaining a composite output signal by combining the scaled set of signals with a signal based on the audio signal decoded by the CELP-based decoder element.
    Type: Grant
    Filed: September 28, 2011
    Date of Patent: October 21, 2014
    Assignee: Motorola Mobility LLC
    Inventors: Jonathan A. Gibbs, James P. Ashley, Udar Mittal
  • Patent number: 8868431
    Abstract: A recognition dictionary creation device identifies the language of the reading of an inputted text that is a target to be registered and adds a reading with phonemes in the identified language to the target text. It also converts the reading of the target text from the phonemes in the identified language to phonemes in the language to be recognized, which is handled in voice recognition, to create a recognition dictionary in which the converted reading of the target text is registered.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: October 21, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
  • Patent number: 8868422
    Abstract: According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units, based on at least one of the phonologic information and the prosody information. The method can search for at least two speech units among the plurality of speech units in which at least one of the phonologic information and the prosody information is identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units as a representative speech unit in a memory.
    Type: Grant
    Filed: September 13, 2010
    Date of Patent: October 21, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Gou Hirabayashi, Takehiko Kagoshima
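A toy version of the storage-saving step above: group speech units whose prosody feature is identical or similar (within a tolerance) and keep a single representative waveform per group. Using one scalar feature per unit and string "waveforms" is a simplification for illustration; the patent compares phonologic and prosody information more generally.

```python
def deduplicate_units(units, tol=0.1):
    """units: list of (feature, waveform) pairs.
    Returns (memory, mapping): memory holds one representative per group,
    mapping gives each input unit's index into memory."""
    memory = []                      # representative (feature, waveform) pairs
    mapping = []
    for feat, wave in units:
        for idx, (f, _) in enumerate(memory):
            if abs(f - feat) <= tol:
                mapping.append(idx)  # similar enough: reuse representative
                break
        else:
            memory.append((feat, wave))
            mapping.append(len(memory) - 1)
    return memory, mapping

units = [(1.00, "wave_a"), (1.05, "wave_b"), (2.00, "wave_c")]
memory, mapping = deduplicate_units(units)
```

The second unit maps to the first group's representative, so only two of the three waveforms need to be stored.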
  • Patent number: 8856008
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: September 18, 2013
    Date of Patent: October 7, 2014
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8856012
    Abstract: A method of encoding an audio signal, where signals including two or more channel signals are downmixed to a mono signal, the mono signal is divided into a low-frequency signal and a high-frequency signal, the low-frequency signal is encoded through algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX), and the high-frequency signal is encoded using the low-frequency signal. A method of decoding of an audio signal, a low-frequency signal encoded through ACELP or TCX is decoded, a high-frequency signal is decoded using the low-frequency signal, the low-frequency signal and the high-frequency signal are combined to generate a mono signal, and the mono signal is upmixed by decoding spatial parameters regarding signals including two or more channel signals.
    Type: Grant
    Filed: February 3, 2014
    Date of Patent: October 7, 2014
    Assignee: SAMSUNG Electronics Co., Ltd.
    Inventors: Ho-sang Sung, Eun-mi Oh, Jung-hoe Kim, Ki-hyun Choo, Mi-young Kim
  • Publication number: 20140288938
    Abstract: To improve the intelligibility of speech for users with high-frequency hearing loss, the present systems and methods provide an improved frequency lowering system with enhancement of spectral features responsive to place-of-articulation of the input speech. High frequency components of speech, such as fricatives, may be classified based on one or more features that distinguish place of articulation, including spectral slope, peak location, relative amplitudes in various frequency bands, or a combination of these or other such features. Responsive to the classification of the input speech, a signal or signals may be added to the input speech in a frequency band audible to the hearing-impaired listener, said signal or signals having predetermined distinct spectral features corresponding to the classification, and allowing a listener to easily distinguish various consonants in the input.
    Type: Application
    Filed: November 1, 2012
    Publication date: September 25, 2014
    Inventor: Ying-Yee Kong
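    The classification step can be sketched as below. Using only the spectral peak location, and the 4500 Hz split point, are assumptions for illustration; the publication lists several distinguishing features (spectral slope, relative band amplitudes) that a real classifier would combine:

    ```python
    import numpy as np

    def classify_fricative(frame, sr, peak_split_hz=4500):
        """Toy place-of-articulation classifier: alveolar /s/ tends to carry
        its spectral peak higher in frequency than postalveolar /sh/."""
        mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
        peak_hz = freqs[np.argmax(mag)]
        return "alveolar" if peak_hz >= peak_split_hz else "postalveolar"
    ```

    The classifier's output would then select which predetermined low-frequency cue signal to add for the hearing-impaired listener.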
  • Patent number: 8812324
    Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising: a phase for locating analysis windows by means of an iterative process that determines the phase of the first sinusoidal component and compares the phase value of that component with a predetermined value; a phase for selecting analysis frames corresponding to an allophone and readjusting the duration and the fundamental frequency according to certain thresholds; and a phase for generating synthetic speech from synthesis frames, taking the information of the closest analysis frame as the spectral information of the synthesis frame and taking as many synthesis frames as there are periods in the synthetic signal. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants synchronously with the fundamental period.
    Type: Grant
    Filed: December 21, 2010
    Date of Patent: August 19, 2014
    Assignee: Telefonica, S.A.
    Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez de Vicuna
  • Publication number: 20140222434
    Abstract: An audio signal synthesizer generates a synthesis audio signal having a first frequency band and a second synthesized frequency band derived from the first frequency band and comprises a patch generator, a spectral converter, a raw signal processor and a combiner. The patch generator performs at least two different patching algorithms, each patching algorithm generating a raw signal. The patch generator is adapted to select one of the at least two different patching algorithms in response to a control information. The spectral converter converts the raw signal into a raw signal spectral representation. The raw signal processor processes the raw signal spectral representation in response to spectral domain spectral band replication parameters to obtain an adjusted raw signal spectral representation.
    Type: Application
    Filed: April 10, 2014
    Publication date: August 7, 2014
    Applicant: Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
    Inventors: Frederik NAGEL, Sascha DISCH, Nikolaus RETTELBACH, Max NEUENDORF, Bernhard GRILL, Ulrich KRAEMER, Stefan WABNIK
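    The patch generator's selection between patching algorithms can be sketched as follows. The two algorithms here (spectral copy-up and spectral mirroring) are stand-ins chosen for brevity; the patent's actual algorithms and control information differ:

    ```python
    import numpy as np

    def copy_up_patch(low_spec, target_bins):
        """'Copy-up': replicate low-band coefficients into the empty high band."""
        reps = -(-target_bins // len(low_spec))  # ceiling division
        return np.tile(low_spec, reps)[:target_bins]

    def mirror_patch(low_spec, target_bins):
        """Spectral mirroring as a second, alternative patching algorithm."""
        return low_spec[::-1][:target_bins]

    def generate_patch(low_spec, target_bins, control="copy"):
        """Select one of two patching algorithms in response to control
        information, as the abstract's patch generator does."""
        algo = copy_up_patch if control == "copy" else mirror_patch
        return algo(low_spec, target_bins)
    ```

    The raw signal produced here would then be spectrally converted and adjusted against the spectral band replication parameters before combination.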
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
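    The substitution step reduces to replacing identified segments in the primary database with their secondary-database counterparts. A minimal sketch, assuming segments are keyed by label (the real databases hold labeled audio files):

    ```python
    def enhance(primary, secondary, varying_labels):
        """primary/secondary: dicts mapping segment label -> audio payload.
        varying_labels: labels whose pronunciation differs between dialects.
        Returns the enhanced primary database with substituted segments."""
        enhanced = dict(primary)
        for label in varying_labels:
            if label in enhanced and label in secondary:
                enhanced[label] = secondary[label]
        return enhanced
    ```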
  • Patent number: 8738381
    Abstract: A prosody generation apparatus capable of suppressing distortion that occurs when generating prosodic patterns and therefore generating a natural prosody is provided. A prosody changing point extraction unit in this apparatus extracts a prosody changing point located at the beginning and the ending of a sentence, the beginning and the ending of a breath group, an accent position and the like. A selection rule and a transformation rule for a prosodic pattern including the prosody changing point are generated by means of a statistical or learning technique, and the thus generated rules are stored in a representative prosodic pattern selection rule table and a transformation rule table beforehand. A pattern selection unit selects a representative prosodic pattern from the representative prosodic pattern selection rule table according to the selection rule.
    Type: Grant
    Filed: January 17, 2007
    Date of Patent: May 27, 2014
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai
  • Patent number: 8706497
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: April 22, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
  • Patent number: 8706496
    Abstract: A sequence is received of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument). The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information. The sound transformation utilizes overlapping windows, and a computational cost function is determined which depends on the product of the number of pitch periods and the inverse of the minimum fundamental frequency within the window.
    Type: Grant
    Filed: September 13, 2007
    Date of Patent: April 22, 2014
    Assignee: Universitat Pompeu Fabra
    Inventor: Jordi Bonada Sanjaume
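    The per-window cost named in the abstract is the product of the pitch-period count and the reciprocal of the minimum fundamental frequency; a literal transcription (proportionality constant assumed to be 1):

    ```python
    def window_cost(num_pitch_periods, min_f0_hz):
        """Cost ~ (pitch periods in window) * (1 / minimum F0 in window)."""
        return num_pitch_periods * (1.0 / min_f0_hz)
    ```

    Intuitively, windows covering many periods of a low-pitched signal are long and therefore more expensive to process.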
  • Patent number: 8706493
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. And then the prosody re-estimation module re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: April 22, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
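    One plausible form of controllable re-estimation is shifting and rescaling a predicted F0 contour around its mean; the parameter names and the linear mapping below are assumptions for illustration, not the patent's exact formulation:

    ```python
    import statistics

    def reestimate_f0(f0, mean_shift=0.0, range_scale=1.0):
        """Produce new prosody information from a predicted F0 contour using
        two controllable parameters: a mean shift and a pitch-range scale."""
        mu = statistics.fmean(f0)
        return [mu + mean_shift + range_scale * (v - mu) for v in f0]
    ```

    The re-estimated contour would then be handed to the speech synthesis module in place of the original prediction.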
  • Patent number: 8682654
    Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.
    Type: Grant
    Filed: April 25, 2006
    Date of Patent: March 25, 2014
    Assignee: Cyberlink Corp.
    Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
  • Patent number: 8655650
    Abstract: A method is provided for decoding data streams in a voice communication system. The method includes: receiving two or more data streams having voice data encoded therein; decoding each data stream into a set of speech coding parameters; forming a set of combined speech coding parameters by combining the sets of decoded speech coding parameters, where speech coding parameters of a given type are combined with speech coding parameters of the same type; and inputting the set of combined speech coding parameters into a speech synthesizer.
    Type: Grant
    Filed: March 28, 2007
    Date of Patent: February 18, 2014
    Assignee: Harris Corporation
    Inventor: Mark W. Chamberlain
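    Combining same-typed speech coding parameters across decoded streams can be sketched as below. Averaging is one simple combination rule assumed here for illustration; the patent does not commit to a specific operator:

    ```python
    def combine_params(streams):
        """streams: list of dicts {param_type: list of parameter values}.
        Combines parameters of the same type, element-wise, by averaging."""
        keys = streams[0].keys()
        return {k: [sum(vals) / len(vals)
                    for vals in zip(*(s[k] for s in streams))]
                for k in keys}
    ```

    The combined parameter set would then drive a single speech synthesizer, so N incoming voice streams produce one rendered output.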
  • Patent number: 8645140
    Abstract: A method of associating a voice font with a contact for text-to-speech conversion at an electronic device includes obtaining, at the electronic device, the voice font for the contact, and storing the voice font in association with a contact data record stored in a contacts database at the electronic device. The contact data record includes contact data for the contact.
    Type: Grant
    Filed: February 25, 2009
    Date of Patent: February 4, 2014
    Assignee: BlackBerry Limited
    Inventor: Yuriy Lobzakov
  • Patent number: 8645126
    Abstract: A method of encoding an audio signal, where signals including two or more channel signals are downmixed to a mono signal, the mono signal is divided into a low-frequency signal and a high-frequency signal, the low-frequency signal is encoded through algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX), and the high-frequency signal is encoded using the low-frequency signal. In a corresponding method of decoding an audio signal, a low-frequency signal encoded through ACELP or TCX is decoded, a high-frequency signal is decoded using the low-frequency signal, the low-frequency signal and the high-frequency signal are combined to generate a mono signal, and the mono signal is upmixed by decoding spatial parameters for signals including two or more channel signals.
    Type: Grant
    Filed: March 26, 2013
    Date of Patent: February 4, 2014
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Ho-sang Sung, Eun-mi Oh, Jung-hoe Kim, Ki-hyun Choo, Mi-young Kim
  • Patent number: 8634783
    Abstract: A communication device includes memory, an input interface, a processing module, and a transmitter. The processing module receives a digital signal from the input interface, wherein the digital signal includes a desired digital signal component and an undesired digital signal component. The processing module identifies one of a plurality of codebooks based on the undesired digital signal component. The processing module then identifies a codebook entry from the one of the plurality of codebooks based on the desired digital signal component to produce a selected codebook entry. The processing module then generates a coded signal based on the selected codebook entry, wherein the coded signal includes a substantially unattenuated representation of the desired digital signal component and an attenuated representation of the undesired digital signal component. The transmitter converts the coded signal into an outbound signal in accordance with a signaling protocol and transmits it.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: January 21, 2014
    Assignee: Broadcom Corporation
    Inventor: Nambirajan Seshadri
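    The two-stage selection (codebook from the undesired component, entry from the desired component) can be sketched as a toy scalar quantizer. The noise threshold, codebook names, and scalar entries are illustrative assumptions:

    ```python
    def encode(sample, codebooks, noise_level):
        """Pick a codebook from the noise estimate, then pick the nearest
        entry to the desired signal component (toy nearest-neighbour search)."""
        book = codebooks["noisy"] if noise_level > 0.5 else codebooks["clean"]
        return min(book, key=lambda e: abs(e - sample))
    ```

    Choosing the codebook from the noise estimate is what lets the coded signal keep the desired component largely unattenuated while attenuating the undesired one.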
  • Patent number: 8635065
    Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extracted
    Type: Grant
    Filed: November 10, 2004
    Date of Patent: January 21, 2014
    Assignee: Sony Deutschland GmbH
    Inventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato