Frequency Element Patents (Class 704/268)
-
Patent number: 11869483
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
Type: Grant
Filed: October 7, 2021
Date of Patent: January 9, 2024
Assignee: Nvidia Corporation
Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
-
Patent number: 11769481
Abstract: Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
Type: Grant
Filed: October 7, 2021
Date of Patent: September 26, 2023
Assignee: Nvidia Corporation
Inventors: Kevin Shih, Jose Rafael Valle Gomes da Costa, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro
-
Patent number: 11195541
Abstract: A method and system for providing Gaussian weighted self-attention for speech enhancement are herein provided. According to one embodiment, the method includes receiving an input noise signal, generating a score matrix based on the received input noise signal, and applying a Gaussian weighted function to the generated score matrix.
Type: Grant
Filed: October 2, 2019
Date of Patent: December 7, 2021
Inventors: JaeYoung Kim, Mostafa El-Khamy, Jungwon Lee
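The abstract above names two steps, building a score matrix and applying a Gaussian weighting to it. The following is a minimal Python sketch of that idea, not the patented method: it assumes the weight depends only on the distance between positions i and j, that the weighting is multiplicative and applied before a row-wise softmax, and that the width `sigma` is a fixed, hypothetical parameter. All function names are illustrative.

```python
import math

def gaussian_weights(n, sigma):
    """n x n matrix whose entry (i, j) decays with the distance |i - j|."""
    return [[math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) for j in range(n)]
            for i in range(n)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_weighted_attention(scores, sigma):
    """Scale each score by its Gaussian weight, then normalize each row.

    Assumption: element-wise multiplication followed by softmax; the
    abstract does not say how the weighting is combined with the scores.
    """
    n = len(scores)
    weights = gaussian_weights(n, sigma)
    weighted = [[scores[i][j] * weights[i][j] for j in range(n)]
                for i in range(n)]
    return [softmax(row) for row in weighted]

# Usage: with uniform raw scores, nearby positions end up with more attention.
attention = gaussian_weighted_attention([[1.0] * 4 for _ in range(4)], sigma=1.0)
```

With uniform input scores the weighting alone decides the result, so each row puts the most mass on its own position and progressively less on distant ones.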
-
Patent number: 11087771
Abstract: A device includes an encoder and a transmitter. The encoder is configured to generate a first high-band portion of a first signal based on a left signal and a right signal. The encoder is also configured to generate a set of adjustment gain parameters based on a high-band non-reference signal. The high-band non-reference signal corresponds to one of a left high-band portion of the left signal or a right high-band portion of the right signal as a high-band non-reference signal. The transmitter is configured to transmit information corresponding to the first high-band portion of the first signal. The transmitter is also configured to transmit the set of adjustment gain parameters corresponding to the high-band non-reference signal.
Type: Grant
Filed: June 26, 2019
Date of Patent: August 10, 2021
Assignee: QUALCOMM Incorporated
Inventors: Venkatraman Atti, Venkata Subrahmanyam Chandra Sekhar Chebiyyam
-
Patent number: 11011154
Abstract: A method of performing speech synthesis includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
Type: Grant
Filed: February 8, 2019
Date of Patent: May 18, 2021
Assignee: TENCENT AMERICA LLC
Inventors: Shan Yang, Heng Lu, Shiyin Kang, Dong Yu
-
Patent number: 10127916
Abstract: A method and apparatus for enhancing audio processing between a transmit radio (230) and a receive radio (240) are provided. Digitized audio frames (202) are applied to parallel inputs of both a vocoder encoder (204) and a trill encoder (212) of the transmit radio (230). The vocoder encoder (204) generates voice bits which are communicated over a voice bits channel (206) to a vocoder decoder (208) of the receive radio (240). Trill encoder (212) generates signaling bits which are communicated over a signaling bits channel (214) to a trill decoder (216) of the receive radio (240) for recovery of trill information (218). At the receive radio (240), a decoded audio signal (209) generated from the vocoder decoder (208), and the recovered trill information (218) are both provided as inputs to a trill reconstructor stage (220) to generate a recovered audio signal (222) having a reconstructed trill.
Type: Grant
Filed: April 24, 2014
Date of Patent: November 13, 2018
Assignee: MOTOROLA SOLUTIONS, INC.
Inventors: Yi Gao, Qing Yan
-
Patent number: 9978359
Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.
Type: Grant
Filed: December 6, 2013
Date of Patent: May 22, 2018
Assignee: Amazon Technologies, Inc.
Inventors: Michal Tadeusz Kaszczuk, Jeffrey Penrod Adams, Adam Franciszek Nadolski
-
Patent number: 9916834
Abstract: An approach is described that obtains spectrum coefficients for a replacement frame of an audio signal. A tonal component of a spectrum of an audio signal is detected based on a peak that exists in the spectra of frames preceding a replacement frame. For the tonal component of the spectrum, spectrum coefficients for the peak and its surroundings in the spectrum of the replacement frame are predicted, and for the non-tonal component of the spectrum, a non-predicted spectrum coefficient for the replacement frame or a corresponding spectrum coefficient of a frame preceding the replacement frame is used.
Type: Grant
Filed: December 21, 2015
Date of Patent: March 13, 2018
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Janine Sukowski, Ralph Sperschneider, Goran Markovic, Wolfgang Jaegers, Christian Helmrich, Bernd Edler, Ralf Geiger
-
Patent number: 9711123
Abstract: Provided is a voice synthesis device, including: a voice synthesis information acquisition unit configured to acquire voice synthesis information for specifying a sound generating character; a replacement unit configured to replace at least a part of sound generating characters specified by the voice synthesis information with an alternative sound generating character different from the sound generating character; and a voice synthesis unit configured to execute a second synthesis process for generating a voice signal of an utterance sound obtained by the replacing.
Type: Grant
Filed: November 6, 2015
Date of Patent: July 18, 2017
Assignee: YAMAHA CORPORATION
Inventor: Motoki Ogasawara
-
Patent number: 9664518
Abstract: One embodiment of an invention which computes a location based alignment of two tracks over a set route. Once aligned, a comparison of performance statistics is made at each position along the track. Time and distance gap information is also computed at each position. The results are then displayed in a plot (17) so one can see where different performance statistics changed, including time gap information (19). The data is also linked to a map (8) so one can visualize the locations more clearly. It is also possible to compare multiple tracks (25) to one reference track (23) for greater insight.
Type: Grant
Filed: August 22, 2011
Date of Patent: May 30, 2017
Assignee: Strava, Inc.
Inventor: Paul Mach
-
Patent number: 9653085
Abstract: A method and system for reconstructing an original audio signal is disclosed. The original audio signal has a baseband up to a cutoff frequency and high-frequency components not included in the baseband above the cutoff frequency. The system includes a bitstream deformatter that extracts a representation of the baseband, an estimated spectral envelope, and noise-blending parameters from an audio bitstream. The system also includes a spectral component regenerator that copies or translates all or at least some of the baseband spectral components to non-overlapping frequency ranges of the high-frequency components not included in the baseband to generate regenerated spectral components. The system further includes a gain adjuster that modifies a spectral envelope of the regenerated spectral components based at least in part on the estimated spectral envelope and the noise-blending parameters to generate gain-adjusted regenerated spectral components.
Type: Grant
Filed: December 6, 2016
Date of Patent: May 16, 2017
Assignee: Dolby Laboratories Licensing Corporation
Inventors: Michael M. Truman, Mark S. Vinton
-
Patent number: 9437194
Abstract: A voice control method is applied in an electronic device. The electronic device includes a voice input unit, a play unit, and a storage unit storing a conversation database and an association table between different ranges of voice characteristics and styles of response voice. The method includes the following steps. Obtaining voice signals input via the voice input unit. Determining which content is input according to the obtained voice signals. Searching in the conversation database to find a response corresponding to the input content. Analyzing voice characteristics of the obtained voice signals. Comparing the voice characteristics of the obtained voice signals with the pre-stored ranges. Selecting the associated response voice. Finally, outputting the found response using the associated response voice via the play unit.
Type: Grant
Filed: May 21, 2013
Date of Patent: September 6, 2016
Assignees: Fu Tai Hua Industry (Shenzhen) Co., Ltd., HON HAI PRECISION INDUSTRY CO., LTD.
Inventor: Ren-Wen Huang
-
Patent number: 9401138
Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period not depending on a pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter of a speech waveform from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
Type: Grant
Filed: May 10, 2012
Date of Patent: July 26, 2016
Assignee: NEC Corporation
Inventor: Masanori Kato
-
Patent number: 9368104
Abstract: A system and method for realistic speech synthesis which converts text into synthetic human speech with qualities appropriate to the context such as the language and dialect of the speaker, as well as expanding a speaker's phonetic inventory to produce more natural sounding speech.
Type: Grant
Filed: March 15, 2013
Date of Patent: June 14, 2016
Assignee: SRC, INC.
Inventors: David Donald Eller, Steven Brian Morphet, Watson Brent Boyett
-
Patent number: 9350586
Abstract: A signal decoder in a communication system is for decoding signal elements in a communication signal having interleaved carrier frequencies. The decoder receives antenna signals in a frequency domain, and has a multiplier for multiplying the antenna signals by a complex-valued mathematical sequence such as the Zadoff-Chu sequence, to generate multiplied antenna signals. An inverse frequency to time converter converts the multiplied antenna signals to time domain signals. A signal quality detector detects a signal quality from the time domain signals based on a subset of the carrier frequencies. The complex-valued mathematical sequence is provided with zero values corresponding to carrier frequencies that are not included in the subset, and the inverse frequency to time converter has a transform size corresponding to the multiplied antenna signals including all carrier frequencies.
Type: Grant
Filed: August 25, 2014
Date of Patent: May 24, 2016
Assignee: FREESCALE SEMICONDUCTOR, INC.
Inventor: Vincent Pierre Martinez
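The processing chain in this abstract (multiply by a zero-masked sequence, then run a full-size inverse transform) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patented decoder: it uses the common odd-length Zadoff-Chu definition x[n] = exp(-j*pi*u*n*(n+1)/N), a naive O(N^2) inverse DFT instead of an FFT, and hypothetical function names throughout. The signal quality detector is left out.

```python
import cmath
import math

def zadoff_chu(root, length):
    """Odd-length Zadoff-Chu sequence; every element has unit magnitude."""
    return [cmath.exp(-1j * math.pi * root * n * (n + 1) / length)
            for n in range(length)]

def mask_to_subset(sequence, subset):
    """Zero every entry whose carrier index is not in the chosen subset."""
    return [value if k in subset else 0j for k, value in enumerate(sequence)]

def inverse_dft(spectrum):
    """Naive inverse DFT over all carriers (transform size = len(spectrum))."""
    size = len(spectrum)
    return [
        sum(spectrum[m] * cmath.exp(2j * math.pi * m * k / size)
            for m in range(size)) / size
        for k in range(size)
    ]

def decode_time_domain(antenna_freq, root, subset):
    """Multiply frequency-domain samples by the masked sequence, then invert.

    Assumption: the sequence length equals the number of carriers.
    """
    masked = mask_to_subset(zadoff_chu(root, len(antenna_freq)), subset)
    multiplied = [a * s for a, s in zip(antenna_freq, masked)]
    return inverse_dft(multiplied)

# Usage: 7 carriers, keeping only the even-indexed (interleaved) ones.
time_signal = decode_time_domain([1 + 0j] * 7, root=1, subset={0, 2, 4, 6})
```

Zeroing the unused carriers in the sequence, rather than shrinking the transform, keeps the inverse transform at the full carrier count, which is the arrangement the abstract describes.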
-
Patent number: 9286885
Abstract: In a method of generating speech from text, the speech segments necessary to put together the text to be output as speech by a terminal are determined; it is checked which speech segments are already present in the terminal and which ones need to be transmitted from a server to the terminal; the segments to be transmitted to the terminal are indexed; the speech segments and the indices of segments to be output at the terminal are transmitted; an index sequence of speech segments to be put together to form the speech to be output is transmitted; and the segments are concatenated according to the index sequence. This method makes it possible to realize a distributed speech synthesis system requiring only a low transmission capacity, a small memory and low computational power in the terminal.
Type: Grant
Filed: April 6, 2004
Date of Patent: March 15, 2016
Assignee: Alcatel Lucent
Inventors: Jürgen Sienel, Dieter Kopp
-
Patent number: 9275631
Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
Type: Grant
Filed: December 31, 2012
Date of Patent: March 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Ryuki Tachibana, Masafumi Nishimura
-
Patent number: 9070365
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: September 10, 2014
Date of Patent: June 30, 2015
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 9058811
Abstract: According to one embodiment, a method and an apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis as fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and probabilities thereof, generating fuzzy context feature labels based on the plurality of candidate pronunciations and probabilities thereof, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters via a synthesizer as speech.
Type: Grant
Filed: February 22, 2012
Date of Patent: June 16, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Xi Wang, Xiaoyan Lou, Jian Li
-
Patent number: 9026440
Abstract: The present invention relates to means and methods of automated difference recognition between speech and music signals in voice communication systems, devices, telephones, and methods, and more specifically, to systems, devices, and methods that automate control when either speech or music is detected over communication links. The present invention provides a novel system and method for monitoring the audio signal, analyzing selected audio signal components, comparing the results of analysis with a pre-determined threshold value, and classifying the audio signal as either speech or music.
Type: Grant
Filed: March 21, 2014
Date of Patent: May 5, 2015
Inventor: Alon Konchitsky
-
Publication number: 20150106102
Abstract: A method includes determining, at a speech encoder, first gain shape parameters based on a harmonically extended signal and/or based on a high-band residual signal associated with a high-band portion of an audio signal. The method also includes determining second gain shape parameters based on a synthesized high-band signal and based on the high-band portion of the audio signal. The method further includes inserting the first gain shape parameters and the second gain shape parameters into an encoded version of the audio signal to enable gain adjustment during reproduction of the audio signal from the encoded version of the audio signal.
Type: Application
Filed: October 7, 2014
Publication date: April 16, 2015
Inventors: Venkata Subrahmanyam Chandra Sekhar Chebiyyam, Venkatraman S. Atti
-
Patent number: 9009052
Abstract: Herein provided is a system for singing synthesis capable of reflecting not only pitch and dynamics changes but also timbre changes of a user's singing. A spectral transform surface generating section 119 temporally concatenates all the spectral transform curves estimated by a second spectral transform curve estimating section 117 to define a spectral transform surface. A synthesized audio signal generating section 121 generates a transform spectral envelope at each instant of time by scaling a reference spectral envelope based on the spectral transform surface. Then, the synthesized audio signal generating section 121 generates an audio signal of a synthesized singing voice reflecting timbre changes of an input singing voice, based on the transform spectral envelope and a fundamental frequency contained in a reference singing voice source data.
Type: Grant
Filed: July 19, 2011
Date of Patent: April 14, 2015
Assignee: National Institute of Advanced Industrial Science and Technology
Inventors: Tomoyasu Nakano, Masataka Goto
-
Patent number: 8977552
Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
Type: Grant
Filed: May 28, 2014
Date of Patent: March 10, 2015
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair D. Conkie, Ann K. Syrdal
-
Publication number: 20150025892
Abstract: A system and method for speech-to-singing synthesis is provided. The method includes deriving characteristics of a singing voice for a first individual and modifying vocal characteristics of a voice for a second individual in response to the characteristics of the singing voice of the first individual to generate a synthesized singing voice for the second individual.
Type: Application
Filed: March 6, 2013
Publication date: January 22, 2015
Applicant: Agency for Science, Technology and Research
Inventors: Siu Wa Lee, Ling Cen, Haizhou Li, Yaozhu Paul Chan, Minghui Dong
-
Patent number: 8924204
Abstract: Unlike sound based pressure waves that go everywhere, air turbulence caused by wind is usually a fairly local event. Therefore, in a system that utilizes two or more spatially separated microphones to pick up sound signals (e.g., speech), wind noise picked up by one of the microphones often will not be picked up (or at least not to the same extent) by the other microphone(s). Embodiments of methods and apparatuses that utilize this fact and others to effectively detect and suppress wind noise using multiple microphones that are spatially separated are described.
Type: Grant
Filed: September 30, 2011
Date of Patent: December 30, 2014
Assignee: Broadcom Corporation
Inventors: Juin-Hwey Chen, Jes Thyssen, Xianxian Zhang, Huaiyu Zeng
-
Patent number: 8909538
Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
Type: Grant
Filed: November 11, 2013
Date of Patent: December 9, 2014
Assignee: Verizon Patent and Licensing Inc.
Inventor: James Mark Kondziela
-
Patent number: 8898062
Abstract: A strained-rough-voice conversion unit (10) is included in a voice conversion device that can generate a “strained rough” voice produced in a part of a speech when speaking forcefully with excitement, nervousness, anger, or emphasis and thereby richly express vocal expression such as anger, excitement, or an animated or lively way of speaking, using voice quality change. The strained-rough-voice conversion unit (10) includes: a strained phoneme position designation unit (11) designating a phoneme to be uttered as a “strained rough” voice in a speech; and an amplitude modulation unit (14) performing modulation including periodic amplitude fluctuation on a speech waveform.
Type: Grant
Filed: January 22, 2008
Date of Patent: November 25, 2014
Assignee: Panasonic Intellectual Property Corporation of America
Inventors: Yumiko Kato, Takahiro Kamai
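The "periodic amplitude fluctuation" performed by the amplitude modulation unit can be illustrated with a short Python sketch. It is a guess at the simplest form of such modulation, not the patent's design: it assumes a sinusoidal modulator applied to the whole waveform, and the 80 Hz rate and 0.4 depth are hypothetical defaults chosen only for the example.

```python
import math

def amplitude_modulate(samples, sample_rate, mod_freq_hz=80.0, depth=0.4):
    """Scale each sample by 1 + depth * sin(2*pi*f_mod*n/fs).

    Assumptions: sinusoidal modulator, fixed rate and depth; a real system
    would apply this only to the phonemes designated as "strained rough".
    """
    return [
        x * (1.0 + depth * math.sin(2.0 * math.pi * mod_freq_hz * n / sample_rate))
        for n, x in enumerate(samples)
    ]

# Usage: modulating a constant signal exposes the envelope itself.
envelope = amplitude_modulate([1.0] * 16000, sample_rate=16000)
```

Applied to a constant input, the output swings between 1 - depth and 1 + depth, which makes the modulation envelope easy to inspect before using real speech samples.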
-
Patent number: 8898057
Abstract: Disclosed is an encoding apparatus that can efficiently encode a signal that is a broad or extra-broad band signal or the like, thereby improving the quality of a decoded signal. This encoding apparatus includes a band establishing unit (301) that generates, based on the characteristic of the input signal, band establishment information to be used for dividing the band of the input signal to establish a first band part of lower frequency side and a second band part of higher frequency side; a lower frequency encoding unit (302) for encoding, based on the band establishment information, the input signal of the first band part to generate encoded lower frequency part information; and a higher frequency encoding unit (303) for encoding, based on the band establishment information, the input signal of the second band part to generate encoded higher frequency part information.
Type: Grant
Filed: October 22, 2010
Date of Patent: November 25, 2014
Assignee: Panasonic Intellectual Property Corporation of America
Inventor: Tomofumi Yamanashi
-
Patent number: 8886539
Abstract: The present invention discloses a parametrical representation of prosody based on polynomial expansion coefficients of the pitch contour near the center of each syllable. The said syllable pitch expansion coefficients are generated from a recorded speech database, read from a number of sentences by a reference speaker. By correlating the stress level and context information of each syllable in the text with the polynomial expansion coefficients of the corresponding spoken syllable, a correlation database is formed. To generate prosody for an input text, stress level and context information of each syllable in the text is identified. The prosody is generated by using the said correlation database to find the best set of pitch parameters for each syllable. By adding to global pitch contours and using interpolation formulas, complete pitch contour for the input text is generated. Duration and intensity profile are generated using a similar procedure.
Type: Grant
Filed: March 17, 2014
Date of Patent: November 11, 2014
Inventor: Chengjun Julian Chen
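To make the idea of per-syllable polynomial pitch coefficients concrete, here is a small Python sketch that evaluates a local pitch contour around a syllable center. It assumes a plain power-series basis in the time offset from the center; the abstract does not specify the basis, and the function name, coefficient values, and units (Hz, seconds) are all hypothetical.

```python
def syllable_pitch(coeffs, center_s, t_s):
    """Evaluate f0(t) = sum(coeffs[k] * (t - center)**k) with Horner's rule.

    coeffs[0] is the pitch at the syllable center, coeffs[1] its slope,
    coeffs[2] its curvature term, and so on (assumed power-series basis).
    """
    offset = t_s - center_s
    value = 0.0
    for coefficient in reversed(coeffs):
        value = value * offset + coefficient
    return value

# Usage: 120 Hz at the center, rising at 50 Hz/s, with downward curvature.
pitch_at_center = syllable_pitch([120.0, 50.0, -200.0], center_s=0.5, t_s=0.5)
pitch_later = syllable_pitch([120.0, 50.0, -200.0], center_s=0.5, t_s=0.6)
```

A compact coefficient set like this is what would be stored per syllable and looked up by stress level and context; the global contour and the interpolation between syllables mentioned in the abstract are outside this sketch.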
-
Patent number: 8886538
Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
Type: Grant
Filed: September 26, 2003
Date of Patent: November 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
-
Patent number: 8868432
Abstract: A method for decoding an audio signal having a bandwidth that extends beyond a bandwidth of a CELP excitation signal in an audio decoder including a CELP-based decoder element. The method includes obtaining a second excitation signal having an audio bandwidth extending beyond the audio bandwidth of the CELP excitation signal, obtaining a set of signals by filtering the second excitation signal with a set of bandpass filters, scaling the set of signals using a set of energy-based parameters, and obtaining a composite output signal by combining the scaled set of signals with a signal based on the audio signal decoded by the CELP-based decoder element.
Type: Grant
Filed: September 28, 2011
Date of Patent: October 21, 2014
Assignee: Motorola Mobility LLC
Inventors: Jonathan A. Gibbs, James P. Ashley, Udar Mittal
-
Patent number: 8868431
Abstract: A recognition dictionary creation device identifies the language of a reading of an inputted text which is a target to be registered and adds a reading with phonemes in the language identified thereby to the target text to be registered, and also converts the reading of the target text to be registered from the phonemes in the language identified thereby to phonemes in a language to be recognized which is handled in voice recognition to create a recognition dictionary in which the converted reading of the target text to be registered is registered.
Type: Grant
Filed: February 5, 2010
Date of Patent: October 21, 2014
Assignee: Mitsubishi Electric Corporation
Inventors: Michihiro Yamazaki, Jun Ishii, Yasushi Ishikawa
-
Patent number: 8868422
Abstract: According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units, based on at least one of the phonologic information and the prosody information. The method can search at least two speech units from the plurality of speech units. At least one of the phonologic information and the prosody information in the at least two speech units are identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units as a representative speech unit into a memory.
Type: Grant
Filed: September 13, 2010
Date of Patent: October 21, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventors: Gou Hirabayashi, Takehiko Kagoshima
-
Patent number: 8856012
Abstract: A method of encoding an audio signal, where signals including two or more channel signals are downmixed to a mono signal, the mono signal is divided into a low-frequency signal and a high-frequency signal, the low-frequency signal is encoded through algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX), and the high-frequency signal is encoded using the low-frequency signal. In a method of decoding an audio signal, a low-frequency signal encoded through ACELP or TCX is decoded, a high-frequency signal is decoded using the low-frequency signal, the low-frequency signal and the high-frequency signal are combined to generate a mono signal, and the mono signal is upmixed by decoding spatial parameters regarding signals including two or more channel signals.
Type: Grant
Filed: February 3, 2014
Date of Patent: October 7, 2014
Assignee: SAMSUNG Electronics Co., Ltd.
Inventors: Ho-sang Sung, Eun-mi Oh, Jung-hoe Kim, Ki-hyun Choo, Mi-young Kim
-
Patent number: 8856008
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: September 18, 2013
Date of Patent: October 7, 2014
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Publication number: 20140288938
Abstract: To improve the intelligibility of speech for users with high-frequency hearing loss, the present systems and methods provide an improved frequency lowering system with enhancement of spectral features responsive to place-of-articulation of the input speech. High frequency components of speech, such as fricatives, may be classified based on one or more features that distinguish place of articulation, including spectral slope, peak location, relative amplitudes in various frequency bands, or a combination of these or other such features. Responsive to the classification of the input speech, a signal or signals may be added to the input speech in a frequency band audible to the hearing-impaired listener, said signal or signals having predetermined distinct spectral features corresponding to the classification, and allowing a listener to easily distinguish various consonants in the input.
Type: Application
Filed: November 1, 2012
Publication date: September 25, 2014
Inventor: Ying-Yee Kong
-
Patent number: 8812324
Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising a phase for the location of analysis windows by means of an iterative process for the determination of the phase of the first sinusoidal component and comparison between the phase value of said component and a predetermined value, a phase for the selection of analysis frames corresponding to an allophone and readjustment of the duration and the fundamental frequency according to certain thresholds and a phase for the generation of synthetic speech from synthesis frames taking the information of the closest analysis frame as spectral information of the synthesis frame and taking as many synthesis frames as periods that the synthetic signal has. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants in a manner synchronous with the fundamental period.
Type: Grant
Filed: December 21, 2010
Date of Patent: August 19, 2014
Assignee: Telefonica, S.A.
Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez de Vicuna
-
Publication number: 20140222434Abstract: An audio signal synthesizer generates a synthesis audio signal having a first frequency band and a second synthesized frequency band derived from the first frequency band and comprises a patch generator, a spectral converter, a raw signal processor and a combiner. The patch generator performs at least two different patching algorithms, each patching algorithm generating a raw signal. The patch generator is adapted to select one of the at least two different patching algorithms in response to a control information. The spectral converter converts the raw signal into a raw signal spectral representation. The raw signal processor processes the raw signal spectral representation in response to spectral domain spectral band replication parameters to obtain an adjusted raw signal spectral representation.Type: ApplicationFiled: April 10, 2014Publication date: August 7, 2014Applicant: Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.Inventors: Frederik NAGEL, Sascha DISCH, Nikolaus RETTELBACH, Max NEUENDORF, Bernhard GRILL, Ulrich KRAEMER, Stefan WABNIK
-
Patent number: 8744851Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.Type: GrantFiled: August 13, 2013Date of Patent: June 3, 2014Assignee: AT&T Intellectual Property II, L.P.Inventors: Alistair Conkie, Ann K Syrdal
-
Patent number: 8738381Abstract: A prosody generation apparatus capable of suppressing distortion that occurs when generating prosodic patterns and therefore generating a natural prosody is provided. A prosody changing point extraction unit in this apparatus extracts a prosody changing point located at the beginning and the ending of a sentence, the beginning and the ending of a breath group, an accent position and the like. A selection rule and a transformation rule of a prosodic pattern including the prosody changing point are generated by means of a statistical or learning technique and the rules thus generated are stored in a representative prosodic pattern selection rule table and a transformation rule table beforehand. A pattern selection unit selects a representative prosodic pattern from the representative prosodic pattern selection rule table according to the selection rule.Type: GrantFiled: January 17, 2007Date of Patent: May 27, 2014Assignee: Panasonic CorporationInventors: Yumiko Kato, Takahiro Kamai
-
Patent number: 8706493Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce synthesized speech.Type: GrantFiled: July 11, 2011Date of Patent: April 22, 2014Assignee: Industrial Technology Research InstituteInventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
-
Patent number: 8706497Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.Type: GrantFiled: October 22, 2010Date of Patent: April 22, 2014Assignee: Mitsubishi Electric CorporationInventors: Satoru Furuta, Hirohisa Tasaki
-
Patent number: 8706496Abstract: A sequence is received of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument). The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information. The sound transformation utilizes overlapping windows, and a computational cost function is determined that depends on a product of the number of pitch periods and the inverse of the minimum fundamental frequency within the window.Type: GrantFiled: September 13, 2007Date of Patent: April 22, 2014Assignee: Universitat Pompeu FabraInventor: Jordi Bonada Sanjaume
-
Patent number: 8682654Abstract: Disclosed are systems, methods, and computer readable media having programs for classifying sports video. In one embodiment, a method includes: extracting, from an audio stream of a video clip, a plurality of key audio components contained therein; and classifying, using at least one of the plurality of key audio components, a sport type contained in the video clip. In one embodiment, a computer readable medium having a computer program for classifying sports video includes: logic configured to extract a plurality of key audio components from a video clip; and logic configured to classify a sport type corresponding to the video clip.Type: GrantFiled: April 25, 2006Date of Patent: March 25, 2014Assignee: Cyberlink Corp.Inventors: Ming-Jun Chen, Jiun-Fu Chen, Shih-Min Tang, Ho-Chao Huang
-
Patent number: 8655650Abstract: A method is provided for decoding data streams in a voice communication system. The method includes: receiving two or more data streams having voice data encoded therein; decoding each data stream into a set of speech coding parameters; forming a set of combined speech coding parameters by combining the sets of decoded speech coding parameters, where speech coding parameters of a given type are combined with speech coding parameters of the same type; and inputting the set of combined speech coding parameters into a speech synthesizer.Type: GrantFiled: March 28, 2007Date of Patent: February 18, 2014Assignee: Harris CorporationInventor: Mark W. Chamberlain
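As a rough illustration of the type-by-type combination described in this abstract, the following minimal sketch merges decoded parameter sets from several streams. The parameter names (`pitch_hz`, `gain`) and the use of a plain average as the combining rule are assumptions made for illustration only; the abstract does not state how parameters of a given type are combined.

```python
from statistics import mean


def combine_parameters(streams):
    """Combine decoded parameter sets so that like is merged with like.

    `streams` is a list of dicts mapping a parameter type to its value.
    Averaging is a hypothetical combining rule, not the patented one.
    """
    combined = {}
    for name in streams[0]:
        # Only values of the same parameter type are combined together.
        combined[name] = mean(s[name] for s in streams)
    return combined


# Two hypothetical decoded streams with the same parameter types.
stream_a = {"pitch_hz": 100.0, "gain": 0.5}
stream_b = {"pitch_hz": 140.0, "gain": 0.25}

# The combined set would then be passed to a single speech synthesizer.
combined = combine_parameters([stream_a, stream_b])
```

The point of the sketch is only the data flow: each stream is decoded separately, and a single combined parameter set is what reaches the synthesizer.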
-
Patent number: 8655659Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.Type: GrantFiled: August 12, 2010Date of Patent: February 18, 2014Assignees: Sony Corporation, Sony Mobile Communications ABInventors: Qingfang Wang, Shouchun He
-
Patent number: 8645126Abstract: In a method of encoding an audio signal, signals including two or more channel signals are downmixed to a mono signal, the mono signal is divided into a low-frequency signal and a high-frequency signal, the low-frequency signal is encoded through algebraic code excited linear prediction (ACELP) or transform coded excitation (TCX), and the high-frequency signal is encoded using the low-frequency signal. In a method of decoding an audio signal, a low-frequency signal encoded through ACELP or TCX is decoded, a high-frequency signal is decoded using the low-frequency signal, the low-frequency signal and the high-frequency signal are combined to generate a mono signal, and the mono signal is upmixed by decoding spatial parameters regarding signals including two or more channel signals.Type: GrantFiled: March 26, 2013Date of Patent: February 4, 2014Assignee: Samsung Electronics Co., LtdInventors: Ho-sang Sung, Eun-mi Oh, Jung-hoe Kim, Ki-hyun Choo, Mi-young Kim
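The first two encoder steps in this abstract (downmixing to mono, then dividing into low- and high-frequency signals) can be sketched as below. This is a simplified illustration under stated assumptions: the downmix is a plain channel average, and the band split uses a short moving average as a stand-in low-pass filter. Neither choice comes from the patent, and the ACELP/TCX coding stages are not shown.

```python
def downmix_to_mono(channels):
    """Average the channel signals sample by sample (assumed downmix rule)."""
    return [sum(samples) / len(channels) for samples in zip(*channels)]


def split_bands(signal, width=3):
    """Crude band split: moving average = low band, residual = high band.

    A real codec would use a proper filter bank; this is a hypothetical
    stand-in chosen so that low + high reconstructs the input.
    """
    half = width // 2
    low = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        low.append(sum(window) / len(window))
    high = [x - l for x, l in zip(signal, low)]
    return low, high


left = [1.0, 0.0, 1.0, 0.0]
right = [0.0, 0.0, 1.0, 2.0]

mono = downmix_to_mono([left, right])
low, high = split_bands(mono)
```

Because the high band is defined as the residual, adding the two bands back together recovers the mono signal, which mirrors the decoder-side step of combining the low- and high-frequency signals.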
-
Patent number: 8645140Abstract: A method of associating a voice font with a contact for text-to-speech conversion at an electronic device includes obtaining, at the electronic device, the voice font for the contact, and storing the voice font in association with a contact data record stored in a contacts database at the electronic device. The contact data record includes contact data for the contact.Type: GrantFiled: February 25, 2009Date of Patent: February 4, 2014Assignee: BlackBerry LimitedInventor: Yuriy Lobzakov
-
Patent number: 8635065Abstract: The present invention discloses an apparatus for automatic extraction of important events in audio signals comprising: signal input means for supplying audio signals; audio signal fragmenting means for partitioning audio signals supplied by the signal input means into audio fragments of a predetermined length and for allocating a sequence of one or more audio fragments to a respective audio window; feature extracting means for analyzing acoustic characteristics of the audio signals comprised in the audio fragments and for analyzing acoustic characteristics of the audio signals comprised in the audio windows; and important event extraction means for extracting important events in audio signals supplied by the audio signal fragmenting means based on predetermined important event classifying rules depending on acoustic characteristics of the audio signals comprised in the audio fragments and on acoustic characteristics of the audio signals comprised in the audio windows, wherein each important event extractedType: GrantFiled: November 10, 2004Date of Patent: January 21, 2014Assignee: Sony Deutschland GmbHInventors: Silke Goronzy-Thomae, Thomas Kemp, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Raquel Tato
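The fragmenting step in this abstract (cutting the signal into fragments of a predetermined length and allocating runs of fragments to windows) can be sketched as follows. The function names, the fragment and window sizes, and the use of mean energy as the acoustic characteristic are all illustrative assumptions; the abstract does not specify which features or classifying rules are used.

```python
def fragment(samples, fragment_len):
    """Partition samples into non-overlapping fragments of fixed length.

    A trailing partial fragment is dropped (an assumption of this sketch).
    """
    last_start = len(samples) - fragment_len
    return [samples[i:i + fragment_len]
            for i in range(0, last_start + 1, fragment_len)]


def allocate_windows(fragments, per_window):
    """Allocate consecutive fragments to windows of up to `per_window`."""
    return [fragments[i:i + per_window]
            for i in range(0, len(fragments), per_window)]


def mean_energy(frag):
    """Hypothetical per-fragment feature: mean squared amplitude."""
    return sum(x * x for x in frag) / len(frag)


samples = list(range(10))
fragments = fragment(samples, fragment_len=2)
windows = allocate_windows(fragments, per_window=2)

# Features at both levels: per fragment and per window.
fragment_energy = [mean_energy(f) for f in fragments]
window_energy = [mean_energy([x for f in w for x in f]) for w in windows]
```

Computing the feature at both the fragment level and the window level reflects the abstract's distinction between characteristics of the audio fragments and characteristics of the audio windows.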
-
Patent number: 8634783Abstract: A communication device includes memory, an input interface, a processing module, and a transmitter. The processing module receives a digital signal from the input interface, wherein the digital signal includes a desired digital signal component and an undesired digital signal component. The processing module identifies one of a plurality of codebooks based on the undesired digital signal component. The processing module then identifies a codebook entry from the one of the plurality of codebooks based on the desired digital signal component to produce a selected codebook entry. The processing module then generates a coded signal based on the selected codebook entry, wherein the coded signal includes a substantially unattenuated representation of the desired digital signal component and an attenuated representation of the undesired digital signal component. The transmitter converts the coded signal into an outbound signal in accordance with a signaling protocol and transmits it.Type: GrantFiled: January 31, 2013Date of Patent: January 21, 2014Assignee: Broadcom CorporationInventor: Nambirajan Seshadri