Correlation Patents (Class 704/263)
  • Patent number: 11935515
    Abstract: A method of generating a synthetic voice by capturing audio data, cutting it into discrete phoneme and pitch segments, forming superior phoneme and pitch segments by averaging segments having similar phoneme, pitch, and other sound qualities, and training neural networks to correctly concatenate the segments.
    Type: Grant
    Filed: December 27, 2021
    Date of Patent: March 19, 2024
    Inventor: Claude Polonov
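The averaging step in this abstract is easy to illustrate. Below is a minimal sketch, not the patented method itself: it assumes segments arrive as equal-length sample arrays labeled with a phoneme and a quantized pitch bin (all names hypothetical), and it simply averages each group; a real system would align and normalize segments first, and the neural concatenation stage is omitted.

```python
# Hypothetical sketch of the "superior segment" averaging step: group
# equal-length segments by (phoneme, pitch-bin) and average element-wise.
from collections import defaultdict
import numpy as np

def build_superior_segments(segments):
    """segments: iterable of (phoneme, pitch_bin, samples) triples,
    where samples is a fixed-length 1-D array of audio samples."""
    groups = defaultdict(list)
    for phoneme, pitch_bin, samples in segments:
        groups[(phoneme, pitch_bin)].append(samples)
    return {key: np.mean(np.stack(arrs), axis=0)  # averaged segment per group
            for key, arrs in groups.items()}

segs = [("a", 3, np.random.randn(160)) for _ in range(5)]
print(build_superior_segments(segs)[("a", 3)].shape)  # (160,)
```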
  • Patent number: 11636845
    Abstract: A method includes generating first synthesized speech by using text and a first emotion information vector configured for the text; extracting a second emotion information vector from the first synthesized speech; determining whether correction of the second emotion information vector is needed by comparing a loss value, calculated from the first and second emotion information vectors, with a preconfigured threshold; re-performing speech synthesis using a third emotion information vector generated by correcting the second emotion information vector; and outputting the resulting synthesized speech, thereby configuring the emotion information of speech more effectively. The speech synthesis apparatus may be associated with an artificial intelligence module, drones (unmanned aerial vehicles, UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
    Type: Grant
    Filed: July 14, 2020
    Date of Patent: April 25, 2023
    Assignee: LG Electronics Inc.
    Inventors: Siyoung Yang, Yongchul Park, Sungmin Han, Sangki Kim, Juyeong Jang, Minook Kim
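The compare-correct-resynthesize loop described above can be sketched in a few lines. This is only an illustration of the control flow under stated assumptions: `synthesize` and `extract_emotion` are stand-in stubs (the patent assumes real TTS and emotion-recognition models), and the correction rule is a simple illustrative nudge, not LG's formula.

```python
# Sketch of the emotion-correction loop; models are stubbed out.
import numpy as np

def synthesize(text: str, emotion: np.ndarray) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)   # stub TTS
    return rng.standard_normal(16000) * (1.0 + float(emotion.sum()))

def extract_emotion(speech: np.ndarray, dim: int) -> np.ndarray:
    return np.full(dim, speech.std() * 0.01)               # stub recognizer

def emotional_tts(text, first_vec, threshold=0.1, max_iters=3):
    speech = synthesize(text, first_vec)                   # first synthesis
    for _ in range(max_iters):
        second_vec = extract_emotion(speech, first_vec.size)
        loss = float(np.mean((first_vec - second_vec) ** 2))
        if loss <= threshold:                              # close enough
            break
        third_vec = first_vec + 0.5 * (first_vec - second_vec)  # correction
        speech = synthesize(text, third_vec)               # re-synthesis
    return speech

out = emotional_tts("hello", np.array([0.2, 0.0, 0.8]))
print(out.shape)
```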
  • Patent number: 9875301
    Abstract: Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.
    Type: Grant
    Filed: April 30, 2014
    Date of Patent: January 23, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xian-Sheng Hua, Jin Li, Yoshitaka Ushiku
  • Patent number: 8972258
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: March 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
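The core trick here (shared with the earlier-filed patent 8738376 later in this list) is storing only a sparse set of parameter changes. A minimal sketch under stated assumptions: the standard MAP update for Gaussian means with relevance factor tau, followed by a simple magnitude-threshold sparsification; the actual patented sparseness constraint may differ.

```python
# Sketch: MAP-adapt Gaussian means toward user statistics, then keep only
# the largest changes as a compact per-user delta.
import numpy as np

def sparse_map_deltas(baseline, user_sum, user_count, tau=16.0, keep_frac=0.01):
    adapted = (user_sum + tau * baseline) / (user_count + tau)  # MAP means
    delta = (adapted - baseline).ravel()
    k = max(1, int(keep_frac * delta.size))       # sparseness constraint
    idx = np.argsort(np.abs(delta))[-k:]          # keep the largest changes
    return idx, delta[idx]

def apply_deltas(baseline, idx, values):
    adapted = baseline.ravel().copy()
    adapted[idx] += values
    return adapted.reshape(baseline.shape)

baseline = np.random.randn(512, 39)                        # Gaussian means
counts = np.random.randint(0, 20, size=(512, 1)).astype(float)
sums = baseline * counts + np.random.randn(512, 39)        # fake user stats
idx, vals = sparse_map_deltas(baseline, sums, counts)
print(f"{idx.size} of {baseline.size} parameters stored")  # ~1%
```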
  • Patent number: 8930200
    Abstract: A vector joint encoding/decoding method and a vector joint encoder/decoder are provided. More than two vectors are jointly encoded, and the encoding index of at least one vector is split and then combined across different vectors, so that the idle encoding spaces of the different vectors can be recombined. This saves encoding bits, and because an encoding index is split into shorter indexes before recombination, it also reduces the bit-width requirements of the operating parts in encoding/decoding calculation.
    Type: Grant
    Filed: July 24, 2013
    Date of Patent: January 6, 2015
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Fuwei Ma, Dejun Zhang, Lei Miao, Fengyan Qi
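Why combining indexes saves bits is a counting argument: coding two indexes with ranges N1 and N2 separately costs ceil(log2 N1) + ceil(log2 N2) bits, while the combined index i1*N2 + i2 costs only ceil(log2(N1*N2)), reclaiming the idle space of non-power-of-two ranges. A toy demonstration, not Huawei's actual codec layout:

```python
# Joint vs. separate index coding for two non-power-of-two ranges.
import math

def separate_bits(n1, n2):
    return math.ceil(math.log2(n1)) + math.ceil(math.log2(n2))

def joint_bits(n1, n2):
    return math.ceil(math.log2(n1 * n2))

def joint_encode(i1, i2, n2):
    return i1 * n2 + i2              # combined index

def joint_decode(code, n2):
    return divmod(code, n2)          # recover (i1, i2)

n1, n2 = 5, 6                        # neither range is a power of two
print(separate_bits(n1, n2), joint_bits(n1, n2))          # 6 vs 5 bits
assert joint_decode(joint_encode(3, 4, n2), n2) == (3, 4)
```

The splitting described in the abstract goes one step further: the combined index is carved into shorter fields so the encoder/decoder never needs arithmetic wider than its word size.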
  • Publication number: 20140358547
    Abstract: Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium.
    Type: Application
    Filed: September 17, 2013
    Publication date: December 4, 2014
    Applicant: International Business Machines Corporation
    Inventors: Raul Fernandez, Asaf Rendel
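The prediction step shared by this application and its twin 20140358546 below is essentially a nearest-exemplar lookup. A minimal sketch, with feature extraction stubbed out (the abstract's parametric model is assumed to have produced the feature vectors already):

```python
# Exemplar-based prosody prediction: return the prosody contour paired with
# the training item whose features are nearest the runtime features.
import numpy as np

class ExemplarProsody:
    def __init__(self, train_features, train_exemplars):
        self.features = np.asarray(train_features)   # (N, D) feature vectors
        self.exemplars = train_exemplars             # N paired contours

    def predict(self, runtime_feature):
        d = np.linalg.norm(self.features - runtime_feature, axis=1)
        return self.exemplars[int(np.argmin(d))]     # nearest exemplar

model = ExemplarProsody(np.random.randn(100, 8),
                        [np.random.randn(20) for _ in range(100)])
print(model.predict(np.zeros(8)).shape)              # (20,) pitch contour
```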
  • Publication number: 20140358546
    Abstract: Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium.
    Type: Application
    Filed: August 28, 2013
    Publication date: December 4, 2014
    Applicant: International Business Machines Corporation
    Inventors: Raul Fernandez, Asaf Rendel
  • Patent number: 8788268
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. When a pair of acoustic units in the database does not have an associated concatenation cost, the system assigns a default concatenation cost. The system then synthesizes speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: July 22, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
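The caching strategy above (elaborated in the related patent 8315872 further down this list) reduces to: look up a pair's cached cost, fall back to a default when it is missing, and record the pairs that actually occur so their true costs can be stored afterwards. A schematic sketch with hypothetical names:

```python
# Default-cost lookup with observation recording for acoustic-unit pairs.
class ConcatCostCache:
    def __init__(self, default_cost=1.0):
        self.cache = {}              # (unit_a, unit_b) -> precomputed cost
        self.observed = set()        # pairs seen during synthesis
        self.default = default_cost

    def cost(self, a, b):
        self.observed.add((a, b))    # remember pairs that occur in practice
        return self.cache.get((a, b), self.default)

    def persist_observed(self, compute_cost):
        # After synthesis, compute and store true costs for observed pairs.
        for a, b in self.observed:
            self.cache[(a, b)] = compute_cost(a, b)

cc = ConcatCostCache()
print(cc.cost("unit_17", "unit_42"))                   # 1.0 (default)
cc.persist_observed(lambda a, b: 0.3)                  # stub cost function
print(cc.cost("unit_17", "unit_42"))                   # 0.3 (now cached)
```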
  • Patent number: 8768704
    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. Based on an acoustic feature comparison between a plurality of first-language speech sounds and a plurality of second-language speech sounds, the computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
    Type: Grant
    Filed: October 14, 2013
    Date of Patent: July 1, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
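The association step can be pictured as a nearest-neighbor search in acoustic feature space. A toy sketch with random placeholder features; a real system would use measured acoustic features (e.g., MFCCs) for each language's phonemes:

```python
# Map each first-language phoneme to its acoustically nearest
# second-language phoneme.
import numpy as np

def map_phonemes(l1_feats: dict, l2_feats: dict) -> dict:
    l2_names = list(l2_feats)
    l2_matrix = np.stack([l2_feats[p] for p in l2_names])
    mapping = {}
    for p1, f1 in l1_feats.items():
        d = np.linalg.norm(l2_matrix - f1, axis=1)   # acoustic distance
        mapping[p1] = l2_names[int(np.argmin(d))]
    return mapping

l1 = {"θ": np.random.randn(13), "ð": np.random.randn(13)}     # language 1
l2 = {"t": np.random.randn(13), "d": np.random.randn(13),
      "s": np.random.randn(13)}                               # language 2
print(map_phonemes(l1, l2))   # e.g. {'θ': 's', 'ð': 'd'}
```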
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
  • Patent number: 8731938
    Abstract: A computer-implemented system and method for identifying and masking special information within recorded speech is provided. A field for entry of special information is identified. Movement of a pointer device along a trajectory towards the field is also identified. A correlation of the pointer device movement and entry of the special information is determined based on a location of the trajectory in relation to the field. A threshold is applied to the correlation. The special information is received as verbal speech. A recording of the special information is rendered unintelligible when the threshold is satisfied.
    Type: Grant
    Filed: April 26, 2013
    Date of Patent: May 20, 2014
    Assignee: Intellisist, Inc.
    Inventor: G. Kevin Doren
  • Patent number: 8706493
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module, and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates that prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: April 22, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
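A concrete way to picture the re-estimation module: predicted F0 and durations pass through a remapping governed by user-controllable parameters before synthesis. The linear remapping below is an illustrative choice, not the formula from the patent:

```python
# Controllable prosody re-estimation: remap predicted F0 and durations.
import numpy as np

def reestimate_prosody(f0, durations, pitch_shift=1.0, pitch_range=1.0, rate=1.0):
    mean_f0 = np.mean(f0)
    new_f0 = mean_f0 * pitch_shift + (f0 - mean_f0) * pitch_range
    new_durations = durations / rate          # rate > 1 speaks faster
    return new_f0, new_durations

f0 = np.array([110.0, 120.0, 130.0, 125.0])   # Hz, per phone
dur = np.array([0.08, 0.12, 0.10, 0.14])      # seconds, per phone
print(reestimate_prosody(f0, dur, pitch_shift=1.1, pitch_range=0.8, rate=1.2))
```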
  • Patent number: 8576961
    Abstract: A method for determining an overlap and add length estimate comprises determining a plurality of correlation values of a plurality of ordered frequency domain samples obtained from a data frame; comparing the correlation values of a first subset of the samples to a first predetermined threshold to determine a first edge sample; comparing the correlation values of a second subset of the samples to a second predetermined threshold to determine a second edge sample; using the first and second edge samples to determine an overlap and add length estimate; and providing the overlap and add length estimate to an overlap and add circuit.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: November 5, 2013
    Assignee: Olympus Corporation
    Inventors: Haidong Zhu, Dumitru Mihai Ionescu, Abu Amanullah
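The edge-finding logic reduces to scanning a sequence of correlation values from each end against two thresholds. A schematic sketch that takes the correlation values as given (the patent computes them from ordered frequency-domain samples of the frame):

```python
# Estimate an overlap-and-add length from threshold-crossing edge samples.
import numpy as np

def ola_length_estimate(corr, thr_first=0.3, thr_second=0.3):
    n = len(corr)
    first = next((i for i in range(n) if corr[i] > thr_first), 0)
    second = next((i for i in range(n - 1, -1, -1) if corr[i] > thr_second),
                  n - 1)
    return max(0, second - first + 1)   # length handed to the OLA circuit

corr = np.array([0.05, 0.1, 0.6, 0.9, 0.8, 0.5, 0.2, 0.05])
print(ola_length_estimate(corr))        # 4 (edges at indices 2 and 5)
```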
  • Patent number: 8566106
    Abstract: A method and device for searching an algebraic codebook during encoding of a sound signal, wherein the algebraic codebook comprises a set of codevectors formed of a number of pulse positions and a number of pulses distributed over the pulse positions. In the algebraic codebook searching method and device, a reference signal for use in searching the algebraic codebook is calculated. In a first stage, a position of a first pulse is determined in relation with the reference signal and among the number of pulse positions. In each of a number of stages subsequent to the first stage, (a) an algebraic codebook gain is recomputed, (b) the reference signal is updated using the recomputed algebraic codebook gain and (c) a position of another pulse is determined in relation with the updated reference signal and among the number of pulse positions.
    Type: Grant
    Filed: September 11, 2008
    Date of Patent: October 22, 2013
    Assignee: Voiceage Corporation
    Inventors: Redwan Salami, Vaclav Eksler, Milan Jelinek
  • Patent number: 8494845
    Abstract: Provided is a signal distortion elimination apparatus comprising: an inverse filter application means that outputs the signal obtained by applying an inverse filter to an observed signal as a restored signal when a predetermined iteration termination condition is met, and outputs the signal obtained by applying the inverse filter to the observed signal as an ad-hoc signal when that condition is not met; a prediction error filter calculation means that segments the ad-hoc signal into frames and outputs a prediction error filter of each frame obtained by linear prediction analysis of the ad-hoc signal of that frame; and an inverse filter calculation means that calculates an inverse filter such that a concatenation of the innovation estimates of the respective frames becomes mutually independent among its samples, where the innovation estimate of a single frame is the signal obtained by applying the prediction error filter of the corresponding frame …
    Type: Grant
    Filed: February 16, 2007
    Date of Patent: July 23, 2013
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Takuya Yoshioka, Takafumi Hikichi, Masato Miyoshi
  • Patent number: 8370153
    Abstract: A speech analyzer includes a vocal tract and sound source separating unit which separates a vocal tract feature and a sound source feature from an input speech, based on a speech generation model, a fundamental frequency stability calculating unit which calculates a temporal stability of a fundamental frequency of the input speech in the sound source feature, from the separated sound source feature, a stable analyzed period extracting unit which extracts time information of a stable period, based on the temporal stability, and a vocal tract feature interpolation unit which interpolates a vocal tract feature which is not included in the stable period, using a vocal tract feature included in the extracted stable period.
    Type: Grant
    Filed: May 3, 2010
    Date of Patent: February 5, 2013
    Assignee: Panasonic Corporation
    Inventors: Yoshifumi Hirose, Takahiro Kamai
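The stability test and the interpolation step can be sketched directly. Assumptions here are illustrative: stability is measured as a small local standard deviation of F0, and the vocal-tract feature (one scalar per frame for simplicity) is linearly interpolated across unstable frames:

```python
# Mark frames with temporally stable F0, then interpolate a vocal-tract
# feature across the unstable frames.
import numpy as np

def stable_mask(f0, win=5, max_std=2.0):
    pad = win // 2
    padded = np.pad(f0, pad, mode="edge")
    std = np.array([padded[i:i + win].std() for i in range(len(f0))])
    return std < max_std                       # per-frame temporal stability

def interpolate_unstable(feature, mask):
    frames = np.arange(len(feature))
    return np.interp(frames, frames[mask], feature[mask])

f0 = np.array([100, 101, 100, 140, 80, 102, 101, 100], dtype=float)
vt = np.array([1.0, 1.1, 1.0, 3.0, -2.0, 1.2, 1.1, 1.0])
print(interpolate_unstable(vt, stable_mask(f0)))
```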
  • Patent number: 8321225
    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method including receiving text to be synthesized as a spoken utterance. The method includes analyzing the received text to determine attributes of the received text and selecting one or more utterances from a database based on a comparison between the attributes of the received text and attributes of text representing the stored utterances. The method includes determining, for each utterance, a distance between a contour of the utterance and a hypothetical contour of the spoken utterance, the determination based on a model that relates distances between pairs of contours of the utterances to relationships between attributes of text for the pairs. The method includes selecting a final utterance having a contour with a closest distance to the hypothetical contour and generating a contour for the received text based on the contour of the final utterance.
    Type: Grant
    Filed: November 14, 2008
    Date of Patent: November 27, 2012
    Assignee: Google Inc.
    Inventors: Martin Jansche, Michael D. Riley, Andrew M. Rosenberg, Terry Tai
  • Patent number: 8315872
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: November 20, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
  • Patent number: 8214216
    Abstract: A simply configured speech synthesis device and the like for producing a natural synthetic speech at high speed. When data representing a message template is supplied, a voice unit editor (5) searches a voice unit database (7) for voice unit data on a voice unit whose sound matches a voice unit in the message template. Further, the voice unit editor (5) predicts the cadence of the message template and selects, one at a time, a best match of each voice unit in the message template from the voice unit data that has been retrieved, according to the cadence prediction result. For a voice unit for which no match can be selected, an acoustic processor (41) is instructed to supply waveform data representing the waveform of each unit voice. The voice unit data that is selected and the waveform data that is supplied by the acoustic processor (41) are combined to generate data representing a synthetic speech.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: July 3, 2012
    Assignee: Kabushiki Kaisha Kenwood
    Inventor: Yasushi Sato
  • Patent number: 8145477
    Abstract: Systems, methods, and apparatus described include waveform alignment operations in which a single set of evaluated cosines and sines is used to calculate cross-correlations of two periodic waveforms at two different phase shifts.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: March 27, 2012
    Inventors: Sharath Manjunath, Ananthapadmanabhan A. Kandhadai
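One reading of the trick: if two periodic waveforms are described by harmonic amplitudes and per-harmonic phase differences, their cross-correlation at time shift s is a sum of cos(dphi_k + k*w*s) terms, so cos(dphi) and sin(dphi) evaluated once can serve every shift via the angle-addition identity. A toy sketch of that reading, not the claimed hardware method:

```python
# Cross-correlation of two periodic waveforms at several shifts, reusing a
# single evaluated set of cosines/sines of the harmonic phase differences.
import numpy as np

def xcorr_at_shifts(a1, a2, dphi, w, shifts):
    c, s = np.cos(dphi), np.sin(dphi)         # evaluated once, reused below
    k = np.arange(1, len(a1) + 1)
    return [float(np.sum(a1 * a2 * (c * np.cos(k * w * t)
                                    - s * np.sin(k * w * t))))
            for t in shifts]                  # cos(dphi + k*w*t) expanded

a1 = np.array([1.0, 0.5, 0.25])               # harmonic amplitudes, wave 1
a2 = np.array([0.8, 0.6, 0.1])                # harmonic amplitudes, wave 2
dphi = np.array([0.1, -0.4, 0.9])             # per-harmonic phase difference
print(xcorr_at_shifts(a1, a2, dphi, w=2 * np.pi / 80, shifts=[0, 40]))
```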
  • Patent number: 8078466
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Grant
    Filed: November 30, 2009
    Date of Patent: December 13, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 7835909
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
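The backward cumulative histogram is simply the probability mass cumulated from the largest bin down, i.e. an estimate of P(X >= x), which is less sensitive to noise-dominated low-energy bins. A sketch of how it can drive histogram normalization; the Gaussian target is an illustrative choice (requires SciPy):

```python
# Normalize a feature via a backward cumulative distribution function.
import numpy as np
from scipy.stats import norm

def backward_cdf_normalize(x, bins=64):
    hist, edges = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    back_cdf = np.cumsum(p[::-1])[::-1]        # cumulate largest -> smallest
    idx = np.digitize(x, edges[1:-1])          # bin index of each sample
    u = np.clip(back_cdf[idx], 1e-6, 1 - 1e-6)
    return norm.ppf(1.0 - u)                   # map onto a unit Gaussian

x = np.random.gamma(2.0, size=1000)            # skewed feature values
print(round(backward_cdf_normalize(x).std(), 2))
```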
  • Patent number: 7756715
    Abstract: Apparatus, method, and medium for processing an audio signal using a correlation between bands are provided. The apparatus includes an encoding unit encoding an input audio signal and a decoding unit decoding the encoded input audio signal.
    Type: Grant
    Filed: November 17, 2005
    Date of Patent: July 13, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Junghoe Kim, Dohyung Kim, Sihwa Lee
  • Patent number: 7747441
    Abstract: High quality speech is reproduced with a small data amount in speech coding and decoding that perform compression coding and decoding of a speech signal into a digital signal. In a speech coding method based on code-excited linear prediction (CELP), the noise level of the speech in the coding period concerned is evaluated using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks are used based on the evaluation result.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: June 29, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7660714
    Abstract: A noise suppression device comprises a subband SN ratio calculation means which receives a noise likeness signal, an input signal spectrum, and a subband-based estimated noise spectrum; calculates the subband-based input signal average spectrum; calculates a subband-based mixture ratio of the subband-based estimated noise spectrum to the subband-based input signal average spectrum on the basis of the noise likeness signal; and calculates the subband-based SN ratio on the basis of the subband-based estimated noise spectrum, the subband-based input signal average spectrum, and the mixture ratio.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: February 9, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventors: Satoru Furuta, Shinya Takahashi
  • Patent number: 7630897
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Grant
    Filed: May 19, 2008
    Date of Patent: December 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Publication number: 20090234653
    Abstract: Provided is an audio decoding device that performs frame loss compensation and obtains decoded audio that sounds natural with little noise. The audio decoding device includes: a non-cyclic pulse waveform detection unit (19) for detecting a non-cyclic pulse waveform section in the n−1-th frame which is repeatedly used with a pitch cycle in the n-th frame upon compensation of loss of the n-th frame; a non-cyclic pulse waveform suppression unit (17) for suppressing a non-cyclic pulse waveform by replacing the audio source signal existing in the non-cyclic pulse waveform section of the n−1-th frame with a noise signal; and a synthesis filter (20) that uses a linear prediction coefficient decoded by an LPC decoding unit (11) to perform synthesis, using the audio source signal of the n−1-th frame from the non-cyclic pulse waveform suppression unit (17) as the drive audio source, thereby obtaining the decoded audio signal of the n-th frame.
    Type: Application
    Filed: December 26, 2006
    Publication date: September 17, 2009
    Applicant: Matsushita Electric Industrial Co., Ltd.
    Inventors: Takuya Kawashima, Hiroyuki Ehara
  • Patent number: 7257535
    Abstract: A system and method are provided for processing audio and speech signals using a pitch and voicing dependent spectral estimation algorithm (voicing algorithm) to accurately represent voiced speech, unvoiced speech, and mixed speech in the presence of background noise, and background noise with a single model. The present invention also modifies the synthesis model based on an estimate of the current input signal to improve the perceptual quality of the speech and background noise under a variety of input conditions. The present invention also improves the voicing dependent spectral estimation algorithm robustness by introducing the use of a Multi-Layer Neural Network in the estimation process. The voicing dependent spectral estimation algorithm provides an accurate and robust estimate of the voicing probability under a variety of background noise conditions. This is essential to providing high quality intelligible speech in the presence of background noise.
    Type: Grant
    Filed: October 28, 2005
    Date of Patent: August 14, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Joseph Gerard Aguilar, Juin-Hwey Chen, Wei Wang, Robert W. Zopf
  • Patent number: 7003461
    Abstract: An adaptive codebook search (ACS) algorithm is based on a set of matrix operations suitable for data processing engines supporting a single instruction multiple data (SIMD) architecture. The result is a reduction in memory access and increased parallelism to produce an overall improvement in the computational efficiency of ACS processing.
    Type: Grant
    Filed: July 9, 2002
    Date of Patent: February 21, 2006
    Assignee: Renesas Technology Corporation
    Inventor: Clifford Tavares
  • Patent number: 6996291
    Abstract: After one or both of a pair of images are obtained, an auto-correlation function for one of those images is generated to determine a smear amount and possibly a smear direction. The smear amount and direction are used to identify potential locations of a peak portion of the correlation function between the pair of images. The pair of images is then correlated only at offset positions corresponding to the one or more of the potential peak locations. In some embodiments, the pair of images is correlated according to a sparse set of image correlation function value points around the potential peak locations. In other embodiments, the pair of images is correlated at a dense set of correlation function value points around the potential peak locations. The correlation function values of these correlation function value points are then analyzed to determine the offset position of the true correlation function peak.
    Type: Grant
    Filed: August 6, 2001
    Date of Patent: February 7, 2006
    Assignee: Mitutoyo Corporation
    Inventor: Michael Nahum
  • Patent number: 6937977
    Abstract: A start of an input speech signal is detected during presentation of an output audio signal and an input start time, relative to the output audio signal, is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes a context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.
    Type: Grant
    Filed: October 5, 1999
    Date of Patent: August 30, 2005
    Assignee: fastmobile, Inc.
    Inventor: Ira A. Gerson
  • Patent number: 6865535
    Abstract: In a synchronization control apparatus, a voice-language-information generating section generates the voice language information of a word which a robot utters. A voice synthesizing section calculates phoneme information and a phoneme continuation duration according to the voice language information, and also generates synthesized-voice data according to an adjusted phoneme continuation duration. An articulation-operation generating section calculates an articulation-operation period according to the phoneme information. A voice-operation adjusting section adjusts the phoneme continuation duration and the articulation-operation period. An articulation-operation executing section operates an organ of articulation according to the adjusted articulation-operation period.
    Type: Grant
    Filed: December 27, 2000
    Date of Patent: March 8, 2005
    Assignee: Sony Corporation
    Inventors: Keiichi Yamada, Kenichiro Kobayashi, Tomoaki Nitta, Makoto Akabane, Masato Shimakawa, Nobuhide Yamazaki, Erika Kobayashi
  • Patent number: 6804649
    Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by Short Time Fourier Transform (STFT) analysis.
    Type: Grant
    Filed: June 1, 2001
    Date of Patent: October 12, 2004
    Assignee: Sony France S.A.
    Inventor: Eduardo Reck Miranda
  • Patent number: 6681202
    Abstract: The invention describes a system that generates a wide band signal (100-7000 Hz) from a telephony band (or narrow band: 300-3400 Hz) speech signal to obtain an extended band speech signal (100-3400 Hz). This technique is particularly advantageous since it increases signal naturalness and listening comfort while keeping compatibility with all current telephony systems. The described technique is inspired by Linear Predictive speech coders. The speech signal is thus split into a spectral envelope and a short-term residual signal. Both signals are extended separately and recombined to create an extended band signal.
    Type: Grant
    Filed: November 13, 2000
    Date of Patent: January 20, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Giles Miet, Andy Gerrits
  • Patent number: 6662161
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: September 7, 1999
    Date of Patent: December 9, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Publication number: 20030061051
    Abstract: A voice synthesizing system keeps both the required amount of calculation and the required file size small. The system includes a compressed pitch segment database storing compressed voice waveform segments; a pitch developing portion which, when a voice waveform segment needed for voice waveform synthesis is demanded, reads the segment out of the database and decompresses the compressed data to reproduce the original voice waveform segment; and a cache processing portion which temporarily stores voice waveform segments already used in voice waveform synthesis. When a needed voice waveform segment is demanded, the cache processing portion returns it to the demander if it is already stored; otherwise, it obtains the segment from the database via the pitch developing portion, holds the obtained segment, and returns it to the demander.
    Type: Application
    Filed: September 26, 2002
    Publication date: March 27, 2003
    Applicant: NEC Corporation
    Inventors: Reishi Kondo, Hiroaki Hattori
  • Patent number: 6513007
    Abstract: There is provided a synthesized sound generating apparatus and method which can achieve responsive and high-quality speech synthesis based on a real-time convolution operation. Coefficients are generated by using dynamic cutting to extract characteristic information from a first signal. A convolution operation is performed on a second signal using the generated coefficients to generate a synthesized signal. During the convolution operation, an interpolation process is performed on the coefficients to prevent a rapid change in the level of the generated synthesized signal upon switching of the coefficients.
    Type: Grant
    Filed: July 20, 2000
    Date of Patent: January 28, 2003
    Assignee: Yamaha Corporation
    Inventor: Akio Takahashi
  • Patent number: 6421636
    Abstract: An apparatus and method are disclosed for converting an input signal having frequency related information sustained over a first duration of time into an output signal sustained over a second duration of time at substantially the same first frequency, by adding to or subtracting from the effective wavelength of the output signal. Preferably, the signals are converted in digital form, with samples added or subtracted to frequency convert the signal.
    Type: Grant
    Filed: May 30, 2000
    Date of Patent: July 16, 2002
    Assignee: Pixel Instruments
    Inventors: J. Carl Cooper, Steve Anderson
  • Patent number: 6208958
    Abstract: A pitch determination apparatus and method using spectro-temporal autocorrelation to prevent pitch determination errors are provided.
    Type: Grant
    Filed: January 7, 1999
    Date of Patent: March 27, 2001
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yong-duk Cho, Moo-Young Kim
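The temporal half of such a scheme is the classic normalized-autocorrelation pitch search; the patented method additionally combines it with a spectral autocorrelation term to suppress halving/doubling errors, which this sketch omits:

```python
# Pitch estimation by maximizing the normalized autocorrelation over lags.
import numpy as np

def pitch_by_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    frame = frame - frame.mean()
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    ac = [np.dot(frame[:-l], frame[l:])
          / (np.linalg.norm(frame[:-l]) * np.linalg.norm(frame[l:]) + 1e-12)
          for l in lags]
    return fs / lags[int(np.argmax(ac))]        # pitch estimate in Hz

fs = 8000
t = np.arange(0, 0.04, 1 / fs)                  # one 40 ms frame
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(round(pitch_by_autocorr(frame, fs), 1))   # ~120 Hz
```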
  • Patent number: 5983181
    Abstract: The table document preparation module prepares a table document containing cells. The read-out attribute setting module, assisted by the setting display module, sets a read-out attribute specifying the way cell data supplied through the setting screen from the table document preparation module is to be read out. The voice-generating data generation module generates voice-generating data for the table document according to the way of reading out specified by the read-out attribute, and the voice synthesis module synthesizes voices according to the voice-generating data.
    Type: Grant
    Filed: December 3, 1997
    Date of Patent: November 9, 1999
    Assignee: Justsystem Corp.
    Inventor: Nobuhide Yamazaki
  • Patent number: 5850438
    Abstract: In a transmission system, a tone is transmitted by a transmitter to a receiver via a transmission channel. In the receiver, a tone detector is used to detect the presence of a signalling tone. In order to improve the reliability of the tone detector when the arrival time of the signalling tone is unknown, a number of correlators having mutually displaced measuring periods are used. More than two correlators are used in order to reduce the measuring period, which also results in improved reliability of the tone detection.
    Type: Grant
    Filed: April 16, 1996
    Date of Patent: December 15, 1998
    Assignee: U.S. Philips Corporation
    Inventors: Harm Braams, Cornelis M. Moerman
  • Patent number: 5850437
    Abstract: In a transmission system, a signalling tone is transmitted by a transmitter to a receiver via a transmission channel. In the receiver, a tone detector detects the presence of a signalling tone. In order to improve the reliability of the tone detector, a number of correlating elements which determine a correlation value between an input signal and a reference signal are used. The absolute output signals of the correlating elements are added by an adder to derive a combined correlation signal to be used for detection.
    Type: Grant
    Filed: April 16, 1996
    Date of Patent: December 15, 1998
    Assignee: U.S. Philips Corporation
    Inventors: Harm Braams, Cornelis M. Moerman
  • Patent number: 5832442
    Abstract: A method is disclosed for modifying parameters of audio signals by dividing a digital signal converted from an original analog signal into sound frames, modifying the pitch and playing rate of the digital signal within a frame, and then splicing the last modified frame with the first non-modified frame, calculating the mean absolute error to find the splicing point that produces minimal or no audible noise, so that various sections of sound signals can be spliced together to achieve pitch and playing-rate modification. An apparatus for implementing the method is also disclosed, comprising input and output amplifiers, a low pass filter at the input and a low pass filter at the output, analog-to-digital and digital-to-analog converters, and a pitch shifting processor.
    Type: Grant
    Filed: June 23, 1995
    Date of Patent: November 3, 1998
    Assignee: Electronics Research & Service Organization
    Inventors: Gang-Janp Lin, Sau-Gee Chen, Der-Chwan Wu, Yuan-An Kao, Yen-Hui Wang
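The splice-point search at the heart of this method is a sliding mean-absolute-error minimization. A minimal sketch under simplifying assumptions (mono float signals, a fixed search window):

```python
# Find the splice offset that minimizes the mean absolute error between the
# tail of the last modified frame and the head of the first unmodified one.
import numpy as np

def best_splice_offset(tail, head, search=64):
    n = min(len(head), len(tail) - search)      # overlap length compared
    errs = [np.mean(np.abs(tail[off:off + n] - head[:n]))
            for off in range(search)]
    return int(np.argmin(errs))                 # offset with minimal MAE

tail = np.sin(np.linspace(0, 20 * np.pi, 400))          # modified frame end
head = np.sin(np.linspace(0.5, 0.5 + 20 * np.pi, 400))  # next frame start
print(best_splice_offset(tail, head))           # small phase-matching offset
```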
  • Patent number: 5809456
    Abstract: The present invention relates to a method and equipment for coding and decoding a sampled speech signal. It belongs to systems used in speech processing, in particular for compression of speech information. The method is based on a time/frequency description and on a representation of the prototype as a fundamental period of a periodic waveform; moreover, the excitation of the synthesis filter is carried out through a single, phase-adapted pulse.
    Type: Grant
    Filed: June 27, 1996
    Date of Patent: September 15, 1998
    Assignee: Alcatel Italia S.p.A.
    Inventors: Silvio Cucchi, Marco Fratti
  • Patent number: 5761635
    Abstract: A synthesis filter is disclosed which models the effect of the fundamental frequency of speech for digital speech coders operating on the analysis-by-synthesis principle. High fundamental frequencies having a period shorter than the corresponding cycle length of the frame employed in the analysis-by-synthesis method are optimally encoded. The filter is constructed of a number of parallel, separately updatable synthesis-memory blocks. When analysis delays shorter than the analysis frame are used, a portion of a signal that was stored in memory several frames earlier is selected and scaled to approximate the missing portion of the analysis frame using the available portion of the analysis frame.
    Type: Grant
    Filed: April 29, 1996
    Date of Patent: June 2, 1998
    Assignee: Nokia Mobile Phones Ltd.
    Inventor: Kari Juhani Jarvinen