Correlation Patents (Class 704/263)
  • Patent number: 11935515
    Abstract: A method of generating a synthetic voice by capturing audio data, cutting it into discrete phoneme and pitch segments, forming superior phoneme and pitch segments by averaging segments having similar phoneme, pitch, and other sound qualities, and training neural networks to correctly concatenate the segments.
    Type: Grant
    Filed: December 27, 2021
    Date of Patent: March 19, 2024
    Inventor: Claude Polonov
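The averaging step in this abstract is easy to illustrate. Below is a minimal sketch, not the patented method itself: it assumes segments arrive as equal-length sample arrays labeled with a phoneme and a quantized pitch bin (all names hypothetical), and it simply averages each group; a real system would align and normalize segments first, and the neural concatenation stage is omitted.

```python
# Hypothetical sketch of the "superior segment" averaging step: group
# equal-length segments by (phoneme, pitch-bin) and average element-wise.
from collections import defaultdict
import numpy as np

def build_superior_segments(segments):
    """segments: iterable of (phoneme, pitch_bin, samples) triples,
    where samples is a fixed-length 1-D array of audio samples."""
    groups = defaultdict(list)
    for phoneme, pitch_bin, samples in segments:
        groups[(phoneme, pitch_bin)].append(samples)
    return {key: np.mean(np.stack(arrs), axis=0)  # averaged segment per group
            for key, arrs in groups.items()}

segs = [("a", 3, np.random.randn(160)) for _ in range(5)]
print(build_superior_segments(segs)[("a", 3)].shape)  # (160,)
```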
  • Patent number: 11636845
    Abstract: A method includes generating first synthesized speech by using text and a first emotion information vector configured for the text; extracting a second emotion information vector from the first synthesized speech; determining whether correction of the second emotion information vector is needed by comparing a loss value, calculated from the first and second emotion information vectors, with a preconfigured threshold; re-performing speech synthesis using a third emotion information vector generated by correcting the second emotion information vector; and outputting the resulting synthesized speech, thereby configuring the emotion information of speech more effectively. The speech synthesis apparatus may be associated with an artificial intelligence module, drones (unmanned aerial vehicles, UAVs), robots, augmented reality (AR) devices, virtual reality (VR) devices, devices related to 5G services, and the like.
    Type: Grant
    Filed: July 14, 2020
    Date of Patent: April 25, 2023
    Assignee: LG Electronics Inc.
    Inventors: Siyoung Yang, Yongchul Park, Sungmin Han, Sangki Kim, Juyeong Jang, Minook Kim
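The compare-correct-resynthesize loop described above can be sketched in a few lines. This is only an illustration of the control flow under stated assumptions: `synthesize` and `extract_emotion` are stand-in stubs (the patent assumes real TTS and emotion-recognition models), and the correction rule is a simple illustrative nudge, not LG's formula.

```python
# Sketch of the emotion-correction loop; models are stubbed out.
import numpy as np

def synthesize(text: str, emotion: np.ndarray) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % 2**32)   # stub TTS
    return rng.standard_normal(16000) * (1.0 + float(emotion.sum()))

def extract_emotion(speech: np.ndarray, dim: int) -> np.ndarray:
    return np.full(dim, speech.std() * 0.01)               # stub recognizer

def emotional_tts(text, first_vec, threshold=0.1, max_iters=3):
    speech = synthesize(text, first_vec)                   # first synthesis
    for _ in range(max_iters):
        second_vec = extract_emotion(speech, first_vec.size)
        loss = float(np.mean((first_vec - second_vec) ** 2))
        if loss <= threshold:                              # close enough
            break
        third_vec = first_vec + 0.5 * (first_vec - second_vec)  # correction
        speech = synthesize(text, third_vec)               # re-synthesis
    return speech

out = emotional_tts("hello", np.array([0.2, 0.0, 0.8]))
print(out.shape)
```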
  • Patent number: 9875301
    Abstract: Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.
    Type: Grant
    Filed: April 30, 2014
    Date of Patent: January 23, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Xian-Sheng Hua, Jin Li, Yoshitaka Ushiku
  • Patent number: 8972258
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: May 22, 2014
    Date of Patent: March 3, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
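The core trick here (shared with the earlier-filed patent 8738376 later in this list) is storing only a sparse set of parameter changes. A minimal sketch under stated assumptions: the standard MAP update for Gaussian means with relevance factor tau, followed by a simple magnitude-threshold sparsification; the actual patented sparseness constraint may differ.

```python
# Sketch: MAP-adapt Gaussian means toward user statistics, then keep only
# the largest changes as a compact per-user delta.
import numpy as np

def sparse_map_deltas(baseline, user_sum, user_count, tau=16.0, keep_frac=0.01):
    adapted = (user_sum + tau * baseline) / (user_count + tau)  # MAP means
    delta = (adapted - baseline).ravel()
    k = max(1, int(keep_frac * delta.size))       # sparseness constraint
    idx = np.argsort(np.abs(delta))[-k:]          # keep the largest changes
    return idx, delta[idx]

def apply_deltas(baseline, idx, values):
    adapted = baseline.ravel().copy()
    adapted[idx] += values
    return adapted.reshape(baseline.shape)

baseline = np.random.randn(512, 39)                        # Gaussian means
counts = np.random.randint(0, 20, size=(512, 1)).astype(float)
sums = baseline * counts + np.random.randn(512, 39)        # fake user stats
idx, vals = sparse_map_deltas(baseline, sums, counts)
print(f"{idx.size} of {baseline.size} parameters stored")  # ~1%
```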
  • Patent number: 8930200
    Abstract: A vector joint encoding/decoding method and a vector joint encoder/decoder are provided. More than two vectors are jointly encoded, and the encoding index of at least one vector is split and then combined across different vectors, so that the idle encoding spaces of the different vectors can be recombined. This saves encoding bits, and because an encoding index is split into shorter indexes before recombination, it also reduces the bit-width requirements of the operating parts in encoding/decoding calculation.
    Type: Grant
    Filed: July 24, 2013
    Date of Patent: January 6, 2015
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Fuwei Ma, Dejun Zhang, Lei Miao, Fengyan Qi
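Why combining indexes saves bits is a counting argument: coding two indexes with ranges N1 and N2 separately costs ceil(log2 N1) + ceil(log2 N2) bits, while the combined index i1*N2 + i2 costs only ceil(log2(N1*N2)), reclaiming the idle space of non-power-of-two ranges. A toy demonstration, not Huawei's actual codec layout:

```python
# Joint vs. separate index coding for two non-power-of-two ranges.
import math

def separate_bits(n1, n2):
    return math.ceil(math.log2(n1)) + math.ceil(math.log2(n2))

def joint_bits(n1, n2):
    return math.ceil(math.log2(n1 * n2))

def joint_encode(i1, i2, n2):
    return i1 * n2 + i2              # combined index

def joint_decode(code, n2):
    return divmod(code, n2)          # recover (i1, i2)

n1, n2 = 5, 6                        # neither range is a power of two
print(separate_bits(n1, n2), joint_bits(n1, n2))          # 6 vs 5 bits
assert joint_decode(joint_encode(3, 4, n2), n2) == (3, 4)
```

The splitting described in the abstract goes one step further: the combined index is carved into shorter fields so the encoder/decoder never needs arithmetic wider than its word size.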
  • Publication number: 20140358547
    Abstract: Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium.
    Type: Application
    Filed: September 17, 2013
    Publication date: December 4, 2014
    Applicant: International Business Machines Corporation
    Inventors: Raul Fernandez, Asaf Rendel
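The prediction step shared by this application and its twin 20140358546 below is essentially a nearest-exemplar lookup. A minimal sketch, with feature extraction stubbed out (the abstract's parametric model is assumed to have produced the feature vectors already):

```python
# Exemplar-based prosody prediction: return the prosody contour paired with
# the training item whose features are nearest the runtime features.
import numpy as np

class ExemplarProsody:
    def __init__(self, train_features, train_exemplars):
        self.features = np.asarray(train_features)   # (N, D) feature vectors
        self.exemplars = train_exemplars             # N paired contours

    def predict(self, runtime_feature):
        d = np.linalg.norm(self.features - runtime_feature, axis=1)
        return self.exemplars[int(np.argmin(d))]     # nearest exemplar

model = ExemplarProsody(np.random.randn(100, 8),
                        [np.random.randn(20) for _ in range(100)])
print(model.predict(np.zeros(8)).shape)              # (20,) pitch contour
```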
  • Publication number: 20140358546
    Abstract: Systems and methods for prosody prediction include extracting features from runtime data using a parametric model. The features from runtime data are compared with features from training data using an exemplar-based model to predict prosody of the runtime data. The features from the training data are paired with exemplars from the training data and stored on a computer readable storage medium.
    Type: Application
    Filed: August 28, 2013
    Publication date: December 4, 2014
    Applicant: International Business Machines Corporation
    Inventors: Raul Fernandez, Asaf Rendel
  • Patent number: 8788268
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. When a pair of acoustic units in the database does not have an associated concatenation cost, the system assigns a default concatenation cost. The system then synthesizes speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: July 22, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
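The caching strategy above (elaborated in the related patent 8315872 further down this list) reduces to: look up a pair's cached cost, fall back to a default when it is missing, and record the pairs that actually occur so their true costs can be stored afterwards. A schematic sketch with hypothetical names:

```python
# Default-cost lookup with observation recording for acoustic-unit pairs.
class ConcatCostCache:
    def __init__(self, default_cost=1.0):
        self.cache = {}              # (unit_a, unit_b) -> precomputed cost
        self.observed = set()        # pairs seen during synthesis
        self.default = default_cost

    def cost(self, a, b):
        self.observed.add((a, b))    # remember pairs that occur in practice
        return self.cache.get((a, b), self.default)

    def persist_observed(self, compute_cost):
        # After synthesis, compute and store true costs for observed pairs.
        for a, b in self.observed:
            self.cache[(a, b)] = compute_cost(a, b)

cc = ConcatCostCache()
print(cc.cost("unit_17", "unit_42"))                   # 1.0 (default)
cc.persist_observed(lambda a, b: 0.3)                  # stub cost function
print(cc.cost("unit_17", "unit_42"))                   # 0.3 (now cached)
```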
  • Patent number: 8768704
    Abstract: An input signal that includes linguistic content in a first language may be received by a computing device. The linguistic content may include text or speech. Based on an acoustic feature comparison between a plurality of first-language speech sounds and a plurality of second-language speech sounds, the computing device may associate the linguistic content in the first language with one or more phonemes from a second language. The computing device may also determine a phonemic representation of the linguistic content in the first language based on use of the one or more phonemes from the second language. The phonemic representation may be indicative of a pronunciation of the linguistic content in the first language according to speech sounds of the second language.
    Type: Grant
    Filed: October 14, 2013
    Date of Patent: July 1, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Ioannis Agiomyrgiannakis
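The association step can be pictured as a nearest-neighbor search in acoustic feature space. A toy sketch with random placeholder features; a real system would use measured acoustic features (e.g., MFCCs) for each language's phonemes:

```python
# Map each first-language phoneme to its acoustically nearest
# second-language phoneme.
import numpy as np

def map_phonemes(l1_feats: dict, l2_feats: dict) -> dict:
    l2_names = list(l2_feats)
    l2_matrix = np.stack([l2_feats[p] for p in l2_names])
    mapping = {}
    for p1, f1 in l1_feats.items():
        d = np.linalg.norm(l2_matrix - f1, axis=1)   # acoustic distance
        mapping[p1] = l2_names[int(np.argmin(d))]
    return mapping

l1 = {"θ": np.random.randn(13), "ð": np.random.randn(13)}     # language 1
l2 = {"t": np.random.randn(13), "d": np.random.randn(13),
      "s": np.random.randn(13)}                               # language 2
print(map_phonemes(l1, l2))   # e.g. {'θ': 's', 'ð': 'd'}
```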
  • Patent number: 8738376
    Abstract: Techniques disclosed herein include using a Maximum A Posteriori (MAP) adaptation process that imposes sparseness constraints to generate acoustic parameter adaptation data for specific users based on a relatively small set of training data. The resulting acoustic parameter adaptation data identifies changes for a relatively small fraction of acoustic parameters from a baseline acoustic speech model instead of changes to all acoustic parameters. This results in user-specific acoustic parameter adaptation data that is several orders of magnitude smaller than storage amounts otherwise required for a complete acoustic model. This provides customized acoustic speech models that increase recognition accuracy at a fraction of expected data storage requirements.
    Type: Grant
    Filed: October 28, 2011
    Date of Patent: May 27, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Vaibhava Goel, Peder A. Olsen, Steven J. Rennie, Jing Huang
  • Patent number: 8731938
    Abstract: A computer-implemented system and method for identifying and masking special information within recorded speech is provided. A field for entry of special information is identified. Movement of a pointer device along a trajectory towards the field is also identified. A correlation of the pointer device movement and entry of the special information is determined based on a location of the trajectory in relation to the field. A threshold is applied to the correlation. The special information is received as verbal speech. A recording of the special information is rendered unintelligible when the threshold is satisfied.
    Type: Grant
    Filed: April 26, 2013
    Date of Patent: May 20, 2014
    Assignee: Intellisist, Inc.
    Inventor: G. Kevin Doren
  • Patent number: 8706493
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module, and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates that prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: April 22, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
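A concrete way to picture the re-estimation module: predicted F0 and durations pass through a remapping governed by user-controllable parameters before synthesis. The linear remapping below is an illustrative choice, not the formula from the patent:

```python
# Controllable prosody re-estimation: remap predicted F0 and durations.
import numpy as np

def reestimate_prosody(f0, durations, pitch_shift=1.0, pitch_range=1.0, rate=1.0):
    mean_f0 = np.mean(f0)
    new_f0 = mean_f0 * pitch_shift + (f0 - mean_f0) * pitch_range
    new_durations = durations / rate          # rate > 1 speaks faster
    return new_f0, new_durations

f0 = np.array([110.0, 120.0, 130.0, 125.0])   # Hz, per phone
dur = np.array([0.08, 0.12, 0.10, 0.14])      # seconds, per phone
print(reestimate_prosody(f0, dur, pitch_shift=1.1, pitch_range=0.8, rate=1.2))
```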
  • Patent number: 8576961
    Abstract: A method for determining an overlap and add length estimate comprises determining a plurality of correlation values of a plurality of ordered frequency domain samples obtained from a data frame; comparing the correlation values of a first subset of the samples to a first predetermined threshold to determine a first edge sample; comparing the correlation values of a second subset of the samples to a second predetermined threshold to determine a second edge sample; using the first and second edge samples to determine an overlap and add length estimate; and providing the overlap and add length estimate to an overlap and add circuit.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: November 5, 2013
    Assignee: Olympus Corporation
    Inventors: Haidong Zhu, Dumitru Mihai Ionescu, Abu Amanullah
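The edge-finding logic reduces to scanning a sequence of correlation values from each end against two thresholds. A schematic sketch that takes the correlation values as given (the patent computes them from ordered frequency-domain samples of the frame):

```python
# Estimate an overlap-and-add length from threshold-crossing edge samples.
import numpy as np

def ola_length_estimate(corr, thr_first=0.3, thr_second=0.3):
    n = len(corr)
    first = next((i for i in range(n) if corr[i] > thr_first), 0)
    second = next((i for i in range(n - 1, -1, -1) if corr[i] > thr_second),
                  n - 1)
    return max(0, second - first + 1)   # length handed to the OLA circuit

corr = np.array([0.05, 0.1, 0.6, 0.9, 0.8, 0.5, 0.2, 0.05])
print(ola_length_estimate(corr))        # 4 (edges at indices 2 and 5)
```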
  • Patent number: 8566106
    Abstract: A method and device for searching an algebraic codebook during encoding of a sound signal, wherein the algebraic codebook comprises a set of codevectors formed of a number of pulse positions and a number of pulses distributed over the pulse positions. In the algebraic codebook searching method and device, a reference signal for use in searching the algebraic codebook is calculated. In a first stage, a position of a first pulse is determined in relation with the reference signal and among the number of pulse positions. In each of a number of stages subsequent to the first stage, (a) an algebraic codebook gain is recomputed, (b) the reference signal is updated using the recomputed algebraic codebook gain and (c) a position of another pulse is determined in relation with the updated reference signal and among the number of pulse positions.
    Type: Grant
    Filed: September 11, 2008
    Date of Patent: October 22, 2013
    Assignee: Voiceage Corporation
    Inventors: Redwan Salami, Vaclav Eksler, Milan Jelinek
  • Patent number: 8494845
    Abstract: Provided is a signal distortion elimination apparatus comprising: an inverse filter application means that outputs the signal obtained by applying an inverse filter to an observed signal as a restored signal when a predetermined iteration termination condition is met, and outputs the signal obtained by applying the inverse filter to the observed signal as an ad-hoc signal when that condition is not met; a prediction error filter calculation means that segments the ad-hoc signal into frames and outputs a prediction error filter of each frame obtained by linear prediction analysis of the ad-hoc signal of that frame; and an inverse filter calculation means that calculates an inverse filter such that a concatenation of the innovation estimates of the respective frames becomes mutually independent among its samples, where the innovation estimate of a single frame is the signal obtained by applying the prediction error filter of the corresponding frame …
    Type: Grant
    Filed: February 16, 2007
    Date of Patent: July 23, 2013
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Takuya Yoshioka, Takafumi Hikichi, Masato Miyoshi
  • Patent number: 8370153
    Abstract: A speech analyzer includes a vocal tract and sound source separating unit which separates a vocal tract feature and a sound source feature from an input speech, based on a speech generation model, a fundamental frequency stability calculating unit which calculates a temporal stability of a fundamental frequency of the input speech in the sound source feature, from the separated sound source feature, a stable analyzed period extracting unit which extracts time information of a stable period, based on the temporal stability, and a vocal tract feature interpolation unit which interpolates a vocal tract feature which is not included in the stable period, using a vocal tract feature included in the extracted stable period.
    Type: Grant
    Filed: May 3, 2010
    Date of Patent: February 5, 2013
    Assignee: Panasonic Corporation
    Inventors: Yoshifumi Hirose, Takahiro Kamai
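The stability test and the interpolation step can be sketched directly. Assumptions here are illustrative: stability is measured as a small local standard deviation of F0, and the vocal-tract feature (one scalar per frame for simplicity) is linearly interpolated across unstable frames:

```python
# Mark frames with temporally stable F0, then interpolate a vocal-tract
# feature across the unstable frames.
import numpy as np

def stable_mask(f0, win=5, max_std=2.0):
    pad = win // 2
    padded = np.pad(f0, pad, mode="edge")
    std = np.array([padded[i:i + win].std() for i in range(len(f0))])
    return std < max_std                       # per-frame temporal stability

def interpolate_unstable(feature, mask):
    frames = np.arange(len(feature))
    return np.interp(frames, frames[mask], feature[mask])

f0 = np.array([100, 101, 100, 140, 80, 102, 101, 100], dtype=float)
vt = np.array([1.0, 1.1, 1.0, 3.0, -2.0, 1.2, 1.1, 1.0])
print(interpolate_unstable(vt, stable_mask(f0)))
```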
  • Patent number: 8321225
    Abstract: The subject matter of this specification can be implemented in, among other things, a computer-implemented method including receiving text to be synthesized as a spoken utterance. The method includes analyzing the received text to determine attributes of the received text and selecting one or more utterances from a database based on a comparison between the attributes of the received text and attributes of text representing the stored utterances. The method includes determining, for each utterance, a distance between a contour of the utterance and a hypothetical contour of the spoken utterance, the determination based on a model that relates distances between pairs of contours of the utterances to relationships between attributes of text for the pairs. The method includes selecting a final utterance having a contour with a closest distance to the hypothetical contour and generating a contour for the received text based on the contour of the final utterance.
    Type: Grant
    Filed: November 14, 2008
    Date of Patent: November 27, 2012
    Assignee: Google Inc.
    Inventors: Martin Jansche, Michael D. Riley, Andrew M. Rosenberg, Terry Tai
  • Patent number: 8315872
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: November 20, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
  • Patent number: 8214216
    Abstract: A simply configured speech synthesis device and the like for producing a natural synthetic speech at high speed. When data representing a message template is supplied, a voice unit editor (5) searches a voice unit database (7) for voice unit data on a voice unit whose sound matches a voice unit in the message template. Further, the voice unit editor (5) predicts the cadence of the message template and selects, one at a time, a best match of each voice unit in the message template from the voice unit data that has been retrieved, according to the cadence prediction result. For a voice unit for which no match can be selected, an acoustic processor (41) is instructed to supply waveform data representing the waveform of each unit voice. The voice unit data that is selected and the waveform data that is supplied by the acoustic processor (41) are combined to generate data representing a synthetic speech.
    Type: Grant
    Filed: June 3, 2004
    Date of Patent: July 3, 2012
    Assignee: Kabushiki Kaisha Kenwood
    Inventor: Yasushi Sato
  • Patent number: 8145477
    Abstract: Systems, methods, and apparatus described include waveform alignment operations in which a single set of evaluated cosines and sines is used to calculate cross-correlations of two periodic waveforms at two different phase shifts.
    Type: Grant
    Filed: December 1, 2006
    Date of Patent: March 27, 2012
    Inventors: Sharath Manjunath, Ananthapadmanabhan A. Kandhadai
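One reading of the trick: if two periodic waveforms are described by harmonic amplitudes and per-harmonic phase differences, their cross-correlation at time shift s is a sum of cos(dphi_k + k*w*s) terms, so cos(dphi) and sin(dphi) evaluated once can serve every shift via the angle-addition identity. A toy sketch of that reading, not the claimed hardware method:

```python
# Cross-correlation of two periodic waveforms at several shifts, reusing a
# single evaluated set of cosines/sines of the harmonic phase differences.
import numpy as np

def xcorr_at_shifts(a1, a2, dphi, w, shifts):
    c, s = np.cos(dphi), np.sin(dphi)         # evaluated once, reused below
    k = np.arange(1, len(a1) + 1)
    return [float(np.sum(a1 * a2 * (c * np.cos(k * w * t)
                                    - s * np.sin(k * w * t))))
            for t in shifts]                  # cos(dphi + k*w*t) expanded

a1 = np.array([1.0, 0.5, 0.25])               # harmonic amplitudes, wave 1
a2 = np.array([0.8, 0.6, 0.1])                # harmonic amplitudes, wave 2
dphi = np.array([0.1, -0.4, 0.9])             # per-harmonic phase difference
print(xcorr_at_shifts(a1, a2, dphi, w=2 * np.pi / 80, shifts=[0, 40]))
```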
  • Patent number: 8078466
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Grant
    Filed: November 30, 2009
    Date of Patent: December 13, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Patent number: 7835909
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
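The backward cumulative histogram is simply the probability mass cumulated from the largest bin down, i.e. an estimate of P(X >= x), which is less sensitive to noise-dominated low-energy bins. A sketch of how it can drive histogram normalization; the Gaussian target is an illustrative choice (requires SciPy):

```python
# Normalize a feature via a backward cumulative distribution function.
import numpy as np
from scipy.stats import norm

def backward_cdf_normalize(x, bins=64):
    hist, edges = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    back_cdf = np.cumsum(p[::-1])[::-1]        # cumulate largest -> smallest
    idx = np.digitize(x, edges[1:-1])          # bin index of each sample
    u = np.clip(back_cdf[idx], 1e-6, 1 - 1e-6)
    return norm.ppf(1.0 - u)                   # map onto a unit Gaussian

x = np.random.gamma(2.0, size=1000)            # skewed feature values
print(round(backward_cdf_normalize(x).std(), 2))
```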
  • Patent number: 7756715
    Abstract: Apparatus, method, and medium for processing an audio signal using a correlation between bands are provided. The apparatus includes an encoding unit encoding an input audio signal and a decoding unit decoding the encoded input audio signal.
    Type: Grant
    Filed: November 17, 2005
    Date of Patent: July 13, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Junghoe Kim, Dohyung Kim, Sihwa Lee
  • Patent number: 7747441
    Abstract: High quality speech is reproduced with a small data amount in speech coding and decoding that perform compression coding and decoding of a speech signal into a digital signal. In a speech coding method based on code-excited linear prediction (CELP), the noise level of the speech in the coding period concerned is evaluated using a code or coding result of at least one of spectrum information, power information, and pitch information, and different excitation codebooks are used based on the evaluation result.
    Type: Grant
    Filed: January 16, 2007
    Date of Patent: June 29, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventor: Tadashi Yamaura
  • Patent number: 7660714
    Abstract: A noise suppression device comprises a subband SN ratio calculation means which receives a noise likeness signal, an input signal spectrum, and a subband-based estimated noise spectrum; calculates the subband-based input signal average spectrum; calculates a subband-based mixture ratio of the subband-based estimated noise spectrum to the subband-based input signal average spectrum on the basis of the noise likeness signal; and calculates the subband-based SN ratio on the basis of the subband-based estimated noise spectrum, the subband-based input signal average spectrum, and the mixture ratio.
    Type: Grant
    Filed: October 29, 2007
    Date of Patent: February 9, 2010
    Assignee: Mitsubishi Denki Kabushiki Kaisha
    Inventors: Satoru Furuta, Shinya Takahashi
  • Patent number: 7630897
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Grant
    Filed: May 19, 2008
    Date of Patent: December 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Publication number: 20090234653
    Abstract: Provided is an audio decoding device that performs frame loss compensation and obtains decoded audio that sounds natural with little noise. The audio decoding device includes: a non-cyclic pulse waveform detection unit (19) for detecting a non-cyclic pulse waveform section in the n−1-th frame which is repeatedly used with a pitch cycle in the n-th frame upon compensation of loss of the n-th frame; a non-cyclic pulse waveform suppression unit (17) for suppressing a non-cyclic pulse waveform by replacing the audio source signal existing in the non-cyclic pulse waveform section of the n−1-th frame with a noise signal; and a synthesis filter (20) that uses a linear prediction coefficient decoded by an LPC decoding unit (11) to perform synthesis, using the audio source signal of the n−1-th frame from the non-cyclic pulse waveform suppression unit (17) as the drive audio source, thereby obtaining the decoded audio signal of the n-th frame.
    Type: Application
    Filed: December 26, 2006
    Publication date: September 17, 2009
    Applicant: Matsushita Electric Industrial Co., Ltd.
    Inventors: Takuya Kawashima, Hiroyuki Ehara
  • Patent number: 7257535
    Abstract: A system and method are provided for processing audio and speech signals using a pitch and voicing dependent spectral estimation algorithm (voicing algorithm) to accurately represent voiced speech, unvoiced speech, and mixed speech in the presence of background noise, and background noise with a single model. The present invention also modifies the synthesis model based on an estimate of the current input signal to improve the perceptual quality of the speech and background noise under a variety of input conditions. The present invention also improves the voicing dependent spectral estimation algorithm robustness by introducing the use of a Multi-Layer Neural Network in the estimation process. The voicing dependent spectral estimation algorithm provides an accurate and robust estimate of the voicing probability under a variety of background noise conditions. This is essential to providing high quality intelligible speech in the presence of background noise.
    Type: Grant
    Filed: October 28, 2005
    Date of Patent: August 14, 2007
    Assignee: Lucent Technologies Inc.
    Inventors: Joseph Gerard Aguilar, Juin-Hwey Chen, Wei Wang, Robert W. Zopf
  • Patent number: 7003461
    Abstract: An adaptive codebook search (ACS) algorithm is based on a set of matrix operations suitable for data processing engines supporting a single instruction multiple data (SIMD) architecture. The result is a reduction in memory access and increased parallelism to produce an overall improvement in the computational efficiency of ACS processing.
    Type: Grant
    Filed: July 9, 2002
    Date of Patent: February 21, 2006
    Assignee: Renesas Technology Corporation
    Inventor: Clifford Tavares
  • Patent number: 6996291
    Abstract: After one or both of a pair of images are obtained, an auto-correlation function for one of those images is generated to determine a smear amount and possibly a smear direction. The smear amount and direction are used to identify potential locations of a peak portion of the correlation function between the pair of images. The pair of images is then correlated only at offset positions corresponding to the one or more of the potential peak locations. In some embodiments, the pair of images is correlated according to a sparse set of image correlation function value points around the potential peak locations. In other embodiments, the pair of images is correlated at a dense set of correlation function value points around the potential peak locations. The correlation function values of these correlation function value points are then analyzed to determine the offset position of the true correlation function peak.
    Type: Grant
    Filed: August 6, 2001
    Date of Patent: February 7, 2006
    Assignee: Mitutoyo Corporation
    Inventor: Michael Nahum
  • Patent number: 6937977
    Abstract: A start of an input speech signal is detected during presentation of an output audio signal and an input start time, relative to the output audio signal, is determined. The input start time is then provided for use in responding to the input speech signal. In another embodiment, the output audio signal has a corresponding identification. When the input speech signal is detected during presentation of the output audio signal, the identification of the output audio signal is provided for use in responding to the input speech signal. Information signals comprising data and/or control signals are provided in response to at least the contextual information provided, i.e., the input start time and/or the identification of the output audio signal. In this manner, the present invention accurately establishes a context of an input speech signal relative to an output audio signal regardless of the delay characteristics of the underlying communication system.
    Type: Grant
    Filed: October 5, 1999
    Date of Patent: August 30, 2005
    Assignee: fastmobile, Inc.
    Inventor: Ira A. Gerson
  • Patent number: 6865535
    Abstract: In a synchronization control apparatus, a voice-language-information generating section generates the voice language information of a word which a robot utters. A voice synthesizing section calculates phoneme information and a phoneme continuation duration according to the voice language information, and also generates synthesized-voice data according to an adjusted phoneme continuation duration. An articulation-operation generating section calculates an articulation-operation period according to the phoneme information. A voice-operation adjusting section adjusts the phoneme continuation duration and the articulation-operation period. An articulation-operation executing section operates an organ of articulation according to the adjusted articulation-operation period.
    Type: Grant
    Filed: December 27, 2000
    Date of Patent: March 8, 2005
    Assignee: Sony Corporation
    Inventors: Keiichi Yamada, Kenichiro Kobayashi, Tomoaki Nitta, Makoto Akabane, Masato Shimakawa, Nobuhide Yamazaki, Erika Kobayashi
  • Patent number: 6804649
    Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by Short Time Fourier Transform (STFT) analysis.
    Type: Grant
    Filed: June 1, 2001
    Date of Patent: October 12, 2004
    Assignee: Sony France S.A.
    Inventor: Eduardo Reck Miranda
  • Patent number: 6681202
    Abstract: The invention describes a system that generates a wide band signal (100-7000 Hz) from a telephony band (or narrow band: 300-3400 Hz) speech signal to obtain an extended band speech signal (100-3400 Hz). This technique is particularly advantageous since it increases signal naturalness and listening comfort while keeping compatibility with all current telephony systems. The described technique is inspired by Linear Predictive speech coders. The speech signal is thus split into a spectral envelope and a short-term residual signal. Both signals are extended separately and recombined to create an extended band signal.
    Type: Grant
    Filed: November 13, 2000
    Date of Patent: January 20, 2004
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Giles Miet, Andy Gerrits
  • Patent number: 6662161
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. The processor extracts parameters from these images comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library parameters associated with that sequence, and selecting appropriate image samples from the animation library based on that sequence. The image samples are concatenated together, and the corresponding sound is output, to form the animated synthesis.
    Type: Grant
    Filed: September 7, 1999
    Date of Patent: December 9, 2003
    Assignee: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Publication number: 20030061051
    Abstract: A voice synthesizing system keeps both the required amount of calculation and the required file size small. The system includes a compressed pitch segment database storing compressed voice waveform segments; a pitch developing portion which, when a voice waveform segment needed for voice waveform synthesis is demanded, reads the segment out of the database and decompresses the compressed data to reproduce the original voice waveform segment; and a cache processing portion which temporarily stores voice waveform segments already used in voice waveform synthesis. When a needed voice waveform segment is demanded, the cache processing portion returns it to the demander if it is already stored; otherwise, it obtains the segment from the database via the pitch developing portion, holds the obtained segment, and returns it to the demander.
    Type: Application
    Filed: September 26, 2002
    Publication date: March 27, 2003
    Applicant: NEC Corporation
    Inventors: Reishi Kondo, Hiroaki Hattori
  • Patent number: 6513007
    Abstract: There is provided a synthesized sound generating apparatus and method which can achieve responsive and high-quality speech synthesis based on a real-time convolution operation. Coefficients are generated by using dynamic cutting to extract characteristic information from a first signal. A convolution operation is performed on a second signal using the generated coefficients to generate a synthesized signal. During the convolution operation, an interpolation process is performed on the coefficients to prevent a rapid change in the level of the generated synthesized signal upon switching of the coefficients.
    Type: Grant
    Filed: July 20, 2000
    Date of Patent: January 28, 2003
    Assignee: Yamaha Corporation
    Inventor: Akio Takahashi
  • Patent number: 6421636
    Abstract: An apparatus and method are disclosed for converting an input signal having frequency related information sustained over a first duration of time into an output signal sustained over a second duration of time at substantially the same first frequency, by adding to or subtracting from the effective wavelength of the output signal. Preferably, the signals are converted in digital form, with samples added or subtracted to frequency convert the signal.
    Type: Grant
    Filed: May 30, 2000
    Date of Patent: July 16, 2002
    Assignee: Pixel Instruments
    Inventors: J. Carl Cooper, Steve Anderson
  • Patent number: 6208958
    Abstract: A pitch determination apparatus and method using spectro-temporal autocorrelation to prevent pitch determination errors are provided.
    Type: Grant
    Filed: January 7, 1999
    Date of Patent: March 27, 2001
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yong-duk Cho, Moo-Young Kim
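The temporal half of such a scheme is the classic normalized-autocorrelation pitch search; the patented method additionally combines it with a spectral autocorrelation term to suppress halving/doubling errors, which this sketch omits:

```python
# Pitch estimation by maximizing the normalized autocorrelation over lags.
import numpy as np

def pitch_by_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    frame = frame - frame.mean()
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    ac = [np.dot(frame[:-l], frame[l:])
          / (np.linalg.norm(frame[:-l]) * np.linalg.norm(frame[l:]) + 1e-12)
          for l in lags]
    return fs / lags[int(np.argmax(ac))]        # pitch estimate in Hz

fs = 8000
t = np.arange(0, 0.04, 1 / fs)                  # one 40 ms frame
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(round(pitch_by_autocorr(frame, fs), 1))   # ~120 Hz
```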
  • Patent number: 5983181
    Abstract: The table document preparation module prepares a table document containing cells. The read-out attribute setting module, assisted by the setting display module, sets a read-out attribute specifying the way cell data supplied through the setting screen from the table document preparation module is to be read out. The voice-generating data generation module generates voice-generating data for the table document according to the way of reading out specified by the read-out attribute, and the voice synthesis module synthesizes voices according to the voice-generating data.
    Type: Grant
    Filed: December 3, 1997
    Date of Patent: November 9, 1999
    Assignee: Justsystem Corp.
    Inventor: Nobuhide Yamazaki
  • Patent number: 5850438
    Abstract: In a transmission system, a tone is transmitted by a transmitter to a receiver via a transmission channel. In the receiver, a tone detector is used to detect the presence of a signalling tone. In order to improve the reliability of the tone detector when the arrival time of the signalling tone is unknown, a number of correlators having mutually displaced measuring periods are used. More than two correlators are used in order to reduce the measuring period, which also results in improved reliability of the tone detection.
    Type: Grant
    Filed: April 16, 1996
    Date of Patent: December 15, 1998
    Assignee: U.S. Philips Corporation
    Inventors: Harm Braams, Cornelis M. Moerman
  • Patent number: 5850437
    Abstract: In a transmission system, a signalling tone is transmitted by a transmitter to a receiver via a transmission channel. In the receiver, a tone detector detects the presence of a signalling tone. In order to improve the reliability of the tone detector, a number of correlating elements which determine a correlation value between an input signal and a reference signal are used. The absolute output signals of the correlating elements are added by an adder to derive a combined correlation signal to be used for detection.
    Type: Grant
    Filed: April 16, 1996
    Date of Patent: December 15, 1998
    Assignee: U.S. Philips Corporation
    Inventors: Harm Braams, Cornelis M. Moerman
  • Patent number: 5832442
    Abstract: A method is disclosed for modifying parameters of audio signals by dividing a digital signal converted from an original analog signal into sound frames, modifying the pitch and playing rate of the digital signal within a frame, and then splicing the last modified frame with the first non-modified frame, calculating the mean absolute error to find the splicing point that produces minimal or no audible noise, so that various sections of sound signals can be spliced together to achieve pitch and playing-rate modification. An apparatus for implementing the method is also disclosed, comprising input and output amplifiers, a low pass filter at the input and a low pass filter at the output, analog-to-digital and digital-to-analog converters, and a pitch shifting processor.
    Type: Grant
    Filed: June 23, 1995
    Date of Patent: November 3, 1998
    Assignee: Electronics Research & Service Organization
    Inventors: Gang-Janp Lin, Sau-Gee Chen, Der-Chwan Wu, Yuan-An Kao, Yen-Hui Wang
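The splice-point search at the heart of this method is a sliding mean-absolute-error minimization. A minimal sketch under simplifying assumptions (mono float signals, a fixed search window):

```python
# Find the splice offset that minimizes the mean absolute error between the
# tail of the last modified frame and the head of the first unmodified one.
import numpy as np

def best_splice_offset(tail, head, search=64):
    n = min(len(head), len(tail) - search)      # overlap length compared
    errs = [np.mean(np.abs(tail[off:off + n] - head[:n]))
            for off in range(search)]
    return int(np.argmin(errs))                 # offset with minimal MAE

tail = np.sin(np.linspace(0, 20 * np.pi, 400))          # modified frame end
head = np.sin(np.linspace(0.5, 0.5 + 20 * np.pi, 400))  # next frame start
print(best_splice_offset(tail, head))           # small phase-matching offset
```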
  • Patent number: 5809456
    Abstract: The present invention relates to a method and equipment for coding and decoding a sampled speech signal. It belongs to systems used in speech processing, in particular for compression of speech information. The method is based on a time/frequency description and on a representation of the prototype as a fundamental period of a periodic waveform; moreover, the excitation of the synthesis filter is carried out through a single, phase-adapted pulse.
    Type: Grant
    Filed: June 27, 1996
    Date of Patent: September 15, 1998
    Assignee: Alcatel Italia S.p.A.
    Inventors: Silvio Cucchi, Marco Fratti
  • Patent number: 5761635
    Abstract: A synthesis filter is disclosed which models the effect of the fundamental frequency of speech for digital speech coders operating on the analysis-by-synthesis principle. High fundamental frequencies having a period shorter than the corresponding cycle length of the frame employed in the analysis-by-synthesis method are optimally encoded. The filter is constructed of a number of parallel, separately updatable synthesis-memory blocks. When analysis delays shorter than the analysis frame are used, a portion of a signal that was stored in memory several frames earlier is selected and scaled to approximate the missing portion of the analysis frame using the available portion of the analysis frame.
    Type: Grant
    Filed: April 29, 1996
    Date of Patent: June 2, 1998
    Assignee: Nokia Mobile Phones Ltd.
    Inventor: Kari Juhani Jarvinen