Abstract: Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds frequently found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.
Type:
Grant
Filed:
May 20, 2019
Date of Patent:
February 27, 2024
Assignee:
DeepMind Technologies Limited
Inventors:
Luis Carlos Cobo Rus, Nal Kalchbrenner, Erich Elsen, Chenjie Gu
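The non-uniform granularity described in the abstract above, where common (small) difference-signal values are represented more finely than rare (large) ones, can be illustrated with mu-law companding. This is a minimal sketch, not the patented method; the patent does not specify mu-law, and the function names and parameters here are assumptions for illustration:

```python
import numpy as np

def mu_law_encode(x, mu=255, bits=8):
    """Non-uniform quantization: values near zero (common in speech
    difference signals) get finer granularity than rare large values."""
    # Compress with the mu-law curve, then quantize uniformly.
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((compressed + 1) / 2 * (2**bits - 1)).astype(np.int64)

def mu_law_decode(q, mu=255, bits=8):
    """Invert the uniform quantization, then expand the mu-law curve."""
    compressed = 2 * q / (2**bits - 1) - 1
    return np.sign(compressed) * ((1 + mu) ** np.abs(compressed) - 1) / mu

# Small difference values are reconstructed more precisely than large ones.
small, large = 0.01, 0.9
err_small = abs(mu_law_decode(mu_law_encode(np.array([small])))[0] - small)
err_large = abs(mu_law_decode(mu_law_encode(np.array([large])))[0] - large)
```

With 8 bits and mu=255, the reconstruction error near zero is roughly two orders of magnitude smaller than near full scale, which is the granularity trade-off the abstract describes.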
Abstract: A quantization apparatus comprises a first quantization module that performs quantization without inter-frame prediction and a second quantization module that performs quantization with inter-frame prediction. The first quantization module comprises a first quantization part that quantizes an input signal and a third quantization part that quantizes a first quantization error signal; the second quantization module comprises a second quantization part that quantizes a prediction error and a fourth quantization part that quantizes a second quantization error signal. The first and second quantization parts each comprise a trellis-structured vector quantizer.
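The two modules described above can be sketched in simplified form. This is an assumption-laden illustration: plain scalar quantizers stand in for the patent's trellis-structured vector quantizers, and the inter-frame predictor is taken to be the previously decoded frame, which the abstract does not specify:

```python
import numpy as np

def two_stage_quantize(x, step1=0.5, step2=0.1):
    """First module: stage 1 quantizes the input coarsely; stage 2
    quantizes the stage-1 error finely (the "first" and "third"
    quantization parts of the abstract, as scalar stand-ins)."""
    q1 = np.round(x / step1) * step1       # quantize the input signal
    err = x - q1                           # first quantization error signal
    q2 = np.round(err / step2) * step2     # quantize the error signal
    return q1 + q2

def predictive_quantize(frames, step1=0.5, step2=0.1):
    """Second module: quantize the inter-frame prediction error instead
    of the raw frame (predictor = previous decoded frame, an assumption)."""
    prev = np.zeros_like(frames[0])
    out = []
    for f in frames:
        residual = f - prev                # inter-frame prediction error
        decoded = prev + two_stage_quantize(residual, step1, step2)
        out.append(decoded)
        prev = decoded
    return out
```

An encoder built this way could run both modules on each frame and transmit whichever yields the smaller distortion, which is one plausible reading of why both modules coexist in the apparatus.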
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for compressing audio. One of the methods includes: receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps; processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform; generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers, each associated with a respective codebook of code vectors, wherein the coded representation of each feature vector identifies a plurality of code vectors (a respective code vector from each quantizer's codebook) that together define a quantized representation of the feature vector; and generating a compressed representation of the audio waveform by compressing the coded representations of the feature vectors.
Type:
Grant
Filed:
July 1, 2022
Date of Patent:
March 7, 2023
Assignee:
Google LLC
Inventors:
Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
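The scheme in the abstract above, where each feature vector is coded by one index per quantizer, matches the shape of residual vector quantization: each quantizer codes what the previous ones missed. The following is a minimal sketch under that assumption; the class and function names, codebook contents, and nearest-neighbor encoding rule are all illustrative, not taken from the patent:

```python
import numpy as np

class VectorQuantizer:
    """One quantizer with its own codebook of code vectors."""
    def __init__(self, codebook):
        self.codebook = codebook  # shape (num_codes, dim)

    def encode(self, v):
        # Index of the nearest code vector to v.
        return int(np.argmin(np.linalg.norm(self.codebook - v, axis=1)))

def rvq_encode(feature, quantizers):
    """Residual VQ: each quantizer codes the residual left by the
    previous stages; the coded representation is one index per quantizer."""
    indices, residual = [], feature.copy()
    for q in quantizers:
        i = q.encode(residual)
        indices.append(i)
        residual = residual - q.codebook[i]
    return indices

def rvq_decode(indices, quantizers):
    """The quantized representation is the sum of the selected code vectors."""
    return sum(q.codebook[i] for q, i in zip(quantizers, indices))
```

One appeal of this structure for compression is that the bit rate scales with the number of quantizers: dropping later indices still yields a coarse reconstruction from the earlier stages.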
Abstract: A method and apparatus for processing speech in a wireless communication system use CELP-encoded speech signals. A decoder receives encoded speech including a code index, a code index gain, a pitch lag, a pitch gain, and a line spectral pair (LSP) index. An innovation codevector and an adaptive codevector are determined and scaled, and an excitation sequence is generated from them. Reconstructed speech is then output based on the excitation sequence and the LSP index.
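The decoding path in the abstract above, excitation built from scaled codevectors and then filtered into speech, can be sketched as follows. This is a simplified illustration, not the patented decoder: in a real CELP decoder the synthesis filter coefficients are derived from the transmitted LSP index, whereas here they are passed in directly, and all names are assumptions:

```python
import numpy as np

def celp_excitation(adaptive_cv, innovation_cv, pitch_gain, code_gain):
    """Excitation = scaled adaptive codevector (periodic, pitch part)
    plus scaled innovation codevector (noise-like part)."""
    return pitch_gain * adaptive_cv + code_gain * innovation_cv

def lpc_synthesize(excitation, lpc_coeffs):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum a_k z^-k.
    In a real decoder the a_k come from the LSP index."""
    out = np.zeros_like(excitation, dtype=float)
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out[n] = acc
    return out
```

Feeding an impulse excitation through the filter shows the decaying resonant response that gives reconstructed speech its spectral envelope.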
Abstract: Disclosed are a vector quantization device, and related methods, capable of adaptively adjusting the vector space of the second-stage code vectors by using the quantization result of the first stage, thereby improving quantization accuracy.
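One simple way to realize the adjustment this abstract describes is to scale the second-stage codebook by a quantity derived from the first-stage result, so the fine codebook adapts to the coarse decision. The particular adaptation rule below (scaling by the norm of the selected first-stage code vector) is an assumption for illustration, not the device's actual mechanism:

```python
import numpy as np

def adaptive_two_stage_encode(x, cb1, cb2_base):
    """Stage 1 picks the nearest coarse code vector; the stage-2
    codebook is then scaled by that code vector's magnitude,
    adapting the second-stage vector space to the first-stage result."""
    i1 = int(np.argmin(np.linalg.norm(cb1 - x, axis=1)))
    residual = x - cb1[i1]
    scale = max(np.linalg.norm(cb1[i1]), 1e-6)  # adapt to stage-1 result
    cb2 = cb2_base * scale
    i2 = int(np.argmin(np.linalg.norm(cb2 - residual, axis=1)))
    return i1, i2

def adaptive_two_stage_decode(i1, i2, cb1, cb2_base):
    """The decoder recomputes the same scale, so no extra bits are sent."""
    scale = max(np.linalg.norm(cb1[i1]), 1e-6)
    return cb1[i1] + cb2_base[i2] * scale
```

Because the decoder derives the scale from the transmitted first-stage index alone, the adaptation costs no additional bit rate, which is presumably the point of conditioning stage 2 on the stage-1 result.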