Patents by Inventor Syed Ammar ABBAS

Syed Ammar ABBAS has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Synthetic speech processing by representing text by phonemes exhibiting predicted volume and pitch using neural networks

Patent number: 11978431

Abstract: A speech-processing system receives input data representing text. One or more encoders trained to predict audio properties corresponding to the text process the text to predict those properties. A speech decoder processes phoneme embeddings as well as the predicted properties to create data representing synthesized speech.

Type: Grant

Filed: May 21, 2021

Date of Patent: May 7, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Arnaud Joly, Simon Slangen, Alexis Pierre Moinet, Thomas Renaud Drugman, Panagiota Karanasou, Syed Ammar Abbas, Sri Vishnu Kumar Karlapati

Learned condition text-to-speech synthesis

Patent number: 11830476

Abstract: Devices and techniques are generally described for learned condition text-to-speech synthesis. In some examples, first data representing a selection of a type of prosodic expressivity may be received. In some further examples, a selection of content comprising text data may be received. First audio data may be determined that includes an audio representation of the text data. The first audio data may be generated based at least in part on sampling from a first latent distribution generated using a conditional primary variational autoencoder (VAE). The sampling from the first latent distribution may be conditioned on a first learned distribution associated with the type of prosodic expressivity. In various examples, the first audio data may be sent to a first computing device.

Type: Grant

Filed: June 8, 2021

Date of Patent: November 28, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Alexis Pierre Moinet, Arnaud Vincent Pierre Yves Joly, Syed Ammar Abbas, Thomas Renaud Drugman, Jaime Lorenzo Trueba

Multi-scale spectrogram text-to-speech

Patent number: 11694674

Abstract: Techniques for performing text-to-speech are described. An exemplary method includes receiving a request to generate audio from input text; generating audio from the input text by: generating a first number of vectors from phoneme embeddings representing the input text, predicting one or more spectrograms having the first number of frames using multiple scales wherein a coarser scale influences a finer scale, concatenating the first number of vectors and the predicted one or more spectrograms, generating at least one mel spectrogram from the concatenated vectors and the predicted one or more spectrograms, and converting, with a vocoder, the at least one mel spectrogram frames to audio; and outputting the generated audio according to the request.

Type: Grant

Filed: May 26, 2021

Date of Patent: July 4, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Syed Ammar Abbas, Bajibabu Bollepalli, Alexis Pierre Moinet, Thomas Renaud Drugman, Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Simon Slangen, Petr Makarov

Synthetic speech processing

Patent number: 11574624

Abstract: A speech-processing system receives input data representing text. An input encoder processes the input data to determine first embedding data representing the text. A local attention encoder processes a subset of the first embedding data in accordance with a predicted size to determine second embedding data. An attention encoder processes the second embedding data to determine third embedding data. A decoder processes the third embedding data to determine audio data corresponding to the text.
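As a rough illustration of the local attention step mentioned in this abstract, the sketch below attends from one position over a limited window of neighbouring embeddings. It is not the patented method: the function names, the dot-product scoring, and the idea of passing the predicted size in as a plain `window` argument are all assumptions made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(embeddings, center, window):
    """Attend from position `center` over roughly `window` neighbouring positions.

    `window` stands in for the "predicted size" in the abstract; here it is
    simply supplied by the caller (an assumption for illustration).
    """
    half = window // 2
    lo = max(0, center - half)
    hi = min(len(embeddings), center + half + 1)
    query = embeddings[center]
    keys = embeddings[lo:hi]  # the subset of embeddings that is attended to
    # Dot-product scores between the query and each key in the window.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the windowed embeddings.
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
context = local_attention(embeddings, center=1, window=3)  # uses positions 0..2 only
```

With `window=3` the distant fourth embedding never contributes to `context`, which is the point of restricting attention to a local subset.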

Type: Grant

Filed: March 31, 2021

Date of Patent: February 7, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Arnaud Vincent Pierre Yves Joly, Panagiota Karanasou, Alexis Pierre Jean-Baptiste Moinet, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati, Syed Ammar Abbas, Simon Slangen

NEURAL NETWORK MEMORY FOR AUDIO

Publication number: 20220415304

Abstract: Techniques for utilizing memory for a neural network are described. For example, some techniques utilize a plurality of memory types to respond to a query from a neural network including a short-term memory to store fine-grained information for recent text of a document and receiving a first value in response, an episodic long-term memory to store information discarded from the short-term memory in a compressed form and receiving a second value in response, and a semantic long-term memory to store relevant facts per entity in the document.
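The three memory types named in this abstract can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions: the class name `DocumentMemory`, the fixed `capacity`, and the averaging `compress` function are hypothetical and are not taken from the application.

```python
from collections import deque
from dataclasses import dataclass, field


def compress(vector, factor=2):
    """Hypothetical compression: average each group of `factor` adjacent values."""
    return [
        sum(vector[i:i + factor]) / len(vector[i:i + factor])
        for i in range(0, len(vector), factor)
    ]


@dataclass
class DocumentMemory:
    capacity: int = 2                                  # short-term slots (assumed)
    short_term: deque = field(default_factory=deque)   # fine-grained, recent text
    episodic: list = field(default_factory=list)       # compressed, evicted items
    semantic: dict = field(default_factory=dict)       # facts keyed by entity

    def write(self, vector, entity=None, fact=None):
        # When short-term memory is full, the oldest item is discarded from it
        # and kept in episodic memory in compressed form.
        if len(self.short_term) >= self.capacity:
            self.episodic.append(compress(self.short_term.popleft()))
        self.short_term.append(list(vector))
        if entity is not None and fact is not None:
            self.semantic.setdefault(entity, []).append(fact)

    def query(self, entity=None):
        # Return one value per memory type for the requesting network.
        return {
            "short_term": list(self.short_term),
            "episodic": list(self.episodic),
            "semantic": self.semantic.get(entity, []),
        }


memory = DocumentMemory(capacity=2)
memory.write([1.0, 3.0, 5.0, 7.0], entity="Alice", fact="is a pilot")
memory.write([2.0, 2.0, 2.0, 2.0])
memory.write([0.0, 4.0, 8.0, 8.0])   # evicts the first vector into episodic memory
result = memory.query("Alice")
```

After the third write, the first vector has left short-term memory and survives only as a shorter, averaged entry in the episodic store, while the fact about the entity remains available in the semantic store.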

Type: Application

Filed: June 24, 2021

Publication date: December 29, 2022

Inventors: Sri Vishnu Kumar KARLAPATI, Panagiota KARANASOU, Arnaud Vincent Pierre Yves JOLY, Alexis Pierre MOINET, Thomas Renaud DRUGMAN, Petr MAKAROV, Bajibabu BOLLEPALLI, Syed Ammar ABBAS, Simon SLANGEN
