Patents by Inventor Ladan GOLIPOUR

Ladan GOLIPOUR has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Generating genre appropriate voices for audio books

Patent number: 12380876

Abstract: Systems and processes for generating audio books from text are provided. An example process includes, at an electronic device having one or more processors and memory: receiving a text including at least a first subset and a second subset, wherein at least a portion of the first subset overlaps with at least a portion of the second subset; determining, based on the text, a prosody for a speech output, wherein the prosody is representative of a genre; determining a semantic meaning of the text; and generating, based on the prosody and the semantic meaning, the speech output of the text.

Type: Grant

Filed: October 31, 2022

Date of Patent: August 5, 2025

Assignee: Apple Inc.

Inventors: Ramya Rasipuram, William Beckman, Ladan Golipour, David A. Winarsky, Cheng-Chieh Yeh, Weicheng Zhang

GENERATING GENRE APPROPRIATE VOICES FOR AUDIO BOOKS

Publication number: 20230134970

Abstract: Systems and processes for generating audio books from text are provided. An example process includes, at an electronic device having one or more processors and memory: receiving a text including at least a first subset and a second subset, wherein at least a portion of the first subset overlaps with at least a portion of the second subset; determining, based on the text, a prosody for a speech output, wherein the prosody is representative of a genre; determining a semantic meaning of the text; and generating, based on the prosody and the semantic meaning, the speech output of the text.

Type: Application

Filed: October 31, 2022

Publication date: May 4, 2023

Inventors: Ramya RASIPURAM, William BECKMAN, Ladan GOLIPOUR, David A. WINARSKY, Cheng-Chieh YEH, Weicheng ZHANG

SYSTEM AND METHOD FOR TEXT NORMALIZATION USING ATOMIC TOKENS

Publication number: 20210272549

Abstract: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.

Type: Application

Filed: May 3, 2021

Publication date: September 2, 2021

Inventors: Ladan Golipour, Alistair D. Conkie

System and method for prosodically modified unit selection databases

Patent number: 11049491

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Grant

Filed: March 24, 2020

Date of Patent: June 29, 2021

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal

System and method for text normalization using atomic tokens

Patent number: 10997964

Abstract: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.

Type: Grant

Filed: August 16, 2019

Date of Patent: May 4, 2021

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Ladan Golipour, Alistair D. Conkie

SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES

Publication number: 20200227023

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Application

Filed: March 24, 2020

Publication date: July 16, 2020

Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal

System and method for prosodically modified unit selection databases

Patent number: 10607594

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Grant

Filed: March 29, 2019

Date of Patent: March 31, 2020

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal

SYSTEM AND METHOD FOR TEXT NORMALIZATION USING ATOMIC TOKENS

Publication number: 20190371294

Abstract: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.

Type: Application

Filed: August 16, 2019

Publication date: December 5, 2019

Inventors: Ladan GOLIPOUR, Alistair D. CONKIE

Text normalization based on a data-driven learning network

Patent number: 10395654

Abstract: Systems and processes for operating an intelligent automated assistant to perform text-to-speech conversion are provided. An example method includes, at an electronic device having one or more processors, receiving a text corpus comprising unstructured natural language text. The method further includes generating a sequence of normalized text based on the received text corpus; and generating a pronunciation sequence representing the sequence of the normalized text. The method further includes causing an audio output to be provided to the user based on the pronunciation sequence. At least one of the sequence of normalized text and the pronunciation sequence is generated based on a data-driven learning network.

Type: Grant

Filed: August 10, 2017

Date of Patent: August 27, 2019

Assignee: Apple Inc.

Inventors: Ladan Golipour, Matthias Neeracher, Ramya Rasipuram

System and method for text normalization using atomic tokens

Patent number: 10388270

Abstract: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.

Type: Grant

Filed: November 5, 2014

Date of Patent: August 20, 2019

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Ladan Golipour, Alistair D. Conkie

SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES

Publication number: 20190228761

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Application

Filed: March 29, 2019

Publication date: July 25, 2019

Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal

System and method for prosodically modified unit selection databases

Patent number: 10249290

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Grant

Filed: June 11, 2018

Date of Patent: April 2, 2019

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal

System and method for unified normalization in text-to-speech and automatic speech recognition

Patent number: 10199034

Abstract: A system, method and computer-readable storage devices are for using a single set of normalization protocols and a single language lexica (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition.

Type: Grant

Filed: August 18, 2014

Date of Patent: February 5, 2019

Assignee: AT&T INTELLECTUAL PROPERTY I, L.P.

Inventors: Alistair D. Conkie, Ladan Golipour

TEXT NORMALIZATION BASED ON A DATA-DRIVEN LEARNING NETWORK

Publication number: 20180330729

Abstract: Systems and processes for operating an intelligent automated assistant to perform text-to-speech conversion are provided. An example method includes, at an electronic device having one or more processors, receiving a text corpus comprising unstructured natural language text. The method further includes generating a sequence of normalized text based on the received text corpus; and generating a pronunciation sequence representing the sequence of the normalized text. The method further includes causing an audio output to be provided to the user based on the pronunciation sequence. At least one of the sequence of normalized text and the pronunciation sequence is generated based on a data-driven learning network.

Type: Application

Filed: August 10, 2017

Publication date: November 15, 2018

Inventors: Ladan GOLIPOUR, Matthias NEERACHER, Ramya RASIPURAM

SYSTEM AND METHOD FOR PROSODICALLY MODIFIED UNIT SELECTION DATABASES

Publication number: 20180293972

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Application

Filed: June 11, 2018

Publication date: October 11, 2018

Inventors: Alistair D. CONKIE, Ladan GOLIPOUR, Ann K. SYRDAL

System and method for prosodically modified unit selection databases

Patent number: 9997154

Abstract: Systems, methods, and computer-readable storage devices to improve the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units into the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving the output quality.

Type: Grant

Filed: May 12, 2014

Date of Patent: June 12, 2018

Assignee: AT&T Intellectual Property I, L.P.

Inventors: Alistair D. Conkie, Ladan Golipour, Ann K. Syrdal

Unit-selection text-to-speech synthesis based on predicted concatenation parameters

Patent number: 9934775

Abstract: Systems and processes for performing unit-selection text-to-speech synthesis are provided. In an example process, text to be converted to speech is received. The text is represented as a sequence of target units. A plurality of candidate speech segments corresponding to the sequence of target units are selected. Predicted statistical parameters of acoustic features associated with the sequence of target units are determined. The predicted statistical parameters of acoustic features are used to determine target costs and concatenation costs associated with the plurality of candidate speech segments. Based on a combined cost determined from the target costs and concatenation costs, a subset of candidate speech segments is selected from the plurality of candidate speech segments. Speech corresponding to the received text is generated using the subset of candidate speech segments.

Type: Grant

Filed: September 15, 2016

Date of Patent: April 3, 2018

Assignee: Apple Inc.

Inventors: Tuomo J. Raitio, Kishore Sunkeswari Prahallad, Alistair D. Conkie, Ladan Golipour, David A. Winarsky

UNIT-SELECTION TEXT-TO-SPEECH SYNTHESIS BASED ON PREDICTED CONCATENATION PARAMETERS

Publication number: 20170345411

Abstract: Systems and processes for performing unit-selection text-to-speech synthesis are provided. In an example process, text to be converted to speech is received. The text is represented as a sequence of target units. A plurality of candidate speech segments corresponding to the sequence of target units are selected. Predicted statistical parameters of acoustic features associated with the sequence of target units are determined. The predicted statistical parameters of acoustic features are used to determine target costs and concatenation costs associated with the plurality of candidate speech segments. Based on a combined cost determined from the target costs and concatenation costs, a subset of candidate speech segments is selected from the plurality of candidate speech segments. Speech corresponding to the received text is generated using the subset of candidate speech segments.

Type: Application

Filed: September 15, 2016

Publication date: November 30, 2017

Inventors: Tuomo J. RAITIO, Kishore Sunkeswari PRAHALLAD, Alistair D. CONKIE, Ladan GOLIPOUR, David A. WINARSKY

SYSTEM AND METHOD FOR TEXT NORMALIZATION USING ATOMIC TOKENS

Publication number: 20160125872

Abstract: A system, method and computer-readable storage devices are for normalizing text for ASR and TTS in a language-neutral way. The system described herein divides Unicode text into meaningful chunks called “atomic tokens.” The atomic tokens strongly correlate to their actual pronunciation, and not to their meaning. The system combines the tokenization with a data-driven classification scheme, followed by class-determined actions to convert text to normalized form. The classification labels are based on pronunciation, unlike alternative approaches that typically employ Named Entity-based categories. Thus, this approach is relatively simple to adapt to new languages. Non-experts can easily annotate training data because the tokens are based on pronunciation alone.

Type: Application

Filed: November 5, 2014

Publication date: May 5, 2016

Applicant: AT&T Intellectual Property I, L.P.

Inventors: Ladan GOLIPOUR, Alistair D. CONKIE

SYSTEM AND METHOD FOR UNIFIED NORMALIZATION IN TEXT-TO-SPEECH AND AUTOMATIC SPEECH RECOGNITION

Publication number: 20160049144

Abstract: A system, method and computer-readable storage devices are for using a single set of normalization protocols and a single language lexica (or dictionary) for both TTS and ASR. The system receives input (which is either text to be converted to speech or ASR training text), then normalizes the input. The system produces, using the normalized input and a dictionary configured for both automatic speech recognition and text-to-speech processing, output which is either phonemes corresponding to the input or text corresponding to the input for training the ASR system. When the output is phonemes corresponding to the input, the system generates speech by performing prosody generation and unit selection synthesis using the phonemes. When the output is text corresponding to the input, the system trains both an acoustic model and a language model for use in future speech recognition.

Type: Application

Filed: August 18, 2014

Publication date: February 18, 2016

Inventors: Alistair D. CONKIE, Ladan GOLIPOUR
