Patents Assigned to SANAS.AI INC.

System and method for style extraction in speech synthesis using neural networks and stored augmentations to simulate degraded speech characteristics

Patent number: 12374318

Abstract: The disclosed technology relates to methods, speech processing systems, and non-transitory computer readable media for style extraction in speech synthesis. In some examples, one or more content elements and one or more non-content elements are extracted from input audio data obtained via an audio interface and corresponding to input speech. The one or more non-content elements comprise style elements comprising at least an input pitch. A trained autoencoder is applied to encode the input pitch in a latent representation comprising a low-dimensional vector and combine the one or more content elements and the one or more non-content elements based on the low-dimensional vector to generate a new representation of the input speech. Output audio data is then generated and provided based on the new representation of the input speech. The output audio data comprises a pitch-consistent reconstruction of the input speech.

Type: Grant

Filed: July 25, 2024

Date of Patent: July 29, 2025

Assignee: SANAS.AI INC.

Inventors: Lukas Pfeifenberger, Shawn Zhang, Sharath Keshava Narayana
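
The abstract says a trained autoencoder compresses the input pitch into a low-dimensional latent vector. As a rough illustration only, the Python sketch below trains a tiny linear autoencoder by full-batch gradient descent on synthetic, normalized log-pitch contours and encodes each contour into a 2-D vector. The contour model (offset plus slope), the frame count, the latent size, the learning rate, and the class name `LinearPitchAutoencoder` are all hypothetical. The patented system uses a neural autoencoder whose architecture the abstract does not describe, and this sketch does not attempt the later step of combining the latent with content elements.

```python
import random

def make_contours(n, frames, rng):
    """Hypothetical normalized log-pitch contours: an offset plus a linear slope."""
    data = []
    for _ in range(n):
        offset = rng.uniform(-1.0, 1.0)  # overall pitch register
        slope = rng.uniform(-1.0, 1.0)   # rising or falling intonation
        data.append([offset + slope * (t / (frames - 1) - 0.5) for t in range(frames)])
    return data

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LinearPitchAutoencoder:
    """Hypothetical linear autoencoder: encoder (latent x frames), decoder (frames x latent)."""

    def __init__(self, frames, latent, rng):
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(frames)] for _ in range(latent)]
        self.w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(latent)] for _ in range(frames)]

    def encode(self, pitch):
        return matvec(self.w_enc, pitch)

    def decode(self, z):
        return matvec(self.w_dec, z)

    def loss(self, data):
        """Mean squared reconstruction error over all contours."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)

    def train_step(self, data, lr):
        """One full-batch gradient descent step on the mean squared error."""
        frames, latent = len(self.w_dec), len(self.w_enc)
        g_dec = [[0.0] * latent for _ in range(frames)]
        g_enc = [[0.0] * frames for _ in range(latent)]
        for x in data:
            z = self.encode(x)
            r = [yi - xi for yi, xi in zip(self.decode(z), x)]  # reconstruction residual
            # residual propagated back through the decoder (decoder transpose times r)
            back = [sum(self.w_dec[i][k] * r[i] for i in range(frames)) for k in range(latent)]
            for i in range(frames):
                for k in range(latent):
                    g_dec[i][k] += 2.0 * r[i] * z[k] / len(data)
            for k in range(latent):
                for j in range(frames):
                    g_enc[k][j] += 2.0 * back[k] * x[j] / len(data)
        for i in range(frames):
            for k in range(latent):
                self.w_dec[i][k] -= lr * g_dec[i][k]
        for k in range(latent):
            for j in range(frames):
                self.w_enc[k][j] -= lr * g_enc[k][j]

rng = random.Random(0)
contours = make_contours(20, 8, rng)
model = LinearPitchAutoencoder(frames=8, latent=2, rng=rng)
initial_loss = model.loss(contours)
for _ in range(500):
    model.train_step(contours, lr=0.05)
final_loss = model.loss(contours)
latent = model.encode(contours[0])  # 2-D style vector for one contour
```

Because the synthetic contours lie in a two-dimensional subspace, a 2-D latent is in principle enough to reconstruct them closely, which is the loose sense in which the decoded pitch stays consistent with the input. Real pitch contours would call for a nonlinear model and a larger latent.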

METHODS AND SYSTEMS FOR DETERMINING QUALITY ASSURANCE OF PARALLEL SPEECH UTTERANCES

Publication number: 20250046332

Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.

Type: Application

Filed: October 22, 2024

Publication date: February 6, 2025

Applicant: Sanas.ai Inc.

Inventors: Lukas PFEIFENBERGER, Shawn ZHANG
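
To make the two similarity signals concrete, the Python sketch below scores a candidate against a reference using 1-D per-frame feature sequences, a hypothetical stand-in for the acoustic and linguistic features the abstract mentions. It takes the peak normalized cross-correlation as the first similarity, computes an alignment difference as the gap between the cost of a straight, uniformly stretched path and the cost of the optimal dynamic time warping (DTW) path, and merges the two as `similarity * exp(-alignment_diff)`. That reading of "alignment difference of path-based distances" and the combination rule are assumptions, not the claimed method.

```python
import math

def normalized_xcorr_peak(a, b):
    """Peak of the normalized cross-correlation over all lags (value in [-1, 1])."""
    def standardize(x):
        m = sum(x) / len(x)
        s = math.sqrt(sum((v - m) ** 2 for v in x)) or 1.0
        return [(v - m) / s for v in x]  # zero mean, unit norm
    a, b = standardize(a), standardize(b)
    best = -1.0
    for lag in range(-(len(b) - 1), len(a)):
        acc = 0.0
        for j, bv in enumerate(b):
            i = j + lag
            if 0 <= i < len(a):
                acc += a[i] * bv
        best = max(best, acc)
    return best

def dtw_cost(a, b):
    """Classic DTW cost with an absolute-difference frame distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def linear_path_cost(a, b):
    """Cost along a straight, uniformly time-stretched path through the same grid."""
    steps = max(len(a), len(b))
    total = 0.0
    for k in range(steps):
        i = round(k * (len(a) - 1) / max(steps - 1, 1))
        j = round(k * (len(b) - 1) / max(steps - 1, 1))
        total += abs(a[i] - b[j])
    return total

def quality_metric(candidate, reference):
    """Hypothetical combination of correlation and alignment difference."""
    similarity = normalized_xcorr_peak(candidate, reference)
    steps = max(len(candidate), len(reference))
    alignment_diff = (linear_path_cost(candidate, reference)
                      - dtw_cost(candidate, reference)) / steps
    return similarity * math.exp(-alignment_diff)
```

An identical candidate scores 1.0. Because the straight path is itself a valid warping path, the DTW cost never exceeds it, so the alignment term only ever lowers the score; a candidate whose timing drifts from the reference pays through that term even when its features match after warping.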

METHODS AND SYSTEMS FOR DETERMINING QUALITY ASSURANCE OF PARALLEL SPEECH UTTERANCES

Publication number: 20240363135

Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.

Type: Application

Filed: March 22, 2024

Publication date: October 31, 2024

Applicant: Sanas.ai Inc.

Inventors: Lukas PFEIFENBERGER, Shawn ZHANG

System and method for automatic alignment of phonetic content for real-time accent conversion

Patent number: 12131745

Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.

Type: Grant

Filed: June 26, 2024

Date of Patent: October 29, 2024

Assignee: SANAS.AI INC.

Inventors: Lukas Pfeifenberger, Shawn Zhang
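
A minimal Python sketch of a monotonic alignment between the source phonetic embeddings and the model's transformed embeddings: each source frame is mapped to one transformed frame, indices never move backwards in time, and dynamic programming picks the mapping with the largest summed cosine similarity. The abstract phrases the objective as maximizing a cosine distance; scoring by similarity here is an interpretive assumption, as are the list-of-floats embedding format and the function names `cosine_similarity` and `monotonic_alignment`.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def monotonic_alignment(source, transformed):
    """Map every source frame to a transformed frame index, never going back
    in time, so that the summed cosine similarity is as large as possible."""
    n_src, n_tr = len(source), len(transformed)
    sim = [[cosine_similarity(s, t) for t in transformed] for s in source]
    score = [sim[0][:]]   # score[i][j]: best total with source frame i mapped to j
    back = [[-1] * n_tr]  # back[i][j]: chosen index for source frame i - 1
    for i in range(1, n_src):
        row, ptr = [], []
        best_prev, best_j = -math.inf, -1
        for j in range(n_tr):
            if score[i - 1][j] > best_prev:  # running max over all j' <= j
                best_prev, best_j = score[i - 1][j], j
            row.append(sim[i][j] + best_prev)
            ptr.append(best_j)
        score.append(row)
        back.append(ptr)
    # trace back from the best final index
    j = max(range(n_tr), key=lambda k: score[-1][k])
    path = [0] * n_src
    for i in range(n_src - 1, -1, -1):
        path[i] = j
        j = back[i][j]
    return path
```

Given the returned path, speech data could be gathered as `[speech_frames[j] for j in path]`, where `speech_frames` is a hypothetical per-frame list of target-accent speech aligned with `transformed`; the result has one speech frame per source frame, so it follows the timing of the input phonetic content.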

Methods for neural network-based voice enhancement and systems thereof

Patent number: 12125496

Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.

Type: Grant

Filed: April 24, 2024

Date of Patent: October 22, 2024

Assignee: SANAS.AI INC.

Inventors: Shawn Zhang, Lukas Pfeifenberger, Jason Wu, Piotr Dura, David Braude, Bajibabu Bollepalli, Alvaro Escudero, Gokce Keskin, Ankita Jha, Maxim Serebryakov
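
The frame-level flow in the abstract (fragment the audio, convert each frame, generate a target frame, recombine) can be sketched in Python with overlap-add. The two trained networks are replaced by identity placeholders (`encode`, `decode`), so the sketch shows only the framing and recombination; the frame length, hop size, and periodic Hann synthesis window are assumptions, not details from the patent.

```python
import math

def periodic_hann(n):
    # periodic Hann window; copies shifted by n / 2 sum to exactly 1
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def fragment(signal, frame_len, hop):
    """Split audio into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]

def overlap_add(frames, hop, window):
    """Window each frame and sum the overlapping parts back into one signal."""
    frame_len = len(window)
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        start = idx * hop
        for k in range(frame_len):
            out[start + k] += window[k] * frame[k]
    return out

def enhance(signal, frame_len=8, hop=4,
            encode=lambda frame: frame,     # placeholder for the first trained network
            decode=lambda latent: latent):  # placeholder for the second trained network
    frames = fragment(signal, frame_len, hop)
    targets = [decode(encode(frame)) for frame in frames]
    return overlap_add(targets, hop, periodic_hann(frame_len))
```

With a periodic Hann window at 50% overlap the shifted windows sum to one, so with identity placeholders the interior samples come back unchanged and only the edges are tapered. Swapping in real models would change what each target frame contains, not the recombination step.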

Real-time accent conversion model

Patent number: 11948550

Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent.

Type: Grant

Filed: August 27, 2021

Date of Patent: April 2, 2024

Assignee: SANAS.AI INC.

Inventors: Maxim Serebryakov, Shawn Zhang
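
The abstract describes a two-stage design: a first model trained on source-accent audio derives a linguistic representation, and a second model trained on both accents synthesizes target-accent audio from it. The Python skeleton below shows only that data flow for streamed microphone chunks. The class name `AccentConverter`, its field names, the chunk format, and the toy stand-in models in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

AudioChunk = List[float]        # hypothetical: raw samples from the microphone
Linguistic = List[List[float]]  # hypothetical: per-frame accent-independent features

@dataclass
class AccentConverter:
    source_accent: str
    target_accent: str
    # first model: trained on audio with the source accent
    derive_linguistic: Callable[[AudioChunk], Linguistic]
    # second model: trained on audio with both the source and target accents
    synthesize: Callable[[Linguistic, str], AudioChunk]

    def stream(self, chunks: Iterable[AudioChunk]) -> Iterator[AudioChunk]:
        # convert chunk by chunk so output can start before the speaker finishes
        for chunk in chunks:
            features = self.derive_linguistic(chunk)
            yield self.synthesize(features, self.target_accent)

# toy stand-ins so the skeleton runs; a real system would use trained networks
converter = AccentConverter(
    source_accent="accent-a",
    target_accent="accent-b",
    derive_linguistic=lambda chunk: [[sum(chunk) / len(chunk)]],
    synthesize=lambda features, accent: [features[0][0]] * 4,
)
outputs = list(converter.stream([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

Passing the target accent into `synthesize` mirrors the abstract's indication of a first and a second accent; whether the real models condition on it this way is not stated.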
