Patents by Inventor Lukas PFEIFENBERGER

Lukas PFEIFENBERGER has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250095665
    Abstract: The disclosed technology relates to methods, speech processing systems, and non-transitory computer readable media for real-time accent localization. In some examples, a geolocation of a first user device is determined, and accent features are extracted from first input speech, in response to first input audio data comprising the first input speech obtained from the first user device. Accent profiles identified based on the determined geolocation are compared to the extracted accent features to identify one of the accent profiles most closely matching the extracted accent features. Second input speech is modified to adjust an accent represented in the second input speech based on the identified one of the accent profiles. The second input speech with the adjusted accent is then provided to an audio interface of a second user device to improve communication bridging between users of the first and second user devices.
    Type: Application
    Filed: December 2, 2024
    Publication date: March 20, 2025
    Inventors: Ankita JHA, Piotr DURA, David BRAUDE, Lukas PFEIFENBERGER, Alvaro ESCUDERO, Shawn ZHANG, Maxim SEREBRYAKOV
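The profile-matching step described in publication 20250095665 can be pictured with a small sketch: among the accent profiles associated with the first user's geolocation, pick the one whose feature centroid is most similar to the accent features extracted from the incoming speech. Everything below (the AccentProfile structure, the region tags, and the use of cosine similarity over fixed-length feature vectors) is an assumption made for illustration, not the patented implementation.

```python
# Illustrative sketch only; names and the cosine-similarity criterion are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class AccentProfile:
    name: str
    region: str            # coarse geolocation tag, e.g. "US-South" (hypothetical)
    centroid: np.ndarray   # averaged accent-feature vector for this profile

def match_accent_profile(features: np.ndarray, region: str,
                         profiles: list[AccentProfile]) -> AccentProfile:
    """Pick the profile for the given region whose centroid is most similar
    to the accent features extracted from the first user's speech."""
    candidates = [p for p in profiles if p.region == region] or profiles
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(candidates, key=lambda p: cosine(features, p.centroid))

# Example with two hypothetical profiles and a dummy feature vector.
profiles = [
    AccentProfile("general_american", "US-West", np.array([0.9, 0.1, 0.0])),
    AccentProfile("southern_american", "US-South", np.array([0.2, 0.8, 0.1])),
]
best = match_accent_profile(np.array([0.3, 0.7, 0.2]), "US-South", profiles)
print(best.name)  # -> southern_american
```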
  • Publication number: 20250046332
    Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.
    Type: Application
    Filed: October 22, 2024
    Publication date: February 6, 2025
    Applicant: Sanas.ai Inc.
    Inventors: Lukas PFEIFENBERGER, Shawn ZHANG
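The quality metric described in publication 20250046332 combines a cross-correlation result with an alignment difference. The sketch below pairs a normalized cross-correlation of two feature trajectories with a dynamic-time-warping path cost standing in for the path-based distance; the feature choice, the DTW variant, and the equal weighting are illustrative assumptions, not the patent's formulation.

```python
# Illustrative sketch; the weighting and DTW variant are assumptions.
import numpy as np

def cross_correlation_score(ref: np.ndarray, cand: np.ndarray) -> float:
    """Peak of the normalized cross-correlation of two 1-D feature trajectories."""
    ref = (ref - ref.mean()) / (ref.std() + 1e-9)
    cand = (cand - cand.mean()) / (cand.std() + 1e-9)
    corr = np.correlate(ref, cand, mode="full") / max(len(ref), len(cand))
    return float(corr.max())

def dtw_alignment_cost(ref: np.ndarray, cand: np.ndarray) -> float:
    """Path-based distance between two frame sequences (rows = frames)."""
    n, m = len(ref), len(cand)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - cand[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m] / (n + m))   # length-normalized path cost

def quality_metric(ref_feats: np.ndarray, cand_feats: np.ndarray) -> float:
    corr = cross_correlation_score(ref_feats.mean(axis=1), cand_feats.mean(axis=1))
    align = dtw_alignment_cost(ref_feats, cand_feats)
    return 0.5 * corr + 0.5 * (1.0 / (1.0 + align))   # higher = more similar

rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 13))    # e.g. 100 frames of 13-dim features
print(quality_metric(ref, ref + 0.01 * rng.normal(size=ref.shape)))
```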
  • Publication number: 20250029622
    Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.
    Type: Application
    Filed: October 3, 2024
    Publication date: January 23, 2025
    Inventors: Lukas PFEIFENBERGER, Shawn ZHANG
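The alignment step described in publication 20250029622 compares the original and transformed phonetic embedding sets using a cosine measure. The sketch below reads this as matching each transformed embedding to its most cosine-similar source embedding; the greedy per-frame argmax and the noisy stand-in for the trained model's output are assumptions made for illustration, not the patented system.

```python
# Illustrative sketch; the greedy argmax alignment is an assumption.
import numpy as np

def cosine_alignment(source: np.ndarray, transformed: np.ndarray) -> np.ndarray:
    """For each transformed embedding, return the index of the source
    phonetic embedding it aligns to (rows = frames, cols = embedding dims)."""
    s = source / (np.linalg.norm(source, axis=1, keepdims=True) + 1e-9)
    t = transformed / (np.linalg.norm(transformed, axis=1, keepdims=True) + 1e-9)
    similarity = t @ s.T              # (target_frames, source_frames)
    return similarity.argmax(axis=1)  # best-matching source frame per target frame

rng = np.random.default_rng(1)
source_embeddings = rng.normal(size=(50, 192))   # hypothetical source-accent embeddings
transformed = source_embeddings + 0.1 * rng.normal(size=(50, 192))  # stand-in for model output
alignment = cosine_alignment(source_embeddings, transformed)
print(alignment[:10])   # indices into the source sequence
```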
  • Publication number: 20250029626
    Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.
    Type: Application
    Filed: October 4, 2024
    Publication date: January 23, 2025
    Inventors: Shawn ZHANG, Lukas PFEIFENBERGER, Jason WU, Piotr DURA, David BRAUDE, Bajibabu BOLLEPALLI, Alvaro ESCUDERO, Gokce KESKIN, Ankita JHA, Maxim SEREBRYAKOV
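The pipeline described in publication 20250029626 fragments audio into frames, maps them to low-dimensional representations, generates target frames from those representations, and recombines them. The sketch below mirrors that fragment / encode / decode / overlap-add structure with stand-in linear maps in place of the two trained neural networks; the frame size, hop, and latent size are arbitrary illustrative choices.

```python
# Schematic sketch; the linear "networks" below are stand-ins, not the patent's models.
import numpy as np

FRAME, HOP, LATENT = 512, 256, 64
rng = np.random.default_rng(2)
encoder = rng.normal(scale=0.05, size=(FRAME, LATENT))   # stand-in for the first network
decoder = rng.normal(scale=0.05, size=(LATENT, FRAME))   # stand-in for the second network

def enhance(audio: np.ndarray) -> np.ndarray:
    window = np.hanning(FRAME)
    out = np.zeros(len(audio) + FRAME)
    for start in range(0, len(audio) - FRAME, HOP):
        frame = audio[start:start + FRAME] * window   # fragment into input speech frames
        latent = np.tanh(frame @ encoder)             # low-dimensional representation
        target = latent @ decoder                     # generated target speech frame
        out[start:start + FRAME] += target * window   # combine frames via overlap-add
    return out[:len(audio)]

audio = rng.normal(size=16000)      # 1 s of dummy input at 16 kHz
print(enhance(audio).shape)         # (16000,)
```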
  • Publication number: 20250014587
    Abstract: The disclosed technology relates to methods, background noise suppression systems, and non-transitory computer readable media for background noise suppression. In some examples, frames fragmented from input audio data are projected into a higher dimension space than the input audio data. An estimated speech mask is applied to the frames to separate speech components and noise components of the frames. The speech components are then transformed into a feature domain of the input audio data by performing an inverse projection on the speech components to generate output audio data. The output audio data is provided via an audio interface. The output audio data advantageously comprises a noise-suppressed version of the input audio data.
    Type: Application
    Filed: September 19, 2024
    Publication date: January 9, 2025
    Inventors: Lukas PFEIFENBERGER, Shawn ZHANG, Monal PATEL, Maxim SEREBRYAKOV, Raj VARDHAN, Lan SHEK, Ankita JHA
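The structure described in publication 20250014587 is project, mask, inverse-project. The sketch below uses a short-time Fourier transform as the projection into a higher-dimension space and a crude energy threshold as the estimated speech mask; the patent's mask estimation is not reproduced here, and the threshold is purely an illustrative assumption.

```python
# Bare-bones sketch; the energy-threshold mask is an assumption, not the patented estimator.
import numpy as np

FRAME, HOP = 512, 256

def suppress_noise(audio: np.ndarray) -> np.ndarray:
    window = np.hanning(FRAME)
    out = np.zeros(len(audio) + FRAME)
    for start in range(0, len(audio) - FRAME, HOP):
        frame = audio[start:start + FRAME] * window
        spectrum = np.fft.rfft(frame)                    # projection into a higher-dimension space
        magnitude = np.abs(spectrum)
        mask = (magnitude > 0.5 * magnitude.mean()).astype(float)   # estimated speech mask (toy)
        speech = spectrum * mask                         # keep speech components, drop noise
        out[start:start + FRAME] += np.fft.irfft(speech, n=FRAME) * window  # inverse projection
    return out[:len(audio)]

rng = np.random.default_rng(3)
noisy = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000) + 0.3 * rng.normal(size=16000)
print(suppress_noise(noisy).shape)   # (16000,)
```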
  • Publication number: 20240363135
    Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.
    Type: Application
    Filed: March 22, 2024
    Publication date: October 31, 2024
    Applicant: Sanas.ai Inc.
    Inventors: Lukas PFEIFENBERGER, Shawn ZHANG
  • Patent number: 12131745
    Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.
    Type: Grant
    Filed: June 26, 2024
    Date of Patent: October 29, 2024
    Assignee: SANAS.AI INC.
    Inventors: Lukas Pfeifenberger, Shawn Zhang
  • Patent number: 12125496
    Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.
    Type: Grant
    Filed: April 24, 2024
    Date of Patent: October 22, 2024
    Assignee: SANAS.AI INC.
    Inventors: Shawn Zhang, Lukas Pfeifenberger, Jason Wu, Piotr Dura, David Braude, Bajibabu Bollepalli, Alvaro Escudero, Gokce Keskin, Ankita Jha, Maxim Serebryakov
  • Publication number: 20240347070
    Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.
    Type: Application
    Filed: June 26, 2024
    Publication date: October 17, 2024
    Inventors: Lukas PFEIFENBERGER, Shawn ZHANG