Patents by Inventor Shawn Zhang
Shawn Zhang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12374318
Abstract: The disclosed technology relates to methods, speech processing systems, and non-transitory computer readable media for style extraction in speech synthesis. In some examples, one or more content elements and one or more non-content elements are extracted from input audio data obtained via an audio interface and corresponding to input speech. The one or more non-content elements comprise style elements comprising at least an input pitch. A trained autoencoder is applied to encode the input pitch in a latent representation comprising a low-dimensional vector and combine the one or more content elements and the one or more non-content elements based on the low-dimensional vector to generate a new representation of the input speech. Output audio data is then generated and provided based on the new representation of the input speech. The output audio data comprises a pitch-consistent reconstruction of the input speech.
Type: Grant
Filed: July 25, 2024
Date of Patent: July 29, 2025
Assignee: SANAS.AI INC.
Inventors: Lukas Pfeifenberger, Shawn Zhang, Sharath Keshava Narayana
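As a rough illustration of the data flow this abstract describes (a pitch contour compressed into a low-dimensional style latent, then recombined with content elements), here is a minimal numpy sketch. Every dimension, weight, and name is invented; the patented trained autoencoder is not public.

```python
# Toy sketch of the pitch-style bottleneck described in the abstract.
# All shapes and the use of plain linear maps are assumptions.
import numpy as np

rng = np.random.default_rng(0)

PITCH_FRAMES = 200   # hypothetical length of the input pitch contour
LATENT_DIM = 8       # the "low-dimensional vector" for the style latent
CONTENT_DIM = 64     # hypothetical content-embedding size

# Stand-ins for trained encoder/decoder weights.
W_enc = rng.normal(size=(LATENT_DIM, PITCH_FRAMES)) / np.sqrt(PITCH_FRAMES)
W_dec = rng.normal(size=(PITCH_FRAMES, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def encode_pitch(pitch_contour: np.ndarray) -> np.ndarray:
    """Compress an input pitch contour into a low-dimensional style latent."""
    return W_enc @ pitch_contour

def combine(content: np.ndarray, style_latent: np.ndarray) -> np.ndarray:
    """Join content elements with the style latent into a new representation."""
    return np.concatenate([content, style_latent])

pitch = 120.0 + 20.0 * np.sin(np.linspace(0, 4 * np.pi, PITCH_FRAMES))  # fake F0 track (Hz)
content = rng.normal(size=CONTENT_DIM)                                   # fake content elements

z = encode_pitch(pitch)
new_repr = combine(content, z)
reconstructed_pitch = W_dec @ z  # target of the pitch-consistent reconstruction
print(new_repr.shape, reconstructed_pitch.shape)
```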
-
Publication number: 20250190240
Abstract: Techniques for providing adaptive warehouses in a multi-tenant data system are described. Warehouse endpoints in a warehouse layer can be defined for an account in the multi-tenant data system. A compute layer for the account can be divided into workload regions, where each workload region corresponds to a different workload type. The workloads for the account can then be multiplexed in the adaptive warehouse environment.
Type: Application
Filed: December 6, 2024
Publication date: June 12, 2025
Inventors: Prayag Chandran Nirmala, Samartha Chandrashekar, Jason Polites, Jeffrey Rosen, David Ruiz, Michael Uhlar, William Waddington, Shawn Zhang
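The division of a compute layer into per-type workload regions can be pictured as a simple routing table, as in the sketch below. This is illustrative only; the region names, workload types, and routing rule are invented, and the actual multi-tenant implementation is not public.

```python
# Hypothetical routing of account workloads to per-type compute regions.
from dataclasses import dataclass

@dataclass
class WorkloadRegion:
    name: str
    workload_type: str   # e.g. "etl", "bi", "adhoc" (invented labels)

REGIONS = [
    WorkloadRegion("region-etl", "etl"),
    WorkloadRegion("region-bi", "bi"),
    WorkloadRegion("region-adhoc", "adhoc"),
]

def route(workload_type: str) -> WorkloadRegion:
    """Multiplex a workload onto the region dedicated to its type."""
    for region in REGIONS:
        if region.workload_type == workload_type:
            return region
    raise ValueError(f"no region for workload type {workload_type!r}")

print(route("bi").name)  # -> region-bi
```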
-
Publication number: 20250166603
Abstract: The disclosed technology relates to methods, speech processing systems, and non-transitory computer readable media for real-time accent mimicking. In some examples, trained machine learning model(s) are applied to first input audio data to extract accent features of first input speech associated with a first accent of a first user. Second input audio data, associated with second input speech in a second accent of a second user, is analyzed to generate characteristics specific to the natural voice of the second user. A modified version of the second input speech is synthesized based on the generated characteristics and the extracted accent features. The modified version of the second input speech advantageously preserves aspects of the natural voice of the second user and mimics the first accent. Output audio data generated based on the modified version of the second input speech is provided for output via an audio output device.
Type: Application
Filed: January 17, 2025
Publication date: May 22, 2025
Inventors: Ankita Jha, Lukas Pfeifenberger, Piotr Dura, David Braude, Alvaro Escudero, Shawn Zhang, Maxim Serebryakov
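The combination step (speaker 2's voice characteristics conditioned on speaker 1's accent features) might look schematically like the following numpy sketch. The feature extractors and synthesizer are stand-ins, since the trained models are not public.

```python
# Schematic only: build a conditioning vector that keeps speaker 2's voice
# while carrying speaker 1's accent. All dimensions are invented.
import numpy as np

rng = np.random.default_rng(1)

accent_features = rng.normal(size=32)   # pretend output of the trained accent model (speaker 1)
voice_profile = rng.normal(size=32)     # pretend natural-voice characteristics (speaker 2)

def condition_synthesis(voice: np.ndarray, accent: np.ndarray) -> np.ndarray:
    """Combine voice characteristics with accent features for the synthesizer."""
    return np.concatenate([voice, accent])

conditioning = condition_synthesis(voice_profile, accent_features)
# A real system would feed `conditioning` into a neural synthesizer here.
print(conditioning.shape)
```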
-
Publication number: 20250095665
Abstract: The disclosed technology relates to methods, speech processing systems, and non-transitory computer readable media for real-time accent localization. In some examples, a geolocation of a first user device is determined, and accent features are extracted from first input speech, in response to first input audio data comprising the first input speech obtained from the first user device. Accent profiles identified based on the determined geolocation are compared to the extracted accent features to identify the accent profile most closely matching the extracted accent features. Second input speech is modified to adjust an accent represented in the second input speech based on the identified accent profile. The second input speech with the adjusted accent is then provided to an audio interface of a second user device to improve communication bridging between users of the first and second user devices.
Type: Application
Filed: December 2, 2024
Publication date: March 20, 2025
Inventors: Ankita Jha, Piotr Dura, David Braude, Lukas Pfeifenberger, Alvaro Escudero, Shawn Zhang, Maxim Serebryakov
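The profile-matching step can be sketched as a nearest-neighbor search over geolocation-filtered accent profiles. The profiles, feature dimensions, and the use of cosine similarity as the comparison are assumptions for illustration.

```python
# Pick the accent profile whose features best match the extracted ones.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(2)
profiles = {                      # hypothetical profiles already filtered by geolocation
    "en-US-south": rng.normal(size=16),
    "en-US-midwest": rng.normal(size=16),
}
extracted = rng.normal(size=16)   # stand-in for features from the first input speech

best = max(profiles, key=lambda name: cosine(profiles[name], extracted))
print("closest profile:", best)
```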
-
Publication number: 20250046332
Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.
Type: Application
Filed: October 22, 2024
Publication date: February 6, 2025
Applicant: Sanas.ai Inc.
Inventors: Lukas Pfeifenberger, Shawn Zhang
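A minimal sketch of the two signals the abstract combines: a normalized cross-correlation peak, plus a dynamic-time-warping path cost as a stand-in for the path-based alignment difference. The weighting of the two into a single quality metric is invented; the patent does not publish one.

```python
import numpy as np

def peak_xcorr(x: np.ndarray, y: np.ndarray) -> float:
    """Peak of the normalized cross-correlation (first degree of similarity)."""
    x = (x - x.mean()) / (x.std() + 1e-9)
    y = (y - y.mean()) / (y.std() + 1e-9)
    return float(np.correlate(x, y, mode="full").max() / len(x))

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Dynamic-time-warping path cost (stand-in for the alignment difference)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

rng = np.random.default_rng(7)
reference = np.sin(np.linspace(0, 6, 100))                               # fake reference track
candidate = np.sin(np.linspace(0, 6, 100)) + 0.05 * rng.normal(size=100)  # fake candidate track

# Invented combination of the two similarity signals into one quality metric.
quality = peak_xcorr(reference, candidate) - 0.01 * dtw_distance(reference, candidate)
print(round(quality, 3))
```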
-
Publication number: 20250029622
Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.
Type: Application
Filed: October 3, 2024
Publication date: January 23, 2025
Inventors: Lukas Pfeifenberger, Shawn Zhang
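A toy sketch of the embedding pipeline: a stand-in linear map plays the role of the trained model, and each source embedding is matched to its best-scoring transformed embedding by cosine score. Shapes, weights, and the exact scoring criterion are assumptions, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(3)
EMB = 24
source = rng.normal(size=(10, EMB))                 # phonetic embeddings (source accent)
W = rng.normal(size=(EMB, EMB)) / np.sqrt(EMB)      # stand-in for the trained model
transformed = source @ W                            # target-accent phonetic characteristics

def cosine_matrix(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise cosine scores between two sets of embedding vectors."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

alignment = cosine_matrix(source, transformed).argmax(axis=1)  # best match per source frame
print(alignment)
```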
-
Publication number: 20250029626
Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.
Type: Application
Filed: October 4, 2024
Publication date: January 23, 2025
Inventors: Shawn Zhang, Lukas Pfeifenberger, Jason Wu, Piotr Dura, David Braude, Bajibabu Bollepalli, Alvaro Escudero, Gokce Keskin, Ankita Jha, Maxim Serebryakov
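The two-network pipeline reduces, schematically, to: frame the audio, encode each frame to a low-dimensional representation, decode to target frames, and recombine. The linear maps below are crude stand-ins for the two trained neural networks; all sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(4)
FRAME = 256
LOW = 16

W1 = rng.normal(size=(LOW, FRAME)) / np.sqrt(FRAME)   # stand-in for the first trained network
W2 = rng.normal(size=(FRAME, LOW)) / np.sqrt(LOW)     # stand-in for the second trained network

audio = rng.normal(size=FRAME * 8)                    # fake input audio data

frames = audio.reshape(-1, FRAME)                     # fragmentation into input speech frames
latents = frames @ W1.T                               # low-dimensional representations
targets = latents @ W2.T                              # target speech frames
output = targets.reshape(-1)                          # combined output audio data
print(output.shape)
```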
-
Publication number: 20250014587
Abstract: The disclosed technology relates to methods, background noise suppression systems, and non-transitory computer readable media for background noise suppression. In some examples, frames fragmented from input audio data are projected into a higher-dimensional space than the input audio data. An estimated speech mask is applied to the frames to separate speech components and noise components of the frames. The speech components are then transformed into a feature domain of the input audio data by performing an inverse projection on the speech components to generate output audio data. The output audio data is provided via an audio interface. The output audio data advantageously comprises a noise-suppressed version of the input audio data.
Type: Application
Filed: September 19, 2024
Publication date: January 9, 2025
Inventors: Lukas Pfeifenberger, Shawn Zhang, Monal Patel, Maxim Serebryakov, Raj Vardhan, Lan Shek, Ankita Jha
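One common way to realize "project to a higher-dimensional space, mask, inverse-project" is a spectral mask, sketched below with an FFT as the projection. The random mask stands in for the estimated speech mask; the patent's actual projection and mask estimator are not public.

```python
import numpy as np

rng = np.random.default_rng(5)
FRAME = 256
audio = rng.normal(size=FRAME * 4)                       # stand-in noisy input audio

frames = audio.reshape(-1, FRAME)                        # fragment into frames
spectra = np.fft.rfft(frames, axis=1)                    # projection into a higher-dimensional (spectral) space
mask = np.clip(rng.random(spectra.shape), 0, 1)          # stand-in for the estimated speech mask
speech = np.fft.irfft(spectra * mask, n=FRAME, axis=1)   # inverse projection back to the feature domain
output = speech.reshape(-1)                              # noise-suppressed version of the input
print(output.shape)
```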
-
Publication number: 20240386875
Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent.
Type: Application
Filed: July 30, 2024
Publication date: November 21, 2024
Inventors: Maxim Serebryakov, Shawn Zhang
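Schematically this is a two-stage pipeline: derive an accent-neutral linguistic representation, then synthesize it in the target accent. Both functions below are crude stand-ins for the two machine-learning algorithms; the shapes and operations are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)

def derive_linguistic_representation(audio: np.ndarray) -> np.ndarray:
    """Stand-in for the first model: map accented speech to a linguistic code."""
    return audio.reshape(-1, 160).mean(axis=1)   # crude frame-level pooling, illustration only

def synthesize_with_accent(linguistic: np.ndarray, accent_id: int) -> np.ndarray:
    """Stand-in for the second model: render the linguistic code in the target accent."""
    rng2 = np.random.default_rng(accent_id)      # accent_id selects a fake accent "voice"
    basis = rng2.normal(size=(linguistic.size, 160))
    return (linguistic[:, None] * basis).reshape(-1)

speech_first_accent = rng.normal(size=160 * 50)  # fake microphone input
linguistic = derive_linguistic_representation(speech_first_accent)
speech_second_accent = synthesize_with_accent(linguistic, accent_id=2)
print(speech_second_accent.shape)
```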
-
Publication number: 20240363135
Abstract: The disclosed technology relates to methods, voice conversion systems, and non-transitory computer readable media for determining quality assurance of parallel speech utterances. In some examples, a candidate utterance and a reference utterance in obtained audio data are converted into first and second time series sequence representations, respectively, using acoustic features and linguistic features. A cross-correlation of the first and second time series sequence representations is performed to generate a result representing a first degree of similarity between the first and second time series sequence representations. An alignment difference of path-based distances between the reference and candidate speech utterances is generated. A quality metric is then output, which is generated based on the result of the cross-correlation and the alignment difference. The quality metric is indicative of a second degree of similarity between the candidate and reference utterances.
Type: Application
Filed: March 22, 2024
Publication date: October 31, 2024
Applicant: Sanas.ai Inc.
Inventors: Lukas Pfeifenberger, Shawn Zhang
-
Patent number: 12131745
Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.
Type: Grant
Filed: June 26, 2024
Date of Patent: October 29, 2024
Assignee: SANAS.AI INC.
Inventors: Lukas Pfeifenberger, Shawn Zhang
-
Patent number: 12125496
Abstract: The disclosed technology relates to methods, voice enhancement systems, and non-transitory computer readable media for real-time voice enhancement. In some examples, input audio data including foreground speech content, non-content elements, and speech characteristics is fragmented into input speech frames. The input speech frames are converted to low-dimensional representations of the input speech frames. One or more of the fragmentation or the conversion is based on an application of a first trained neural network to the input audio data. The low-dimensional representations of the input speech frames omit one or more of the non-content elements. A second trained neural network is applied to the low-dimensional representations of the input speech frames to generate target speech frames. The target speech frames are combined to generate output audio data. The output audio data further includes one or more portions of the foreground speech content and one or more of the speech characteristics.
Type: Grant
Filed: April 24, 2024
Date of Patent: October 22, 2024
Assignee: SANAS.AI INC.
Inventors: Shawn Zhang, Lukas Pfeifenberger, Jason Wu, Piotr Dura, David Braude, Bajibabu Bollepalli, Alvaro Escudero, Gokce Keskin, Ankita Jha, Maxim Serebryakov
-
Publication number: 20240347070
Abstract: The disclosed technology relates to methods, accent conversion systems, and non-transitory computer readable media for real-time accent conversion. In some examples, a set of phonetic embedding vectors is obtained for phonetic content representing a source accent and obtained from input audio data. A trained machine learning model is applied to the set of phonetic embedding vectors to generate a set of transformed phonetic embedding vectors corresponding to phonetic characteristics of speech data in a target accent. An alignment is determined by maximizing a cosine distance between the set of phonetic embedding vectors and the set of transformed phonetic embedding vectors. The speech data is then aligned to the phonetic content based on the determined alignment to generate output audio data representing the target accent.
Type: Application
Filed: June 26, 2024
Publication date: October 17, 2024
Inventors: Lukas Pfeifenberger, Shawn Zhang
-
Publication number: 20240265908
Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent.
Type: Application
Filed: March 5, 2024
Publication date: August 8, 2024
Inventors: Maxim Serebryakov, Shawn Zhang
-
Patent number: 11948550
Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent.
Type: Grant
Filed: August 27, 2021
Date of Patent: April 2, 2024
Assignee: SANAS.AI INC.
Inventors: Maxim Serebryakov, Shawn Zhang
-
Publication number: 20220358903
Abstract: Techniques for real-time accent conversion are described herein. An example computing device receives an indication of a first accent and a second accent. The computing device further receives, via at least one microphone, speech content having the first accent. The computing device is configured to derive, using a first machine-learning algorithm trained with audio data including the first accent, a linguistic representation of the received speech content having the first accent. The computing device is configured to, based on the derived linguistic representation of the received speech content having the first accent, synthesize, using a second machine-learning algorithm trained with (i) audio data comprising the first accent and (ii) audio data including the second accent, audio data representative of the received speech content having the second accent.
Type: Application
Filed: August 27, 2021
Publication date: November 10, 2022
Inventors: Maxim Serebryakov, Shawn Zhang
-
Publication number: 20210384559
Abstract: Methods and systems are provided for predictive thermal models for determining current and power capabilities of battery components of a battery-powered system. In one example, a method may include measuring a reference temperature of a first component of the battery-powered system, correlating a target temperature of a second component of the battery-powered system to the reference temperature, determining a maximum current manageable by the second component over a predetermined duration based on the target temperature, and, responsive to a requested current at the second component exceeding the maximum current during the predetermined duration, adjusting one or more operating conditions of the battery-powered system to maintain the actual current below the maximum current. In some examples, the first component may be different from the second component.
Type: Application
Filed: June 4, 2021
Publication date: December 9, 2021
Inventors: Wei Zhao, Yufeng Liu, Shawn Zhang
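A minimal worked sketch of the abstract's logic, assuming a linear temperature correlation and a lumped I^2*R heating model. All coefficients are invented; real values would come from component characterization.

```python
import math

# Invented parameters for illustration only.
ALPHA, BETA = 1.1, 4.0        # linear correlation: T_target ~= ALPHA * T_ref + BETA
R_THERMAL = 0.002             # ohm, resistive heating path of the second component
C_THERMAL = 50.0              # J/K, lumped thermal mass of the second component
T_LIMIT = 85.0                # deg C, component temperature limit

def target_temperature(t_ref: float) -> float:
    """Correlate the second component's temperature to the measured reference."""
    return ALPHA * t_ref + BETA

def max_current(t_ref: float, duration_s: float) -> float:
    """Largest current the second component can carry for duration_s seconds
    without exceeding T_LIMIT, under a lumped I^2*R heating model."""
    headroom = max(T_LIMIT - target_temperature(t_ref), 0.0)
    return math.sqrt(headroom * C_THERMAL / (R_THERMAL * duration_s))

def limit_request(requested_a: float, t_ref: float, duration_s: float) -> float:
    """Derate the requested current if it exceeds the thermal maximum."""
    return min(requested_a, max_current(t_ref, duration_s))

print(round(limit_request(900.0, t_ref=40.0, duration_s=10.0), 1))  # -> ~304.1 A
```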
-
Patent number: 8648216
Abstract: One or more embodiments of the invention are directed to synthetic methods for making lepidopteran pheromones, including navel orangeworm pheromones. The synthetic methods involve novel, efficient, and environmentally benign steps and procedures.
Type: Grant
Filed: November 19, 2012
Date of Patent: February 11, 2014
Inventors: Andrew S. Thompson, Xiongzhi Shawn Zhang, Lonnie Robarge
-
Patent number: D692385
Type: Grant
Filed: October 22, 2012
Date of Patent: October 29, 2013
Assignee: Cooper Technologies Company
Inventors: Rohit Sumerchand Dodal, Shawn Zhang, Carlos Eduardo Restrepo
-
Patent number: D705734
Type: Grant
Filed: September 10, 2013
Date of Patent: May 27, 2014
Assignee: Cooper Technologies Company
Inventors: Rohit Sumerchand Dodal, Shawn Zhang, Carlos Eduardo Restrepo