Patents by Inventor Tianyan ZHOU

Tianyan ZHOU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Convolutional neural network with phonetic attention for speaker verification

Patent number: 11776548

Abstract: Embodiments may include determination, for each of a plurality of speech frames associated with an acoustic feature, of a phonetic feature based on the associated acoustic feature, generation of one or more two-dimensional feature maps based on the plurality of phonetic features, input of the one or more two-dimensional feature maps to a trained neural network to generate a plurality of speaker embeddings, and aggregation of the plurality of speaker embeddings into a speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, wherein the speaker embedding is associated with an identity of the speaker.

Type: Grant

Filed: February 7, 2022

Date of Patent: October 3, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yong Zhao, Tianyan Zhou, Jinyu Li, Yifan Gong, Jian Wu, Zhuo Chen
CONVOLUTIONAL NEURAL NETWORK WITH PHONETIC ATTENTION FOR SPEAKER VERIFICATION

Publication number: 20220157324

Abstract: Embodiments may include determination, for each of a plurality of speech frames associated with an acoustic feature, of a phonetic feature based on the associated acoustic feature, generation of one or more two-dimensional feature maps based on the plurality of phonetic features, input of the one or more two-dimensional feature maps to a trained neural network to generate a plurality of speaker embeddings, and aggregation of the plurality of speaker embeddings into a speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, wherein the speaker embedding is associated with an identity of the speaker.

Type: Application

Filed: February 7, 2022

Publication date: May 19, 2022

Inventors: Yong ZHAO, Tianyan ZHOU, Jinyu LI, Yifan GONG, Jian WU, Zhuo CHEN
Convolutional neural network with phonetic attention for speaker verification

Patent number: 11276410

Abstract: Embodiments may include reception of a plurality of speech frames, determination of a multi-dimensional acoustic feature associated with each of the plurality of speech frames, determination of a plurality of multi-dimensional phonetic features, each of the plurality of multi-dimensional phonetic features determined based on a respective one of the plurality of speech frames, generation of a plurality of two-dimensional feature maps based on the phonetic features, input of the feature maps and the plurality of acoustic features to a convolutional neural network, the convolutional neural network to generate a plurality of speaker embeddings based on the plurality of feature maps and the plurality of acoustic features, aggregation of the plurality of speaker embeddings into a first speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, and determination of a speaker associated with the plurality of speech frames based on the first speaker embedding.

Type: Grant

Filed: November 13, 2019

Date of Patent: March 15, 2022

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Yong Zhao, Tianyan Zhou, Jinyu Li, Yifan Gong, Jian Wu, Zhuo Chen
CONVOLUTIONAL NEURAL NETWORK WITH PHONETIC ATTENTION FOR SPEAKER VERIFICATION

Publication number: 20210082438

Abstract: Embodiments may include reception of a plurality of speech frames, determination of a multi-dimensional acoustic feature associated with each of the plurality of speech frames, determination of a plurality of multi-dimensional phonetic features, each of the plurality of multi-dimensional phonetic features determined based on a respective one of the plurality of speech frames, generation of a plurality of two-dimensional feature maps based on the phonetic features, input of the feature maps and the plurality of acoustic features to a convolutional neural network, the convolutional neural network to generate a plurality of speaker embeddings based on the plurality of feature maps and the plurality of acoustic features, aggregation of the plurality of speaker embeddings into a first speaker embedding based on respective weights determined for each of the plurality of speaker embeddings, and determination of a speaker associated with the plurality of speech frames based on the first speaker embedding.

Type: Application

Filed: November 13, 2019

Publication date: March 18, 2021

Inventors: Yong ZHAO, Tianyan ZHOU, Jinyu LI, Yifan GONG, Jian WU, Zhuo CHEN

Convolutional neural network with phonetic attention for speaker verification

CONVOLUTIONAL NEURAL NETWORK WITH PHONETIC ATTENTION FOR SPEAKER VERIFICATION

Convolutional neural network with phonetic attention for speaker verification

CONVOLUTIONAL NEURAL NETWORK WITH PHONETIC ATTENTION FOR SPEAKER VERIFICATION