Patents by Inventor Takuya Yoshioka
Takuya Yoshioka has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10957337
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Grant
Filed: May 29, 2018
Date of Patent: March 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
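The mask-and-apply pipeline this abstract describes can be sketched with toy arrays. The neural network is replaced here by random mask logits (the abstract does not specify an architecture), and all shapes and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: an STFT of a two-speaker mixture with F frequency
# bins and T frames, plus the logits a trained separation network would emit.
F, T = 257, 100
mixture_stft = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
logits = rng.standard_normal((2, F, T))

# Softmax across the speaker axis so the two masks sum to 1 in every TF bin.
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Element-wise masking yields one speaker-specific spectrogram per mask.
speaker_stfts = masks * mixture_stft  # shape (2, F, T)
```

Because the masks are softmax-normalized per time-frequency bin, the two masked spectrograms sum back to the mixture; a real system would invert each masked STFT to obtain speaker-specific waveforms.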
-
Publication number: 20210076129
Abstract: A system and method include reception of a first plurality of audio signals; generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions; generation of a first TF mask for a first output channel based on the first plurality of audio signals; determination of a first beamformer direction associated with a first target sound source based on the first TF mask; generation of first features based on the first beamformer direction and the first plurality of audio signals; determination of a second TF mask based on the first features; and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Application
Filed: November 17, 2020
Publication date: March 11, 2021
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
-
Patent number: 10856076
Abstract: A system and method include reception of a first plurality of audio signals; generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions; generation of a first TF mask for a first output channel based on the first plurality of audio signals; determination of a first beamformer direction associated with a first target sound source based on the first TF mask; generation of first features based on the first beamformer direction and the first plurality of audio signals; determination of a second TF mask based on the first features; and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Grant
Filed: April 5, 2019
Date of Patent: December 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
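A minimal sketch of the two-stage idea in this abstract: form fixed beamformed outputs for several look directions, pick one direction (hard-coded below, whereas the patent derives it from a first-stage TF mask), and refine that beamformed output with a second TF mask. The beamformer weights and masks here are random stand-ins, not the patented estimators:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: M microphone channels, B fixed beamformer directions,
# STFTs of F frequency bins by T frames.
M, B, F, T = 4, 8, 129, 50
mic_stfts = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))

# Fixed beamformer weights per direction and frequency (random unit-norm
# stand-ins; a real system would use steering vectors per look direction).
w = rng.standard_normal((B, M, F)) + 1j * rng.standard_normal((B, M, F))
w /= np.linalg.norm(w, axis=1, keepdims=True)

# Beamforming: weighted sum over microphones for each direction.
beamformed = np.einsum('bmf,mft->bft', w.conj(), mic_stfts)  # (B, F, T)

# Suppose a first-stage TF mask pointed us at direction b_star; a
# second-stage mask then refines the beamformed output for that direction.
b_star = 3
tf_mask = rng.uniform(0.0, 1.0, size=(F, T))
enhanced = tf_mask * beamformed[b_star]
```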
-
Patent number: 10839822
Abstract: Representative embodiments disclose mechanisms to separate and recognize multiple audio sources (e.g., picking out individual speakers) in an environment where they overlap and interfere with each other. The architecture uses a microphone array to spatially separate the audio signals. The spatially filtered signals are then input into a plurality of separators, so each signal is input into a corresponding separator. The separators use neural networks to separate out audio sources and typically produce multiple output signals for a single input signal. A post-selection processor then assesses the separator outputs to pick the signals with the highest quality. These signals can be used in a variety of systems such as speech recognition, meeting transcription and enhancement, hearing aids, music information retrieval, and speech enhancement.
Type: Grant
Filed: November 6, 2017
Date of Patent: November 17, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong
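The separate-then-select stage can be caricatured as follows. The quality score here is a simple power proxy invented for illustration; the patent's post-selection processor would use a real quality measure (e.g., an SNR or recognizer-confidence estimate), and every name and shape below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical outputs: S separators each emit K candidate signals of length L.
S, K, L = 3, 2, 16000
candidates = rng.standard_normal((S, K, L))

def quality_score(signal):
    """Illustrative quality proxy (mean power); a real post-selector would
    use a learned or signal-based quality estimate instead."""
    return float(np.mean(signal ** 2))

# Post selection: keep the highest-scoring candidate from each separator.
best = [max(sep, key=quality_score) for sep in candidates]
```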
-
Publication number: 20200349230
Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to one of the distributed devices and determines the user's preferred language. A transcript is generated from the received audio streams, and the meeting system translates it into the user's preferred language to form a translated transcript, which is provided to the user's device.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200349949
Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that they represent sound from the same ad-hoc meeting, generating a meeting instance to process the audio streams in response to that determination, and processing the received audio streams to generate a transcript of the ad-hoc meeting.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200351603
Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker-attributed meeting transcript.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
-
Publication number: 20200349950
Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200349954
Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200349953
Abstract: A computer implemented method includes receiving information streams on a meeting server from a set of multiple distributed devices included in a meeting, receiving audio signals representative of speech by at least two users in at least two of the information streams, receiving at least one video signal of at least one user in the information streams, associating a specific user with speech in the received audio signals as a function of the received audio and video signals, and generating a transcript of the meeting with an indication of the specific user associated with the speech.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200335119
Abstract: Embodiments are associated with determination of a first plurality of multi-dimensional vectors, each of the first plurality of multi-dimensional vectors representing speech of a target speaker; determination of a multi-dimensional vector representing a speech signal of two or more speakers; determination of a weighted vector representing speech of the target speaker based on the first plurality of multi-dimensional vectors and on similarities between the multi-dimensional vector and each of the first plurality of multi-dimensional vectors; and extraction of speech of the target speaker from the speech signal based on the weighted vector and the speech signal.
Type: Application
Filed: June 7, 2019
Publication date: October 22, 2020
Inventors: Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Changliang Liu, Hakan Erdogan, Dimitrios Basile Dimitriadis, Yifan Gong, James Garnet Droppo, III
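The weighted target-speaker vector can be sketched as attention over enrollment embeddings: similarities between the mixture embedding and each enrollment embedding become softmax weights, and their weighted sum conditions the extraction network. The embeddings below are random placeholders, and the cosine-similarity/softmax choice is an assumption rather than the patented formulation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical d-vectors: N enrollment embeddings for the target speaker
# and one embedding computed from the mixed (multi-speaker) signal.
N, D = 5, 128
enrollment = rng.standard_normal((N, D))
mixture_vec = rng.standard_normal(D)

# Cosine similarity between the mixture embedding and each enrollment vector.
sims = enrollment @ mixture_vec / (
    np.linalg.norm(enrollment, axis=1) * np.linalg.norm(mixture_vec))

# Softmax the similarities into weights and form the weighted
# target-speaker vector that would condition the extraction network.
weights = np.exp(sims) / np.exp(sims).sum()
target_vec = weights @ enrollment  # shape (D,)
```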
-
Patent number: 10812921
Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker-attributed meeting transcript.
Type: Grant
Filed: April 30, 2019
Date of Patent: October 20, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
-
Publication number: 20200322722
Abstract: A system and method include reception of a first plurality of audio signals; generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions; generation of a first TF mask for a first output channel based on the first plurality of audio signals; determination of a first beamformer direction associated with a first target sound source based on the first TF mask; generation of first features based on the first beamformer direction and the first plurality of audio signals; determination of a second TF mask based on the first features; and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Application
Filed: April 5, 2019
Publication date: October 8, 2020
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
-
Patent number: 10743107
Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio channels transmitted from corresponding multiple distributed devices, designating one of the audio channels as a reference channel, determining, for each of the remaining audio channels, a difference in time from the reference channel, and correcting each remaining audio channel by compensating for the corresponding difference in time from the reference channel.
Type: Grant
Filed: April 30, 2019
Date of Patent: August 11, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
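A common way to implement this reference-channel alignment, consistent with the abstract though not necessarily the patented estimator, is to find the lag that maximizes the cross-correlation with the reference channel and shift the channel back by that lag:

```python
import numpy as np

def estimate_delay(reference, other):
    """Estimate the lag (in samples) of `other` relative to `reference`
    via the peak of their full cross-correlation."""
    corr = np.correlate(other, reference, mode='full')
    return np.argmax(corr) - (len(reference) - 1)

rng = np.random.default_rng(3)
ref = rng.standard_normal(1000)
delayed = np.roll(ref, 37)  # a channel arriving 37 samples late

lag = estimate_delay(ref, delayed)
aligned = np.roll(delayed, -lag)  # compensate by shifting back
```

A deployed system would refine this, e.g., with generalized cross-correlation (GCC-PHAT) and repeated re-estimation as device clocks drift, but the core correct-by-estimated-lag step is the same.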
-
Patent number: 10643633
Abstract: An observation feature value vector is calculated based on observation signals recorded at different positions in a situation in which target sound sources and background noise are present in a mixed manner; masks associated with the target sound sources and a mask associated with the background noise are estimated; a spatial correlation matrix of the target sound sources that still includes the background noise is calculated based on the observation signals and the masks associated with the target sound sources; a spatial correlation matrix of the background noise is calculated based on the observation signals and the mask associated with the background noise; and a spatial correlation matrix of the target sound sources is estimated based on the matrix obtained by weighting each of the spatial correlation matrices by predetermined coefficients.
Type: Grant
Filed: December 1, 2016
Date of Patent: May 5, 2020
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Tomohiro Nakatani, Nobutaka Ito, Takuya Higuchi, Shoko Araki, Takuya Yoshioka
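The final weighting step can be sketched with mask-weighted spatial correlation matrices (SCMs) for one frequency bin. The masks, signals, and the coefficients alpha and beta below are illustrative; the patent leaves the coefficients as predetermined values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical M-channel observations for one frequency bin over T frames,
# plus TF masks for that bin (random stand-ins for estimated masks).
M, T = 4, 200
y = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
mask_target = rng.uniform(size=T)   # mask for "target + noise" bins
mask_noise = 1.0 - mask_target      # mask for background-noise bins

def masked_scm(obs, mask):
    """Mask-weighted SCM: sum_t m_t * y_t y_t^H / sum_t m_t."""
    return (mask * obs) @ obs.conj().T / mask.sum()

R_noisy = masked_scm(y, mask_target)  # target SCM still containing noise
R_noise = masked_scm(y, mask_noise)

# Estimate the clean target SCM as a weighted combination that subtracts
# a scaled noise SCM (coefficients are illustrative, not the patent's).
alpha, beta = 1.0, 0.9
R_target = alpha * R_noisy - beta * R_noise
```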
-
Publication number: 20200055970
Abstract: Provided is a fluorinated polymer that can impart excellent washing durability and water- and oil-repellency to fibers, said fluorinated polymer having a repeating unit derived from a fluorinated monomer (a) that comprises a first fluorinated monomer (a1) represented by the formula CH2=C(—X1)—C(=O)—Y1—Z1—Rf1 [wherein X1 represents a halogen atom; Y1 represents —O— or —NH—; Z1 represents a direct bond or a bivalent organic group; and Rf1 represents a fluoroalkyl group having 1 to 20 carbon atoms] and a second fluorinated monomer (a2) represented by the formula CH2=C(—X2)—C(=O)—Y2—Z2—Rf2 [wherein X2 represents a monovalent organic group or a hydrogen atom; Y2 represents —O— or —NH—; Z2 represents a direct bond or a bivalent organic group; and Rf2 represents a fluoroalkyl group having 1 to 20 carbon atoms].
Type: Application
Filed: October 31, 2017
Publication date: February 20, 2020
Applicant: Daikin Industries, Ltd.
Inventors: Shinichi Minami, Masaki Fukumori, Takashi Enomoto, Takuya Yoshioka, Ikuo Yamamoto, Bin Zhou, Min Zhu
-
Publication number: 20190318757
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Application
Filed: May 29, 2018
Publication date: October 17, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
-
Publication number: 20190139563
Abstract: Representative embodiments disclose mechanisms to separate and recognize multiple audio sources (e.g., picking out individual speakers) in an environment where they overlap and interfere with each other. The architecture uses a microphone array to spatially separate the audio signals. The spatially filtered signals are then input into a plurality of separators, so each signal is input into a corresponding separator. The separators use neural networks to separate out audio sources and typically produce multiple output signals for a single input signal. A post-selection processor then assesses the separator outputs to pick the signals with the highest quality. These signals can be used in a variety of systems such as speech recognition, meeting transcription and enhancement, hearing aids, music information retrieval, and speech enhancement.
Type: Application
Filed: November 6, 2017
Publication date: May 9, 2019
Inventors: Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong
-
Publication number: 20180366135
Abstract: An observation feature value vector is calculated based on observation signals recorded at different positions in a situation in which target sound sources and background noise are present in a mixed manner; masks associated with the target sound sources and a mask associated with the background noise are estimated; a spatial correlation matrix of the target sound sources that still includes the background noise is calculated based on the observation signals and the masks associated with the target sound sources; a spatial correlation matrix of the background noise is calculated based on the observation signals and the mask associated with the background noise; and a spatial correlation matrix of the target sound sources is estimated based on the matrix obtained by weighting each of the spatial correlation matrices by predetermined coefficients.
Type: Application
Filed: December 1, 2016
Publication date: December 20, 2018
Applicant: Nippon Telegraph and Telephone Corporation
Inventors: Tomohiro Nakatani, Nobutaka Ito, Takuya Higuchi, Shoko Araki, Takuya Yoshioka
-
Patent number: 9754608
Abstract: A noise estimation apparatus that estimates a non-stationary noise component on the basis of a likelihood-maximization criterion is provided. Using the complex spectra of a plurality of observed signals up to the current frame, the apparatus obtains the noise-signal variance that yields a large value for a weighted sum over frames, where each frame contributes the product of the log-likelihood of a Gaussian observed-signal model in a speech segment and the speech posterior probability, plus the product of the log-likelihood of a Gaussian observed-signal model in a non-speech segment and the non-speech posterior probability.
Type: Grant
Filed: January 30, 2013
Date of Patent: September 5, 2017
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Mehrez Souden, Keisuke Kinoshita, Tomohiro Nakatani, Marc Delcroix, Takuya Yoshioka
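One closed-form reading of this criterion, for a single frequency bin: maximizing the non-speech-posterior-weighted Gaussian log-likelihood with respect to the noise variance gives a posterior-weighted average of the observed power. This is a sketch of the criterion only, not the full apparatus (which also involves the speech-segment term), and the posteriors below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy complex spectra for one frequency bin over T frames (unit total
# variance), plus hypothetical per-frame non-speech posterior probabilities
# (a real system estimates these from the observed signals).
T = 500
noise = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) * np.sqrt(0.5)
p_nonspeech = rng.uniform(0.5, 1.0, size=T)

# Maximizing  sum_t p_t * log N(y_t; 0, v)  over v in closed form yields
# a posterior-weighted power average:
v_noise = np.sum(p_nonspeech * np.abs(noise) ** 2) / np.sum(p_nonspeech)
```

For this unit-variance toy noise, the estimate lands near 1; frames with low non-speech posteriors (likely speech) contribute little, which is what makes the estimator robust to non-stationary conditions.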