Patents by Inventor Takuya Yoshioka
Takuya Yoshioka has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10957337
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Grant
Filed: May 29, 2018
Date of Patent: March 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
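The mask-and-apply pipeline this abstract describes can be sketched with toy arrays. The neural network is replaced here by random mask logits (the abstract does not specify an architecture), and all shapes and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: an STFT of a two-speaker mixture with F frequency
# bins and T frames, plus the logits a trained separation network would emit.
F, T = 257, 100
mixture_stft = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
logits = rng.standard_normal((2, F, T))

# Softmax across the speaker axis so the two masks sum to 1 in every TF bin.
masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Element-wise masking yields one speaker-specific spectrogram per mask.
speaker_stfts = masks * mixture_stft  # shape (2, F, T)
```

Because the masks are softmax-normalized per time-frequency bin, the two masked spectrograms sum back to the mixture; a real system would invert each masked STFT to obtain speaker-specific waveforms.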
-
Publication number: 20210076129
Abstract: A system and method include reception of a first plurality of audio signals; generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions; generation of a first TF mask for a first output channel based on the first plurality of audio signals; determination of a first beamformer direction associated with a first target sound source based on the first TF mask; generation of first features based on the first beamformer direction and the first plurality of audio signals; determination of a second TF mask based on the first features; and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Application
Filed: November 17, 2020
Publication date: March 11, 2021
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
-
Patent number: 10856076
Abstract: A system and method include reception of a first plurality of audio signals; generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions; generation of a first TF mask for a first output channel based on the first plurality of audio signals; determination of a first beamformer direction associated with a first target sound source based on the first TF mask; generation of first features based on the first beamformer direction and the first plurality of audio signals; determination of a second TF mask based on the first features; and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Grant
Filed: April 5, 2019
Date of Patent: December 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
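A minimal sketch of the two-stage idea in this abstract: form fixed beamformed outputs for several look directions, pick one direction (hard-coded below, whereas the patent derives it from a first-stage TF mask), and refine that beamformed output with a second TF mask. The beamformer weights and masks here are random stand-ins, not the patented estimators:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: M microphone channels, B fixed beamformer directions,
# STFTs of F frequency bins by T frames.
M, B, F, T = 4, 8, 129, 50
mic_stfts = rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T))

# Fixed beamformer weights per direction and frequency (random unit-norm
# stand-ins; a real system would use steering vectors per look direction).
w = rng.standard_normal((B, M, F)) + 1j * rng.standard_normal((B, M, F))
w /= np.linalg.norm(w, axis=1, keepdims=True)

# Beamforming: weighted sum over microphones for each direction.
beamformed = np.einsum('bmf,mft->bft', w.conj(), mic_stfts)  # (B, F, T)

# Suppose a first-stage TF mask pointed us at direction b_star; a
# second-stage mask then refines the beamformed output for that direction.
b_star = 3
tf_mask = rng.uniform(0.0, 1.0, size=(F, T))
enhanced = tf_mask * beamformed[b_star]
```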
-
Patent number: 10839822
Abstract: Representative embodiments disclose mechanisms to separate and recognize multiple audio sources (e.g., picking out individual speakers) in an environment where they overlap and interfere with each other. The architecture uses a microphone array to spatially separate the audio signals. The spatially filtered signals are then input into a plurality of separators, so each signal is input into a corresponding separator. The separators use neural networks to separate out audio sources and typically produce multiple output signals for a single input signal. A post-selection processor then assesses the separator outputs to pick the signals with the highest quality. These signals can be used in a variety of systems such as speech recognition, meeting transcription and enhancement, hearing aids, music information retrieval, and speech enhancement.
Type: Grant
Filed: November 6, 2017
Date of Patent: November 17, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong
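The separate-then-select stage can be caricatured as follows. The quality score here is a simple power proxy invented for illustration; the patent's post-selection processor would use a real quality measure (e.g., an SNR or recognizer-confidence estimate), and every name and shape below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical outputs: S separators each emit K candidate signals of length L.
S, K, L = 3, 2, 16000
candidates = rng.standard_normal((S, K, L))

def quality_score(signal):
    """Illustrative quality proxy (mean power); a real post-selector would
    use a learned or signal-based quality estimate instead."""
    return float(np.mean(signal ** 2))

# Post selection: keep the highest-scoring candidate from each separator.
best = [max(sep, key=quality_score) for sep in candidates]
```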
-
Publication number: 20200349230
Abstract: Systems and methods for providing customized output based on a user preference in a distributed system are provided. In example embodiments, a meeting server or system receives audio streams from a plurality of distributed devices involved in an intelligent meeting. The meeting system identifies a user corresponding to one of the distributed devices and determines the user's preferred language. A transcript is generated from the received audio streams, and the meeting system translates it into the user's preferred language to form a translated transcript, which is provided to the user's device.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200349949
Abstract: A computer implemented method includes receiving audio streams at a meeting server from two distributed devices that are streaming audio captured during an ad-hoc meeting between at least two users, comparing the received audio streams to determine that they represent sound from the same ad-hoc meeting, generating a meeting instance to process the audio streams in response to that determination, and processing the received audio streams to generate a transcript of the ad-hoc meeting.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200351603
Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker-attributed meeting transcript.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
-
Publication number: 20200349950
Abstract: A computer implemented method processes audio streams recorded during a meeting by a plurality of distributed devices.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200349954
Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200349953
Abstract: A computer implemented method includes receiving information streams on a meeting server from a set of multiple distributed devices included in a meeting, receiving audio signals representative of speech by at least two users in at least two of the information streams, receiving at least one video signal of at least one user in the information streams, associating a specific user with speech in the received audio signals as a function of the received audio and video signals, and generating a transcript of the meeting with an indication of the specific user associated with the speech.
Type: Application
Filed: April 30, 2019
Publication date: November 5, 2020
Inventors: Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, William Isaac Hinthorn, Xuedong Huang
-
Publication number: 20200335119
Abstract: Embodiments are associated with determination of a first plurality of multi-dimensional vectors, each of the first plurality of multi-dimensional vectors representing speech of a target speaker; determination of a multi-dimensional vector representing a speech signal of two or more speakers; determination of a weighted vector representing speech of the target speaker based on the first plurality of multi-dimensional vectors and on similarities between the multi-dimensional vector and each of the first plurality of multi-dimensional vectors; and extraction of speech of the target speaker from the speech signal based on the weighted vector and the speech signal.
Type: Application
Filed: June 7, 2019
Publication date: October 22, 2020
Inventors: Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Changliang Liu, Hakan Erdogan, Dimitrios Basile Dimitriadis, Yifan Gong, James Garnet Droppo, III
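The weighted target-speaker vector can be sketched as attention over enrollment embeddings: similarities between the mixture embedding and each enrollment embedding become softmax weights, and their weighted sum conditions the extraction network. The embeddings below are random placeholders, and the cosine-similarity/softmax choice is an assumption rather than the patented formulation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical d-vectors: N enrollment embeddings for the target speaker
# and one embedding computed from the mixed (multi-speaker) signal.
N, D = 5, 128
enrollment = rng.standard_normal((N, D))
mixture_vec = rng.standard_normal(D)

# Cosine similarity between the mixture embedding and each enrollment vector.
sims = enrollment @ mixture_vec / (
    np.linalg.norm(enrollment, axis=1) * np.linalg.norm(mixture_vec))

# Softmax the similarities into weights and form the weighted
# target-speaker vector that would condition the extraction network.
weights = np.exp(sims) / np.exp(sims).sum()
target_vec = weights @ enrollment  # shape (D,)
```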
-
Patent number: 10812921
Abstract: A computer implemented method includes receiving multiple channels of audio from three or more microphones detecting speech from a meeting of multiple users, localizing speech sources to determine an approximate direction of arrival of speech from a user, using a speech unmixing model to select two channels corresponding to a primary and a secondary microphone, and sending the two selected channels to a meeting server for generation of a speaker-attributed meeting transcript.
Type: Grant
Filed: April 30, 2019
Date of Patent: October 20, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: William Isaac Hinthorn, Lijuan Qin, Nanshan Zeng, Dimitrios Basile Dimitriadis, Zhuo Chen, Andreas Stolcke, Takuya Yoshioka, Xuedong Huang
-
Publication number: 20200322722
Abstract: A system and method include reception of a first plurality of audio signals; generation of a second plurality of beamformed audio signals based on the first plurality of audio signals, each of the second plurality of beamformed audio signals associated with a respective one of a second plurality of beamformer directions; generation of a first TF mask for a first output channel based on the first plurality of audio signals; determination of a first beamformer direction associated with a first target sound source based on the first TF mask; generation of first features based on the first beamformer direction and the first plurality of audio signals; determination of a second TF mask based on the first features; and application of the second TF mask to one of the second plurality of beamformed audio signals associated with the first beamformer direction.
Type: Application
Filed: April 5, 2019
Publication date: October 8, 2020
Inventors: Zhuo Chen, Changliang Liu, Takuya Yoshioka, Xiong Xiao, Hakan Erdogan, Dimitrios Basile Dimitriadis
-
Patent number: 10743107
Abstract: A computer implemented method includes receiving audio signals representative of speech via multiple audio channels transmitted from corresponding multiple distributed devices, designating one of the audio channels as a reference channel, determining, for each of the remaining audio channels, a difference in time from the reference channel, and correcting each remaining audio channel by compensating for the corresponding difference in time from the reference channel.
Type: Grant
Filed: April 30, 2019
Date of Patent: August 11, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Takuya Yoshioka, Andreas Stolcke, Zhuo Chen, Dimitrios Basile Dimitriadis, Nanshan Zeng, Lijuan Qin, William Isaac Hinthorn, Xuedong Huang
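A common way to implement this reference-channel alignment, consistent with the abstract though not necessarily the patented estimator, is to find the lag that maximizes the cross-correlation with the reference channel and shift the channel back by that lag:

```python
import numpy as np

def estimate_delay(reference, other):
    """Estimate the lag (in samples) of `other` relative to `reference`
    via the peak of their full cross-correlation."""
    corr = np.correlate(other, reference, mode='full')
    return np.argmax(corr) - (len(reference) - 1)

rng = np.random.default_rng(3)
ref = rng.standard_normal(1000)
delayed = np.roll(ref, 37)  # a channel arriving 37 samples late

lag = estimate_delay(ref, delayed)
aligned = np.roll(delayed, -lag)  # compensate by shifting back
```

A deployed system would refine this, e.g., with generalized cross-correlation (GCC-PHAT) and repeated re-estimation as device clocks drift, but the core correct-by-estimated-lag step is the same.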
-
Patent number: 10643633
Abstract: An observation feature value vector is calculated based on observation signals recorded at different positions in a situation in which target sound sources and background noise are present in a mixed manner; masks associated with the target sound sources and a mask associated with the background noise are estimated; a spatial correlation matrix of the target sound sources that still includes the background noise is calculated based on the observation signals and the masks associated with the target sound sources; a spatial correlation matrix of the background noise is calculated based on the observation signals and the mask associated with the background noise; and a spatial correlation matrix of the target sound sources is estimated based on the matrix obtained by weighting each of the spatial correlation matrices by predetermined coefficients.
Type: Grant
Filed: December 1, 2016
Date of Patent: May 5, 2020
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Tomohiro Nakatani, Nobutaka Ito, Takuya Higuchi, Shoko Araki, Takuya Yoshioka
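The final weighting step can be sketched with mask-weighted spatial correlation matrices (SCMs) for one frequency bin. The masks, signals, and the coefficients alpha and beta below are illustrative; the patent leaves the coefficients as predetermined values:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical M-channel observations for one frequency bin over T frames,
# plus TF masks for that bin (random stand-ins for estimated masks).
M, T = 4, 200
y = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
mask_target = rng.uniform(size=T)   # mask for "target + noise" bins
mask_noise = 1.0 - mask_target      # mask for background-noise bins

def masked_scm(obs, mask):
    """Mask-weighted SCM: sum_t m_t * y_t y_t^H / sum_t m_t."""
    return (mask * obs) @ obs.conj().T / mask.sum()

R_noisy = masked_scm(y, mask_target)  # target SCM still containing noise
R_noise = masked_scm(y, mask_noise)

# Estimate the clean target SCM as a weighted combination that subtracts
# a scaled noise SCM (coefficients are illustrative, not the patent's).
alpha, beta = 1.0, 0.9
R_target = alpha * R_noisy - beta * R_noise
```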
-
Publication number: 20200055970
Abstract: Provided is a fluorinated polymer that can impart excellent washing durability and water- and oil-repellency to fibers, said fluorinated polymer having a repeating unit derived from a fluorinated monomer (a) that comprises a first fluorinated monomer (a1) represented by the formula CH2=C(—X1)—C(=O)—Y1—Z1—Rf1 [wherein X1 represents a halogen atom; Y1 represents —O— or —NH—; Z1 represents a direct bond or a bivalent organic group; and Rf1 represents a fluoroalkyl group having 1 to 20 carbon atoms] and a second fluorinated monomer (a2) represented by the formula CH2=C(—X2)—C(=O)—Y2—Z2—Rf2 [wherein X2 represents a monovalent organic group or a hydrogen atom; Y2 represents —O— or —NH—; Z2 represents a direct bond or a bivalent organic group; and Rf2 represents a fluoroalkyl group having 1 to 20 carbon atoms].
Type: Application
Filed: October 31, 2017
Publication date: February 20, 2020
Applicant: Daikin Industries, Ltd.
Inventors: Shinichi Minami, Masaki Fukumori, Takashi Enomoto, Takuya Yoshioka, Ikuo Yamamoto, Bin Zhou, Min Zhu
-
Publication number: 20190318757
Abstract: This document relates to separation of audio signals into speaker-specific signals. One example obtains features reflecting mixed speech signals captured by multiple microphones. The features can be input to a neural network, and masks can be obtained from the neural network. The masks can be applied to one or more of the mixed speech signals captured by one or more of the microphones to obtain two or more separate speaker-specific speech signals, which can then be output.
Type: Application
Filed: May 29, 2018
Publication date: October 17, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zhuo Chen, Hakan Erdogan, Takuya Yoshioka, Fileno A. Alleva, Xiong Xiao
-
Publication number: 20190139563
Abstract: Representative embodiments disclose mechanisms to separate and recognize multiple audio sources (e.g., picking out individual speakers) in an environment where they overlap and interfere with each other. The architecture uses a microphone array to spatially separate the audio signals. The spatially filtered signals are then input into a plurality of separators, so each signal is input into a corresponding separator. The separators use neural networks to separate out audio sources and typically produce multiple output signals for a single input signal. A post-selection processor then assesses the separator outputs to pick the signals with the highest quality. These signals can be used in a variety of systems such as speech recognition, meeting transcription and enhancement, hearing aids, music information retrieval, and speech enhancement.
Type: Application
Filed: November 6, 2017
Publication date: May 9, 2019
Inventors: Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong
-
Publication number: 20180366135
Abstract: An observation feature value vector is calculated based on observation signals recorded at different positions in a situation in which target sound sources and background noise are present in a mixed manner; masks associated with the target sound sources and a mask associated with the background noise are estimated; a spatial correlation matrix of the target sound sources that still includes the background noise is calculated based on the observation signals and the masks associated with the target sound sources; a spatial correlation matrix of the background noise is calculated based on the observation signals and the mask associated with the background noise; and a spatial correlation matrix of the target sound sources is estimated based on the matrix obtained by weighting each of the spatial correlation matrices by predetermined coefficients.
Type: Application
Filed: December 1, 2016
Publication date: December 20, 2018
Applicant: Nippon Telegraph and Telephone Corporation
Inventors: Tomohiro Nakatani, Nobutaka Ito, Takuya Higuchi, Shoko Araki, Takuya Yoshioka
-
Patent number: 9754608
Abstract: A noise estimation apparatus that estimates a non-stationary noise component on the basis of a likelihood-maximization criterion is provided. Using the complex spectra of a plurality of observed signals up to the current frame, the apparatus obtains the noise-signal variance that yields a large value for a weighted sum over frames, where each frame contributes the product of the log-likelihood of a Gaussian observed-signal model in a speech segment and the speech posterior probability, plus the product of the log-likelihood of a Gaussian observed-signal model in a non-speech segment and the non-speech posterior probability.
Type: Grant
Filed: January 30, 2013
Date of Patent: September 5, 2017
Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Mehrez Souden, Keisuke Kinoshita, Tomohiro Nakatani, Marc Delcroix, Takuya Yoshioka
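One closed-form reading of this criterion, for a single frequency bin: maximizing the non-speech-posterior-weighted Gaussian log-likelihood with respect to the noise variance gives a posterior-weighted average of the observed power. This is a sketch of the criterion only, not the full apparatus (which also involves the speech-segment term), and the posteriors below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy complex spectra for one frequency bin over T frames (unit total
# variance), plus hypothetical per-frame non-speech posterior probabilities
# (a real system estimates these from the observed signals).
T = 500
noise = (rng.standard_normal(T) + 1j * rng.standard_normal(T)) * np.sqrt(0.5)
p_nonspeech = rng.uniform(0.5, 1.0, size=T)

# Maximizing  sum_t p_t * log N(y_t; 0, v)  over v in closed form yields
# a posterior-weighted power average:
v_noise = np.sum(p_nonspeech * np.abs(noise) ** 2) / np.sum(p_nonspeech)
```

For this unit-variance toy noise, the estimate lands near 1; frames with low non-speech posteriors (likely speech) contribute little, which is what makes the estimator robust to non-stationary conditions.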