Patents by Inventor Kazuhito Koishida

Kazuhito Koishida has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Audio-visual speech enhancement

Patent number: 11244696

Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask configured based upon the fusion of audio features and visual features by the squeeze-excitation fusion block, and the instructions are executable to apply the mask to the audio spectrogram to generate an enhanced magnitude spectrogram, and to reconstruct an enhanced waveform from the enhanced magnitude spectrogram.

Type: Grant

Filed: February 5, 2020

Date of Patent: February 8, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kazuhito Koishida, Michael Iuzzolino

MULTI-USER INTELLIGENT ASSISTANCE

Publication number: 20220012470

Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.
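
The arbitration in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Assistant` class, its method names, the use of plain floats for scores, and the choice to treat a tie as "do not respond" are all assumptions made for illustration, and the abstract does not say how either score or the disengagement metric is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assistant:
    """Hypothetical assistant that arbitrates with one peer device."""

    blocking_threshold: float
    engaged_user: Optional[str] = None  # user currently holding the assistant

    def should_respond(self, user: str, self_score: float, remote_score: float) -> bool:
        # While engaged with one user, requests from all other users are blocked.
        if self.engaged_user is not None and user != self.engaged_user:
            return False
        # Respond only when the local score beats the peer's score.
        # A tie is treated as "do not respond" (assumption; the abstract is silent).
        if self_score > remote_score:
            self.engaged_user = user
            return True
        return False

    def update_disengagement(self, metric: float) -> None:
        # Release the block once the engaged user's disengagement metric
        # exceeds the blocking threshold.
        if self.engaged_user is not None and metric > self.blocking_threshold:
            self.engaged_user = None
```

In this sketch each device runs the same comparison with the two scores swapped, so at most one device responds unless the scores are equal, in which case neither does.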

Type: Application

Filed: September 27, 2021

Publication date: January 13, 2022

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kazuhito KOISHIDA, Alexander A. POPOV, Uros BATRICEVIC, Steven Nabil BATHICHE

Multi-user intelligent assistance

Patent number: 11194998

Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.

Type: Grant

Filed: July 24, 2017

Date of Patent: December 7, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kazuhito Koishida, Alexander A Popov, Uros Batricevic, Steven Nabil Bathiche

AUDIO-VISUAL SPEECH ENHANCEMENT

Publication number: 20210134312

Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the decoder of the autoencoder. The decoder is configured to output a mask configured based upon the fusion of audio features and visual features by the squeeze-excitation fusion block, and the instructions are executable to apply the mask to the audio spectrogram to generate an enhanced magnitude spectrogram, and to reconstruct an enhanced waveform from the enhanced magnitude spectrogram.

Type: Application

Filed: February 5, 2020

Publication date: May 6, 2021

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kazuhito KOISHIDA, Michael IUZZOLINO

Location-based audio messaging

Patent number: 10721594

Abstract: Mobile devices provide a variety of techniques for presenting messages from sources to a user. However, when the message pertains to the presence of the user at a location, the available communications techniques may exhibit deficiencies, e.g., reliance on the memory of the source and/or user of the existence and content of a message between its initiation and the user's visit to the location, or reliance on the communication accessibility of the user, the device, and/or the source during the user's location visit. Presented herein are techniques for enabling a mobile device, at a first time, to receive a request to present an audio message during the presence of the user at a location; and, at a second time, detecting the presence of the user at the location, and presenting the audio message to the user, optionally without awaiting a request from the user to present the message.

Type: Grant

Filed: June 26, 2014

Date of Patent: July 21, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Raja Bose, Hiroshi Horii, Jonathan Lester, Ruchita Bhargava, Kazuhito Koishida, Michelle L. Holtmann, Christina Chen

Continuous motion controls operable using neurological data

Patent number: 10564713

Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.

Type: Grant

Filed: January 9, 2019

Date of Patent: February 18, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida

CONTINUOUS MOTION CONTROLS OPERABLE USING NEUROLOGICAL DATA

Publication number: 20190212810

Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.

Type: Application

Filed: January 9, 2019

Publication date: July 11, 2019

Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida

Continuous motion controls operable using neurological data

Patent number: 10203751

Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.

Type: Grant

Filed: May 11, 2016

Date of Patent: February 12, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida

SPEECH PARSING WITH INTELLIGENT ASSISTANT

Publication number: 20180293221

Abstract: A method to execute computer-actionable directives conveyed in human speech comprises: receiving audio data recording speech from one or more speakers; converting the audio data into a linguistic representation of the recorded speech; detecting a target corresponding to the linguistic representation; committing to the data structure language data associated with the detected target and based on the linguistic representation; parsing the data structure to identify one or more of the computer-actionable directives; and submitting the one or more of the computer-actionable directives to the computer for processing.

Type: Application

Filed: June 11, 2018

Publication date: October 11, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: Erich-Soren FINKELSTEIN, Han Yee Mimi FUNG, Aleksandar UZELAC, Oz SOLOMON, Keith Coleman HEROLD, Vivek PRADEEP, Zongyi LIU, Kazuhito KOISHIDA, Haithem ALBADAWI, Steven Nabil BATHICHE, Christopher Lance NUESMEYER, Michelle Lynn HOLTMANN, Christopher Brian QUIRK, Pablo Luis SALA

DETERMINING SPEAKER CHANGES IN AUDIO INPUT

Publication number: 20180233140

Abstract: Intelligent assistant systems, methods and computing devices are disclosed for identifying a speaker change. A method comprises receiving audio input comprising a speech fragment. A first voice model is trained with a first sub-fragment from the speech fragment. A second voice model is trained with a second sub-fragment from the speech fragment. The first sub-fragment is analyzed with the second voice model to yield a first confidence value. The second sub-fragment is analyzed with the first voice model to yield a second confidence value. Based at least on the first and second confidence values, the method determines if a speaker of the first sub-fragment is the speaker of the second sub-fragment.
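
The cross-scoring step in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the voice model, the features, or the decision rule, so the one-dimensional Gaussian model (`GaussianVoiceModel`), the scalar per-frame feature, and the "both confidences must clear a threshold" rule in `same_speaker` are hypothetical stand-ins.

```python
import math
from statistics import fmean, pvariance


class GaussianVoiceModel:
    """Toy stand-in for a voice model: a 1-D Gaussian over per-frame features."""

    def __init__(self, frames):
        self.mean = fmean(frames)
        self.var = pvariance(frames) + 1e-6  # small floor avoids division by zero

    def confidence(self, frames):
        # Average per-frame log-likelihood under the fitted Gaussian.
        return fmean(
            -0.5 * (math.log(2 * math.pi * self.var) + (x - self.mean) ** 2 / self.var)
            for x in frames
        )


def same_speaker(first, second, threshold):
    """Train one model per sub-fragment, then score each with the other's model."""
    model_first = GaussianVoiceModel(first)
    model_second = GaussianVoiceModel(second)
    conf_first = model_second.confidence(first)   # first sub-fragment, second model
    conf_second = model_first.confidence(second)  # second sub-fragment, first model
    # Assumed decision rule: both cross-scores must clear the threshold.
    return min(conf_first, conf_second) >= threshold
```

Scoring in both directions is what makes the check symmetric: a sub-fragment that happens to fit a broad model of the other speaker is still rejected if the reverse score is poor.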

Type: Application

Filed: July 11, 2017

Publication date: August 16, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kazuhito KOISHIDA, Uros BATRICEVIC

MULTI-USER INTELLIGENT ASSISTANCE

Publication number: 20180233142

Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.

Type: Application

Filed: July 24, 2017

Publication date: August 16, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kazuhito KOISHIDA, Alexander A. POPOV, Uros BATRICEVIC, Steven Nabil BATHICHE

Changing an application state using neurological data

Patent number: 9864431

Abstract: Computer systems, methods, and storage media for changing the state of an application by detecting neurological user intent data associated with a particular operation of a particular application state, and changing the application state so as to enable execution of the particular operation as intended by the user. The application state is automatically changed to align with the intended operation, as determined by received neurological user intent data, so that the intended operation is performed. Some embodiments relate to a computer system creating or updating a state machine, through a training process, to change the state of an application according to detected neurological data.

Type: Grant

Filed: May 11, 2016

Date of Patent: January 9, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Cem Keskin, David Kim, Bill Chau, Jaeyoun Kim, Kazuhito Koishida, Khuram Shahid

CONTINUOUS MOTION CONTROLS OPERABLE USING NEUROLOGICAL DATA

Publication number: 20170329392

Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.

Type: Application

Filed: May 11, 2016

Publication date: November 16, 2017

Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida

CHANGING AN APPLICATION STATE USING NEUROLOGICAL DATA

Publication number: 20170329404

Abstract: Computer systems, methods, and storage media for changing the state of an application by detecting neurological user intent data associated with a particular operation of a particular application state, and changing the application state so as to enable execution of the particular operation as intended by the user. The application state is automatically changed to align with the intended operation, as determined by received neurological user intent data, so that the intended operation is performed. Some embodiments relate to a computer system creating or updating a state machine, through a training process, to change the state of an application according to detected neurological data.

Type: Application

Filed: May 11, 2016

Publication date: November 16, 2017

Inventors: Cem Keskin, David Kim, Bill Chau, Jaeyoun Kim, Kazuhito Koishida, Khuram Shahid

Sound source localization using phase spectrum

Patent number: 9817100

Abstract: An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.
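
As a rough illustration of the per-frame step in this abstract, the sketch below derives one candidate angle per active microphone pair from the inter-microphone phase difference and then picks a single angle for the frame. Several things here are assumptions rather than details from the patent: a far-field source, a single analysis frequency, pairs that share one array axis, a magnitude floor as the "active" test, and the median as the selection rule. The function names are hypothetical, and tracking across frames is not shown.

```python
import cmath
import math
from statistics import median

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def candidate_angle(bin_a, bin_b, freq_hz, spacing_m):
    """Angle in degrees from broadside implied by one pair's phase difference."""
    phase_diff = cmath.phase(bin_a * bin_b.conjugate())  # phase(a) - phase(b)
    delay = phase_diff / (2 * math.pi * freq_hz)          # seconds
    # Clamp so measurement noise cannot push the asin argument out of range.
    sin_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * delay / spacing_m))
    return math.degrees(math.asin(sin_theta))


def frame_angle(pairs, freq_hz, activity_floor=1e-3):
    """Select one angle for a frame.

    pairs: iterable of (bin_a, bin_b, spacing_m), where bin_a and bin_b are the
    complex spectrum values of the two microphones at freq_hz.
    """
    candidates = [
        candidate_angle(a, b, freq_hz, d)
        for a, b, d in pairs
        # Skip the pair unless both microphones are active in this frame.
        if abs(a) > activity_floor and abs(b) > activity_floor
    ]
    return median(candidates) if candidates else None
```

The phase difference is only unambiguous while the spacing stays below half a wavelength at `freq_hz`; beyond that the phase wraps and a single pair can no longer identify the angle on its own.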

Type: Grant

Filed: August 19, 2016

Date of Patent: November 14, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Shankar Regunathan, Kazuhito Koishida, Harshavardhana Narayana Kikkeri

Modifying the Modality of a Computing Device Based Upon a User's Brain Activity

Publication number: 20170323220

Abstract: Technologies are described herein for modifying the modality of a computing device based upon a user's brain activity. A machine learning classifier is trained using data that identifies a modality for operating a computing device and data identifying brain activity of a user of the computing device. Once trained, the machine learning classifier can select a mode of operation for the computing device based upon a user's current brain activity and, potentially, other biological data. The computing device can then be operated in accordance with the selected modality. An application programming interface can also expose an interface through which an operating system and application programs executing on the computing device can obtain data identifying the modality selected by the machine learning classifier. Through the use of this data, the operating system and application programs can modify their mode of operation to be most suitable for the user's current mental state.

Type: Application

Filed: May 9, 2016

Publication date: November 9, 2017

Inventors: John C. Gordon, Kazuhito Koishida

Bitstream syntax for multi-process audio decoding

Patent number: 9741354

Abstract: An audio decoder provides a combination of decoding components including components implementing base band decoding, spectral peak decoding, frequency extension decoding and channel extension decoding techniques. The audio decoder decodes a compressed bitstream structured by a bitstream syntax scheme to permit the various decoding components to extract the appropriate parameters for their respective decoding technique.

Type: Grant

Filed: April 29, 2016

Date of Patent: August 22, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Kazuhito Koishida, Sanjeev Mehrotra, Chao He, Wei-Ge Chen

SOUND SOURCE LOCALIZATION USING PHASE SPECTRUM

Publication number: 20170052245

Abstract: An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.

Type: Application

Filed: August 19, 2016

Publication date: February 23, 2017

Inventors: Shankar Regunathan, Kazuhito Koishida, Harshavardhana Narayana Kikkeri

Sound source localization using phase spectrum

Patent number: 9435873

Abstract: An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.

Type: Grant

Filed: July 14, 2011

Date of Patent: September 6, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Shankar Regunathan, Kazuhito Koishida, Harshavardhana Narayana Kikkeri

BITSTREAM SYNTAX FOR MULTI-PROCESS AUDIO DECODING

Publication number: 20160247515

Abstract: An audio decoder provides a combination of decoding components including components implementing base band decoding, spectral peak decoding, frequency extension decoding and channel extension decoding techniques. The audio decoder decodes a compressed bitstream structured by a bitstream syntax scheme to permit the various decoding components to extract the appropriate parameters for their respective decoding technique.

Type: Application

Filed: April 29, 2016

Publication date: August 25, 2016

Applicant: Microsoft Technology Licensing, LLC

Inventors: Kazuhito Koishida, Sanjeev Mehrotra, Chao He, Wei-Ge Chen
