Patents by Inventor Kazuhito Koishida

Kazuhito Koishida has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11244696
    Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the autoencoder's decoder. The decoder is configured to output a mask based upon the fusion of audio features and visual features by the squeeze-excitation fusion block; the system applies the mask to the audio spectrogram to generate an enhanced magnitude spectrogram and reconstructs an enhanced waveform from the enhanced magnitude spectrogram.
    Type: Grant
    Filed: February 5, 2020
    Date of Patent: February 8, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito Koishida, Michael Iuzzolino
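
The squeeze-excitation fusion in this entry can be pictured as a channel-gating step between the visual and audio branches. Below is a minimal sketch, assuming PyTorch; the tensor shapes, layer sizes, pooling choice, and class name are illustrative and not taken from the patent.

```python
# Hypothetical sketch of a squeeze-excitation (SE) style audio-visual fusion
# block. Shapes, layer sizes, and the pooling strategy are assumptions.
import torch
import torch.nn as nn

class SEFusionBlock(nn.Module):
    def __init__(self, audio_channels: int, visual_channels: int, reduction: int = 8):
        super().__init__()
        # "Squeeze": summarize the visual feature map into one vector per clip.
        self.squeeze = nn.AdaptiveAvgPool1d(1)
        # "Excitation": map the squeezed visual vector to per-channel gates
        # that re-weight the audio feature channels.
        self.excite = nn.Sequential(
            nn.Linear(visual_channels, audio_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(audio_channels // reduction, audio_channels),
            nn.Sigmoid(),
        )

    def forward(self, audio_feats: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # audio_feats:  (batch, audio_channels, time, freq) from an encoder layer
        # visual_feats: (batch, visual_channels, time) from the visual network
        gates = self.excite(self.squeeze(visual_feats).flatten(1))  # (batch, audio_channels)
        return audio_feats * gates[:, :, None, None]                # gated audio features

# The gated features would feed the autoencoder's decoder, which predicts a mask
# applied to the noisy magnitude spectrogram before waveform reconstruction.
```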
  • Publication number: 20220012470
    Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.
    Type: Application
    Filed: September 27, 2021
    Publication date: January 13, 2022
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito KOISHIDA, Alexander A. POPOV, Uros BATRICEVIC, Steven Nabil BATHICHE
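
The arbitration described in this entry reduces to a score comparison plus a per-user lock. A minimal sketch follows, assuming each assistant can exchange selection scores with its peers out of band; the class and method names are hypothetical.

```python
# Minimal sketch of the device-selection logic: respond only when this device's
# self-selection score beats the peer's remote-selection score, and lock
# responses to that user until a disengagement threshold is crossed.
class AssistantArbiter:
    """One per assistant device; peers exchange selection scores out of band."""

    def __init__(self, blocking_threshold: float):
        self.blocking_threshold = blocking_threshold
        self.locked_user = None  # user whose responses currently block all others

    def should_respond(self, user_id: str, self_score: float, remote_score: float,
                       locked_user_disengagement: float = 0.0) -> bool:
        # Unlock once the locked user's disengagement metric exceeds the threshold.
        if self.locked_user and locked_user_disengagement > self.blocking_threshold:
            self.locked_user = None
        # While locked, block responses to everyone except the locked user.
        if self.locked_user and user_id != self.locked_user:
            return False
        # Respond only if this device scored the user higher than the peer did.
        if self_score > remote_score:
            self.locked_user = user_id
            return True
        return False
```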
  • Patent number: 11194998
    Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.
    Type: Grant
    Filed: July 24, 2017
    Date of Patent: December 7, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito Koishida, Alexander A. Popov, Uros Batricevic, Steven Nabil Bathiche
  • Publication number: 20210134312
    Abstract: Example speech enhancement systems include a spatio-temporal residual network configured to receive video data containing a target speaker and extract visual features from the video data, an autoencoder configured to receive input of an audio spectrogram and extract audio features from the audio spectrogram, and a squeeze-excitation fusion block configured to receive input of visual features from a layer of the spatio-temporal residual network and input of audio features from a layer of the autoencoder, and to provide an output to the autoencoder's decoder. The decoder is configured to output a mask based upon the fusion of audio features and visual features by the squeeze-excitation fusion block; the system applies the mask to the audio spectrogram to generate an enhanced magnitude spectrogram and reconstructs an enhanced waveform from the enhanced magnitude spectrogram.
    Type: Application
    Filed: February 5, 2020
    Publication date: May 6, 2021
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito KOISHIDA, Michael IUZZOLINO
  • Patent number: 10721594
    Abstract: Mobile devices provide a variety of techniques for presenting messages from sources to a user. However, when a message pertains to the presence of the user at a location, the available communication techniques may exhibit deficiencies, e.g., reliance on the memory of the source and/or user of the existence and content of a message between its initiation and the user's visit to the location, or reliance on the communication accessibility of the user, the device, and/or the source during the user's location visit. Presented herein are techniques for enabling a mobile device, at a first time, to receive a request to present an audio message during the presence of the user at a location; and, at a second time, to detect the presence of the user at the location and present the audio message to the user, optionally without awaiting a request from the user to present the message.
    Type: Grant
    Filed: June 26, 2014
    Date of Patent: July 21, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Raja Bose, Hiroshi Horii, Jonathan Lester, Ruchita Bhargava, Kazuhito Koishida, Michelle L. Holtmann, Christina Chen
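
The two-phase flow in this entry, registering a location-bound audio message at one time and presenting it later when presence is detected, might look roughly like the sketch below; the geofence radius, distance test, and playback hook are all assumptions.

```python
# Illustrative sketch: register a message against a location, then play it
# automatically when a later location update falls within an assumed radius.
import math

pending_messages = []  # (latitude, longitude, audio_uri)

def register_message(lat: float, lon: float, audio_uri: str) -> None:
    """First time: store the request to present an audio message at a location."""
    pending_messages.append((lat, lon, audio_uri))

def on_location_update(lat: float, lon: float, radius_m: float = 100.0) -> None:
    """Second time: detect presence at a stored location and present the message
    without waiting for the user to ask for it."""
    for msg_lat, msg_lon, audio_uri in list(pending_messages):
        if _distance_m(lat, lon, msg_lat, msg_lon) <= radius_m:
            play_audio(audio_uri)
            pending_messages.remove((msg_lat, msg_lon, audio_uri))

def _distance_m(lat1, lon1, lat2, lon2) -> float:
    # Haversine great-circle distance in meters.
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def play_audio(audio_uri: str) -> None:
    print(f"Playing message: {audio_uri}")  # stand-in for a real audio player
```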
  • Patent number: 10564713
    Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.
    Type: Grant
    Filed: January 9, 2019
    Date of Patent: February 18, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida
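
One way to read the continuous motion control in this family of filings is as a calibrated mapping from a streaming neurological signal onto an analog UI value. The sketch below assumes a scalar signal, a calibrated range, and exponential smoothing; none of those details come from the abstract.

```python
# Hypothetical mapping of a continuous neurological signal onto a continuous
# user-interface control (here, a 0-100 slider). Calibration bounds and the
# smoothing factor are assumptions for illustration.
class ContinuousMotionControl:
    def __init__(self, signal_min: float, signal_max: float, smoothing: float = 0.2):
        self.signal_min = signal_min      # calibrated "rest" signal level
        self.signal_max = signal_max      # calibrated "full motion" signal level
        self.smoothing = smoothing        # exponential smoothing factor
        self.value = 0.0                  # current control value in [0, 1]

    def update(self, neuro_sample: float) -> float:
        # Normalize the raw sample to [0, 1] over the calibrated range.
        span = max(self.signal_max - self.signal_min, 1e-9)
        target = min(max((neuro_sample - self.signal_min) / span, 0.0), 1.0)
        # Smooth so the UI control changes continuously rather than in jumps.
        self.value += self.smoothing * (target - self.value)
        return self.value

def bind_to_slider(control: ContinuousMotionControl, neuro_sample: float) -> int:
    """Map the continuous control value onto an analog slider position (0-100)."""
    return round(control.update(neuro_sample) * 100)
```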
  • Publication number: 20190212810
    Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.
    Type: Application
    Filed: January 9, 2019
    Publication date: July 11, 2019
    Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida
  • Patent number: 10203751
    Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.
    Type: Grant
    Filed: May 11, 2016
    Date of Patent: February 12, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida
  • Publication number: 20180293221
    Abstract: A method to execute computer-actionable directives conveyed in human speech comprises: receiving audio data recording speech from one or more speakers; converting the audio data into a linguistic representation of the recorded speech; detecting a target corresponding to the linguistic representation; committing to a data structure language data associated with the detected target and based on the linguistic representation; parsing the data structure to identify one or more of the computer-actionable directives; and submitting the one or more computer-actionable directives to the computer for processing.
    Type: Application
    Filed: June 11, 2018
    Publication date: October 11, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Erich-Soren FINKELSTEIN, Han Yee Mimi FUNG, Aleksandar UZELAC, Oz SOLOMON, Keith Coleman HEROLD, Vivek PRADEEP, Zongyi LIU, Kazuhito KOISHIDA, Haithem ALBADAWI, Steven Nabil BATHICHE, Christopher Lance NUESMEYER, Michelle Lynn HOLTMANN, Christopher Brian QUIRK, Pablo Luis SALA
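
The directive pipeline in this entry is a chain of stages: transcription, target detection, accumulation into a data structure, parsing, and submission. A rough sketch follows, with assumed stand-in implementations for every stage; the recognizer, target detector, and directive grammar are not specified by the abstract.

```python
from typing import Optional

def process_audio(audio_data: bytes, data_structure: list) -> list:
    text = transcribe(audio_data)                  # audio -> linguistic representation
    target = detect_target(text)                   # which device/entity the speech addresses
    if target is not None:
        # Commit language data associated with the detected target to the data structure.
        data_structure.append({"target": target, "language": text})
    directives = parse_directives(data_structure)  # identify computer-actionable directives
    for directive in directives:
        submit(directive)                          # hand each directive to the computer
    return directives

def transcribe(audio_data: bytes) -> str:
    return "turn on the lights"                    # placeholder for a real speech recognizer

def detect_target(text: str) -> Optional[str]:
    return "lights" if "lights" in text else None

def parse_directives(data_structure: list) -> list:
    return [{"action": "turn_on", "device": entry["target"]} for entry in data_structure]

def submit(directive: dict) -> None:
    print("executing:", directive)
```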
  • Publication number: 20180233140
    Abstract: Intelligent assistant systems, methods and computing devices are disclosed for identifying a speaker change. A method comprises receiving audio input comprising a speech fragment. A first voice model is trained with a first sub-fragment from the speech fragment. A second voice model is trained with a second sub-fragment from the speech fragment. The first sub-fragment is analyzed with the second voice model to yield a first confidence value. The second sub-fragment is analyzed with the first voice model to yield a second confidence value. Based at least on the first and second confidence values, the method determines if a speaker of the first sub-fragment is the speaker of the second sub-fragment.
    Type: Application
    Filed: July 11, 2017
    Publication date: August 16, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito KOISHIDA, Uros BATRICEVIC
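
The cross-scoring idea in this entry can be sketched with generic voice models: train one model per sub-fragment, score each sub-fragment with the other model, and compare the resulting confidences. The Gaussian mixture models, feature layout, and decision threshold below are assumptions, not the patent's method.

```python
# Hedged sketch of speaker-change detection by cross-scoring two sub-fragments.
import numpy as np
from sklearn.mixture import GaussianMixture

def speaker_changed(feats_a: np.ndarray, feats_b: np.ndarray,
                    threshold: float = -50.0) -> bool:
    """feats_a, feats_b: (frames, dims) acoustic features of the two sub-fragments."""
    model_a = GaussianMixture(n_components=4, covariance_type="diag").fit(feats_a)
    model_b = GaussianMixture(n_components=4, covariance_type="diag").fit(feats_b)
    conf_1 = model_b.score(feats_a)   # first sub-fragment under the second model
    conf_2 = model_a.score(feats_b)   # second sub-fragment under the first model
    # Low cross-model confidence suggests the sub-fragments have different speakers.
    return min(conf_1, conf_2) < threshold
```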
  • Publication number: 20180233142
    Abstract: An intelligent assistant records speech spoken by a first user and determines a self-selection score for the first user. The intelligent assistant sends the self-selection score to another intelligent assistant, and receives a remote-selection score for the first user from the other intelligent assistant. The intelligent assistant compares the self-selection score to the remote-selection score. If the self-selection score is greater than the remote-selection score, the intelligent assistant responds to the first user and blocks subsequent responses to all other users until a disengagement metric of the first user exceeds a blocking threshold. If the self-selection score is less than the remote-selection score, the intelligent assistant does not respond to the first user.
    Type: Application
    Filed: July 24, 2017
    Publication date: August 16, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito KOISHIDA, Alexander A. POPOV, Uros BATRICEVIC, Steven Nabil BATHICHE
  • Patent number: 9864431
    Abstract: Computer systems, methods, and storage media for changing the state of an application by detecting neurological user intent data associated with a particular operation of a particular application state, and changing the application state so as to enable execution of the particular operation as intended by the user. The application state is automatically changed to align with the intended operation, as determined by received neurological user intent data, so that the intended operation is performed. Some embodiments relate to a computer system creating or updating a state machine, through a training process, to change the state of an application according to detected neurological data.
    Type: Grant
    Filed: May 11, 2016
    Date of Patent: January 9, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cem Keskin, David Kim, Bill Chau, Jaeyoun Kim, Kazuhito Koishida, Khuram Shahid
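
The state-change mechanism in this entry behaves like a transition table keyed by the current application state and the classified intent. A minimal sketch with hypothetical states and intents:

```python
# Illustrative state machine driven by classified neurological intent data.
# States, intents, and the transition table are hypothetical examples.
TRANSITIONS = {
    # (current_state, classified_intent) -> next_state enabling that operation
    ("browsing", "intent_select"): "item_selected",
    ("item_selected", "intent_confirm"): "checkout",
    ("item_selected", "intent_back"): "browsing",
}

def apply_intent(current_state: str, classified_intent: str) -> str:
    """Change the application state so the user's intended operation can execute."""
    return TRANSITIONS.get((current_state, classified_intent), current_state)

# Example: a detected "select" intent moves the app from browsing to selection.
assert apply_intent("browsing", "intent_select") == "item_selected"
```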
  • Publication number: 20170329392
    Abstract: Computer systems, methods, and storage media for generating a continuous motion control using neurological data and for associating the continuous motion control with a continuous user interface control to enable analog control of the user interface control. The user interface control is modulated through a user's physical movements within a continuous range of motion associated with the continuous motion control. The continuous motion control enables fine-tuned and continuous control of the corresponding user interface control as opposed to control limited to a small number of discrete settings.
    Type: Application
    Filed: May 11, 2016
    Publication date: November 16, 2017
    Inventors: Cem Keskin, Khuram Shahid, Bill Chau, Jaeyoun Kim, Kazuhito Koishida
  • Publication number: 20170329404
    Abstract: Computer systems, methods, and storage media for changing the state of an application by detecting neurological user intent data associated with a particular operation of a particular application state, and changing the application state so as to enable execution of the particular operation as intended by the user. The application state is automatically changed to align with the intended operation, as determined by received neurological user intent data, so that the intended operation is performed. Some embodiments relate to a computer system creating or updating a state machine, through a training process, to change the state of an application according to detected neurological data.
    Type: Application
    Filed: May 11, 2016
    Publication date: November 16, 2017
    Inventors: Cem Keskin, David Kim, Bill Chau, Jaeyoun Kim, Kazuhito Koishida, Khuram Shahid
  • Patent number: 9817100
    Abstract: An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.
    Type: Grant
    Filed: August 19, 2016
    Date of Patent: November 14, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shankar Regunathan, Kazuhito Koishida, Harshavardhana Narayana Kikkeri
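
The per-pair candidate angles in this entry follow from the inter-microphone time delay under a far-field model. The sketch below uses a plain cross-correlation delay estimate and a median to pool the frame's candidates; both are common stand-ins rather than details from the patent.

```python
# Simplified sketch: candidate arrival angle for one microphone pair from the
# estimated inter-channel delay, plus a simple pooling step over all pairs.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def pair_candidate_angle(sig_a: np.ndarray, sig_b: np.ndarray,
                         mic_spacing_m: float, sample_rate: int) -> float:
    # Estimate the delay (in samples) between the two channels via cross-correlation.
    corr = np.correlate(sig_a, sig_b, mode="full")
    delay = np.argmax(corr) - (len(sig_b) - 1)
    tau = delay / sample_rate
    # Far-field model: sin(angle) = c * tau / d, clipped to a valid range.
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

def frame_angle(candidates: list) -> float:
    # Select a final candidate angle for the frame; a median is one simple choice.
    return float(np.median(candidates))
```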
  • Publication number: 20170323220
    Abstract: Technologies are described herein for modifying the modality of a computing device based upon a user's brain activity. A machine learning classifier is trained using data that identifies a modality for operating a computing device and data identifying brain activity of a user of the computing device. Once trained, the machine learning classifier can select a mode of operation for the computing device based upon a user's current brain activity and, potentially, other biological data. The computing device can then be operated in accordance with the selected modality. An application programming interface can also expose an interface through which an operating system and application programs executing on the computing device can obtain data identifying the modality selected by the machine learning classifier. Through the use of this data, the operating system and application programs can modify their mode of operation to be most suitable for the user's current mental state.
    Type: Application
    Filed: May 9, 2016
    Publication date: November 9, 2017
    Inventors: John C. Gordon, Kazuhito Koishida
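
The classifier-plus-API arrangement in this entry could be sketched as follows, assuming scikit-learn, EEG-derived feature vectors, and an illustrative set of modality labels; the feature layout and classifier type are not specified by the abstract.

```python
# Hedged sketch: train a classifier that picks a device modality from
# brain-activity features, and expose the current choice for applications to query.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

MODALITIES = ["visual", "audio_only", "do_not_disturb"]  # illustrative labels

def train_modality_classifier(brain_features: np.ndarray, modality_labels: np.ndarray):
    """brain_features: (samples, dims) EEG-derived features; labels index MODALITIES."""
    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(brain_features, modality_labels)
    return clf

def current_modality(clf, live_features: np.ndarray) -> str:
    """The kind of call an OS or application might make to read the selected modality."""
    return MODALITIES[int(clf.predict(live_features.reshape(1, -1))[0])]
```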
  • Patent number: 9741354
    Abstract: An audio decoder provides a combination of decoding components including components implementing base band decoding, spectral peak decoding, frequency extension decoding and channel extension decoding techniques. The audio decoder decodes a compressed bitstream structured by a bitstream syntax scheme to permit the various decoding components to extract the appropriate parameters for their respective decoding technique.
    Type: Grant
    Filed: April 29, 2016
    Date of Patent: August 22, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito Koishida, Sanjeev Mehrotra, Chao He, Wei-Ge Chen
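
Structurally, the decoder in this entry routes parameters extracted from the structured bitstream to a chain of decoding stages. The skeleton below shows only that dispatch; the syntax-element names and stage internals are placeholders, not the codec's actual bitstream syntax.

```python
# High-level sketch of a decoder dispatching bitstream parameters to its
# base band, spectral peak, frequency extension, and channel extension stages.
class AudioDecoder:
    def decode(self, bitstream_params: dict):
        audio = self.decode_baseband(bitstream_params["baseband"])
        audio = self.decode_spectral_peaks(audio, bitstream_params.get("spectral_peaks"))
        audio = self.decode_frequency_extension(audio, bitstream_params.get("freq_ext"))
        return self.decode_channel_extension(audio, bitstream_params.get("channel_ext"))

    # Each stage would reconstruct or refine part of the signal from its own parameters;
    # the bodies here are placeholders.
    def decode_baseband(self, params): return params
    def decode_spectral_peaks(self, audio, params): return audio
    def decode_frequency_extension(self, audio, params): return audio
    def decode_channel_extension(self, audio, params): return audio
```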
  • Publication number: 20170052245
    Abstract: An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.
    Type: Application
    Filed: August 19, 2016
    Publication date: February 23, 2017
    Inventors: Shankar Regunathan, Kazuhito Koishida, Harshavardhana Narayana Kikkeri
  • Patent number: 9435873
    Abstract: An array of microphones placed on a mobile robot provides multiple channels of audio signals. A received set of audio signals is called an audio segment, which is divided into multiple frames. A phase analysis is performed on a frame of the signals from each pair of microphones. If both microphones are in an active state during the frame, a candidate angle is generated for each such pair of microphones. The result is a list of candidate angles for the frame. This list is processed to select a final candidate angle for the frame. The list of candidate angles is tracked over time to assist in the process of selecting the final candidate angle for an audio segment.
    Type: Grant
    Filed: July 14, 2011
    Date of Patent: September 6, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shankar Regunathan, Kazuhito Koishida, Harshavardhana Narayana Kikkeri
  • Publication number: 20160247515
    Abstract: An audio decoder provides a combination of decoding components including components implementing base band decoding, spectral peak decoding, frequency extension decoding and channel extension decoding techniques. The audio decoder decodes a compressed bitstream structured by a bitstream syntax scheme to permit the various decoding components to extract the appropriate parameters for their respective decoding technique.
    Type: Application
    Filed: April 29, 2016
    Publication date: August 25, 2016
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kazuhito Koishida, Sanjeev Mehrotra, Chao He, Wei-Ge Chen