Patents by Inventor Samuel Thomas

Samuel Thomas has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12444405
    Abstract: Systems, computer-implemented methods, and computer program products to facilitate fine-grained textual knowledge transfer to improve speech recognition and understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a deriving component that can derive one or more speech-based embeddings from an utterance via a speech encoder. The computer executable components can comprise a cross-attention component that can align, at a token level, one or more large language model (LLM) based sentence embeddings with the one or more speech-based embeddings. The computer executable components can comprise a loss component that can combine an alignment loss and an automatic speech recognition (ASR) loss.
    Type: Grant
    Filed: May 2, 2023
    Date of Patent: October 14, 2025
    Assignees: International Business Machines Corporation, Ohio State Innovation Foundation
    Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Brian E. D. Kingsbury, Eric Fosler-Lussier, George Andrei Saon
  • Patent number: 12387717
    Abstract: Features of two or more single speaker utterances are concatenated together and corresponding labels of the two or more single speaker utterances are concatenated together. Single speaker acoustic embeddings for each of the single speaker utterances of the concatenated single speaker utterances are generated using a single speaker teacher encoder network. An enhanced model is trained on the concatenated single speaker utterances using a classification loss L_CLASS and a representation similarity loss L_REP, the representation similarity loss L_REP defined to influence an embedding derived from the concatenated single speaker utterances, the influence being based on the single speaker acoustic embeddings derived from the single speaker teacher encoder network.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: August 12, 2025
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Brian E. D. Kingsbury
  • Publication number: 20240371361
    Abstract: Systems, computer-implemented methods, and computer program products to facilitate fine-grained textual knowledge transfer to improve speech recognition and understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a deriving component that can derive one or more speech-based embeddings from an utterance via a speech encoder. The computer executable components can comprise a cross-attention component that can align, at a token level, one or more large language model (LLM) based sentence embeddings with the one or more speech-based embeddings. The computer executable components can comprise a loss component that can combine an alignment loss and an automatic speech recognition (ASR) loss.
    Type: Application
    Filed: May 2, 2023
    Publication date: November 7, 2024
    Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Brian E. D. Kingsbury, Eric Fosler-Lussier, George Andrei Saon
  • Patent number: 12136414
    Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: November 5, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
  • Patent number: 12119008
    Abstract: Systems, computer-implemented methods, and computer program products to facilitate end to end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.
    Type: Grant
    Filed: March 18, 2022
    Date of Patent: October 15, 2024
    Assignees: International Business Machines Corporation, The Ohio State University
    Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Jatin Ganhotra, Brian E. D. Kingsbury, Eric Fosler-Lussier
  • Publication number: 20240331684
    Abstract: Features of two or more single speaker utterances are concatenated together and corresponding labels of the two or more single speaker utterances are concatenated together. Single speaker acoustic embeddings for each of the single speaker utterances of the concatenated single speaker utterances are generated using a single speaker teacher encoder network. An enhanced model is trained on the concatenated single speaker utterances using a classification loss L_CLASS and a representation similarity loss L_REP, the representation similarity loss L_REP defined to influence an embedding derived from the concatenated single speaker utterances, the influence being based on the single speaker acoustic embeddings derived from the single speaker teacher encoder network.
    Type: Application
    Filed: March 31, 2023
    Publication date: October 3, 2024
    Inventors: Samuel Thomas, Hong-Kwang Kuo, George Andrei Saon, Brian E. D. Kingsbury
  • Patent number: 12046236
    Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
    Type: Grant
    Filed: August 27, 2021
    Date of Patent: July 23, 2024
    Assignee: International Business Machines Corporation
    Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
  • Patent number: 12023146
    Abstract: Determining lung capacity of a user includes capturing an audio waveform of the user performing an utterance presented to the user. A video of the user performing the utterance can be captured. The captured audio waveform and the video are analyzed for compliance. Based on the audio waveform, an indicator of respiratory function is determined. The indicator is compared with a reference indicator to determine health of the user. A machine learning model such as a neural network can be trained to predict the indicator of the respiratory function based on input features comprising audio spectral and temporal characteristics of utterances. Determining the indicator of respiratory function can include running the trained machine learning model.
    Type: Grant
    Filed: October 8, 2020
    Date of Patent: July 2, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Nalini K. Ratha, Jonathan Hudson Connell, II
  • Patent number: 11929062
    Abstract: A method and system of training a spoken language understanding (SLU) model includes receiving natural language training data comprising (i) one or more speech recordings, and (ii) a set of semantic entities and/or intents for each corresponding speech recording. For each speech recording, one or more entity labels and corresponding values, and one or more intent labels are extracted from the corresponding semantic entities and/or overall intent. A spoken language understanding (SLU) model is trained based upon the one or more entity labels and corresponding values, and one or more intent labels of the corresponding speech recordings without a need for a transcript of the corresponding speech recording.
    Type: Grant
    Filed: September 15, 2020
    Date of Patent: March 12, 2024
    Assignee: International Business Machines Corporation
    Inventors: Hong-Kwang Jeff Kuo, Zoltan Tueske, Samuel Thomas, Yinghui Huang, Brian E. D. Kingsbury, Kartik Audhkhasi
  • Patent number: 11908454
    Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data, and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: February 20, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
  • Patent number: 11907845
    Abstract: Some embodiments of the present invention are directed to techniques for training teacher neural networks (TNNs) and student neural networks (SNNs). A training data set is received with a lossless set of data and a corresponding lossy set of data. Two branches of a TNN are established, with one branch trained using the lossless data (a lossless branch) and one trained using the lossy data (a lossy branch). Weights for the two branches are tied together. The lossy branch, now isolated from the lossless branch, generates a set of soft targets for initializing an SNN. These generated soft targets benefit from the training of the lossless branch through the weights that were tied together between each branch, despite isolating the lossless branch from the lossy branch during soft-target generation.
    Type: Grant
    Filed: August 17, 2020
    Date of Patent: February 20, 2024
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Samuel Thomas
  • Patent number: 11900922
    Abstract: Embodiments of the present invention provide computer implemented methods, computer program products and computer systems. For example, embodiments of the present invention can access one or more intents and associated entities from a limited amount of speech to text training data in a single language. Embodiments of the present invention can use the accessed one or more intents and associated entities to locate speech to text training data in one or more other languages different than the single language. Embodiments of the present invention can then train a neural network based on the limited amount of speech to text training data in the single language and the located speech to text training data in the one or more other languages.
    Type: Grant
    Filed: November 10, 2020
    Date of Patent: February 13, 2024
    Assignee: International Business Machines Corporation
    Inventors: Samuel Thomas, Hong-Kwang Kuo, Kartik Audhkhasi, Michael Alan Picheny
  • Publication number: 20230298596
    Abstract: Systems, computer-implemented methods, and computer program products to facilitate end to end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.
    Type: Application
    Filed: March 18, 2022
    Publication date: September 21, 2023
    Inventors: Samuel Thomas, Vishal Sunder, Hong-Kwang Kuo, Jatin Ganhotra, Brian E. D. Kingsbury, Eric Fosler-Lussier
  • Patent number: 11741355
    Abstract: A student neural network may be trained by a computer-implemented method, including: inputting common input data to each teacher neural network among a plurality of teacher neural networks to obtain a soft label output among a plurality of soft label outputs from each teacher neural network among the plurality of teacher neural networks, and training a student neural network with the input data and the plurality of soft label outputs.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: August 29, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
  • Publication number: 20230169954
    Abstract: A processor-implemented method trains an automatic speech recognition system using speech data and text data. A computing device receives speech data, and generates a spectrogram based on the speech data. The computing device receives text data associated with an entire corpus of text data, and generates a textogram based upon the text data. The computing device trains an automatic speech recognition system using the spectrogram and the textogram.
    Type: Application
    Filed: December 1, 2021
    Publication date: June 1, 2023
    Inventors: Samuel Thomas, Hong-Kwang Kuo, Brian E. D. Kingsbury, George Andrei Saon, Gakuto Kurata
  • Publication number: 20230153601
    Abstract: A computer-implemented method for training a neural transducer for speech recognition is provided. The method includes initializing the neural transducer having a prediction network and an encoder network and a joint network. The method further includes expanding the prediction network by changing the prediction network to a plurality of prediction-net branches. Each of the prediction-net branches is a prediction network for a respective specific sub-task from among a plurality of specific sub-tasks. The method also includes training, by a hardware processor, an entirety of the neural transducer by using training data sets for all of the plurality of specific sub-tasks. The method additionally includes obtaining a trained neural transducer by fusing the plurality of prediction-net branches.
    Type: Application
    Filed: November 15, 2021
    Publication date: May 18, 2023
    Inventors: Takashi Fukuda, Samuel Thomas
  • Patent number: 11645329
    Abstract: Examples of techniques for constructing, evaluating, and improving a search string for retrieving images are disclosed. In one example implementation according to aspects of the present disclosure, a computer-implemented method includes receiving, by a processing device, a plurality of images as search results returned based at least in part on a search string for an item in the form of a tuple including an item class, an action and an actor. The method further includes determining, by the processing device, whether the search string is effective at indicating a common item use based on image similarity. The method further includes, based at least in part on determining that the search string is ineffective at indicating the item use, generating, by the processing device, an alternative search string.
    Type: Grant
    Filed: December 28, 2017
    Date of Patent: May 9, 2023
    Assignee: International Business Machines Corporation
    Inventors: Anne E. Gattiker, Sujatha Kashyap, Minh Ngoc Binh Nguyen, Samuel Thomas, Kaipeng Li, Thomas Hubregtsen
  • Patent number: 11610108
    Abstract: A student neural network may be trained by a computer-implemented method, including: selecting a teacher neural network among a plurality of teacher neural networks, inputting input data to the selected teacher neural network to obtain a soft label output generated by the selected teacher neural network, and training a student neural network with at least the input data and the soft label output from the selected teacher neural network.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: March 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Takashi Fukuda, Masayuki Suzuki, Osamu Ichikawa, Gakuto Kurata, Samuel Thomas, Bhuvana Ramabhadran
  • Publication number: 20230081306
    Abstract: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
    Type: Application
    Filed: August 27, 2021
    Publication date: March 16, 2023
    Inventors: Hong-Kwang Kuo, Zoltan Tueske, Samuel Thomas, Brian E. D. Kingsbury, George Andrei Saon
  • Publication number: 20230056680
    Abstract: Audio signals representing a current utterance in a conversation and a dialog history including at least information associated with past utterances corresponding to the current utterance in the conversation can be received. The dialog history can be encoded into an embedding. A spoken language understanding neural network model can be trained to perform a spoken language understanding task based on input features including at least speech features associated with the received audio signals and the embedding. An encoder can also be trained to encode a given dialog history into an embedding. The spoken language understanding task can include predicting a dialog action of an utterance. The spoken language understanding task can include predicting a dialog intent or overall topic of the conversation.
    Type: Application
    Filed: August 18, 2021
    Publication date: February 23, 2023
    Inventors: Samuel Thomas, Jatin Ganhotra, Hong-Kwang Kuo, Sachindra Joshi, George Andrei Saon, Zoltan Tueske, Brian E. D. Kingsbury
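
The abstract shared by patent 12046236 and publication 20230081306 mentions perturbing the semantic entities of a meaning representation into random order variations and adding the perturbed pairs to the training data. The sketch below illustrates that one augmentation step only. It is not taken from the patent: the function names (perturb_entity_order, augment), the (label, value) tuple format, the utterance identifiers, and the seeding scheme are all assumptions made for illustration, and the alignment-based reordering into spoken order is not shown.

```python
import random
from itertools import permutations


def perturb_entity_order(entities, num_variants, seed=0):
    """Return up to num_variants distinct random reorderings of entities.

    Hypothetical helper: entities is a list of hashable (label, value) tuples.
    Enumerating every permutation is only practical for short entity lists.
    """
    rng = random.Random(seed)
    original = tuple(entities)
    # Every distinct ordering except the one we started with.
    candidates = sorted(set(permutations(original)) - {original})
    rng.shuffle(candidates)
    return [list(c) for c in candidates[:num_variants]]


def augment(pairs, num_variants=2, seed=0):
    """Keep each (speech_id, entities) pair and append perturbed copies of it."""
    augmented = []
    for idx, (speech_id, entities) in enumerate(pairs):
        augmented.append((speech_id, list(entities)))
        for variant in perturb_entity_order(entities, num_variants, seed + idx):
            # The speech stays the same; only the entity order changes.
            augmented.append((speech_id, variant))
    return augmented


if __name__ == "__main__":
    # Made-up example data, not drawn from any patent.
    pairs = [("utt1", [("city", "Boston"), ("date", "Monday"), ("airline", "Delta")])]
    for speech_id, entities in augment(pairs):
        print(speech_id, entities)
```

Each perturbed copy keeps the same speech reference and the same set of entities, so a model trained on the augmented data sees several orderings for one utterance.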