Patents by Inventor Corey B. Shelton

Corey B. Shelton is named as an inventor on the patent filings listed below. This listing includes pending patent applications as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11942070
    Abstract: A method, computer system, and computer program product for speech synthesis are provided. The present invention may include generating one or more final voiceprints. The present invention may include generating one or more voice clones based on the one or more final voiceprints. The present invention may include classifying the one or more voice clones into a grouping using a language model, wherein the language model is trained using manually classified uncloned voice samples. The present invention may include identifying a cluster within the grouping, wherein the cluster is identified by determining a difference between corresponding vectors of the one or more voice clones below a similarity threshold. The present invention may include generating a new archetypal voice by blending the one or more voice clones of the cluster where the difference between the corresponding vectors is below the similarity threshold.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: March 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Sara Perelman, Gary William Reiss, Corey B. Shelton
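    Illustrative sketch: A minimal, hypothetical Python sketch of the cluster-and-blend step described in the abstract above. It reflects one possible reading, not the patented implementation; the toy 8-dimensional voiceprint vectors, the similarity threshold value, and the helper names (find_cluster, blend_archetypal_voice) are assumptions.

      import numpy as np

      def find_cluster(clone_vectors, similarity_threshold):
          # Return indices of voice clones whose pairwise vector differences all
          # fall below the similarity threshold (one simple reading of the abstract).
          cluster = []
          for i in range(len(clone_vectors)):
              if all(np.linalg.norm(clone_vectors[i] - clone_vectors[j]) < similarity_threshold
                     for j in cluster):
                  cluster.append(i)
          return cluster

      def blend_archetypal_voice(clone_vectors, cluster):
          # Blend the clustered clones into a new archetypal voice by averaging
          # their voiceprint vectors.
          return np.mean([clone_vectors[i] for i in cluster], axis=0)

      # Toy data: four 8-dimensional vectors standing in for voice-clone voiceprints.
      rng = np.random.default_rng(0)
      clones = [rng.normal(size=8) for _ in range(3)]
      clones.append(clones[0] + 0.01 * rng.normal(size=8))  # near-duplicate of clone 0

      cluster = find_cluster(clones, similarity_threshold=0.5)
      archetype = blend_archetypal_voice(clones, cluster)
      print("clustered clones:", cluster)
      print("archetypal voice vector:", np.round(archetype, 3))

    In a real pipeline the vectors would come from a speaker-encoder model and the grouping would use the trained language model described in the abstract; here both are replaced by random stand-ins.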
  • Patent number: 11675822
    Abstract: One or more relevant factoids related to multimedia data are generated by splitting a multimedia item into a media component and a text component. Text information relevant to text data from the text component is retrieved using a query. The text information is summarized into a factoid. Source data is checked for an image based on the media component. A current state image is generated from the image. The factoid and the current state image are combined into a combined factoid, and the combined factoid is stored for sending to a media outlet for presentation in a media format.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: June 13, 2023
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Stephen C. Hammer, Corey B. Shelton, Nicholas Michael Wilkin, Sara Perelman
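    Illustrative sketch: A minimal, hypothetical Python sketch of the combined-factoid pipeline described in the abstract above. The splitting, summarizing, and image-generation functions are placeholder assumptions standing in for the patent's retrieval, summarization, and image-generation steps.

      from dataclasses import dataclass

      @dataclass
      class CombinedFactoid:
          factoid_text: str
          current_state_image: bytes  # e.g. an encoded chart or annotated frame

      def split_multimedia_item(item):
          # Separate a multimedia item into its media and text components.
          return item.get("media"), item.get("text", "")

      def summarize_into_factoid(text, query):
          # Placeholder summarizer: keep only sentences that mention the query term.
          sentences = [s.strip() for s in text.split(".") if query.lower() in s.lower()]
          return ". ".join(sentences) or text[:120]

      def generate_current_state_image(media):
          # Placeholder: a real system would render an up-to-date image here.
          return media if media is not None else b""

      def build_combined_factoid(item, query):
          media, text = split_multimedia_item(item)
          factoid = summarize_into_factoid(text, query)
          image = generate_current_state_image(media)
          return CombinedFactoid(factoid_text=factoid, current_state_image=image)

      item = {"media": b"<image bytes>",
              "text": "The match resumed at noon. The defending champion leads the final set."}
      print(build_combined_factoid(item, query="leads"))

    The resulting combined factoid object could then be stored and forwarded to a media outlet, as the abstract describes.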
  • Patent number: 11640516
    Abstract: According to a first aspect of the present invention, a computer-implemented method, a computer system, and a computer program product are provided that include training a set of exploitation models, training a set of exploration models, generating a combined exploitation and exploration heat map, and inputting the combined exploitation and exploration heat map into a convolutional neural network.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: May 2, 2023
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Gary William Reiss, Corey B. Shelton, Stephen C. Hammer
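    Illustrative sketch: A minimal, hypothetical Python sketch of combining exploitation and exploration heat maps and passing the result through a convolution, loosely following the abstract above. The 6x6 toy maps, the blending weight, and the hand-rolled conv2d stand in for trained models and a real convolutional neural network.

      import numpy as np

      def combine_heat_maps(exploitation_map, exploration_map, exploration_weight=0.3):
          # Blend the exploitation and exploration heat maps into a single combined map.
          return (1 - exploration_weight) * exploitation_map + exploration_weight * exploration_map

      def conv2d(heat_map, kernel):
          # Minimal "valid" 2-D convolution, standing in for a CNN input layer.
          kh, kw = kernel.shape
          h, w = heat_map.shape
          out = np.zeros((h - kh + 1, w - kw + 1))
          for i in range(out.shape[0]):
              for j in range(out.shape[1]):
                  out[i, j] = np.sum(heat_map[i:i + kh, j:j + kw] * kernel)
          return out

      # Toy 6x6 maps: scores from exploitation models and exploration models.
      rng = np.random.default_rng(1)
      exploitation = rng.random((6, 6))
      exploration = rng.random((6, 6))

      combined = combine_heat_maps(exploitation, exploration)
      features = conv2d(combined, kernel=np.ones((3, 3)) / 9.0)  # simple averaging filter
      print(features.round(2))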
  • Patent number: 11538464
    Abstract: The disclosure includes using dilation of speech content from a separated audio input for speech recognition. An audio input from a speaker and predicted changes for the audio input based on an external noise are received at a CNN (Convolutional Neural Network). In the CNN, diarization is applied to the audio input to predict how a dilation of speech content from the speaker changes the audio input to generate a CNN output. A resulting dilation is determined from the CNN output. A word error rate is determined for the dilated CNN output to determine an accuracy for speech to text outputs. An adjustment parameter is set to change a range of the dilation based on the word error rate, and the resulting dilation of the CNN output is adjusted based on the adjustment parameter to reduce the word error rate.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: December 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
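    Illustrative sketch: A minimal, hypothetical Python sketch of the feedback step in the abstract above: computing a word error rate for a transcript and using it to adjust a dilation parameter. The word_error_rate function is the standard Levenshtein-based WER; the adjust_dilation rule, the target WER, and the adjustment value are assumptions standing in for the CNN-based pipeline.

      import numpy as np

      def word_error_rate(reference, hypothesis):
          # Standard word error rate via Levenshtein distance over word tokens.
          ref, hyp = reference.split(), hypothesis.split()
          d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
          d[:, 0] = np.arange(len(ref) + 1)
          d[0, :] = np.arange(len(hyp) + 1)
          for i in range(1, len(ref) + 1):
              for j in range(1, len(hyp) + 1):
                  cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                  d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
          return d[len(ref), len(hyp)] / max(len(ref), 1)

      def adjust_dilation(dilation, wer, adjustment_parameter=0.1, target_wer=0.15):
          # Widen or narrow the dilation depending on whether WER exceeds the target.
          return dilation + adjustment_parameter if wer > target_wer else dilation - adjustment_parameter

      wer = word_error_rate("play the next point now", "play the next joint now")
      dilation = adjust_dilation(dilation=1.0, wer=wer)
      print(f"WER={wer:.2f}, adjusted dilation={dilation:.2f}")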
  • Patent number: 11495216
    Abstract: The disclosure includes using dilation of speech content from an interlaced audio input for speech recognition. A learning model is initiated to determine dilation parameters for each of a plurality of audible sounds of speech content from a plurality of speakers received at a computer as an audio input. As part of the learning model, a change of each of a plurality of independent sounds is determined in response to an audio stimulus, the independent sounds being derived from the audio input. The disclosure applies the dilation parameters, respectively, based on the change of each of the independent sounds. A voice print is constructed for each of the speakers based on the independent sounds and the dilation parameters, respectively. Speech content is attributed to each of the plurality of speakers based at least in part on the voice print, respectively, and the independent sounds.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: November 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
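    Illustrative sketch: A minimal, hypothetical Python sketch of constructing per-speaker voice prints and attributing a speech segment to a speaker, loosely following the abstract above. The dilation-weighted averaging, the toy embeddings, and the cosine-similarity attribution are assumptions, not the patented learning model.

      import numpy as np

      def cosine_similarity(a, b):
          # Cosine similarity between two embedding vectors.
          return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

      def build_voice_print(sound_embeddings, dilation_parameters):
          # Toy voice print: dilation-weighted average of a speaker's independent-sound embeddings.
          return np.average(np.asarray(sound_embeddings), axis=0, weights=np.asarray(dilation_parameters))

      def attribute_segment(segment_embedding, voice_prints):
          # Attribute a speech segment to the speaker whose voice print matches it best.
          return max(voice_prints, key=lambda spk: cosine_similarity(segment_embedding, voice_prints[spk]))

      rng = np.random.default_rng(2)
      alice_sounds = [rng.normal(loc=1.0, size=8) for _ in range(4)]
      bob_sounds = [rng.normal(loc=-1.0, size=8) for _ in range(4)]

      voice_prints = {
          "alice": build_voice_print(alice_sounds, dilation_parameters=[1.0, 0.8, 1.2, 1.0]),
          "bob": build_voice_print(bob_sounds, dilation_parameters=[1.1, 0.9, 1.0, 1.0]),
      }

      segment = rng.normal(loc=1.0, size=8)  # an unlabeled segment resembling the first speaker
      print("attributed to:", attribute_segment(segment, voice_prints))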
  • Publication number: 20220246130
    Abstract: A method, computer system, and computer program product for speech synthesis are provided. The present invention may include generating one or more final voiceprints. The present invention may include generating one or more voice clones based on the one or more final voiceprints. The present invention may include classifying the one or more voice clones into a grouping using a language model, wherein the language model is trained using manually classified uncloned voice samples. The present invention may include identifying a cluster within the grouping, wherein the cluster is identified by determining a difference between corresponding vectors of the one or more voice clones below a similarity threshold. The present invention may include generating a new archetypal voice by blending the one or more voice clones of the cluster where the difference between the corresponding vectors is below the similarity threshold.
    Type: Application
    Filed: January 29, 2021
    Publication date: August 4, 2022
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Sara Perelman, Gary William Reiss, Corey B. Shelton
  • Publication number: 20220076665
    Abstract: The disclosure includes using dilation of speech content from a separated audio input for speech recognition. An audio input from a speaker and predicted changes for the audio input based on an external noise are received at a CNN (Convolutional Neural Network). In the CNN, diarization is applied to the audio input to predict how a dilation of speech content from the speaker changes the audio input to generate a CNN output. A resulting dilation is determined from the CNN output. A word error rate is determined for the dilated CNN output to determine an accuracy for speech to text outputs. An adjustment parameter is set to change a range of the dilation based on the word error rate, and the resulting dilation of the CNN output is adjusted based on the adjustment parameter to reduce the word error rate.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
  • Publication number: 20220076664
    Abstract: The disclosure includes using dilation of speech content from an interlaced audio input for speech recognition. A learning model is initiated to determine dilation parameters for each of a plurality of audible sounds of speech content from a plurality of speakers received at a computer as an audio input. As part of the learning model, a change of each of a plurality of independent sounds is determined in response to an audio stimulus, the independent sounds being derived from the audio input. The disclosure applies the dilation parameters, respectively, based on the change of each of the independent sounds. A voice print is constructed for each of the speakers based on the independent sounds and the dilation parameters, respectively. Speech content is attributed to each of the plurality of speakers based at least in part on the voice print, respectively, and the independent sounds.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
  • Publication number: 20220027550
    Abstract: One or more relevant factoids related to multimedia data are generated by splitting a multimedia item into a media component and a text component. Text information relevant to text data from the text component is retrieved using a query. The text information is summarized into a factoid. Source data is checked for an image based on the media component. A current state image is generated from the image. The factoid and the current state image are combined into a combined factoid, and the combined factoid is stored for sending to a media outlet for presentation in a media format.
    Type: Application
    Filed: July 27, 2020
    Publication date: January 27, 2022
    Inventors: Aaron K. Baughman, Stephen C. Hammer, Corey B. Shelton, Nicholas Michael Wilkin, Sara Perelman
  • Publication number: 20210383204
    Abstract: According to a first aspect of the present invention, a computer-implemented method, a computer system, and a computer program product are provided that include training a set of exploitation models, training a set of exploration models, generating a combined exploitation and exploration heat map, and inputting the combined exploitation and exploration heat map into a convolutional neural network.
    Type: Application
    Filed: June 3, 2020
    Publication date: December 9, 2021
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Gary William Reiss, Corey B. Shelton, Stephen C. Hammer