Patents by Inventor Corey B. Shelton

Corey B. Shelton is named as an inventor on the patent filings listed below. This listing includes pending patent applications as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11942070
    Abstract: A method, computer system, and computer program product for speech synthesis are provided. The present invention may include generating one or more final voiceprints. The present invention may include generating one or more voice clones based on the one or more final voiceprints. The present invention may include classifying the one or more voice clones into a grouping using a language model, wherein the language model is trained using manually classified uncloned voice samples. The present invention may include identifying a cluster within the grouping, wherein the cluster is identified by determining a difference between corresponding vectors of the one or more voice clones below a similarity threshold. The present invention may include generating a new archetypal voice by blending the one or more voice clones of the cluster where the difference between the corresponding vectors is below the similarity threshold.
    Type: Grant
    Filed: January 29, 2021
    Date of Patent: March 26, 2024
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Sara Perelman, Gary William Reiss, Corey B. Shelton
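    Illustrative sketch: A minimal, hypothetical Python sketch of the cluster-and-blend step described in the abstract above. It reflects one possible reading, not the patented implementation; the toy 8-dimensional voiceprint vectors, the similarity threshold value, and the helper names (find_cluster, blend_archetypal_voice) are assumptions.

      import numpy as np

      def find_cluster(clone_vectors, similarity_threshold):
          # Return indices of voice clones whose pairwise vector differences all
          # fall below the similarity threshold (one simple reading of the abstract).
          cluster = []
          for i in range(len(clone_vectors)):
              if all(np.linalg.norm(clone_vectors[i] - clone_vectors[j]) < similarity_threshold
                     for j in cluster):
                  cluster.append(i)
          return cluster

      def blend_archetypal_voice(clone_vectors, cluster):
          # Blend the clustered clones into a new archetypal voice by averaging
          # their voiceprint vectors.
          return np.mean([clone_vectors[i] for i in cluster], axis=0)

      # Toy data: four 8-dimensional vectors standing in for voice-clone voiceprints.
      rng = np.random.default_rng(0)
      clones = [rng.normal(size=8) for _ in range(3)]
      clones.append(clones[0] + 0.01 * rng.normal(size=8))  # near-duplicate of clone 0

      cluster = find_cluster(clones, similarity_threshold=0.5)
      archetype = blend_archetypal_voice(clones, cluster)
      print("clustered clones:", cluster)
      print("archetypal voice vector:", np.round(archetype, 3))

    In a real pipeline the vectors would come from a speaker-encoder model and the grouping would use the trained language model described in the abstract; here both are replaced by random stand-ins.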
  • Patent number: 11675822
    Abstract: One or more relevant factoids related to multimedia data are generated by splitting a multimedia item into a media component and a text component. Text information relevant to text data from the text component is retrieved using a query. The text information is summarized into a factoid. Source data is checked for an image based on the media component. A current state image is generated from the image. The factoid and the current state image are combined into a combined factoid, and the combined factoid is stored for sending to a media outlet for presentation in a media format.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: June 13, 2023
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Stephen C. Hammer, Corey B. Shelton, Nicholas Michael Wilkin, Sara Perelman
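    Illustrative sketch: A minimal, hypothetical Python sketch of the combined-factoid pipeline described in the abstract above. The splitting, summarizing, and image-generation functions are placeholder assumptions standing in for the patent's retrieval, summarization, and image-generation steps.

      from dataclasses import dataclass

      @dataclass
      class CombinedFactoid:
          factoid_text: str
          current_state_image: bytes  # e.g. an encoded chart or annotated frame

      def split_multimedia_item(item):
          # Separate a multimedia item into its media and text components.
          return item.get("media"), item.get("text", "")

      def summarize_into_factoid(text, query):
          # Placeholder summarizer: keep only sentences that mention the query term.
          sentences = [s.strip() for s in text.split(".") if query.lower() in s.lower()]
          return ". ".join(sentences) or text[:120]

      def generate_current_state_image(media):
          # Placeholder: a real system would render an up-to-date image here.
          return media if media is not None else b""

      def build_combined_factoid(item, query):
          media, text = split_multimedia_item(item)
          factoid = summarize_into_factoid(text, query)
          image = generate_current_state_image(media)
          return CombinedFactoid(factoid_text=factoid, current_state_image=image)

      item = {"media": b"<image bytes>",
              "text": "The match resumed at noon. The defending champion leads the final set."}
      print(build_combined_factoid(item, query="leads"))

    The resulting combined factoid object could then be stored and forwarded to a media outlet, as the abstract describes.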
  • Patent number: 11640516
    Abstract: According to a first aspect of the present invention, a computer-implemented method, a computer system, and a computer program product are provided that include training a set of exploitation models, training a set of exploration models, generating a combined exploitation and exploration heat map, and inputting the combined exploitation and exploration heat map into a convolutional neural network.
    Type: Grant
    Filed: June 3, 2020
    Date of Patent: May 2, 2023
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Gary William Reiss, Corey B. Shelton, Stephen C. Hammer
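    Illustrative sketch: A minimal, hypothetical Python sketch of combining exploitation and exploration heat maps and passing the result through a convolution, loosely following the abstract above. The 6x6 toy maps, the blending weight, and the hand-rolled conv2d stand in for trained models and a real convolutional neural network.

      import numpy as np

      def combine_heat_maps(exploitation_map, exploration_map, exploration_weight=0.3):
          # Blend the exploitation and exploration heat maps into a single combined map.
          return (1 - exploration_weight) * exploitation_map + exploration_weight * exploration_map

      def conv2d(heat_map, kernel):
          # Minimal "valid" 2-D convolution, standing in for a CNN input layer.
          kh, kw = kernel.shape
          h, w = heat_map.shape
          out = np.zeros((h - kh + 1, w - kw + 1))
          for i in range(out.shape[0]):
              for j in range(out.shape[1]):
                  out[i, j] = np.sum(heat_map[i:i + kh, j:j + kw] * kernel)
          return out

      # Toy 6x6 maps: scores from exploitation models and exploration models.
      rng = np.random.default_rng(1)
      exploitation = rng.random((6, 6))
      exploration = rng.random((6, 6))

      combined = combine_heat_maps(exploitation, exploration)
      features = conv2d(combined, kernel=np.ones((3, 3)) / 9.0)  # simple averaging filter
      print(features.round(2))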
  • Patent number: 11538464
    Abstract: The disclosure includes using dilation of speech content from a separated audio input for speech recognition. An audio input from a speaker and predicted changes for the audio input based on an external noise are received at a CNN (Convolutional Neural Network). In the CNN, diarization is applied to the audio input to predict how a dilation of speech content from the speaker changes the audio input to generate a CNN output. A resulting dilation is determined from the CNN output. A word error rate is determined for the dilated CNN output to determine an accuracy for speech to text outputs. An adjustment parameter is set to change a range of the dilation based on the word error rate, and the resulting dilation of the CNN output is adjusted based on the adjustment parameter to reduce the word error rate.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: December 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
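    Illustrative sketch: A minimal, hypothetical Python sketch of the feedback step in the abstract above: computing a word error rate for a transcript and using it to adjust a dilation parameter. The word_error_rate function is the standard Levenshtein-based WER; the adjust_dilation rule, the target WER, and the adjustment value are assumptions standing in for the CNN-based pipeline.

      import numpy as np

      def word_error_rate(reference, hypothesis):
          # Standard word error rate via Levenshtein distance over word tokens.
          ref, hyp = reference.split(), hypothesis.split()
          d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
          d[:, 0] = np.arange(len(ref) + 1)
          d[0, :] = np.arange(len(hyp) + 1)
          for i in range(1, len(ref) + 1):
              for j in range(1, len(hyp) + 1):
                  cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                  d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
          return d[len(ref), len(hyp)] / max(len(ref), 1)

      def adjust_dilation(dilation, wer, adjustment_parameter=0.1, target_wer=0.15):
          # Widen or narrow the dilation depending on whether WER exceeds the target.
          return dilation + adjustment_parameter if wer > target_wer else dilation - adjustment_parameter

      wer = word_error_rate("play the next point now", "play the next joint now")
      dilation = adjust_dilation(dilation=1.0, wer=wer)
      print(f"WER={wer:.2f}, adjusted dilation={dilation:.2f}")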
  • Patent number: 11495216
    Abstract: The disclosure includes using dilation of speech content from an interlaced audio input for speech recognition. A learning model is initiated to determine dilation parameters for each of a plurality of audible sounds of speech content from a plurality of speakers received at a computer as an audio input. As part of the learning model, a change of each of a plurality of independent sounds is determined in response to an audio stimulus, the independent sounds being derived from the audio input. The disclosure applies the dilation parameters, respectively, based on the change of each of the independent sounds. A voice print is constructed for each of the speakers based on the independent sounds and the dilation parameters, respectively. Speech content is attributed to each of the plurality of speakers based at least in part on the voice print, respectively, and the independent sounds.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: November 8, 2022
    Assignee: International Business Machines Corporation
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
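    Illustrative sketch: A minimal, hypothetical Python sketch of constructing per-speaker voice prints and attributing a speech segment to a speaker, loosely following the abstract above. The dilation-weighted averaging, the toy embeddings, and the cosine-similarity attribution are assumptions, not the patented learning model.

      import numpy as np

      def cosine_similarity(a, b):
          # Cosine similarity between two embedding vectors.
          return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

      def build_voice_print(sound_embeddings, dilation_parameters):
          # Toy voice print: dilation-weighted average of a speaker's independent-sound embeddings.
          return np.average(np.asarray(sound_embeddings), axis=0, weights=np.asarray(dilation_parameters))

      def attribute_segment(segment_embedding, voice_prints):
          # Attribute a speech segment to the speaker whose voice print matches it best.
          return max(voice_prints, key=lambda spk: cosine_similarity(segment_embedding, voice_prints[spk]))

      rng = np.random.default_rng(2)
      alice_sounds = [rng.normal(loc=1.0, size=8) for _ in range(4)]
      bob_sounds = [rng.normal(loc=-1.0, size=8) for _ in range(4)]

      voice_prints = {
          "alice": build_voice_print(alice_sounds, dilation_parameters=[1.0, 0.8, 1.2, 1.0]),
          "bob": build_voice_print(bob_sounds, dilation_parameters=[1.1, 0.9, 1.0, 1.0]),
      }

      segment = rng.normal(loc=1.0, size=8)  # an unlabeled segment resembling the first speaker
      print("attributed to:", attribute_segment(segment, voice_prints))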
  • Publication number: 20220246130
    Abstract: A method, computer system, and computer program product for speech synthesis are provided. The present invention may include generating one or more final voiceprints. The present invention may include generating one or more voice clones based on the one or more final voiceprints. The present invention may include classifying the one or more voice clones into a grouping using a language model, wherein the language model is trained using manually classified uncloned voice samples. The present invention may include identifying a cluster within the grouping, wherein the cluster is identified by determining a difference between corresponding vectors of the one or more voice clones below a similarity threshold. The present invention may include generating a new archetypal voice by blending the one or more voice clones of the cluster where the difference between the corresponding vectors is below the similarity threshold.
    Type: Application
    Filed: January 29, 2021
    Publication date: August 4, 2022
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Sara Perelman, Gary William Reiss, Corey B. Shelton
  • Publication number: 20220076665
    Abstract: The disclosure includes using dilation of speech content from a separated audio input for speech recognition. An audio input from a speaker and predicted changes for the audio input based on an external noise are received at a CNN (Convolutional Neural Network). In the CNN, diarization is applied to the audio input to predict how a dilation of speech content from the speaker changes the audio input to generate a CNN output. A resulting dilation is determined from the CNN output. A word error rate is determined for the dilated CNN output to determine an accuracy for speech to text outputs. An adjustment parameter is set to change a range of the dilation based on the word error rate, and the resulting dilation of the CNN output is adjusted based on the adjustment parameter to reduce the word error rate.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
  • Publication number: 20220076664
    Abstract: The disclosure includes using dilation of speech content from an interlaced audio input for speech recognition. A learning model is initiated to determine dilation parameters for each of a plurality of audible sounds of speech content from a plurality of speakers received at a computer as an audio input. As part of the learning model, a change of each of a plurality of independent sounds is determined in response to an audio stimulus, the independent sounds being derived from the audio input. The disclosure applies the dilation parameters, respectively, based on the change of each of the independent sounds. A voice print is constructed for each of the speakers based on the independent sounds and the dilation parameters, respectively. Speech content is attributed to each of the plurality of speakers based at least in part on the voice print, respectively, and the independent sounds.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 10, 2022
    Inventors: Aaron K. Baughman, Corey B. Shelton, Stephen C. Hammer, Shikhar Kwatra
  • Publication number: 20220027550
    Abstract: One or more relevant factoids related to multimedia data are generated by splitting a multimedia item into a media component and a text component. Text information relevant to text data from the text component is retrieved using a query. The text information is summarized into a factoid. Source data is checked for an image based on the media component. A current state image is generated from the image. The factoid and the current state image are combined into a combined factoid, and the combined factoid is stored for sending to a media outlet for presentation in a media format.
    Type: Application
    Filed: July 27, 2020
    Publication date: January 27, 2022
    Inventors: Aaron K. Baughman, Stephen C. Hammer, Corey B. Shelton, Nicholas Michael Wilkin, Sara Perelman
  • Publication number: 20210383204
    Abstract: According to a first aspect of the present invention, a computer-implemented method, a computer system, and a computer program product are provided that include training a set of exploitation models, training a set of exploration models, generating a combined exploitation and exploration heat map, and inputting the combined exploitation and exploration heat map into a convolutional neural network.
    Type: Application
    Filed: June 3, 2020
    Publication date: December 9, 2021
    Inventors: Aaron K. Baughman, Gray Franklin Cannon, Gary William Reiss, Corey B. Shelton, Stephen C. Hammer