Patents by Inventor Kenneth L. Clarkson

Kenneth L. Clarkson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Experimental design for symbolic model discovery

Patent number: 11657194

Abstract: A method for optimal design of experiments for joint model selection and parametrization determination of a symbolic mathematical model includes: determining a prediction value for a given inquiry data point, functional form and parameterization for conducting an experiment relating to a system under investigation; assuming a set of input-output data pairs as a starting point in a model discovery process relating to the system under investigation; performing discovery of symbolic models minimizing complexity for a bounded misfit, or minimizing a misfit measure, subject to bounded complexity; determining a new data point through optimal experimental design that informs best as for the underlying symbolic models; and updating a posterior distribution, given results of the experiment relating to the system under investigation for the determined new data point to enable informed assessment among a plurality of functional forms and parameterizations. An apparatus configured to perform the method is also provided.

Type: Grant

Filed: April 22, 2020

Date of Patent: May 23, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Lior Horesh, Kenneth L. Clarkson, Cristina Cornelio, Sara Magliacane
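
The loop this abstract describes (choose an informative new data point, run the experiment, update a posterior over candidate functional forms) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the two candidate models, the Gaussian noise model, and the "largest disagreement" selection rule standing in for optimal experimental design.

```python
import math

# Hypothetical candidate functional forms (not taken from the patent).
CANDIDATES = {
    "linear": lambda x: 2.0 * x,
    "quadratic": lambda x: x * x,
}


def gaussian_likelihood(y_observed, y_predicted, sigma=0.5):
    """Likelihood of an observation under Gaussian noise (assumed noise model)."""
    z = (y_observed - y_predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_posterior(prior, x, y_observed):
    """Bayes update of the probability assigned to each candidate model."""
    unnormalized = {
        name: prior[name] * gaussian_likelihood(y_observed, f(x))
        for name, f in CANDIDATES.items()
    }
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}


def most_informative_point(candidate_xs):
    """Pick the query point where the candidate models disagree the most."""
    def spread(x):
        predictions = [f(x) for f in CANDIDATES.values()]
        return max(predictions) - min(predictions)
    return max(candidate_xs, key=spread)


prior = {"linear": 0.5, "quadratic": 0.5}
x_next = most_informative_point([0.0, 1.0, 2.0, 3.0])
# Pretend the experiment at x_next returned y = 9.0.
posterior = update_posterior(prior, x_next, y_observed=9.0)
```

The two models agree at x = 0 and x = 2, so those points cannot separate them. The sketch queries the point where the predictions differ most, and the observation then shifts nearly all posterior mass to one model.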

Quantum walk for community clique detection

Patent number: 11455562

Abstract: A method of detecting cliques in a graph includes determining, based on a number of nodes in the graph, a number of qubits to be included in a quantum processor. The method includes assigning to each node in the graph, a qubit of the quantum processor. The method includes operating on the qubits with a preparation circuit to create a quantum state in the qubits that corresponds to the graph. The method includes operating on the quantum state with a random walk circuit, and measuring the qubits of the quantum processor to detect cliques in the graph. The preparation circuit comprises a plurality of single- and two-qubit operators, wherein, for each pair of adjacent nodes in the graph, an operator of the plurality of two-qubit operators acts on a pair of qubits corresponding to the pair of adjacent nodes to create the quantum state.

Type: Grant

Filed: September 17, 2019

Date of Patent: September 27, 2022

Assignee: International Business Machines Corporation

Inventors: Tal Kachman, Lior Horesh, Giacomo Nannicini, Mark S. Squillante, John A. Gunnels, Kenneth L. Clarkson

Methods of automatically and self-consistently correcting genome databases

Patent number: 11347810

Abstract: A method is described for automatically correcting metadata errors in a k-mer database. A k-mer database having a self-consistent taxonomy based on genome-genome distance was constructed from a set of sample and reference genomes whose metadata included taxonomic labeling from a reference taxonomy (the standard NCBI taxonomy), which is not based on genetic distance. As a result, genomes of a given taxonomic ID of the self-consistent taxonomy could be separated into clusters based on the differences in the metadata. Genomes of the clusters less than a minimum cluster size Cmin were removed and profiled against the remaining genomes, correcting metadata automatically for those genomes that could be mapped back. The resulting k-mer database showed improved specificity for genetic profiling. Another method is described for identifying and handling chimeric genomes using the self-consistent taxonomy. Another method is described for correcting a classification database.

Type: Grant

Filed: December 20, 2018

Date of Patent: May 31, 2022

Assignee: International Business Machines Corporation

Inventors: James H. Kaufman, Matthew A. Davis, Mark Kunitomi, Kenneth L. Clarkson
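
The cluster-size step in this abstract can be sketched as follows: genomes that share a taxonomic ID are grouped by their metadata label, and clusters smaller than Cmin are set aside for re-profiling. The tuple layout, the function name, and the sample labels are hypothetical. The profiling step itself is not shown.

```python
from collections import defaultdict


def split_by_cluster_size(genomes, c_min):
    """Group genomes by (taxonomic ID, metadata label) and split on cluster size.

    `genomes` is a list of (genome_id, tax_id, metadata_label) tuples.
    Returns (kept, removed): genomes in clusters of at least `c_min` members,
    and genomes in smaller clusters that would be re-profiled.
    """
    clusters = defaultdict(list)
    for genome_id, tax_id, label in genomes:
        clusters[(tax_id, label)].append(genome_id)

    kept, removed = [], []
    for members in clusters.values():
        (kept if len(members) >= c_min else removed).extend(members)
    return sorted(kept), sorted(removed)


# Hypothetical sample: one genome carries a label that disagrees with its cluster.
sample = [
    ("g1", 1, "Salmonella enterica"),
    ("g2", 1, "Salmonella enterica"),
    ("g3", 1, "Salmonella enterica"),
    ("g4", 1, "Escherichia coli"),
]
kept, removed = split_by_cluster_size(sample, c_min=2)
```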

QUANTUM WALK FOR COMMUNITY CLIQUE DETECTION

Publication number: 20210406954

Abstract: A method of detecting cliques in a graph includes determining, based on a number of nodes in the graph, a number of qubits to be included in a quantum processor. The method includes assigning to each node in the graph, a qubit of the quantum processor. The method includes operating on the qubits with a preparation circuit to create a quantum state in the qubits that corresponds to the graph. The method includes operating on the quantum state with a random walk circuit, and measuring the qubits of the quantum processor to detect cliques in the graph. The preparation circuit comprises a plurality of single- and two-qubit operators, wherein, for each pair of adjacent nodes in the graph, an operator of the plurality of two-qubit operators acts on a pair of qubits corresponding to the pair of adjacent nodes to create the quantum state.

Type: Application

Filed: September 17, 2019

Publication date: December 30, 2021

Inventors: Tal Kachman, Lior Horesh, Giacomo Nannicini, Mark S. Squillante, John A. Gunnels, Kenneth L. Clarkson

Lower-dimensional subspace approximation of a dataset

Patent number: 11163774

Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.

Type: Grant

Filed: May 6, 2019

Date of Patent: November 2, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kenneth L. Clarkson, David P. Woodruff

Experimental Design for Symbolic Model Discovery

Publication number: 20210334432

Abstract: A method for optimal design of experiments for joint model selection and parametrization determination of a symbolic mathematical model includes: determining a prediction value for a given inquiry data point, functional form and parameterization for conducting an experiment relating to a system under investigation; assuming a set of input-output data pairs as a starting point in a model discovery process relating to the system under investigation; performing discovery of symbolic models minimizing complexity for a bounded misfit, or minimizing a misfit measure, subject to bounded complexity; determining a new data point through optimal experimental design that informs best as for the underlying symbolic models; and updating a posterior distribution, given results of the experiment relating to the system under investigation for the determined new data point to enable informed assessment among a plurality of functional forms and parameterizations. An apparatus configured to perform the method is also provided.

Type: Application

Filed: April 22, 2020

Publication date: October 28, 2021

Inventors: Lior Horesh, Kenneth L. Clarkson, Cristina Cornelio, Sara Magliacane

Efficient semi-supervised concept organization accelerated via an inequality process

Patent number: 10902346

Abstract: One embodiment provides generating a similarity matrix corresponding to an input collection including initializing, by a processor, a working set as a collection of multiple items. Until the similarity matrix converges: receiving a seed for similarity for at least one pair of items of the multiple items, and obtaining a similarity value for all other item pairs using a Naive Triangle Inequality process. The similarity matrix is generated with obtained similarity values.

Type: Grant

Filed: March 28, 2017

Date of Patent: January 26, 2021

Assignee: International Business Machines Corporation

Inventors: Alfredo Alba, Kenneth L. Clarkson, Clemens Drews, Ronald Fagin, Daniel F. Gruhl, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan, Cartic Ramakrishnan
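
The abstract does not spell out the "Naive Triangle Inequality process", so the following is only one plausible reading, with all names hypothetical. It treats the seed values as distances (dissimilarities) and uses d(a, c) <= d(a, b) + d(b, c) to tighten an upper bound for every unseeded pair, repeating until no bound changes.

```python
import itertools


def propagate_upper_bounds(items, seed_distances):
    """Tighten upper bounds on all pairwise distances via the triangle inequality.

    `seed_distances` maps frozenset({a, b}) to a known distance. Unknown pairs
    start at infinity; the loop repeats until no bound changes (convergence).
    """
    inf = float("inf")
    bound = {frozenset(p): inf for p in itertools.combinations(items, 2)}
    bound.update(seed_distances)

    def d(a, b):
        return 0.0 if a == b else bound[frozenset((a, b))]

    changed = True
    while changed:
        changed = False
        for a, c in itertools.combinations(items, 2):
            for b in items:
                via_b = d(a, b) + d(b, c)  # d(a, c) <= d(a, b) + d(b, c)
                if via_b < bound[frozenset((a, c))]:
                    bound[frozenset((a, c))] = via_b
                    changed = True
    return bound


# Hypothetical seeds: two of the three pairs are known.
seeds = {frozenset(("a", "b")): 1.0, frozenset(("b", "c")): 2.0}
bounds = propagate_upper_bounds(["a", "b", "c"], seeds)
```

Pairs that no chain of seeds connects keep an infinite bound, which signals that another seed is needed.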

METHODS OF AUTOMATICALLY AND SELF-CONSISTENTLY CORRECTING GENOME DATABASES

Publication number: 20200201905

Abstract: A method is described for automatically correcting metadata errors in a k-mer database. A k-mer database having a self-consistent taxonomy based on genome-genome distance was constructed from a set of sample and reference genomes whose metadata included taxonomic labeling from a reference taxonomy (the standard NCBI taxonomy), which is not based on genetic distance. As a result, genomes of a given taxonomic ID of the self-consistent taxonomy could be separated into clusters based on the differences in the metadata. Genomes of the clusters less than a minimum cluster size Cmin were removed and profiled against the remaining genomes, correcting metadata automatically for those genomes that could be mapped back. The resulting k-mer database showed improved specificity for genetic profiling. Another method is described for identifying and handling chimeric genomes using the self-consistent taxonomy. Another method is described for correcting a classification database.

Type: Application

Filed: December 20, 2018

Publication date: June 25, 2020

Inventors: James H. Kaufman, Matthew A. Davis, Mark Kunitomi, Kenneth L. Clarkson

LOWER-DIMENSIONAL SUBSPACE APPROXIMATION OF A DATASET

Publication number: 20190258640

Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.

Type: Application

Filed: May 6, 2019

Publication date: August 22, 2019

Inventors: Kenneth L. Clarkson, David P. Woodruff

Lower-dimensional subspace approximation of a dataset

Patent number: 10346405

Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.

Type: Grant

Filed: October 17, 2016

Date of Patent: July 9, 2019

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kenneth L. Clarkson, David P. Woodruff

EFFICIENT SEMI-SUPERVISED CONCEPT ORGANIZATION ACCELERATED VIA AN INEQUALITY PROCESS

Publication number: 20180285762

Abstract: One embodiment provides generating a similarity matrix corresponding to an input collection including initializing, by a processor, a working set as a collection of multiple items. Until the similarity matrix converges: receiving a seed for similarity for at least one pair of items of the multiple items, and obtaining a similarity value for all other item pairs using a Naive Triangle Inequality process. The similarity matrix is generated with obtained similarity values.

Type: Application

Filed: March 28, 2017

Publication date: October 4, 2018

Inventors: Alfredo Alba, Kenneth L. Clarkson, Clemens Drews, Ronald Fagin, Daniel F. Gruhl, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan, Cartic Ramakrishnan

Information retrieval using sparse matrix sketching

Patent number: 9971735

Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, AR, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, AR, has dimensions n×t. The processor generates a third matrix, SA, by multiplying the second sparse matrix, S, by the first matrix, A. The third matrix, SA, has dimensions t′×n, and the processor generates a fourth matrix, (SAR)⁺, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SAR), and approximating the first matrix, A by generating a fifth matrix, Ã, the fifth matrix defined as AR×(SAR)⁺×SA.

Type: Grant

Filed: September 11, 2013

Date of Patent: May 15, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kenneth L. Clarkson, David P. Woodruff

LOWER-DIMENSIONAL SUBSPACE APPROXIMATION OF A DATASET

Publication number: 20180107716

Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.

Type: Application

Filed: October 17, 2016

Publication date: April 19, 2018

Inventors: Kenneth L. Clarkson, David P. Woodruff

Regression using M-estimators and polynomial kernel support vector machines and principal component regression

Patent number: 9658987

Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.

Type: Grant

Filed: May 15, 2014

Date of Patent: May 23, 2017

Assignee: International Business Machines Corporation

Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff

High speed dictionary expansion

Patent number: 9348806

Abstract: Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.

Type: Grant

Filed: September 30, 2014

Date of Patent: May 24, 2016

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kenneth L. Clarkson, Daniel F. Gruhl, Neal R. Lewis, Nimrod Megiddo
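
The pattern-scoring step in this abstract can be sketched in a few lines. The scoring rule used here (the share of a pattern's candidate terms that are already seed terms) and the use of regular expressions as patterns are assumptions for illustration. The abstract does not specify either, and the function name and sample text are hypothetical.

```python
import re


def score_patterns(text, patterns, seed_terms):
    """Score each regex pattern by the share of its candidates that are seeds.

    Each pattern must contain one capture group marking the candidate term.
    """
    seeds = {s.lower() for s in seed_terms}
    scores = {}
    for pattern in patterns:
        # With a single capture group, re.findall returns the captured strings.
        candidates = {m.lower() for m in re.findall(pattern, text)}
        hits = len(candidates & seeds)
        scores[pattern] = hits / len(candidates) if candidates else 0.0
    return scores


text = (
    "Drugs such as aspirin are common. Drugs such as ibuprofen are too. "
    "Cities such as Paris are crowded."
)
narrow = r"[Dd]rugs such as (\w+)"
broad = r"such as (\w+)"
scores = score_patterns(text, [narrow, broad], ["aspirin", "ibuprofen"])
```

Here the narrow pattern extracts only seed terms, while the broad one also picks up an unrelated term and scores lower. Candidates from high-scoring patterns that are not yet seeds would be the natural additions to the dictionary.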

HIGH SPEED DICTIONARY EXPANSION

Publication number: 20160092435

Abstract: Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.

Type: Application

Filed: September 30, 2014

Publication date: March 31, 2016

Inventors: Kenneth L. Clarkson, Daniel F. Gruhl, Neal R. Lewis, Nimrod Megiddo

REGRESSION USING M-ESTIMATORS AND POLYNOMIAL KERNEL SUPPORT VECTOR MACHINES AND PRINCIPAL COMPONENT REGRESSION

Publication number: 20150331835

Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.

Type: Application

Filed: May 15, 2014

Publication date: November 19, 2015

Applicant: International Business Machines Corporation

Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff

INFORMATION RETRIEVAL USING SPARSE MATRIX SKETCHING

Publication number: 20140280426

Abstract: Embodiments of the invention include a method of approximating a matrix of data using sparse matrices which includes receiving a first matrix and generating a second matrix based on the first matrix and a first sparse matrix. The method further includes generating a third matrix based on the first matrix and a second sparse matrix and generating a fourth matrix by generating a Moore-Penrose pseudo-inverse matrix based on the first matrix, the second matrix and the third matrix. The method also includes generating a fifth matrix based on a product of the second matrix, the third matrix, and the fourth matrix. The method further includes receiving, by a computer, a request to access at least one entry of the first matrix and responding to the request by accessing an entry of the fifth matrix.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Kenneth L. Clarkson, David P. Woodruff

INFORMATION RETRIEVAL USING SPARSE MATRIX SKETCHING

Publication number: 20140280428

Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, RA, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, RA, has dimensions n×t. The processor generates a third matrix, AS^T, by multiplying the first matrix, A, by the second sparse matrix, S, transposed. The third matrix, AS^T, has dimensions d×t′, and the processor generates a fourth matrix, (SAR^T)⁺, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SAR^T), and approximating the first matrix, A by generating a fifth matrix, Â, the fifth matrix defined as AS^T×(SAR^T)⁺×RA.

Type: Application

Filed: September 11, 2013

Publication date: September 18, 2014

Applicant: International Business Machines Corporation

Inventors: Kenneth L. Clarkson, David P. Woodruff

Computer information retrieval using latent semantic structure via sketches

Patent number: 8255401

Abstract: A method, system and program product for computer information retrieval is disclosed. A matrix A is received. Random sign matrices S and R are generated. Matrix products of S^T*A, A*R, and S^T*A*R are computed. A Moore-Penrose pseudoinverse C of S^T*A*R is computed. A singular value decomposition is computed of the pseudoinverse C. Three matrices ARU, Sigma, and V^TS^TA are outputted as factorization in applications.

Type: Grant

Filed: April 28, 2010

Date of Patent: August 28, 2012

Assignee: International Business Machines Corporation

Inventors: Kenneth L. Clarkson, David P. Woodruff
