Patents by Inventor Kenneth L. Clarkson

Kenneth L. Clarkson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11657194
    Abstract: A method for optimal design of experiments for joint model selection and parametrization determination of a symbolic mathematical model includes: determining a prediction value for a given inquiry data point, functional form and parameterization for conducting an experiment relating to a system under investigation; assuming a set of input-output data pairs as a starting point in a model discovery process relating to the system under investigation; performing discovery of symbolic models minimizing complexity for a bounded misfit, or minimizing a misfit measure, subject to bounded complexity; determining a new data point through optimal experimental design that is most informative about the underlying symbolic models; and updating a posterior distribution, given results of the experiment relating to the system under investigation for the determined new data point to enable informed assessment among a plurality of functional forms and parameterizations. An apparatus configured to perform the method is also provided.
    Type: Grant
    Filed: April 22, 2020
    Date of Patent: May 23, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lior Horesh, Kenneth L. Clarkson, Cristina Cornelio, Sara Magliacane
  • Patent number: 11455562
    Abstract: A method of detecting cliques in a graph includes determining, based on a number of nodes in the graph, a number of qubits to be included in a quantum processor. The method includes assigning to each node in the graph, a qubit of the quantum processor. The method includes operating on the qubits with a preparation circuit to create a quantum state in the qubits that corresponds to the graph. The method includes operating on the quantum state with a random walk circuit, and measuring the qubits of the quantum processor to detect cliques in the graph. The preparation circuit comprises a plurality of single- and two-qubit operators, wherein, for each pair of adjacent nodes in the graph, an operator of the plurality of two-qubit operators acts on a pair of qubits corresponding to the pair of adjacent nodes to create the quantum state.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: September 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Tal Kachman, Lior Horesh, Giacomo Nannicini, Mark S. Squillante, John A. Gunnels, Kenneth L. Clarkson
  • Patent number: 11347810
    Abstract: A method is described for automatically correcting metadata errors in a k-mer database. A k-mer database having a self-consistent taxonomy based on genome-genome distance was constructed from a set of sample and reference genomes whose metadata included taxonomic labeling from a reference taxonomy (the standard NCBI taxonomy), which is not based on genetic distance. As a result, genomes of a given taxonomic ID of the self-consistent taxonomy could be separated into clusters based on the differences in the metadata. Genomes of the clusters less than a minimum cluster size Cmin were removed and profiled against the remaining genomes, correcting metadata automatically for those genomes that could be mapped back. The resulting k-mer database showed improved specificity for genetic profiling. Another method is described for identifying and handling chimeric genomes using the self-consistent taxonomy. Another method is described for correcting a classification database.
    Type: Grant
    Filed: December 20, 2018
    Date of Patent: May 31, 2022
    Assignee: International Business Machines Corporation
    Inventors: James H. Kaufman, Matthew A. Davis, Mark Kunitomi, Kenneth L. Clarkson
  • Publication number: 20210406954
    Abstract: A method of detecting cliques in a graph includes determining, based on a number of nodes in the graph, a number of qubits to be included in a quantum processor. The method includes assigning to each node in the graph, a qubit of the quantum processor. The method includes operating on the qubits with a preparation circuit to create a quantum state in the qubits that corresponds to the graph. The method includes operating on the quantum state with a random walk circuit, and measuring the qubits of the quantum processor to detect cliques in the graph. The preparation circuit comprises a plurality of single- and two-qubit operators, wherein, for each pair of adjacent nodes in the graph, an operator of the plurality of two-qubit operators acts on a pair of qubits corresponding to the pair of adjacent nodes to create the quantum state.
    Type: Application
    Filed: September 17, 2019
    Publication date: December 30, 2021
    Inventors: Tal Kachman, Lior Horesh, Giacomo Nannicini, Mark S. Squillante, John A. Gunnels, Kenneth L. Clarkson
  • Patent number: 11163774
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: November 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Publication number: 20210334432
    Abstract: A method for optimal design of experiments for joint model selection and parametrization determination of a symbolic mathematical model includes: determining a prediction value for a given inquiry data point, functional form and parameterization for conducting an experiment relating to a system under investigation; assuming a set of input-output data pairs as a starting point in a model discovery process relating to the system under investigation; performing discovery of symbolic models minimizing complexity for a bounded misfit, or minimizing a misfit measure, subject to bounded complexity; determining a new data point through optimal experimental design that is most informative about the underlying symbolic models; and updating a posterior distribution, given results of the experiment relating to the system under investigation for the determined new data point to enable informed assessment among a plurality of functional forms and parameterizations. An apparatus configured to perform the method is also provided.
    Type: Application
    Filed: April 22, 2020
    Publication date: October 28, 2021
    Inventors: Lior Horesh, Kenneth L. Clarkson, Cristina Cornelio, Sara Magliacane
  • Patent number: 10902346
    Abstract: One embodiment provides generating a similarity matrix corresponding to an input collection, including initializing, by a processor, a working set as a collection of multiple items. Until the similarity matrix converges: receiving a seed for similarity for at least one pair of items of the multiple items, and obtaining a similarity value for all other item pairs using a Naive Triangle Inequality process. The similarity matrix is generated with the obtained similarity values.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Alfredo Alba, Kenneth L. Clarkson, Clemens Drews, Ronald Fagin, Daniel F. Gruhl, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan, Cartic Ramakrishnan
  • Publication number: 20200201905
    Abstract: A method is described for automatically correcting metadata errors in a k-mer database. A k-mer database having a self-consistent taxonomy based on genome-genome distance was constructed from a set of sample and reference genomes whose metadata included taxonomic labeling from a reference taxonomy (the standard NCBI taxonomy), which is not based on genetic distance. As a result, genomes of a given taxonomic ID of the self-consistent taxonomy could be separated into clusters based on the differences in the metadata. Genomes of the clusters less than a minimum cluster size Cmin were removed and profiled against the remaining genomes, correcting metadata automatically for those genomes that could be mapped back. The resulting k-mer database showed improved specificity for genetic profiling. Another method is described for identifying and handling chimeric genomes using the self-consistent taxonomy. Another method is described for correcting a classification database.
    Type: Application
    Filed: December 20, 2018
    Publication date: June 25, 2020
    Inventors: James H. Kaufman, Matthew A. Davis, Mark Kunitomi, Kenneth L. Clarkson
  • Publication number: 20190258640
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Application
    Filed: May 6, 2019
    Publication date: August 22, 2019
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Patent number: 10346405
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Grant
    Filed: October 17, 2016
    Date of Patent: July 9, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Publication number: 20180285762
    Abstract: One embodiment provides generating a similarity matrix corresponding to an input collection, including initializing, by a processor, a working set as a collection of multiple items. Until the similarity matrix converges: receiving a seed for similarity for at least one pair of items of the multiple items, and obtaining a similarity value for all other item pairs using a Naive Triangle Inequality process. The similarity matrix is generated with the obtained similarity values.
    Type: Application
    Filed: March 28, 2017
    Publication date: October 4, 2018
    Inventors: Alfredo Alba, Kenneth L. Clarkson, Clemens Drews, Ronald Fagin, Daniel F. Gruhl, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan, Cartic Ramakrishnan
  • Patent number: 9971735
    Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, AR, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, AR, has dimensions n×t. The processor generates a third matrix, SA, by multiplying the second sparse matrix, S, by the first matrix, A. The third matrix, SA, has dimensions t′×n, and the processor generates a fourth matrix, (SAR)†, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SAR), and approximating the first matrix, A, by generating a fifth matrix, Ã, the fifth matrix defined as AR×(SAR)†×SA.
    Type: Grant
    Filed: September 11, 2013
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Publication number: 20180107716
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Application
    Filed: October 17, 2016
    Publication date: April 19, 2018
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Patent number: 9658987
    Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
    Type: Grant
    Filed: May 15, 2014
    Date of Patent: May 23, 2017
    Assignee: International Business Machines Corporation
    Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
  • Patent number: 9348806
    Abstract: Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: May 24, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, Daniel F. Gruhl, Neal R. Lewis, Nimrod Megiddo
  • Publication number: 20160092435
    Abstract: Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.
    Type: Application
    Filed: September 30, 2014
    Publication date: March 31, 2016
    Inventors: Kenneth L. Clarkson, Daniel F. Gruhl, Neal R. Lewis, Nimrod Megiddo
  • Publication number: 20150331835
    Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
    Type: Application
    Filed: May 15, 2014
    Publication date: November 19, 2015
    Applicant: International Business Machines Corporation
    Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
  • Publication number: 20140280428
    Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, RA, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, RA, has dimensions n×t. The processor generates a third matrix, AST, by multiplying the first matrix, A, by the second sparse matrix, S, transposed. The third matrix, AST, has dimensions d×t′, and the processor generates a fourth matrix, (SART)†, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SART), and approximating the first matrix, A, by generating a fifth matrix, Â, the fifth matrix defined as AST×(SART)†×RA.
    Type: Application
    Filed: September 11, 2013
    Publication date: September 18, 2014
    Applicant: International Business Machines Corporation
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Publication number: 20140280426
    Abstract: Embodiments of the invention include a method of approximating a matrix of data using sparse matrices, which includes receiving a first matrix and generating a second matrix based on the first matrix and a first sparse matrix. The method further includes generating a third matrix based on the first matrix and a second sparse matrix and generating a fourth matrix by generating a Moore-Penrose pseudo-inverse matrix based on the first matrix, the second matrix and the third matrix. The method also includes generating a fifth matrix based on a product of the second matrix, the third matrix, and the fourth matrix. The method further includes receiving, by a computer, a request to access at least one entry of the first matrix and responding to the request by accessing an entry of the fifth matrix.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Patent number: 8255401
    Abstract: A method, system and program product for computer information retrieval are disclosed. A matrix A is received. Random sign matrices S and R are generated. Matrix products of S^T*A, A*R, and S^T*A*R are computed. A Moore-Penrose pseudoinverse C of S^T*A*R is computed. A singular value decomposition is computed of the pseudoinverse C. Three matrices ARU, Sigma, and V^TS^TA are output as a factorization for use in applications.
    Type: Grant
    Filed: April 28, 2010
    Date of Patent: August 28, 2012
    Assignee: International Business Machines Corporation
    Inventors: Kenneth L. Clarkson, David P. Woodruff
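
For readers who want to see the steps named in the abstract of patent 8255401 in concrete form, the following is a minimal NumPy sketch and not the patented implementation. The function name `sign_sketch_factorization`, the sketch sizes `t_rows` and `t_cols`, the fixed seed, and the `rcond` cutoff are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sign_sketch_factorization(A, t_rows, t_cols, seed=0, rcond=1e-10):
    """Return (ARU, sigma, VtStA) so that A is approximated by
    ARU @ diag(sigma) @ VtStA.

    Hypothetical helper: sketch sizes and rcond are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape

    # Random sign matrices S (n x t_rows) and R (d x t_cols), entries +1 or -1.
    S = rng.choice([-1.0, 1.0], size=(n, t_rows))
    R = rng.choice([-1.0, 1.0], size=(d, t_cols))

    # The three matrix products named in the abstract.
    StA = S.T @ A        # t_rows x d
    AR = A @ R           # n x t_cols
    StAR = StA @ R       # t_rows x t_cols

    # Moore-Penrose pseudoinverse C of S^T*A*R (t_cols x t_rows).
    # rcond discards singular values that are only rounding noise.
    C = np.linalg.pinv(StAR, rcond=rcond)

    # Singular value decomposition of the pseudoinverse: C = U diag(sigma) V^T.
    U, sigma, Vt = np.linalg.svd(C, full_matrices=False)

    # Output factors: ARU, Sigma, and V^T S^T A.
    return AR @ U, sigma, Vt @ StA
```

A possible usage, under the same assumptions: build a matrix of exact rank 3, call `ARU, sigma, VtStA = sign_sketch_factorization(A, 20, 20)`, and compare `ARU @ np.diag(sigma) @ VtStA` with `A`. When the sketch sizes are at least the rank of `A`, the product reproduces `A` up to rounding error; for matrices that are only approximately low rank, the result is an approximation whose quality depends on the sketch sizes.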