Patents by Inventor Kenneth L. Clarkson
Kenneth L. Clarkson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11657194
Abstract: A method for optimal design of experiments for joint model selection and parametrization determination of a symbolic mathematical model includes: determining a prediction value for a given inquiry data point, functional form and parameterization for conducting an experiment relating to a system under investigation; assuming a set of input-output data pairs as a starting point in a model discovery process relating to the system under investigation; performing discovery of symbolic models minimizing complexity subject to a bounded misfit, or minimizing a misfit measure subject to bounded complexity; determining, through optimal experimental design, a new data point that best informs the choice among the underlying symbolic models; and updating a posterior distribution, given results of the experiment relating to the system under investigation for the determined new data point, to enable informed assessment among a plurality of functional forms and parameterizations. An apparatus configured to perform the method is also provided.
Type: Grant
Filed: April 22, 2020
Date of Patent: May 23, 2023
Assignee: International Business Machines Corporation
Inventors: Lior Horesh, Kenneth L. Clarkson, Cristina Cornelio, Sara Magliacane
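The discover, design, observe, update loop in this abstract can be illustrated with a small Python sketch. It is not the patented method. It assumes a fixed, finite list of candidate models (each standing for one functional form with its parameters already chosen), a Gaussian noise model with a known standard deviation `noise`, and it scores candidate inputs by the posterior-weighted variance of the models' predictions. That disagreement criterion is a simple stand-in swapped in for whatever design criterion the patent uses, and `posterior` and `next_experiment` are hypothetical names.

```python
import math


def posterior(models, prior, data, noise):
    """Posterior model probabilities under an assumed Gaussian likelihood."""
    log_weights = []
    for model, p in zip(models, prior):
        sse = sum((y - model(x)) ** 2 for x, y in data)
        log_weights.append(math.log(p) - sse / (2 * noise ** 2))
    top = max(log_weights)  # subtract the max for numerical stability
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def next_experiment(models, weights, candidates):
    """Pick the input where the weighted models disagree most (assumed criterion)."""
    def disagreement(x):
        preds = [model(x) for model in models]
        mean = sum(w * p for w, p in zip(weights, preds))
        return sum(w * (p - mean) ** 2 for w, p in zip(weights, preds))
    return max(candidates, key=disagreement)


# Two candidate symbolic models that both fit the starting input-output pairs.
models = [lambda x: 2 * x, lambda x: x * x]
data = [(0, 0), (2, 4)]
weights = posterior(models, [0.5, 0.5], data, noise=1.0)
x_new = next_experiment(models, weights, candidates=[0, 1, 2, 3])
data.append((x_new, 9))  # suppose the experiment at x_new returns 9
weights = posterior(models, [0.5, 0.5], data, noise=1.0)
```

Here the first experiment lands at x = 3, where the linear and quadratic forms differ most, and the single observation moves roughly 0.99 of the posterior mass to the quadratic model.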
-
Patent number: 11455562
Abstract: A method of detecting cliques in a graph includes determining, based on the number of nodes in the graph, a number of qubits to be included in a quantum processor. The method includes assigning a qubit of the quantum processor to each node in the graph. The method includes operating on the qubits with a preparation circuit to create a quantum state in the qubits that corresponds to the graph. The method includes operating on the quantum state with a random walk circuit, and measuring the qubits of the quantum processor to detect cliques in the graph. The preparation circuit comprises a plurality of single- and two-qubit operators, wherein, for each pair of adjacent nodes in the graph, an operator of the plurality of two-qubit operators acts on a pair of qubits corresponding to the pair of adjacent nodes to create the quantum state.
Type: Grant
Filed: September 17, 2019
Date of Patent: September 27, 2022
Assignee: International Business Machines Corporation
Inventors: Tal Kachman, Lior Horesh, Giacomo Nannicini, Mark S. Squillante, John A. Gunnels, Kenneth L. Clarkson
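The preparation step described here (one qubit per node, one two-qubit operator per edge) has the shape of the standard graph-state construction. The sketch below simulates only that step with a plain state vector, assuming the single-qubit operators are Hadamard gates and the two-qubit operators are controlled-Z gates; the random walk circuit and the measurement are not modeled, and `prepare_graph_state` is a hypothetical name. Under those assumptions, the amplitude of each basis state (read as a subset of nodes) carries the sign (-1)^(number of edges inside the subset), so a k-node clique picks up the sign (-1)^(k(k-1)/2).

```python
import math


def prepare_graph_state(num_nodes, edges):
    """State vector after H on every qubit and CZ on every edge (assumed gates)."""
    dim = 2 ** num_nodes  # one qubit per node
    # Hadamards on |00...0> give an equal superposition over all node subsets.
    state = [1 / math.sqrt(dim)] * dim
    for u, v in edges:
        for basis in range(dim):
            # CZ flips the sign when both endpoint qubits are 1.
            if (basis >> u) & 1 and (basis >> v) & 1:
                state[basis] = -state[basis]
    return state


def subset_to_basis(nodes):
    """Encode a set of node indices as a computational-basis index."""
    return sum(1 << node for node in nodes)


# Triangle 0-1-2 with node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
state = prepare_graph_state(4, edges)
print(state[subset_to_basis({0, 1, 2})])  # prints -0.25 (3 internal edges)
```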
-
Patent number: 11347810
Abstract: A method is described for automatically correcting metadata errors in a k-mer database. A k-mer database having a self-consistent taxonomy based on genome-genome distance was constructed from a set of sample and reference genomes whose metadata included taxonomic labeling from a reference taxonomy (the standard NCBI taxonomy), which is not based on genetic distance. As a result, genomes of a given taxonomic ID of the self-consistent taxonomy could be separated into clusters based on the differences in the metadata. Genomes in clusters smaller than a minimum cluster size Cmin were removed and profiled against the remaining genomes, and metadata were corrected automatically for those genomes that could be mapped back. The resulting k-mer database showed improved specificity for genetic profiling. Another method is described for identifying and handling chimeric genomes using the self-consistent taxonomy. Another method is described for correcting a classification database.
Type: Grant
Filed: December 20, 2018
Date of Patent: May 31, 2022
Assignee: International Business Machines Corporation
Inventors: James H. Kaufman, Matthew A. Davis, Mark Kunitomi, Kenneth L. Clarkson
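The correction step (split a taxonomic ID into clusters by metadata label, set aside clusters smaller than Cmin, re-profile those genomes against the rest, and relabel the ones that map back) can be sketched as follows. This is an illustration under stated assumptions, not the patent's profiler: genomes are represented by plain k-mer sets, "profiling" is replaced by Jaccard similarity to the best-matching remaining genome, and `min_similarity`, `correct_metadata`, and the sample genome records are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correct_metadata(genomes, c_min, min_similarity):
    """genomes maps id -> (taxonomy_id, metadata_label, kmer_set)."""
    clusters = defaultdict(list)
    for gid, (tax_id, label, _) in genomes.items():
        clusters[(tax_id, label)].append(gid)
    # Genomes in clusters smaller than c_min are set aside for re-profiling.
    small = {g for members in clusters.values() if len(members) < c_min for g in members}
    reference = {g: v for g, v in genomes.items() if g not in small}
    corrected, unresolved = {}, []
    for gid in sorted(small):
        kmers = genomes[gid][2]
        best = max(reference, key=lambda r: jaccard(kmers, reference[r][2]), default=None)
        if best is not None and jaccard(kmers, reference[best][2]) >= min_similarity:
            corrected[gid] = reference[best][1]  # adopt the label it maps back to
        else:
            unresolved.append(gid)
    return corrected, unresolved


genomes = {
    "g1": (1, "E. coli", {"AAA", "AAC", "AAG"}),
    "g2": (1, "E. coli", {"AAA", "AAC", "AAG"}),
    "g3": (1, "E. coli", {"AAA", "AAC", "AAG"}),
    "g4": (1, "E. colli", {"AAA", "AAC", "AAT"}),  # mislabeled record
    "g5": (1, "unknown", {"GGG", "GGT"}),
}
corrected, unresolved = correct_metadata(genomes, c_min=2, min_similarity=0.4)
```

With these inputs, g4 maps back to the "E. coli" cluster (Jaccard 0.5) and is relabeled, while g5 matches nothing and is left unresolved.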
-
Publication number: 20210406954
Abstract: A method of detecting cliques in a graph includes determining, based on the number of nodes in the graph, a number of qubits to be included in a quantum processor. The method includes assigning a qubit of the quantum processor to each node in the graph. The method includes operating on the qubits with a preparation circuit to create a quantum state in the qubits that corresponds to the graph. The method includes operating on the quantum state with a random walk circuit, and measuring the qubits of the quantum processor to detect cliques in the graph. The preparation circuit comprises a plurality of single- and two-qubit operators, wherein, for each pair of adjacent nodes in the graph, an operator of the plurality of two-qubit operators acts on a pair of qubits corresponding to the pair of adjacent nodes to create the quantum state.
Type: Application
Filed: September 17, 2019
Publication date: December 30, 2021
Inventors: Tal Kachman, Lior Horesh, Giacomo Nannicini, Mark S. Squillante, John A. Gunnels, Kenneth L. Clarkson
-
Patent number: 11163774
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into the same document cluster or different clusters based on whether the similarity metric satisfies a threshold value.
Type: Grant
Filed: May 6, 2019
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
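The row-sampling idea can be illustrated with a small pure-Python sketch, under several simplifying assumptions that are not from the abstract: rows are terms and columns are documents; the leverage scores are the full-rank scores a_i^T (A^T A)^{-1} a_i rather than scores tied to the target rank parameter; sampling is a single i.i.d. round with the usual 1/sqrt(c p_i) rescaling, so the adaptive "distance to the span of sampled rows" step is omitted; and the similarity metric is cosine similarity with a hypothetical threshold of 0.5. All function names are hypothetical.

```python
import math
import random
from fractions import Fraction


def invert(matrix):
    """Exact Gauss-Jordan inverse over Fractions; raises if singular."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverage_scores(A):
    """Row leverage scores a_i^T (A^T A)^{-1} a_i; assumes full column rank."""
    cols = len(A[0])
    gram = [[sum(Fraction(row[i]) * row[j] for row in A) for j in range(cols)]
            for i in range(cols)]
    gram_inv = invert(gram)
    return [sum(row[i] * gram_inv[i][j] * row[j] for i in range(cols) for j in range(cols))
            for row in A]


def sample_rows(A, c, rng):
    """Sample c term rows with probability proportional to leverage, then rescale."""
    probs = [float(s) for s in leverage_scores(A)]
    total = sum(probs)
    probs = [p / total for p in probs]
    picks = rng.choices(range(len(A)), weights=probs, k=c)
    return [[v / math.sqrt(c * probs[i]) for v in A[i]] for i in picks]


def doc_similarity(C, j, k):
    """Cosine similarity between document columns j and k."""
    dot = sum(row[j] * row[k] for row in C)
    norm_j = math.sqrt(sum(row[j] ** 2 for row in C))
    norm_k = math.sqrt(sum(row[k] ** 2 for row in C))
    return dot / (norm_j * norm_k) if norm_j and norm_k else 0.0


# Rows are terms, columns are documents (term counts).
A = [[2, 0, 1], [1, 3, 0], [0, 1, 2], [1, 0, 0], [0, 2, 1]]
C = sample_rows(A, 4, random.Random(0))
same_cluster = doc_similarity(C, 0, 2) >= 0.5  # hypothetical threshold
```

Exact rational arithmetic is used for the leverage scores so that their sum equals the rank of A (here 3) exactly, which is a handy sanity check.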
-
Publication number: 20210334432
Abstract: A method for optimal design of experiments for joint model selection and parametrization determination of a symbolic mathematical model includes: determining a prediction value for a given inquiry data point, functional form and parameterization for conducting an experiment relating to a system under investigation; assuming a set of input-output data pairs as a starting point in a model discovery process relating to the system under investigation; performing discovery of symbolic models minimizing complexity subject to a bounded misfit, or minimizing a misfit measure subject to bounded complexity; determining, through optimal experimental design, a new data point that best informs the choice among the underlying symbolic models; and updating a posterior distribution, given results of the experiment relating to the system under investigation for the determined new data point, to enable informed assessment among a plurality of functional forms and parameterizations. An apparatus configured to perform the method is also provided.
Type: Application
Filed: April 22, 2020
Publication date: October 28, 2021
Inventors: Lior Horesh, Kenneth L. Clarkson, Cristina Cornelio, Sara Magliacane
-
Patent number: 10902346
Abstract: One embodiment provides generating a similarity matrix corresponding to an input collection, including initializing, by a processor, a working set as a collection of multiple items. Until the similarity matrix converges: receiving a seed for similarity for at least one pair of items of the multiple items, and obtaining a similarity value for all other item pairs using a Naive Triangle Inequality process. The similarity matrix is generated from the obtained similarity values.
Type: Grant
Filed: March 28, 2017
Date of Patent: January 26, 2021
Assignee: International Business Machines Corporation
Inventors: Alfredo Alba, Kenneth L. Clarkson, Clemens Drews, Ronald Fagin, Daniel F. Gruhl, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan, Cartic Ramakrishnan
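The abstract does not define the "Naive Triangle Inequality process", so the following is only one plausible reading, labeled as an assumption: treat 1 - similarity as a metric distance, so d(i, k) <= d(i, j) + d(j, k) yields the lower bound s(i, k) >= s(i, j) + s(j, k) - 1, and propagate that bound from the seeded pairs until no estimate improves. Seeded pairs are kept as given, pairs that no chain of seeds reaches stay `None`, and the outer loop in which new seeds are requested is left out. `fill_similarities` is a hypothetical name.

```python
from itertools import permutations


def fill_similarities(items, seeds):
    """Complete a similarity matrix from seed pairs via a triangle-inequality bound.

    Assumes similarities lie in [0, 1] and that 1 - similarity behaves like a
    metric, so s(i, k) >= s(i, j) + s(j, k) - 1.
    """
    sim = {frozenset(pair): s for pair, s in seeds.items()}
    fixed = set(sim)  # seeded pairs are treated as ground truth
    changed = True
    while changed:  # repeat until no estimate improves (convergence)
        changed = False
        for i, j, k in permutations(items, 3):
            s_ij = sim.get(frozenset((i, j)))
            s_jk = sim.get(frozenset((j, k)))
            key = frozenset((i, k))
            if s_ij is None or s_jk is None or key in fixed:
                continue
            bound = max(0.0, s_ij + s_jk - 1.0)
            if key not in sim or bound > sim[key]:
                sim[key] = bound
                changed = True
    # Emit a dense symmetric matrix; unreachable pairs stay None.
    index = {item: n for n, item in enumerate(items)}
    matrix = [[1.0 if a == b else None for b in items] for a in items]
    for pair, s in sim.items():
        a, b = tuple(pair)
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = s
    return matrix


matrix = fill_similarities(["a", "b", "c", "d"], {("a", "b"): 0.9, ("b", "c"): 0.8})
```

Because estimates only ever increase and are bounded, the loop behaves like a shortest-path relaxation on the distances 1 - s and terminates; here s(a, c) is inferred as about 0.7, while d stays unconnected.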
-
Publication number: 20200201905
Abstract: A method is described for automatically correcting metadata errors in a k-mer database. A k-mer database having a self-consistent taxonomy based on genome-genome distance was constructed from a set of sample and reference genomes whose metadata included taxonomic labeling from a reference taxonomy (the standard NCBI taxonomy), which is not based on genetic distance. As a result, genomes of a given taxonomic ID of the self-consistent taxonomy could be separated into clusters based on the differences in the metadata. Genomes in clusters smaller than a minimum cluster size Cmin were removed and profiled against the remaining genomes, and metadata were corrected automatically for those genomes that could be mapped back. The resulting k-mer database showed improved specificity for genetic profiling. Another method is described for identifying and handling chimeric genomes using the self-consistent taxonomy. Another method is described for correcting a classification database.
Type: Application
Filed: December 20, 2018
Publication date: June 25, 2020
Inventors: James H. Kaufman, Matthew A. Davis, Mark Kunitomi, Kenneth L. Clarkson
-
Publication number: 20190258640
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into the same document cluster or different clusters based on whether the similarity metric satisfies a threshold value.
Type: Application
Filed: May 6, 2019
Publication date: August 22, 2019
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Patent number: 10346405
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into the same document cluster or different clusters based on whether the similarity metric satisfies a threshold value.
Type: Grant
Filed: October 17, 2016
Date of Patent: July 9, 2019
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Publication number: 20180285762
Abstract: One embodiment provides generating a similarity matrix corresponding to an input collection, including initializing, by a processor, a working set as a collection of multiple items. Until the similarity matrix converges: receiving a seed for similarity for at least one pair of items of the multiple items, and obtaining a similarity value for all other item pairs using a Naive Triangle Inequality process. The similarity matrix is generated from the obtained similarity values.
Type: Application
Filed: March 28, 2017
Publication date: October 4, 2018
Inventors: Alfredo Alba, Kenneth L. Clarkson, Clemens Drews, Ronald Fagin, Daniel F. Gruhl, Neal R. Lewis, Pablo N. Mendes, Meenakshi Nagarajan, Cartic Ramakrishnan
-
Patent number: 9971735
Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, AR, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, AR, has dimensions n×t. The processor generates a third matrix, SA, by multiplying the second sparse matrix, S, by the first matrix, A. The third matrix, SA, has dimensions t′×d, and the processor generates a fourth matrix, (SAR)⁺, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SAR), and approximates the first matrix, A, by generating a fifth matrix, Ã, the fifth matrix defined as AR×(SAR)⁺×SA.
Type: Grant
Filed: September 11, 2013
Date of Patent: May 15, 2018
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
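The approximation Ã = AR × (SAR)⁺ × SA can be checked on a toy example in pure Python. The sketch assumes CountSketch-style sparse matrices (a single ±1 per column of S and per row of R) and takes t′ = t, so that SAR is square; when SAR is invertible, its pseudo-inverse is its ordinary inverse, which is computed exactly with fractions. When SAR has the same rank as A, the identity reproduces A exactly, and the fixed example below is built so that this holds. `count_sketch` and `sketch_approximation` are hypothetical names.

```python
import random
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def invert(matrix):
    """Exact Gauss-Jordan inverse over Fractions; raises if singular."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def count_sketch(n_out, n_in, rng):
    """n_out x n_in sparse matrix with a single random +/-1 in each column."""
    M = [[0] * n_in for _ in range(n_out)]
    for j in range(n_in):
        M[rng.randrange(n_out)][j] = rng.choice((-1, 1))
    return M


def sketch_approximation(A, S, R):
    """Return AR x (SAR)^+ x SA; an exact inverse works because SAR is square here."""
    AR = matmul(A, R)    # n x t
    SA = matmul(S, A)    # t' x d
    SAR = matmul(S, AR)  # t' x t, square when t' == t
    return matmul(matmul(AR, invert(SAR)), SA)


A = [[1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]]  # 4 x 3, rank 2
S = [[1, 0, 0, 1], [0, 1, 1, 0]]                  # 2 x 4, one nonzero per column
R = [[1, 0], [0, 1], [0, 1]]                      # 3 x 2, one nonzero per row
A_tilde = sketch_approximation(A, S, R)
```

In practice S would be drawn as `count_sketch(t, n, rng)` and R as the transpose of `count_sketch(t, d, rng)`; a random draw can make SAR singular, which a true pseudo-inverse would handle and this exact inverse does not.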
-
Publication number: 20180107716
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into the same document cluster or different clusters based on whether the similarity metric satisfies a threshold value.
Type: Application
Filed: October 17, 2016
Publication date: April 19, 2018
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Patent number: 9658987
Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
Type: Grant
Filed: May 15, 2014
Date of Patent: May 23, 2017
Assignee: International Business Machines Corporation
Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
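The data flow described here (build A and b, compress them with a randomized sketching matrix S, then solve for x) can be illustrated with a naive sketch-and-solve in pure Python. This is a generic illustration, not the patent's construction: S is assumed to be a CountSketch with `m` rows, the M-estimator is assumed to be the Huber loss with threshold `delta`, and it is fitted by iteratively reweighted least squares. For squared loss a large enough CountSketch approximately preserves the objective, but for other M-estimators the patent's sketch is more involved, so treat this only as showing where S enters. All names are hypothetical.

```python
import random


def count_sketch(A, b, m, rng):
    """Apply an m-row CountSketch S to A and b (one random bucket and sign per row)."""
    SA = [[0.0] * len(A[0]) for _ in range(m)]
    Sb = [0.0] * m
    for row, y in zip(A, b):
        h, s = rng.randrange(m), rng.choice((-1.0, 1.0))
        SA[h] = [acc + s * v for acc, v in zip(SA[h], row)]
        Sb[h] += s * y
    return SA, Sb


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [val] for row, val in zip(M, v)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [a - f * e for a, e in zip(aug[r], aug[c])]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (aug[c][n] - sum(aug[c][k] * x[k] for k in range(c + 1, n))) / aug[c][c]
    return x


def huber_irls(A, b, delta=1.0, iters=50):
    """Minimize the Huber loss of the residuals b - A x by reweighted least squares."""
    d = len(A[0])
    w = [1.0] * len(b)
    x = [0.0] * d
    for _ in range(iters):
        # Weighted normal equations: (A^T W A) x = A^T W b.
        M = [[sum(wi * r[i] * r[j] for wi, r in zip(w, A)) for j in range(d)] for i in range(d)]
        v = [sum(wi * r[i] * y for wi, r, y in zip(w, A, b)) for i in range(d)]
        x = solve(M, v)
        residuals = [y - sum(a * xi for a, xi in zip(r, x)) for r, y in zip(A, b)]
        w = [1.0 if abs(e) <= delta else delta / abs(e) for e in residuals]
    return x


# Sketch-and-solve: fit on (SA, Sb) instead of the full (A, b).
rng = random.Random(0)
A = [[1.0, float(i)] for i in range(200)]
b = [2.0 - 0.5 * i for i in range(200)]  # exactly A x with x = [2, -0.5]
x_hat = huber_irls(*count_sketch(A, b, m=20, rng=rng))
```

On noiseless data the true x also solves the sketched problem, so x_hat recovers [2, -0.5] up to rounding; with real, noisy data the sketched solution is only an approximation.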
-
Patent number: 9348806
Abstract: Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.
Type: Grant
Filed: September 30, 2014
Date of Patent: May 24, 2016
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, Daniel F. Gruhl, Neal R. Lewis, Nimrod Megiddo
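The search-then-score step can be sketched in a few lines of Python. The pattern format (a tuple of literal tokens with a single `"*"` slot), the restriction to one-token candidates, and the scoring rule (the fraction of a pattern's candidates that are seed terms) are all assumptions made for illustration; the abstract allows multi-token candidates and does not specify the score. `match_pattern` and `score_patterns` are hypothetical names.

```python
def match_pattern(tokens, pattern):
    """Return the tokens that fill the single '*' slot wherever the pattern matches."""
    slot = pattern.index("*")
    width = len(pattern)
    found = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(p == "*" or p == tok for p, tok in zip(pattern, window)):
            found.append(window[slot])
    return found


def score_patterns(tokens, patterns, seeds):
    """Score each pattern by the fraction of its candidate terms that are seeds."""
    scores = {}
    for pattern in patterns:
        candidates = match_pattern(tokens, pattern)
        hits = sum(1 for term in candidates if term in seeds)
        scores[pattern] = hits / len(candidates) if candidates else 0.0
    return scores


tokens = "patients reported fever , headache and nausea . drugs such as aspirin and ibuprofen".split()
patterns = [("reported", "*"), ("such", "as", "*"), ("*", "and")]
seeds = {"fever", "headache", "nausea"}
scores = score_patterns(tokens, patterns, seeds)
# {('reported', '*'): 1.0, ('such', 'as', '*'): 0.0, ('*', 'and'): 0.5}
```

A high-scoring pattern such as ("reported", "*") would then be a good source of new dictionary terms, while ("such", "as", "*") is pulling in drug names rather than symptoms.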
-
Publication number: 20160092435
Abstract: Embodiments of the present invention relate to a pattern-based system for building dictionaries of terms related to a seed set of terms. In one embodiment, a text is read. The text comprises a plurality of tokens. A first plurality of patterns is read. The plurality of tokens is searched using the first plurality of patterns to generate a plurality of candidate terms. Each of the plurality of candidate terms comprises one or more of the plurality of tokens. A plurality of seed terms is read. Each of the first plurality of patterns is scored based on the plurality of candidate terms and the plurality of seed terms.
Type: Application
Filed: September 30, 2014
Publication date: March 31, 2016
Inventors: Kenneth L. Clarkson, Daniel F. Gruhl, Neal R. Lewis, Nimrod Megiddo
-
Publication number: 20150331835
Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
Type: Application
Filed: May 15, 2014
Publication date: November 19, 2015
Applicant: International Business Machines Corporation
Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
-
Publication number: 20140280428
Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, RA, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, RA, has dimensions n×t. The processor generates a third matrix, AST, by multiplying the first matrix, A, by the second sparse matrix, S, transposed. The third matrix, AST, has dimensions d×t′, and the processor generates a fourth matrix, (SART)⁺, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SART), and approximating the first matrix, A by generating a fifth matrix, Â, the fifth matrix defined as AST×(SART)⁺×RA.
Type: Application
Filed: September 11, 2013
Publication date: September 18, 2014
Applicant: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Publication number: 20140280426
Abstract: Embodiments of the invention include a method of approximating a matrix of data using sparse matrices, which includes receiving a first matrix and generating a second matrix based on the first matrix and a first sparse matrix. The method further includes generating a third matrix based on the first matrix and a second sparse matrix, and generating a fourth matrix by generating a Moore-Penrose pseudo-inverse matrix based on the first matrix, the second matrix and the third matrix. The method also includes generating a fifth matrix based on a product of the second matrix, the third matrix, and the fourth matrix. The method further includes receiving, by a computer, a request to access at least one entry of the first matrix and responding to the request by accessing an entry of the fifth matrix.
Type: Application
Filed: March 13, 2013
Publication date: September 18, 2014
Applicant: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Patent number: 8255401
Abstract: A method, system and program product for computer information retrieval is disclosed. A matrix A is received. Random sign matrices S and R are generated. The matrix products S^T*A, A*R, and S^T*A*R are computed. A Moore-Penrose pseudoinverse C of S^T*A*R is computed. A singular value decomposition of the pseudoinverse C is computed. The three matrices ARU, Sigma, and V^T S^T A are output as a factorization for use in applications.
Type: Grant
Filed: April 28, 2010
Date of Patent: August 28, 2012
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
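The listed steps map directly onto a few NumPy calls, so a short sketch may help. It assumes dense random ±1 matrices S (n×t) and R (d×t) and uses `np.linalg.pinv` and `np.linalg.svd`; the sketch size `t` and the function name `sign_sketch_factorization` are hypothetical. The product (ARU) diag(Sigma) (V^T S^T A) equals AR (S^T A R)⁺ S^T A, which reproduces A up to rounding when the sketched core S^T A R has the same rank as A. The usage below picks t equal to the rank of a synthetic low-rank A for that reason.

```python
import numpy as np


def sign_sketch_factorization(A, t, seed=0):
    """Factor A as (A R U) diag(sigma) (V^T S^T A) using random sign matrices S and R."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.choice([-1.0, 1.0], size=(n, t))  # random sign matrix, n x t
    R = rng.choice([-1.0, 1.0], size=(d, t))  # random sign matrix, d x t
    StA = S.T @ A                             # t x d
    AR = A @ R                                # n x t
    C = np.linalg.pinv(StA @ R)               # Moore-Penrose pseudoinverse of S^T A R
    U, sigma, Vt = np.linalg.svd(C)           # C = U diag(sigma) V^T
    return AR @ U, sigma, Vt @ StA


rng = np.random.default_rng(1)
A = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))  # rank 3
ARU, sigma, VtStA = sign_sketch_factorization(A, t=3)
approx = ARU @ np.diag(sigma) @ VtStA
```

For a full-rank A the same factorization is only an approximation, and when t exceeds the rank of A the pseudoinverse's small-singular-value cutoff starts to matter.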