Patents by Inventor David P. Woodruff

David P. Woodruff has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11163774
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: November 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Publication number: 20190258640
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Application
    Filed: May 6, 2019
    Publication date: August 22, 2019
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Patent number: 10346405
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Grant
    Filed: October 17, 2016
    Date of Patent: July 9, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Patent number: 10007643
    Abstract: Embodiments relate to methodologies and a program product for conducting regression analysis. In one embodiment, the method includes obtaining data related to a statistical process including a plurality of points in a plurality of dimensions and organizing the plurality of points and the plurality of dimensions in a matrix. The method also includes calculating a vector of a particular measurement such that the measurement equals the number of the plurality of points and calculating a least absolute deviation by determining the number of non-zero entries provided in the matrix.
    Type: Grant
    Filed: April 7, 2014
    Date of Patent: June 26, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David P. Woodruff, Qin Zhang
  • Patent number: 9971735
    Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, AR, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, AR, has dimensions n×t. The processor generates a third matrix, SA, by multiplying the second sparse matrix, S, by the first matrix, A. The third matrix, SA, has dimensions t′×n, and the processor generates a fourth matrix, (SAR)†, by calculating a Moore-Penrose pseudo-inverse of a matrix, (SAR), and approximating the first matrix, A, by generating a fifth matrix, Ã, the fifth matrix defined as AR×(SAR)†×SA.
    Type: Grant
    Filed: September 11, 2013
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Publication number: 20180107716
    Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates a number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to a span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of a degree of similarity between documents. The documents can then be classified into a same document cluster or different clusters based on whether the similarity metric satisfied a threshold value.
    Type: Application
    Filed: October 17, 2016
    Publication date: April 19, 2018
    Inventors: Kenneth L. Clarkson, David P. Woodruff
  • Patent number: 9928214
    Abstract: A system, method and computer program product for quickly and approximately solving structured regression problems. In one aspect, the system, method and computer program product are applied to problems that arise naturally in various statistical modeling settings—when the design matrix is a Vandermonde matrix or a sequence of such matrices. Using the Vandermonde matrix structure further accelerates the solution of the regression problem, achieving running times that are faster than “input sparsity”. The modeling framework speedup benefits of randomized regression for solving structured regression problems.
    Type: Grant
    Filed: July 17, 2014
    Date of Patent: March 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Haim Avron, Vikas Sindhwani, David P. Woodruff
  • Patent number: 9760537
    Abstract: One embodiment is a computer-implemented method for finding a CUR decomposition. The method includes constructing, by a computer processor, a matrix C based on a matrix A. A matrix R is constructed based on the matrix A and the matrix C. A matrix U is constructed based on the matrices A, C, and R. The matrices C, U, and R provide a CUR decomposition of the matrix A. The construction of the matrices C, U, and R provides at least one of an input-sparsity-time CUR and a deterministic CUR.
    Type: Grant
    Filed: October 28, 2014
    Date of Patent: September 12, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christos Boutsidis, David P. Woodruff
  • Patent number: 9658987
    Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
    Type: Grant
    Filed: May 15, 2014
    Date of Patent: May 23, 2017
    Assignee: International Business Machines Corporation
    Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
  • Patent number: 9438704
    Abstract: Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
    Type: Grant
    Filed: March 8, 2016
    Date of Patent: September 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David P. Woodruff, Grigory Yaroslavtsev
  • Patent number: 9438705
    Abstract: Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
    Type: Grant
    Filed: December 16, 2013
    Date of Patent: September 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David P. Woodruff, Grigory Yaroslavtsev
  • Publication number: 20160173660
    Abstract: Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
    Type: Application
    Filed: March 8, 2016
    Publication date: June 16, 2016
    Inventors: David P. Woodruff, Grigory Yaroslavtsev
  • Publication number: 20160117285
    Abstract: One embodiment is a computer-implemented method for finding a CUR decomposition. The method includes constructing, by a computer processor, a matrix C based on a matrix A. A matrix R is constructed based on the matrix A and the matrix C. A matrix U is constructed based on the matrices A, C, and R. The matrices C, U, and R provide a CUR decomposition of the matrix A. The construction of the matrices C, U, and R provides at least one of an input-sparsity-time CUR and a deterministic CUR.
    Type: Application
    Filed: October 28, 2014
    Publication date: April 28, 2016
    Inventors: Christos Boutsidis, David P. Woodruff
  • Patent number: 9262485
    Abstract: Embodiments relate to identifying a sketching matrix used by a linear sketch. Aspects include receiving an initial output of the linear sketch, generating a query vector and inputting the query vector into the linear sketch. Aspects further include receiving a revised output of the linear sketch based on inputting the query vector and iteratively repeating the steps of generating the query vector, inputting the query vector into the linear sketch, and receiving a revised output of the linear sketch based on inputting the query vector until the sketching matrix used by the linear sketch can be identified.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: February 16, 2016
    Assignee: International Business Machines Corporation
    Inventors: Moritz Hardt, David P. Woodruff
  • Publication number: 20160034201
    Abstract: A protocol is employed to estimate duplication of data in a storage system. This estimate is employed as a factor in deciding whether to enable de-duplication and, if de-duplication is enabled, which data sets will be subject to the de-duplication. The protocol includes a measurement procedure and an execution procedure. The measurement procedure characterizes data duplication in part of the data on the storage system, and the execution procedure uses the characterization to adjust selection of which data sets are subject to de-duplication.
    Type: Application
    Filed: August 4, 2014
    Publication date: February 4, 2016
    Applicant: International Business Machines Corporation
    Inventors: David D. Chambliss, M. Corneliu Constantinescu, Joseph S. Glider, Danny Harnik, Maohua Lu, David P. Woodruff
  • Patent number: 9218389
    Abstract: A mechanism is provided for computing the frequency of packets in network devices. Respective packets are associated with entities in a vector, where each of the entities is mapped to corresponding ones of the respective packets, and the entities correspond to computers. Upon a network device receiving the respective packets, a count is individually increased for the respective packets in the vector respectively mapped to the entities, and a matrix vector product of a matrix A and the vector is computed. The matrix A is a product of at least a first matrix and a second matrix. The first matrix includes rows and columns where each of the rows has a single random location with a one value and remaining locations with zero values. The matrix vector product is transmitted to a centralized computer for aggregating with other matrix vector products.
    Type: Grant
    Filed: September 10, 2013
    Date of Patent: December 22, 2015
    Assignee: International Business Machines Corporation
    Inventor: David P. Woodruff
  • Publication number: 20150331835
    Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
    Type: Application
    Filed: May 15, 2014
    Publication date: November 19, 2015
    Applicant: International Business Machines Corporation
    Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
  • Publication number: 20150317282
    Abstract: A system, method and computer program product for quickly and approximately solving structured regression problems. In one aspect, the system, method and computer program product are applied to problems that arise naturally in various statistical modeling settings—when the design matrix is a Vandermonde matrix or a sequence of such matrices. Using the Vandermonde matrix structure further accelerates the solution of the regression problem, achieving running times that are faster than “input sparsity”. The modeling framework speedup benefits of randomized regression for solving structured regression problems.
    Type: Application
    Filed: July 17, 2014
    Publication date: November 5, 2015
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Haim Avron, Vikas Sindhwani, David P. Woodruff
  • Patent number: 9158807
    Abstract: A mechanism is provided for computing the frequency of packets in network devices. Respective packets are associated with entities in a vector, where each of the entities is mapped to corresponding ones of the respective packets, and the entities correspond to computers. Upon a network device receiving the respective packets, a count is individually increased for the respective packets in the vector respectively mapped to the entities, and a matrix vector product of a matrix A and the vector is computed. The matrix A is a product of at least a first matrix and a second matrix. The first matrix includes rows and columns where each of the rows has a single random location with a one value and remaining locations with zero values. The matrix vector product is transmitted to a centralized computer for aggregating with other matrix vector products.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: October 13, 2015
    Assignee: International Business Machines Corporation
    Inventor: David P. Woodruff
  • Publication number: 20150286612
    Abstract: Embodiments relate to methodologies and a program product for conducting regression analysis. In one embodiment, the method includes obtaining data related to a statistical process including a plurality of points in a plurality of dimensions and organizing the plurality of points and the plurality of dimensions in a matrix. The method also includes calculating a vector of a particular measurement such that the measurement equals the number of the plurality of points and calculating a least absolute deviation by determining the number of non-zero entries provided in the matrix.
    Type: Application
    Filed: April 7, 2014
    Publication date: October 8, 2015
    Applicant: International Business Machines Corporation
    Inventors: David P. Woodruff, Qin Zhang
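
Illustrative note on the round count in patents 9438704 and 9438705 (publication 20160173660): the abstracts describe applying a logarithm function iteratively to the maximum subset size and counting how many applications are needed. One plausible reading is the iterated logarithm (log*). The sketch below follows that reading only; the function name, the base-2 logarithm, and the stopping threshold of one are assumptions for illustration and are not taken from the patent claims.

```python
import math


def iterated_log_rounds(max_subset_size: int) -> int:
    """Count how many times log2 must be applied to max_subset_size
    before the value drops to one or below (the iterated logarithm).

    Assumption: base-2 logarithm and a threshold of one; the patent
    abstracts do not fix either choice.
    """
    if max_subset_size < 1:
        raise ValueError("max_subset_size must be at least 1")
    value = float(max_subset_size)
    rounds = 0
    while value > 1.0:
        value = math.log2(value)  # one application of the logarithm
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (1, 2, 16, 65536, 10**6):
        print(size, iterated_log_rounds(size))
```

Under this reading the count grows extremely slowly: a maximum subset size of one million still yields only five applications, which is why a protocol whose number of communication sequences is tied to this quantity needs very few exchanges between the two servers.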