Patents by Inventor David P. Woodruff
David P. Woodruff has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11163774
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to the span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of the degree of similarity between documents. The documents can then be classified into the same document cluster or into different clusters based on whether the similarity metric satisfies a threshold value.
Type: Grant
Filed: May 6, 2019
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
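Leverage-score row sampling, one of the sampling criteria listed in the abstract, can be sketched in a few lines. This minimal Python version assumes that rows are terms and columns are documents (the abstract does not fix the orientation), that the matrix has full column rank, and that cosine similarity with a hypothetical threshold of 0.5 serves as the similarity metric. All function names are illustrative, not the patent's.

```python
import math
import random

# Assumption: rows are terms, columns are documents, and A has full column rank.


def gram_inverse(A):
    """Return (A^T A)^{-1} via Gauss-Jordan elimination with partial pivoting."""
    d = len(A[0])
    G = [[float(sum(row[i] * row[j] for row in A)) for j in range(d)] for i in range(d)]
    aug = [G[i] + [1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(d):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[d:] for row in aug]


def leverage_scores(A):
    """Leverage score of row a_i is a_i^T (A^T A)^{-1} a_i; the scores sum to rank(A)."""
    Ginv = gram_inverse(A)
    d = len(Ginv)
    return [sum(row[i] * Ginv[i][j] * row[j] for i in range(d) for j in range(d))
            for row in A]


def sample_rows(A, s, seed=0):
    """Sample s rows with p_i proportional to leverage; rescale each by 1/sqrt(s * p_i)."""
    scores = leverage_scores(A)
    total = sum(scores)
    probs = [x / total for x in scores]
    rng = random.Random(seed)
    picks = rng.choices(range(len(A)), weights=probs, k=s)
    return [[v / math.sqrt(s * probs[i]) for v in A[i]] for i in picks]


def cosine(M, j, k):
    """Cosine similarity of columns j and k of M; 0.0 if either column is all zeros."""
    dot = sum(row[j] * row[k] for row in M)
    nj = math.sqrt(sum(row[j] ** 2 for row in M))
    nk = math.sqrt(sum(row[k] ** 2 for row in M))
    return 0.0 if nj == 0 or nk == 0 else dot / (nj * nk)


def same_cluster(M, j, k, threshold=0.5):
    """Hypothetical rule: documents j and k share a cluster if similarity >= threshold."""
    return cosine(M, j, k) >= threshold
```

For example, `same_cluster(sample_rows(A, 4), 0, 2)` compares documents 0 and 2 using a 4-row compressed matrix. Computing exact leverage scores requires (AᵀA)⁻¹, which costs about as much as the problem being compressed, so this only illustrates the sampling distribution and rescaling. It does not show how the patented method computes or approximates the scores.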
-
Publication number: 20190258640
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to the span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of the degree of similarity between documents. The documents can then be classified into the same document cluster or into different clusters based on whether the similarity metric satisfies a threshold value.
Type: Application
Filed: May 6, 2019
Publication date: August 22, 2019
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Patent number: 10346405
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to the span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of the degree of similarity between documents. The documents can then be classified into the same document cluster or into different clusters based on whether the similarity metric satisfies a threshold value.
Type: Grant
Filed: October 17, 2016
Date of Patent: July 9, 2019
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Patent number: 10007643
Abstract: Embodiments relate to methods and a program product for conducting regression analysis. In one embodiment, the method includes obtaining data related to a statistical process, including a plurality of points in a plurality of dimensions, and organizing the points and dimensions in a matrix. The method also includes calculating a vector of a particular measurement such that the measurement equals the number of the plurality of points, and calculating a least absolute deviation by determining the number of non-zero entries in the matrix.
Type: Grant
Filed: April 7, 2014
Date of Patent: June 26, 2018
Assignee: International Business Machines Corporation
Inventors: David P. Woodruff, Qin Zhang
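Least absolute deviation (LAD) regression minimizes Σ|yᵢ − a − b·xᵢ| rather than the sum of squares, which makes it robust to outliers. The abstract's non-zero-count (input-sparsity) procedure is not described in enough detail to reproduce, so this hedged baseline fits the same LAD objective for a single predictor using iteratively reweighted least squares (IRLS). IRLS is a standard technique swapped in for illustration, not the patented method. The name `lad_line_fit` and its `iters` and `eps` parameters are hypothetical.

```python
def lad_line_fit(xs, ys, iters=100, eps=1e-9):
    """Fit y ~ a + b*x by minimizing sum(|y_i - a - b*x_i|) with IRLS (not the patented method)."""
    w = [1.0] * len(xs)  # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(iters):
        # Weighted least squares in centered form (numerically stabler than raw sums).
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
        ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
        b = sxy / sxx
        a = ybar - b * xbar
        # Reweight by 1/|r_i| (floored at eps) so sum(w_i * r_i^2) tracks sum(|r_i|).
        w = [1.0 / max(abs(y - a - b * x), eps) for x, y in zip(xs, ys)]
    return a, b
```

On the points (0, 1), (1, 3), (2, 5), (3, 7), (4, 100), ordinary least squares is pulled far off by the last point. The LAD fit instead settles on the line y = 1 + 2x through the other four points.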
-
Patent number: 9971735
Abstract: A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank, and generates a second matrix, AR, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, AR, has dimensions n×t. The processor generates a third matrix, SA, by multiplying the second sparse matrix, S, by the first matrix, A. The third matrix, SA, has dimensions t′×d. The processor then generates a fourth matrix, (SAR)†, by calculating the Moore-Penrose pseudo-inverse of the matrix SAR, and approximates the first matrix, A, by generating a fifth matrix, Ã, defined as AR×(SAR)†×SA.
Type: Grant
Filed: September 11, 2013
Date of Patent: May 15, 2018
Assignee: International Business Machines Corporation
Inventors: Kenneth L. Clarkson, David P. Woodruff
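The approximation Ã = AR × (SAR)† × SA can be checked on a small example. In this sketch, dense Gaussian matrices stand in for the sparse S and R named in the abstract, and t = t′ = k, so SAR is square and its pseudo-inverse reduces to an ordinary inverse. Both simplifications are assumptions made for brevity. When A has rank exactly k, the product reproduces A up to rounding. Function names are hypothetical.

```python
import random

# Assumption: dense Gaussian S and R replace the sparse matrices in the abstract,
# and t = t' = k so that SAR is k x k and (SAR)^+ is its ordinary inverse.


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def gaussian(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def sketch_low_rank(A, k, seed=0):
    """Return AR (SAR)^{-1} SA for an n x d matrix A."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    R = gaussian(d, k, rng)    # d x k
    S = gaussian(k, n, rng)    # k x n
    AR = matmul(A, R)          # n x k
    SA = matmul(S, A)          # k x d
    SAR = matmul(SA, R)        # k x k
    return matmul(matmul(AR, inverse(SAR)), SA)
```

For a matrix that is only approximately low rank, the result is an approximation rather than an exact reconstruction. The abstract's choice of sparse S and R serves to make the products AR and SA cheap, which this dense version does not attempt.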
-
Publication number: 20180107716
Abstract: A lower-dimensional representation (e.g., approximation) of a dataset is determined. The lower-dimensional representation can be used, for example, to perform semantic document analysis. Given a matrix of input data points, where each entry of the matrix indicates the number of times a particular term in a set of terms appears in a particular document in a set of documents, a lower-dimensional compressed matrix is obtained from the matrix by sampling rows of the matrix based on a target rank parameter, a desired accuracy tolerance, leverage scores calculated for the rows, and/or distances from rows of the matrix to the span of the initial set of sampled rows. The compressed matrix is used to determine a similarity metric indicative of the degree of similarity between documents. The documents can then be classified into the same document cluster or into different clusters based on whether the similarity metric satisfies a threshold value.
Type: Application
Filed: October 17, 2016
Publication date: April 19, 2018
Inventors: Kenneth L. Clarkson, David P. Woodruff
-
Patent number: 9928214
Abstract: A system, method and computer program product for quickly and approximately solving structured regression problems. In one aspect, the system, method and computer program product are applied to problems that arise naturally in various statistical modeling settings, such as when the design matrix is a Vandermonde matrix or a sequence of such matrices. Using the Vandermonde matrix structure further accelerates the solution of the regression problem, achieving running times that are faster than “input sparsity”. The modeling framework extends the speedup benefits of randomized regression to solving structured regression problems.
Type: Grant
Filed: July 17, 2014
Date of Patent: March 27, 2018
Assignee: International Business Machines Corporation
Inventors: Haim Avron, Vikas Sindhwani, David P. Woodruff
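Polynomial fitting is the textbook case of a Vandermonde design matrix. This hedged sketch-and-solve example compresses the regression with a CountSketch (each row is added, with a random sign, to one of m buckets) and then solves the small sketched least-squares problem. Unlike the patented approach, it forms the Vandermonde matrix explicitly, so it shows only the randomized-regression idea and not the structure-exploiting speedup. The names and the bucket count `m` are hypothetical choices.

```python
import random

# Assumption: the Vandermonde matrix is formed explicitly (no structure-exploiting speedup).


def vandermonde(xs, degree):
    return [[x ** p for p in range(degree + 1)] for x in xs]


def countsketch(rows, m, seed=0):
    """Add each input row, with a random sign, into one of m output buckets."""
    rng = random.Random(seed)
    out = [[0.0] * len(rows[0]) for _ in range(m)]
    for row in rows:
        h, s = rng.randrange(m), rng.choice((-1.0, 1.0))
        out[h] = [o + s * v for o, v in zip(out[h], row)]
    return out


def solve(M, rhs):
    """Solve the square system M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))) / aug[i][i]
    return z


def sketched_poly_fit(xs, ys, degree, m, seed=0):
    """Least squares on (S V) c ~ S y, solved through its normal equations."""
    V = vandermonde(xs, degree)
    # Sketch [V | y] jointly so the same S is applied to both.
    SB = countsketch([row + [y] for row, y in zip(V, ys)], m, seed)
    SV = [r[:-1] for r in SB]
    Sy = [r[-1] for r in SB]
    q = degree + 1
    G = [[sum(r[i] * r[j] for r in SV) for j in range(q)] for i in range(q)]
    h = [sum(r[i] * t for r, t in zip(SV, Sy)) for i in range(q)]
    return solve(G, h)
```

With 200 samples of y = 1 − 2x + 0.5x² and m = 20 buckets, the sketched system is consistent, so the fit recovers the coefficients [1, −2, 0.5] up to rounding. On noisy data it instead gives an approximation to the full least-squares fit.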
-
Patent number: 9760537
Abstract: One embodiment is a computer-implemented method for finding a CUR decomposition. The method includes constructing, by a computer processor, a matrix C based on a matrix A. A matrix R is constructed based on the matrix A and the matrix C. A matrix U is constructed based on the matrices A, C, and R. The matrices C, U, and R provide a CUR decomposition of the matrix A. The construction of the matrices C, U, and R provides at least one of an input-sparsity-time CUR and a deterministic CUR.
Type: Grant
Filed: October 28, 2014
Date of Patent: September 12, 2017
Assignee: International Business Machines Corporation
Inventors: Christos Boutsidis, David P. Woodruff
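The abstract does not spell out how C, U, and R are built. As a hedged illustration of what a CUR decomposition is, the classic skeleton (cross) construction keeps chosen columns of A as C and chosen rows as R, and sets U to the inverse of their k×k intersection W. When rank(W) = rank(A), the product C·U·R reproduces A exactly. This is a swapped-in textbook variant, not the patent's input-sparsity-time or deterministic construction, and the index choices and function names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in M[i]] + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def skeleton_cur(A, col_idx, row_idx):
    """C = chosen columns, R = chosen rows, U = inverse of their intersection W."""
    C = [[row[j] for j in col_idx] for row in A]
    R = [list(A[i]) for i in row_idx]
    W = [[A[i][j] for j in col_idx] for i in row_idx]
    return C, inverse(W), R
```

A practical CUR method must also decide which columns and rows to keep, and that selection step is where the patented constructions differ. Here the indices are simply supplied by the caller.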
-
Patent number: 9658987
Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
Type: Grant
Filed: May 15, 2014
Date of Patent: May 23, 2017
Assignee: International Business Machines Corporation
Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
-
Patent number: 9438704
Abstract: Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
Type: Grant
Filed: March 8, 2016
Date of Patent: September 6, 2016
Assignee: International Business Machines Corporation
Inventors: David P. Woodruff, Grigory Yaroslavtsev
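One reading of "iteratively applying the logarithm" is the iterated logarithm log* k: the number of times log₂ must be applied to k before the value drops to at most one. Under that assumption, this minimal helper computes that count for a maximum subset size k. It covers only the round-count arithmetic, not the intersection protocol itself, and the name `log_star` is hypothetical.

```python
import math


def log_star(x):
    """Iterated logarithm: how many times log2 is applied before the value is <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count
```

The function grows extremely slowly. For example, 65536 → 16 → 4 → 2 → 1 takes four applications, so a round budget tied to log* k stays tiny even for very large sets.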
-
Patent number: 9438705
Abstract: Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
Type: Grant
Filed: December 16, 2013
Date of Patent: September 6, 2016
Assignee: International Business Machines Corporation
Inventors: David P. Woodruff, Grigory Yaroslavtsev
-
Publication number: 20160173660
Abstract: Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.
Type: Application
Filed: March 8, 2016
Publication date: June 16, 2016
Inventors: David P. Woodruff, Grigory Yaroslavtsev
-
Publication number: 20160117285
Abstract: One embodiment is a computer-implemented method for finding a CUR decomposition. The method includes constructing, by a computer processor, a matrix C based on a matrix A. A matrix R is constructed based on the matrix A and the matrix C. A matrix U is constructed based on the matrices A, C, and R. The matrices C, U, and R provide a CUR decomposition of the matrix A. The construction of the matrices C, U, and R provides at least one of an input-sparsity-time CUR and a deterministic CUR.
Type: Application
Filed: October 28, 2014
Publication date: April 28, 2016
Inventors: Christos Boutsidis, David P. Woodruff
-
Patent number: 9262485
Abstract: Embodiments relate to identifying a sketching matrix used by a linear sketch. Aspects include receiving an initial output of the linear sketch, generating a query vector, and inputting the query vector into the linear sketch. Aspects further include receiving a revised output of the linear sketch based on the query vector, and iteratively repeating the steps of generating a query vector, inputting it into the linear sketch, and receiving a revised output until the sketching matrix used by the linear sketch can be identified.
Type: Grant
Filed: August 13, 2013
Date of Patent: February 16, 2016
Assignee: International Business Machines Corporation
Inventors: Moritz Hardt, David P. Woodruff
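Linearity is what makes a sketching matrix recoverable from queries. If a black box returns the full output S·x, querying each standard basis vector eⱼ returns column j of S, so n queries identify S exactly. This hedged sketch covers only that simplest, non-adaptive case. It does not model sketches that reveal less than S·x, or whatever adaptive query generation the patent claims. The black-box factory and all names are hypothetical.

```python
import random


def make_black_box_sketch(m, n, seed=0):
    """Hypothetical linear sketch x -> S x whose matrix S is hidden from the caller."""
    rng = random.Random(seed)
    S = [[rng.choice((-1, 0, 1)) for _ in range(n)] for _ in range(m)]

    def sketch(x):
        return [sum(s * v for s, v in zip(row, x)) for row in S]

    return sketch, S  # S is returned only so the recovery can be checked


def recover_sketch_matrix(sketch, n):
    """Query e_1..e_n; by linearity, sketch(e_j) is column j of S."""
    columns = []
    for j in range(n):
        e = [0] * n
        e[j] = 1
        columns.append(sketch(e))
    return [list(row) for row in zip(*columns)]  # transpose columns back into rows
```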
-
Publication number: 20160034201
Abstract: A protocol is employed to estimate the duplication of data in a storage system. This estimate is used as a factor in deciding whether to enable de-duplication and, if de-duplication is enabled, which data sets will be subject to it. The protocol includes a measurement procedure and an execution procedure. The measurement procedure characterizes data duplication in part of the data on the storage system, and the execution procedure uses that characterization to adjust the selection of which data sets are subject to de-duplication.
Type: Application
Filed: August 4, 2014
Publication date: February 4, 2016
Applicant: International Business Machines Corporation
Inventors: David D. Chambliss, M. Corneliu Constantinescu, Joseph S. Glider, Danny Harnik, Maohua Lu, David P. Woodruff
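A measurement step of this kind can be approximated by hashing a sample of fixed-size chunks and counting repeated hashes. In this minimal sketch, the 4096-byte chunk size, the uniform chunk sampling, the SHA-256 fingerprints, and the 0.2 enable threshold are all assumptions, not values from the abstract. The function names are hypothetical.

```python
import hashlib
import random


def chunk_hashes(data, chunk_size=4096):
    """SHA-256 fingerprint of each fixed-size chunk of a bytes object."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def estimate_duplication(data, sample_fraction=0.1, chunk_size=4096, seed=0):
    """Fraction of sampled chunks whose content already appears elsewhere in the sample."""
    hashes = chunk_hashes(data, chunk_size)
    if not hashes:
        return 0.0
    rng = random.Random(seed)
    k = max(1, int(len(hashes) * sample_fraction))
    sample = rng.sample(hashes, k)
    return 1.0 - len(set(sample)) / len(sample)


def should_enable_dedup(ratio, threshold=0.2):
    """Hypothetical policy: enable de-duplication when the estimated ratio is high enough."""
    return ratio >= threshold
```

Uniform chunk sampling tends to underestimate duplication, because a duplicate is only counted when both copies land in the sample. A real measurement procedure would need to correct for that bias.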
-
Patent number: 9218389
Abstract: A mechanism is provided for computing the frequency of packets in network devices. Respective packets are associated with entities in a vector, where each of the entities is mapped to corresponding ones of the respective packets, and the entities correspond to computers. Upon a network device receiving the respective packets, a count is individually increased for the respective packets in the vector respectively mapped to the entities, and a matrix-vector product of a matrix A and the vector is computed. The matrix A is a product of at least a first matrix and a second matrix. The first matrix includes rows and columns, where each of the rows has a single random location with a one value and the remaining locations have zero values. The matrix-vector product is transmitted to a centralized computer for aggregation with other matrix-vector products.
Type: Grant
Filed: September 10, 2013
Date of Patent: December 22, 2015
Assignee: International Business Machines Corporation
Inventor: David P. Woodruff
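Central aggregation works because the sketch is linear: the sum of per-device products A·x₁ + A·x₂ equals A·(x₁ + x₂). This hedged example uses a CountSketch-style table, in which each row hashes an entity to one bucket with a random ±1 sign, as a commonly used stand-in. The abstract's matrix A, a product whose first factor has one random 1 per row, may be structured differently. The class name, the hash family, and the `rows` and `width` values are hypothetical.

```python
import random
from statistics import median

P = 2_147_483_647  # Mersenne prime used by the hash family


class CountSketch:
    """CountSketch-style linear sketch of per-entity packet counts (a stand-in, not the patent's A)."""

    def __init__(self, rows=5, width=64, seed=0):
        rng = random.Random(seed)  # devices must share the seed so their sketches are compatible
        self.width = width
        self.params = [tuple(rng.randrange(1, P) for _ in range(4)) for _ in range(rows)]
        self.table = [[0] * width for _ in range(rows)]

    def _bucket_sign(self, r, item):
        a, b, c, d = self.params[r]
        bucket = ((a * item + b) % P) % self.width
        sign = 1 if ((c * item + d) % P) % 2 == 0 else -1
        return bucket, sign

    def add(self, item, count=1):
        """Record `count` packets for entity `item` (an integer id)."""
        for r in range(len(self.table)):
            bucket, sign = self._bucket_sign(r, item)
            self.table[r][bucket] += sign * count

    def merge(self, other):
        """Linearity: the sketch of the combined traffic is the entrywise sum."""
        for row, orow in zip(self.table, other.table):
            for j, v in enumerate(orow):
                row[j] += v

    def estimate(self, item):
        """Median over rows of the signed bucket value for `item`."""
        ests = []
        for r in range(len(self.table)):
            bucket, sign = self._bucket_sign(r, item)
            ests.append(sign * self.table[r][bucket])
        return median(ests)
```

For example, two devices that see 600 and 400 packets from entity 42 can each send their table to a collector. After `merge`, the collector holds exactly the table it would have built from all 1000 packets, and `estimate(42)` is off by at most the total count of colliding entities.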
-
Publication number: 20150331835
Abstract: Embodiments of the invention relate to sketching for M-estimators for performing regression. One embodiment includes providing one or more sets of input data. A matrix A and a vector b are generated using the input data. A processor device is used for processing the matrix A and the vector b based on a randomized sketching matrix S. A vector x that minimizes a normalized measure function is determined based on the matrix A and the vector b. A relationship between the input data is determined based on the vector x.
Type: Application
Filed: May 15, 2014
Publication date: November 19, 2015
Applicant: International Business Machines Corporation
Inventors: Haim Avron, Kenneth L. Clarkson, Huy Le Nguyen, David P. Woodruff
-
Publication number: 20150317282
Abstract: A system, method and computer program product for quickly and approximately solving structured regression problems. In one aspect, the system, method and computer program product are applied to problems that arise naturally in various statistical modeling settings, such as when the design matrix is a Vandermonde matrix or a sequence of such matrices. Using the Vandermonde matrix structure further accelerates the solution of the regression problem, achieving running times that are faster than “input sparsity”. The modeling framework extends the speedup benefits of randomized regression to solving structured regression problems.
Type: Application
Filed: July 17, 2014
Publication date: November 5, 2015
Applicant: International Business Machines Corporation
Inventors: Haim Avron, Vikas Sindhwani, David P. Woodruff
-
Patent number: 9158807
Abstract: A mechanism is provided for computing the frequency of packets in network devices. Respective packets are associated with entities in a vector, where each of the entities is mapped to corresponding ones of the respective packets, and the entities correspond to computers. Upon a network device receiving the respective packets, a count is individually increased for the respective packets in the vector respectively mapped to the entities, and a matrix-vector product of a matrix A and the vector is computed. The matrix A is a product of at least a first matrix and a second matrix. The first matrix includes rows and columns, where each of the rows has a single random location with a one value and the remaining locations have zero values. The matrix-vector product is transmitted to a centralized computer for aggregation with other matrix-vector products.
Type: Grant
Filed: March 8, 2013
Date of Patent: October 13, 2015
Assignee: International Business Machines Corporation
Inventor: David P. Woodruff
-
Publication number: 20150286612
Abstract: Embodiments relate to methods and a program product for conducting regression analysis. In one embodiment, the method includes obtaining data related to a statistical process, including a plurality of points in a plurality of dimensions, and organizing the points and dimensions in a matrix. The method also includes calculating a vector of a particular measurement such that the measurement equals the number of the plurality of points, and calculating a least absolute deviation by determining the number of non-zero entries in the matrix.
Type: Application
Filed: April 7, 2014
Publication date: October 8, 2015
Applicant: International Business Machines Corporation
Inventors: David P. Woodruff, Qin Zhang