Patents by Inventor Paul S. Bradley

Paul S. Bradley has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7333998
    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.
    Type: Grant
    Filed: March 24, 2004
    Date of Patent: February 19, 2008
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek
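The last step in this abstract (scoring attribute/value data for a segment, normalizing the scores, and rank ordering them) can be illustrated with a minimal Python sketch. The abstract does not say how pairs are scored. The score used here is an assumption: the in-segment frequency of a pair divided by its overall frequency, normalized so the best pair scores 1.0. The function name `rank_attribute_values` is hypothetical, and the sketch assumes `segment_rows` is a subset of `all_rows`.

```python
from collections import Counter

def rank_attribute_values(segment_rows, all_rows):
    """Rank (attribute, value) pairs that characterize one segment.

    Hypothetical scoring: P(pair | segment) / P(pair overall), then divided
    by the maximum score so that the top pair scores 1.0.
    """
    seg_counts = Counter((a, v) for row in segment_rows for a, v in row.items())
    all_counts = Counter((a, v) for row in all_rows for a, v in row.items())
    raw = {
        pair: (seg_counts[pair] / len(segment_rows)) / (all_counts[pair] / len(all_rows))
        for pair in seg_counts
    }
    top = max(raw.values())
    normalized = {pair: score / top for pair, score in raw.items()}
    # Highest normalized score first; ties keep first-seen order.
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)
```

A ratio-style score favors values that are over-represented in the segment rather than values that are merely common everywhere, which is the kind of "meaningful characterization" the abstract describes.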
  • Patent number: 7246125
    Abstract: In a computer data processing system, a method for clustering data in a database comprises: providing a database having a number of data records with both discrete and continuous attributes; grouping together data records from the database that have specified discrete attribute configurations; clustering data records having the same or similar specified discrete attribute configuration based on the continuous attributes to produce an intermediate set of data clusters; and merging together clusters from the intermediate set of data clusters to produce a clustering model.
    Type: Grant
    Filed: June 21, 2001
    Date of Patent: July 17, 2007
    Assignee: Microsoft Corporation
    Inventors: Paul S. Bradley, Markus Wawryniuk
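The three steps in this abstract (group by discrete configuration, cluster each group on its continuous attributes, then merge the intermediate clusters) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how clusters are merged, so the greedy rule below (fold a cluster into the first earlier one whose centroid lies within `merge_dist`, keeping that earlier centroid) is hypothetical, as are the names `kmeans`, `cluster_mixed`, and `merge_dist`.

```python
import math
from collections import defaultdict

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [[sum(xs) / len(g) for xs in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def cluster_mixed(records, k_per_group, merge_dist):
    """records: (discrete_config_tuple, continuous_tuple) pairs."""
    # 1. Group records by their discrete attribute configuration.
    groups = defaultdict(list)
    for discrete, continuous in records:
        groups[discrete].append(continuous)
    # 2. Cluster each group on its continuous attributes -> intermediate set.
    intermediate = []
    for config, points in groups.items():
        for centroid in kmeans(points, min(k_per_group, len(points))):
            intermediate.append(([config], centroid))
    # 3. Merge intermediate clusters with nearby centroids (hypothetical rule).
    merged = []
    for configs, centroid in intermediate:
        for cluster in merged:
            if math.dist(cluster[1], centroid) <= merge_dist:
                cluster[0].extend(configs)
                break
        else:
            merged.append((list(configs), centroid))
    return merged
```

Each entry of the returned model pairs the discrete configurations that were folded together with one representative centroid over the continuous attributes.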
  • Publication number: 20040181554
    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.
    Type: Application
    Filed: March 24, 2004
    Publication date: September 16, 2004
    Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek
  • Patent number: 6742003
    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.
    Type: Grant
    Filed: April 30, 2001
    Date of Patent: May 25, 2004
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek
  • Patent number: 6735589
    Abstract: A dimensionality reduction method of generating a reduced dimension matrix data set Dnew of dimension m×k from an original matrix data set D of dimension m×n, wherein n>k. The method selects a subset of k columns from a set of n columns in the original data set D, where the m rows correspond to observations Ri, i = 1, …, m, the n columns correspond to attributes Aj, j = 1, …, n, and dij is the data value associated with observation Ri and attribute Aj. The data values in the reduced data set Dnew for each of the selected k attributes are identical to the data values of the corresponding attributes in the original data set.
    Type: Grant
    Filed: June 7, 2001
    Date of Patent: May 11, 2004
    Assignee: Microsoft Corporation
    Inventors: Paul S. Bradley, Demetrios Achlioptas, Christos Faloutsos, Usama Fayyad
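The key property in this abstract is that Dnew keeps k of the original n columns with their values unchanged, rather than projecting onto new axes. A minimal Python sketch of that column-subset idea follows. The abstract does not state how the k columns are chosen, so ranking columns by variance is purely an assumed stand-in criterion, and `select_columns` is a hypothetical name.

```python
from statistics import pvariance

def select_columns(D, k):
    """Return (kept column indices, D_new) where D_new is the m x k submatrix of D."""
    n = len(D[0])
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    # Assumed selection criterion: keep the k columns with the largest variance.
    variances = [pvariance([row[j] for row in D]) for j in range(n)]
    keep = sorted(sorted(range(n), key=lambda j: variances[j], reverse=True)[:k])
    # Values are copied unchanged, so D_new[i][t] == D[i][keep[t]].
    D_new = [[row[j] for j in keep] for row in D]
    return keep, D_new
```

For example, `select_columns([[1, 0, 5], [2, 0, 9], [3, 0, 1]], 2)` drops the constant middle column and returns the indices `[0, 2]` together with the matrix `[[1, 5], [2, 9], [3, 1]]`.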
  • Publication number: 20040010497
    Abstract: In a computer data processing system, method and apparatus for clustering data in a database. A database having a number of data records having both discrete and continuous attributes is stored on one or more storage media which may be connected by a network. The records in the database are scanned so that data records which have the same discrete attribute configuration can be tabulated. A first set of configurations is determined wherein the number of data records of each configuration of said first set of configurations exceeds a threshold number of data records. Data records that do not belong to one of the first set of configurations are added to or tabulated with a configuration within said first set of configurations to produce a subset of records from the database belonging to configurations in the first set of configurations.
    Type: Application
    Filed: June 21, 2001
    Publication date: January 15, 2004
    Applicant: Microsoft Corporation
    Inventors: Paul S. Bradley, Markus Wawryniuk
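The tabulation step in this abstract can be sketched in Python: count how many records share each discrete configuration, keep the configurations whose count exceeds the threshold, and fold every other record into one of the kept configurations. The abstract does not say which kept configuration a rare record is folded into. Choosing the nearest one by Hamming distance (number of differing attribute values) is an assumption, and `consolidate_configurations` is a hypothetical name.

```python
from collections import Counter

def consolidate_configurations(configs, threshold):
    """Map each record's discrete configuration into the 'first set' of frequent ones."""
    counts = Counter(configs)
    frequent = [c for c, n in counts.items() if n > threshold]
    if not frequent:
        raise ValueError("no configuration exceeds the threshold")

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    # Frequent configurations map to themselves; rare ones to the nearest frequent one.
    mapping = {
        c: c if c in frequent else min(frequent, key=lambda f: hamming(c, f))
        for c in counts
    }
    return [mapping[c] for c in configs]
```

After this step every record belongs to a configuration in the frequent set, which is the "subset of records" the abstract then clusters.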
  • Patent number: 6643645
    Abstract: Retrofitting recommender systems, so that they can scale to large data, is disclosed. The principal notion is to reduce data requirements of existing recommender engines by performing a type of data reduction that minimizes the loss of information given the engine. The reductions covered in this invention are designed to be easily implemented on a database system, and are intended to have minimal impact on an existing implementation of a recommender system. In one embodiment, a method repeats reducing the data by a number of records, until an accuracy threshold or a performance requirement is met. If the accuracy threshold is met first, the method repeats removing a highest-frequency dimension from the data, until the performance requirement is also met. The reduced data is provided to the recommender system, which generates predictions based on the reduced data, and a query.
    Type: Grant
    Filed: February 8, 2000
    Date of Patent: November 4, 2003
    Assignee: Microsoft Corporation
    Inventors: Usama M. Fayyad, Paul S. Bradley, Bassel Y. Ojjeh
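The two-phase reduction loop in this abstract can be sketched in Python, assuming each record is a set of items (for example, products a user rated). `accuracy_fn` and `cost_fn` are hypothetical callables standing in for the recommender's accuracy estimate and the performance measure. Dropping records from the end of the list is an arbitrary simplification; it does not attempt the information-preserving record choice the abstract describes.

```python
from collections import Counter

def reduce_for_recommender(records, accuracy_fn, cost_fn, min_accuracy, max_cost, batch=1):
    """Shrink the data until cost_fn(data) <= max_cost or no further reduction applies."""
    data = [set(r) for r in records]
    # Phase 1: drop records in batches while the accuracy threshold still holds.
    while cost_fn(data) > max_cost:
        candidate = data[:-batch] if len(data) > batch else []
        if not candidate or accuracy_fn(candidate) < min_accuracy:
            break  # accuracy threshold reached before the performance requirement
        data = candidate
    # Phase 2: remove the highest-frequency dimension (item) until cost is acceptable.
    while cost_fn(data) > max_cost:
        counts = Counter(item for r in data for item in r)
        if not counts:
            break
        top, _ = counts.most_common(1)[0]
        data = [r - {top} for r in data]
    return data
```

Phase 2 runs only when phase 1 stopped on the accuracy threshold, matching the order of operations in the abstract; the reduced data would then be handed to the unchanged recommender engine.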
  • Patent number: 6581058
    Abstract: One exemplary embodiment of a scalable clustering algorithm accesses a database of records having attributes or data fields of both enumerated discrete and ordered values and brings a portion of the data records into a rapid access memory. A cluster model for the data includes a table of probabilities for the enumerated, discrete data fields of the data records. The cluster model for data fields that are ordered comprises a mean and spread of the cluster. The cluster model is updated from the database records brought into the rapid access memory. At least some of the database records in the rapid access memory are summarized and stored within the rapid access memory. A criterion is then evaluated to determine if further data should be accessed from the database to further cluster data records in the database. Based on the evaluating step, additional database records in the database are accessed and brought into the rapid access memory for further updating of the cluster model.
    Type: Grant
    Filed: January 31, 2001
    Date of Patent: June 17, 2003
    Assignee: Microsoft Corporation
    Inventors: Usama Fayyad, Paul S. Bradley, Cory A. Reina
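The per-cluster model described in this abstract (a probability table for discrete fields plus a mean and spread for ordered fields) can be kept as running counts and sums, so it can be updated one in-memory record at a time. The Python sketch below is a minimal illustration; the class name `MixedCluster` is hypothetical, and using the population variance as the "spread" is an assumption.

```python
from collections import Counter

class MixedCluster:
    """Running model of one cluster over discrete and ordered fields."""

    def __init__(self, n_discrete, n_ordered):
        self.n = 0
        self.value_counts = [Counter() for _ in range(n_discrete)]
        self.sums = [0.0] * n_ordered
        self.sq_sums = [0.0] * n_ordered

    def add(self, discrete, ordered):
        """Fold one record's discrete values and ordered values into the model."""
        self.n += 1
        for counter, value in zip(self.value_counts, discrete):
            counter[value] += 1
        for j, x in enumerate(ordered):
            self.sums[j] += x
            self.sq_sums[j] += x * x

    def probability(self, field, value):
        """Entry of the probability table for a discrete field."""
        return self.value_counts[field][value] / self.n

    def mean(self, j):
        return self.sums[j] / self.n

    def variance(self, j):
        """Assumed 'spread': population variance from the running sums."""
        m = self.mean(j)
        return self.sq_sums[j] / self.n - m * m
```

Because only counts and sums are stored, records that have been folded in can be discarded from memory, which is what lets the model be updated buffer by buffer.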
  • Patent number: 6567936
    Abstract: A generalization of frequent item sets to error-tolerant frequent item sets (ETF) is disclosed, together with its application in data clustering using error-tolerant frequent item sets to either build clusters or as an initialization technique for standard clustering algorithms. Efficient, feasible computational algorithms for computing ETF's from very large databases are presented. In one embodiment, a method determines a plurality of weak ETF's, which are strongly tolerant of errors, and determines a plurality of strong ETF's therefrom, which are less tolerant of errors. The resulting clusters can be used as an initial model for a standard clustering approach, or may themselves be used as the end clusters. In one embodiment, the data covered by the strong clusters is removed from the data, and the process is repeated, until no more weak clusters can be found.
    Type: Grant
    Filed: February 8, 2000
    Date of Patent: May 20, 2003
    Assignee: Microsoft Corporation
    Inventors: Cheng Yang, Usama M. Fayyad, Paul S. Bradley
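The weak/strong distinction in this abstract can be made concrete with a small Python check. The abstract does not define the two notions, so the definitions below are assumptions following a common formulation: an itemset is a strong ETF if at least `min_support` of the rows each miss at most a fraction `eps` of its items, and a weak ETF if some block of at least that many rows misses at most a fraction `eps` of the (row, item) cells in total. The weak condition only bounds the total, so it tolerates more errors, consistent with the abstract. Function names are hypothetical.

```python
import math

def missing_items(transaction, itemset):
    return len(itemset - transaction)

def is_strong_etf(transactions, itemset, eps, min_support):
    """Row-wise test: enough rows each miss at most eps * |itemset| items."""
    limit = eps * len(itemset)
    rows = [t for t in transactions if missing_items(t, itemset) <= limit]
    return len(rows) >= min_support * len(transactions)

def is_weak_etf(transactions, itemset, eps, min_support):
    """Aggregate test over a block of at least min_support * N rows.

    The rows missing the fewest items form the best possible block of a given
    size, so checking the m best rows decides whether any valid block exists.
    """
    m = math.ceil(min_support * len(transactions))
    if m == 0:
        return True
    best = sorted(missing_items(t, itemset) for t in transactions)[:m]
    return len(best) == m and sum(best) <= eps * m * len(itemset)
```

With transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{d}` and itemset `{a,b,c}` at 75% support, the itemset is weak but not strong at `eps = 0.25`, because two of the three best rows each miss one of three items.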
  • Publication number: 20030028541
    Abstract: A dimensionality reduction method of generating a reduced dimension matrix data set Dnew of dimension m×k from an original matrix data set D of dimension m×n, wherein n>k. The method selects a subset of k columns from a set of n columns in the original data set D, where the m rows correspond to observations Ri, i = 1, …, m, the n columns correspond to attributes Aj, j = 1, …, n, and dij is the data value associated with observation Ri and attribute Aj. The data values in the reduced data set Dnew for each of the selected k attributes are identical to the data values of the corresponding attributes in the original data set.
    Type: Application
    Filed: June 7, 2001
    Publication date: February 6, 2003
    Applicant: Microsoft Corporation
    Inventors: Paul S. Bradley, Demetrios Achlioptas, Christos Faloutsos, Usama Fayyad
  • Publication number: 20030018652
    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.
    Type: Application
    Filed: April 30, 2001
    Publication date: January 23, 2003
    Applicant: Microsoft Corporation
    Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek
  • Patent number: 6490582
    Abstract: Iterative validation for efficiently determining error-tolerant frequent itemsets is disclosed. A description of the application of error-tolerant frequent itemsets to efficiently determining clusters as well as initializing clustering algorithms is also given. In one embodiment, a method determines a sample set of error-tolerant frequent itemsets (ETF's) within a uniform random sample of data within a database. This sample set of ETF's is independently validated, so that, for example, spurious ETF's and spurious dimensions within the ETF's can be removed. The validated sample set of ETF's is added to the set of ETF's for the database. This process is repeated with additional uniform samples that are mutually exclusive from prior uniform samples, to continue building the database's set of ETF's, until no new sample sets can be found.
    Type: Grant
    Filed: February 8, 2000
    Date of Patent: December 3, 2002
    Assignee: Microsoft Corporation
    Inventors: Usama M. Fayyad, Cheng Yang, Paul S. Bradley
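The sampling loop in this abstract can be sketched in Python. Shuffling the row indices once and slicing the permutation gives uniform random samples that are mutually exclusive, as the abstract requires. `find_etfs(sample)` and `validate(etf, data)` are hypothetical callables standing in for the ETF search and the independent validation; this sketch only discards whole spurious ETFs and does not prune spurious dimensions inside an ETF.

```python
import random

def disjoint_samples(n, sample_size, seed=0):
    """Yield mutually exclusive uniform random samples of row indices 0..n-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    for start in range(0, n, sample_size):
        yield order[start:start + sample_size]

def iterative_etf_discovery(data, sample_size, find_etfs, validate, seed=0):
    """Grow the database's ETF set one disjoint sample at a time."""
    found = set()
    for idx in disjoint_samples(len(data), sample_size, seed):
        sample = [data[i] for i in idx]
        # Keep only validated candidates that are not already in the set.
        new = {etf for etf in find_etfs(sample) if validate(etf, data)} - found
        if not new:
            break  # this sample produced nothing new: stop
        found |= new
    return found
```

ETFs are expected to be hashable (for example `frozenset`s) so they can be collected in a set.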
  • Patent number: 6449612
    Abstract: In one exemplary embodiment the invention provides a data mining system for use in finding clusters of data items in a database or any other data storage medium. A portion of the data in the database is read from a storage medium and brought into a rapid access memory buffer whose size is determined by the user or operating system depending on available memory resources. Data contained in the data buffer is used to update the original model data distributions in each of the K clusters in a clustering model. Some of the data belonging to a cluster is summarized or compressed and stored as a reduced form of the data representing sufficient statistics of the data. More data is accessed from the database and the models are updated. An updated set of parameters for the clusters is determined from the summarized data (sufficient statistics) and the newly acquired data. Stopping criteria are evaluated to determine if further data should be accessed from the database.
    Type: Grant
    Filed: June 30, 2000
    Date of Patent: September 10, 2002
    Assignee: Microsoft Corporation
    Inventors: Paul S. Bradley, Usama Fayyad
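The "reduced form of the data representing sufficient statistics" in this abstract can be illustrated with a Python sketch. The abstract does not list which statistics are kept. The triple used here (point count, per-dimension sum, per-dimension sum of squares) is an assumption, chosen because it is enough to recover means and variances and because two summaries combine by simple addition. The class name `SufficientStats` is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SufficientStats:
    """Compressed summary of points: count, per-dimension sum, per-dimension sum of squares."""
    n: int = 0
    s: list = field(default_factory=list)
    ss: list = field(default_factory=list)

    def add_point(self, x):
        if not self.s:
            self.s, self.ss = [0.0] * len(x), [0.0] * len(x)
        self.n += 1
        for j, v in enumerate(x):
            self.s[j] += v
            self.ss[j] += v * v

    def merge(self, other):
        """Combine with another summary; the raw points are never needed again."""
        if other.n == 0:
            return
        if not self.s:
            self.s, self.ss = [0.0] * len(other.s), [0.0] * len(other.s)
        self.n += other.n
        self.s = [a + b for a, b in zip(self.s, other.s)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def mean(self):
        return [v / self.n for v in self.s]
```

Merging the summary of previously compressed points with the summary of a newly read buffer gives the same mean as recomputing it from all the raw points, which is why updated cluster parameters can come from "the summarized data and the newly acquired data".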
  • Patent number: 6374251
    Abstract: A data mining system for use in finding clusters of data items in a database or any other data storage medium. The clusters are used in categorizing the data in the database into K different clusters within each of M models. An initial set of estimates (or guesses) of the parameters of each cluster in each model to be explored (e.g., centroids in K-means) is provided from some source. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory buffer whose size is determined by the user or operating system depending on available memory resources. Data contained in the data buffer is used to update the original guesses at the parameters of the model in each of the K clusters over all M models. Some of the data belonging to a cluster is summarized or compressed and stored as a reduced form of the data representing sufficient statistics of the data. More data is accessed from the database and the models are updated.
    Type: Grant
    Filed: March 17, 1998
    Date of Patent: April 16, 2002
    Assignee: Microsoft Corporation
    Inventors: Usama Fayyad, Paul S. Bradley, Cory Reina
  • Patent number: 6263337
    Abstract: In one exemplary embodiment the invention provides a data mining system for use in finding clusters of data items in a database or any other data storage medium. Before the data evaluation begins, a choice is made of the number M of models to be explored and the number K of clusters within each of the M models. The clusters are used in categorizing the data in the database into K different clusters within each model. An initial set of estimates for a data distribution of each model to be explored is provided. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory buffer whose size is determined by the user or operating system depending on available memory resources. Data contained in the data buffer is used to update the original model data distributions in each of the K clusters over all M models.
    Type: Grant
    Filed: May 22, 1998
    Date of Patent: July 17, 2001
    Assignee: Microsoft Corporation
    Inventors: Usama Fayyad, Paul S. Bradley, Cory Reina
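The multi-model update in the 6374251 and 6263337 abstracts (one memory buffer updates the K clusters of all M models in a single pass) can be sketched in Python. As a stand-in for the patents' "data distributions", each model here is simply K centroids updated with an online (MacQueen-style) running mean; that choice, the names `update_models`, `models`, and `counts`, and starting counts at zero (so the first assigned point replaces the initial guess) are all assumptions.

```python
import math

def update_models(models, counts, buffer):
    """Update every model from one in-memory buffer in a single pass.

    models: list of M models, each a list of K centroids (lists of floats).
    counts: matching list of M lists of K per-cluster point counts.
    """
    for x in buffer:
        for centroids, n in zip(models, counts):
            k = min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))
            n[k] += 1
            # Incremental mean: move the centroid 1/n of the way toward x.
            centroids[k] = [c + (v - c) / n[k] for c, v in zip(centroids[k], x)]
    return models
```

Reading each buffer once and updating all M models from it means the cost of scanning the database is shared across every model being explored, instead of rescanning the data once per model.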
  • Patent number: 6115708
    Abstract: As an optimization problem, clustering data (unsupervised learning) is known to be difficult. Most practical approaches use a heuristic, typically gradient-descent, algorithm to search for a solution in the huge space of possible solutions. Such methods are, by definition, sensitive to starting points; it is well known that clustering algorithms are extremely sensitive to initial conditions. Most methods for guessing an initial solution simply make random guesses. In this paper we present a method that takes an initial condition and efficiently produces a refined starting condition. The method is applicable to a wide class of clustering algorithms for discrete and continuous data. In this paper we demonstrate how this method is applied to the popular K-means clustering algorithm and show that refined initial starting points indeed lead to improved solutions. The technique can be used as an initializer for other clustering solutions.
    Type: Grant
    Filed: March 4, 1998
    Date of Patent: September 5, 2000
    Assignee: Microsoft Corporation
    Inventors: Usama Fayyad, Paul S. Bradley
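The abstract does not spell out how the refined starting condition is produced. A minimal Python sketch of one subsample-and-recluster scheme is shown below, and it should be read as an assumption rather than as the patented procedure: run K-means from the given starting point on several small random subsamples, pool the resulting centroids, re-cluster the pooled centroids starting from each subsample solution, and keep the candidate with the lowest distortion on the pooled set. Empty clusters simply keep their previous centroid here, and all function names and defaults are hypothetical.

```python
import math
import random

def kmeans(points, init, iters=25):
    """Lloyd's K-means from the given initial centroids."""
    centroids = [list(c) for c in init]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        centroids = [[sum(xs) / len(g) for xs in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def distortion(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def refine_initial_points(data, init, n_subsamples=5, frac=0.2, seed=0):
    rng = random.Random(seed)
    size = max(len(init), int(frac * len(data)))
    # 1. K-means from the same starting point on several small subsamples.
    solutions = [kmeans(rng.sample(data, size), init) for _ in range(n_subsamples)]
    pooled = [c for sol in solutions for c in sol]
    # 2. Re-cluster the pooled centroids from each solution; keep the best fit.
    candidates = [kmeans(pooled, sol) for sol in solutions]
    return min(candidates, key=lambda cand: distortion(pooled, cand))
```

The refined centroids would then be passed as the initial condition to a full K-means run over all of the data.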
  • Patent number: 6012058
    Abstract: In one exemplary embodiment the invention provides a data mining system for use in evaluating data in a database. Before the data evaluation begins, a choice is made of a cluster number K for use in categorizing the data in the database into K different clusters, and initial guesses at the means, or centroids, of each cluster are provided. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory. Data contained in the data portion is used to update the original guesses at the centroids of each of the K clusters. Some of the data belonging to a cluster is summarized or compressed and stored as a summarization of the data. More data is accessed from the database and assigned to a cluster. An updated mean for the clusters is determined from the summarized data and the newly acquired data. A stopping criterion is evaluated to determine if further data should be accessed from the database.
    Type: Grant
    Filed: March 17, 1998
    Date of Patent: January 4, 2000
    Assignee: Microsoft Corporation
    Inventors: Usama Fayyad, Paul S. Bradley, Cory Reina
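The buffered K-means loop in this last abstract can be sketched in Python under a simplifying assumption: every point in a buffer is compressed into its cluster's summary (count and per-dimension sum) immediately after assignment, whereas the abstract says only some points are summarized. Means are recomputed from the summaries after each buffer, and the stopping criterion is assumed to be "no centroid moved more than `tol`". The names `scalable_kmeans`, `chunks`, and `tol` are hypothetical.

```python
import math

def scalable_kmeans(chunks, init, tol=1e-3):
    """K-means over data read one memory-sized chunk at a time."""
    centroids = [list(c) for c in init]
    counts = [0] * len(centroids)
    sums = [[0.0] * len(c) for c in centroids]
    for chunk in chunks:
        for p in chunk:
            j = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            # Compress p into cluster j's summary; the raw point is then dropped.
            counts[j] += 1
            sums[j] = [s + v for s, v in zip(sums[j], p)]
        # Updated means come from the summaries of all data seen so far.
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
               for j in range(len(centroids))]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < tol:  # assumed stopping criterion: centroids stopped moving
            break
    return centroids
```

For example, with chunks `[(0.0,), (10.0,)]`, `[(1.0,), (9.0,)]`, `[(0.5,), (9.5,)]` and initial guesses `(2.0,)` and `(8.0,)`, the means settle at `[0.5]` and `[9.5]` after the third chunk, and the loop stops without needing any further data.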