Patents by Inventor Cory Reina

Cory Reina has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Scalable system for clustering of large databases having mixed data attributes

Patent number: 6581058

Abstract: One exemplary embodiment of a scalable clustering algorithm accesses a database of records having attributes or data fields of both enumerated discrete and ordered values and brings a portion of the data records into a rapid access memory. A cluster model for the data includes a table of probabilities for the enumerated, discrete data fields of the data records. The cluster model for data fields that are ordered comprises a mean and spread of the cluster. The cluster model is updated from the database records brought into the rapid access memory. At least some of the database records in the rapid access memory are summarized and stored within the rapid access memory. A criterion is then evaluated to determine if further data should be accessed from the database to further cluster data records in the database. Based on the evaluating step, additional database records in the database are accessed and brought into the rapid access memory for further updating of the cluster model.
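The mixed-attribute cluster model described above can be made concrete with a minimal Python sketch. It is not the patented method. It assumes each record is a Python dict, and `MixedCluster` and its methods are hypothetical names. Per-value counts for each discrete field give the table of probabilities; running sums for each ordered field give the mean and spread (here the population standard deviation, an assumed choice of spread measure).

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MixedCluster:
    """Hypothetical summary of one cluster over mixed discrete/ordered fields."""
    n: int = 0
    value_counts: dict = field(default_factory=dict)  # discrete field -> Counter of values
    sums: dict = field(default_factory=dict)          # ordered field -> running sum
    sq_sums: dict = field(default_factory=dict)       # ordered field -> running sum of squares

    def add(self, record, discrete_fields, ordered_fields):
        """Fold one record from the in-memory buffer into the summary."""
        self.n += 1
        for f in discrete_fields:
            self.value_counts.setdefault(f, Counter())[record[f]] += 1
        for f in ordered_fields:
            x = float(record[f])
            self.sums[f] = self.sums.get(f, 0.0) + x
            self.sq_sums[f] = self.sq_sums.get(f, 0.0) + x * x

    def probability(self, f, value):
        """Entry of the probability table for discrete field f."""
        return self.value_counts[f][value] / self.n

    def mean(self, f):
        return self.sums[f] / self.n

    def spread(self, f):
        """Population standard deviation, used here as the measure of spread."""
        m = self.mean(f)
        return math.sqrt(max(self.sq_sums[f] / self.n - m * m, 0.0))


buffer = [
    {"color": "red", "age": 30},
    {"color": "red", "age": 40},
    {"color": "blue", "age": 50},
]
cluster = MixedCluster()
for rec in buffer:
    cluster.add(rec, discrete_fields=["color"], ordered_fields=["age"])
print(cluster.probability("color", "red"), cluster.mean("age"), cluster.spread("age"))
```

Keeping counts and sums instead of raw records lets the model be updated one buffer at a time and the records then discarded. For ordered fields with large magnitudes, Welford's running-variance update is numerically safer than the sum-of-squares formula used here.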

Type: Grant

Filed: January 31, 2001

Date of Patent: June 17, 2003

Assignee: Microsoft Corporation

Inventors: Usama Fayyad, Paul S. Bradley, Cory A. Reina

Scalable system for clustering of large databases

Patent number: 6374251

Abstract: A data mining system for use in finding clusters of data items in a database or any other data storage medium. The clusters are used in categorizing the data in the database into K different clusters within each of M models. An initial set of estimates (or guesses) of the parameters of each cluster in each model to be explored (e.g. centroids in K-means) is provided from some source. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory buffer whose size is determined by the user or operating system depending on available memory resources. Data contained in the data buffer is used to update the original guesses at the parameters of the model in each of the K clusters over all M models. Some of the data belonging to a cluster is summarized or compressed and stored as a reduced form of the data representing sufficient statistics of the data. More data is then accessed from the database and the models are updated.
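The idea of storing a cluster's data as sufficient statistics can be sketched with a small mergeable summary. This is a minimal Python sketch, assuming points are lists of floats and that the statistics kept are the count, the per-dimension sum and the per-dimension sum of squares (one common choice; the abstract does not say which statistics are used). `SufficientStats` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class SufficientStats:
    """Hypothetical compressed summary of a set of d-dimensional points."""
    n: int
    total: List[float]     # per-dimension sum of the points
    sq_total: List[float]  # per-dimension sum of squared coordinates

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "SufficientStats":
        d = len(points[0])
        total = [sum(p[j] for p in points) for j in range(d)]
        sq_total = [sum(p[j] ** 2 for p in points) for j in range(d)]
        return cls(len(points), total, sq_total)

    def merge(self, other: "SufficientStats") -> "SufficientStats":
        """Summaries combine by addition, so the raw points need not be kept."""
        return SufficientStats(
            self.n + other.n,
            [a + b for a, b in zip(self.total, other.total)],
            [a + b for a, b in zip(self.sq_total, other.sq_total)],
        )

    def mean(self) -> List[float]:
        return [t / self.n for t in self.total]

    def variance(self) -> List[float]:
        return [s / self.n - m * m for s, m in zip(self.sq_total, self.mean())]


# Points from two buffer loads that were assigned to the same cluster.
first = SufficientStats.from_points([[1.0, 2.0], [3.0, 4.0]])
second = SufficientStats.from_points([[5.0, 6.0]])
combined = first.merge(second)
print(combined.n, combined.mean())  # 3 [3.0, 4.0]
```

Because a cluster's mean and variance are recoverable from these three quantities, summarized points can be dropped from the buffer to make room for the next portion of the database.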

Type: Grant

Filed: March 17, 1998

Date of Patent: April 16, 2002

Assignee: Microsoft Corporation

Inventors: Usama Fayyad, Paul S. Bradley, Cory Reina

Scalable system for expectation maximization clustering of large databases

Patent number: 6263337

Abstract: In one exemplary embodiment the invention provides a data mining system for use in finding clusters of data items in a database or any other data storage medium. Before the data evaluation begins, a choice is made of the number M of models to be explored and the number K of clusters within each of the M models. The clusters are used in categorizing the data in the database into K different clusters within each model. An initial set of estimates for a data distribution of each model to be explored is provided. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory buffer whose size is determined by the user or operating system depending on available memory resources. Data contained in the data buffer is used to update the original model data distributions in each of the K clusters over all M models.
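The expectation maximization (EM) update can be illustrated for the simplest case, a single model with K one-dimensional Gaussian clusters. This is a minimal Python sketch, not the patented method: it assumes Gaussian cluster distributions, treats each chunk as one fill of the memory buffer, and keeps no compressed summaries between passes. `em_pass` and `gaussian_pdf` are hypothetical names.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_pass(chunks: Iterable[Sequence[float]], weights: List[float],
            means: List[float], variances: List[float],
            min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """One EM iteration for a 1-D Gaussian mixture, reading the data chunk by chunk."""
    k = len(weights)
    r_sum = [0.0] * k  # total responsibility of each cluster
    r_x = [0.0] * k    # responsibility-weighted sum of x
    r_x2 = [0.0] * k   # responsibility-weighted sum of x^2
    n = 0
    for chunk in chunks:  # each chunk stands in for one fill of the memory buffer
        for x in chunk:
            # E-step: posterior probability of each cluster given x.
            dens = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            if total == 0.0:  # x is far from every cluster (density underflow); skip it
                continue
            for j in range(k):
                r = dens[j] / total
                r_sum[j] += r
                r_x[j] += r * x
                r_x2[j] += r * x * x
            n += 1
    # M-step: re-estimate the parameters from the accumulated statistics.
    new_w = [s / n for s in r_sum]
    new_m = [sx / s if s > 0 else m for sx, s, m in zip(r_x, r_sum, means)]
    new_v = [max(sx2 / s - mu * mu, min_var) if s > 0 else v
             for sx2, s, mu, v in zip(r_x2, r_sum, new_m, variances)]
    return new_w, new_m, new_v


chunks = [[-0.5, 0.0, 0.5], [9.5, 10.0, 10.5]]  # two buffer loads
w, m, v = [0.5, 0.5], [1.0, 8.0], [4.0, 4.0]    # initial estimates
for _ in range(20):
    w, m, v = em_pass(chunks, w, m, v)
print(w, m, v)
```

Each call still reads every chunk, so this sketch makes one full scan of the data per EM iteration; only the accumulated statistics, not the records, are needed for the M-step.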

Type: Grant

Filed: May 22, 1998

Date of Patent: July 17, 2001

Assignee: Microsoft Corporation

Inventors: Usama Fayyad, Paul S. Bradley, Cory Reina

Scalable system for K-means clustering of large databases

Patent number: 6012058

Abstract: In one exemplary embodiment the invention provides a data mining system for use in evaluating data in a database. Before the data evaluation begins, a choice is made of a cluster number K for use in categorizing the data in the database into K different clusters, and initial guesses at the means, or centroids, of each cluster are provided. Then a portion of the data in the database is read from a storage medium and brought into a rapid access memory. Data contained in the data portion is used to update the original guesses at the centroids of each of the K clusters. Some of the data belonging to a cluster is summarized or compressed and stored as a summarization of the data. More data is accessed from the database and assigned to a cluster. An updated mean for the clusters is determined from the summarized data and the newly acquired data. A stopping criterion is evaluated to determine if further data should be accessed from the database.
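A simplified, single-scan version of this K-means loop can be sketched in Python. It is not the patented procedure: it assumes points are lists of floats, reduces the summarization step to a per-cluster count and coordinate sum, and uses centroid movement below `tol` as the stopping criterion. `buffered_kmeans` and `nearest` are hypothetical names.

```python
import math
from typing import Iterable, List, Sequence

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def buffered_kmeans(chunks: Iterable[List[Point]], centroids: List[List[float]],
                    tol: float = 1e-4) -> List[List[float]]:
    """Read the data one chunk at a time, assign, summarize, and update the means."""
    k, d = len(centroids), len(centroids[0])
    counts = [0] * k                      # number of summarized points per cluster
    sums = [[0.0] * d for _ in range(k)]  # per-cluster coordinate sums
    centroids = [list(c) for c in centroids]
    for chunk in chunks:                  # one chunk = one fill of the memory buffer
        for p in chunk:
            j = nearest(p, centroids)
            counts[j] += 1
            for i in range(d):
                sums[j][i] += p[i]
        # Updated means come from the summaries; the chunk's raw points can be dropped.
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
               for j in range(k)]
        shift = max(math.dist(a, b) for a, b in zip(new, centroids))
        centroids = new
        if shift < tol:  # stopping criterion: the means have essentially stopped moving
            break
    return centroids


chunks = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.0, 1.0], [10.0, 11.0]],
    [[1.0, 0.0], [11.0, 10.0]],
]
print(buffered_kmeans(chunks, [[1.0, 1.0], [9.0, 9.0]]))
```

Once a point is folded into a summary its assignment is never revisited. That keeps memory bounded by the chunk size, but it also makes the result depend on the order in which the data is read.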

Type: Grant

Filed: March 17, 1998

Date of Patent: January 4, 2000

Assignee: Microsoft Corporation

Inventors: Usama Fayyad, Paul S. Bradley, Cory Reina
