Patents by Inventor Zhaohui Tang
Zhaohui Tang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7062408
Abstract: Systems and methods are provided for producing a mining model accuracy display that depicts the model's accuracy at predicting a state for a multiple-state variable. The model predicts a state and provides an associated probability for each case. Points are graphed such that one coordinate of the data point corresponds to a number N of cases and the other coordinate corresponds to the number of correct predictions made in the top N cases by probability.
Type: Grant
Filed: October 25, 2004
Date of Patent: June 13, 2006
Assignee: Microsoft Corporation
Inventors: Zhaohui Tang, Pyungchul Kim
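The graphing rule in this abstract can be illustrated with a short sketch. This is not the patented implementation, only one reading of the abstract: the function name `accuracy_chart_points` and the `(predicted_state, probability, actual_state)` tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical case layout (an assumption for illustration):
# (predicted_state, probability, actual_state)
Case = Tuple[str, float, str]


def accuracy_chart_points(cases: List[Case]) -> List[Tuple[int, int]]:
    """Return (N, correct predictions among the top N cases by probability)."""
    # Rank cases from most to least confident prediction.
    ranked = sorted(cases, key=lambda case: case[1], reverse=True)
    points = []
    correct = 0
    for n, (predicted, _probability, actual) in enumerate(ranked, start=1):
        if predicted == actual:
            correct += 1
        points.append((n, correct))
    return points


example_cases = [
    ("A", 0.9, "A"),  # correct
    ("B", 0.6, "A"),  # incorrect
    ("C", 0.8, "C"),  # correct
    ("A", 0.4, "B"),  # incorrect
]
points = accuracy_chart_points(example_cases)
```

Plotting the first element of each pair on one axis and the second on the other gives the kind of curve the abstract describes; a model whose high-probability predictions are mostly correct rises steeply at small N.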
-
Patent number: 7028036
Abstract: Distribution displays for categories are provided which illuminate the distribution of continuous attributes over all cases in a category, and which provide a histogram of the population of the different states of categorical attributes. An array of such displays by attribute (in one dimension) and category (in another dimension) may be provided. Category diagram displays are also provided for visualizing the different categories, and their distributions, populations, and similarities. These are displayed through different shading of nodes and edges representing categories and the relationship between two categories, and through proximity of nodes.
Type: Grant
Filed: June 28, 2002
Date of Patent: April 11, 2006
Assignee: Microsoft Corporation
Inventors: David Maxwell Chickering, Zhaohui Tang, David Earl Heckerman, Robert L. Rounthwaite, Alexei V. Bocharov, Scott Conrad Oveson
-
Publication number: 20060020620
Abstract: The subject disclosure pertains to extensible data mining systems, means, and methodologies. For example, a data mining system is disclosed that supports plug-in or integration of non-native mining algorithms, perhaps provided by third parties, such that they function the same as built-in algorithms. Furthermore, non-native data mining viewers may also be seamlessly integrated into the system for displaying the results of one or more algorithms including those provided by third parties as well as those built-in. Still further yet, support is provided for extending data mining languages to include user-defined functions (UDFs).
Type: Application
Filed: June 21, 2005
Publication date: January 26, 2006
Applicant: Microsoft Corporation
Inventors: Raman Iyer, Ioan Crivat, C. MacLennan, Scott Oveson, Rong Guan, ZhaoHui Tang, Pyungchul Kim, Irina Gorbach
-
Publication number: 20060010110
Abstract: A system that facilitates data mining comprises a reception component that receives command(s) in a declarative language that relate to utilizing an output of a first data mining model as an input to a second data mining model. An implementation component analyzes the received command(s) and implements the command(s) with respect to the first and second data mining models. In another aspect of the subject invention, the reception component can receive further command(s) in a declarative language with respect to causing one or more of the first and second data mining models to output a prediction, the prediction desirably generated without prediction input; the implementation component then causes the one or more of the first and second data mining models to output the prediction.
Type: Application
Filed: February 2, 2005
Publication date: January 12, 2006
Applicant: Microsoft Corporation
Inventors: Pyungchul Kim, ZhaoHui Tang, Ioan Crivat, C. MacLennan, Raman Iyer, Irina Gorbach
-
Publication number: 20060010142
Abstract: The subject invention relates to systems and methods to extend the capabilities of declarative data modeling languages. In one aspect, a declarative data modeling language system is provided. The system includes a data modeling language component that generates one or more data mining models to extract predictive information from local or remote databases. A language extension component facilitates modeling capability in the data modeling language by providing a data sequence model or a time series model within the data modeling language to support various data mining applications.
Type: Application
Filed: April 28, 2005
Publication date: January 12, 2006
Applicant: Microsoft Corporation
Inventors: Pyungchul Kim, C. MacLennan, ZhaoHui Tang
-
Publication number: 20050283357
Abstract: A method for performing data mining is provided. The method includes selecting at least one data source of unstructured text. Additionally, a transformation is selected to identify a list of terms in the unstructured text. A run-time path is established to connect the data source to the transformation to load the list of terms identified into a destination database.
Type: Application
Filed: October 21, 2004
Publication date: December 22, 2005
Applicant: Microsoft Corporation
Inventors: C. MacLennan, Hang Li, Ming Zhou, Yunbo Cao, ZhaoHui Tang
-
Publication number: 20050283459
Abstract: A language schema that integrates multidimensional extensions (e.g., MDX) and data mining extensions (e.g., DMX) for performing data mining operations on data residing in OLAP cubes. The schema provides that the <source-data-query> can be not only a relational query but also a multidimensional query formed using MDX, for example. The operations of model creation, training and prediction are described.
Type: Application
Filed: June 22, 2004
Publication date: December 22, 2005
Applicant: Microsoft Corporation
Inventors: C. MacLennan, Pyungchul Kim, ZhaoHui Tang
-
Patent number: 6931391
Abstract: Systems and methods are provided for generating prediction queries to help a user build and execute prediction queries. A user interface (UI) is provided that is easy to use and understand in connection with the generation of a prediction query for data mining. The UI can be instantiated from a variety of disparate sources that may request query building services. While prediction queries and relational queries are quite different, the UI enables prediction queries to be built in a manner that is similar to the way relational queries are built. In one embodiment, the main screen of the UI includes four main components: (1) a table column mapping area, (2) a selection grid area, (3) a query text display area and (4) a query result grid area. In one embodiment, the query text display area and the query result grid area are initially not presented to the user.
Type: Grant
Filed: June 21, 2002
Date of Patent: August 16, 2005
Assignee: Microsoft Corporation
Inventors: Zhaohui Tang, Rong Jian Guan, Amir M. Netz, Scott Conrad Oveson
-
Publication number: 20050144163
Abstract: Systems and methods are provided for generating prediction queries to help a user build and execute prediction queries. A user interface (UI) is provided that is easy to use and understand in connection with the generation of a prediction query for data mining. The UI can be instantiated from a variety of disparate sources that may request query building services. While prediction queries and relational queries are quite different, the UI enables prediction queries to be built in a manner that is similar to the way relational queries are built. In one embodiment, the main screen of the UI includes four main components: (1) a table column mapping area, (2) a selection grid area, (3) a query text display area and (4) a query result grid area. In one embodiment, the query text display area and the query result grid area are initially not presented to the user.
Type: Application
Filed: January 7, 2005
Publication date: June 30, 2005
Applicant: Microsoft Corporation
Inventors: Zhaohui Tang, Rong Guan, Amir Netz, Scott Oveson
-
Publication number: 20050108285
Abstract: Distribution displays for categories are provided which illuminate the distribution of continuous attributes over all cases in a category, and which provide a histogram of the population of the different states of categorical attributes. An array of such displays by attribute (in one dimension) and category (in another dimension) may be provided. Category diagram displays are also provided for visualizing the different categories, and their distributions, populations, and similarities. These are displayed through different shading of nodes and edges representing categories and the relationship between two categories, and through proximity of nodes.
Type: Application
Filed: September 30, 2004
Publication date: May 19, 2005
Applicant: Microsoft Corporation
Inventors: David Chickering, Zhaohui Tang, David Heckerman, Robert Rounthwaite, Alexei Bocharov, Scott Oveson
-
Publication number: 20050108196
Abstract: Distribution displays for categories are provided which illuminate the distribution of continuous attributes over all cases in a category, and which provide a histogram of the population of the different states of categorical attributes. An array of such displays by attribute (in one dimension) and category (in another dimension) may be provided. Category diagram displays are also provided for visualizing the different categories, and their distributions, populations, and similarities. These are displayed through different shading of nodes and edges representing categories and the relationship between two categories, and through proximity of nodes.
Type: Application
Filed: September 30, 2004
Publication date: May 19, 2005
Applicant: Microsoft Corporation
Inventors: David Chickering, Zhaohui Tang, David Heckerman, Robert Rounthwaite, Alexei Bocharov, Scott Oveson
-
Publication number: 20050108284
Abstract: Distribution displays for categories are provided which illuminate the distribution of continuous attributes over all cases in a category, and which provide a histogram of the population of the different states of categorical attributes. An array of such displays by attribute (in one dimension) and category (in another dimension) may be provided. Category diagram displays are also provided for visualizing the different categories, and their distributions, populations, and similarities. These are displayed through different shading of nodes and edges representing categories and the relationship between two categories, and through proximity of nodes.
Type: Application
Filed: September 30, 2004
Publication date: May 19, 2005
Applicant: Microsoft Corporation
Inventors: David Chickering, Zhaohui Tang, David Heckerman, Robert Rounthwaite, Alexei Bocharov, Scott Oveson
-
Publication number: 20050060331
Abstract: Systems and methods are provided for producing a mining model accuracy display that depicts the model's accuracy at predicting a state for a multiple-state variable. The model predicts a state and provides an associated probability for each case. Points are graphed such that one coordinate of the data point corresponds to a number N of cases and the other coordinate corresponds to the number of correct predictions made in the top N cases by probability.
Type: Application
Filed: October 25, 2004
Publication date: March 17, 2005
Applicant: Microsoft Corporation
Inventors: Zhaohui Tang, Pyungchul Kim
-
Publication number: 20050041027
Abstract: Distribution displays for categories are provided which illuminate the distribution of continuous attributes over all cases in a category, and which provide a histogram of the population of the different states of categorical attributes. An array of such displays by attribute (in one dimension) and category (in another dimension) may be provided. Category diagram displays are also provided for visualizing the different categories, and their distributions, populations, and similarities. These are displayed through different shading of nodes and edges representing categories and the relationship between two categories, and through proximity of nodes.
Type: Application
Filed: September 30, 2004
Publication date: February 24, 2005
Applicant: Microsoft Corporation
Inventors: David Chickering, Zhaohui Tang, David Heckerman, Robert Rounthwaite, Alexei Bocharov, Scott Oveson
-
Publication number: 20050027478
Abstract: Systems and methods are provided for producing a mining model accuracy display that depicts the model's accuracy at predicting a state for a multiple-state variable. The model predicts a state and provides an associated probability for each case. Points are graphed such that one coordinate of the data point corresponds to a number N of cases and the other coordinate corresponds to the number of correct predictions made in the top N cases by probability.
Type: Application
Filed: September 1, 2004
Publication date: February 3, 2005
Applicant: Microsoft Corporation
Inventors: Zhaohui Tang, Pyungchul Kim
-
Publication number: 20050021489
Abstract: A mining structure is created which contains processed data from a data set. This data may be used to train one or more models. In addition to the selection of data to be used by model from data set, processing parameters are set, in one embodiment. For example, the discretization of a continuous variable into buckets, the number of buckets, and/or the sub-range corresponding to each bucket is set when the mining structure is created. The mining structure is processed, which causes the processing and storage of data from data set in the mining structure. After processing, the mining structure can be used by one or more models.
Type: Application
Filed: July 22, 2003
Publication date: January 27, 2005
Inventors: C. MacLennan, Zhaohui Tang, Pyungchul Kim, Raman Iyer
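As a rough illustration of the discretization step this abstract mentions, the sketch below splits a continuous variable into a fixed number of buckets and records the sub-range of each one. The abstract does not say how sub-ranges are chosen; equal-width buckets and the function name `equal_width_buckets` are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple


def equal_width_buckets(
    values: Sequence[float], num_buckets: int
) -> Tuple[List[float], List[int]]:
    """Return (bucket edges, bucket index per value) using equal-width buckets.

    Equal-width bucketing is an assumption; the abstract only says that the
    number of buckets and each bucket's sub-range are set up front.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    low, high = min(values), max(values)
    width = (high - low) / num_buckets
    # Bucket i covers [edges[i], edges[i + 1]); the last bucket also includes `high`.
    edges = [low + i * width for i in range(num_buckets + 1)]

    def bucket_of(value: float) -> int:
        if width == 0:
            return 0  # all values identical, so everything lands in bucket 0
        return min(int((value - low) / width), num_buckets - 1)

    return edges, [bucket_of(v) for v in values]


edges, assignments = equal_width_buckets([0, 5, 10, 20], num_buckets=4)
```

Fixing the edges once and storing the bucket assignments alongside the data is what lets several models share the same discretized values instead of each recomputing its own.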
-
Publication number: 20050021482
Abstract: A drill-through feature is provided which provides a universal drill-through to mining model source data from a trained mining model. In order for a user or application to obtain model content information on a given node of a model, a universal function is provided whereby the user specifies the node for a model and data set, and the cases underlying that node for that model and data set are returned. A sampling of underlying cases may be provided, where only a sampling of the cases represented in the node is requested.
Type: Application
Filed: June 30, 2003
Publication date: January 27, 2005
Inventors: Pyungchul Kim, C. MacLennan, Zhaohui Tang, Raman Iyer
-
Patent number: 6810357
Abstract: Systems and methods are provided for producing a mining model accuracy display that depicts the model's accuracy at predicting a state for a multiple-state variable. The model predicts a state and provides an associated probability for each case. Points are graphed such that one coordinate of the data point corresponds to a number N of cases and the other coordinate corresponds to the number of correct predictions made in the top N cases by probability.
Type: Grant
Filed: June 28, 2002
Date of Patent: October 26, 2004
Assignee: Microsoft Corporation
Inventors: Zhaohui Tang, Pyungchul Kim
-
Publication number: 20040073528
Abstract: The present invention relates to a system and methodology to generate and provide a lift chart to determine accuracy of one or more models that predict continuous variable data. Systems and processes are provided that process continuous variable prediction data in accordance with various analytical techniques. The processed data is then formatted for display, wherein model performance can then be determined by comparisons between models and/or by comparisons to idealized model performance. In one aspect, a system is provided that generates a continuous variable prediction lift chart. The system includes an analyzer that receives data from one or more models and a continuous variable test data set, wherein the formatter then generates a lift chart based on the analyzed models and the continuous variable test data set.
Type: Application
Filed: October 15, 2002
Publication date: April 15, 2004
Inventors: Zhaohui Tang, David E. Heckerman, David M. Chickering
-
Publication number: 20040002929
Abstract: Systems and methods are provided for producing displays of the accuracy of data mining or statistical models that produce associative predictions. For all cases in a testing data set, the model makes predictions and provides associated probabilities. The cases are sorted by their probability of making accurate predictions and a graph is made of the accuracy of the model over various subsets containing the highest probability cases as evaluated by the model. Where a number of probabilities are presented for the predictions in a basket of predictions, those probabilities are combined to yield a probability score for the entire basket. Additionally, the accuracy of a model over different basket sizes may be graphed. The accuracy graph may also be produced for any models making a prediction, by graphing the probability of making accurate predictions and a graph made of the accuracy of the model over various subsets of the data containing the highest probability cases.
Type: Application
Filed: June 28, 2002
Publication date: January 1, 2004
Applicant: Microsoft Corporation
Inventors: Pyungchul Kim, Zhaohui Tang, David Earl Heckerman, Scott Conrad Oveson