Patents by Inventor David E. Heckerman

David E. Heckerman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Dependency network based model (or pattern)

Publication number: 20040243548

Abstract: A dependency network is created from a training data set utilizing a scalable method. A statistical model (or pattern), such as a Bayesian network, is then constructed to allow more convenient inferencing. The model (or pattern) is employed in lieu of the training data set for data access. The computational complexity of the method that produces the model (or pattern) is independent of the size of the original data set. The dependency network directly returns explicitly encoded data in the conditional probability distributions of the dependency network. Non-explicitly encoded data is generated via Gibbs sampling, approximated, or ignored.

Type: Application

Filed: May 29, 2003

Publication date: December 2, 2004

Inventors: Geoffrey J. Hulten, David M. Chickering, David E. Heckerman

Message rendering for identification of content features

Publication number: 20040221062

Abstract: Architecture for detecting and removing obfuscating clutter from the subject and/or body of a message, e.g., e-mail, prior to filtering of the message, to identify junk messages commonly referred to as spam. The technique utilizes the powerful features built into an HTML rendering engine to strip the HTML instructions for all non-substantive aspects of the message. Pre-processing includes pre-rendering of the message into a final format, which final format is that which is displayed by the rendering engine to the user. The final format message is then converted to a text-only format to remove graphics, color, non-text decoration, and spacing that cannot be rendered as ASCII-style or Unicode-style characters. The result is essentially to reduce each message to its common denominator essentials so that the junk mail filter can view each message on an equal basis.

Type: Application

Filed: May 2, 2003

Publication date: November 4, 2004

Inventors: Bryan T. Starbuck, Robert L. Rounthwaite, David E. Heckerman, Joshua T. Goodman
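
The reduction to text described in the abstract above can be illustrated with a minimal sketch. The abstract relies on an HTML rendering engine; this sketch only assumes Python's standard `html.parser`, which parses but does not render, so it is a rough stand-in. The names `TextOnly` and `to_plain_text` are hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text between tags; drop markup, comments, scripts and styles."""

    SKIPPED = {"script", "style"}  # element content that is never shown as text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def to_plain_text(html):
    """Return the text of an HTML message with runs of whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Join fragments without separators so words split by tags or comments rejoin.
    return " ".join("".join(parser.parts).split())
```

For example, `to_plain_text("<p>FR<!-- x -->EE <b>mo</b>ney</p>")` rejoins the words that the comment and the bold tag were hiding from a naive word match.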

Cluster-based visualization of user traffic on an internet site

Publication number: 20040196311

Abstract: Visualizing Internet web traffic is disclosed. In one embodiment, a number of windows are displayed, corresponding to a number of clusters into which users have been partitioned based on similar web browsing behavior. The windows are ordered from the cluster having the greatest number of users to the cluster having the least number of users. Each window has one or more rows, where each row corresponds to a user within the cluster. Each row has an ordered number of visible units, such as blocks, where each block corresponds to a web page visited by the user. The blocks can be color coded by the type of web page they represent. In one embodiment, the corresponding cluster models for the clusters are alternatively displayed in the windows.

Type: Application

Filed: April 22, 2004

Publication date: October 7, 2004

Applicant: Microsoft Corporation

Inventors: Igor Cadez, David E. Heckerman, Christopher A. Meek, Steven J. White
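
As a rough illustration of the layout in the abstract above, the sketch below orders clusters from most to fewest users and prints one row per user, with one block per visited page. The text-only rendering, the use of a page type's first letter as its "color" code, and the name `render_clusters` are assumptions made for this example.

```python
def render_clusters(clusters):
    """Return text lines for clusters given as {name: [[page_type, ...] per user]}.

    Clusters are ordered from the largest number of users to the smallest.
    Each user becomes one row; each visited page becomes one block, shown
    here as the first letter of its page type.
    """
    ordered = sorted(clusters.items(), key=lambda item: len(item[1]), reverse=True)
    lines = []
    for name, users in ordered:
        lines.append(f"{name}: {len(users)} user(s)")
        for pages in users:
            lines.append("  " + " ".join(page[0].upper() for page in pages))
    return lines
```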

Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications

Publication number: 20040181554

Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.

Type: Application

Filed: March 24, 2004

Publication date: September 16, 2004

Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek

Feedback loop for spam prevention

Publication number: 20040177110

Abstract: The subject invention provides for a feedback loop system and method that facilitate classifying items in connection with spam prevention in server and/or client-based architectures. The invention makes use of a machine-learning approach as applied to spam filters, and in particular, randomly samples incoming email messages so that examples of both legitimate and junk/spam mail are obtained to generate sets of training data. Users who are identified as spam-fighters are asked to vote on whether a selection of their incoming email messages is individually either legitimate mail or junk mail. A database stores the properties for each mail and voting transaction such as user information, message properties and content summary, and polling results for each message to generate training data for machine learning systems. The machine learning systems facilitate creating improved spam filter(s) that are trained to recognize both legitimate mail and spam mail and to distinguish between them.

Type: Application

Filed: March 3, 2003

Publication date: September 9, 2004

Inventors: Robert L. Rounthwaite, Joshua T. Goodman, David E. Heckerman, John D. Mehr, Nathan D. Howell, Micah C. Rupersburg, Dean A. Slawson

Adaptive junk message filtering system

Publication number: 20040167964

Abstract: The invention relates to a system for filtering messages. The system includes a seed filter having associated therewith a false positive rate and a false negative rate. A new filter is also provided for filtering the messages. The new filter is evaluated according to the false positive rate and the false negative rate of the seed filter: the data used to determine the false positive rate and the false negative rate of the seed filter are utilized to determine a new false positive rate and a new false negative rate of the new filter as a function of threshold. The new filter is employed in lieu of the seed filter if a threshold exists for the new filter such that the new false positive rate and new false negative rate are together considered better than the false positive and the false negative rate of the seed filter.

Type: Application

Filed: February 25, 2003

Publication date: August 26, 2004

Inventors: Robert L. Rounthwaite, Joshua T. Goodman, David E. Heckerman, John C. Platt, Carl M. Kadie
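
The threshold search described in the abstract above might look like the following sketch. The abstract does not say what "together considered better" means; this example assumes it means no worse on both rates and strictly better on at least one. The function names and the convention that a message is flagged when its score is at or above the threshold are also assumptions.

```python
def error_rates(scores, labels, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    scores: new-filter scores, one per message.
    labels: True for junk, False for legitimate mail.
    A message is flagged as junk when score >= threshold.
    """
    good = [s for s, is_junk in zip(scores, labels) if not is_junk]
    junk = [s for s, is_junk in zip(scores, labels) if is_junk]
    false_pos = sum(s >= threshold for s in good) / len(good)
    false_neg = sum(s < threshold for s in junk) / len(junk)
    return false_pos, false_neg


def find_better_threshold(scores, labels, seed_fp, seed_fn):
    """Return the lowest threshold at which the new filter beats the seed, or None.

    Assumed meaning of "better": neither rate is worse than the seed filter's
    and at least one rate is strictly lower.
    """
    for threshold in sorted(set(scores)):
        fp, fn = error_rates(scores, labels, threshold)
        if fp <= seed_fp and fn <= seed_fn and (fp < seed_fp or fn < seed_fn):
            return threshold
    return None
```

A return value of `None` corresponds to keeping the seed filter in place.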

Cluster-based visualization of user traffic on an internet site

Patent number: 6771289

Abstract: Visualizing Internet web traffic is disclosed. In one embodiment, a number of windows are displayed, corresponding to a number of clusters into which users have been partitioned based on similar web browsing behavior. The windows are ordered from the cluster having the greatest number of users to the cluster having the least number of users. Each window has one or more rows, where each row corresponds to a user within the cluster. Each row has an ordered number of visible units, such as blocks, where each block corresponds to a web page visited by the user. The blocks can be color coded by the type of web page they represent. In one embodiment, the corresponding cluster models for the clusters are alternatively displayed in the windows.

Type: Grant

Filed: March 2, 2000

Date of Patent: August 3, 2004

Assignee: Microsoft Corporation

Inventors: Igor Cadez, David E. Heckerman, Christopher A. Meek, Steven J. White

Apparatus and accompanying methods for visualizing clusters of data and hierarchical cluster classifications

Patent number: 6742003

Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.

Type: Grant

Filed: April 30, 2001

Date of Patent: May 25, 2004

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek

Continuous variable prediction lift chart systems and methods

Publication number: 20040073528

Abstract: The present invention relates to a system and methodology to generate and provide a lift chart to determine accuracy of one or more models that predict continuous variable data. Systems and processes are provided that process continuous variable prediction data in accordance with various analytical techniques. The processed data is then formatted for display, wherein model performance can then be determined by comparisons between models and/or by comparisons to idealized model performance. In one aspect, a system is provided that generates a continuous variable prediction lift chart. The system includes an analyzer that receives data from one or more models and a continuous variable test data set, and a formatter that then generates a lift chart based on the analyzed models and the continuous variable test data set.

Type: Application

Filed: October 15, 2002

Publication date: April 15, 2004

Inventors: Zhaohui Tang, David E. Heckerman, David M. Chickering

Staged mixture modeling

Publication number: 20040073537

Abstract: A system and method for generating staged mixture model(s) is provided. The staged mixture model includes a plurality of mixture components each having an associated mixture weight, and an added mixture component having an initial structure, parameters and associated mixture weight. The added mixture component is modified based, at least in part, upon a case that is undesirably addressed by the plurality of mixture components, using a structural expectation maximization (SEM) algorithm to modify the structure, parameters and/or associated mixture weight of the added mixture component.

Type: Application

Filed: October 15, 2002

Publication date: April 15, 2004

Inventors: Bo Thiesson, Christopher A. Meek, David E. Heckerman

Method and system for identifying lossy links in a computer network

Publication number: 20040044765

Abstract: A computer network has links for carrying data among computers, including one or more client computers. Packet loss rates are determined for the client computers. Probability distributions for the loss rates of each of the client computers are then developed using various mathematical techniques. Based on an analysis of these probability distributions, a determination is made regarding which of the links are excessively lossy.

Type: Application

Filed: March 3, 2003

Publication date: March 4, 2004

Applicant: Microsoft Corporation

Inventors: Christopher A. Meek, Venkata N. Padmanabhan, Lili Qiu, Jiahe Wang, David B. Wilson, Christian H. Borgs, Jennifer T. Chayes, David E. Heckerman

Goal-oriented clustering

Patent number: 6694301

Abstract: Clustering for purposes of data visualization and making predictions is disclosed. Embodiments of the invention are operable on a number of variables that have a predetermined representation. The variables include input-only variables, output-only variables, and both input-and-output variables. Embodiments of the invention generate a model that has a bottleneck architecture. The model includes a top layer of nodes of at least the input-only variables, one or more middle layers of hidden nodes, and a bottom layer of nodes of the output-only and the input-and-output variables. At least one cluster is determined from this model. The model can be a probabilistic neural network and/or a Bayesian network.

Type: Grant

Filed: March 31, 2000

Date of Patent: February 17, 2004

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, D. Maxwell Chickering, John C. Platt, Christopher A. Meek, Bo Thiesson

Noise reduction for a cluster-based approach for targeted item delivery with inventory management

Patent number: 6665653

Abstract: Reduction of noise within a cluster-based approach for item (such as ad) allocation, such as by using a linear program, is described. In one embodiment, probabilities are discretized into a predetermined number of groups, where the mean for the group that a particular probability has been discretized into is substituted for the particular probability when the items are being allocated. In another embodiment, the probabilities are decreased by a power function of the variances for them. In a third embodiment, allocation of items to clusters is not changed unless the sample sizes used to determine the corresponding probabilities for those items are greater than a threshold. In a fourth embodiment, after allocation is performed a first time, a predetermined number of items are removed, and reallocation is performed.

Type: Grant

Filed: May 4, 2000

Date of Patent: December 16, 2003

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, D. Maxwell Chickering
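
The first embodiment in the abstract above, replacing each probability with the mean of its group, can be sketched as follows. The abstract does not say how the groups are formed; this example assumes equal-sized groups taken from the sorted probabilities, and `smooth_by_group` is a hypothetical name.

```python
def smooth_by_group(probabilities, n_groups):
    """Replace each probability with the mean of the group it falls into.

    Assumption: groups are formed by sorting the probabilities and cutting
    the sorted order into n_groups slices of (nearly) equal size.
    """
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    smoothed = [0.0] * len(probabilities)
    for group in range(n_groups):
        start = group * len(order) // n_groups
        end = (group + 1) * len(order) // n_groups
        members = order[start:end]
        if not members:
            continue  # more groups than probabilities
        mean = sum(probabilities[i] for i in members) / len(members)
        for i in members:
            smoothed[i] = mean
    return smoothed
```

With two groups, `smooth_by_group([0.25, 0.75, 0.5, 1.0], 2)` pairs the two lowest and the two highest values, so small differences inside a group no longer affect the allocation.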

Modifying advertisement scores based on advertisement response probabilities

Publication number: 20030229531

Abstract: Advertisement response probabilities are utilized to alter advertisement scores. A plurality of possible advertisements is accessed from, for example, an advertisement database or advertisement pipeline. A response probability for each advertisement is determined. A response probability may be a probability that a user will “click,” or otherwise select an advertisement. Advertisements may be associated with probabilistic prediction models that take advertisement recipient attribute values as inputs and provide a probability distribution as output. A score associated with each of the possible advertisements is altered based on the response probability for each of the advertisements. Statistical prediction is used to determine how scores are to be altered. Advertisements with response probabilities less than a mean probability may have associated scores decreased. Conversely, advertisements with response probabilities greater than a mean probability may have associated scores increased.

Type: Application

Filed: June 5, 2002

Publication date: December 11, 2003

Inventors: David E. Heckerman, Martin Luo, Guy Shani, Mahbubul Alam Ali
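
One way to picture the score adjustment in the abstract above is the sketch below. The abstract only says that scores may rise when the response probability is above the mean and fall when it is below; the multiplicative rule used here (score times probability divided by mean probability) and the name `adjust_scores` are assumptions for illustration.

```python
def adjust_scores(scores, response_probs):
    """Scale each advertisement's score by its response probability relative to the mean.

    scores and response_probs are dicts keyed by advertisement id.
    Assumed rule: new_score = score * (p / mean_p), so ads above the mean
    probability gain score and ads below it lose score.
    """
    mean_p = sum(response_probs.values()) / len(response_probs)
    if mean_p == 0:
        return dict(scores)  # no response signal at all; leave scores unchanged
    return {ad: scores[ad] * (response_probs[ad] / mean_p) for ad in scores}
```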

Caching techniques for streaming media

Publication number: 20030217113

Abstract: A streaming media caching mechanism and cache manager efficiently establish and maintain the contents of a streaming media cache for use in serving streaming media requests from cache rather than from an original data source when appropriate. The cost of caching is incurred only when the benefits of caching are likely to be experienced. The caching mechanism and cache manager evaluate the request count for each requested URL to determine whether the URL represents a cache candidate, and further analyze the URL request rate to determine whether the content associated with the URL will be cached. In an embodiment, the streaming media cache is maintained with a predetermined amount of reserve capacity rather than being filled to capacity whenever possible.

Type: Application

Filed: April 8, 2002

Publication date: November 20, 2003

Applicant: Microsoft Corporation

Inventors: Ariel Katz, Yifat Sagiv, Guy Friedel, David E. Heckerman, John R. Douceur, Joshua Goodman
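
The two-step admission test in the abstract above (request count first, then request rate) can be sketched as below. The class name `CacheAdmission`, the parameters `min_count` and `min_rate`, and measuring the rate from the first request seen are assumptions made for this example.

```python
class CacheAdmission:
    """Decide whether a URL's content should be cached.

    Step 1: a URL becomes a cache candidate after min_count requests.
    Step 2: a candidate is cached only if its request rate, measured since
    the first request seen, is at least min_rate requests per second.
    """

    def __init__(self, min_count, min_rate):
        self.min_count = min_count
        self.min_rate = min_rate
        self.first_seen = {}
        self.counts = {}

    def record(self, url, now):
        """Record one request at time `now` (seconds); return True to cache."""
        self.first_seen.setdefault(url, now)
        self.counts[url] = self.counts.get(url, 0) + 1
        count = self.counts[url]
        if count < self.min_count:
            return False  # not yet a candidate
        elapsed = now - self.first_seen[url]
        if elapsed <= 0:
            return False  # no measurable rate yet; stay conservative
        return count / elapsed >= self.min_rate
```

Holding back until both tests pass reflects the abstract's point that the cost of caching should be paid only when a benefit is likely.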

Preference-based catalog browser that utilizes a belief network

Patent number: 6633852

Abstract: An electronic shopping aid is provided that assists a user in selecting a product from an electronic catalog of products based on their preferences for various features of the products. Since the electronic shopping aid helps a user select a product based on the user's preferences, it is referred to as a preference-based product browser. In using the browser, the user initially inputs an indication of their like or dislike for various features of the products as well as an indication of how strongly they feel about the like or dislike. The browser then utilizes this information to determine a list of products in which the user is most likely interested. As part of this determination, the browser performs collaborative filtering and bases the determination on what other users with similar characteristics (e.g., age and income) have liked.

Type: Grant

Filed: May 21, 1999

Date of Patent: October 14, 2003

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Christopher A. Meek, Usama M. Fayad

Decision theoretic approach to targeted solicitation by maximizing expected profit increases

Publication number: 20030139963

Abstract: A decision theoretic approach to targeted solicitation, by maximizing expected profit increases, is disclosed. A decision theoretic model is used to identify a sub-population of a population to solicit, where the model is constructed to maximize an expected increase in profits. A decision tree in particular can be used as the model. The decision tree has paths from a root node to a number of leaf nodes. The decision tree has a split on a solicitation variable in every path from the root node to each leaf node. The solicitation variable has two values, a first value corresponding to a solicitation having been made, and a second value corresponding to a solicitation not having been made.

Type: Application

Filed: December 8, 2000

Publication date: July 24, 2003

Inventors: D. Maxwell Chickering, David E. Heckerman

Determining whether a variable is numeric or non-numeric

Patent number: 6542878

Abstract: Determination as to whether a variable is numeric or non-numeric. In one embodiment, a variable is input having a plurality of values, where each value has a count. The variable is determined to be numeric or non-numeric by assessing closeness of counts for adjacent values of the variable. Whether the variable is numeric or non-numeric is then output.

Type: Grant

Filed: April 23, 1999

Date of Patent: April 1, 2003

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Robert L. Rounthwaite, Jeffrey R. Bernhardt
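
The abstract above does not give the closeness test itself, so the following is only one possible reading. It assumes that a variable looks numeric when counts for adjacent values (in sorted order) differ much less than counts for arbitrary pairs of values. The name `looks_numeric` and the `ratio` cutoff are hypothetical.

```python
from itertools import combinations


def looks_numeric(value_counts, ratio=0.5):
    """Guess whether a variable is numeric from a {value: count} mapping.

    Assumed heuristic: compare the mean absolute count difference between
    adjacent values with the mean absolute count difference over all pairs.
    A smooth histogram (adjacent counts close together) suggests the values
    have a meaningful numeric order.
    """
    values = sorted(value_counts)
    counts = [value_counts[v] for v in values]
    if len(counts) < 3:
        return False  # too few values to compare adjacent and arbitrary pairs
    adjacent = [abs(a - b) for a, b in zip(counts, counts[1:])]
    all_pairs = [abs(a - b) for a, b in combinations(counts, 2)]
    mean_adjacent = sum(adjacent) / len(adjacent)
    mean_all = sum(all_pairs) / len(all_pairs)
    if mean_all == 0:
        return False  # all counts equal; ordering carries no information
    return mean_adjacent / mean_all <= ratio
```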

Generating improved belief networks

Patent number: 6529888

Abstract: An improved belief network generator is provided. A belief network is generated utilizing expert knowledge retrieved from an expert in a given field of expertise and empirical data reflecting observations made in the given field of the expert. In addition to utilizing expert knowledge and empirical data, the belief network generator provides for the use of continuous variables in the generated belief network and missing data in the empirical data.

Type: Grant

Filed: October 30, 1996

Date of Patent: March 4, 2003

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Dan Geiger, David M. Chickering

Determining a distribution of a numeric variable

Patent number: 6529895

Abstract: Determination of a distribution of a numeric variable. In one embodiment, a data set is first input. The data set has a plurality of records. Each record has a value for each of a plurality of raw non-transactional variables. The plurality of raw non-transactional variables includes a numeric variable. It is determined whether a Gaussian or a log-Gaussian distribution better predicts the numeric variable, based on the plurality of records. This determination is then output.

Type: Grant

Filed: April 23, 1999

Date of Patent: March 4, 2003

Assignee: Microsoft Corporation

Inventor: David E. Heckerman
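
The comparison in the abstract above can be sketched by fitting both distributions and comparing log-likelihoods. The abstract does not state how "better predicts" is measured; this example assumes in-sample maximum likelihood, and the function names are hypothetical. In LaTeX, the two quantities compared are

$$\ell_{\text{gauss}} = -\tfrac{n}{2}\left(\ln(2\pi\hat\sigma^2) + 1\right), \qquad \ell_{\text{log}} = -\tfrac{n}{2}\left(\ln(2\pi\hat\sigma_{\ln}^2) + 1\right) - \sum_{i=1}^{n} \ln x_i,$$

where $\hat\sigma^2$ is the variance of the values and $\hat\sigma_{\ln}^2$ is the variance of their logarithms.

```python
import math


def gaussian_loglik(xs):
    """Log-likelihood of xs under a Gaussian fitted by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("need at least two distinct values")
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def better_distribution(xs):
    """Return "gaussian" or "log-gaussian", whichever fits xs better.

    A log-Gaussian is only defined for positive values. Its log-likelihood is
    the Gaussian log-likelihood of log(x) minus sum(log(x)), the change-of-
    variables term that makes the two densities comparable.
    """
    if any(x <= 0 for x in xs):
        return "gaussian"
    logs = [math.log(x) for x in xs]
    ll_gaussian = gaussian_loglik(xs)
    ll_log_gaussian = gaussian_loglik(logs) - sum(logs)
    return "log-gaussian" if ll_log_gaussian > ll_gaussian else "gaussian"
```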
