Patents by Inventor David E. Heckerman

David E. Heckerman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20040243548
    Abstract: A dependency network is created from a training data set utilizing a scalable method. A statistical model (or pattern), such as for example a Bayesian network, is then constructed to allow more convenient inferencing. The model (or pattern) is employed in lieu of the training data set for data access. The computational complexity of the method that produces the model (or pattern) is independent of the size of the original data set. The dependency network directly returns explicitly encoded data in the conditional probability distributions of the dependency network. Non-explicitly encoded data is generated via Gibbs sampling, approximated, or ignored.
    Type: Application
    Filed: May 29, 2003
    Publication date: December 2, 2004
    Inventors: Geoffrey J. Hulten, David M. Chickering, David E. Heckerman
  • Publication number: 20040221062
    Abstract: Architecture for detecting and removing obfuscating clutter from the subject and/or body of a message, e.g., e-mail, prior to filtering the message, in order to identify junk messages commonly referred to as spam. The technique uses the features built into an HTML rendering engine to strip the HTML instructions for all non-substantive aspects of the message. Pre-processing includes pre-rendering the message into its final format, that is, the format the rendering engine displays to the user. The final-format message is then converted to a text-only format to remove graphics, color, non-text decoration, and spacing that cannot be rendered as ASCII-style or Unicode-style characters. The result is essentially to reduce each message to its lowest-common-denominator essentials so that the junk mail filter can view each message on an equal basis.
    Type: Application
    Filed: May 2, 2003
    Publication date: November 4, 2004
    Inventors: Bryan T. Starbuck, Robert L. Rounthwaite, David E. Heckerman, Joshua T. Goodman
  • Publication number: 20040196311
    Abstract: Visualizing Internet web traffic is disclosed. In one embodiment, a number of windows are displayed, corresponding to a number of clusters into which users have been partitioned based on similar web browsing behavior. The windows are ordered from the cluster having the greatest number of users to the cluster having the least number of users. Each window has one or more rows, where each row corresponds to a user within the cluster. Each row has an ordered number of visible units, such as blocks, where each block corresponds to a web page visited by the user. The blocks can be color coded by the type of web page they represent. In one embodiment, the corresponding cluster models for the clusters are alternatively displayed in the windows.
    Type: Application
    Filed: April 22, 2004
    Publication date: October 7, 2004
    Applicant: Microsoft Corporation
    Inventors: Igor Cadez, David E. Heckerman, Christopher A. Meek, Steven J. White
  • Publication number: 20040181554
    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.
    Type: Application
    Filed: March 24, 2004
    Publication date: September 16, 2004
    Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek
  • Publication number: 20040177110
    Abstract: The subject invention provides for a feedback loop system and method that facilitate classifying items in connection with spam prevention in server and/or client-based architectures. The invention makes use of a machine-learning approach as applied to spam filters, and in particular, randomly samples incoming email messages so that examples of both legitimate and junk/spam mail are obtained to generate sets of training data. Users who are identified as spam-fighters are asked to vote on whether each message in a selection of their incoming email is legitimate mail or junk mail. A database stores the properties for each mail and voting transaction, such as user information, message properties and content summary, and polling results for each message, to generate training data for machine learning systems. The machine learning systems facilitate creating improved spam filter(s) that are trained to recognize both legitimate mail and spam mail and to distinguish between them.
    Type: Application
    Filed: March 3, 2003
    Publication date: September 9, 2004
    Inventors: Robert L. Rounthwaite, Joshua T. Goodman, David E. Heckerman, John D. Mehr, Nathan D. Howell, Micah C. Rupersburg, Dean A. Slawson
  • Publication number: 20040167964
    Abstract: The invention relates to a system for filtering messages. The system includes a seed filter having an associated false positive rate and false negative rate. A new filter is also provided for filtering the messages and is evaluated against the seed filter: the data used to determine the false positive rate and the false negative rate of the seed filter are used to determine a new false positive rate and a new false negative rate of the new filter as a function of threshold. The new filter is employed in lieu of the seed filter if a threshold exists for the new filter such that the new false positive rate and new false negative rate are together considered better than the false positive and false negative rates of the seed filter.
    Type: Application
    Filed: February 25, 2003
    Publication date: August 26, 2004
    Inventors: Robert L. Rounthwaite, Joshua T. Goodman, David E. Heckerman, John C. Platt, Carl M. Kadie
  • Patent number: 6771289
    Abstract: Visualizing Internet web traffic is disclosed. In one embodiment, a number of windows are displayed, corresponding to a number of clusters into which users have been partitioned based on similar web browsing behavior. The windows are ordered from the cluster having the greatest number of users to the cluster having the least number of users. Each window has one or more rows, where each row corresponds to a user within the cluster. Each row has an ordered number of visible units, such as blocks, where each block corresponds to a web page visited by the user. The blocks can be color coded by the type of web page they represent. In one embodiment, the corresponding cluster models for the clusters are alternatively displayed in the windows.
    Type: Grant
    Filed: March 2, 2000
    Date of Patent: August 3, 2004
    Assignee: Microsoft Corporation
    Inventors: Igor Cadez, David E. Heckerman, Christopher A. Meek, Steven J. White
  • Patent number: 6742003
    Abstract: A system that incorporates an interactive graphical user interface for visualizing clusters (categories) and segments (summarized clusters) of data. Specifically, the system automatically categorizes incoming case data into clusters, summarizes those clusters into segments, determines similarity measures for the segments, scores the selected segments through the similarity measures, and then forms and visually depicts hierarchical organizations of those selected clusters. The system also automatically and dynamically reduces, as necessary, a depth of the hierarchical organization, through elimination of unnecessary hierarchical levels and inter-nodal links, based on similarity measures of segments or segment groups. Attribute/value data that tends to meaningfully characterize each segment is also scored, rank ordered based on normalized scores, and then graphically displayed.
    Type: Grant
    Filed: April 30, 2001
    Date of Patent: May 25, 2004
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Paul S. Bradley, David M. Chickering, Christopher A. Meek
  • Publication number: 20040073528
    Abstract: The present invention relates to a system and methodology to generate and provide a lift chart to determine the accuracy of one or more models that predict continuous variable data. Systems and processes are provided that process continuous variable prediction data in accordance with various analytical techniques. The processed data is then formatted for display, so that model performance can be determined by comparisons between models and/or by comparisons to idealized model performance. In one aspect, a system is provided that generates a continuous variable prediction lift chart. The system includes an analyzer that receives data from one or more models and a continuous variable test data set, and a formatter that generates a lift chart based on the analyzed models and the continuous variable test data set.
    Type: Application
    Filed: October 15, 2002
    Publication date: April 15, 2004
    Inventors: Zhaohui Tang, David E. Heckerman, David M. Chickering
  • Publication number: 20040073537
    Abstract: A system and method for generating staged mixture model(s) is provided. The staged mixture model includes a plurality of mixture components, each having an associated mixture weight, and an added mixture component having an initial structure, parameters, and associated mixture weight. The added mixture component is modified based, at least in part, upon a case that is undesirably addressed by the plurality of mixture components, using a structural expectation maximization (SEM) algorithm to modify the structure, parameters and/or associated mixture weight of the added mixture component.
    Type: Application
    Filed: October 15, 2002
    Publication date: April 15, 2004
    Inventors: Bo Thiesson, Christopher A. Meek, David E. Heckerman
  • Publication number: 20040044765
    Abstract: A computer network has links for carrying data among computers, including one or more client computers. Packet loss rates are determined for the client computers. Probability distributions for the loss rates of each of the client computers are then developed using various mathematical techniques. Based on an analysis of these probability distributions, a determination is made regarding which of the links are excessively lossy.
    Type: Application
    Filed: March 3, 2003
    Publication date: March 4, 2004
    Applicant: Microsoft Corporation
    Inventors: Christopher A. Meek, Venkata N. Padmanabhan, Lili Qiu, Jiahe Wang, David B. Wilson, Christian H. Borgs, Jennifer T. Chayes, David E. Heckerman
  • Patent number: 6694301
    Abstract: Clustering for purposes of data visualization and making predictions is disclosed. Embodiments of the invention are operable on a number of variables that have a predetermined representation. The variables include input-only variables, output-only variables, and both input-and-output variables. Embodiments of the invention generate a model that has a bottleneck architecture. The model includes a top layer of nodes of at least the input-only variables, one or more middle layers of hidden nodes, and a bottom layer of nodes of the output-only and the input-and-output variables. At least one cluster is determined from this model. The model can be a probabilistic neural network and/or a Bayesian network.
    Type: Grant
    Filed: March 31, 2000
    Date of Patent: February 17, 2004
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, D. Maxwell Chickering, John C. Platt, Christopher A. Meek, Bo Thiesson
  • Patent number: 6665653
    Abstract: Reduction of noise within a cluster-based approach for item (such as ad) allocation, such as by using a linear program, is described. In one embodiment, probabilities are discretized into a predetermined number of groups, where the mean for the group that a particular probability has been discretized into is substituted for the particular probability when the items are being allocated. In another embodiment, the probabilities are decreased by a power function of their variances. In a third embodiment, allocation of items to clusters is not changed unless the sample sizes used to determine the corresponding probabilities for those ads are greater than a threshold. In a fourth embodiment, after allocation is performed a first time, a predetermined number of items are removed, and reallocation is performed.
    Type: Grant
    Filed: May 4, 2000
    Date of Patent: December 16, 2003
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, D. Maxwell Chickering
  • Publication number: 20030229531
    Abstract: Advertisement response probabilities are utilized to alter advertisement scores. A plurality of possible advertisements is accessed from, for example, an advertisement database or advertisement pipeline. A response probability for each advertisement is determined. A response probability may be a probability that a user will “click,” or otherwise select an advertisement. Advertisements may be associated with probabilistic prediction models that take advertisement recipient attribute values as inputs and provide a probability distribution as output. A score associated with each of the possible advertisements is altered based on the response probability for each of the advertisements. Statistical prediction is used to determine how scores are to be altered. Advertisements with response probabilities less than a mean probability may have associated scores decreased. Conversely, advertisements with response probabilities greater than a mean probability may have associated scores increased.
    Type: Application
    Filed: June 5, 2002
    Publication date: December 11, 2003
    Inventors: David E. Heckerman, Martin Luo, Guy Shani, Mahbubul Alam Ali
  • Publication number: 20030217113
    Abstract: A streaming media caching mechanism and cache manager efficiently establish and maintain the contents of a streaming media cache for use in serving streaming media requests from cache rather than from an original data source when appropriate. The cost of caching is incurred only when the benefits of caching are likely to be experienced. The caching mechanism and cache manager evaluate the request count for each requested URL to determine whether the URL represents a cache candidate, and further analyze the URL request rate to determine whether the content associated with the URL will be cached. In an embodiment, the streaming media cache is maintained with a predetermined amount of reserve capacity rather than being filled to capacity whenever possible.
    Type: Application
    Filed: April 8, 2002
    Publication date: November 20, 2003
    Applicant: Microsoft Corporation
    Inventors: Ariel Katz, Yifat Sagiv, Guy Friedel, David E. Heckerman, John R. Douceur, Joshua Goodman
  • Patent number: 6633852
    Abstract: An electronic shopping aid is provided that assists a user in selecting a product from an electronic catalog of products based on their preferences for various features of the products. Since the electronic shopping aid helps a user select a product based on the user's preferences, it is referred to as a preference-based product browser. In using the browser, the user initially inputs an indication of their like or dislike for various features of the products as well as an indication of how strongly they feel about the like or dislike. The browser then utilizes this information to determine a list of products in which the user is most likely interested. As part of this determination, the browser performs collaborative filtering and bases the determination on what other users with similar characteristics (e.g., age and income) have liked.
    Type: Grant
    Filed: May 21, 1999
    Date of Patent: October 14, 2003
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Christopher A. Meek, Usama M. Fayad
  • Publication number: 20030139963
    Abstract: A decision theoretic approach to targeted solicitation, by maximizing expected profit increases, is disclosed. A decision theoretic model is used to identify a sub-population of a population to solicit, where the model is constructed to maximize an expected increase in profits. A decision tree in particular can be used as the model. The decision tree has paths from a root node to a number of leaf nodes. The decision tree has a split on a solicitation variable in every path from the root node to each leaf node. The solicitation variable has two values, a first value corresponding to a solicitation having been made, and a second value corresponding to a solicitation not having been made.
    Type: Application
    Filed: December 8, 2000
    Publication date: July 24, 2003
    Inventors: D. Maxwell Chickering, David E. Heckerman
  • Patent number: 6542878
    Abstract: Determination as to whether a variable is numeric or non-numeric. In one embodiment, a variable is input having a plurality of values, where each value has a count. The variable is determined to be numeric or non-numeric by assessing closeness of counts for adjacent values of the variable. Whether the variable is numeric or non-numeric is then output.
    Type: Grant
    Filed: April 23, 1999
    Date of Patent: April 1, 2003
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Robert L. Rounthwaite, Jeffrey R. Bernhardt
  • Patent number: 6529895
    Abstract: Determination of a distribution of a numeric variable. In one embodiment, a data set is first input. The data set has a plurality of records. Each record has a value for each of a plurality of raw non-transactional variables. The plurality of raw non-transactional variables includes a numeric variable. It is determined whether a Gaussian or a log-Gaussian distribution better predicts the numeric variable, based on the plurality of records. This determination is then output.
    Type: Grant
    Filed: April 23, 1999
    Date of Patent: March 4, 2003
    Assignee: Microsoft Corporation
    Inventor: David E. Heckerman
  • Patent number: 6529888
    Abstract: An improved belief network generator is provided. A belief network is generated utilizing expert knowledge retrieved from an expert in a given field of expertise and empirical data reflecting observations made in the given field of the expert. In addition to utilizing expert knowledge and empirical data, the belief network generator provides for the use of continuous variables in the generated belief network and missing data in the empirical data.
    Type: Grant
    Filed: October 30, 1996
    Date of Patent: March 4, 2003
    Assignee: Microsoft Corporation
    Inventors: David E. Heckerman, Dan Geiger, David M. Chickering
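Patent 6529895 decides whether a Gaussian or a log-Gaussian distribution better predicts a numeric variable. The abstract does not say how "better" is measured, so the sketch below assumes a comparison of maximized log-likelihoods on the raw values, which is one common choice and not necessarily the patented criterion. The function names are hypothetical.

```python
import math
from typing import Sequence


def gaussian_max_loglik(values: Sequence[float]) -> float:
    """Log-likelihood of `values` under a Gaussian fit by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # ML (biased) variance
    if var <= 0.0:
        return math.inf  # all values identical: degenerate fit
    # At the ML estimates the squared residuals sum to n * var,
    # so the log-likelihood collapses to this closed form.
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def better_distribution(values: Sequence[float]) -> str:
    """Return "gaussian" or "log-gaussian", whichever fits `values` better."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    gaussian = gaussian_max_loglik(values)
    if any(v <= 0 for v in values):
        return "gaussian"  # a log-Gaussian has support only on x > 0
    logs = [math.log(v) for v in values]
    # Change of variables: p(x) = N(log x; mu, sigma^2) / x,
    # so subtract sum(log x) to get the likelihood of the raw values.
    log_gaussian = gaussian_max_loglik(logs) - sum(logs)
    return "log-gaussian" if log_gaussian > gaussian else "gaussian"
```

The Jacobian term `sum(logs)` matters: without it the two log-likelihoods are measured on different scales and cannot be compared directly.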
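Application 20040167964 adopts a new message filter only if some threshold makes its false positive and false negative rates "together considered better" than the seed filter's. A minimal sketch follows, assuming the new filter emits a numeric spam score, that a message is flagged when its score is at or above the threshold, and that "better" means neither rate is worse and at least one is strictly lower. All names here are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def error_rates(scores: Sequence[float], is_spam: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """False positive and false negative rates when score >= threshold means spam."""
    n_legit = sum(1 for y in is_spam if not y)
    n_spam = len(is_spam) - n_legit
    fp = sum(1 for s, y in zip(scores, is_spam) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, is_spam) if s < threshold and y)
    return (fp / n_legit if n_legit else 0.0, fn / n_spam if n_spam else 0.0)


def dominating_threshold(scores: Sequence[float], is_spam: Sequence[bool],
                         seed_fpr: float, seed_fnr: float) -> Optional[float]:
    """Return a threshold at which the new filter beats the seed filter, else None.

    "Better" is assumed to mean Pareto dominance over the seed filter's rates."""
    # Every observed score is a candidate cut-off; +inf flags nothing as spam.
    for t in sorted(set(scores)) + [float("inf")]:
        fpr, fnr = error_rates(scores, is_spam, t)
        if fpr <= seed_fpr and fnr <= seed_fnr and (fpr < seed_fpr or fnr < seed_fnr):
            return t
    return None
```

As in the abstract, the seed filter's rates would be measured on the same labeled messages used to sweep the new filter's threshold, so both filters are compared on identical data.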
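The first embodiment of patent 6665653 reduces noise by discretizing probabilities into a predetermined number of groups and substituting each group's mean for the individual probabilities. The abstract does not say how groups are formed; the sketch below assumes equal-width bins on [0, 1], and the function name is hypothetical.

```python
from collections import defaultdict
from typing import List, Sequence


def discretize_to_group_means(probs: Sequence[float], n_groups: int) -> List[float]:
    """Replace each probability with the mean of the probabilities in its group.

    Groups are assumed to be equal-width bins on [0, 1]."""
    if n_groups < 1:
        raise ValueError("n_groups must be positive")
    # Bin index for each probability; p == 1.0 goes into the top bin.
    bins = [min(int(p * n_groups), n_groups - 1) for p in probs]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for b, p in zip(bins, probs):
        sums[b] += p
        counts[b] += 1
    return [sums[b] / counts[b] for b in bins]
```

Equal-frequency groups (sorting the probabilities and splitting them into runs of equal size) would be an equally plausible reading of the abstract.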
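Application 20030229531 lowers the scores of advertisements whose response probability is below the mean and raises the scores of those above it. The abstract leaves the adjustment rule open; the sketch below assumes a simple multiplicative rule, scaling each score by its probability divided by the mean probability. The function name and the rule are illustrative assumptions, not the patented statistical prediction step.

```python
from typing import List, Sequence


def adjust_scores(scores: Sequence[float], response_probs: Sequence[float]) -> List[float]:
    """Scale each ad score by its response probability relative to the mean.

    Ads below the mean probability get a lower score and ads above it get a
    higher score. The multiplicative rule is an illustrative assumption."""
    if len(scores) != len(response_probs) or not scores:
        raise ValueError("need equal-length, non-empty inputs")
    mean_p = sum(response_probs) / len(response_probs)
    if mean_p == 0.0:
        return list(scores)  # no response signal to adjust by
    return [s * (p / mean_p) for s, p in zip(scores, response_probs)]
```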
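Application 20030217113 admits streaming content to a cache in two stages: a URL's request count decides whether it is a cache candidate, its request rate decides whether it is actually cached, and part of the cache is held in reserve. A minimal sketch of that admission logic follows; the class name, the sliding-window rate estimate, and every default value are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamingCacheSketch:
    """Hypothetical admission policy: count gate, then rate gate, with reserve."""
    min_count: int = 3              # requests before a URL becomes a candidate
    min_rate: float = 0.05          # requests per second needed to cache
    window_s: float = 60.0          # look-back window for the rate estimate
    capacity: float = 100.0         # total cache size, arbitrary units
    reserve_fraction: float = 0.1   # share of capacity kept free
    history: Dict[str, List[float]] = field(default_factory=lambda: defaultdict(list))
    cached: Dict[str, float] = field(default_factory=dict)

    def used(self) -> float:
        return sum(self.cached.values())

    def on_request(self, url: str, size: float, now: float) -> bool:
        """Record a request at time `now`; return True if `url` is cached afterwards."""
        if url in self.cached:
            return True
        times = self.history[url]
        times.append(now)
        if len(times) < self.min_count:
            return False  # not yet a cache candidate
        recent = sum(1 for t in times if now - t <= self.window_s)
        rate = recent / self.window_s
        # Never fill past capacity minus the reserve.
        limit = self.capacity * (1.0 - self.reserve_fraction)
        if rate >= self.min_rate and self.used() + size <= limit:
            self.cached[url] = size
            return True
        return False
```

Eviction is left out; a fuller sketch would also drop cached URLs whose request rate falls below the threshold.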
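Application 20030139963 chooses whom to solicit by maximizing the expected increase in profit, using a decision tree that splits on a solicitation variable in every root-to-leaf path. The sketch below assumes each candidate sub-population already has two response estimates read from such a tree, one for "solicited" and one for "not solicited", and solicits it only when the expected increase is positive. The `Leaf` type, the per-sale profit, and the flat per-solicitation cost are illustrative assumptions, not details from the abstract.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Leaf:
    """One sub-population and its response estimates (hypothetical structure)."""
    name: str
    size: int
    p_buy_if_solicited: float  # estimate on the "solicitation made" branch
    p_buy_if_not: float        # estimate on the "no solicitation" branch


def expected_increase(leaf: Leaf, profit_per_sale: float,
                      cost_per_solicitation: float) -> float:
    """Expected profit gained by soliciting everyone in the leaf versus no one."""
    lift = leaf.p_buy_if_solicited - leaf.p_buy_if_not
    return leaf.size * (lift * profit_per_sale - cost_per_solicitation)


def leaves_to_solicit(leaves: Sequence[Leaf], profit_per_sale: float,
                      cost_per_solicitation: float) -> List[str]:
    """Names of the sub-populations whose expected profit increase is positive."""
    return [leaf.name for leaf in leaves
            if expected_increase(leaf, profit_per_sale, cost_per_solicitation) > 0]
```

Comparing against the unsolicited response rate, rather than the raw solicited rate, is what distinguishes this from ordinary response modeling: customers who would buy anyway contribute no increase.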