VARIABLE LEARNING RATE AUTOMATED DECISIONING
Methods and related systems are described for making decisions. A described method includes selecting a choice from the available choices, receiving an outcome relating to the selected choice, and automatically learning from the received outcome by incorporating the received outcome into subsequent steps of selecting a choice. The method may also include calculating estimated probabilities associated with each choice using Bayesian networks. The automated learning can be based on a learning rate that varies with time and influences the degree to which prior outcomes are relied upon when calculating an estimated probability associated with a choice. The learning rate can be a function of time and an estimate of drift of the probability associated with the selected choice.
This patent specification relates to automated decisioning. More particularly, this patent specification relates to systems and methods for automated decisioning having variable learning rates.
BACKGROUND
Automated decisioning systems have been developed to aid people and businesses to make faster, fact-based decisions in business settings. Typically, automated decisioning systems enable the user to make real-time, informed decisions, while minimizing risk and increasing profitability. Decisioning systems can be used to quickly assess risk potential, streamline account application processes, and apply decision criteria more consistently for approving decisions and/or selling new products or services.
Conventionally, decision-making models or decisioning models have been manually or custom developed by human analysts. They have been deployed, often with the use of scoring software systems where the models score out incoming data. These conventional models do not use the data they were scoring out on to update themselves. Furthermore, they do not use the outcome of their decisions to update themselves. Since the incoming data characteristics in the real world tend to change over time, the models tend to degrade in performance unless they are updated. This updating process has also been conventionally undertaken manually by human analysts. The more quickly the trends and behavior patterns change, the shorter the lifespan of the model, and historic data becomes increasingly unreliable. Furthermore, conventional models do not normally take account of frequently changing lists of eligible choices.
SUMMARY
An adaptive decisioning system for making decisions between available choices can be provided. The system includes a processor arranged and programmed to select a choice from the available choices based at least in part on evaluating a plurality of prior outcomes for the available choices, wherein the number of prior outcomes evaluated varies with time. According to certain embodiments, the system includes an input/output system in communication with the processor and arranged to communicate the selected choice to a user and to receive an outcome relating to the selected choice, and the processor automatically learns from the outcome by basing at least some subsequently calculated estimated probabilities on the outcome. According to further embodiments, the processor is further programmed to calculate estimated probabilities associated with each choice based at least in part on evaluating a number of prior outcomes for that choice, and the selection of a choice is based at least in part on the calculated estimated probabilities. The number of prior outcomes evaluated for each choice can be based at least in part on an estimate of drift of the estimated probability associated with that choice. The processor can be further programmed such that the selected choice is at least sometimes a sub-optimal choice, so that an outcome relating to the sub-optimal choice can be obtained, and the sub-optimal choice is selected at a rate that is proportional to an estimated probability associated with the sub-optimal choice.
According to other embodiments, a method for adaptively making decisions between available choices including at least a first choice and a second choice is provided. The method includes selecting a choice from the available choices; receiving an outcome relating to the selected choice; and automatically learning from the received outcome by incorporating the received outcome into subsequent steps of selecting a choice. The method can also include calculating a first estimated probability associated with the first choice and calculating a second estimated probability associated with the second choice, wherein the step of selecting a choice is based at least in part upon the calculated first and second estimated probabilities, and the received outcome is incorporated into subsequent steps of calculating the estimated probability associated with the selected choice. The automatic learning can be based on a learning rate that varies with time and influences the degree to which prior outcomes are relied upon when calculating an estimated probability associated with a choice. The learning rate can be a function of time and an estimate of drift of the probability associated with the selected choice.
Articles are also described that comprise a machine-readable medium embodying instructions that when performed by one or more machines result in operations described herein. Similarly, computer systems are also described that may include a processor and a memory coupled to the processor. The memory may encode one or more programs that cause the processor to perform one or more of the operations described herein.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.
Adaptive analytics based algorithms can be used in statistical models to provide real-time automated updating of the models in deployment. It has been found that an important factor in self-updating models used for decisioning is the learning rate. The rate at which the model is updated is very important in balancing two considerations: (1) keeping the error rate (which leads to wrong decisions) relatively low; and (2) keeping the rate of learning relatively high, so that the model adapts quickly whenever the environment (the incoming data characteristics or the correct decision) changes. It has been found that variable learning rate models perform well under many real-world decisioning situations to balance these two considerations.
One example of automated decisioning with variable learning rate has been applied to decisions regarding making either an offer for a first product or service, or an offer for a second product or service, to a customer based on the customer's profile. The model recommends which product to offer to a customer based on known customer information. The feedback given back to the model includes whether the recommended offer was accepted or not. The learning occurs with this feedback by updating the model. With the update, the estimated probability that a customer with the same characteristic values accepts this offer increases or decreases. To understand whether different offers would be accepted by customers of a specific type or profile, alternate or non-optimal offers are sometimes made to customers, the feedback is received, and the model is updated.
Further details of the statistical models are provided below. The model used to estimate the probability of accepting an offer i at time t can be represented as:
$$\hat{p}_i(t) = (1-\eta)\,\hat{p}_i(t-1) + \eta\, I_i(t)$$
where η is the learning rate parameter, which controls how much the past is relied upon. As η approaches 1, the past is weighted less, and as η approaches zero, the network parameters change slowly from the previous model. $I_i(t)$ is the feedback indicator function for offer i at time t, which can be either 1 or 0:
$$I_i(t) = \begin{cases} 1 & \text{if offer } i \text{ is accepted at time } t \\ 0 & \text{otherwise} \end{cases}$$
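By way of illustration only, the update rule above can be sketched in Python as follows. The function name `update_estimate`, the starting estimate of 0.5, and the fixed value of η used in the example are assumptions made for this sketch and do not appear in the specification.

```python
def update_estimate(p_prev: float, accepted: bool, eta: float) -> float:
    """One step of p_i(t) = (1 - eta) * p_i(t-1) + eta * I_i(t)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    # Feedback indicator: 1 when the offer was accepted, 0 otherwise.
    indicator = 1.0 if accepted else 0.0
    return (1.0 - eta) * p_prev + eta * indicator


# Example: start from an assumed estimate of 0.5 and feed three outcomes
# using an assumed, fixed learning rate of 0.1.
p = 0.5
for outcome in (True, False, True):
    p = update_estimate(p, outcome, eta=0.1)
```

With η = 1 the estimate simply equals the latest outcome, and with η = 0 it never moves, which matches the two limiting cases described above.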
As described herein, separate models $\hat{p}(t)$ can be used for each combination of segment (for example, customer age, income, gender, etc.) and offer type (for example, offer to sell cellphone A, cellphone B, etc.). The optimum offer to make to the customer in a segment at a given time is then:
$$i^*(t) = \arg\max_i \hat{p}_i(t)$$
A simple probability table can be used for the model $\hat{p}_i(t)$, where for every possible combination of input values there is an output probability value. The prediction model takes as input values the characteristics of the object on which a prediction needs to be made. Shown in Table 1 and Table 2 is a simple model for two offers that takes as input the characteristics of a customer and produces the predicted acceptance rate for a given offer as the output. Tables 1 and 2 also correspond to models 210 and 220, respectively, as shown in the accompanying drawings.
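As a minimal sketch of the table-based model and of choosing the offer with the highest estimated acceptance rate, the following Python fragment may be considered. The segment keys, offer names, and probability values are made up for illustration and are not the contents of Table 1 or Table 2.

```python
from typing import Dict, Tuple

# Hypothetical segment key: (age band, income band).
Segment = Tuple[str, str]

# Hypothetical acceptance-rate tables, one per offer; all values are made up.
tables: Dict[str, Dict[Segment, float]] = {
    "offer_A": {("under_30", "low"): 0.20, ("under_30", "high"): 0.35},
    "offer_B": {("under_30", "low"): 0.25, ("under_30", "high"): 0.10},
}


def best_offer(segment: Segment) -> str:
    """Return the offer whose table gives the highest estimate for the segment."""
    return max(tables, key=lambda offer: tables[offer][segment])
```

In this sketch each offer has its own lookup table, so updating one offer's estimate for a segment leaves the other offers untouched.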
A variable learning rate can be provided. One example of a dynamic learning rate in the context of prediction Bayesian networks, as opposed to decisioning systems, is described in I. Cohen, A. Bronstein, and F. Cozman, Adaptive Online Learning of Bayesian Network Parameters, HPL-2001-156.pdf (2001), and in United States Patent Application Pub. No. US2003/0115325, both of which are incorporated by reference herein. In order to understand the changes in the underlying model in real time, capture that change, and take action, the above formula is modified. The new formula makes the learning rate a function of both t (the count of observations) and how far the estimate is from its moving average over a number of runs. It has been found, for example, that for many applications, the deviation from a moving average of 100 runs is suitable.
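Purely as an illustration of a learning rate that depends on both the observation count and the deviation from a moving average, one possible sketch is given below. The specific rule (the larger of 1/(t+1) and a scaled deviation, capped at 1), the class name `VariableRateEstimator`, and the `sensitivity` parameter are assumptions of this sketch, not the formula of this specification; only the 100-run moving average is taken from the text above.

```python
from collections import deque


class VariableRateEstimator:
    """Exponential estimate whose learning rate follows time and estimated drift."""

    def __init__(self, window: int = 100, sensitivity: float = 1.0, p0: float = 0.5):
        self.p = p0                            # current estimate (p0 is an assumed prior)
        self.t = 0                             # count of observations so far
        self.sensitivity = sensitivity         # hypothetical scaling of the drift term
        self.history = deque(maxlen=window)    # recent estimates for the moving average

    def learning_rate(self) -> float:
        base = 1.0 / (self.t + 1)              # time term: shrinks as data accumulates
        if self.history:
            moving_avg = sum(self.history) / len(self.history)
            drift = abs(self.p - moving_avg)   # distance from the moving average
        else:
            drift = 0.0
        return min(1.0, max(base, self.sensitivity * drift))

    def update(self, accepted: bool) -> float:
        """Apply one outcome and return the learning rate that was used."""
        eta = self.learning_rate()
        self.p = (1.0 - eta) * self.p + eta * (1.0 if accepted else 0.0)
        self.t += 1
        self.history.append(self.p)
        return eta
```

With `sensitivity=0.0` the rate reduces to 1/(t+1) and the estimate is the ordinary running mean of the outcomes; a positive `sensitivity` raises the rate whenever the estimate moves away from its recent average.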
The average and sample standard deviation of the adaptive learning algorithm $\hat{p}_i(t) = (1-\eta)\,\hat{p}_i(t-1) + \eta\, I_i(t)$ can be given by:
In other words, the learning parameter is a function of both time and the estimated drift. In one variation, the foregoing formula is implemented using the following computer code:
Active experimentation can be used in the learning process for decisioning systems, where decisions are recommended for more than one offer. It has been found that the decisioning system should make non-optimal offers (i.e., alternate offers), going against what the model recommends, in order to generate new training data for non-optimal target values. In order to learn how a particular customer type would respond to offers that are non-optimal according to the model, such non-optimal offers can be made at regular intervals. Without the use of experimentation, the optimum offer (e.g., the one with the highest value or greatest probability of being accepted) is always selected. In this case it becomes difficult or impossible to detect changes in the other, non-optimal offers. Unless the non-optimal offers are very close to the optimal offer, the non-optimal offers will never be selected, and therefore those models will not detect changes with respect to those non-optimal offers. Thus, in real-world applications that make decisions among multiple offers whose probabilities change with time, simply selecting the optimal offer without experimentation will not yield accurate estimates of the acceptance probabilities of the non-optimal offers.
Note that when the second-best offer is close to the most likely accepted offer, there will not be much loss with either a low or a high learning rate.
On the other hand, when the offers are quite different in terms of their response rates, a higher learning rate causes a faster capture of the change but also causes many more errors due to the high variance in the expected rate.
Making non-optimal offers involves a higher cost than making the optimal offer and hence should be minimized. At the same time, making non-optimal offers is required to detect changes in customer preferences. It has been found that the rate at which the alternate, or non-optimal, offers are made can be tied to the learning rate, which can be calculated as described above. Thus the rate at which alternate, or non-optimal, offers are made can be governed by the learning rate: increasing when the learning rate is high and decreasing when it is low.
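As a hedged illustration of tying the experimentation rate to the learning rate, the sketch below maps a learning rate to a probability of making an alternate offer. The linear mapping, the names `exploration_probability` and `should_explore`, and the `scale` and `cap` parameters are hypothetical and are not taken from this specification.

```python
import random
from typing import Optional


def exploration_probability(eta: float, scale: float = 0.5, cap: float = 0.5) -> float:
    """Hypothetical mapping: a higher learning rate gives more alternate offers."""
    return max(0.0, min(cap, scale * eta))


def should_explore(eta: float, scale: float = 0.5, cap: float = 0.5,
                   rng: Optional[random.Random] = None) -> bool:
    """Decide at random whether to make an alternate offer on this occasion."""
    rng = rng or random.Random()
    return rng.random() < exploration_probability(eta, scale, cap)
```

The cap reflects the point made above that non-optimal offers carry a cost and should be limited even when the learning rate is high.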
As in the context of predicting with respect to a single offer, with two or more offers, decisioning systems with slow learning have more exposure to systematic errors, as shown in the accompanying drawings.
Since the decision rule $i^*(t) = \arg\max_i \hat{p}_i(t)$ does not drive estimates of the alternative offers, alternative offers need to be tried to estimate the alternative offer probabilities. In one variation, a simple method for experimentation is to select the offers according to the probability of that offer being accepted over the sum of the probabilities of all the offers. In other words, an offer j is selected with probability:
$$P(j) = \frac{\hat{p}_j(t)}{\sum_k \hat{p}_k(t)}$$
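The simple proportional experimentation rule described above can be sketched as follows. The function names and the uniform fallback used when every estimate is zero are assumptions of this sketch.

```python
import random
from typing import Dict, Optional


def selection_probabilities(estimates: Dict[str, float]) -> Dict[str, float]:
    """Each offer's estimated acceptance probability over the sum for all offers."""
    if not estimates:
        raise ValueError("at least one offer is required")
    total = sum(estimates.values())
    if total <= 0.0:
        # Assumed fallback: with no information, treat all offers alike.
        return {offer: 1.0 / len(estimates) for offer in estimates}
    return {offer: p / total for offer, p in estimates.items()}


def select_offer(estimates: Dict[str, float],
                 rng: Optional[random.Random] = None) -> str:
    """Draw one offer at random according to the proportional rule."""
    rng = rng or random.Random()
    probs = selection_probabilities(estimates)
    offers = list(probs)
    return rng.choices(offers, weights=[probs[o] for o in offers], k=1)[0]
```

For example, with estimates of 0.3 and 0.1 for two offers, the first offer would be selected about three times in four under this rule.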
It has been found that, in general, more effective results are achieved when the learning rate is incorporated. Relying on the convergence of the estimate, the selection is biased towards $i^*$ by weighting the sum by the learning parameter.
where γ is a scaling parameter. The above formula can be implemented using the following computer code:
According to yet alternative embodiments, the likelihood of error estimates can be used (as described above) to drive the decision to try alternatives, or the cost of “good” and “bad” decisions can be incorporated.
The adaptive learning rate and experimentation techniques described herein can be applied to other model types, such as decision trees and nearest neighbor models.
In one variation, the following code can be used to update the window size for a decision tree algorithm.
One decision tree can be used for each offer. For a given input value, the tree for each offer is evaluated and the offers are compared. When choosing from multiple offers using different trees, the best offer is not always chosen. The alternate offer selection mechanism is applied here as well, and non-optimal offers are chosen to obtain data points for those offers.
As more feedback is gathered with time, more and more examples or data points are placed in the neighborhood space shown in the accompanying drawings.
The updates to the window size occur in a manner analogous to the way the learning rate is updated, as described above. When the window size is updated based on performance of the system, the nearest neighbor model adapts to the changes. The following code can be used to update the window size used in a nearest neighbor algorithm.
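The listing referred to above is not reproduced in this text. Purely as an illustration, and not as that listing, a window-size update driven by recent accuracy might be sketched as below. The class name `AdaptiveWindow`, the accuracy target, the halving and increment steps, and the direction of adjustment are all assumptions of this sketch, and it covers only the window bookkeeping, not the nearest neighbor search itself.

```python
from collections import deque


class AdaptiveWindow:
    """Keeps recent outcomes and adapts how many of them are evaluated."""

    def __init__(self, min_size: int = 10, max_size: int = 500, start_size: int = 100):
        self.min_size = min_size
        self.max_size = max_size
        self.size = start_size
        self.outcomes = deque(maxlen=max_size)   # oldest outcomes fall off automatically

    def add(self, accepted: bool) -> None:
        self.outcomes.append(1.0 if accepted else 0.0)

    def estimate(self) -> float:
        """Mean of the most recent `size` outcomes (0.5 assumed when empty)."""
        recent = list(self.outcomes)[-self.size:]
        return sum(recent) / len(recent) if recent else 0.5

    def adapt(self, recent_accuracy: float, target: float = 0.7) -> int:
        """Assumed rule: shrink quickly when accuracy is poor, grow slowly otherwise."""
        if recent_accuracy < target:
            self.size = max(self.min_size, self.size // 2)
        else:
            self.size = min(self.max_size, self.size + 1)
        return self.size
```

A smaller window plays a role similar to a higher learning rate, since fewer past outcomes contribute to each estimate.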
Several example embodiments of variable learning rate decisioning systems will now be described in further detail. In a marketing setting, the objective is often to make the right offer to the customer who walks into a store or calls a customer service center. The same customer might not prefer the same thing at different times. Over time, the preferences of customers change, and so the same offer might not work later, even though it would have worked in the past. To counter this problem, the decision system as described herein is used to make decisions on what to offer. The system advantageously adapts to the changing reaction to offers and adjusts itself to detect and react to the changing preferences. In order to more efficiently detect changes, the system also performs experimentation by making non-optimal decisions as a means of exploring the various offers and seeing if the response rate for the different offers has changed. This constant experimentation and adaptation enables the system to help with the offer recommendation decision even if the preferences change.
The decisioning techniques described herein can be applied to decisioning in the context of a buyer deciding which product or service to purchase or use.
Various implementations of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the subject matter described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user may provide input to the computer. Other kinds of devices may be used to provide for interaction with a user as well; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The subject matter described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user may interact with an implementation of the subject matter described herein), or any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Although a few variations have been described in detail above, other modifications are possible. For example, while some of the variations described herein have been described for some applications, other uses of the adaptive decisioning systems include applications such as fraud detection systems, where an adaptive decisioning system is used to react quickly to emerging fraudulent behavior. In addition, the logic flow depicted in the accompanying figures and/or described herein does not require the particular order shown, or sequential order, to achieve desirable results. Other embodiments may be within the scope of the following claims.
Claims
1. An adaptive decisioning system for making decisions between available choices the system comprising a processor arranged and programmed to select a choice from the available choices based at least in part on evaluating a plurality of prior outcomes for the available choices, wherein the number of prior outcomes evaluated varies with time.
2. A system according to claim 1 further comprising an input/output system in communication with the processor and arranged to communicate the selected choice to a user and to receive an outcome relating to the selected choice, wherein the processor automatically learns from the outcome by basing at least some subsequently calculated estimated probabilities on the outcome.
3. A system according to claim 1 wherein the processor is further programmed to calculate a first estimated probability associated with a first choice based at least in part on evaluating a plurality of prior outcomes for the first choice, and to calculate a second estimated probability associated with a second choice based at least in part on evaluating a plurality of prior outcomes for the second choice, wherein the selection of a choice is based at least in part on the calculated first and second estimated probabilities.
4. A system according to claim 3 wherein the number of prior outcomes evaluated for the first choice is based at least in part on an estimate of drift of the estimated probability associated with the first choice; and the number of prior outcomes evaluated for the second choice is based at least in part on an estimate of drift of the estimated probability associated with the second choice.
5. A system according to claim 4 wherein the drifts are estimated by calculating how far each estimate is away from a moving average.
6. A system according to claim 5 wherein the number of evaluated outcome values is smaller when the estimated drift is low and higher when the estimated drift is high.
7. A system according to claim 5 wherein the calculating makes use of at least one Bayesian network.
8. A system according to claim 3 wherein the choice having the highest estimated probability is selected.
9. A system according to claim 3 where the processor is further programmed to calculate a first profit estimate based on the first estimated probability and a second profit estimate based on the second estimated probability, and wherein the choice having the highest profit estimate is selected.
10. A system according to claim 1 wherein the processor is further programmed such that the selected choice is at least sometimes a sub-optimal choice such that outcome relating to the sub-optimal choice can be obtained.
11. A system according to claim 10 wherein the sub-optimal choice is selected at a rate that is proportional to an estimated probability associated with the sub-optimal choice.
12. A method for adaptively making decisions between available choices including at least a first choice and a second choice comprising:
- selecting a choice from the available choices;
- receiving an outcome relating to the selected choice; and
- automatically learning from the received outcome by incorporating the received outcome into subsequent steps of selecting a choice.
13. A method according to claim 12 further comprising:
- calculating a first estimated probability associated with the first choice;
- calculating a second estimated probability associated with the second choice, wherein the step of selecting a choice is based at least in part upon the calculated first and second estimated probabilities, and the received outcome is incorporated into subsequent steps of calculating estimated probability associated with the selected choice.
14. A method according to claim 13 wherein the automatically learning includes a learning rate which is variable with time, the learning rate influencing the degree to which prior outcomes are relied upon when calculating an estimated probability associated with a choice, and the learning rate being a function of time and an estimate of drift of the probability associated with the selected choice.
15. A method according to claim 12 wherein the automatically learning includes a learning rate which is variable with time, the learning rate influencing the degree to which prior outcomes are relied upon when selecting a choice.
16. A method according to claim 12 wherein the selecting a choice from the available choices includes at least sometimes selecting a sub-optimal choice such that outcome relating to the sub-optimal choice can be obtained.
17. A method according to claim 16 wherein the sub-optimal choice is selected at a rate that is proportional to an estimated probability associated with the sub-optimal choice.
18. A method according to claim 13 wherein the selecting a choice from the available choices includes at least sometimes selecting a sub-optimal choice such that outcome relating to the sub-optimal choice can be obtained, the sub-optimal choice being selected at a rate that is proportional to an estimated probability associated with the sub-optimal choice, and wherein the selection rate for the sub-optimal choice is inversely related to a learning rate which influences the degree to which prior outcomes are relied upon when estimating a probability associated with the selected choice.
19. A method according to claim 13 wherein the calculating first and second estimated probabilities comprises the use of one or more Bayesian networks.
20. A method according to claim 19 wherein at least one Bayesian network is associated with each estimated probability.
21. A method according to claim 12 wherein the choice having the highest estimated probability is selected.
22. A method according to claim 13 further comprising calculating a first profit estimate based on the first estimated probability and estimating a second profit estimate based on the second estimated probability, and wherein the choice having the highest profit estimate is selected.
23. A method according to claim 12 wherein the selecting a choice is based at least in part on an automatically adapting decision tree based algorithm.
24. A method according to claim 23 wherein the decision tree based algorithm automatically re-arranges one or more structures within the decision tree based on a number of prior received outcomes, said number being variable with time and being a function of accuracy of prior selected choices.
25. A method according to claim 23 wherein the selecting a choice from the available choices includes at least sometimes selecting a sub-optimal choice such that outcome relating to the sub-optimal choice can be obtained.
26. A method according to claim 12 wherein the selecting a choice is based at least in part on an automatically adapting nearest neighbor algorithm.
27. A method according to claim 26 wherein the nearest neighbor algorithm uses a number of prior received outcomes, said number being variable with time and being a function of accuracy of prior selected choices.
28. A method according to claim 26 wherein the selecting a choice from the available choices includes at least sometimes selecting a sub-optimal choice such that outcome relating to the sub-optimal choice can be obtained.
29. A method according to claim 12 wherein the choices represent offers for sale of goods or services.
30. A method according to claim 12 wherein the choices represent alternative purchasing options.
31. A method according to claim 12 wherein the choices represent alternative services to use.
32. A method according to claim 12 wherein the choices represent choices relating to placement of advertisements on web pages.
33. A method for adaptively making decisions between available choices including at least a first choice and a second choice comprising:
- receiving a plurality of first choice outcome values each representing an outcome for the first choice occurring at an earlier time;
- receiving a plurality of second choice outcome values each representing an outcome for the second choice occurring at an earlier time;
- calculating a first estimated probability associated with the first choice based at least in part on evaluating a number of the first choice outcome values;
- calculating a second estimated probability associated with the second choice based at least in part on evaluating a number of the second choice outcome values; and
- selecting a choice from the available choices based at least in part upon the calculated first and second estimated probabilities.
34. A method according to claim 33 wherein the number of first choice outcome values evaluated is a function of time and an estimate of drift associated with estimated probability associated with the first choice; and the number of second choice outcome values evaluated is a function of time and an estimate of drift associated with estimated probability associated with the second choice.
35. A method according to claim 34 wherein the drifts are estimated by calculating how far the estimate is away from a moving average.
36. A method according to claim 33 wherein the calculating makes use of at least one Bayesian network.
37. A method according to claim 33 wherein the selecting a choice from the available choices includes at least sometimes selecting a sub-optimal choice such that outcome relating to the sub-optimal choice can be obtained.
38. A method according to claim 37 wherein the sub-optimal choice is selected at a rate that is proportional to an estimated probability associated with the sub-optimal choice.
39. A method according to claim 33 further comprising recommending the selected choice to a user.
40. A method for adaptively estimating the likelihood of an event comprising:
- receiving a plurality of outcome values each representing an outcome for the event occurring at an earlier time; and
- calculating an estimate of the likelihood for the event based at least in part on evaluating a number of the outcome values, wherein the number is a function of time and an estimate of drift associated with the likelihood estimation.
41. A method according to claim 40 wherein the number of evaluated outcome values is smaller when the estimated drift is low and higher when the estimated drift is high.
42. A method according to claim 41 wherein the drift is estimated by calculating how far the estimate of the likelihood for the event is away from a moving average of estimated likelihoods for the event.
43. A method according to claim 40 wherein the calculating makes use of at least one Bayesian network.
Type: Application
Filed: Dec 21, 2007
Publication Date: Jun 25, 2009
Patent Grant number: 8706545
Inventors: Deenadayalan Narayanaswamy (Alameda, CA), Marc-david Cohen (Ross, CA), Zhenyu Yan (El Cerrito, CA)
Application Number: 11/963,501
International Classification: G06Q 10/00 (20060101);