DEMOGRAPHIC ANALYSIS USING TIME-BASED CONSUMER TRANSACTION HISTORIES
Systems, apparatus, and methods for determining groups of similar consumers and for identifying a trend in consumer behavior are provided. Likelihoods of a transaction being initiated at various times by one consumer can be calculated based on previous transactions of the consumer. The likelihoods for different consumers can be used to determine a group of similar consumers as a demographic. The likelihoods of a transaction being initiated at various times by a consumer of a demographic (or other entity) can be used to forecast trends (such as a demand for a product) and make business decisions, such as for marketing campaigns, inventory levels (e.g. at particular stores or for all stores), pricing, and store locations. Such likelihoods when focused to a particular category of transactions can provide even greater accuracy.
The present application claims priority from and is a non provisional application of U.S. Provisional Application No. 61/175,381, entitled “SYSTEMS AND METHODS FOR DETERMINING AUTHORIZATION, RISK SCORES, AND PREDICTION OF TRANSACTIONS” filed May 4, 2009, the entire contents of which are herein incorporated by reference for all purposes.
This application is related to commonly owned and concurrently filed U.S. patent applications entitled “PRE-AUTHORIZATION OF A TRANSACTION USING PREDICTIVE MODELING” by Faith et al. (attorney docket number 016222-046210US), “DETERMINING TARGETED INCENTIVES BASED ON CONSUMER TRANSACTION HISTORY” by Faith et al. (attorney docket number 016222-046220US), “TRANSACTION AUTHORIZATION USING TIME-DEPENDENT TRANSACTION PATTERNS” by Faith et al. (attorney docket number 016222-046240US), and “FREQUENCY-BASED TRANSACTION PREDICTION AND PROCESSING” by Faith et al. (attorney docket number 016222-046250US), the entire contents of which are herein incorporated by reference for all purposes.
BACKGROUNDThe present application is generally related to tracking and processing consumer transactions, and more specifically to using a history of consumer activity in determining a demographic of consumers and trends in transactions of a demographic.
As part of marketing campaigns for a product, companies have identified a particular group of consumers that are likely to buy the product. A group of consumer is sometimes referred to as a demographic. Typically, a demographic is determined by external features of a consumer, such as age or geographic location. Although such demographics can be useful, other characteristics of consumers with a same age or location can vary significantly. Thus, a marketing campaign or other business decision based on such demographics can be faulty due to conflicting data since the consumers have disperse characteristics. Moreover, such demographics typically provide only very general information, and thus do not provide information for making specific business decisions.
Therefore, it is desirable to provide better methods of identifying groups of consumers and for providing more detailed information about the transactions of a group of consumers.
BRIEF SUMMARYEmbodiments provide systems, apparatus, and methods for determining groups of similar consumers and for identifying a trend in consumer behavior. Certain embodiments can use likelihoods of a transaction being initiated at various times to determine a group of similar users as a demographic (affinity group). The likelihoods at a plurality of times can be used to forecast trends (such as a demand for a product) and make business decisions, such as for marketing campaigns, inventory levels (e.g. at particular stores or for all stores), pricing, and store locations. Such likelihoods when focused to a particular category of transactions (e.g. a particular product) can provide even greater accuracy.
According to one embodiment, a method of identifying a consumer as belonging to a particular demographic is provided. Data respectively associated with transactions of a first entity and one or more second entities are received. A computer system identifies patterns of the first entity and patterns of the second entity. Each pattern includes a plurality of values, with at least two of the values respectively including contributions from transactions corresponding to different time ranges. Whether the first entity and the one or more second entities belong to a same demographic is determined based on a comparison of patterns of the first entity and the second entities.
According to another embodiment, a method of identifying a trend in consumer behavior is provided. Data associated with previous transactions of an entity are received. A computer system determines one or more patterns of the previous transactions. Each pattern includes a plurality of values, with at least two of the values respectively including contributions from transactions corresponding to different time ranges. Likelihoods for an occurrence of a transaction are determined according to the one or more patterns. Each likelihood is for the occurrence of the transaction at a respective one of a plurality of different times. A trend in occurrences of the transaction is identified based on the likelihoods at the plurality of different times.
Other embodiments of the invention are directed to systems, apparatuses, portable consumer devices, and computer readable media associated with methods described herein.
A better understanding of the nature and advantages of the present invention may be gained with reference to the following detailed description and the accompanying drawings.
Information about a group of consumers can be useful in making business decisions (such as shaping marketing decisions or inventory levels). However, consumers have typically been organized by broad external factors like age. Embodiments can provide more narrowly tailored affinity groups, e.g., ones that have similar transaction patterns. Such affinity groups can lead to greater accuracy in determining success of a business decision. Likelihoods of a transaction at various times can also be used to identify trends in certain types of transactions, and therefore allow accurate forecasting.
I. System Overview
The system 20 shown in
As used herein, an “acquirer” is typically a business entity, e.g., a commercial bank that has a business relationship with a particular merchant or an ATM. An “issuer” is typically a business entity (e.g., a bank) which issues a portable consumer device such as a credit or debit card to a consumer. Some entities can perform both issuer and acquirer functions. Embodiments of the invention encompass such single entity issuer-acquirers.
The consumer 30 may be an individual, or an organization such as a business that is capable of purchasing goods or services. In other embodiments, the consumer 30 may simply be a person who wants to conduct some other type of transaction such as a money transfer transaction or a transaction at an ATM.
The portable consumer device 32 may be in any suitable form. For example, suitable portable consumer devices can be hand-held and compact so that they can fit into a consumer's wallet and/or pocket (e.g., pocket-sized). They may include smart cards, ordinary credit or debit cards (with a magnetic strip and without a microprocessor), keychain devices (such as the Speedpass™ commercially available from Exxon-Mobil Corp.), etc. Other examples of portable consumer devices include cellular phones, personal digital assistants (PDAs), pagers, payment cards, security cards, access cards, smart media, transponders, and the like. The portable consumer devices can also be debit devices (e.g., a debit card), credit devices (e.g., a credit card), or stored value devices (e.g., a stored value card).
The merchant 22 may also have, or may receive communications from, an access device 34 that can interact with the portable consumer device 32. The access devices according to embodiments of the invention can be in any suitable form. Examples of access devices include point of sale (POS) devices, cellular phones, PDAs, personal computers (PCs), tablet PCs, handheld specialized readers, set-top boxes, electronic cash registers (ECRs), automated teller machines (ATMs), virtual cash registers (VCRs), kiosks, security systems, access systems, and the like.
If the access device 34 is a point of sale terminal, any suitable point of sale terminal may be used including card readers. The card readers may include any suitable contact or contactless mode of operation. For example, exemplary card readers can include RF (radio frequency) antennas, magnetic stripe readers, etc. to interact with the portable consumer devices 32.
The access device 34 may also be a wireless phone. In one embodiment, the portable consumer device 32 and the access device are the same device. For example, a consumer may use a wireless to phone to select items to buy through a browser.
When the access device 34 is a personal computer, the interaction of the portable consumer devices 32 may be achieved via the consumer 30 or another person entering the credit card information into an application (e.g. a browser) that was opened to purchase goods or services and that connects to a server of the merchant, e.g. through a web site. In one embodiment, the personal computer may be at a checkout stand of a retail store of the merchant, and the application may already be connected to the merchant server.
The portable consumer device 32 may further include a contactless element, which is typically implemented in the form of a semiconductor chip (or other data storage element) with an associated wireless transfer (e.g., data transmission) element, such as an antenna. Contactless element is associated with (e.g., embedded within) portable consumer device 32 and data or control instructions transmitted via a cellular network may be applied to contactless element by means of a contactless element interface (not shown). The contactless element interface functions to permit the exchange of data and/or control instructions between the mobile device circuitry (and hence the cellular network) and an optional contactless element.
The portable consumer device 32 may also include a processor (e.g., a microprocessor) for processing the functions of the portable consumer device 32 and a display to allow a consumer to see phone numbers and other information and messages.
If the portable consumer device is in the form of a debit, credit, or smartcard, the portable consumer device may also optionally have features such as magnetic strips. Such devices can operate in either a contact or contactless mode.
Referring again to
The payment processing network 26 may include a server computer. A server computer is typically a powerful computer or cluster of computers. For example, the server computer can be a large mainframe, a minicomputer cluster, or a group of servers functioning as a unit. In one example, the server computer may be a database server coupled to a Web server. The payment processing network 26 may use any suitable wired or wireless network, including the Internet.
As shown in
As used herein, an “incentive” can be any data or information sent to a consumer to encourage a transaction. For example, a coupon can be sent to a consumer as an incentive since the consumer can obtain a better transaction price. As another example, an advertisement can be sent to a consumer to encourage a transaction by making the consumer aware of a product or service. Other example of incentives can include rewards for making a transaction and preferential treatment when making the transaction.
The issuer 28 may be a bank or other organization that may have an account associated with the consumer 30. The issuer 28 may operate a server which may be in communication with the payment processing network 26.
Embodiments of the invention are not limited to the above-described embodiments. For example, although separate functional blocks are shown for an issuer, payment processing network, and acquirer, some entities perform all or any suitable combination of these functions and may be included in embodiments of invention. Additional components may also be included in embodiments of the invention.
II. Transaction Patterns and Demographics
Consumer activity can include transactions, among other things. Knowledge of a pattern of transactions of a consumer can allow identification of consumers with similar transaction patterns. Patterns of transactions can also allow identification of similar merchants, or other entities. The similar consumers can be identified as belonging to a demographic, also called an affinity group. Transaction patterns of a demographic can also be used to forecast business decisions for a company who has customers from the determined demographic. However, the identification of a pattern can be difficult given the enormous amount of data, some of which might exhibit patterns and some of which may not.
As used herein, the term “pattern” refers broadly to a behavior of any set of events (e.g. transactions) that have a likelihood of repeating. In one aspect, the likelihood can be greater than a random set of events, e.g., events that are uncorrelated. The likelihood can be expressed as a probability (e.g. as a percentage or ratio), a rank (e.g. with numbers or organized words), or other suitable values or characters. One type of pattern is a frequency-based pattern in which the events repeats with one or more frequencies, which may be predefined. To define a pattern, a reference frame may be used. In various embodiments, the reference frame may be or include an elapsed time since a last event (e.g. of a type correlated to the current event), since a beginning of a fixed time period, such as day, week, month, year, . . . (which is an example of a starting event), before an end of a fixed time period, or before occurrence of a scheduled event (an example of an ending event). Another event can be certain actions by the consumer, such as traveling to a specific geographic location or browsing a certain address location on the Internet.
Transaction patterns can also be used to determine whether consumers are similar, and thus whether they might be part of a same demographic. As a simple example, one demographic might be an pregnant woman or a new mother. Such consumers may have similar transaction patterns in buying products for a baby. But such a demographic may not easily obtained from age or other easily identifiable facts about a person. Also, certain consumers may have similarities for which a cause may not be so easily identifiable. In these cases, a demographic may not even be known to exist.
The pattern 220 of the first consumer can then be compared to the pattern 270 of the second consumer. As shown, the pattern 220 is similar to the pattern 270. In one embodiment, the similarity can be based on value of the patterns at various times, e.g., a comparison of the likelihood of a transaction at various times. In other embodiments, parameters of functions that define or approximate the pattern can be compared to determine if the patterns are similar. For example, a function can be a linear combination of prescribed set of basis functions (e.g. sines and cosines) times corresponding coefficients, which may be compared.
In one embodiment, the second consumer is a demographic, and the pattern 270 is a combination of patterns of consumers in the demographic. In such an instance, the number of occurrences of transaction may be quite larger, but can be normalized to provide an accurate comparison.
The identification of a pattern can have many difficulties. If the previous transactions 210 include all of the transactions of a consumer and exhibit only one pattern, then the identification of a pattern may be relatively easy. However, if only certain types of transactions for a consumer show a pattern, then the identification can be more difficult. Some embodiments can use keys (K1, K2, . . . ) to facilitate the analysis of certain types of transactions, where a key can correspond to a type of transaction. A key can also allow identification of transactions as being relevant for a current task (e.g. the key being associated with a transaction to be incentivized).
Adding to the complexity can be whether the path to a particular transaction has an impact on the pattern, e.g., a pattern that exists only when certain transactions precede or follow a transaction. Embodiments can store transaction data associated with a specific order of keys (e.g. K1, K3). In this manner, the data for that specific order can be analyzed to determine the pattern. The order of keys also allows the further identification of relevant transactions.
All of this complexity can be further compounded in instances where a certain path (sequence of two or more transactions) can have more than one pattern. Embodiments can use certain functional forms to help identify different patterns. In some embodiments, a combination of periodic functions are used, e.g., e−iwt, where w is a frequency of a pattern. In one embodiment, the frequencies are pre-selected thereby allowing an efficient determination of the patterns. Further, the frequencies can be identified by an associated wavelength, or wavelength range. Counters can be used for each wavelength range, thereby allowing a pattern to be very quickly identified by analyzing the values of the counters.
III. Determining and Using Affinity Groups
In step 310, data respectively associated with transactions previously performed by a first consumer and with transactions by one or more second consumers are received. In one embodiment, the one or more second consumers are already determined to be part of an affinity group. In one aspect, the affinity group can be considered as single entity with transaction patterns being for the entire group.
In one embodiment, the data in the transaction history database 26(b) can be received at a transaction analyzer 26(c) of system 20 in
This data can be used to identify a particular type of transaction. In one embodiment, the data for a transaction is parsed to identify one or more keys, which are used as identifiers for a particular transaction. In various embodiments, a key can includes parts of the transaction data and/or data derived from the transaction data. A key could also be composed of results from an analysis of a transaction, e.g., whether the transaction is a card-present transaction or a card-not-present transaction could be determined from the transaction data and included in the key. In one embodiment, a mapping module can perform the mapping of the transaction data to one or more keys.
A key can be composed of multiple pieces of data (referred to herein as a key element). A longer key has more key elements and may be a more selective identifier of a type of transactions. Each transaction can be associated with different keys, each with a different scope of specificity for characterizing the transaction.
In step 320, transactions are correlated with other transactions and events. In one implementation, transactions of the first consumer are only correlated with each other, and similarly for the transactions of the second consumers. In this manner, different transaction patterns can be identified for different types of transactions, and for different entities. Other events (e.g. start or end of a day, week, etc.) can be correlated to transactions as well. An event can also be a movement of the consumer from one state to another (e.g. from an at-home state to an on-vacation state). Different events can also be identified with keys. Herein, examples are used to described how keys are used to identify transaction types, but other suitable methods can be used.
In one embodiment, pairs of correlated keys (e.g. a key pair <KI:KF>) are determined based on whether transactions associated with an initial key (KI) are correlated with transactions with a final a final key (KF). A first (initial) event can be correlated with a later (final) transaction. The initial key and the final key may be the same or different from each other. For example, a transaction at one merchant may be correlated to a later purchase at another merchant, which might occur if the merchants are near to each other. In one embodiment, a group of more than two keys could be correlated together, e.g. a group of three keys can be correlated.
Two transactions can be correlated in multiple ways depending on how many keys are associated with each transaction. Thus, two transactions can contribute to more than one key pair, when the transactions are associated with multiple keys. For example, if an initial transaction is associated with two keys and the final transaction is also associated with two keys, then there could be four resulting key pairs. Also, a transaction may be correlated to another transaction only via certain keys.
In one embodiment, the transactions of a group of consumers can all be analyzed together to determine correlations of transactions having certain keys. Certain key pairs can be pre-determined for tracking. For example, a store may want to have transactions at a specific location (or all locations) tracked.
In step 330, a computer system identifies one or more first patterns of the transactions by a first consumer and one or more second patterns of transaction by the one or more second consumers. The computer system can be transaction analyzer 26(c), which can be a subsystem or one apparatus. In some embodiments, a pattern can be defined by a set of indicia, which can be a set of numerical values. The indicia can convey the likelihood of a transaction as a function of time. For example, pattern 220 conveys that transactions are likely when the function has a higher value. In one embodiment, each pattern includes a plurality of values (e.g. likelihood values) with at least two of the values respectively including contributions from transactions corresponding to different time ranges.
In one embodiment, pairs of correlated transactions (or other events) are used to determine a pattern, e.g., as times of final transactions related to initial events. The times can be stored as an absolute time and/or date for each transaction (e.g. in chronological order) or organized as elapsed times for correlated events of certain key pairs. The elapsed time may be the time between a transaction with K1 and the next transaction with K2 for the correlated <K1:K2> pair. Other data can be stored as well, e.g. data not included in the keys, such as an amount of the transaction. The elapsed time can effectively equal an absolute time if the initial event is the beginning of a time period.
In some embodiments, the time information is stored (e.g. in transaction history database 26b) associated with the corresponding key pair. For example, a key pair identifier (e.g. a unique ID number) can be associated with the stored time information. As examples of an association, a key pair identifier could point to the time information, the time information could be stored in a same row as the key pair identifier, and the key pair identifier could be stored associated with the pointer.
In other embodiments, the time information for the key pair <K1:K2> can be stored in a database table that can be accessed with a query containing K1, K2, or the combination (potentially in the order of K1:K2). For example, a search for K1 and/or K2 can provide the associated identifiers. In one embodiment, a hash of each key of a pair is also associated with the key pair identifier, so that information for each key can be indexed and found separately. For example, hashes of K1 and K2 can be stored in a lookup table so the key pair identifiers (and thus the key pair information) can be easily found.
In one aspect, storing time information in association with certain key pairs can allow the time information for specific types of transactions to be easily accessed. Also, such organization can provide easier analysis of the data to identify patterns for specific key pairs. The occurrences of the transaction can then be analyzed (e.g. Fourier analysis or other functional analysis) to identify a pattern of the times and dates of these transactions.
In step 340, the one or more first patterns are compared to the one or more second patterns. In one embodiment, a respective first pattern is compared to a corresponding (matching) second pattern. A matching second pattern can be a pattern with an exact same key pair, or a similar key pair. In one implementation, patterns can be compared by comparing numerical values of certain indicia of the patterns. In another embodiment, patterns that do not match are also determined and incorporated into the comparison.
In one embodiment, the indicia can be the likelihood of patterns at respective times. In one implementation, the likelihood is for any transaction by a consumer, and thus the entire transaction history can be used. In another implementation, the likelihood is for a particular key pair. When a particular transaction is being investigated, the relevant pattern can be found by querying a database using the key(s) of the particular key pair. In various embodiments, the indicia for a likelihood may be a number of transactions in a time range or the probability (or other measure of likelihood) at a given point in time, e.g., as calculated from a value of the pattern function at the point in time.
In step 350, whether first and second consumers belong to same demographic is determined based on the comparison. In one embodiment, corresponding pairs of indicia from two patterns can be subtracted from each other and compared to a threshold. In another embodiment, corresponding pairs of indicia from two patterns can be multiplied times each other and compared to a threshold. As mentioned above, non-matching patterns can also be used. In one aspect, more non-matching patterns can correspond to a lower similarity of the first consumer to the second consumers. In yet another embodiment, the first consumer can be similar to an affinity group (i.e. belong to a demographic) with varying degrees of similarity (e.g. by percentage of similarity).
In some embodiments, the indicia of the patterns can be input into a modeling function as part of the determination. In various implementations, the modeling function can be an optimization function (e.g. a neural network) or can be a decision tree (e.g. composed of IF THEN logic that compares the indicia). In one embodiment, an optimization function can be trained on previous transactions of multiple entities, and thus can determine how much a pattern of a particular entity (e.g. a consumer, merchant, or affinity group) is similar to a patter of another entity. In another embodiment, the number of keys associated with the transaction relates to the number of inputs into the modeling function. The relationship is not necessarily one-to-one as similar keys (e.g. ones of a same category) may be combined (e.g. same key elements, but just different values), but there may be a correspondence between the number of different types of keys and the number of inputs.
Other embodiments can determine a demographic in other ways. For example, a merchant affinity group can be determined by identifying transaction patterns for a certain set of stores, which can be grouped by merchant code. At least some of the consumer's transaction patterns can then be compared to the transaction patterns for a merchant affinity group. Consumers having patterns similar to the merchant affinity group can belong to a same consumer affinity group. As another example, a consumer can be sent an incentive to buy a new product (e.g. music) from a coffee shop, when the consumer is next predicted to visit the coffee shop to buy coffee. Once the consumer performs the transaction, the consumer can be partially defined by a certain music demographic according to the fact that the consumer bought music and potentially to a more narrow demographic of the type of music. As yet another example, a consumer can then be sent an incentive for a transaction consistent with a demographic to which the consumer may belong. The use of incentive can be used to determine whether the person is actually part of the affinity group. If the person does not use the incentive, then the consumer is less likely to be part of the affinity group.
In step 360, likelihoods of transactions are determined according to demographic patterns at a plurality of times. For example, after a demographic has been determined, the transaction patterns of the demographic can be used to determine the likelihood of a transaction at a plurality of times. The likelihood can be for transactions at any time, e.g. in the past and/or in the future. In one embodiment, the times include at least two different time ranges.
In step 370, a trend in the pattern for transactions can be identified based on likelihoods of transactions at the plurality of times. A trend can be the change (increase or decrease) of a likelihood from one time to another. In one embodiment, a trend can be identified by a likelihood of a transaction being above an upper threshold or being below a lower threshold. In such instances, a high or low likelihood can indicate that the likelihood is different from an average or outside of an expected range, and thus a trend can exist. In one aspect, a likelihood of a pattern can be integrated over time to determined an average level or range of likelihood.
A trend can also be related to a change in a pattern calculated at one time to a change in a pattern calculated at a different time. For example, transaction patterns for a demographic can be calculated at various times (e.g. periodically, such as every month). A pattern calculated at one month can be compared to the same pattern (e.g. same key pair) calculated at another month. The change in the likelihood of a transaction at various times can be used to determine a trend. For instance, the likelihood of a transaction at a particular time may have increased and thus a trend can be detected. The amount of change can be used to forecast a likelihood even greater than the predicted likelihood since there is a trend higher. In one embodiment, three or more patterns calculated at different times may be used to determine whether a trend actually exists. For each calculation of a pattern, the specific transactions being used to determine the pattern can change as new transactions may have been received and old transactions can be dropped.
In one embodiment, a similar process can be performed for the entities, such as merchants and groups of similar merchants. Transaction patterns of one merchant can be compared to transaction patterns of another merchant to determine similar merchants. As an example, knowledge of a similar merchant can be used to determine when fraud might occur at a merchant. Trends in transactions of a merchant can be used to determine inventory levels inventory levels (e.g. at particular stores or for all stores) and other business decisions based on forecasting.
IV. Analysis of a Pattern
If a pattern of when transactions occur is known, then the pattern can be used to determine when transactions are likely, and thus determine demographics and trends. For example, if a pattern (e.g. a pattern of transactions associated with specific keys) for one or more previous months is known, embodiments can use this pattern to determine a pattern for a future month (e.g. for same month next year or for a next month). The patterns can be analyzed in numerous ways, and
In one embodiment, the X axis can have discrete times. For example, only transactions for each day may be tracked. Thus, if the initial event was the start of a month, then the number of discrete time periods would have a maximum of 31 days. In such an embodiment, elapsed time values within a certain range can all contribute to a same parameter, and bars 410 may be considered as counters. For example, if the discrete times were by day, any two transactions that have an elapsed time of 12 days since a correlated KI event would both cause the same counter to be increased. In one embodiment, these counters are the time information that is stored as mentioned above. In some implementations, the time ranges do not all have the same length. For example, the time ranges closer to zero can have a smaller length (e.g. just a few minutes) than the time ranges further from zero (e.g. days or months).
A pattern 420 can be discerned from the elapsed times. As shown, pattern 420 has a higher value at elapsed times where more transactions have occurred. In one embodiment, pattern 420 could simply be the counters themselves. However, in cases where the time intervals are not discrete or have a small range, bars 410 might have zero or low value at times that happen to lie between many transactions. In these cases, certain embodiments can account for transactions at a specific time as well as transactions at times that are close. For example, as shown, a function representing pattern 420 begins curving up and plateaus near the cluster 460 of transactions to form a peak 430. In one embodiment, each time point of the function can have a value of a moving average of the number of transaction within a time period before and after (or just one or the other) the time point. In other embodiments, function can be determined from interpolation or other fitting method (e.g., a fit to periodic functions) performed on the counters.
Indicia of the pattern 420, e.g., the function values, can be analyzed to determine when a transaction is likely and when one pattern is similar to another pattern. In one implementation, peaks of the pattern 420 are identified as corresponding to times when a transaction is likely and troughs (e.g. 470) are identified as corresponding to times when a transaction is unlikely, both of which can correspond to a trend in the pattern. In one embodiment, a width of the function at specific values or times may then be used as a time window for identifying a trend. For example, a time window (e.g. a two day or 1.5 day period) of when transactions often occur (or generally do not occur) may be determined. The time window can be used, for example, to determine when and for how long inventory levels should be increased or decreased.
In one embodiment, a full width at half maximum may be used, such as the width of peak 430. In another embodiment, the window (e.g., 440) above a threshold value 450 is used, or just part of this window, e.g., starting at the time where pattern 420 is above the threshold and ending at the top (or other part) of peak 430. In yet another embodiment, the time window may have a predetermined width centered or otherwise placed (e.g. starting or ending) around a maximum or other value above a threshold.
In embodiments using a threshold, the value of the pattern function may be required to be above the threshold value before a transaction is considered likely enough to be related to a trend. As mentioned above, multiple threshold levels can be used, e.g., a lower and upper threshold defining a range outside of which a likelihood value can be identified with a trend. The use of thresholds encompass using the exact likelihood values, which can be equivalent to using many threshold levels. The modeling function mentioned above may be used to perform any of these determinations.
In one embodiment, a threshold determination could be whether a counter has a high enough value (absolute or relative to one or more other counter). In another embodiment, a threshold level can be relative (e.g. normalized) compared to a total number of transactions. A normalization or determination of a threshold can be performed by adjusting the level depending on the low values of likelihood of a pattern, e.g., a peak to trough height could be used. In one aspect, the troughs may be offset to zero.
Storing time information that includes a number of transaction at certain elapsed times, one can not only handle paths (such as initial key to final key), but one can also easily identify multiple patterns. Each peak can correspond to a different pattern. For example, each peak can correspond to a different frequency of occurrence for a transaction associated with the final key relative to an event (e.g. a transaction) associated with the initial key. In one embodiment, the time information for the elapsed times can be stored by storing a time of when both events occur. In another embodiment, time information can store the elapsed time as one value. In yet another embodiment, the time of one event might implicitly include the time of the initial event (e.g. when the first event is beginning of a month or other fixed time period).
From
V. Use of Periodic Functions and Counters
Some embodiments use certain functional forms to help identify different patterns. As mentioned above, periodic functions can be used, e.g., eiwt, where w is a frequency of the pattern. For example, each bar (counter) 410 of
where Cw corresponds to the counter value at the frequency w and w runs over all of the frequencies. Cw can be considered a coefficient of the periodic function eiwt at a particular frequency. Thus, conceptually, a probability can be calculated directly from the above formula.
In one embodiment, the frequencies are pre-selected thereby allowing an efficient determination of the patterns. Further, the frequencies can be identified only by the associated wavelength, or wavelength range. Note that in certain embodiments, the use of eiwt is simply a tool and the actual value of the function is not determined.
In one embodiment, each column 510 corresponds to a different time range. The time range may correspond to ranges mentioned above with reference to
In some embodiments, table 500 only has one row. In other embodiments, the rows of table 500 correspond to different dollar amounts (or dollar amount ranges). Thus, each time range may have subgroups for set ranges of amounts (e.g. dollar amounts). The organization is similar to a matrix, where a row or a column can be viewed as a group or subgroup. Although five amount ranges are shown, table 500 can have any number of dollar amounts. In some embodiments, there is only one row. i.e. when dollar amounts are not differentiated. Note that the convention of row and column is used for ease of illustration, but either time or amount could be used for either row or column (each an example of an axis). Also, the data for a table can be stored in any manner, e.g. as a single array or a two-dimensional array.
The values for the matrix elements 520 correspond to a number of KF transactions that have elapsed times relative to a KI event (e.g. a transaction) that fall within the time range of a particular column 510. In one embodiment, each newly received K2 transaction can cause a box (element) 520 of the table (matrix) 500 to be increased. The value of the matrix element (an example of a likelihood value) can be incremented by one for each transaction, or another integer or non-integer value. The value can also be a complex number, as described below. In another embodiment, a table can be required to have a certain total of all values, average of the values, minimum value in any matrix element, or other measure of the values in the table. Such a requirement can ensure that enough data has been received to provide an accurate pattern.
The values of the matrix elements can be used to determine the pattern for the key pair <KI:KF>, e.g. as part of step 330 of method 300. For example, matrix elements with high values relative to the other matrix elements can indicate a pattern of transactions in the corresponding time range, which can correspond to a particular frequency w. In another embodiment, one could view each matrix element in isolation to determine whether a transaction is likely. For example, if a matrix element exceeds a threshold value, it may be determined that a transaction is likely to occur in that time range. The threshold can be determined in various ways, for example, as a function of a sum of all of the matrix elements, or equivalently can be fixed with the matrix elements being normalized before a comparison to a threshold. Thus, step 330 can be accomplished easier based on how the time information is stored.
As mentioned above, the time ranges can all be of the same length (e.g. 24 hours) or be of varying lengths. In one embodiment, the first column is of very short time length, the second column is of longer time length, and so on. In this manner, more detail is obtained for short wavelengths while still allowing data to be stored for long wavelengths without exhausting storage capacity. In another embodiment, dollar amount ranges are progressively structured in a similar manner as the time ranges can be. In one implementation, the dollar amount range can be used to track the likelihood of transactions having certain dollar amounts.
The wavelength λ of a pattern corresponds to the time range of a column. For embodiments, using time relative to another transaction, then the λ is the time between transactions. In one embodiment, 16 time domains (ranges) are selected as follows: λo is under 1 minute, λI is between 1 minute and 2.7 minutes, λ2 is between 2.7 minutes and 7.4 minutes, λ3 is between 7.4 minutes and 20 minutes, and λ15 is over 1.2 million minutes.
The amount values can also be used to determine patterns for transactions of certain dollar amounts. If the amount is not of concern, then the values in a column can be summed to get a total value for a certain time range. The amounts can also be incorporated into the mathematical concept introduced above. For example, in mathematical notation, a value function can be defined as
where A is an amount of a transaction.
When a transaction is received, the amount and corresponding elapsed time for a particular key pair can be used to determine a corresponding matrix element for the key pair table. The values in the matrix elements can be normalized across one table and across multiple tables. For example, a value can be divided by a sum for all the values of a particular key pair table. Also, a sum can be calculated for all values across multiple tables, and the values for each table divided by this sum. As part of a normalization, the value for a matrix element may be decreased when some of the data used to determine the value becomes too old. For example, for a time range that includes short time intervals, counts from transactions that have occurred more than a year ago may be dropped as being out of data since short timeframe patterns can change quickly.
In various embodiments, tables for different key pairs can have different time ranges and/or amount ranges. If such differences do occur, the differences can be accounted when a summing operation is performed. In one embodiment, the values in the matrix elements can be smoothed to account for values in nearby matrix elements, e.g., in a similar fashion as pattern 420.
In another embodiment, tables for different consumers can be compared to determine affinity groups. For example, tables with matching or similar key pairs can be subtracted (lower value more similarity) or multiplied (higher value more similarity). The closer the tables are, the more similarity (e.g. as a percentage) the consumers are, where non-matching tables can be used for normalization. In one example, one set of tables can correspond to the affinity group, and the calculation can be used to determine whether a person is within the affinity group.
In other embodiments, specific amount ranges or time ranges can be suppressed. For example, if only certain types of patterns (e.g. only certain frequencies) are desired to be analyzed, then one can suppress the data for the other frequencies. In one embodiment, the suppression is performed with a mask matrix that has zeros in frequency columns and/or amount rows to be suppressed. Thus, one can just multiply the matrices to obtain the desired data. The amount ranges can be similarly suppressed. When suppressing certain frequencies, these mask matrices can act similarly to a high pass, low pass, or notch filters. For example, if one wanted a coupon to be good only for 7 days, and it takes 1 day to create the coupon, the desired time window is any time range that includes those 6 days. Accordingly, the time information for transactions outside the time window can be suppressed as not being of interest.
Regarding the creation and updating of such tables, after an event (e.g. a consumer transaction) is received, embodiments can determine which tracked key pairs have finals keys that match with the keys resulting from the transaction. As a transaction can be associated with many keys and key pairs, a transaction may cause many tables to have a matrix element updated. For example, the transaction may cause different tables for a specific consumer to be updated. The updates could be for one table for all transactions by that consumer (an example of a general table), and more specific tables for particular zip codes, merchants, and other key elements. The transaction can also cause updates of tables for the particular merchant where the transaction occurred.
As there are different tables that can be updated, each with a different initial key, the time range (and thus the matrix element) that is updated may be different for each table. For example, when elapsed time is used, the last transaction for each table may be at a different elapsed time since the different initial transactions. The transaction amount would typically be the same, thus the exact row for the matrix element to be increased can be the same, as long as the tables have the same amount ranges. But the column (i.e. time) could be different for each table.
Regarding which time column to update, there can also be more than one column updated for a particular table. For example, a K2 transaction may have different time patterns relative to K1 transactions (i.e., <K1:K2> pair). Accordingly, when a K2 transaction is received, elapsed times from the last two, three, or more K1 transactions could be used to update the table.
In a similar manner, one key pair table could be <*:K2>, which includes correlations from a plurality of initial keys to the K2 key in the same table. Effectively, this table could equal the sum of all tables where K2 is the final key for a particular consumer or other entity. However, if the individual key pairs are not significant enough, the <*:K2> table may be the only table that is tracked. Tables of the type <K1:*> could also be tracked.
VI. Impedance (Likelihood of Another Transaction)
Besides being able to predict when a particular transaction will occur, embodiments can also predict if another transaction is going to occur after a current or a predicted transaction, which is referred to as impedance. In some embodiments, such information can be tracked by using complex numbers for the matrix elements of the final event, with the imaginary part corresponding to the impedance. In other embodiments, the impedance can be tracked simply using another number for a matrix element or using another table. In one embodiment, impedance values can be used as indicia in the comparison of two patterns, e.g., to determine a demographic. In another embodiment, impedance values can be used to determine to identify a trend. For example, if impedance values change from one time range to another that can signal a trend or if impedance values change from one calculation of a pattern to the next calculation of a pattern.
In such embodiments, the imaginary part of a matrix element can correspond to an impedance that measures how likely it is that another transaction will occur. Such values can be tracked for individual consumers and/or groups of similar consumers (affinity groups). The likelihood can specifically correspond to a future transaction being correlated to the current transaction having the time range and dollar amount of the matrix element. The real value of a matrix element can correspond to the probability that the KF event will occur, and the imaginary value can relate to the probability that another event will be correlated to the KF event. The imaginary part can be updated when another transaction is correlated to the KF event of the specific time and amount. In one embodiment, a table can have just one impedance value for the likelihood of any transaction occurring later. Thus, just one imaginary part could be stored for an entire table. In another embodiment, the imaginary parts could be different for each matrix element.
In an embodiment, a low impedance (e.g. a large negative imaginary part) for a matrix element means that there is a high probability that another transaction is going to occur, and a high impedance (e.g. high positive value) means that it is unlikely that another transaction is going to occur, with zero being indeterminate. The implication of negative and positive values can be swapped. In another embodiment, a high impedance is provided by a low number (negative or positive), with larger numbers providing low impedance, or vice versa. Certain future transactions can be ignored (e.g. not counted) in determining impedance, for example, if the dollar amount is too low.
In one embodiment, the imaginary number can be tracked (e.g. increased or decreased) in the following way. (1) When a KF event is received, each of the key pair tables that have the transaction as the ending event are increased in real part of the appropriate matrix element, with an elapsed time measured from the respective starting event KI. (2) For each key pair table, the specific KI event to which KF was correlated is determined. Then for that KI event, each K0 event to which KI is correlated as a final event is determined, and the appropriate matrix elements of specific tables are determined using an elapsed time between the specific KI and K0 events. (3) The imaginary part of the individual matrix elements can then be adjusted (e.g. decreased to obtain a reduced impedance) to reflect a higher likelihood that a another transaction follows KI, since the KF event did indeed follow. If all of the matrix elements for a table have the same imaginary part, then the specific KI event does not need to be known, just the tables that have the key for an ending KI need to be known, which can be determined with filters operating on the final keys.
In another embodiment, the imaginary part could be updated in a forward manner. (a) A KF event is identified. Each of the key pair tables that have KF as the ending event are increased for the real part in the appropriate matrix element, with a KI event being the starting event. In one aspect, KF might not have just come in, but could be part of a whole collection of events being processed. (b) Then specific K2 final events that correlate to the KF event as an initial event are identified. (c) Depending on the number of K2 transactions, the imaginary part of the appropriate matrix element can then be adjusted (e.g. increased, decreased, set, or reset). At this point, the imaginary part for just one matrix element (e.g. the matrix element from (a)) of tables for KF could be determined. Or, all of the other matrix elements of the tables could also be determined as well based on the value for the specific matrix elements just determined. For example, all of the other matrix elements of a table can be updated to reflect that the K2 transaction occurred. This can be done when all of the imaginary parts are the same, or if just one value is stored for an entire table.
In one embodiment, the default for the imaginary part can be set at zero or some average value for a likelihood that a transaction occurs. If after a certain amount of time, there are no transactions correlated to it, then the value might increase and continue to increase. Or the default could be set at a high impedance, and then lowered as more transactions occur. In another embodiment, if the future transaction is fraudulent, then the complex part can also be changed to reflect a higher impedance since a valid transaction does not occur. In another embodiment, if a decline occurs after a transaction then the impedance is increased (e.g. the imaginary part is decreased by one), if an acceptance occurs after a transaction then the impedance is decreased (e.g. the imaginary part is increased by one).
In this way, one can determine the specific instances where the transaction is a dead end (i.e. not leading to other transactions), and other instances where the transaction leads to other transactions. A high impedance would convey that the transaction is a dead end as no further transactions occur very often. Conversely, one can determine that a transaction is a gateway to many other transactions when it has a low impedance. In one embodiment, an average or sum of all of the imaginary parts of the matrix elements can be used to determine whether any future transaction is likely.
Some embodiments can aggregate the imaginary part over all KI correlated to a KF to determine a total likelihood that a KF will provide more transactions. Thus, one can incentivize KF if the likelihood is high. Each of the tables with KF as the final event can be used to determine exactly when to send an incentive and what the incentive might be.
In one embodiment, one can see a dead end for one affinity group, but then look at another affinity group that does not show this dead end. An analysis can then be made as to why the one group dead ends, and strategy developed for causing the dead end not to occur (e.g. sending a coupon, pre-authorization, or other inventive). For example, one can identify stores that the one affinity group does go to after the transaction, and send coupons to that store. As another example, one can identify stores geographically near a merchant that is a dead end and send a coupon for a nearby store, even potentially for use within a short time period after a predicted visit to the dead end merchant. After seeing if a strategy works by sending coupons to a couple people in an affinity group, coupons can be sent to more people in the affinity group (including people just partially in the affinity group).
Instead of or in addition to the above use of imaginary values for impedance, greater impedance can also correspond to fraud. If a fraud transaction K2 is found to correlate to a transaction K1, then the <KI:K1> matrix elements (or just a specific element) can have the impedance increased. Thus, the impedance can reflect the profitability of the present transaction. For example, certain transactions happening right after buying a concert ticket can be associated with fraud, which is an example of where each matrix element may have its own complex part.
In some embodiments, both real and imaginary parts of a matrix element can contribute to an overall value, which can be used to determine whether a trend exists or patterns are similar. In other embodiments, values for the real or the imaginary components can be analyzed separately to determine whether a trend exists or patterns are similar.
VII. Determination of Demographic
Once the relevant patterns (e.g. key pair tables resulting from a filtering process) are obtained for two entities, the patterns can be compared to determine indicia regarding a similarity between the patterns. The indicia can be used to determine whether a person belongs to a particular demographic.
The calculation of the indicia can be performed in numerous ways. In one embodiment, the indicia can include specific matrix elements corresponding to a type of transaction. The matrix elements can be modified (e.g. normalized) and/or summed to provide an overall indicia. In another embodiment, the indicia can result from an operation combining values from patterns of the two entities. This section describes some embodiments for obtaining relevant indicia of a likelihood for an event from certain event patterns.
A. Determining Similarity of Transactions to Established Patterns
Whether a specific transaction or groups of transactions by one entity is similar to transaction patterns of another entity can be of interest. Such a similarity can be used to determine a demographic, which in turn can be used, for example, to predict transactions or authorize a current transaction. To determine a similarity, the current transaction can be compared to established transaction patterns. In an example where one or more recent transactions are received, these recent transaction(s) can be compared to established transaction patterns of other entities (e.g. a specific demographic).
In this simple example, suppose the first transactions are associated only with K1, and K1 is correlated only to K2. Then a table <K2:K1> can be created with a number of transactions in the proper matrix elements of table 610 relative to previous correlated K2 transactions. Table 610 shows non-zero values for the first dollar amount and the fourth time range, second dollar amount and second time range, and third dollar amount and fourth time range, with zeros in the other matrix elements. Then, matrix elements of table 610 can multiply the corresponding matrix elements of the <K2:K1> key pair table 620 (which has been matched and retrieved).
In one embodiment, the resulting indicia 630 is single value showing an overlap between the tables. As shown, the result is 11*2, 8*7, 15*0, and 6*0 equaling 78, which can be normalized later. The value of this matrix element 630 can (e.g. when normalized) can provide a measure of a likelihood of similarity between the two entities. In this example, the larger the value the higher the similarity. The more transactions that occur with a particular dollar and time in table 610 that correspond to non-zero elements in table 620, the higher the resulting value, and thus the patterns are more similar. In one aspect, the indicia can be normalized to account for the total number of transactions in both tables, and thus a more accurate percentage of similarity can be determined. In another embodiment, all transactions with any dollar amount can be summed before multiplying, which would give 17*2, instead of 11*2 and 6*0. The different embodiments might depend on the specific demographic being investigated.
Overall, multiple tables of the first entity might result for a K1 transaction. Also, multiple tables may be used in the comparison. The values can be aggregated (e.g. summed) or analyzed separately to determine a likelihood of similarity.
B. Multiplying Tables—Alignment
There may be instances where the key pair for a table of one entity is not found in the key pairs of tables of another entity. When this occurs, a first table may be aligned with a second table to determine a matching table for multiplying. In one embodiment, a key for the first table can be broadened until a match occurs.
For example, a first table can have a final key of :4812,345>, where 4812 is the merchant code and 345 is the first three digits of a zip code. However, the tables of a second entity may not contain a table with this final key. This may be because the consumer has few transaction in zip codes starting with 345. But the first table may still contain useful information as to a similarity of the entities. Thus, the key :4812,345> can be broadened to be :4812,*> so that it matches with a table of the second entity. The zip code can be broadened in one step or incrementally to :4812,34>, :4812,3>, and then :4812,*>, where a match is found.
Such alignment can be performed between sets of key pair tables. In a general sense, a set of key pair tables can be viewed as a key manifold. When the key manifolds are normal (i.e. both spaces have identical amounts of keys), then one can apply the operations directly. However, if the key spaces are not normalized, then an alignment may be performed.
In one embodiment, each table of one manifold is aligned with exactly one table of the other manifold. In another embodiment, there may not be a match found for a table from one manifold to another. In such a case, the non-matching tables can be dropped, or distinguished from tables that did match after alignment. A distinction can also be made between tables that only match after alignment and tables that match exactly. For example, it may be useful to know what the entities do that is not the same (no match), or maybe just similar (match after some alignment). Also, other operations besides multiplication can be performed, such as division, subtraction, and addition.
C. Other Calculations of Likelihood
With this framework of aligning and multiplying keys, more complicated calculations of likelihood can be performed. Other operations, such as division can be used. A purpose of division can be to normalize a key manifold (i.e. a set of tables).
Second tables 710 can be obtained from transactions across multiple entities, thereby using a combined entity (e.g. an affinity group). In one embodiment, the specific set of tables have a common key element, e.g., transactions for a specific merchant or during a specific month. Other key elements can be used, e.g., zip code, country, or any other suitable key element.
Total transactions tables 720 can be obtained from all transactions across all or many entities. In one embodiment, the total transactions tables 720 are obtained from the same entities as tables 710. In another embodiment, the total transactions tables 720 are obtained from the same entities as tables 710 and the first entity. Similar to second tables 720, total transactions tables 720 can share a same key element, for example, the same key element as in second tables 710. The total transactions can include fraudulent transactions and valid transactions, or just valid transactions.
Once the tables are aligned, the total transaction tables 720 can be used to normalize the second tables 710 by dividing a second table by the corresponding total transaction table. After the division, the normalized tables can be stored in RAM (or any other memory with faster access than disk). As with
In one embodiment, each second table is aligned with exactly one transaction table. For example, if there are 100 second tables tracked (i.e. for a given group having a common element, such as month), then 100 tables result from the alignment and division. Note that the alignment can be implicit in the notation of a division operation. In some embodiments, there may not be a match of a second table to a total transaction table, although this may happen rarely. In such a case, the second table may be dropped, and thus there may be fewer resulting tables than second tables. In an embodiment, one can differentiate second tables that do not have a match from tables that did match, or between tables that only match after alignment and tables that match exactly.
First tables 730 can then aligned with the normalized tables. Before alignment, some or all of the first tables can be summed. In one embodiment, two first tables can be summed when the keys are similar. In effect, the final transactions for each of the tables can be considered to be of a same type, i.e. have the same key. For example, if the merchant is the same, but the zip codes are different, the two tables can be merged and the zip code dropped or broadened (which can be considered an intersection of the two key pair tables). This summing may be particularly appropriate when both tables would be aligned with a same second table. In such a case, a summing after multiplying the first tables by the normalized tables provides the same resulting table.
After alignment, the first tables can be multiplied element-by-element with the normalized tables, thereby providing a plurality of resulting tables. In one embodiment, these resulting tables can be summed to provide one final table, or one final value. In one aspect, the summing can be due to the second tables 710 being grouped to have a similar key element, and thus the final table can relate to the one key element. This final table can provide a measure of an overall similarity of the transaction patterns, and can be used (e.g. by a modeling function) to determine a likelihood of similarity. In another embodiment, each of the resulting tables can independently be final tables that are used by a modeling function.
In another embodiment, a mask matrix can be used to remove certain matrix elements from the resulting tables or from the final table. For example, the mask matrix can remove low frequency or high frequency components, or be a notch filter to select frequencies in the middle. Also, certain dollar amounts can also be removed. In one implementation, the mask matrix has 1s in matrix elements that are to be kept and 0s in matrix elements that are not to be analyzed.
Although second tables 710 were normalized, the final table(s) may still have matrix elements with values that can vary widely. This variation in values can cause instability in a modeling function, which uses the matrix element as indicia of the patterns to obtain a total likelihood. Accordingly, in some embodiments, constants matrix 740 is used to constrain the final matrix element to be within a certain range of values, e.g., between −1 and 1 or 0 and 1. In one embodiment, constants matrix 740 is created from a specified functional form, such as tan h, log, or sigmoid (generally S shaped) functional form.
Constants matrix 740 can also constrain matrix elements values to correspond to a third number within the prescribed range. For example, a zero output can be mapped to a matrix element value where similarity may be more difficult to determine and thus sensitivity needs to be greater. In one embodiment, the functional form of constants matrix 740 can be kept for an extended period of time, where inputs of specific matrix element values (e.g. maximum and minimum values in a specific table) are used to determine the exact values. Which count corresponds to zero may also have an input parameter. The functional forms may be constant or vary across multiple entities.
The calculation shown in
In one embodiment, the normalized tables (and potentially the first tables) can be stored across multiple processors and each one can perform the corresponding multiplication if there is a match to an first table. As an alternative, a query can be provided to each processor and the processor that is storing the desired second table can return the requested table. The final table(s) can be provided to a single processor or set of processors that are configured to run a modeling function.
D. Subtraction of Matrices
Besides comparing different entities by multiplying their respective key manifolds (sets of key pair tables), the manifolds can also be subtracted from each other. Each table (which can be normalized) can be aligned and the elements subtracted. If the difference is large, then the two manifolds are less likely to be part of the same affinity group. Embodiments can also analyze tables that do not match exactly, and ones that match only after an alignment procedure. The values of these matrices can be analyzed along with the differences for the tables that did match.
E. Representative Consumers
Beyond just tracking ones showing a correlation, the number of tables being tracked for a consumer can be reduced in other ways. In one embodiment, to reduce the number of tables, different zip codes of a consumer could be considered the same. For example, a salesperson might have similar transaction history in cities on his/her route. All of these zip codes can be considered to be equivalent, and thus put into one table. A similar treatment can be made for other key elements. Also, in a similar manner, different consumers can be treated as being equivalent.
The knowledge of which affinity groups a consumer is similar can also be used to reduce the number of tables being tracked for a consumer. As mentioned above, certain consumers can be grouped into affinity groups (demographics) in order to obtain more information about correlated keys. In one embodiment, the affinity groups may be used to define a consumer. For example, if a group of consumers had identical or substantially similar transaction histories then separate tables do not need to be stored. When a consumer of the affinity group made a purchase, the tables of the affinity group could be accessed.
Accordingly, the amount of total storage can be reduced by defining a consumer by his/her affinity groups. Instead of storing the redundant tables for each consumer, just one set of tables can be stored. However, it may be uncommon for a consumer to have the same transaction patterns as a particular affinity group. In such cases, the consumer can be defined as being a linear combination of affinity groups, which are each a combination of key pairs. For example, there can be 100 affinity groups AG. A consumer can be defined as 10% AG1, 15% AG 23, 30% AG 41, 20% AG 66, and 25% AG 88. A consumer can also have individual tables that are used in combination with the affinity group tables.
A consumer can thus be viewed as a combination of representative consumers (affinity groups). Which representative consumers to use for a particular consumer can be determined by sampling transactions of the consumer. The sampling can be relatively small compared to the total transaction history that would be used if the consumer had his/her own tables. For example, the sample transaction of a consumer can show that he/she has similar transactions to a certain demographic group. The percentage for each affinity group can be determined by a similarity in the tables for the consumer relative to the affinity group.
In one embodiment, the similarity can be defined by taking the difference between each of the matrices (tables) of the consumer from each of the matrices of the affinity group. Where a table exists for the consumer, but not for the affinity group, a table of zeros can be assumed for the affinity group, and vice versa. As an alternative or in combination, an alignment mechanism (e.g. method 700) can be used to obtain more matches between the set of tables for the consumer and the set of tables for the affinity group.
In one aspect, tables for a consumer could be created initially until one or more affinity groups are identified. Certain tables for a consumer could still be kept to ensure that the transaction patterns do not change, thereby causing a need to reevaluate the specific linear combination being employed. Thus, transaction data can be used to determine which demographic a person fits into, as opposed to determining what a demographic is based on similar consumers.
In one embodiment, if no transaction history is available for a consumer, the consumer may be approximated using selected representative consumers. The selected representative consumers can be based on attributes, such as age or residence. In another embodiment, a representative consumer can be built for each zip code. This representative consumer can be used as a default for a consumer living in that zip code, at least until more information is obtained.
In another embodiment, a coupon can be used to probe a consumer to determine if he/she belongs to a specific group. The coupon can be for a certain product and/or merchant that is highly correlated to a certain affinity group. If the customer uses the coupon, then there is a higher likelihood that the consumer is at least partially included that affinity group.
VI. Determining Important Trends or Changes in Trends
To predict a likelihood of a future event, some embodiments can obtain the relevant key pair tables for the entity (e.g. a consumer or a demographic) and then analyze these tables. Which tables are obtained and how they are analyzed depends on exactly what events are trying to be predicted, i.e. the question being answered. In some instances, one may be interested in a particular transaction (e.g. a particular product or merchant) for marketing purposes. However, one generally may want to know trends (a transaction pattern or changes in a transaction pattern) in consumer activity and changes in trends, but it may be difficult to identify trends. Thus, automated ways to identify transaction patterns, potentially having specific features, and to identify patterns that are changing can be provided.
In step 810, data for one or more recent and/or upcoming events is received. In one embodiment, the event data (e.g. transaction data) is associated with one entity, e.g., a particular consumer or affinity group. For recent events, whether an event is “recent” can be relative to other events. For example, if an event does not occur often, a recent event (e.g. a last event of that type) can still occur a long time ago in absolute terms. For an upcoming event, the event has not occurred yet, but can be known to occur. For example, the start of a month (or other time period) has a known time of occurrence. As another example, a scheduled event (such as a sporting event or concert) can be used. Data for these scheduled events can be obtained before they occur due to the nature of these events. In one aspect, recent and/or upcoming events are used to identify patterns that may exist in the future as opposed to patterns that have existed previously.
In step 820, the event data is used to map each event to one or more keys KI. In some embodiments, the mapped keys KI are specifically keys that are being tracked for an entity. In step 830, tables of patterns that have an initial key of KI are obtained, thereby providing <KI: tables relevant to the received event data. In one embodiment, a matching and retrieval function identifies the relevant tables using methods described herein. The matching and retrieval function can also match tables that do not have the exact same key, but similar keys. A similar key can be a broader version (e.g. first 3 digits of a zip code) of a more specific key (e.g. 5 digit zip code). Examples of when such alignment would be performed include: when a specific key for a current transaction is received, but only a broader version of that key is being tracked; and when two entities are being compared and different key pairs are tracked. In embodiments where an event is an upcoming event, the upcoming event can be a final event (or effectively the time ranges can be negative with the upcoming event being an initial event), where transactions before the ending event are analyzed.
In step 840, the <KI: tables having matrix elements with sufficiently high or low counts are identified, e.g., to determine KF events that can be part of a trend. The trend may be a change in likelihood, which can be identified when a count is outside of a band of expected or average values. In one embodiment, to determine whether a matrix element has a sufficient count, one or more absolute or relative threshold numbers can be used (e.g. below a lower and above an upper bound). A relative threshold (e.g. a percentage) could be determined using a total number of counts for a table or group of tables. In another embodiment, all tables (i.e. not just ones with a matching KI for initial key) could be analyzed to find matrix elements with high or low counts, thereby eliminating steps 810 to 830. However, using recent or upcoming events can provide greater timeliness for any result, or action to be performed based on a result. The identified KF events along with the specific time ranges for the matrix elements with the high or low counts can then be analyzed.
In step 850, other matrix elements not previously identified are obtained for each likely KF event. For example, a KF event can be correlated to more initial keys than just the ones identified in step 820. These previously unanalyzed tables can also have counts outside of a band for certain matrix elements involving a KF event. The KF event can be used as a filter to identify unanalyzed tables, from which other high or low count matrix elements can be obtained. Thus, this step can be used to obtain a more accurate likelihood for a specific KF event. Obtaining these other matrix elements may not be needed, e.g., if KI is starting event, such as a beginning of a week, month, etc. In this case, since other tables might include the same data points, these other tables could just include redundant information.
Also, other matrix elements for KF events can be important if high accuracy is desired. For example, as the timeframes of the different :KF> tables can be different (due to different KI events), matrix elements having relatively average counts can correspond to the same timeframe as a high or low count matrix element. Thus, the number of counts for a likely time range can be revised.
In this manner, high probability KF events can be determined based on a few recent or upcoming KI events, and then a full analysis of :KF> tables can be performed, as opposed to randomly selecting KF events to determine when they might be likely to occur. A KF event could be chosen for analysis, but a selected KF event might not be highly likely. However, if one were interested in a specific KF event, then it may be desirable to start method 800 at step 850.
In step 860, the matrix elements (e.g., just from step 840 or also from step 850) are combined to obtain a probability distribution vs. time for a :KF> event, which is correlated to many <KI: events. In one embodiment, each of the matrix elements for the KF event are combined from a portion or all of the <KI:KF> tables, where KI runs over the initial events that are correlated to the KF event. This combination can account for the fact that the different KI events occur at different times, and thus the time ranges for each table can be different (e.g. offset).
In one implementation, the earliest or latest KI event can be identified, and offsets for the time ranges of the other tables can be determined. The corresponding matrix elements can then be added using the offsets. If a time range of a matrix element of one table only partially overlaps an offset time range of another table, then the combination can be broken up into more time ranges with proportional contributions from each previous rime range. For example, if two time ranges overlap, then three time sections can result. The overlap section can receive contributions (i.e. a percentage of the counts) from the two matrix elements, with the amount of contribution proportional to the amount of overlap in time for the respective time ranges.
To determine a time range of high likelihood, a probability distribution can be created from the resulting time ranges X after the combination and the counts Y for each time range. The resulting time ranges X with the respective counts Y can be analyzed as a function Y=F(X), which can correspond to pattern 420 of
In step 870, a trend is confirmed for a particular time or time window (e.g. time range of the identified matrix element) by analyzing the probability distribution. In one embodiment, a total likelihood for a KF event (e.g. across multiple initial events) can be calculated to confirm that the likelihood is still outside the average or expected band. Likelihood values near the time range can also be examined to identify regions of significant change in probability. A specific time window may correspond to a predetermined time range of a matrix element, or be another time range that results from an overlap of multiple time ranges. For example, if two matrix elements overlap in time (e.g. because the KI events occur at different times), then the time window may have the range of the overlap time. In one aspect, the trend can be for a one or more specific amounts of the transaction, which can be selected by multiplying with a mask matrix.
Besides confirming a time window, any new time windows related to a trend can be identified. To determine a time range of high or low likelihood, the probability function F can be analyzed. For example, the function F can be analyzed with a numerical routine to identify a maximum, minimum, or regions having values outside of a range. To identify maximum or minimum regions, techniques such as finite difference, interpolation, finite element, or other suitable methods, can be used to obtain first and second derivatives of F. These derivates can then be used in an optimization algorithm for finding a maximum. Global optimization methods can also be used, such as simulated annealing.
In addition to finding a time window when an event is part of a trend, a total probability over a specific time period can be obtained. In one embodiment, the function F can be integrated (e.g. sum counters for time ranges) over the desired time range. In effect, to obtain a probability that an event will occur within a prescribed time period, one can integrate contributions over all of the relevant key pairs during the time period. As an example with one key pair, a probability that someone will perform a certain event (e.g. a transaction) once they are visiting San Francisco can be obtained by integrating the key pair <SF:KF> over all of the desired time periods. In one aspect, time periods of greater than one month may not be relevant if a person never stays in San Francisco for that long (which could be identified from a location of a person's phone or by locations of transactions). One could also determine a probability for a transaction to occur in November in a similar way.
As an alternative to all of the above steps, one can select a particular event and a particular time, which can be used to select the relevant patterns from which the corresponding matrix element can be analyzed. If the tables indicate a desirable likelihood (e.g. relative to threshold values), then a trend can be determined. The probability distribution can also be analyzed, starting at the particular time, to find a stronger trend (e.g. increasing or decreasing probability values), or to find a trend if one did not exist at the particular time.
In one embodiment, the relevant patterns from which the corresponding matrix element are selected by creating a set of key pair tables with 1 or other non-zero values in the appropriate matrix elements. These tables are then multiplied by the saved tables (i.e. known patterns) to obtain the likelihood, effectively filtering out the desired values. Besides a particular time, a time window can also be specified, which may cause more than one matrix element in a table to have a non-zero value. In this case, the non-zero values can be based on a level of overlap of the time window with the corresponding time ranges of the matrix elements.
Referring back to method 800, in step 880, the currently calculated probability distribution can optionally be compared to other probability distributions calculated previously. In one embodiment, the previously calculated distributions are for the same :KF> event. In this manner, a change in probability at a particular time in one distribution to another can be determined. This change can be used to determine a trend. For example, the change can be plotted and a trend in the change can be determined. The change can be used to predict a trend even when the likelihood values fall within an expected or average band.
In step 890, a course of action can be determined based on a determined trend. Various example actions and determinations are now described. Examples of actions include marketing campaigns, inventory levels inventory levels (e.g. at particular stores or for all stores), pricing, and store locations. In one embodiment, the future time of the trend and its relationship to the present time can impact the action taken. For example, if the time window starts soon, an action that can be performed soon (e.g. discounting price) can be initiated. Whereas if the time window does not start for an extended period of time, an action that takes longer (e.g. moving inventory or opening a store) can be performed.
Also, once an event is found to be likely, further analysis can be performed t o determine whether and how and action is to be performed. For example, a cost of an action, such as the cost of moving inventory or discounting the price of a product, can be determined as part of a cost-benefit analysis. The analysis can also include situations where inventory levels are high, and thus the product needs to be sold quickly. In one embodiment, the cost of an action can include a possible loss due to fraud, which can be calculated by comparing a transaction pattern of an entity (e.g. a demographic) to patterns known to be fraudulent (e.g. by multiplying tables of the entity against tables of a fraud entity). In another embodiment, a profit of an event can be determined, e.g., the profit from a transaction in which a trend is identified. If the profit is high, then a higher cost and lower trend can be tolerated.
In one embodiment, calculations for the prediction of an event can be run in real time (e.g. within several hours after an event or series of events occur). In another embodiment, the calculations can be run as batch jobs that are run periodically, e.g., daily, weekly or monthly. For example, a calculation can run monthly to determine who is likely to buy a house, and then a coupon for art, furniture, etc. can be sent to that person. In various embodiments, prediction of major purchases can generally be run in larger batches, whereas prediction of small purchases can be run in real-time (e.g., in reaction to a specific transaction).
In some embodiments, ending events also can be used similarly to predict what may happen before the event. Since the occurrence of an ending event can be known ahead of time (e.g. scheduled for a particular time), the correlated initial events can still be predicted. For example, consumer activity prior to a schedule sporting event can be determined, which may be done, e.g., using tables having negative time ranges with the ending event as an initial key or with positive time ranges with the ending event as a final key. Trends in such predicted initial events can also be used to determine actions.
Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in
The subsystems shown in
A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 981. In some embodiments, computer systems, subsystem, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server. A client and a server can each include multiple systems, subsystems, or components, mentioned herein.
The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.
It should be understood that the present invention as described above can be implemented in the form of control logic using hardware and/or using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present invention using hardware and a combination of hardware and software
Any of the software components or functions described in this application, may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions, or commands on a computer readable medium for storage and/or transmission, suitable media include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer program product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer program products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
The above description of exemplary embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.
Claims
1. A method of identifying a consumer as belonging to a particular demographic, the method comprising:
- receiving first data associated with first transactions of a first entity and second data associated with second transactions of one or more second entities;
- a computer system identifying one or more first patterns of the first previous transactions and identifying one or more second patterns of the second previous transactions, wherein each pattern includes a plurality of values, with at least two of the values respectively including contributions from transactions corresponding to different time ranges;
- comparing the one or more first patterns to the one or more second patterns; and
- based on the comparison, determining whether the first entity and the one or more second entities belong to a same demographic.
2. The method of claim 1, wherein the first entity is a first consumer and the one or more second entities are one or more second consumers.
3. The method of claim 2, wherein the first consumer belongs to the same demographic at less than one hundred percent.
4. The method of claim 3, further comprising:
- comparing the first patterns to other patterns associated with other demographics; and
- determining whether the first consumer belongs to the other demographics.
5. The method of claim 4, further comprising:
- defining the first consumer as a linear combination of the patterns of the demographics to which the first consumer belongs.
6. The method of claim 2, further comprising:
- sending an incentive for a first transaction to the first consumer, wherein the incentive is sent at a time correlated to when the first transaction is likely for the one or more second consumers; and
- wherein determining whether the first entity and the one or more second entities belong to the same demographic includes determining whether the first consumer uses the incentive when the first transaction is likely for the one or more second consumers.
7. The method of claim 1, wherein a value of each pattern corresponds to a likelihood of a transaction occurring within a specified time range.
8. The method of claim 1, wherein identifying one or more patterns of transactions includes:
- associating one or more keys with each previous transaction;
- correlating pairs of previous transactions, each correlated pair associated with a particular pair of keys; and
- for each correlated pair, determining time intervals between the transactions of the correlated pair; and
- for each key pair: tracking numbers of occurrences of correlated pairs having time intervals within specified time ranges, the transactions of the correlated pairs being associated with corresponding keys of the key pair.
9. The method of claim 8, wherein comparing the one or more first patterns to the one or more second patterns includes comparing a portion of first patterns associated with pairs of keys that match with pairs of keys associated with a portion of the second patterns, the method further comprising:
- identifying first patterns associated with pairs of keys that do not match with pairs of keys associated with the second patterns,
- wherein determining whether the first entity and the one or more second entities belong to a same demographic includes analyzing the non-matching patterns.
10. The method of claim 1, wherein comparing the one or more first patterns to the one or more second patterns includes determining a difference between corresponding values of matching patterns between the first entity and the one or more second entities; and
- comparing a sum of the differences to a threshold value.
11. The method of claim 1, wherein comparing the one or more first patterns to the one or more second patterns includes:
- multiplying values of a first pattern with corresponding values of a matching second pattern to obtain a plurality of resulting values; and
- analyzing the resulting values to determine whether the first consumer and the one or more second consumers belong to a same demographic.
12. The method of claim 11, wherein analyzing the resulting values is performed with a neural network.
13. A computer program product comprising a tangible computer readable medium storing a plurality of instructions for controlling one or more processors to perform the method of claim 1.
14. A computer system comprising:
- one or more processors; and
- the computer program product of claim 13.
15. A method of identifying a trend in consumer behavior, the method comprising:
- receiving data associated with previous transactions of an entity;
- a computer system determining one or more patterns of the previous transactions, wherein each pattern includes a plurality of values, with at least two of the values respectively including contributions from transactions corresponding to different time ranges;
- determining likelihoods for an occurrence of a transaction according to the one or more patterns, each likelihood at a respective one of a plurality of different times; and
- identifying a trend in occurrences of the transaction based on the likelihoods at the plurality of different times.
16. The method of claim 15, wherein the entity is a group of similar consumers.
17. The method of claim 15, wherein the entity is a merchant or a group of similar merchants.
18. The method of claim 15, further comprising:
- selecting one or more relevant patterns from the determined patterns based on a product and/or a merchant associated with the transaction, wherein the likelihoods are determined from the relevant patterns.
19. The method of claim 15, wherein determining the one or more patterns of the previous transactions includes:
- associating one or more keys with each previous transaction;
- correlating pairs of previous transactions; and
- determining time intervals between correlated pairs of previous transactions; and
- tracking the occurrences of pairs of correlated previous transactions having time intervals within specified time ranges for a plurality of pairs of keys.
20. The method of claim 15, further comprising:
- adjusting one or more characteristics of a product being sold based on the identified trend.
21. The method of claim 20, wherein the characteristic includes a number of items of the product and/or a price of the product.
22. The method of claim 21, wherein the number of items is adjusted by changing inventory levels of the product at a particular location.
23. The method of claim 15, wherein identifying a trend in occurrences of the transaction based on the likelihoods at the plurality of different times includes determining a likelihood above an upper threshold value or below a lower threshold value.
24. The method of claim 15, wherein determining likelihoods for an occurrence of a transaction according to the one or more patterns includes determining a probability distribution as a function of time from the one or more patterns, and wherein the trend is identified from changes in the probability distribution from one time to another.
25. The method of claim 15, wherein determining likelihoods for an occurrence of a transaction according to the one or more patterns includes determining a probability distribution as a function of time from the one or more patterns, the method further comprising:
- determining one or more other probability distributions from one or more other patterns associated with the entity, wherein the other probability distributions are determined at calculation times different than the probability distribution;
- calculating a difference between the probability distribution and the one or more other probability distributions, wherein the trend is identified based on the difference.
26. A computer program product comprising a tangible computer readable medium storing a plurality of instructions for controlling one or more processors to perform the method of claim 15.
27. A computer system comprising:
- one or more processors; and
- the computer program product of claim 26.
Type: Application
Filed: May 4, 2010
Publication Date: Nov 4, 2010
Inventors: Patrick Faith (Pleasanton, CA), Kevin P. Siegel (Milpitas, CA), Zhongxiao Jiang (Cupertino, CA)
Application Number: 12/773,766
International Classification: G06Q 10/00 (20060101); G06N 5/02 (20060101); G06N 3/02 (20060101);