TRANSACTION DATA CATEGORIZER SYSTEM AND METHOD

Info

Publication number: 20190205993
Type: Application
Filed: Jul 13, 2018
Publication Date: Jul 4, 2019
Inventors: Danton Rodriguez (San Francisco, CA), Melissa Pancoast (San Francisco, CA), Matthieu Tourne (San Francisco, CA)
Application Number: 16/035,217

Abstract

Various embodiments are directed to the centralized processing of transaction and payment data to categorize transaction data across different accounts and systems. Embodiments disclose a transaction categorizer to tag an incoming transaction with metadata and to perform a user rule match, a vendor match, and/or an estimated (probabilistic) score match on the incoming transaction. By applying these various matching processes to the incoming transaction, the transaction categorizer can determine which metadata tags to remove, apply, and/or modify to accurately categorize the transaction. Accurate categorization of transactions can result in valuable data for users in personal resource management, as well as for vendors and service providers.

Description

CROSS REFERENCE TO RELATED APPLICATION

This application claims benefit of U.S. Provisional Application No. 62/612,202, filed Dec. 29, 2017, entitled “CENTRALIZED PERSONAL RESOURCE MANAGEMENT APPLICATION AND SYSTEM USING DYNAMIC VISUALIZATION,” which is incorporated herein by reference for all purposes.

BACKGROUND

Users are increasingly conducting more transactions with credit cards, digital wallets, digital currencies, and other electronic forms of payment. These electronic forms of payment enable users to conduct transactions online over the Internet, and create electronic documentation of the transactions, including metadata that may be stored and analyzed for various purposes. Traditionally, many transactions were conducted with cash and in person, for security reasons, to authenticate the user and authorize the transactions. Online transactions involving financial resources and/or personal assets often involve the transmission of digital data and digital documentation that introduce many technical limitations in processing for analysis with respect to regulations in taxes, reporting, and/or other operations related to finances, currencies, securities, commodities, and/or assets. Furthermore, with the plurality of types of personal resources and assets that users can obtain, manage, and conduct transactions with, financial advising, transaction processing, and even payment processing platforms encounter technical difficulties in parsing and categorizing large volumes of transaction data more intelligently.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 illustrates an example of a networked computing system for categorizing transaction data, in accordance with various embodiments of the present disclosure.

FIG. 2 illustrates an example process of categorizing transaction data, in accordance with various embodiments of the present disclosure.

FIG. 3 illustrates an example representation of transaction data, in accordance with various embodiments of the present disclosure.

FIG. 4 illustrates an example process of tagging to categorize transaction data, in accordance with various embodiments of the present disclosure.

FIG. 5 illustrates an example representation of user rule data, in accordance with various embodiments of the present disclosure.

FIG. 6 illustrates an example process of user matching to categorize transaction data, in accordance with various embodiments of the present disclosure.

FIG. 7 illustrates an example process overview of vendor matching to categorize transaction data, in accordance with various embodiments of the present disclosure.

FIG. 8 illustrates an example process overview of probabilistic score matching to categorize transaction data, in accordance with various embodiments of the present disclosure.

FIG. 9 illustrates an example process overview of displaying transaction data, in accordance with various embodiments of the present disclosure.

FIG. 10 illustrates an example representation of a transaction data categorizer computer, in accordance with various embodiments of the present disclosure.

FIG. 11 illustrates an example implementation environment, in accordance with various embodiments of the present disclosure.

DETAILED DESCRIPTION

In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described.

Systems and methods in accordance with various embodiments of the present disclosure may overcome one or more of the aforementioned and other deficiencies experienced in conventional approaches for processing transaction and payment data. In particular, various embodiments are directed to centralized processing of transaction and payment data to categorize transaction data across different accounts and systems. Transactions may be conducted using a variety of personal resources which, according to various embodiments throughout this disclosure, may include, but are not limited to, assets (e.g., real estate properties), stock, mutual funds, currencies, cash, cryptocurrencies, bonds, commodities, or any other suitable financial instrument having value. Personal resources may be exchanged for other financial instruments, goods, and/or services. Personal resources may also be gifted, donated, loaned, or otherwise transferred from one user to another user or entity. Payment data and processing, according to various embodiments, may include information relating to the personal resources and include the mechanics of transferring the ownership of personal resources from one party to another to complete the transaction.

Various other functions and advantages are described and suggested below as may be provided in accordance with the various embodiments.

FIG. 1 illustrates an example of a networked computing system 100 for a centralized transaction processing and categorization system. In this example, a server 102 (or a plurality thereof) may operate as a transaction processor or categorizer. The transaction categorizer 102 may intake transactions from a payment processor 110, such as a credit card service, bank, issuer, or any other payment processing service that processes payments of transactions between users, other users, service providers, and/or retailers. Transactions may include, but are not limited to, paying for utilities, transportation, clothing, entertainment, food, grocery, taxes, rent, mortgage, etc. Other examples of transactions may also include refunds, reimbursements, price adjustments, and other modifications to transactions. In another embodiment, the transaction categorizer 102 may store and have access to transaction data stored in transaction database 112. Transaction data may include, for example, user identification, amount and type of payment, vendor, time, location, and/or the item or service. The transaction database 112 may be an external entity or may be internal to the transaction categorizer 102. Similarly, the payment processor 110 may be operated by the same entity as the transaction categorizer 102, or may be an external entity to the transaction categorizer 102.

The transaction categorizer may be in communication with a data display interface 104 to provide transaction and payment information to one or more client devices 108. The data display interface 104 may provide the categorized transaction data to the client devices 108 for the display of a dynamic, interactive visual representation of the user's personal resource and asset data for user-intuitive management. Client devices 108 include devices through which a user can view, edit, access or otherwise interact with the categorized transactions they've conducted for personal resource management. The client device 108 may include at least one form of input such as a keyboard, a touchscreen, a voice communications component such as a microphone, and at least one form of visual output such as a display. The client devices 108 can include various computing devices such as speakers, receivers, smart phones, tablet computers, wearable computers (e.g., smart glasses or watches), desktop or notebook computers, and the like. The client devices 108 can communicate with the data display interface 104 and transaction categorizer 102 over at least one network 106, such as the Internet, a cellular network, a local area network (LAN), an Ethernet, Wi-Fi, or a dedicated network, among other such options.

Embodiments of the present invention provide a set of design criteria to reduce cognitive load and increase comprehension in finance, including but not limited to visual and spatial representation of personal resource management based on categorizing transactions. Users make a plurality of transactions that need to be categorized into different tags such that the transaction processor can provide data that can be used to help users manage their finances. For example, transactions that are identified as fixed costs such as rent or mortgage are treated differently in a financial management plan than transactions involving entertainment, such as movies.

FIG. 2 illustrates an example process 200 of categorizing transaction data, in accordance with various embodiments of the present disclosure. The transaction categorizer system may be comprised of the following components: a tagging engine, a user rule match module, a vendor match module, a probabilistic score engine, and a display module, where each component performs a separate process in categorizing incoming transactions. At 202, the transaction categorizer may receive incoming transactions from a payment processing system or a bank, or any other financial institution. The transaction categorizer acts as a data intermediary and may parse transaction metadata from the transaction information that is received from the financial institution. An example of transaction data received by the transaction categorizer is shown in FIG. 3, and will be discussed in further detail. In some embodiments, the transaction data may be stored in a database.

At 204, a tagging engine in the transaction categorizer may process the transaction data. The tagging engine may analyze the basic metadata that is in the transaction, and parse, for example, a vendor from a vendor name field in the transaction, a date on which the transaction was posted to the financial account, an amount for the transaction, and a direction of the amount (e.g., incoming as income or outgoing as a payment). The tagging engine may link additional tags that are represented as non-hierarchical metadata of the transaction. The tagging engine may also analyze the metadata that is included in the transaction. For example, the tagging engine may detect a false positive based on the metadata included in the transaction and either modify the tag or link another clarifying tag to the transaction. To illustrate, the tagging engine may determine that a transaction tagged as income is actually a refund for a return of an item. The tagging engine may then update some of the metadata to reflect that this transaction should not be categorized and treated as income, but as a refund of a previous outgoing payment.

At 206, the transaction data is then passed to the user rule match module of the transaction categorizer. The user rule match module contains rules that are specific to a particular user to identify the user and add additional context to the transaction. The additional context may be represented by additional metadata that is associated with the transaction. The additional metadata may be consumed by a larger system, third party entity, marketing system, advertising system, the vendor, and/or shown to users for various purposes, including aiding in personal resource management. The user rule match module may have access to a user rule database that stores user data to match with. If a user rule match is found, then the transaction data is ready to be prepared for display at 212. However, if a user rule match is not found, then the transaction data is processed by the vendor match module at 208.

At 208, the transaction data is then passed to the vendor match module of the transaction categorizer when the user rule match fails. The vendor match module may parse the vendor name from a vendor field in the transaction data to match with a vendor in a vendor database. Matching the vendor enables the transaction categorizer to determine a category or tag to link with the transaction. If a vendor match is found, then the appropriate tags may be added and the transaction data may be prepared for display at 212. However, if no vendor match is found, then the transaction data may be passed to a probabilistic score module 210.

At 210, when no match is found either by the user rule match module in 206 or the vendor match module in 208, the transaction data is then passed to the probabilistic score module of the transaction categorizer. The probabilistic score module uses various deterministic algorithms to do a “best” match to tag the transaction with a category. As such, the transaction categorizer is designed to always generate some sort of tag to categorize the transaction even if there are no known matches found by the user match module 206 or the vendor match module 208.

At 212, the categorized transaction data, and all the associated tags, are prepared for display. The categorized transaction data may be prepared for display and transmission to the user for use and analysis in a personal resource management application. In other embodiments, the categorized transaction data may be prepared for analysis by third parties, such as vendors, marketing providers, advertising providers, etc. for implementation in product development, marketing campaigns, research and development, and other applications.
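The fallback order of steps 206, 208, and 210 can be sketched as a simple chain. This is a minimal illustration only: the function name `categorize`, the dictionary-based transaction, and the three stand-in matchers are assumptions made for the example and are not part of the disclosure.

```python
from typing import Callable, List, Optional

# Each matcher returns a list of tags, or None when it finds no match.
Matcher = Callable[[dict], Optional[List[str]]]

def categorize(transaction: dict,
               user_rule_match: Matcher,
               vendor_match: Matcher,
               probabilistic_match: Callable[[dict], List[str]]) -> dict:
    """Run the fallback chain 206 -> 208 -> 210 and return a tagged copy."""
    tags = user_rule_match(transaction)
    if tags is None:
        tags = vendor_match(transaction)
    if tags is None:
        # The last stage is assumed to always produce some tag.
        tags = probabilistic_match(transaction)
    result = dict(transaction)
    result["tags"] = list(transaction.get("tags", [])) + list(tags)
    return result

# Hypothetical stand-in matchers, for illustration only.
def no_match(_tx: dict) -> Optional[List[str]]:
    return None

def vendor_lookup(tx: dict) -> Optional[List[str]]:
    return ["grocery"] if "market" in tx["vendor"].lower() else None

def best_guess(_tx: dict) -> List[str]:
    return ["uncategorized"]

example = categorize({"vendor": "Corner Market", "tags": []},
                     no_match, vendor_lookup, best_guess)
```

Because the final stage always returns a tag, every transaction leaves the chain with at least one category, which mirrors the behavior described at 210.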

FIG. 3 illustrates an example representation of transaction data 300, in accordance with various embodiments of the present disclosure. The data instance may be identified by the type of data, and in this representation of this data instance 300, it is a “Transaction” as indicated in field 302. The transaction data 300 may include various fields, where each field has a field name 304 and a type of data 306 contained in the respective field. Types of data include string, float, array, date/time, and/or an identifier. The transaction data instance 300 may include a “vendor” field 308 that is a string 310, such as a store name where an item was purchased. There may also be a “date” field 312 that contains date and time data, such as a date and a timestamp of when the transaction or purchase occurred or was posted. As discussed herein, the date of the transaction may be different depending on the source because it may be the date and time the payment was posted (e.g., when the payment is cleared) or when the transaction was made (e.g., when the customer's credit card was swiped at a point-of-sale to purchase an item). The transaction 300 also has an “amount” field 316 containing a floating point number 318 to represent a value of the transaction, for example $20.00 USD, 0.0001 BTC, or an amount in any other currency.

Another field that the transaction 300 includes is the “tags” field 320 that contains a data array 322. The transaction 300 may have associated tags in this field 320 that have been generated by the payment processor or the vendor and that provide context for the transaction, for example by indicating that it is a payment for an item purchased or for services. The transaction categorizer, through the various components (e.g., tagging engine, user rule match module, vendor match module, and/or probabilistic score module), may add additional tags to the array 322 of the “tags” field 320, or modify or remove some of the tags if they are inaccurate. For example, a refund for an item previously purchased may be incorrectly tagged as income.

The transaction 300 may also include various identifiers, such as a transaction identifier (ID) field 324, a user ID field 328, and/or an account ID field 332, containing identifier data types 326, 330, and 334, respectively. The transaction ID 324 may be a unique identifier consisting of letters, numbers, and characters that identifies the transaction for the payment processor or financial institution. The user ID 328 may identify the user associated with an account with the vendor, service provider, and/or payment processor, for example a username for an online shopping account or marketplace. The account ID 332 may further identify the payment account of the user, for example identifying a bank account or a credit card account associated with the username or user ID 328. In some embodiments, each of these ID fields may be the result of a cryptographic hashing function that combines multiple sources of information to embed time and date information or other relational metadata and outputs a unique, deterministic string. In some embodiments, the transaction data received from a payment processor or financial institution may be missing some of these data fields, and the tagging engine may validate the information from the financial institution and supplement it by adding more data to the data fields.
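The fields of FIG. 3 could be represented as a small record type. The sketch below is one possible layout under stated assumptions: the class name `Transaction`, the attribute names, the helper `make_id`, and the choice of SHA-256 are all illustrative; the disclosure only says that an ID may come from a cryptographic hash over multiple sources of information.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Transaction:
    vendor: str                 # "vendor" field 308 (string)
    date: datetime              # "date" field 312 (date/time)
    amount: float               # "amount" field 316 (float)
    tags: List[str] = field(default_factory=list)  # "tags" field 320 (array)
    transaction_id: str = ""    # ID field 324
    user_id: str = ""           # ID field 328
    account_id: str = ""        # ID field 332

def make_id(*parts: str) -> str:
    """Derive a deterministic identifier by hashing the joined parts.

    SHA-256 is an assumption; the source does not name a hash function.
    """
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

posted = datetime(2018, 7, 13, 9, 30)
tx = Transaction(
    vendor="Example Store",
    date=posted,
    amount=20.00,
    tags=["payment"],
    transaction_id=make_id("Example Store", posted.isoformat(), "20.00"),
    user_id="user-1",
    account_id="acct-1",
)
```

The same inputs always hash to the same string, which is what makes the identifier deterministic.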

FIG. 4 illustrates an example process of tagging 400 to categorize transaction data, in accordance with various embodiments of the present disclosure. It should be understood that, for any process discussed herein, there can be additional, fewer, or alternative steps performed in similar or alternative orders, or in parallel, within the scope of the various embodiments. The tagging engine analyzes the transaction data instance, such as the financial account type or account identifier, vendor name, amount, date, and patterns in these fields that may indicate a relationship to other transactions and adds tags as metadata to the transaction data instance. At 402, the tagging engine receives a new transaction from a financial institution or payment processor. The tagging engine in some embodiments may store the transaction data instance in a transaction database.

At 404, the transaction data instance is analyzed to detect false positives (e.g., incorrect tags) that may have been added in the data processing pipeline by the financial institution or by a data intermediary or aggregator, and the offending tags in the tag field of the transaction data instance are removed or modified. The transaction data instance may have existing tags from the payment processor and/or financial institution. However, these pre-existing tags are typically known to be incorrect or to cause false positives, for example a credit card payment tag on a transfer transaction. The tagging engine may also detect whether there are false positive patterns. If a false positive pattern is found, then the offending tag is removed from the tag field of the transaction data instance.
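One way to express a false positive pattern is as a pair of tags that should not appear together. The sketch below assumes that representation; the function name, the pair format, and the tag strings are hypothetical and are used only to illustrate removing an offending tag.

```python
from typing import Iterable, List, Tuple

def remove_false_positive_tags(tags: List[str],
                               patterns: Iterable[Tuple[str, str]]) -> List[str]:
    """Drop each offending tag that co-occurs with its conflicting tag.

    Each pattern is (offending_tag, conflicting_tag); this pair format
    is an assumption made for the example.
    """
    present = set(tags)
    offending = {bad for bad, conflict in patterns
                 if bad in present and conflict in present}
    return [t for t in tags if t not in offending]

# Hypothetical rule: a transfer should not carry a credit card payment tag.
PATTERNS = [("credit_card_payment", "transfer")]
cleaned = remove_false_positive_tags(["credit_card_payment", "transfer"], PATTERNS)
```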

At 406, the tagging engine may determine whether transactions with similar vendor names have recurring patterns. Transactions with similar vendor names from a previous time period may be gathered to determine if there is a recurring pattern. For example, all transactions within the previous three months from a particular grocery store may be gathered and it may be determined that the user makes a weekly grocery purchase at the same grocery store on the user's way home from work. If the dates of the transactions and the amounts match recurring patterns, then tags may be added to indicate such a pattern. For example, a tag of “recurring_amount” may be added for an amount that is recurring (e.g., a prescription for heart medication) and/or a tag of “recurring_periodic” may be added for a transaction that occurs on a periodic pattern (e.g., a grocery transaction every Monday). These tags may be added to the tag array contained in the tag field.
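A simple version of the recurrence check at 406 compares the gaps between transaction dates and the spread of the amounts. The following is a rough sketch: the function name, the use of the median, the minimum of three transactions, and the tolerance values are assumptions chosen for illustration; only the two tag names come from the description above.

```python
from datetime import date
from statistics import median
from typing import List, Tuple

def recurring_tags(history: List[Tuple[date, float]],
                   day_tolerance: int = 2,
                   amount_tolerance: float = 0.01) -> List[str]:
    """Return recurrence tags for one vendor's (date, amount) history."""
    tags: List[str] = []
    if len(history) < 3:
        return tags  # too few transactions to call it a pattern
    ordered = sorted(history)
    dates = [d for d, _ in ordered]
    amounts = [a for _, a in ordered]
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    typical_gap = median(gaps)
    if all(abs(g - typical_gap) <= day_tolerance for g in gaps):
        tags.append("recurring_periodic")
    typical_amount = median(amounts)
    if all(abs(a - typical_amount) <= amount_tolerance for a in amounts):
        tags.append("recurring_amount")
    return tags

# Four consecutive Mondays with varying grocery totals.
mondays = [date(2018, 6, 4), date(2018, 6, 11), date(2018, 6, 18), date(2018, 6, 25)]
groceries = list(zip(mondays, [52.10, 48.75, 60.00, 55.20]))
```

With this history the dates are evenly spaced but the amounts vary, so only the periodic tag would apply.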

At 408, the tagging engine may then detect and filter refunds and transfers. If a transaction is a deposit but it appears to be from a vendor that would not normally make deposits, or it appears to be a transfer, then a similar transaction in the opposite direction is searched for across accounts. If such a transaction is found, then a refund or transfer tag is added to the tag array and a link to the identifier of the corresponding transaction is saved in the transaction metadata. This process helps ensure that a refund is not treated or processed as income. Similarly, users may have multiple accounts (e.g., checking and savings, etc.) and a transfer from one account of the user to another also should not be treated or processed as income. Thus, transactions that are identified as refunds or transfers should be specifically filtered since they are often erroneously categorized. For example, a deposit to the account that has a vendor name of a clothing retailer is likely a return or refund, and not an incoming deposit.
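The search for an opposite-direction transaction can be sketched as follows. The dictionary keys, the signed-amount convention (deposits positive, payments negative), the 30-day window, and the rule that a counterpart in a different account means a transfer are all assumptions made for this example.

```python
from datetime import date
from typing import List, Optional

def find_counterpart(deposit: dict, candidates: List[dict],
                     max_days: int = 30) -> Optional[dict]:
    """Find a transaction of equal size and opposite sign near the deposit date."""
    for other in candidates:
        if other["id"] == deposit["id"]:
            continue
        cancels_out = abs(other["amount"] + deposit["amount"]) < 0.005
        close_in_time = abs((deposit["date"] - other["date"]).days) <= max_days
        if cancels_out and close_in_time:
            return other
    return None

def tag_refund_or_transfer(deposit: dict, candidates: List[dict]) -> dict:
    """Return a copy tagged as refund or transfer and linked to its counterpart."""
    match = find_counterpart(deposit, candidates)
    if match is None:
        return deposit
    kind = "transfer" if match["account_id"] != deposit["account_id"] else "refund"
    tagged = dict(deposit)
    tagged["tags"] = deposit.get("tags", []) + [kind]
    tagged["linked_transaction_id"] = match["id"]
    return tagged

purchase = {"id": "t1", "account_id": "card", "amount": -40.0, "date": date(2018, 7, 1)}
refund = {"id": "t2", "account_id": "card", "amount": 40.0, "date": date(2018, 7, 9)}
tagged_refund = tag_refund_or_transfer(refund, [purchase, refund])
```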

At 410, a keyword match process may be executed on the vendor name of the transaction data instance. If the vendor field of the transaction data instance contains a match in the database of keyword tags for the vendor, then a corresponding tag is applied. For example, if “hospital” is in the vendor field of the transaction data instance, then an additional tag of “healthcare” may be added to the metadata of the transaction data instance.

At 412, account tags may be added based on the keyword match. If the transaction matches a keyword and belongs to a certain account, then an account tag may be added. For example, a deposit in a credit card account may have the tag “credit_card_payment” applied to the transaction. In another example, a transfer to a savings account may have the tag “savings_deposit” applied to the transaction. At 414, the tagging engine may store all added tags in an in-memory tag array and add them as metadata of the transaction.
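Steps 410 and 412 can both be viewed as table lookups. The sketch below assumes two small in-memory tables; the table contents, the account type labels, and the function names are hypothetical, while the tag strings are the ones used in the examples above.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword table: vendor keyword -> tag.
KEYWORD_TAGS: Dict[str, str] = {"hospital": "healthcare"}

# Hypothetical account table: (account type, direction) -> tag.
ACCOUNT_TAGS: Dict[Tuple[str, str], str] = {
    ("credit_card", "deposit"): "credit_card_payment",
    ("savings", "deposit"): "savings_deposit",
}

def keyword_tags(vendor: str) -> List[str]:
    """Return the tag of every keyword found in the vendor name."""
    lowered = vendor.lower()
    return [tag for keyword, tag in KEYWORD_TAGS.items() if keyword in lowered]

def account_tags(account_type: str, direction: str) -> List[str]:
    """Return the account tag for this account type and direction, if any."""
    tag = ACCOUNT_TAGS.get((account_type, direction))
    return [tag] if tag else []

# Collect the added tags in one in-memory array, as at 414.
added = keyword_tags("St. Mary Hospital") + account_tags("credit_card", "deposit")
```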

Referring back to FIG. 2, the transaction data may then be passed on to the user rule match module at 206 to determine whether there is a user rule match for the user data in the metadata of the transaction data instance. FIG. 5 illustrates an example representation of user rule data 500, in accordance with various embodiments of the present disclosure. A user rule has information that the user rule match module tries to match against properties of a series of transactions. As such, a user rule data instance contains information about a series of transactions. As shown in 500, the data instance may be identified by the type of data, and in this representation of this data instance 500, it is a “User Rule” as indicated in field 502. The user rule data 500 may include various fields, where each field has a field name 504 and a type of data 506 contained in the respective field. Types of data include string, float, array, date/time, and/or an identifier. The user rule data instance 500 may include a “reduced vendor” field 508 that is a string 510, and may be a concatenated store name where an item was purchased. There may also be a “periodicity” field 512 that contains an array of date and time data of each transaction in the series of transactions at the same vendor. As discussed herein, the date of the transaction may be different depending on the source because it may be the date and time the payment was posted (e.g., when the payment is cleared) or when the transaction was made (e.g., when the customer's credit card was swiped at a point-of-sale to purchase an item). The user rule 500 also has an “amounts” field 516 containing an array of floating point numbers 518 to represent the amount of each transaction in the series of transactions.

Another field that the user rule 500 includes is the “tags” field 520 that contains an array of strings 522. The string array 522 contains tags associated with each transaction in the series of transactions. The user rule 500 may also include various identifiers, such as a transaction IDs field 524, a user ID field 528, and/or an account IDs field 532. The transaction IDs field 524 contains an array of identifiers 526 of transaction IDs for the series of transactions. The account IDs field 532 contains an array of identifiers 534 of account IDs for the series of transactions.

To illustrate, a series of transactions might be a series of checks for a user's rent. For the user rule, the vendor name may be reduced for the reduced vendor field 508 by removing any idiosyncratic elements from that vendor name. In this example, the series of transactions may include check 234, check 235, and check 236; the reduced vendor name would simply be “check”. The periodicity of the transactions in this series may indicate that they occur once a month and on what date each month. The user rule match module may compile and analyze the statistical properties of these periodic occurrences and then build a forecasting mechanism to match against. For example, the user rule match module may look at the amounts in the recurring series and then analyze the statistical properties of these amounts to determine variance over time. The statistical analysis may enable the transaction categorizer to calculate the probability that a new transaction and its respective amount fit within the historical data of the transactions in this recurring series. As such, the user rule match module may build an array of common tags for all the transactions in this series to match against.
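The user rule of FIG. 5 and the statistics gathered from it could be modeled as below. This is an illustrative sketch: the class name `UserRule`, the attribute names, and the two summary methods are assumptions, and the population standard deviation is just one reasonable way to describe variance over time.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean, pstdev
from typing import List, Tuple

@dataclass
class UserRule:
    reduced_vendor: str                                        # field 508
    periodicity: List[date] = field(default_factory=list)      # field 512
    amounts: List[float] = field(default_factory=list)         # field 516
    tags: List[str] = field(default_factory=list)              # field 520
    transaction_ids: List[str] = field(default_factory=list)   # field 524
    user_id: str = ""                                          # field 528
    account_ids: List[str] = field(default_factory=list)       # field 532

    def amount_stats(self) -> Tuple[float, float]:
        """Mean and population standard deviation of the series amounts."""
        return mean(self.amounts), pstdev(self.amounts)

    def mean_interval_days(self) -> float:
        """Average number of days between consecutive transactions."""
        ordered = sorted(self.periodicity)
        gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
        return mean(gaps)

# A rent series paid by check on the first of each month.
rent_rule = UserRule(
    reduced_vendor="check",
    periodicity=[date(2018, 1, 1), date(2018, 2, 1), date(2018, 3, 1)],
    amounts=[1500.0, 1500.0, 1500.0],
    tags=["rent", "recurring_periodic"],
)
```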

FIG. 6 illustrates an example process of user matching to categorize transaction data, in accordance with various embodiments of the present disclosure. The transaction may be first checked against pre-existing user specific rules. If a rule is matched, it is categorized according to the rule. At 602, the user rule match module may receive and process a transaction data instance from a new transaction. A new transaction may be received by the transaction categorizer and as discussed in FIG. 4, may be first processed by the tagging engine to be tagged before it is passed to the user rule match module. At 604, the user rule match module may access a user database based on the user ID from the transaction metadata to retrieve user rule instances (see FIG. 5) from the user database. The user database may store a plurality of user rule tables.

At 606, the user rule match module may retrieve the vendor name from the new transaction data instance and simplify the vendor metadata. Concatenating the vendor name reduces the storage need for the field and also unifies the naming convention so that it is more consistent and easier for the user rule match module to match user rule instances by a vendor name. The user rule match module may reduce the vendor name field by removing idiosyncratic data from the name to make it generic. For example, “check #2346” may be reduced to “check”.
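Reducing a vendor name can be as simple as stripping digits and punctuation and normalizing case. The function below is a minimal sketch; the name `reduce_vendor` and the exact character classes kept are assumptions, and a production system might need vendor-specific rules.

```python
import re

def reduce_vendor(name: str) -> str:
    """Strip digits and punctuation, lowercase, and collapse whitespace."""
    letters_only = re.sub(r"[^a-z& ]+", " ", name.lower())
    return " ".join(letters_only.split())
```

Under these rules, `reduce_vendor("check #2346")` yields `"check"`, matching the example above.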

At 608, a vendor name match may be performed against the user rule instances to determine whether the reduced vendor name matches any reduced vendor names in a user rule instance representing a recurring series of transactions. The reduced vendor name may be matched against all of the reduced vendor fields for all transactions for each user rule instance. When a vendor field in a transaction in a user rule instance matches the reduced vendor field of the new transaction, that user rule instance is kept. For example, any reduced vendor field that has less than a 90% string similarity score may be discarded as not a match. If no user rule instances remain, then no matches are found and the user rule match module exits to continue onto the vendor match module (see FIG. 2).
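The 90% similarity filter can be illustrated with the standard library's `difflib.SequenceMatcher`. The disclosure does not say which similarity measure is used, so the ratio computed here is an assumption, as are the function name and the dictionary layout of a rule.

```python
from difflib import SequenceMatcher
from typing import List

def filter_rules_by_vendor(reduced_vendor: str, rules: List[dict],
                           threshold: float = 0.90) -> List[dict]:
    """Keep only the rules whose reduced vendor is similar enough."""
    kept = []
    for rule in rules:
        score = SequenceMatcher(None, reduced_vendor, rule["reduced_vendor"]).ratio()
        if score >= threshold:
            kept.append(rule)
    return kept

rules = [{"reduced_vendor": "check"}, {"reduced_vendor": "coffee shop"}]
remaining = filter_rules_by_vendor("check", rules)
```

An empty result corresponds to the case where the module exits and the transaction continues to the vendor match module.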

At 610, the user rule match module may perform date matching within a threshold deviation to determine recurring patterns. The periodic properties of the user rule series are evaluated. If there is a match at 608, then it is determined whether the date of the new transaction falls within the statistical properties of the recurring series. For example, if the transaction occurs within the parameters of the series (e.g., +/−2 days from the first of each month, or every 14-16 days), then the user rule instance is kept. The number of user rule instances that remain is then evaluated. If only one user rule instance remains, then it is considered a match and the user rule instance is updated with metadata to include this new transaction as part of this recurring series. As such, when there is a vendor name match and a date match for a series of transactions, then the user rule match module applies the user rule instance metadata to this transaction as part of the series of transactions represented by the user rule instance. If there are no remaining user rule instances (i.e., no match), then the user rule match module exits and proceeds to the vendor match module (see FIG. 2).
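A date match within a threshold deviation might compare the gap since the last transaction in the series with the typical gap of the series. This sketch assumes an interval-based check with a tolerance of two days; the function name and the use of the mean gap are illustrative choices, not the disclosed method.

```python
from datetime import date
from statistics import mean
from typing import List

def date_matches(rule_dates: List[date], new_date: date,
                 tolerance_days: int = 2) -> bool:
    """Check whether new_date continues the series at its usual interval."""
    if len(rule_dates) < 2:
        return False  # no interval can be estimated from fewer than two dates
    ordered = sorted(rule_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    expected_gap = mean(gaps)
    actual_gap = (new_date - ordered[-1]).days
    return abs(actual_gap - expected_gap) <= tolerance_days

# A series that repeats every 15 days.
series = [date(2018, 1, 1), date(2018, 1, 16), date(2018, 1, 31)]
```

A transaction 14 days after the last one fits the "every 14-16 days" example, while one 25 days later does not.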

At 612, when there are multiple user rules remaining, then an amount match may be performed, similar to the date match. The amount match module retrieves the amount metadata from the new transaction instance and evaluates whether it falls within a range of a determined amount for each user rule instance representing a series of transactions. If the amount of the new transaction is within the statistical parameters of what is an expected amount for the series, then that user rule instance is kept as a match. If the amount in the new transaction is outside of the range for a particular user rule instance, that user rule instance is discarded. This process is iterated for each user rule instance, and if there is at least one user rule instance left, there are matches. When there is one user rule instance remaining, then that user rule instance's metadata may be applied to the new transaction and the new transaction is categorized as part of the recurring series represented by that user rule instance. If no user rules remain, then there are no matches and the transaction data is processed for a vendor match (see FIG. 2). However, there may be multiple user rule instances remaining as potential matches to the new transaction instance.
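The amount match can be sketched as a range check around the historical mean. The width of the range used here (two standard deviations, with a small minimum so that constant series still accept tiny differences) is an assumption; the disclosure only refers to the statistical parameters of the expected amount.

```python
from statistics import mean, pstdev
from typing import List

def amount_matches(rule_amounts: List[float], new_amount: float,
                   k: float = 2.0, min_range: float = 1.0) -> bool:
    """Check whether new_amount lies within the expected range for the series."""
    if not rule_amounts:
        return False
    center = mean(rule_amounts)
    allowed = max(k * pstdev(rule_amounts), min_range)
    return abs(new_amount - center) <= allowed

def filter_rules_by_amount(rules: List[dict], new_amount: float) -> List[dict]:
    """Iterate over the remaining rules and keep those whose range fits."""
    return [rule for rule in rules if amount_matches(rule["amounts"], new_amount)]

utility_rule = {"name": "utility", "amounts": [100.0, 102.0, 98.0, 101.0]}
rent_rule_amounts = {"name": "rent", "amounts": [1500.0, 1500.0, 1500.0]}
still_matching = filter_rules_by_amount([utility_rule, rent_rule_amounts], 101.5)
```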

At 614, if there are still multiple user rule instances as potential matches, then the user rule match module may implement a tie-breaking process to determine what metadata to update in the user rule instance database and how to categorize the new transaction. The tie breaker module evaluates the date of the occurrence of the transaction and the amount to perform a Euclidean distance calculation. A match in the tie breaker module would be, for example, the user rule instance with the lowest Euclidean distance from the new transaction instance. Selecting the lowest Euclidean distance from the new transaction instance should result in exactly one user rule instance. However, if there are still multiple user rule instances remaining, then the tie breaker module may implement another tie-breaking rule. For example, the tie breaker module may then look at the timestamp of the creation of each user rule instance and select the most recently created user rule instance as the best match for the new transaction. Using the timestamp is ultimately deterministic because the creation of user rule instances is serial, and thus user rule instances cannot be created simultaneously, resulting in unique timestamps for user rule instances. For example, there will be at least a millisecond timestamp difference between user rule instances.
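The two-level tie break can be written as a single sort key: Euclidean distance first, then creation time. In this sketch the distance is taken over (day difference, amount difference) without any scaling, which is an assumption, as are the dictionary keys `expected_date`, `expected_amount`, and `created_at`.

```python
import math
from datetime import date, datetime
from typing import List

def break_tie(new_date: date, new_amount: float, rules: List[dict]) -> dict:
    """Pick the closest rule; on equal distance prefer the newest rule."""
    def sort_key(rule: dict):
        day_diff = (new_date - rule["expected_date"]).days
        amount_diff = new_amount - rule["expected_amount"]
        distance = math.hypot(day_diff, amount_diff)
        # Negating the timestamp makes the most recent rule sort first.
        return (distance, -rule["created_at"].timestamp())
    return min(rules, key=sort_key)

candidates = [
    {"name": "rent", "expected_date": date(2018, 7, 1),
     "expected_amount": 1500.0, "created_at": datetime(2018, 1, 1)},
    {"name": "loan", "expected_date": date(2018, 7, 3),
     "expected_amount": 1490.0, "created_at": datetime(2018, 2, 1)},
]
winner = break_tie(date(2018, 7, 2), 1499.0, candidates)
```

Because days and currency units are mixed in one distance, a real system would likely normalize each dimension first; that choice is left open here.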

Identifying recurring transactions is beneficial for personal resource management services because it can aid in providing recommendations for adjustments in personal resource allocation and investment. Instead of focusing merely on significant transactions by amount of value (e.g., dollars), the transaction categorizer can identify significant transactions by volume. For example, a series of recurring transactions for a student loan should be treated differently than a series of recurring transactions for coffee every morning—it may not be realistic to propose changes to the user's student loans as opposed to lifestyle changes in the user's personal resource management plan. In another example, a series of recurring transactions for rent may also be treated differently from a series of recurring transactions for student loans because a lease for an apartment may be for one year compared to a student loan repayment schedule. As such, correctly and accurately categorizing series of recurring transactions through iterative learning provides valuable data for personal resource management services, vendors, payment processors, etc. to determine what transactions in a user's life may be malleable and what transactions are less so.

According to various embodiments, if the user rule match module does not successfully categorize the new transaction, then the new transaction instance is passed to the vendor match module (see FIG. 2). In some embodiments, the transaction categorizer may access a vendor name database that has been curated to map vendor name fields, vendors, and their corresponding categories. This vendor database may be maintained, curated, and provided by an external entity, or may be internal to the transaction categorizer. The vendor match module may evaluate whether a particular vendor is exclusive to a single category. For example, a grocery store may be categorized exclusively as grocery; however, a department store or an online marketplace may be associated with several categories.

FIG. 7 illustrates an example process overview of vendor matching 700 to categorize transaction data, in accordance with various embodiments of the present disclosure. The vendor match module checks the vendor name of the transaction instance for a match against the known vendor database(s). If matched, the corresponding category of the vendor in the transaction may be mapped to the categories for the matched vendor in the database. Additionally, categories of the matched vendor in the database may be applied to the vendor of the new transaction instance as additional metadata. If no matches are found, then the vendor match module exits and proceeds to the probabilistic score module.

At 702, the vendor match module may receive the transaction instance for processing. At 704, similar to the user rule match module, the vendor name is reduced and simplified for easier matching. The vendor match module may retrieve the vendor name from the vendor field of the transaction instance and remove idiosyncratic data from it to make it more generic. At 706, a vendor match may be performed with the reduced vendor name. The reduced vendor name may be matched against vendors in a vendor database that correlates vendors with specific categories. The vendor database may be loaded as an in-memory hash table. If there is exactly one match for the vendor, then the category metadata for the vendor from the database is applied to the new transaction tags metadata, and the vendor match module may exit to proceed to preparing the transaction instance and its associated tags for display (see FIG. 2). If no matches are found, then the transaction instance is processed for an estimated vendor match at 708. If multiple matches remain, then the transaction instance is processed by a tie-breaker process at 710 in the vendor match module.
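As a non-limiting sketch of steps 704 and 706, the vendor name reduction and the in-memory hash table lookup might resemble the following. The reduction rules, the table contents, and the three outcome labels are hypothetical and chosen only for illustration; a Python `dict` stands in for the hash table.

```python
import re

# Hypothetical in-memory vendor table: reduced vendor name -> categories.
VENDOR_CATEGORIES = {
    "whole foods": ["groceries"],
    "amazon": ["shopping", "groceries", "electronics"],
}


def reduce_vendor_name(raw):
    """Strip idiosyncratic data (store ids, digits, punctuation) from a name."""
    name = raw.lower()
    name = re.sub(r"[#*]\s*\w+", " ", name)  # ids such as "#10234" or "*MK1234"
    name = re.sub(r"\d+", " ", name)         # any remaining digits
    name = re.sub(r"[^a-z ]", " ", name)     # any remaining punctuation
    return " ".join(name.split())


def match_vendor(raw_name, table=VENDOR_CATEGORIES):
    """Return ("match" | "tie" | "none", candidate categories)."""
    categories = table.get(reduce_vendor_name(raw_name))
    if not categories:
        return "none", []        # would proceed to the estimated match at 708
    if len(categories) == 1:
        return "match", categories
    return "tie", categories     # would proceed to the tie breaker at 710
```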

At 708, when no exact match is found, an estimated vendor match is performed to find a “best” match. The reduced vendor name may be compared against known vendors using a distance algorithm to generate a likeness score. For example, the estimated vendor match may use a Levenshtein distance to calculate a likeness score. Any vendor with a likeness score exceeding a minimum threshold may be considered a match. If only one match remains after implementing the estimated vendor match, then the vendor category is applied to the transaction instance as a tag. If there are no matches, then the vendor match module exits and the transaction instance is processed by the probabilistic score module (see FIG. 2). If there are multiple matches, then a tie breaker process is executed at 710.
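For illustration, one way to turn a Levenshtein distance into a likeness score is to normalize it by the length of the longer string. The normalization and the threshold value of 0.8 below are assumptions for this sketch; the disclosure does not specify either.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def likeness(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for disjoint."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def estimated_matches(name, known_vendors, threshold=0.8):
    """Vendors whose likeness score exceeds the (hypothetical) threshold."""
    return [v for v in known_vendors if likeness(name, v) > threshold]
```

An empty result corresponds to exiting to the probabilistic score module, a single result to applying the vendor category, and several results to the tie breaker at 710.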

At 710, the vendor match module may then implement a tie breaker process to resolve multiple matches. If the matched categories have amount restrictions, then the amount restrictions are used to determine the category. For example, transactions with certain vendors may be expected to fall within certain amounts (e.g., a $10,000 purchase is likely not a drugstore purchase). As such, amount restrictions can be useful in determining the category of the transaction. However, where the amount restriction fails to resolve the tie, the tags may be analyzed. Some categories cannot have transactions with certain prohibited tags. If one of the prohibited tags is present, then the category may be eliminated. Should the prohibited tag check fail to resolve the tie, a geographic tie breaker may be used to see if the vendor category pair has geographic limitations. If there is a geographic limitation, then the presumed location of the transaction, or if not available, the presumed hometown of the user, is used to match the geographies. Lastly, if the geographic limitations do not resolve the tie, then the financial institution is used. Some vendor values may be mapped to different financial institutions.
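The ordered tie breaker at 710 might be sketched as a cascade of filters applied until exactly one candidate category remains. The `CategoryCandidate` fields, the dictionary keys on the transaction, and the policy of skipping a filter that would eliminate every candidate are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CategoryCandidate:
    # Hypothetical restriction fields; None means "no restriction".
    category: str
    max_amount: Optional[float] = None
    prohibited_tags: frozenset = frozenset()
    regions: Optional[frozenset] = None
    institutions: Optional[frozenset] = None


def resolve_vendor_tie(candidates, txn):
    """Apply amount, tag, geography, then institution checks in that order."""
    # Presumed transaction location, falling back to the user's hometown.
    region = txn.get("location") or txn.get("hometown")
    checks = [
        lambda c: c.max_amount is None or txn["amount"] <= c.max_amount,
        lambda c: not (c.prohibited_tags & set(txn.get("tags", []))),
        lambda c: c.regions is None or region in c.regions,
        lambda c: c.institutions is None
        or txn.get("institution") in c.institutions,
    ]
    remaining = list(candidates)
    for keep in checks:
        narrowed = [c for c in remaining if keep(c)]
        if len(narrowed) == 1:
            return narrowed[0].category
        if narrowed:  # assumption: ignore a check that eliminates everything
            remaining = narrowed
    return None  # still unresolved after every check
```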

FIG. 8 illustrates an example process overview of probabilistic score generation 800 to categorize transaction data, in accordance with various embodiments of the present disclosure. Probabilistic score matching may generate a data array containing all known metadata of the transaction and pass the data array into a classifier. The classifier may attempt to classify the transaction data array as a last attempt to categorize the transaction.

At 802, for a transaction instance that cannot be matched by the user rule match module or vendor match module earlier (see FIG. 2), the transaction instance may then be passed to a probabilistic score module. At 804, a data array is generated. The transaction information from the transaction instance may be formatted into a standardized data array containing all known transaction metadata. The array generation serves to preprocess the transaction instance to allow the transaction data to match the data array of the data used during training.
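As one possible illustration of the array generation at 804, a transaction may be flattened into a fixed-order numeric array so that its layout matches the arrays used during training. The particular fields, their order, the sentinel value for missing dates, and the CRC-based vendor bucketing are all assumptions of this sketch.

```python
import zlib

# Hypothetical fixed layout shared by training and inference.
ARRAY_FIELDS = ("amount", "day_of_week", "day_of_month",
                "vendor_bucket", "tag_count")
VENDOR_BUCKETS = 1024


def to_data_array(txn):
    """Convert a transaction dict into a standardized list of floats."""
    when = txn.get("date")
    vendor = txn.get("vendor", "")
    return [
        float(txn.get("amount", 0.0)),
        float(when.weekday()) if when else -1.0,  # -1.0 marks a missing date
        float(when.day) if when else -1.0,
        # A stable hash keeps the vendor bucket identical across processes.
        float(zlib.crc32(vendor.encode("utf-8")) % VENDOR_BUCKETS),
        float(len(txn.get("tags", []))),
    ]
```

Python's built-in `hash` is avoided here because string hashes are randomized per process, which would break consistency between training and classification.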

At 806, processed transactions are converted into standardized arrays. The classifier may be trained using these arrays. The classification model generated by this training process may be used to classify uncategorized transactions. In some implementations, the classifier may be a neural network classifier, artificial intelligence classifier, machine learning classifier or any suitable classifier for an artificial neural network. The data array may be passed to a machine learning classifier, such as a convolutional neural network. The network's model may be regularly re-trained using sets of accurately categorized transaction data arrays. The neural network may consist of a series of convolutional layers with flattening layers preceding each. At the end of the fully connected layers may be a softmax layer. The results of the softmax layer may be used to determine the category to apply to the transaction instance. A probabilistic score array containing the likelihood that a transaction belongs to any given category may be applied to each of the transactions and the results are logged.
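To illustrate how the output of a softmax layer may be turned into a probabilistic score array and a category, the following sketch applies a numerically stable softmax to raw classifier outputs. The network itself is omitted; the raw output values and category names are hypothetical.

```python
import math


def softmax(logits):
    """Map raw classifier outputs to probabilities that sum to one."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_category(logits, categories):
    """Return the most likely category and the full score array."""
    scores = softmax(logits)
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best], dict(zip(categories, scores))
```

The returned dictionary corresponds to the probabilistic score array, which may be logged alongside the chosen category for later review.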

At 808, the classifier's results may be reviewed for categorization errors. In some embodiments, a report may be generated that includes all new neural net classified transactions. Any errors or misses are added to a log and used to correct the category determination of the transaction. The report may be automatically generated on a periodic basis for review. For example, each week a spreadsheet may be automatically generated and emailed. In some embodiments, the report may be reviewed manually and the corrections made manually.

At 810, the neural network may be retrained based on the results. The reviewed transactions and historically reviewed transactions may be run through a transaction multiplier module. The transaction multiplier module may add random noise to the arrays consistent with observed noise patterns from different payment terminals and financial institutions. The model may be retrained on this dataset and the new model may be deployed, resulting in increased efficiency and volume in transaction processing.
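The transaction multiplier module might be sketched as a simple augmentation step that emits several noisy copies of each reviewed array while preserving its corrected label. The Gaussian jitter on the first array element below is only a stand-in; the disclosure contemplates noise modeled on observed payment terminal and financial institution patterns, which are not reproduced here.

```python
import random


def multiply_transactions(labeled_arrays, copies=3, jitter=0.01, seed=None):
    """Return each (array, label) pair plus `copies` noisy variants of it."""
    rng = random.Random(seed)  # seeded for reproducible training sets
    augmented = []
    for array, label in labeled_arrays:
        augmented.append((list(array), label))  # keep the original sample
        for _ in range(copies):
            noisy = list(array)
            # Assumption: element 0 is the amount; perturb it by about 1%.
            noisy[0] *= 1.0 + rng.gauss(0.0, jitter)
            augmented.append((noisy, label))
    return augmented
```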

FIG. 9 illustrates an example process overview of displaying transaction data 900, in accordance with various embodiments of the present disclosure. The display preparation module takes the results of the last matching process (e.g., user rule matching, vendor matching; see FIG. 2) and maps the backend category to a broader set of pre-defined front end display categories generated from user research. The mappings are prepared in a display that is user-intuitive and interactive with the user. At 902, the display module receives and processes the transaction instance, including all modified and added tag metadata. At 904, the category mapping may be displayed. The transaction category information may be mapped against a set of display categories shown to users. The associated display category may be assigned to the transaction metadata. At 906, the tag assignment may be displayed. The transaction tags may be reviewed against a database of display level tags. If the tags have a match in the database of display tags, then the tags may be added to the display tag array in the transaction database. At 908, the transaction information and categorization may be displayed to a user on a computing device, or prepared for digital transmission to an external entity (e.g., a vendor, marketing service provider, advertising service provider, etc.). For example, when a user logs into a personal resource management service, the user may wish to view their transaction history. The display category and tags may be visible on the front end to the user. If the user believes the tag or category to be inaccurate, there is an interface to allow the user to flag the transaction. In some embodiments, the user may be able to manually change the category or tag. The modifying event may be sent to a log and be included in the log report reviewed and used for retraining the neural network.
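As a minimal sketch of steps 904 and 906, the backend category may be looked up in a table of display categories and the tags filtered against a set of display level tags. The table contents, the "Other" fallback, and the dictionary keys are hypothetical.

```python
# Hypothetical mapping from backend categories to front end display categories.
DISPLAY_CATEGORIES = {
    "grocery_store": "Food & Dining",
    "coffee_shop": "Food & Dining",
    "student_loan": "Loans",
}
# Hypothetical set of tags that are allowed to appear on the front end.
DISPLAY_TAGS = {"recurring", "subscription"}


def prepare_for_display(txn):
    """Return a copy of the transaction with display metadata attached."""
    shown = dict(txn)
    shown["display_category"] = DISPLAY_CATEGORIES.get(
        txn.get("category"), "Other")  # assumed fallback for unmapped values
    shown["display_tags"] = [
        tag for tag in txn.get("tags", []) if tag in DISPLAY_TAGS]
    return shown
```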

FIG. 10 illustrates an example representation of a transaction data categorizer computing system 1000, in accordance with various embodiments of the present disclosure. The transaction categorizer system 1000 may comprise the following components: a transaction intake processor 1002, a tagging engine 1006, a user rule match module 1008, a vendor match module 1012, a probabilistic score module 1014, and a display module or data display generator 1016, where each component performs a separate process in categorizing incoming transactions. The transaction categorizer 1000 may receive incoming transactions from a payment processing system or a bank, or any other financial institution using the transaction intake processor 1002. In some embodiments, the transaction data may be stored in a transactions database 1004, which may be internal to the categorizer 1000 or external to the categorizer.

The user rule match module 1008 contains rules that are specific to a particular user to identify the user and add additional context to the transaction. The additional context may be represented by additional metadata that is associated with the transaction. The user rule match module 1008 may have access to a user rule database 1010 that stores user data to match with. The vendor match module 1012 may parse the vendor name from a vendor field in the transaction data to match with a vendor in a vendor database (not shown). Matching the vendor enables the transaction categorizer to determine a category or tag to link with the transaction. If a vendor match is found, then the appropriate tags may be added and the transaction data may be prepared for display by the data display generator 1016, which may be internal to the categorizer 1000 or external to the categorizer. However, if no vendor match is found, then the transaction data may be passed to a probabilistic score module 1014.

The probabilistic score module 1014 uses various deterministic algorithms to do a “best” match to tag the transaction with a category. As such, the transaction categorizer is designed to always generate some sort of tag to categorize the transaction even if there are no known matches found by the user rule match module 1008 or the vendor match module 1012. When the transaction is processed, the categorized transaction data, and all the associated tags, are prepared for display by the data display generator 1016. The categorized transaction data may be prepared for display and transmission to the user for use and analysis in a personal resource management application. In other embodiments, the data display generator 1016 may prepare the categorized transaction data for analysis by third parties, such as vendors, marketing providers, advertising providers, etc. for implementation in product development, marketing campaigns, research and development, and other applications.
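The fall-through behavior among the user rule match module 1008, the vendor match module 1012, and the probabilistic score module 1014 might be sketched as an ordered list of stages, each returning a category or nothing. The stage interface and the `matched_by` key are assumptions for illustration only.

```python
def categorize(txn, stages):
    """Run matching stages in order and stop at the first that yields a category.

    `stages` is a list of (name, function) pairs; each function takes the
    transaction and returns a category string or None.
    """
    for name, stage in stages:
        category = stage(txn)
        if category is not None:
            return {**txn, "category": category, "matched_by": name}
    # The last stage is expected to always produce some category.
    raise ValueError("final stage returned no category")
```

A caller would typically pass the stages in the order user rule match, vendor match, probabilistic score, so that the classifier only runs when both earlier stages find no match.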

In accordance with various embodiments, different approaches can be implemented in various environments in accordance with the described embodiments. For example, FIG. 11 illustrates an example of an environment 1100 for implementing aspects in accordance with various embodiments (e.g., a resource provider environment). As will be appreciated, although a Web-based environment is used for purposes of explanation, different environments may be used, as appropriate, to implement various embodiments. The system includes voice communications device 104, which can include any appropriate device operable to send and receive requests, messages or information over network 1104 and convey information back to an appropriate device. The network can include any appropriate network, including a telephone network provided by a telecommunication operator, an intranet, the Internet, a cellular network, a local area network, wireless network, or any other such network or combination thereof. Communication over the network can be enabled via wired or wireless connections and combinations thereof. In this example, the network includes the Internet, as the environment includes an application server 1106 for receiving requests and serving content in response thereto, although for other networks, an alternative device serving a similar purpose could be used, as would be apparent to one of ordinary skill in the art. The illustrative environment includes at least one backend server 1108 and a data store 1110.
It should be understood that there can be several backend servers, layers or other elements, processes or components, which may be chained or otherwise configured, which can interact to perform tasks such as obtaining data from an appropriate data store. As used herein, the term “data store” refers to any device or combination of devices capable of storing, accessing and retrieving data, which may include any combination and number of data servers, databases, data storage devices and data storage media, in any standard, distributed or clustered environment. The backend server 1108 can include any appropriate hardware and software for integrating with the data store 1110 as needed to execute aspects of one or more applications for the client device and handling a majority of the data access and business logic for an application. The application server provides access control services in cooperation with the data store and is able to analyze audio data and other data as well as generate content such as text, graphics, audio and/or video to be transferred to the user, which may be served to the user by the Web server 1106 in the form of HTML, XML or another appropriate structured language in this example. The handling of all requests and responses, as well as the delivery of content between the voice communications device 104 and the backend server 1108, can be handled by the Web server 1106. It should be understood that the Web and application servers are not required and are merely example components, as structured code discussed herein can be executed on any appropriate device or host machine as discussed elsewhere herein. The data store 1110 can include several separate data tables, databases or other data storage mechanisms and media for storing data relating to a particular aspect. 
For example, the data store illustrated includes mechanisms for storing content (e.g., production data) 1112 and user information 1116, which can be used to serve content for the production side. The data store is also shown to include a mechanism for storing log or session data 1114. It should be understood that there can be other information that may need to be stored in the data store, such as page image information and access rights information, which can be stored in any of the above listed mechanisms as appropriate or in additional mechanisms in the data store 1110. The data store 1110 is operable, through logic associated therewith, to receive instructions from the backend server 1108 and obtain, update or otherwise process data in response thereto. In one such example, the voice communications device can receive a request to refine the playback of media content, such as music, news, audio books, audio broadcasts, and other such content. In this case, the data store might access the user information to verify the identity of the user and access a media service to determine media content the user is associated with. The user's speech can be analyzed and used to generate an updated active play queue or initiate the playback of media content. Each server typically will include an operating system that provides executable program instructions for the general administration and operation of that server and typically will include computer-readable medium storing instructions that, when executed by a processor of the server, allow the server to perform its intended functions. Suitable implementations for the operating system and general functionality of the servers are known or commercially available and are readily implemented by persons having ordinary skill in the art, particularly in light of the disclosure herein. 
The environment in one embodiment is a distributed computing environment utilizing several computer systems and components that are interconnected via communication links, using one or more computer networks or direct connections. However, it will be appreciated by those of ordinary skill in the art that such a system could operate equally well in a system having fewer or a greater number of components than are illustrated in FIG. 11. Thus, the depiction of the system 1100 in FIG. 11 should be taken as being illustrative in nature and not limiting to the scope of the disclosure. The various embodiments can be further implemented in a wide variety of operating environments, which in some cases can include one or more user computers or computing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system can also include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices can also include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network. Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS and AppleTalk. 
The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network and any combination thereof. In embodiments utilizing a Web server, the Web server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers and business application servers. The server(s) may also be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more Web applications that may be implemented as one or more scripts or programs written in any programming language, such as Java, C, C# or C++ or any scripting language, such as Perl, Python or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle, Microsoft, Sybase and IBM. The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch-sensitive display screen or keypad, microphone, camera, etc.) and at least one output device (e.g., a display device, printer or speaker).
Such a system may also include one or more storage devices, such as disk drives, optical storage devices and solid-state storage devices such as random access memory (RAM) or read-only memory (ROM), as well as removable media devices, memory cards, flash cards, etc. Such devices can also include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device) and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a computer-readable storage medium representing remote, local, fixed and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, sending and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs such as a client application or Web browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.
Storage media and computer readable media for containing code, or portions of code, can include any appropriate media known or used in the art, including storage media and communication media, such as but not limited to volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage and/or transmission of information such as computer readable instructions, data structures, program modules or other data, including RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

Claims

1. A computer-implemented method comprising:

receiving, by a transaction processing system comprising a processor and a non-transitory computer-readable medium, a transaction data instance associated with a transaction;

processing the transaction data instance to retrieve data contained in one or more fields of the transaction data instance;

validating one or more tags stored in a tag array contained in a tag field of the one or more fields;

determining a match to the data in the one or more fields;

generating one or more new tags to be added to the tag array based at least in part on the match;

modifying the transaction data instance to include the tag array contained in the tag field; and

preparing the modified transaction data instance for transmission via a digital communication channel.

2. The computer-implemented method of claim 1, further comprising:

transmitting the modified transaction data instance to a user associated with the transaction, wherein the modified transaction data instance is displayed on a computing device associated with the user.

3. The computer-implemented method of claim 2, wherein determining the match to the data in the one or more fields further comprises:

accessing a user rule database;

determining whether data in the one or more fields matches a user rule in the user rule database; and

updating the tag array of the transaction data instance with the user entry in the user rule database when the data matches the user entry.

4. The computer-implemented method of claim 3, wherein the user rule is a collection of data that is common to the members of a series of transactions, each transaction in the series of transactions represented as a transaction data instance having one or more fields.

5. The computer-implemented method of claim 4, wherein determining the match to the data in the one or more fields further comprises:

when the data does not match a user rule in the user rule database, identifying a vendor contained in a vendor field of the one or more fields of the transaction data instance;

accessing a vendor database;

determining whether the vendor matches a vendor entry in the vendor database; and

updating the tag array with the vendor entry in the vendor database when the vendor matches the vendor entry.

6. The computer-implemented method of claim 5, wherein determining the match to the data in the one or more fields further comprises:

when the vendor does not match a vendor entry in the vendor database, generating an array with the one or more fields of the transaction data instance;

processing the array with a classifier;

generating a report of the array, including one or more categorizations;

modifying the report, including one or more modified categorizations; and

retraining the classifier based at least in part on the modified report.

7. The computer-implemented method of claim 1, wherein validating the one or more tags stored in the tag array contained in the tag field of the one or more fields further comprises:

identifying erroneous tags based at least in part on historical tags;

removing erroneous tags;

determining whether the transaction instance is a recurring transaction, including a recurring amount or a recurring period; and

adding a recurring tag to the tag array based at least in part on the recurring amount or the recurring period.

8. The computer-implemented method of claim 1, wherein validating the one or more tags stored in the tag array contained in the tag field of the one or more fields further comprises:

determining whether data in the one or more fields matches a keyword in a keyword database, each keyword associated with one or more keyword tags; and

updating the tag array of the transaction data instance with the one or more keyword tags in the keyword database when the data matches the keyword.

9. The computer-implemented method of claim 1, wherein validating the one or more tags stored in the tag array contained in the tag field of the one or more fields further comprises:

determining whether data in the one or more fields matches a keyword associated with an account in an account database; and

updating the tag array of the transaction data instance with the account in the account database when the data matches the keyword.

10. The computer-implemented method of claim 1, further comprising:

storing the tag array in a keyword database.

11. A computing system comprising:

a processor; and

a non-transitory computer-readable medium having code executable by the processor to: receive, by the computing system, a transaction data instance associated with a transaction; process the transaction data instance to retrieve data contained in one or more fields of the transaction data instance; validate one or more tags stored in a tag array contained in a tag field of the one or more fields; determine a match to the data in the one or more fields; generate one or more new tags to be added to the tag array based at least in part on the match; modify the transaction data instance to include the tag array contained in the tag field; and prepare the modified transaction data instance for transmission via a digital communication channel.

12. The computing system of claim 11, the non-transitory computer-readable medium further having code executable by the processor to further:

transmit the modified transaction data instance to a user associated with the transaction, wherein the modified transaction data instance is displayed on a computing device associated with the user.

13. The computing system of claim 11, the non-transitory computer-readable medium further having code executable by the processor to further:

access a user rule database;

determine whether data in the one or more fields matches a user rule in the user rule database; and

update the tag array of the transaction data instance with the user entry in the user rule database when the data matches the user entry.

14. The computing system of claim 13, wherein the user rule is a series of transactions, each transaction in the series of transactions represented as a transaction data instance having one or more fields.

15. The computing system of claim 14, the non-transitory computer-readable medium further having code executable by the processor to further:

when the data does not match a user rule in the user rule database, identify a vendor contained in a vendor field of the one or more fields of the transaction data instance;

access a vendor database;

determine whether the vendor matches a vendor entry in the vendor database; and

update the tag array with the vendor entry in the vendor database when the vendor matches the vendor entry.

16. The computing system of claim 15, the non-transitory computer-readable medium further having code executable by the processor to further:

when the vendor does not match a vendor entry in the vendor database, generate an array with the one or more fields of the transaction data instance;

process the array with a classifier;

generate a report of the array, including one or more categorizations;

modify the report, including one or more modified categorizations; and

retrain the classifier based at least in part on the modified report.

17. The computing system of claim 11, the non-transitory computer-readable medium further having code executable by the processor to further:

identify erroneous tags based at least in part on historical tags;

remove erroneous tags;

determine whether the transaction instance is a recurring transaction, including a recurring amount or a recurring period; and

append a recurring tag to the tag array based at least in part on the recurring amount or the recurring period.

18. The computing system of claim 11, the non-transitory computer-readable medium further having code executable by the processor to further:

determine whether data in the one or more fields matches a keyword in a keyword database, each keyword associated with one or more keyword tags; and

update the tag array of the transaction data instance with the one or more keyword tags in the keyword database when the data matches the keyword.

19. The computing system of claim 11, the non-transitory computer-readable medium further having code executable by the processor to further:

determine whether data in the one or more fields matches a keyword associated with an account in an account database; and

update the tag array of the transaction data instance with the account in the account database when the data matches the keyword.

20. The computing system of claim 11, the non-transitory computer-readable medium further having code executable by the processor to further:

store the tag array in a keyword database.