Abstract: A method, system, and apparatus for cleansing personally identifiable information from transaction history records. Descriptive text from each transaction entry of the transaction history is converted to tokens. Each token is evaluated for repetitiveness on a per-user basis and for uniqueness on a global basis, and these evaluations are used to compute a metric by which the token may be indicated as containing personally identifiable information. Ad-hoc rules may further be employed to indicate whether a given token contains personally identifiable information. Tokens so indicated are then masked in a cleansed transaction history output, which may further be associated with metadata from the cleansing process.
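The flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the metric is computed, so everything concrete here is an assumption made for illustration: the function names `tokenize` and `cleanse`, whitespace tokenization, repetitiveness as the fraction of a user's entries containing the token, uniqueness as a value that falls from 1 to 0 as more users share the token, the product of the two as the metric, the `threshold` default, the `<PII>` mask string, and the single ad-hoc rule that flags long digit runs.

```python
import re
from collections import Counter

MASK = "<PII>"                     # hypothetical mask string
DIGIT_RUN = re.compile(r"\d{6,}")  # hypothetical ad-hoc rule: long digit runs


def tokenize(text):
    """Split descriptive text into lowercase whitespace-delimited tokens."""
    return text.lower().split()


def cleanse(history, threshold=0.5):
    """Mask likely-PII tokens.

    history maps a user id to a list of transaction description strings.
    Returns (cleansed_history, metadata).
    """
    # Per user: in how many of that user's entries does each token appear?
    per_user = {
        user: Counter(tok for text in entries for tok in set(tokenize(text)))
        for user, entries in history.items()
    }
    # Globally: how many distinct users have each token in their history?
    users_with = Counter(tok for counts in per_user.values() for tok in counts)
    n_users = len(history)

    cleansed = {}
    metadata = {user: {"masked_tokens": 0} for user in history}
    for user, entries in history.items():
        out = []
        for text in entries:
            masked = []
            for tok in tokenize(text):
                # Assumed metric: repetitive for this user AND rare globally.
                repetitiveness = per_user[user][tok] / len(entries)
                uniqueness = 1.0 - (users_with[tok] - 1) / max(n_users - 1, 1)
                score = repetitiveness * uniqueness
                if score >= threshold or DIGIT_RUN.search(tok):
                    masked.append(MASK)
                    metadata[user]["masked_tokens"] += 1
                else:
                    masked.append(tok)
            out.append(" ".join(masked))
        cleansed[user] = out
    return cleansed, metadata


history = {
    "u1": ["payment to john smith", "transfer john smith rent", "coffee shop"],
    "u2": ["coffee shop", "payment to landlord", "coffee shop downtown"],
    "u3": ["payment to grocer", "coffee shop", "ref 12345678"],
}
cleansed, metadata = cleanse(history)
```

In this toy input, "john" and "smith" recur for one user and appear for no other, so they score above the assumed threshold and are masked, while "coffee" and "shop" are shared by every user and are kept. The account-like number is caught by the ad-hoc rule rather than by the metric. The metadata here records only a count of masked tokens per user, so the masked values themselves are not retained.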