AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ADJUDICATION OF COMMERCIAL SUBSTANCE, RELATED PARTIES, AND COLLECTABILITY
Systems and methods for adjudicating AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents are provided, including adjudications as to whether the documents represent underlying data that meets one or more predefined or dynamically-determined criteria. Criteria for adjudication may include commercial-substance criteria, related-party-transaction criteria, and/or collectability criteria. A system may receive a plurality of documents and generate a plurality of feature vectors by applying natural language processing techniques. The system may apply one or more classification models to the plurality of feature vectors to generate output data classifying each of the feature vectors. The system may identify, for each feature vector, a subset of closest matching prior feature vectors. Based on the classification and based on the identified subset, the system may adjudicate each feature vector with respect to commercial substance, including an adjudication classification and an adjudication confidence score.
This application claims the benefit of U.S. Provisional Application No. 63/217,119 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,123 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,127 filed Jun. 30, 2021; U.S. Provisional Application No. 63/217,131 filed Jun. 30, 2021; and U.S. Provisional Application No. 63/217,134, filed Jun. 30, 2021, the entire contents of each of which are incorporated herein by reference.
FIELD
This relates generally to AI-augmented automated analysis of documents, and more specifically to AI-augmented automated analysis of documents for use in auditing platforms for automated adjudication of commercial substance, related parties, and collectability.
BACKGROUND
Traditional methods for processing documents to evaluate commercial-substance criteria, related-party-transaction criteria, and/or collectability criteria rely on manual evaluation by human reviewers.
SUMMARY
There is a need for improved methods for AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether the documents represent underlying data that meets one or more predefined or dynamically-determined criteria.
Systems and methods for adjudicating AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents are disclosed herein, including adjudications as to whether the documents represent underlying data that meets one or more predefined or dynamically-determined criteria. Criteria for adjudication may comprise commercial-substance criteria, related-party-transaction criteria, and/or collectability criteria.
In some embodiments, a first system is provided, the first system being for classifying documents, the first system comprising one or more processors configured to cause the first system to: receive a plurality of documents; generate a plurality of feature vectors by, for each of the plurality of documents, applying one or more natural language processing techniques to generate a respective feature vector representing the document; apply one or more classification models to the plurality of feature vectors to generate output data classifying each of the feature vectors into a respective one or more of a plurality of classes; identify, based on the feature vectors, a subset of a second plurality of feature vectors that most closely matches each of the respective feature vectors of the plurality of feature vectors; and determine, based on the output data classifying each of the feature vectors and based on the identification of the subset of the second plurality of feature vectors, a plurality of adjudications for each of the plurality of feature vectors representing the documents, wherein each of the plurality of adjudications comprises an adjudication classification and an adjudication confidence score.
In some embodiments of the first system: the one or more processors are configured to apply one or more models to each of the plurality of feature vectors to compute a respective change for one or more characteristics; and wherein the determination of the plurality of adjudications is further based on the computed respective changes for the one or more characteristics.
In some embodiments of the first system, the one or more characteristics comprise one or more of the following: a risk characteristic, a timing characteristic, and an amount characteristic.
In some embodiments of the first system, determining the adjudication classification comprises determining whether the document meets commercial substance criteria.
In some embodiments, a first non-transitory computer-readable storage medium is provided, the first non-transitory computer-readable storage medium storing instructions for classifying documents, the instructions configured to be executed by one or more processors of a system to cause the system to: receive a plurality of documents; generate a plurality of feature vectors by, for each of the plurality of documents, applying one or more natural language processing techniques to generate a respective feature vector representing the document; apply one or more classification models to the plurality of feature vectors to generate output data classifying each of the feature vectors into a respective one or more of a plurality of classes; identify, based on the feature vectors, a subset of a second plurality of feature vectors that most closely matches each of the respective feature vectors of the plurality of feature vectors; and determine, based on the output data classifying each of the feature vectors and based on the identification of the subset of the second plurality of feature vectors, a plurality of adjudications for each of the plurality of feature vectors representing the documents, wherein each of the plurality of adjudications comprises an adjudication classification and an adjudication confidence score.
In some embodiments, a first method is provided, the first method being for classifying documents, wherein the first method is executed by a system comprising one or more processors, the first method comprising: receiving a plurality of documents; generating a plurality of feature vectors by, for each of the plurality of documents, applying one or more natural language processing techniques to generate a respective feature vector representing the document; applying one or more classification models to the plurality of feature vectors to generate output data classifying each of the feature vectors into a respective one or more of a plurality of classes; identifying, based on the feature vectors, a subset of a second plurality of feature vectors that most closely matches each of the respective feature vectors of the plurality of feature vectors; and determining, based on the output data classifying each of the feature vectors and based on the identification of the subset of the second plurality of feature vectors, a plurality of adjudications for each of the plurality of feature vectors representing the documents, wherein each of the plurality of adjudications comprises an adjudication classification and an adjudication confidence score.
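By way of non-limiting illustration, the steps of the first method may be sketched as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the vocabulary, the term-count featurization, the keyword rule standing in for a trained classification model, the cosine similarity measure, and the vote-based confidence score are all hypothetical choices made only for this example.

```python
import math
from collections import Counter

# Hypothetical vocabulary; a real system would derive features via NLP analysis.
VOCABULARY = ["payment", "delayed", "swap"]

def featurize(text, vocabulary=VOCABULARY):
    """Toy NLP step: term-count feature vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(vector):
    """Hypothetical rule standing in for a trained classification model."""
    payment, _delayed, swap = vector
    return "no_substance" if swap > payment else "substance"

def closest_prior_cases(vector, prior_cases, k=2):
    """Return the k prior (vector, label) cases most similar to the given vector."""
    ranked = sorted(prior_cases, key=lambda case: cosine(vector, case[0]), reverse=True)
    return ranked[:k]

def adjudicate(model_label, neighbors):
    """Majority vote over the model label and neighbor labels.

    The confidence score is the fraction of votes agreeing with the winner.
    """
    votes = [model_label] + [label for _, label in neighbors]
    winner, count = Counter(votes).most_common(1)[0]
    return winner, count / len(votes)

# Illustrative prior cases (the "second plurality of feature vectors").
prior_cases = [
    ([1.0, 1.0, 0.0], "substance"),
    ([0.0, 0.0, 2.0], "no_substance"),
    ([2.0, 0.0, 0.0], "substance"),
]

vector = featurize("Delayed payment in exchange for larger payment")
label, confidence = adjudicate(classify(vector), closest_prior_cases(vector, prior_cases))
print(label, confidence)
```

In this sketch the classification output and the closest-matching prior cases are combined by simple voting; any other combination rule could be substituted.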
In some embodiments, a second system is provided, the second system being for identifying related parties within a plurality of databases, the second system comprising one or more processors configured to cause the second system to: receive a data set indicating a first set of parties related to an entity; generate, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties; submit one or more of the parties of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and update, based on the second set of parties, the graph data structure to represent a second plurality of relationships between the entity and the second set of parties.
In some embodiments of the second system, the one or more processors are configured to apply one or more disambiguation models to the second set of parties before updating the graph data structure based on the second set of parties.
In some embodiments, a second non-transitory computer-readable storage medium is provided, the second non-transitory computer-readable storage medium storing instructions for identifying related parties within a plurality of databases, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a data set indicating a first set of parties related to an entity; generate, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties; submit one or more of the parties of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and update, based on the second set of parties, the graph data structure to represent a second plurality of relationships between the entity and the second set of parties.
In some embodiments, a second method is provided, the second method being for identifying related parties within a plurality of databases, wherein the second method is executed by a system comprising one or more processors, the second method comprising: receiving a data set indicating a first set of parties related to an entity; generating, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties; submitting one or more of the parties of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and updating, based on the second set of parties, the graph data structure to represent a second plurality of relationships between the entity and the second set of parties.
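By way of non-limiting illustration, the graph generation and update steps of the second method may be sketched as follows. The adjacency-set representation, the function names, the entity names, and the in-memory dictionaries standing in for the plurality of databases are assumptions made only for this example; no disambiguation step is shown.

```python
from collections import defaultdict

def build_graph(entity, parties):
    """Create an undirected graph linking the entity to each known related party."""
    graph = defaultdict(set)
    for party in parties:
        graph[entity].add(party)
        graph[party].add(entity)
    return graph

def expand_graph(graph, seeds, lookups):
    """Submit each seed party as a query to each source and add the results.

    Each lookup is a callable returning the parties related to a queried name.
    Newly found parties are attached to the seed that surfaced them, so they
    are connected to the original entity through that seed.
    """
    for seed in seeds:
        for lookup in lookups:
            for related in lookup(seed):
                graph[seed].add(related)
                graph[related].add(seed)
    return graph

# Hypothetical stand-ins for external databases.
registry = {"Acme Holdings": ["Acme Leasing"]}
filings = {"J. Doe": ["Doe Family Trust"]}
lookups = [
    lambda name: registry.get(name, []),
    lambda name: filings.get(name, []),
]

first_set = ["Acme Holdings", "J. Doe"]
graph = build_graph("Acme Corp", first_set)
expand_graph(graph, first_set, lookups)
print(sorted(graph["Acme Holdings"]))
```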
In some embodiments, a third system is provided, the third system being for anomaly recognition and analysis, the third system comprising one or more processors configured to cause the third system to: receive input data representing a plurality of interactions between a first entity and a plurality of respective entities; apply one or more anomaly-recognition models to generate anomaly data representing a first subset of the interactions as anomalous; identify a second subset of the interactions, wherein the second subset is a subset of the first subset, wherein identification of the second subset is based on the anomaly data and based on a data structure representing a plurality of relationships between the first entity and a set of entities related to the entity.
In some embodiments of the third system, the input data comprises transaction data.
In some embodiments of the third system, the second subset of interactions are identified as transactions for which there is an elevated risk of related-party anomalies.
In some embodiments, a third non-transitory computer-readable storage medium is provided, the third non-transitory computer-readable storage medium storing instructions for anomaly recognition and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive input data representing a plurality of interactions between a first entity and a plurality of respective entities; apply one or more anomaly-recognition models to generate anomaly data representing a first subset of the interactions as anomalous; identify a second subset of the interactions, wherein the second subset is a subset of the first subset, wherein identification of the second subset is based on the anomaly data and based on a data structure representing a plurality of relationships between the first entity and a set of entities related to the entity.
In some embodiments, a third method is provided, the third method being for anomaly recognition and analysis, wherein the third method is executed by a system comprising one or more processors, the third method comprising: receiving input data representing a plurality of interactions between a first entity and a plurality of respective entities; applying one or more anomaly-recognition models to generate anomaly data representing a first subset of the interactions as anomalous; identifying a second subset of the interactions, wherein the second subset is a subset of the first subset, wherein identification of the second subset is based on the anomaly data and based on a data structure representing a plurality of relationships between the first entity and a set of entities related to the entity.
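By way of non-limiting illustration, the two-stage selection of the third method may be sketched as follows. The median-absolute-deviation rule is a hypothetical stand-in for the one or more anomaly-recognition models, and the set of related-party names stands in for the data structure representing relationships; the field names and threshold are likewise assumptions.

```python
from statistics import median

def flag_anomalies(transactions, threshold=3.0):
    """First subset: amounts deviating from the median by more than threshold * MAD."""
    amounts = [t["amount"] for t in transactions]
    med = median(amounts)
    # Fall back to 1.0 so a zero MAD does not cause division by zero.
    mad = median([abs(a - med) for a in amounts]) or 1.0
    return [t for t in transactions if abs(t["amount"] - med) / mad > threshold]

def related_party_anomalies(anomalies, related_parties):
    """Second subset: anomalous transactions whose counterparty is a related party."""
    return [t for t in anomalies if t["counterparty"] in related_parties]

transactions = [
    {"id": 1, "counterparty": "Vendor A", "amount": 100.0},
    {"id": 2, "counterparty": "Vendor B", "amount": 105.0},
    {"id": 3, "counterparty": "Vendor C", "amount": 98.0},
    {"id": 4, "counterparty": "Vendor D", "amount": 102.0},
    {"id": 5, "counterparty": "Acme Leasing", "amount": 950.0},
    {"id": 6, "counterparty": "Unrelated Co", "amount": 900.0},
]
related_parties = {"Acme Leasing", "Acme Holdings"}

first_subset = flag_anomalies(transactions)
second_subset = related_party_anomalies(first_subset, related_parties)
print([t["id"] for t in second_subset])
```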
In some embodiments, a fourth system is provided, the fourth system being for behavioral modeling and analysis, the fourth system comprising one or more processors configured to cause the fourth system to: receive first input data comprising a data structure representing relationships amongst a plurality of entities; receive second input data representing behavior of one or more of the entities represented in the data structure; and apply one or more behavioral models to determine, based on the first input data and the second input data, a risk of related-party anomaly represented by the second input data.
In some embodiments, a fourth non-transitory computer-readable storage medium is provided, the fourth non-transitory computer-readable storage medium storing instructions for behavioral modeling and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive first input data comprising a data structure representing relationships amongst a plurality of entities; receive second input data representing behavior of one or more of the entities represented in the data structure; and apply one or more behavioral models to determine, based on the first input data and the second input data, a risk of related-party anomaly represented by the second input data.
In some embodiments, a fourth method is provided, the fourth method being for behavioral modeling and analysis, wherein the fourth method is executed by a system comprising one or more processors, the fourth method comprising: receiving first input data comprising a data structure representing relationships amongst a plurality of entities; receiving second input data representing behavior of one or more of the entities represented in the data structure; and applying one or more behavioral models to determine, based on the first input data and the second input data, a risk of related-party anomaly represented by the second input data.
In some embodiments, a fifth system is provided, the fifth system being for predicting likelihood of collection, the fifth system comprising one or more processors configured to cause the fifth system to: receive a first data set comprising endogenous information pertaining to a transaction; receive a second data set comprising exogenous information related to one or more parties to the transaction; configure a collectability uncertainty model based on the first data set and the second data set; receive a third data set comprising information regarding the transaction; and provide the information regarding the transaction to the collectability uncertainty model to generate an output indicating a likelihood of collection for the transaction.
In some embodiments of the fifth system, the endogenous information comprises one or more selected from the following: payment history information of a party to the transaction; credit assessment information conducted prior to the initiation of the transaction; and payment history information of one or more parties related to a party to the transaction.
In some embodiments of the fifth system, the exogenous information comprises one or more selected from the following: economic behavior information of an industry related to a party to the transaction; economic behavior information of a value chain of a party to the transaction; news information related to a party to the transaction, a related industry, or a related value chain; product review information; employee sentiment information; and consumer sentiment information.
In some embodiments of the fifth system, the third data set comprises information regarding a prior dispute between a plurality of entities to the transaction.
In some embodiments of the fifth system, applying the collectability uncertainty model comprises: generating an initial prediction of uncertainty based on the first data set comprising the endogenous information; and applying one or more predictive models based on the second data set comprising the exogenous information.
In some embodiments of the fifth system, the collectability uncertainty model is validated following the occurrence of a rare event and based on its predictions in response to the rare event.
In some embodiments, a fifth non-transitory computer-readable storage medium is provided, the fifth non-transitory computer-readable storage medium storing instructions for predicting likelihood of collection, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive a first data set comprising endogenous information pertaining to a transaction; receive a second data set comprising exogenous information related to one or more parties to the transaction; configure a collectability uncertainty model based on the first data set and the second data set; receive a third data set comprising information regarding the transaction; and provide the information regarding the transaction to the collectability uncertainty model to generate an output indicating a likelihood of collection for the transaction.
In some embodiments, a fifth method is provided, the fifth method being for predicting likelihood of collection, wherein the fifth method is executed by a system comprising one or more processors, the fifth method comprising: receiving a first data set comprising endogenous information pertaining to a transaction; receiving a second data set comprising exogenous information related to one or more parties to the transaction; configuring a collectability uncertainty model based on the first data set and the second data set; receiving a third data set comprising information regarding the transaction; and providing the information regarding the transaction to the collectability uncertainty model to generate an output indicating a likelihood of collection for the transaction.
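By way of non-limiting illustration, a two-stage estimate of the likelihood of collection (an initial prediction from endogenous information, then an adjustment based on exogenous information) may be sketched as follows. The smoothed on-time payment rate, the log-odds adjustment, the signal names, and the weights are all hypothetical; in practice the weights would be fitted rather than chosen by hand.

```python
import math

def initial_estimate(on_time_payments, total_payments):
    """Endogenous stage: Laplace-smoothed on-time payment rate of the counterparty."""
    return (on_time_payments + 1) / (total_payments + 2)

def adjust_with_exogenous(probability, signals, weights):
    """Exogenous stage: shift the estimate in log-odds space by weighted signals.

    Positive signal values raise the likelihood of collection; negative values
    lower it. Signals are assumed to be scaled to roughly [-1, 1].
    """
    logit = math.log(probability / (1 - probability))
    logit += sum(weights[name] * value for name, value in signals.items())
    return 1 / (1 + math.exp(-logit))

# Hypothetical inputs: 8 of 10 prior invoices paid on time, weak industry outlook.
p_initial = initial_estimate(on_time_payments=8, total_payments=10)
signals = {"industry_outlook": -1.0, "news_sentiment": -0.5}
weights = {"industry_outlook": 0.8, "news_sentiment": 0.4}

p_collect = adjust_with_exogenous(p_initial, signals, weights)
print(round(p_initial, 3), round(p_collect, 3))
```

Working in log-odds space keeps the adjusted value strictly between 0 and 1 regardless of how many exogenous signals are combined.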
In some embodiments, a sixth system is provided, the sixth system being for classifying documents, the sixth system comprising one or more processors configured to cause the sixth system to: receive data representing a document; apply one or more natural language processing techniques to the received data to generate a feature vector representing the document; identify, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector; apply a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and determine, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
In some embodiments, a sixth non-transitory computer-readable storage medium is provided, the sixth non-transitory computer-readable storage medium storing instructions for classifying documents, the instructions configured to be executed by one or more processors to cause the system to: receive data representing a document; apply one or more natural language processing techniques to the received data to generate a feature vector representing the document; identify, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector; apply a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and determine, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
In some embodiments, a sixth method is provided, the sixth method being for classifying documents, wherein the sixth method is executed by a system comprising one or more processors, the sixth method comprising: receiving data representing a document; applying one or more natural language processing techniques to the received data to generate a feature vector representing the document; identifying, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector; applying a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and determining, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
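By way of non-limiting illustration, the combination of a case-library match with computed changes in characteristics, as in the sixth method, may be sketched as follows. The cosine similarity measure, the change threshold, the labels, and the rule for deriving a confidence score from agreement between the two signals are assumptions made only for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar_case(vector, case_library):
    """Return the library case whose feature vector is closest to the given vector."""
    return max(case_library, key=lambda case: cosine(vector, case["vector"]))

def adjudicate(vector, case_library, changes, change_threshold=0.1):
    """Return (classification, confidence) for one document.

    `changes` maps characteristic names (e.g. risk, timing, amount) to computed
    changes. The classification follows the computed changes; the confidence is
    raised when the most similar prior case carries the same label and lowered
    when it does not.
    """
    case = most_similar_case(vector, case_library)
    similarity = cosine(vector, case["vector"])
    changed = any(abs(delta) > change_threshold for delta in changes.values())
    label = "substance" if changed else "no_substance"
    if label == case["label"]:
        return label, (1 + similarity) / 2
    return label, (1 - similarity) / 2

case_library = [
    {"vector": [1.0, 1.0, 0.0], "label": "substance"},
    {"vector": [0.0, 0.0, 2.0], "label": "no_substance"},
]
changes = {"risk": 0.0, "timing": 0.5, "amount": 0.2}

classification, confidence = adjudicate([2.0, 1.0, 0.0], case_library, changes)
print(classification, round(confidence, 3))
```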
In some embodiments, a seventh system is provided, the seventh system being for identifying relationships between entities represented within one or more data sets, the seventh system comprising one or more processors configured to cause the seventh system to: receive one or more data sets representing a plurality of entities; generate, based at least in part on the one or more data sets, a graph data structure representing entities amongst the plurality of entities as nodes and representing relationships between pairs of entities as edges between corresponding pairs of nodes; receive input data indicating a pair of query entities; and determine, based at least in part on the graph data structure, whether one or more related-entity criteria are satisfied for the pair of query entities.
In some embodiments, a seventh non-transitory computer-readable medium is provided, the seventh non-transitory computer-readable medium storing instructions for identifying relationships between entities represented within one or more data sets, the instructions configured to be executed by a system comprising one or more processors to cause the system to: receive one or more data sets representing a plurality of entities; generate, based at least in part on the one or more data sets, a graph data structure representing entities amongst the plurality of entities as nodes and representing relationships between pairs of entities as edges between corresponding pairs of nodes; receive input data indicating a pair of query entities; and determine, based at least in part on the graph data structure, whether one or more related-entity criteria are satisfied for the pair of query entities.
In some embodiments, a seventh method is provided, the seventh method being for identifying relationships between entities represented within one or more data sets, wherein the seventh method is executed by a system comprising one or more processors, the seventh method comprising: receiving one or more data sets representing a plurality of entities; generating, based at least in part on the one or more data sets, a graph data structure representing entities amongst the plurality of entities as nodes and representing relationships between pairs of entities as edges between corresponding pairs of nodes; receiving input data indicating a pair of query entities; and determining, based at least in part on the graph data structure, whether one or more related-entity criteria are satisfied for the pair of query entities.
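By way of non-limiting illustration, evaluating a related-entity criterion for a pair of query entities against a graph data structure may be sketched as follows. The criterion used here (connection within a maximum number of hops), the breadth-first search, and the entity names are assumptions made only for this example; other criteria, such as edge types or ownership percentages, could be used instead.

```python
from collections import deque

def graph_from_edges(edges):
    """Build an undirected adjacency map: entities are nodes, relationships are edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def hops_between(graph, start, goal):
    """Breadth-first search; returns the shortest hop count, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def are_related(graph, a, b, max_hops=2):
    """Hypothetical related-entity criterion: connected within max_hops edges."""
    hops = hops_between(graph, a, b)
    return hops is not None and hops <= max_hops

graph = graph_from_edges([
    ("Acme Corp", "Acme Holdings"),
    ("Acme Holdings", "Acme Leasing"),
    ("Acme Leasing", "Doe Family Trust"),
    ("Other Co", "Vendor X"),
])
print(are_related(graph, "Acme Corp", "Acme Leasing"))
```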
In some embodiments, any one or more of the features, characteristics, or aspects of any one or more of the above systems, methods, or non-transitory computer-readable storage media may be combined, in whole or in part, with one another and/or with any one or more of the features, characteristics, or aspects (in whole or in part) of any other embodiment or disclosure herein.
Various embodiments are described with reference to the accompanying figures, in which:
There is a need for improved methods for AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether the documents represent underlying data that meets one or more predefined or dynamically-determined criteria. In some embodiments, a set of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent an arrangement or contract that meets criteria for commercial substance. In some embodiments, a set of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent related parties, a transaction between related parties, and/or a transaction that complies with criteria/requirements regarding transactions between related parties. In some embodiments, a set of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent a transaction and/or a party that satisfy collectability criteria, including by adjudicating a likelihood of collectability.
Commercial Substance
As explained above, there is a need for improved methods for AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether the documents represent a transaction, agreement, contract, arrangement, or other interaction of commercial substance.
Improved systems meeting these needs may have application in various use cases, including in quickly and accurately assessing compliance with revenue recognition standards (e.g., IFRS 15/ASC 606) under which one or more of the criteria require that an agreement (e.g., contract, transaction, etc.) has commercial substance. A transaction, agreement, contract, arrangement, or other interaction may be said to have commercial substance when it is expected that future cash flows of an entity (e.g., a business) will change as a result of the interaction. A change in cash flows may be deemed to be present when there is a change (e.g., a change that is significant enough to meet one or more criteria) in any one or more of the following (not including tax considerations):
- Risk: Such as experiencing an increase in the risk that inbound cash flows will not occur as the result of a transaction; for example, a business accepts junior secured status on a debt in exchange for a larger repayment amount.
- Timing: Such as a change in the timing of cash inflows received as the result of a transaction; for example, a business agrees to a delayed payment in exchange for a larger amount.
- Amount: Such as a change in the amount paid as the result of a transaction; for example, a business receives cash sooner in exchange for receiving a smaller amount.
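By way of non-limiting illustration, a check for a significant change in risk, timing, or amount may be sketched as follows. The field names, the use of relative change, and the 10% significance threshold are hypothetical; the criteria for what counts as significant may be predefined or dynamically determined.

```python
from dataclasses import dataclass

@dataclass
class CashFlowProfile:
    """Expected cash-flow characteristics of an interaction (illustrative fields)."""
    default_risk: float    # estimated probability that inbound cash is not received
    days_to_payment: int   # expected timing of the cash inflow
    amount: float          # expected amount of the cash inflow

def relative_change(before, after):
    """Absolute relative change; a change from zero counts as a full (1.0) change."""
    if before:
        return abs(after - before) / abs(before)
    return float(after != before)

def significant_changes(before, after, threshold=0.10):
    """Report, per characteristic, whether the change exceeds the threshold."""
    return {
        "risk": relative_change(before.default_risk, after.default_risk) > threshold,
        "timing": relative_change(before.days_to_payment, after.days_to_payment) > threshold,
        "amount": relative_change(before.amount, after.amount) > threshold,
    }

def has_commercial_substance(before, after, threshold=0.10):
    """True when at least one of risk, timing, or amount changes significantly."""
    return any(significant_changes(before, after, threshold).values())

# Example: a business agrees to a delayed payment in exchange for a larger amount.
before = CashFlowProfile(default_risk=0.05, days_to_payment=30, amount=1000.0)
after = CashFlowProfile(default_risk=0.05, days_to_payment=90, amount=1200.0)
print(significant_changes(before, after))
```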
If monetary gains exist due to exchange transactions, the transaction may be said to have commercial substance, and if there is no change in monetary gains, the transaction may be said not to have commercial substance. In some embodiments, if commercial substance exists in a transaction, then the transaction is recorded at the fair value of an asset, whereas if commercial substance does not exist in the transaction, then the transaction is recorded at a book value of the asset.
One example of a transaction with no commercial substance is a sale of assets to the owner of a sole proprietorship, who immediately leases them back to the business. There is little distinction between a proprietorship and its owner, so it is likely that no real change of ownership occurred. Another example of a transaction with no commercial substance is the swapping of bandwidth capacity between different Internet and phone service providers. By doing so, both entities recognize revenue, when in fact no real revenue generation occurs that would result in a change in profits.
Traditional methods for evaluation of commercial substance rely on human evaluation, which introduces inaccuracy due to human error, inefficiencies, and the possibility of human-introduced biases. Furthermore, human adjudication according to known methods provides insufficient granularity (e.g., at the transaction level) and is difficult or impossible to scale (e.g., for full population testing). Thus, there is a need for systems and methods for performing automated adjudication of commercial substance based on processed documents or other data, so as to improve efficiency and accuracy and reduce human-introduced biases.
Disclosed herein are systems configured for AI-augmented adjudication of commercial substance of an interaction (e.g., an arrangement, agreement, contract, transaction, or other underlying data) represented by one or more ingested documents (or other data). As explained herein, the systems disclosed herein may apply a plurality of AI techniques—including, for example, developing feature vectors, clustering, classification, and adjudication—to enable automated determination of whether one or more commercial substance criteria are satisfied. The systems disclosed herein may perform one or more automated assessments of risk, timing, and/or amount of cash flow evidenced by the documents being analyzed in order to make a determination as to whether the documents represent an interaction of commercial substance.
As explained herein, adjudication of commercial substance by the systems disclosed herein may comprise using a feature vector to represent an interaction (e.g., a contract or transaction), wherein the feature vector may be used for clustering, classification, similarity search, and/or adjudication. As further explained herein, resulting data from multiple approaches may be integrated in order to generate an overall adjudication of commercial substance.
In some embodiments, a system for automated adjudication of commercial substance is provided. The system may be configured to receive one or more documents (e.g., PDF documents, word processing documents, JPG documents, etc.) or other data and to automatically process the received documents in order to extract information from said documents. The extracted information may be assessed to determine whether the information represents one or more interactions, such as a contract or a transaction. The system may then evaluate the extracted information regarding those one or more interactions to determine whether the one or more interactions meet predefined (or dynamically determined) commercial substance criteria.
In some embodiments, determining whether commercial substance criteria are met may be performed, at least in part, by generating and evaluating one or more feature vectors. The system may be configured to automatically generate a feature vector (which may be referred to as a “case vector”) that represents an identified interaction (e.g., an identified contract, transaction, or the like) in the received documents.
Generating the feature vector may comprise performing structural, semantic, and/or linguistic analysis of the ingested documents (e.g., contract, purchase order, etc.) using Natural Language Processing (NLP) techniques. The analysis using NLP techniques may generate an output indicating a respective scope for each of one or more sections of a document being assessed; it may also generate an output indicating a respective topic for each of one or more sections of the document being assessed.
In some embodiments, the feature vector may be configured to represent content, scope, party identity, timing, amounts, locations, terms, or the like, as extracted from one or more documents. In some embodiments, the feature vector may be configured to represent the “what,” “when,” and “how” of a contract, transaction, or the like. In some embodiments, one or more fields in a feature vector may represent (or may be associated with) a confidence value indicating a level of confidence for the feature vector field. In some embodiments, a feature vector may represent information indicating an obligation for one party to the other party in a contract. In some embodiments, a feature vector may represent information indicating consideration in a contract. In some embodiments, a feature vector may represent information indicating whether consideration comprises in-kind exchanges. In some embodiments, a feature vector may represent information indicating when an obligation and consideration will be fulfilled (e.g., a point in time, a window of time, and/or a schedule of various points/windows of time). In some embodiments, a feature vector may represent information indicating how consideration will be provided from one side to the other side in a contract. In some embodiments, a feature vector may represent information indicating entity names or identities of parties to the interaction. In some embodiments, a feature vector may represent information indicating duration (e.g., duration of an agreement or contract).
In some embodiments, the system may generate or augment a feature vector based in part on contextual data and/or metadata that is available to the system via one or more sources separate from the document or documents being analyzed. For example, the system may leverage metadata from a financial system and/or data from a contract management system to generate and/or augment a feature vector.
In some embodiments, the system may generate a feature vector using feature engineering comprising one or more of the following:
- entity name of one or more parties to an interaction, potentially normalized with respect to a data set of entity names
- duration of an agreement
- characteristics of transfer of title, obligations, transaction pricing, considerations, and/or whether it is an exchange
- payment terms
- shipping terms
- additional data (e.g., metadata) from one or more other data sources (e.g., from a financial system and/or contract management system), such as valuation information (e.g., initial value, depreciation, fair value)
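By way of non-limiting illustration only, the engineered features listed above could be held in a simple container that pairs each field with a confidence value. The names `CaseVector`, `FeatureField`, `normalize_entity_name`, and the `SUFFIXES` set are hypothetical and are not taken from this disclosure; the name-normalization rule shown (case folding and stripping common corporate suffixes) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical set of corporate suffixes ignored when comparing entity names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize_entity_name(raw: str, known: List[str]) -> str:
    """Map a raw party name onto a known entity name when they match after
    lowercasing and dropping punctuation and corporate suffixes."""
    def key(name: str) -> str:
        tokens = [t.strip(".,").lower() for t in name.split()]
        return " ".join(t for t in tokens if t and t not in SUFFIXES)

    lookup = {key(k): k for k in known}
    return lookup.get(key(raw), raw.strip())


@dataclass
class FeatureField:
    value: Any
    confidence: float  # level of confidence for this field, 0.0 to 1.0


@dataclass
class CaseVector:
    fields: Dict[str, FeatureField] = field(default_factory=dict)

    def set_field(self, name: str, value: Any, confidence: float) -> None:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.fields[name] = FeatureField(value, confidence)

    def low_confidence(self, cutoff: float = 0.5) -> List[str]:
        """Names of fields whose confidence falls below the cutoff."""
        return sorted(n for n, f in self.fields.items() if f.confidence < cutoff)
```

A caller might populate fields such as `duration_days`, `payment_terms`, or `shipping_terms` and route any low-confidence fields to human review.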
In some embodiments, the system may generate a feature vector using document embedding (see Dai, A. M., Olah, C. and Le, Q. V., 2015. Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998) and/or autoencoder (see, e.g., Li, J., Luong, M. T. and Jurafsky, D., 2015. A hierarchical neural autoencoder for paragraphs and documents. arXiv preprint arXiv:1506.01057).
Following creation of a feature vector representing an interaction (e.g., representing a transaction or contract), the system may use the feature vector to cluster and/or classify the interaction represented by the feature vector. In some embodiments, clustering may be performed such that feature vectors within the same cluster as one another are more similar to one another than they are to feature vectors belonging to other clusters. In some embodiments, the system may apply one or more classification models (e.g., machine-learning and/or AI classification models) to the feature vector. In some embodiments, a classification model may be configured to classify a feature vector as either representing (a) an interaction having commercial substance or (b) an interaction not having commercial substance. In some embodiments, the classification model may be configured to classify a feature vector into one or more of any suitable number of classifications. In some embodiments, a classification model may be configured to assign a respective confidence value to a classification of a feature vector. In some embodiments, the classification model may be a machine-learning model trained using training data based on previous interactions (e.g., previous contracts) that have been determined to be with or without commercial substance.
In some embodiments, clustering analysis may be applied such that interactions with similar feature vectors may be adjudicated simultaneously to optimize computational speed for reasoning and adjudication. In some embodiments, the clustering analysis may comprise applying unsupervised clustering such as K-means, which may enable similar documents to be clustered together. In some embodiments, the clustering analysis may comprise applying hierarchical clustering, which may reduce the number of dimensions through methods such as singular value decomposition. (See, e.g., Castelli, V., Thomasian, A. and Li, C. S., 2003. CSVD: Clustering and singular value decomposition for approximate similarity search in high-dimensional spaces. IEEE Transactions on knowledge and data engineering, 15(3), pp. 671-685.)
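As one hedged sketch of the K-means option mentioned above, the following pure-Python routine alternates an assignment step and a centroid-update step over numeric feature vectors. The function name `k_means`, the fixed iteration count, and the seeded random initialization are illustrative assumptions; a production system would more likely rely on an established library.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors: Sequence[Vector], k: int, iterations: int = 20,
            seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Return (cluster label per vector, centroids) using Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(list(vectors), k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Vectors that receive the same label could then be adjudicated together, as described above.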
In some embodiments, classification analysis may comprise classifying an interaction based on the feature vector (regardless, in some embodiments, of whether the feature vector is constructed based on feature engineering or through the use of document embedding and/or autoencoder techniques). In some embodiments the classification analysis may comprise use of one or more supervised machine learning models (e.g., SVM) and/or deep learning models (see, e.g., DistilBERT (Sanh, V., Debut, L., Chaumond, J. and Wolf, T., 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.); MT-DNN (Liu, X., He, P., Chen, W. and Gao, J., 2019. Multi-task deep neural networks for natural language understanding. arXiv preprint arXiv:1901.11504)). In some embodiments, the classification analysis may be applied to classify feature vectors into a class indicating commercial substance or a class indicating lack of commercial substance. In some embodiments, the classification analysis may comprise generating one or more confidence levels associated with classification of a feature vector.
In some embodiments, the system may be configured to identify, from a set of feature vectors representing a plurality of other interactions/contracts/transactions, a subset of feature vectors that are the most similar to the target feature vector for adjudication. For example, a respective similarity score between the target feature vector and each respective feature vector in the set of other feature vectors may be generated, and a subset of top feature vectors with the highest similarity scores may be selected. In some embodiments, the subset of all feature vectors with a similarity score over a predetermined or dynamically determined threshold similarity score value may be selected. In some embodiments, the subset of feature vectors with the top k similarity scores may be selected, where the value k may be determined in accordance with system settings, user input, and/or dynamic determination based on the processed documents or other information available to the system.
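The two selection rules described above (a similarity threshold and a top-k cutoff) can be sketched as follows. The function name `select_similar` and the use of string case identifiers are assumptions made for illustration; the similarity scores are taken as already computed, with higher values meaning more similar.

```python
from typing import Dict, List, Optional, Tuple


def select_similar(scores: Dict[str, float],
                   k: Optional[int] = None,
                   threshold: Optional[float] = None) -> List[Tuple[str, float]]:
    """Rank prior cases by similarity score and keep a subset.

    If threshold is given, keep only scores strictly over it.
    If k is given, keep at most the k highest-scoring cases.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(case_id, s) for case_id, s in ranked if s > threshold]
    if k is not None:
        ranked = ranked[:k]
    return ranked
```

For example, `select_similar(scores, k=5)` applies only the top-k rule, while passing both arguments applies the threshold first and then truncates to k.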
In some embodiments, assessing similarity for the purpose of selecting a subset of similar feature vectors may comprise determining similarity based on any one or more of the following similarity metrics: L0, L1, L2, or Linfinity. L2 may also be referred to as Euclidean distance. In some embodiments, weighting for one or more parts of the feature vectors may be introduced in determining similarity between feature vectors. Weights may in some embodiments be learned through iterative refinement.
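A minimal sketch of the four metrics named above, with optional per-component weights, is given below. The function name `weighted_distance` is hypothetical, and applying each weight to the absolute component difference before aggregation is one possible weighting convention among several; the disclosure does not fix a particular one.

```python
from typing import Optional, Sequence


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      metric: str = "l2",
                      weights: Optional[Sequence[float]] = None) -> float:
    """Distance between two feature vectors under L0, L1, L2, or L-infinity."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    w = list(weights) if weights is not None else [1.0] * len(a)
    if len(w) != len(a):
        raise ValueError("weights must match vector length")
    # Weighted absolute difference per component (assumed convention).
    diffs = [wi * abs(x - y) for wi, x, y in zip(w, a, b)]
    if metric == "l0":    # number of components that differ
        return float(sum(1 for d in diffs if d != 0))
    if metric == "l1":    # sum of absolute differences
        return float(sum(diffs))
    if metric == "l2":    # Euclidean distance
        return sum(d * d for d in diffs) ** 0.5
    if metric == "linf":  # largest single-component difference
        return float(max(diffs, default=0.0))
    raise ValueError(f"unknown metric: {metric}")
```

Setting a weight to zero removes that component from the comparison, which is one simple way to de-emphasize parts of the feature vector.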
In some embodiments, the function $F=\sum_{i=1}^{k} c_i X_i$ may be used, where: $X_i=1$ (for $i=1,\ldots,k$) if the interaction is considered to have commercial substance; $X_i=-1$ (for $i=1,\ldots,k$) if the interaction is considered to not have commercial substance; and $c_i$ is the metric for the similarity measure of the feature vector for the interaction. The interaction may be considered to have commercial substance if the metric $F$ for the interaction is higher than a threshold, where the threshold is between −1 and 1.
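The weighted vote F described above can be sketched as follows. This sketch reads each index i as one of the k most-similar prior interactions, and it normalizes the similarity weights so that they sum to 1, which keeps F within the stated threshold range of −1 to 1; both the reading and the normalization are assumptions, as is the function name `similarity_vote`.

```python
from typing import Sequence, Tuple


def similarity_vote(neighbors: Sequence[Tuple[float, bool]],
                    threshold: float = 0.0) -> Tuple[float, bool]:
    """Compute F = sum(c_i * X_i) over the k most-similar prior interactions.

    Each neighbor is (similarity c_i, has_commercial_substance). X_i is +1
    when the prior interaction had commercial substance and -1 otherwise.
    Similarities are assumed non-negative and are normalized to sum to 1.
    """
    total = sum(c for c, _ in neighbors)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    f = sum((c / total) * (1 if has_cs else -1) for c, has_cs in neighbors)
    return f, f > threshold
```

With neighbors `[(0.5, True), (0.25, True), (0.25, False)]` the vote is 0.5, which exceeds a threshold of 0 and so weighs toward commercial substance.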
In some embodiments, the system may be configured to apply AI-augmented reasoning to determine a respective change for one or more characteristics, wherein the determination may be based on the feature vector and/or based on additional information (e.g., enterprise resource planning (ERP) data from one or more financial systems) that is available to the system and determined to be pertinent to the interaction. In some embodiments, the characteristics assessed for change via the interaction (e.g., under a contract) may comprise one or more of the following:
- a. Risk change: an assessment of the risk of inbound cash flow before and after the interaction;
- b. Timing change: an assessment of the timing of cash inflow before and after the interaction; and
- c. Amount change: an assessment of cash inflow amount before and after the interaction.
In some embodiments, assessed change to each of the one or more characteristics may be quantified in terms of any suitable unit or any suitable score. In some embodiments, assessed change to each of the one or more characteristics may be classified as significant or as insignificant in accordance with whether the assessed change exceeds a threshold.
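One way to picture the threshold test described above is the sketch below, which compares a before and after cash-flow profile and flags each change as significant or insignificant. The `CashFlowProfile` fields, their units, and the per-characteristic thresholds are hypothetical; the disclosure leaves the units and scores open.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CashFlowProfile:
    risk: float            # hypothetical risk score for inbound cash flow
    days_to_inflow: float  # hypothetical timing measure, in days
    amount: float          # cash inflow amount


def assess_changes(before: CashFlowProfile, after: CashFlowProfile,
                   thresholds: Dict[str, float]) -> Dict[str, Tuple[float, bool]]:
    """Return {characteristic: (change, is_significant)}.

    A change is treated as significant when its magnitude exceeds the
    threshold configured for that characteristic.
    """
    deltas = {
        "risk": after.risk - before.risk,
        "timing": after.days_to_inflow - before.days_to_inflow,
        "amount": after.amount - before.amount,
    }
    return {name: (delta, abs(delta) > thresholds[name])
            for name, delta in deltas.items()}
```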
In some embodiments, assessment of change to each of the one or more characteristics may be conducted entirely automatically by a reasoning engine, or may be conducted with human augmentation based on user input.
The inputs considered by the system in assessing change to each of the one or more characteristics may comprise the feature vector and an assessed impact on cash flow as a result of the interaction represented by the feature vector. In some embodiments, the assessment may be performed in part based on the following:
- Risk: adjudicating risk to the cash flow may be based on obligations of the interaction, for example obligations represented in a contract document being analyzed. As an example, accepting junior debt as opposed to senior debt may introduce higher risk, because junior debt is given lower priority for repayment in liquidation during a bankruptcy proceeding.
- Timing: whether there will be a change in cash inflow as a result of the timing of execution may be assessed in terms of obligations. Delay of payment in exchange for larger payment may be considered as constituting commercial substance.
- Amount: cash inflow amount before and after the interaction (e.g., before and after execution of a contract or other agreement) may be considered. As an example, a sales order with zero amount (such as for sending samples to potential customers) may have no impact on cash flow and hence may not be deemed to constitute commercial substance.
After performing one or more of the preceding three analyses—(a) the clustering/classification of the feature vector, (b) the subset of feature vectors that are the most similar to the target feature vector for adjudication, and (c) the changes for one or more characteristics—the system may then render an overall adjudication as to whether or not the interaction represented by the feature vector satisfies one or more commercial substance criteria. The adjudication may be based on any one or more of the preceding analyses. In some embodiments, the overall determination as to whether commercial substance criteria are satisfied may be made based on calculating a commercial substance score based on the preceding analyses, and determining whether the commercial substance score satisfies a commercial substance score threshold.
Rendering the overall adjudication as to whether or not the interaction represented by the feature vector satisfies one or more commercial substance criteria may comprise generating an indication as to whether or not the one or more commercial substance criteria are satisfied. Rendering the adjudication may also comprise generating a confidence score indicating a level of confidence in all or part of the adjudication (e.g., with respect to the overall adjudication, with respect to one or more of the underlying preceding analyses, and/or with respect to a specific commercial substance criterion from amongst a plurality of commercial substance criteria).
In some embodiments, the overall adjudication may be based in part on the clustering/classification of the feature vector described above. A clustering/classification that classifies the feature vector as one representing commercial substance or that clusters the feature vector with other feature vectors representing commercial substance may weigh in favor of an overall adjudication in favor of commercial substance.
In some embodiments, the overall adjudication may be based in part on the identification of the subset of feature vectors that are the most similar to the target feature vector described above. Identification of one or more similar feature vectors in the identified subset that themselves are associated with interactions of commercial substance may weigh in favor of an overall adjudication in favor of commercial substance.
In some embodiments, the overall adjudication may be based in part on the assessed changes for one or more characteristics. More significant assessed changes for one or more of the characteristics may weigh in favor of an overall adjudication in favor of commercial substance.
In some embodiments, the preceding analyses ((a) the clustering/classification of the feature vector, (b) the subset of feature vectors that are the most similar to the target feature vector for adjudication, and (c) the changes for one or more characteristics) may be assessed independently or in combination with one another to determine the overall adjudication. In some embodiments, a score for one or more of the preceding analyses may be calculated. In some embodiments, if any one of the preceding analyses satisfies overall adjudication criteria (e.g., a threshold), then the overall adjudication may be rendered in favor of a finding of commercial substance. In some embodiments, if any one of the preceding analyses fails to satisfy overall adjudication criteria (e.g., a threshold), then the overall adjudication may be rendered against a finding of commercial substance. In some embodiments, the scores or assessments of the preceding analyses may be combined with one another and assessed collectively, for example by being used (weighted or unweighted) to compute an overall adjudication score as a sum or as a product, and the overall adjudication score may be compared to a threshold to determine the overall adjudication.
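The combination strategies described above can be illustrated with the following sketch. The function name `overall_adjudication`, the mode labels, and the assumption that each analysis yields a single numeric score compared with one shared threshold are all illustrative choices rather than features of any particular embodiment.

```python
from typing import Dict, Optional


def overall_adjudication(scores: Dict[str, float], threshold: float,
                         mode: str = "weighted_sum",
                         weights: Optional[Dict[str, float]] = None) -> bool:
    """Combine per-analysis scores into a single yes/no adjudication.

    mode "any":          favorable if any one analysis meets the threshold
    mode "all":          unfavorable if any one analysis fails the threshold
    mode "weighted_sum": compare the (weighted) sum of scores to the threshold
    mode "product":      compare the product of (weighted) scores to the threshold
    """
    if not scores:
        raise ValueError("at least one analysis score is required")
    w = weights or {name: 1.0 for name in scores}
    if mode == "any":
        return any(s >= threshold for s in scores.values())
    if mode == "all":
        return all(s >= threshold for s in scores.values())
    if mode == "weighted_sum":
        return sum(w[n] * s for n, s in scores.items()) >= threshold
    if mode == "product":
        product = 1.0
        for n, s in scores.items():
            product *= w[n] * s
        return product >= threshold
    raise ValueError(f"unknown mode: {mode}")
```

Because a product of scores between 0 and 1 shrinks quickly, a threshold suitable for the sum mode would generally not be suitable for the product mode.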
As shown in
The input documents may be processed via one or more document understanding operations in order to extract information from the input documents.
In some embodiments, processing the documents via one or more document understanding operations may comprise performing structural, semantic, and/or linguistic analysis of the documents. Structural analysis of the one or more documents may enable the identification of the one or more sections of one or more of the documents. For example, a contract may contain one or more of the following sections, any one or more of which may be identified by the system:
- Preamble, Recital, Words of Agreement
- Definitions
- Action Section (Consideration)
- Reps & Warranties
- Covenants & Rights
- Conditions to Obligations
- Endgame Provisions & Remedies
- General Provisions
- Signatures
In some embodiments, system 100 may be configured in accordance with the assumption that it is often in the “action section(s)” of a contract that obligations and considerations are set forth. The action section may contain the exchange of promises that is the subject matter of the agreement. It may specifically identify the value to be exchanged between the parties. For example, it may identify goods or services to be provided to the other party. It may indicate the total amount or unit rate of currency exchanged in the transaction. This section may set the stage for other contractual terms that support this exchange.
Duties of the parties may comprise:
- Rights of each party
- Relevant dates
- Relevant prices or other dollar amounts
- Relevant quantities
- Payment terms
- Lump sum, COD, installments
- Payment due dates
- Taxes
- Interest
- Late fees
In some embodiments, performing semantic analysis comprises leveraging topic modeling in natural language processing (NLP) so that the intention of one or more sections, subsections, and/or paragraphs of a document is correctly identified. Linguistic analysis may classify sentences based on modality as either epistemic or deontic. Obligations in contracts are usually expressed in deontic modality.
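To make the epistemic/deontic distinction concrete, the following is a deliberately naive keyword heuristic, not a trained model. The marker lists, the `classify_modality` name, and the "unknown" fallback are assumptions introduced for illustration; a practical system would likely use a learned classifier.

```python
import re

# Hypothetical marker lists; real contracts would need far broader coverage.
DEONTIC = re.compile(
    r"\b(shall|must|agrees? to|is (?:required|obligated) to|undertakes? to)\b",
    re.IGNORECASE,
)
EPISTEMIC = re.compile(
    r"\b(might|could|probably|likely|appears? to|is believed)\b",
    re.IGNORECASE,
)


def classify_modality(sentence: str) -> str:
    """Label a sentence "deontic" (obligation), "epistemic" (belief or
    possibility), or "unknown" when no marker is found."""
    if DEONTIC.search(sentence):
        return "deontic"
    if EPISTEMIC.search(sentence):
        return "epistemic"
    return "unknown"
```

Sentences labeled deontic would be the natural candidates for obligation extraction.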
Natural language processing techniques may be used to identify one or more of the following within a contract, amendment, or other document(s) being analyzed:
- Title transfer: may be classified based on an action section describing how and at what point title to goods is transferred (in conjunction with the delivery term).
- Obligation for one party to the other party in the contract: may be classified through the action section.
- Pricing of transaction: including both point in time and over time.
- Consideration in return: may be classified based on the action sections and may comprise fixed and variable considerations (e.g., consideration involving some form of discount).
- One aspect that may be determined is whether the transaction involves in-kind exchanges as part of the consideration. An in-kind exchange could be an exchange of equipment in a manufacturing setting, or retention of a portion of raw material as payment in the oil and gas industry.
- Timing: when the obligation and consideration are to be fulfilled. This facilitates classification of the contract as point in time or over time.
- Fulfillment of the contract: for determining the payment term (e.g., “net 30” may indicate the payment is due 30 days after receiving the invoice) and shipping term (e.g., “EXW” may indicate transfer of title occurs at the point of origin).
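The payment-term and shipping-term examples in the list above ("net 30" and "EXW") can be sketched with simple pattern matching. The function names and the `SHIPPING_TERMS` table are hypothetical; the table contains only the single example given in the text, and any further entries would need to be supplied by the implementer.

```python
import re
from typing import Dict, Optional, Tuple

NET_TERM = re.compile(r"\bnet\s+(\d+)\b", re.IGNORECASE)

# Only the example from the text; extend with other terms as needed.
SHIPPING_TERMS: Dict[str, str] = {"EXW": "point of origin"}


def parse_payment_days(text: str) -> Optional[int]:
    """Return the number of days in a "net N" payment term, if present."""
    match = NET_TERM.search(text)
    return int(match.group(1)) if match else None


def parse_shipping_term(text: str,
                        known: Dict[str, str] = SHIPPING_TERMS
                        ) -> Optional[Tuple[str, str]]:
    """Return (term, where title may transfer) for the first known
    three-letter uppercase shipping term found in the text."""
    for token in re.findall(r"\b[A-Z]{3}\b", text):
        if token in known:
            return token, known[token]
    return None
```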
As shown in
First, the feature vector may be processed via a vector based clustering and classification operation, which may cluster and/or classify the target feature vector, including with reference to one or more other feature vectors available via a case library.
Second, the feature vector may be subject to a similarity search operation (e.g., a selection of a subset of other feature vectors that are most similar to the target feature vector). This assessment may be made with reference to one or more other feature vectors available via a case library.
Third, the feature vector may be processed via an adjudication engine for assessing changes to risk, timing, and amount. This processing may be based on the target feature vector itself and on other information such as ERP data.
All three (or any one or more of the three) of these underlying analyses may then be used to generate an overall adjudication as to whether the contract satisfies commercial substance criteria, for example as described above.
There is a need for improved methods for AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether the documents represent underlying data that meets one or more predefined or dynamically-determined criteria. In some embodiments, a set of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent an arrangement or contract that meets criteria for commercial substance. In some embodiments, a set of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent related parties, a transaction between related parties, and/or a transaction that complies with criteria/requirements regarding transactions between related parties. In some embodiments, a set of documents (and/or other data) may be automatically ingested and evaluated to adjudicate whether the documents/data represent a transaction and/or a party that satisfy collectability criteria, including by adjudicating a likelihood of collectability.
As shown in
System 100 may comprise document understanding module 102, which may comprise any one or more processors configured to perform one or more document processing operations on the contract data and/or amendment data provided by data source 101. The document processing operations performed by module 102 may include information extraction and/or structural classification that recognizes and classifies different sections of the document. Document understanding module 102 may generate data representing information extracted from the received contract data and/or amendment data. Document understanding module 102 may generate data representing recognized document/amendment sections/structure along with associated metadata that classifies or characterizes the recognized sections/structure.
Downstream of document understanding module 102, system 100 may comprise a plurality of adjudication modules configured to receive the output data generated by document understanding module 102 (optionally along with the contracts/amendments data received from data source 101) and to process said received data to generate classification data and/or adjudication data. In the example shown, system 100 comprises transfer-of-title classification and adjudication module 103, obligation classification and adjudication module 104, and transaction price classification and adjudication module 105.
In some embodiments, one or more of the adjudication modules that generate data for inclusion in a feature vector may leverage active logic and/or passive logic. For example, a transfer-of-title adjudication module may apply an active logic by generating hypotheticals and evaluating evidence to determine whether said hypotheticals can be verified, while a transfer-of-title adjudication module may apply a passive logic by analyzing document data to identify sections (e.g., paragraphs) that include data indicating how title is or is not transferred.
Transfer-of-title classification and adjudication module 103 may comprise any one or more processors configured to perform one or more data processing operations for classification and/or adjudication for transfer of title. (Any data processing operation referenced herein may include application of one or more models trained by machine-learning.) Module 103 may receive the output data generated by module 102 and may process the received data to generate output data representing a transfer-of-title classification and/or a transfer-of-title adjudication.
Obligation classification and adjudication module 104 may comprise any one or more processors configured to perform one or more data processing operations for classification and/or adjudication for obligations. Module 104 may receive the output data generated by module 102 and may process the received data to generate output data representing an obligation classification and/or an obligation adjudication.
Transaction price classification and adjudication module 105 may comprise any one or more processors configured to perform one or more data processing operations for classification and/or adjudication for transaction price. Module 105 may receive the output data generated by module 102 and may process the received data to generate output data representing a transaction price classification and/or a transaction price adjudication.
The output data generated by one or more of transfer-of-title classification and adjudication module 103, obligation classification and adjudication module 104, and transaction price classification and adjudication module 105 may be used to create a feature vector 106. Feature vector 106 may include an indication of a classification and/or adjudication for each of the upstream modules, and may optionally include a confidence level associated with one or more of the included classifications and/or adjudications.
System 100 may comprise risk/timing/amount adjudication module 107, which may comprise any one or more processors configured to perform one or more data processing operations for adjudication of risk, timing, and/or amount. Module 107 may receive feature vector 106 and may process feature vector 106 to generate output data representing an adjudication of risk, timing, and/or amount for the contract data and/or amendment data originally received from data source 101. Output data generated by module 107 may be provided to overall adjudication module 112, described in further detail below.
In some embodiments, the data processing operations performed by module 107 may be further based on ERP data received from ERP data source 108.
ERP data source 108 may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. ERP data source 108 may be communicatively coupled to one or more other components of system 100 and configured to provide ERP data to system 100, such that the ERP data can be used in generating data representing one or more adjudications regarding contract data and/or amendment data received from data source 101. In some embodiments, one or more components of system 100 may receive ERP data from ERP data source 108 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. ERP data received from ERP data source 108 may be provided in any suitable electronic data format.
System 100 may comprise similarity search module 109, which may comprise any one or more processors configured to perform one or more similarity search operations. A similarity search operation may use an input feature vector to compute a distance between the feature vector and one or more (e.g., a collection of) other feature vectors, for example feature vectors that are in a case library and that characterize respective cases. The distance calculated may, for example, be a Euclidean distance. A similarity search operation may comprise ranking distances from the smallest distance (most similar) to the greatest distance (least similar).
Module 109 may receive feature vector 106 and may receive one or more additional feature vectors (e.g., feature vectors representing other source documents and generated in a same or similar manner as feature vector 106) from case library 110 (described in further detail below). Similarity search module 109 may compare feature vector 106 to the one or more feature vectors received from case library 110 in order to generate output data representing a similarity between feature vector 106 and the one or more feature vectors received from case library 110. Comparing feature vectors may comprise computing a distance (e.g., a weighted distance) between two feature vectors being compared. The output data may comprise a similarity score (e.g., including a distance metric such as a Euclidean distance) and/or an indication of one or more parameters that are similar or that are different between the compared vectors. In some embodiments, the system may search for the most-similar feature vector (or feature vectors) in case library 110 and may then look up stored adjudication results for the identified most-similar case(s). The adjudication results for the identified most-similar case(s) may be included in the output data generated by similarity search module 109, for example such that the current case may be adjudicated by the system in a similar manner. Output data generated by module 109 may be provided to overall adjudication module 112, described in further detail below.
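The search-then-look-up behavior described for module 109 might be sketched as below. The in-memory dictionary standing in for case library 110, the `rank_cases` name, and the boolean form of the stored adjudication are assumptions for illustration; the sketch uses unweighted Euclidean distance via the standard library's `math.dist`.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical case library: case id -> (feature vector, stored adjudication).
CaseLibrary = Dict[str, Tuple[Sequence[float], bool]]


def rank_cases(target: Sequence[float],
               case_library: CaseLibrary) -> List[Tuple[str, float, bool]]:
    """Rank prior cases from smallest distance (most similar) to greatest,
    carrying each case's stored adjudication result alongside its distance."""
    ranked = [
        (case_id, math.dist(target, vector), adjudication)
        for case_id, (vector, adjudication) in case_library.items()
    ]
    ranked.sort(key=lambda row: row[1])
    return ranked
```

The first entry of the returned list identifies the most-similar prior case, whose stored result could then be passed along to the overall adjudication step.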
Case library 110 may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Case library 110 may be communicatively coupled to one or more other components of system 100 and configured to provide data regarding previously-assessed and/or previously-adjudicated data to system 100, such that the system may leverage the data regarding previous assessments/adjudications in rendering present assessments/adjudications. In some embodiments, case library 110 may store feature vectors representing previously-ingested contract data and/or amendment data that may have been generated in a same or similar manner as feature vector 106. In some embodiments, one or more components of system 100 may receive data from case library 110 on a scheduled basis, in response to a user input, in response to one or more trigger conditions being met, and/or in response to the data being manually sent. Data received from case library 110 may be provided in any suitable electronic data format.
System 100 may comprise case-vector clustering and classification module 111, which may comprise any one or more processors configured to perform one or more case vector clustering and/or classification data processing operations. Module 111 may receive feature vector 106 as input data and may receive one or more feature vector clusters from case library 110 as input data. Module 111 may process feature vector 106 and the received one or more feature vector clusters to generate similarity metrics and/or an indication of the most-similar identified feature-vector clusters, similar to the manner described above with respect to module 109. Module 111 may look up adjudication results for one or more feature vectors in the identified most-similar cluster(s), and said results may be included in the output data generated by module 111, for example such that the current case may be adjudicated by the system in a similar manner. In some embodiments, the output data generated by module 111 may be provided to overall adjudication module 112, described in further detail below.
In some embodiments, module 111 may additionally perform clustering based on feature vector 106, and may store feature vector 106 as part of one or more clusters in case library 110.
System 100 may comprise overall adjudication module 112, which may comprise any one or more processors configured to perform one or more overall adjudication data processing operations. Module 112 may receive output data generated from one or more of the following: risk/timing/amount adjudication module 107, similarity search module 109, and/or case-vector clustering and classification module 111. Module 112 may process the received data to generate output data comprising an overall adjudication for the contract data and/or amendment data originally received from data source 101. The overall adjudication output data generated by module 112 may include a binary indication of an adjudication (e.g., whether one or more criteria are met, such as to whether the data received meets one or more commercial-substance criteria, one or more related-party criteria, and/or one or more collectability criteria). In some embodiments, the overall adjudication output data may comprise a tuple (L, C) where L represents data indicating a likelihood that one or more criteria (e.g., commercial substance) are met and where C represents data indicating a confidence level for the adjudication. The output data generated by module 112 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
As shown, contract data 113, which may be data received from data source 101 in
In some embodiments, module 114 may perform one or more document structure and layout analysis data processing operations including, for example, segmentation of a document into a plurality of different regions based on the layout of the document, and including, for example, classification of one or more of the regions into section classes such as title, section title, paragraphs, bullet list, number list, figures, tables, etc. Machine learning and deep learning techniques may be leveraged for this purpose.
Structural classification module 114 may generate section data 115, which may form all or part of the input for module 103. The input data received by module 103 may be provided in any suitable structured, partially structured, and/or unstructured format.
Module 103 may comprise sentence classification module 116, which may comprise any one or more processors configured to perform one or more sentence classification data processing operations. In some embodiments, sentence classification module 116 may classify, for example using machine learning, each sentence in a document according to whether it is likely to include discussion of certain topics, such as transfer of title. Module 116 may receive section data 115 as input data, and/or may receive contract data 113 as input data, and may process the received data to generate sentence data 117 representing sentence classification information, for example including data indicating what sentences of the contract data 113 include title-transfer language. Sentence data 117 may, for example, comprise an indication of a sentence that is determined to (or is determined to not) include title-transfer language. Sentence data 117 may comprise an indication of a type of title-transfer language that is included in the identified sentence(s). Sentence data 117 may comprise an indication of a confidence level indicating a confidence of one or more of the determinations indicated in sentence data 117.
Module 103 may comprise transfer-of-title classification module 118, which may comprise any one or more processors configured to perform one or more transfer-of-title classification operations. Module 118 may receive sentence data 117, section data 115, and/or contract data 113 as input data, and may process the received data to generate transfer-of-title classification data 118(a). The transfer-of-title classification data 118(a) may indicate a classification for the received data (and/or for the underlying contract data and/or amendment data received from data source 101) indicating a transfer-of-title classification, for example by indicating whether the data represents a full transfer of title, a partial transfer of title, or no transfer of title. The transfer-of-title classification data may also include an indication of a confidence level indicating a confidence of one or more of the classifications. The transfer-of-title classification data 118(a) may constitute, or may be comprised in, the output data representing a transfer-of-title classification and/or a transfer-of-title adjudication as generated by module 103 described above with reference to
Module 103 may be communicatively coupled to sample clause database 119, which may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Sample clause database 119 may be communicatively coupled to transfer-of-title classification module 118 and configured to receive classification information pertaining to documents, amendments, document sections, and/or sentences from transfer-of-title classification module 118. The clauses themselves may be stored in database 119 in association with classification results pertaining to said clauses. Transfer-of-title classification module 118 may be configured to receive data stored in sample clause database 119 and to use said received data to generate output data, for example by comparing sample clauses received from database 119 to clauses being analyzed.
Module 103 may be communicatively coupled to continuous learning module 120, which may comprise any one or more processors configured to perform one or more machine-learning operations based on clause data and/or clause classification data stored in sample clause database 119. Continuous learning module 120 may be used to train one or more data processing operations applied by module 118 in order to improve performance of module 118.
Module 104 may receive input data comprising contract data 113, which may be data received from data source 101 in
Module 104 may comprise linguistic modality classification module 121, which may comprise any one or more processors configured to perform one or more linguistic modality data processing operations. Module 121 may receive contract data 113 and may process said received data to generate output data comprising an indication of one or more linguistic modalities, for example epistemic output data 123 and/or deontic output data 124. Output data 123 and/or 124 may include an indication of a linguistic modality and/or an associated confidence level.
Module 104 may comprise structural classification module 122, which may share any one or more characteristics in common with structural classification module 114 described above with reference to
Module 104 may comprise promiser/promisee/beneficiary classifier module 125, which may comprise any one or more processors configured to perform one or more promiser/promisee/beneficiary classifier data processing operations. Module 125 may receive linguistic modality classification data (e.g., data 123 and/or 124), structural classification data (e.g., as generated by module 122), and/or contract data 113, and may process said received data to generate output data that classifies the received data according to whether said input data relates to a promiser, a promisee, and/or a beneficiary. In some embodiments, module 125 may receive deontic data 124 as input, and may not receive epistemic data 123 as input.
Module 104 may comprise obligation classifier module 126, which may comprise any one or more processors configured to perform one or more obligation classifier data processing operations. Module 126 may receive linguistic modality classification data (e.g., data 123 and/or 124), structural classification data (e.g., as generated by module 122), and/or contract data 113, and may process said received data to generate output data that classifies the received data according to whether said input data relates to an obligation, a permission, and/or a prohibition. For example, module 126 may generate obligations output data 127 (identifying portions of the contracts represented by contract data 113 that relate to an obligation, and/or associated confidence level), permissions output data 128 (identifying portions of the contracts represented by contract data 113 that relate to a permission, and/or associated confidence level), and/or prohibitions output data 129 (identifying portions of the contracts represented by contract data 113 that relate to a prohibition, and/or associated confidence level). In some embodiments, module 126 may receive deontic data 124 as input, and may not receive epistemic data 123 as input.
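By way of non-limiting illustration, the three-way classification into obligations, permissions, and prohibitions may be sketched with a simple keyword rule. This is a stand-in only: the specification does not state which classifier module 126 applies, and the modal-verb patterns and the name `classify_deontic` are assumptions introduced for illustration.

```python
import re

# Prohibitions are checked first so that "shall not" is not mistaken for "shall".
_DEONTIC_PATTERNS = [
    ("prohibition", re.compile(r"\b(shall not|must not|may not)\b", re.IGNORECASE)),
    ("obligation", re.compile(r"\b(shall|must|agrees to)\b", re.IGNORECASE)),
    ("permission", re.compile(r"\b(may|is entitled to)\b", re.IGNORECASE)),
]

def classify_deontic(sentence):
    """Return "obligation", "permission", "prohibition", or None for a sentence."""
    for label, pattern in _DEONTIC_PATTERNS:
        if pattern.search(sentence):
            return label
    return None
```

A trained model could replace the keyword rule while keeping the same three output classes, and could additionally emit the associated confidence level described above.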
In some embodiments, module 125 and module 126 may work (alone and/or together) to generate output data that associates identified obligations (e.g., 127), permissions (e.g., 128) and/or prohibitions (e.g., 129) with one or more identified promisers, promisees, and/or beneficiaries. Promiser output data 130 (identifying obligations, permissions, and/or associations associated with a promiser), promisee output data 131 (identifying obligations, permissions, and/or associations associated with a promisee), and/or beneficiary output data 132 (identifying obligations, permissions, and/or associations associated with a beneficiary) may thereby be generated.
Obligations classification data 127, 128, 129, 130, 131, and/or 132, as generated by module 104, may constitute, or may be comprised in, the output data representing an obligations classification and/or an obligations adjudication as generated by module 104 described above with reference to
As shown, contract data 113, which may be data received from data source 101 in
In some embodiments, module 133 may perform one or more document structure and layout analysis data processing operations including, for example, segmentation of a document into a plurality of different regions based on the layout of the document, and including, for example, classification of one or more of the regions into section classes such as title, section title, paragraphs, bullet list, number list, figures, tables, etc. Machine learning and deep learning techniques may be leveraged for this purpose.
Structural classification module 133 may generate section data 134, which may form all or part of the input for module 105. The input data received by module 105 may be provided in any suitable structured, partially structured, and/or unstructured format.
Module 105 may comprise sentence classification module 135, which may comprise any one or more processors configured to perform one or more sentence classification data processing operations. In some embodiments, a sentence classification data processing operation may include segmentation of a document into a plurality of different sentences, and may include classification of one or more of the sentences into sentence classes. Machine learning and deep learning techniques may be leveraged for this purpose. Identification of sentences may be leveraged to identify certain content in certain types of sentences, as certain types of sentences may be known to include certain content relating to a specific focus of the sentence, for example basic price (fixed consideration) and potential discount rules (variable consideration).
Module 135 may receive section data 134 as input data, and/or may receive contract data 113 as input data, and may process the received data to generate sentence data 136 representing sentence classification information, for example including data indicating what sentences of the contract data 113 include title-transfer language. Sentence data 136 may, for example, comprise an indication of a sentence that is determined to (or is determined to not) include title-transfer language. Sentence data 136 may comprise an indication of a type of title-transfer language that is included in the identified sentence(s). Sentence data 136 may comprise an indication of a confidence level indicating a confidence of one or more of the determinations indicated in sentence data 136.
Module 105 may comprise considerations classification module 137, which may comprise any one or more processors configured to perform one or more considerations classification operations. Module 137 may receive sentence data 136, section data 134, and/or contract data 113 as input data, and may process the received data to generate consideration classification data. The considerations classification data may indicate a classification for the received data (and/or for the underlying contract data and/or amendment data received from data source 101) indicating a consideration classification. The consideration classification data may also include an indication of a confidence level indicating a confidence of one or more of the classifications.
Module 105 may be communicatively coupled to sample clause database 138, which may comprise any one or more computer storage devices such as databases, data stores, data repositories, live data feeds, or the like. Sample clause database 138 may be communicatively coupled to considerations classification module 137 and configured to receive classification information pertaining to documents, amendments, document sections, and/or sentences from considerations classification module 137. The clauses themselves may be stored in database 138 in association with classification results pertaining to said clauses. Considerations classification module 137 may be configured to receive data stored in sample clause database 138 and to use said received data to generate output data, for example by comparing sample clauses received from database 138 to clauses being analyzed. In some embodiments, a single database may be used in place of sample clause database 138 and sample clause database 119.
Module 105 may be communicatively coupled to continuous learning module 139, which may comprise any one or more processors configured to perform one or more machine-learning operations based on clause data and/or clause classification data stored in sample clause database 138. Continuous learning module 139 may be used to train one or more data processing operations applied by module 137 in order to improve performance of module 137.
Module 105 may comprise dependency parsing module 140, which may comprise any one or more processors configured to perform one or more dependency parsing data processing operations. A dependency parsing data processing operation may include determining dependencies between phrases of a sentence in order to determine the grammatical structure of the sentence. A sentence may be divided into sub-sections based on this determination. Dependency parsing may be based on an assumption that there is a direct relationship between each linguistic unit in a sentence. Module 140 may receive considerations classification data from module 137 and may process said considerations classification data to generate dependency parsing output data. Dependency parsing output data may comprise a dependency graph describing syntactic relationships between different parts of a sentence.
Module 105 may comprise mapping module 141, which may comprise any one or more processors configured to perform one or more mapping data processing operations to map consideration data to obligation and/or duration data. Module 141 may accept input data from module 140, module 137, and/or module 126 in
The output data generated by mapping module 141 may include data indicating a correspondence between (a) one or more considerations included in a contract and/or amendment represented by contract data 113 and (b) one or more determined obligations and/or durations. The output data may further include one or more confidence levels (e.g., scores) associated with any determined mapping. Output data generated by mapping module 141 may include transaction price classification data 142-144, which may indicate associations between identified obligations, respective fixed considerations, and respective variable considerations.
Transaction price classification data 142-144 may include a list of obligations with fixed and variable considerations (e.g., fixed price, discount). For example, obligations may include:
- 1. sku_1, base price_1, volume_discount_1;
- 2. sku_2, base price_2, fixed_discount_2;
- 3. sku_3, base price_3, complex discount rules.
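By way of non-limiting illustration, a list of obligations with fixed and variable considerations may be represented as sketched below. The record layout, the representation of a discount as a function from quantity to a fractional reduction, and the names `PricedObligation`, `volume_discount`, and `fixed_discount` are assumptions introduced for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PricedObligation:
    sku: str
    base_price: float                 # fixed consideration, per unit
    discount: Callable[[int], float]  # variable consideration: quantity -> fraction off

    def transaction_price(self, quantity: int) -> float:
        """Total price for `quantity` units after applying the discount rule."""
        return quantity * self.base_price * (1.0 - self.discount(quantity))

def volume_discount(threshold: int, rate: float) -> Callable[[int], float]:
    """Discount of `rate` that applies only at or above `threshold` units."""
    return lambda quantity: rate if quantity >= threshold else 0.0

def fixed_discount(rate: float) -> Callable[[int], float]:
    """Discount of `rate` that applies regardless of quantity."""
    return lambda quantity: rate
```

More complex discount rules could be expressed as any other function of quantity without changing the record layout.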
Transaction price classification data 142-144 may constitute, or may be comprised in, the output data representing a transaction price classification and/or a transaction price adjudication as generated by module 105 described above with reference to
In some embodiments, feature vector 107 may include one or more confidence values associated with a value in feature vector 107 and/or one or more values indicating a quantity, amount, or extent of evidence associated with a value in feature vector 107. The confidence value and/or evidence value may be provided as a component weight (or a portion of a component weight) in feature vector 107, and may be used when computing distances between different feature vectors.
In some embodiments, feature vector 107 may be six-dimensional, having three components that each have two dimensions. A first component may include a first dimension representing the existence or absence of evidence for transfer-of-title, and a second dimension representing a confidence level and/or evidence level associated with the first component. A second component may include a first dimension representing the existence or absence of evidence for obligations, and a second dimension representing a confidence level and/or evidence level associated with the second component. And a third component may include a first dimension representing the existence or absence of evidence for consideration (e.g., transaction price), and a second dimension representing a confidence level and/or evidence level associated with the third component.
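By way of non-limiting illustration, such a six-dimensional feature vector and a confidence-weighted distance between two such vectors may be sketched as follows. Weighting each component by the product of the two confidence values is an assumption; the specification states only that confidence and/or evidence values may serve as component weights when computing distances. The names `Component`, `CaseVector`, and `weighted_distance` are hypothetical.

```python
import math
from typing import NamedTuple

class Component(NamedTuple):
    evidence: float    # 1.0 if evidence exists, 0.0 if absent
    confidence: float  # confidence and/or evidence level in [0, 1]

class CaseVector(NamedTuple):
    transfer_of_title: Component
    obligations: Component
    consideration: Component

def weighted_distance(a: CaseVector, b: CaseVector) -> float:
    """Euclidean distance over evidence values, weighted per component by confidence."""
    total = 0.0
    for comp_a, comp_b in zip(a, b):
        weight = comp_a.confidence * comp_b.confidence  # assumed weighting scheme
        total += weight * (comp_a.evidence - comp_b.evidence) ** 2
    return math.sqrt(total)
```

Under this weighting, a disagreement on a low-confidence component contributes less to the distance than a disagreement on a high-confidence component.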
Module 107 may include three parallel classification modules 146, 147, and 148, each of which may include any one or more processors configured to perform respective data analysis operations based on the received input data.
Risk classification module 146 may accept feature vector 145 (or 106) as input and may generate output data indicating a risk value and an associated confidence level. The risk value may indicate whether and/or an extent to which an obligation indicated in the document data materially changed a profile of the risk (e.g., a future cash flow will no longer take place).
Timing classification module 147 may accept feature vector 145 (or 106) as input and may generate output data indicating a timing value and an associated confidence level. The timing value may indicate whether and/or an extent to which an obligation indicated in the document data materially changed timing (e.g., the timing of the cash flow).
Consideration classification module 148 may accept feature vector 145 (or 106) as input and may generate output data indicating a consideration value (e.g., a value representing an amount) and an associated confidence level. The consideration value may indicate whether and/or an extent to which an obligation indicated in the document data materially changed an amount paid as a result of a transaction.
Module 107 may include three parallel comparison modules 151, 152, and 153, each of which may include any one or more processors configured to perform a respective comparison of (a) values determined by classification modules 146, 147, and 148, respectively, to (b) comparison ERP data.
Risk comparison module 151 may compare the risk value determined by risk classification module 146 to ERP data representing a risk value, and may generate risk change data 154, which may include a risk change value and a risk change confidence level associated with the determination of the risk change value.
Timing comparison module 152 may compare the timing value determined by timing classification module 147 to ERP data representing a timing value, and may generate timing change data 155, which may include a timing change value and a timing change confidence level associated with the determination of the timing change value.
Consideration/amount comparison module 153 may compare the consideration/amount value determined by consideration/amount classification module 148 to ERP data representing a consideration/amount value, and may generate consideration/amount change data 156, which may include a consideration/amount change value and a consideration/amount change confidence level associated with the determination of the consideration/amount change value.
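By way of non-limiting illustration, each of the comparisons performed by modules 151, 152, and 153 may be sketched as a single function that compares a classified value to the corresponding ERP value. Expressing the change as a signed difference and the change confidence as a product of confidences are assumptions, as are the names `Scored` and `compare_to_erp`.

```python
from typing import NamedTuple

class Scored(NamedTuple):
    value: float
    confidence: float

def compare_to_erp(classified: Scored, erp_value: float, erp_confidence: float = 1.0) -> Scored:
    """Return a change value and a change confidence for one dimension.

    The same function could serve the risk, timing, and consideration/amount
    comparisons, each applied to its own classified value and ERP value.
    """
    change = classified.value - erp_value                       # assumed: signed difference
    change_confidence = classified.confidence * erp_confidence  # assumed: product rule
    return Scored(change, change_confidence)
```

For example, a classified amount of 1200 with confidence 0.8 compared to an ERP amount of 1000 would yield a change value of 200 with change confidence 0.8 under these assumptions.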
The change values (and associated confidence levels) 154, 155, and/or 156 may constitute, or may be comprised in, the output data generated by module 107 described above with reference to
As shown in
Meanwhile, adjudication reconciliation engine 160 may also receive output data from adjudication module 107. As discussed above, adjudication module 107 may generate output data indicating a risk change and associated confidence level, timing change and associated confidence level, and amount change and associated confidence level. In some embodiments, data received by engine 160 from module 107 and from module 158 may be in the same format; in some embodiments, data received by engine 160 from module 107 and from module 158 may be in different formats.
Adjudication reconciliation engine 160 may apply one or more data processing operations to reconcile, combine, and/or otherwise process the received input data in order to generate output data as described above. Adjudication reconciliation engine 160 may, in some embodiments, average corresponding input values weighted according to confidence values and/or weighted according to one or more other weighting factors. Adjudication reconciliation engine 160 may, in some embodiments, select preferred values and/or discard non-preferred values. The overall adjudication output data generated by module 112 may include a binary indication of an adjudication (e.g., whether one or more criteria are met, such as whether the data received meets one or more commercial-substance criteria, one or more related-party criteria, and/or one or more collectability criteria). In some embodiments, the overall adjudication output data may comprise a tuple (L, C) where L represents data indicating a likelihood that one or more criteria (e.g., commercial substance) are met and where C represents data indicating a confidence level for the adjudication. The output data generated by module 112 may be stored, transmitted, presented to a user, used to generate one or more visualizations, and/or used to trigger one or more automated system actions.
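By way of non-limiting illustration, confidence-weighted averaging of corresponding input values, followed by generation of a binary indication and a tuple (L, C), may be sketched as follows. Taking the mean input confidence as C and applying a fixed threshold to L are assumptions, as are the names `Estimate`, `reconcile`, and `adjudicate`.

```python
from typing import NamedTuple, Sequence, Tuple

class Estimate(NamedTuple):
    value: float       # e.g., likelihood that the criteria are met, in [0, 1]
    confidence: float  # confidence in that value, in [0, 1]

def reconcile(estimates: Sequence[Estimate]) -> Estimate:
    """Combine estimates by averaging values weighted by their confidences."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    total_confidence = sum(e.confidence for e in estimates)
    if total_confidence == 0:
        # No estimate carries any weight: fall back to a plain average.
        value = sum(e.value for e in estimates) / len(estimates)
    else:
        value = sum(e.value * e.confidence for e in estimates) / total_confidence
    return Estimate(value, total_confidence / len(estimates))  # assumed: C = mean confidence

def adjudicate(estimates: Sequence[Estimate], threshold: float = 0.5) -> Tuple[bool, Estimate]:
    """Return a binary indication together with the tuple (L, C)."""
    combined = reconcile(estimates)
    return combined.value >= threshold, combined
```

Selecting preferred values and discarding non-preferred values, as also described above, could be sketched by filtering the estimates before they are passed to `reconcile`.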
Related Parties
As explained above, there is a need for improved methods for AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether the documents indicate relationships amongst two or more parties and as to whether the documents represent transactions, agreements, contracts, arrangements, or other interactions amongst related parties.
Improved systems meeting these needs may have application in various use cases, including in quickly and accurately assessing compliance with regulations and/or best practices regarding related-party transactions. For example, compliance with ASC 850 may require identification of related parties and adjudication as to whether one or more interactions constitute a related-party transaction.
In some embodiments, compliance with regulations and/or best practices may require that financial statements disclose material related-party transactions other than compensation arrangements, expense allowances, or other similar items that occur in the ordinary course of business. For these purposes, a related party may be defined to include any party that controls or can significantly influence the management or operating policies of another entity to the extent that the other entity may be prevented from fully pursuing its own interests. Related parties may include affiliates, investees accounted for by the equity method, trusts for the benefit of employees, principal owners, management, and/or immediate family members of owners and/or management. In some embodiments, compliance and/or best practices may require that transactions with related parties be disclosed even if there is no accounting recognition made for such transactions (e.g., even if a service is performed without payment). In some embodiments, compliance and/or best practices may require that disclosures do not assert that the terms of related-party transactions were essentially equivalent to arm's-length dealings unless those claims can be substantiated. In some embodiments, compliance and/or best practices may require that, if the financial position or results of operations of the reporting entity could change significantly because of common control or common management, disclosure of the nature of the ownership or management control must be made, even if there were no transactions between the entities.
Examples of related-party transactions (or other interactions) may include transactions (or other interactions) between: a parent entity and a subsidiary; two or more subsidiaries of a common parent entity; an entity and a trust for the benefit of employees of the entity (such as a pension trust or a profit-sharing trust that is managed by or under the trusteeship of the entity); an entity and its principal owners, management, and/or members of their immediate families; and/or affiliates.
Transactions between related parties may occur in the normal course of business. Examples of common transactions between related parties may include: sales, purchases, and/or transfers of real and/or personal property; services received and/or furnished, such as accounting, management, engineering, and/or legal services; use of property and/or equipment by lease or otherwise; borrowings, lendings, and/or guarantees; maintenance of compensating bank balances for the benefit of a related party; intra-entity billings based on allocations of common costs; and/or filings of consolidated tax returns.
Thus, compliance with regulations and/or best practices may require accurate identification of which parties should be considered related parties to one another and of which interactions between parties should be considered related-party transactions (e.g., which interactions satisfy one or more related-party transaction criteria).
Traditional methods for determination of relationships amongst parties and identification and assessment of related-party transactions rely on voluntary manual disclosure and human evaluation, which introduces inaccuracy due to human error, inefficiencies, incompleteness, and the possibility of human-introduced biases or dishonesty. Using traditional methods, it is difficult to identify undisclosed related parties from sampled transactions, and it is difficult to distinguish mere accounting mistakes from accounting fraud.
Thus, there is a need for systems and methods for performing automated identification of related parties and automated identification and adjudication of related-party transactions based on processed documents or other data, so as to improve efficiency and accuracy and reduce human-introduced biases.
Disclosed herein are systems configured for AI-augmented identification of related parties and adjudication of whether an interaction (e.g., an arrangement, agreement, contract, transaction, or other underlying data) represented by one or more ingested documents (or other data) meets related-party criteria. As explained herein, the systems disclosed herein may automatically generate and iteratively/recursively augment data structures representing relationships amongst parties, such as a related party graph, based on received documents and/or interrogation of a plurality of data sources. The systems disclosed herein may further automatically adjudicate whether an interaction meets related-party criteria (e.g., whether a transaction constitutes a related-party transaction) based on the generated data structure representing related parties and/or based on anomaly detection and behavioral modeling of one or more parties to an interaction.
In some embodiments, a system for automated determination of relationships amongst parties and for automated adjudication of related-party transactions is provided. The system may be configured to receive one or more documents (e.g., PDF documents, word processing documents, JPG documents, etc.) or other data and to automatically process the received documents in order to extract information from said documents. The extracted information may be assessed to identify and characterize relationships amongst one or more parties, to identify additional relationships amongst parties based on querying additional data sources, and to determine whether the information represents one or more interactions, such as a contract or a transaction. The system may then evaluate the extracted information regarding those one or more interactions to determine whether the one or more interactions meet predefined (or dynamically determined) related-party-transaction criteria.
In some embodiments, determining whether related-party-transaction criteria are met may be performed, at least in part, based on evaluating a data structure (which may be generated by the system) that represents relationships amongst parties, for example a graph data structure. In some embodiments, determining whether related-party-transaction criteria are met may be performed, at least in part, based on behavioral modeling. In some embodiments, determining whether related-party-transaction criteria are met may be performed, at least in part, by generating and evaluating one or more feature vectors. The system may be configured to automatically generate a feature vector (which may be referred to as a “case vector”) that represents an identified interaction (e.g., an identified contract, transaction, or the like) in the received documents.
In some embodiments, the system may initially identify one or more relationships amongst parties based on information extracted from one or more documents and/or based on information submitted by one or more users. For example, initial information about related parties may be automatically extracted, using one or more document-understanding techniques, from public disclosure documents and/or from documents provided during an audit process. The initial information may identify, for example, directors, shareholders, bond holders, investors, and/or other stakeholders as related to a corporate entity. This initial information may be used to generate a data structure, such as a graph data structure, representing an initial understanding of known relationships amongst a plurality of parties. In some embodiments, entities may be represented in the graph data structure as nodes, and relationships between entities may be represented in the graph data structure as edges linking sets of nodes together.
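By way of non-limiting illustration, a graph data structure in which entities are nodes and relationships are edges may be sketched as follows. The adjacency-map layout, the treatment of relationships as undirected, the hop limit, and the names `RelatedPartyGraph`, `add_relationship`, `neighbors`, and `are_related` are assumptions introduced for illustration.

```python
from collections import defaultdict, deque

class RelatedPartyGraph:
    """Undirected graph: nodes are entities, edges carry a relationship label."""

    def __init__(self):
        self._edges = defaultdict(dict)  # entity -> {neighbor: relationship label}

    def add_relationship(self, a, b, relationship):
        self._edges[a][b] = relationship
        self._edges[b][a] = relationship

    def neighbors(self, entity):
        return dict(self._edges.get(entity, {}))

    def are_related(self, a, b, max_hops=2):
        """Breadth-first search: is `b` reachable from `a` within `max_hops` edges?"""
        if a == b:
            return True
        seen = {a}
        frontier = deque([(a, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for nxt in self._edges.get(node, {}):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
        return False
```

For example, if a director of one entity is also a shareholder of a second entity, the two entities would be connected through that individual by a path of two edges.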
After an initial version of the data structure representing relationships has been generated, the system may augment or otherwise update the data structure by building upon it to represent additional relationships (and/or to represent additional information about the already-depicted relationships) that were not disclosed in the initially-processed documents. In some embodiments, the system may augment or otherwise update the data structure by sending one or more queries to one or more entity databases, wherein the query inputs are based on the name/identity of the target entity and/or the name/identity of one or more entities that have been determined initially to be related to the target entity. New entities that are returned as results of these queries may be disambiguated and then added to the data structure representing related parties. A data structure representing related parties may be updated in response to a user request, in response to a triggering event, in response to receiving new data/documents, and/or in response to receiving a request to make an adjudication regarding a potential related-party-transaction involving an entity represented by the data structure.
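By way of non-limiting illustration, iterative augmentation of the data structure by querying an entity database may be sketched as follows. The `lookup` callable stands in for a query to an entity database and is hypothetical, as is the crude name normalization used here in place of a real disambiguation step; the graph is assumed to be a mapping from normalized entity names to sets of related names.

```python
def normalize(name):
    """Crude disambiguation stand-in: lower-case and drop commas, periods, extra spaces."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())

def augment(graph, target, lookup, max_rounds=2):
    """Expand `graph` in place by querying `lookup` for the target and known entities.

    `graph` maps a normalized name to a set of related normalized names.
    `lookup(name)` returns an iterable of raw names related to `name` (assumed interface).
    """
    graph.setdefault(normalize(target), set())
    frontier = set(graph)  # query the target and every already-known entity first
    for _ in range(max_rounds):
        discovered = set()
        for entity in frontier:
            for raw in lookup(entity):
                other = normalize(raw)
                if other == entity:
                    continue  # same entity under a different spelling
                if other not in graph:
                    graph[other] = set()
                    discovered.add(other)
                graph[entity].add(other)
                graph[other].add(entity)
        if not discovered:
            break  # nothing new to query in the next round
        frontier = discovered
    return graph
```

The round limit bounds how far the expansion proceeds from the target entity; such a function could be invoked in response to any of the update triggers described above.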
In some embodiments, the system may additionally be configured to perform behavior modeling for one or more entities that are the subject of or are otherwise included in a data structure (e.g., a graph data structure) representing entity relationships. A system may, for example, analyze and assess a historical record of behaviors by one or more entities with respect to, for example, any one or more of the following behaviors:
- Order-to-cash behaviors (e.g., on a per-entity basis, over an accounting period, evaluate and identify discrepancy between collectability and original credit evaluation);
- Explicit & implicit discount given behaviors (e.g., as recorded in ERP data and/or as determined from actual payment data);
- Collection activity behaviors (e.g., behaviors regarding treatment of overdue invoices, allowance/reserve (e.g., for writing off unpaid invoices after a certain number of days beyond invoice due date)); and/or
- Management override behaviors.
Behavioral modeling may, in some embodiments, be based on data that is extracted from the one or more documents that are received by the system and subject to one or more document-understanding processes, as described herein.
Once a data structure representing related parties has been created and behavioral modeling has been performed for one or more parties represented in the data structure, the system may then use the data structure and the results of the behavioral modeling to determine any correlation between behaviors (per the behavioral models) of entities represented in a same data structure and indicated as related to one another. In some embodiments, the system may determine that a highly correlated behavior model (e.g., a correlation score exceeding a certain threshold value) and a related-party graph may indicate that there is a high risk of related-party anomaly. In some embodiments, the system may be configured to assign a score to quantify an assessed risk of related-party anomaly, as determined based on the behavior model and on the data structure representing related parties. For example, if behaviors of a transaction correlate highly with behavior models that are determined by the system to be indicative of related-party transactions, and if the data structure indicates that the parties to the transaction are related to one another, then the system may determine that there is a risk that the transaction is a “related-party transaction” that satisfies one or more related-party transaction criteria, for example requiring reporting or disclosure in order to comply with regulations or best practices.
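By way of non-limiting illustration, the correlation check described above can be sketched as follows. The sketch assumes that each entity's behavior model has been summarized as a numeric vector and that Pearson correlation is used as the correlation score; neither choice is specified in this disclosure, and the names `pearson`, `related_party_risk`, and the default threshold of 0.8 are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)

def related_party_risk(behavior_a, behavior_b, related_in_graph, threshold=0.8):
    """Flag high risk only when behaviors correlate AND the graph links the parties.

    The threshold value is an illustrative assumption.
    """
    corr = pearson(behavior_a, behavior_b)
    return {"correlation": corr, "high_risk": related_in_graph and corr > threshold}
```

Requiring both signals mirrors the text above: a high correlation alone, or graph relatedness alone, does not raise the flag in this sketch.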
In some embodiments, the system may render an adjudication for an interaction, such as a transaction, represented by documents received by the system, wherein the adjudication determines whether the interaction satisfies one or more related-party-transaction criteria (e.g., according to ASC 850), for example requiring reporting or disclosure in order to comply with regulations or best practices.
As shown by block 201, in some embodiments, the method steps shown downstream of block 201 may be performed for a given entity. An entity may comprise a person, corporation, partnership, organization, government, university, town, city, country, or the like. The system executing method 200 may receive data representing the given entity, for example including structured, unstructured, and/or partially structured data. In some embodiments, the system may extract data regarding the given entity from one or more documents, for example by applying one or more document understanding techniques. In some embodiments, the system may identify the given entity from among a plurality of entities represented in the received data. In some embodiments, the given entity may be specified by a user input received by the system.
Turning first to blocks 202-208, the system may generate a first graph data structure representing director relationships, parent relationships, and/or subsidiary relationships for the given entity.
At block 202, in some embodiments, the system may apply one or more data analysis operations to automatically identify, based on received documents and/or other data representing the given entity, one or more directors associated with the given entity. In some embodiments, persons or entities having a different specified role (other than director) may instead (or additionally) be identified at block 202. In some embodiments, the system may store data and/or metadata, locally and/or remotely, representing the one or more directors identified as associated with the given entity. Data indicating the identity of the director may be stored, and metadata representing information about the director's role (e.g., time information, location information, etc.) may be stored in association therewith.
At block 204, in some embodiments, the system may apply one or more data analysis operations to automatically identify one or more entities associated with the identified director(s).
The system may determine what entities are associated with the identified director(s) by analyzing received documents and/or other data representing the given entity, received documents and/or other data representing other entities, and/or any other data available to the system. The system may in some embodiments make the identification of related parties based solely on information already available to the system. In some embodiments, the system may actively seek and retrieve information associated with one or more of the identified directors, for example by scraping said information from publicly-available data sources, in order to identify entities related to the one or more directors.
In some embodiments, the system may apply one or more relationship scoring algorithms to quantify an extent of a relationship between two entities in order to determine whether an entity should be designated as “related to” (e.g., “associated with”) a director. In some embodiments, a pair of entities may be designated as related only if their relationship score exceeds a predefined (or dynamically determined) threshold value.
At block 206, in some embodiments, the system may apply one or more data analysis operations to automatically identify one or more parent entities and/or one or more subsidiary entities associated with the given entity. In some embodiments, entities having a different relationship (other than parent or subsidiary) may instead (or additionally) be identified at block 206. In some embodiments, the system may store data and/or metadata, locally and/or remotely, representing the one or more entities identified as associated with the given entity. Data indicating the identity of the entity may be stored, and metadata representing information about the entity (e.g., time information, location information, etc.) may be stored in association therewith.
The system may identify associated entities by analyzing received documents and/or other data representing the given entity, received documents and/or other data representing other entities, received documents and/or other data representing one or more of the identified directors, and/or any other data available to the system. The system may in some embodiments make the identification of related parties based solely on information already available to the system. In some embodiments, the system may actively seek and retrieve information associated with one or more of the identified directors, for example by scraping said information from publicly-available data sources, in order to identify entities related to the one or more directors.
In some embodiments, the system may apply one or more relationship scoring algorithms to quantify an extent of a relationship between two entities in order to determine whether an entity should be designated as “related to” (e.g., “associated with”) the given entity. In some embodiments, a pair of entities may be designated as related only if their relationship score exceeds a predefined (or dynamically determined) threshold value.
At block 208, in some embodiments, the system may generate and store a graph data structure representing the given entity and one or more relationships between the given entity and other entities. The other entities in the graph data structure may include persons, corporations, partnerships, organizations, governments, universities, towns, cities, countries, or the like. In some embodiments, the other entities in the graph data structure may include one or more directors identified at block 202 and/or one or more subsidiary or parent organizations identified at block 206. In some embodiments, identified entities may be included in the graph data structure only if they meet one or more criteria, such as a relationship score exceeding a threshold value.
In some embodiments, the graph data structure may represent only certain kinds of relationships. For example, the graph data structure generated at block 208 may represent entities that are related to the given entity by (a) being a director of the given entity, (b) being a subsidiary of the given entity, or (c) being a parent of the given entity; while other kinds of relationships (e.g., being an employee of the given entity, being in a partnership with the given entity, etc.) may not be included.
In some embodiments, the graph data structure may represent entities as nodes and may represent relationship information as edges linking pairs of nodes. The graph data structure may store identifying data and/or metadata in association with a node representing an entity. The graph data structure may store information identifying, quantifying, and/or otherwise characterizing a relationship between two entities as edges linking pairs of nodes. In some embodiments, edges may be weighted or otherwise configured in accordance with data indicating a type, strength, or other characteristic of a relationship between two nodes. For example, an edge may be weighted according to a relationship score such that an edge has a higher weight when two entities are more closely related. In some embodiments, all information available to the system about various relationship types (and various respective relationship strengths) between two entities may be combined and normalized into a single relationship-score quantification.
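By way of non-limiting illustration, a graph data structure of the kind described above can be sketched in a few lines. The class name `RelationshipGraph`, its method names, and the `min_score` parameter are hypothetical and chosen only for this example; an actual embodiment may use any suitable graph store.

```python
from dataclasses import dataclass, field

@dataclass
class RelationshipGraph:
    nodes: dict = field(default_factory=dict)  # entity id -> metadata dict
    edges: dict = field(default_factory=dict)  # frozenset({a, b}) -> edge attributes

    def add_entity(self, entity_id, **metadata):
        """Create the node if needed and merge in any metadata."""
        self.nodes.setdefault(entity_id, {}).update(metadata)

    def add_relationship(self, a, b, kind, score, min_score=0.0):
        """Add an undirected weighted edge only if the score exceeds the threshold."""
        if score <= min_score:
            return False
        self.add_entity(a)
        self.add_entity(b)
        self.edges[frozenset((a, b))] = {"kind": kind, "weight": score}
        return True
```

In this sketch the relationship score doubles as the edge weight, so a more closely related pair carries a heavier edge, and a pair whose score does not exceed the threshold is left out of the graph entirely.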
Turning now to blocks 210-212, the system may generate a second graph data structure representing one or more relationship types other than director relationships, parent relationships, and/or subsidiary relationships for the given entity.
At block 210, in some embodiments, the system may apply one or more data analysis operations to automatically identify, based on received documents and/or other data representing the given entity, one or more entities associated with the given entity (in a manner other than being a director thereof, being a parent thereof, or being a subsidiary thereof). In some embodiments, the system may store data and/or metadata, locally and/or remotely, representing the one or more entities identified as associated with the given entity. Data indicating the identity of the identified entity may be stored, and metadata representing information about the entity's relationship with the target entity (e.g., time information, location information, etc.) may be stored in association therewith.
At block 212, in some embodiments, the system may generate and store a second graph data structure, distinct from the graph data structure generated at block 208, representing the given entity and one or more relationships between the given entity and other entities identified at block 210. The process for generating the second graph data structure at block 212 may share one or more characteristics in common with the process for generating the graph data structure at block 208.
As shown by the arrows from blocks 208 and 212 back to block 201, all or part of method 200 may be performed iteratively. For example, after identifying one or more new entities that are related to the original given entity and generating an initial version of the graph data structure, a new entity may be selected (from among the nodes of the graph data structure) as the new given entity, and the process of identifying related entities may then be repeated. New nodes and/or new edges may thus be added to the previously-generated graph data structure based on newly-identified parties and/or newly-identified relationships.
In some embodiments, iteration of method 200 may continue until one or more cessation conditions are met. For example, cessation conditions may include one or more of the following: a predetermined amount of time passing, a predetermined number of iterations being performed, a predetermined number of nodes being added to the graph, a predetermined number of edges being added to the graph, a predetermined number of iterations being executed with below a threshold number of edges and/or nodes being added to the graph, and/or a predetermined number of iterations being executed within a sliding window number of iterations with below a threshold number of edges and/or nodes being added to the graph in the sliding window.
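By way of non-limiting illustration, the iterative expansion and three of the cessation conditions listed above (an iteration limit, a node limit, and a run of iterations that add nothing new) can be sketched as follows. The callback `find_related`, which stands in for the relationship-identification steps of method 200, and all parameter names and defaults are assumptions made for this example.

```python
from collections import deque

def expand_graph(seed, find_related, max_iterations=100, max_nodes=1000, patience=3):
    """Breadth-first expansion that stops when any cessation condition is met.

    find_related(entity) is a hypothetical callback returning related entities.
    """
    nodes, edges = {seed}, set()
    queue = deque([seed])
    iterations = stale = 0
    while (queue and iterations < max_iterations
           and len(nodes) < max_nodes and stale < patience):
        entity = queue.popleft()  # next given entity: oldest not-yet-analyzed node
        iterations += 1
        added = 0
        for other in find_related(entity):
            edge = frozenset((entity, other))
            if edge not in edges:
                edges.add(edge)
                added += 1
            if other not in nodes:
                nodes.add(other)
                queue.append(other)
                added += 1
        # Count consecutive iterations that added no node and no edge.
        stale = 0 if added else stale + 1
    return nodes, edges
```

A time limit or a sliding-window condition could be added to the loop guard in the same way.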
In some embodiments, the system may perform one or more new iterations of method 200 according to a predetermined schedule, according to a user input, and/or automatically in response to one or more trigger conditions (e.g., the system detecting that new data is available for analysis).
In some embodiments, the system may select a new given entity as the focus for an iteration of method 200 according to a random or quasi-random selection. In some embodiments, a new given entity for an iteration may be selected according to a user input. In some embodiments, a new given entity for an iteration may be selected based on the new given entity's proximity to (or distance from) a previously-analyzed entity. In some embodiments, a new given entity for an iteration may be selected based on the new given entity having been recently added to the graph, for example based on it being added in a previous iteration and/or based on it not being analyzed as a target entity yet.
In some embodiments, after generating one or more graph data structures, the system may analyze the one or more graph data structures to determine whether a pair of entities meets related-party criteria. In some embodiments, related party criteria may include that a pair of entities are indicated as related by each being included in the same graph data structure. In some embodiments, related party criteria may include that a pair of entities are indicated as related by each being included in a minimum threshold number of the same graph data structures. In some embodiments, related party criteria may include that a pair of entities are indicated as related by appearing in the same graph data structure (or a threshold minimum number of the same graph data structures) within a certain distance of one another. For example, the system may calculate a distance by calculating a number of “hops” separating two nodes representing the two entities in a graph data structure, and if the number of hops falls below a threshold distance, then the two entities may be determined to meet related party criteria. In some embodiments, calculating the distance between two parties in a graph data structure may include calculating a weighted distance where the distance between nodes is calculated in accordance with the number of hops as weighted on a per-hop basis according to the weight assigned to the edge connecting the nodes for that particular hop. Thus, when a pathway between two entities includes edges that are weighted more heavily, the distance between the entities may be calculated as less (indicating that the entities are more closely related) than if the edges were assigned low weight values.
Collectability
As explained above, there is a need for improved methods for AI-augmented automated analysis of documents in order to quickly and efficiently make various adjudications based on the documents, including adjudications as to whether the documents indicate a likelihood of collection (e.g., an adjudication of collectability).
Improved systems meeting these needs may have application in various use cases, including in quickly and accurately assessing compliance with regulations and/or best practices regarding attesting/verifying that collection is probable. For example, compliance with ASC 606 requires that collection of a transaction price for providing services or goods to a customer must be probable, where “probable” means that the future event is likely to occur.
Traditional methods for determination of collectability provide insufficient granularity for intent interpretation and behavior analysis. For example, an invoice may become overdue and eventually go into a collection process for a wide variety of reasons, such as the invoice not being received, the content or amount of the invoice being disputed, or the party to whom the invoice was issued holding payment as a bargaining chip for additional concessions. Existing approaches to determining collectability measure collectability at the customer level, rather than at the transaction/contract level. These existing approaches do not consider that a customer might postpone the payment for a specific contract and/or transaction because it is in dispute.
Traditional methods for determination of collectability are limited by being unduly backward looking. Collectability is intended to measure the capacity and intention of the customer to pay on time. Existing approaches are based on examining payment history (assisted by some ERPs), and do not sufficiently take into account the current and future situations of parties, their value chains (both upstream and downstream), and broader economic circumstances. For example, a strike at a port could prevent the export of goods and import of parts, and the ripple effect on cash flow of an entity could be anticipated. Existing techniques fail to account for possibilities such as these.
Traditional methods for determination of collectability do not account for “black swan” events (e.g., rare events). Existing approaches do not consider broader economic factors. For example, catastrophic events such as major terrorist attacks, major national/global financial crises, and/or pandemics may all cause sudden, substantial, and comprehensive disruption that cannot be forecast from past events alone. When such black swan events occur, it is likely that many entities may attempt to preserve cash, including by drawing down credit lines, postponing payment, and/or pausing efforts that do not contribute to immediate cash flow. For collectability consideration, it is thus beneficial to consider fallout (in terms of collectability) from such black swan events, even though prediction of the actual occurrence of such events may not be needed.
Thus, there is a need for systems and methods for performing automated adjudication of collectability that are more efficient and accurate than existing techniques. There is a need for systems that apply improved collectability adjudication techniques in such a way that provides greater granularity compared to existing systems, is not unduly/solely backward-looking, and adequately accounts for consequences of potential black swan events.
Disclosed herein are systems configured for AI-augmented adjudication of collectability based on one or more ingested documents (or other data). As explained herein, the systems disclosed herein may receive one or more documents (or other data) representing one or more parties and interactions (e.g., contracts, transactions, etc.) between the parties, and the system may automatically render an adjudication as to whether one or more of the parties and/or the interactions satisfy one or more collectability criteria. Collectability criteria may include a criterion that collection is more likely than not, and/or may include a criterion that the likelihood of collection meets a certain likelihood threshold (e.g., 75%, 90%, etc.).
In order to determine whether collection is probable, a transaction price may first be determined before assessing collectability, where the determined transaction price accounts for any price concessions. Explicit concessions and implicit concessions may be considered; for example, an implicit concession may be supported by a vendor's history of providing a discount to a customer.
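By way of non-limiting illustration, a transaction price that accounts for explicit and implicit concessions can be sketched as follows. The sketch assumes that an implicit concession is estimated as the average historical shortfall between invoiced amounts and the amounts the vendor accepted as payment in full, applied to the invoiced amount; this estimation rule and the function names are assumptions made for this example and are not prescribed by this disclosure.

```python
def implied_discount_rate(history):
    """Average fractional shortfall over (invoiced, accepted_payment) pairs."""
    rates = [(invoiced - paid) / invoiced for invoiced, paid in history if invoiced > 0]
    return sum(rates) / len(rates) if rates else 0.0

def transaction_price(invoiced, explicit_concession=0.0, history=()):
    """Invoiced amount less explicit and estimated implicit concessions."""
    implicit_concession = invoiced * implied_discount_rate(history)
    return max(invoiced - explicit_concession - implicit_concession, 0.0)
```

For instance, with an invoiced amount of 1000, an explicit concession of 50, and a history in which the vendor consistently accepted 10% less than invoiced, the sketch yields a transaction price of approximately 850, and collectability would then be assessed against that amount.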
In order to assess financial capacity of an entity (e.g., a customer), in order to attempt to assess likelihood of collection, any one or more of the following may be considered: credit risk, credit history, past experience with an entity, past experience with a class of entities into which an entity falls, current economic conditions of an entity's industry, and/or an entity's income.
Collectability may be reassessed if significant changes in one or more facts or circumstances arise, for example in the event of one or more of the following: a party declares bankruptcy during a contract and/or a party reports negative cash flow subsequent to the contract inception. If reassessment indicates that collectability is less than probable, then a vendor may stop recognizing revenue, but the vendor may not need to reverse previously recognized revenue.
If partial payment is received but collectability for the entire payment (e.g., for the remaining portion of payment) is less than probable, then, in some embodiments, one or more of three events must occur for the payment to be treated as revenue: (1) the collecting party has no remaining obligations to the paying party, and all of the payment promised by the paying party has been received and is nonrefundable; (2) the contract/agreement has been terminated and the payment is nonrefundable; and/or (3) the party collecting payment has transferred control of goods or services to which the consideration that has been received relates, the party collecting payment has stopped transferring goods or services to the other party (if applicable) and has no obligation under the contract to transfer additional goods or services, and the consideration received from the other party is nonrefundable.
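By way of non-limiting illustration, the three events described above can be expressed as a single predicate over facts extracted for a contract. The field names of `PartialPaymentFacts` and the function name are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass

@dataclass
class PartialPaymentFacts:
    no_remaining_obligations: bool        # collecting party owes nothing further
    all_promised_payment_received: bool
    contract_terminated: bool
    control_transferred: bool             # for goods/services the payment relates to
    transfers_stopped: bool
    no_further_transfer_obligation: bool
    nonrefundable: bool

def may_treat_as_revenue(f):
    """True if at least one of events (1), (2), or (3) has occurred."""
    event_1 = (f.no_remaining_obligations and f.all_promised_payment_received
               and f.nonrefundable)
    event_2 = f.contract_terminated and f.nonrefundable
    event_3 = (f.control_transferred and f.transfers_stopped
               and f.no_further_transfer_obligation and f.nonrefundable)
    return event_1 or event_2 or event_3
```

Note that every one of the three events requires the received payment to be nonrefundable, so a refundable payment never qualifies in this sketch.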
In some embodiments, a system may receive one or more documents and may subject said received documents to one or more document-understanding techniques, for example as described herein, in order to extract data from said received documents. The data extracted from said documents may be used to render an adjudication as to collectability.
In some embodiments, an adjudication as to collectability may be rendered on the basis, at least in part, of endogenous information (received by the system from one or more sources that are endogenous with respect to an interaction for which collectability is to be adjudicated). In some embodiments, an adjudication as to collectability may be rendered on the basis, at least in part, of exogenous information (received by the system from one or more sources that are exogenous with respect to an interaction for which collectability is to be adjudicated). In some embodiments, the system may receive endogenous information and exogenous information together and may subject the received information to one or more data processing operations (e.g., models) in order to identify the endogenous information and identify the exogenous information. In some embodiments, the system may receive an initial input indicating certain endogenous and/or exogenous information, and the system may locate and identify other endogenous and/or exogenous information based on the information received.
In some embodiments, endogenous information that is received may include information and knowledge related to a contract (or other interaction) that can be used to determine a level of uncertainty for on-time payment (e.g., used to assess likelihood of collection). Endogenous information received may include, for example:
- payment history, including differentiation for different products, services and/or product/service categories;
- credit assessment (e.g., conducted when onboarding a customer, prior to initiation of a contract/interaction for which collectability is being assessed);
- payment history of other entities (e.g., entities within the same sector/industry, to establish a benchmark);
- payment history of other entities that are part of the value chain (upstream and/or downstream) of the target entity for which collectability is being assessed.
In some embodiments, exogenous information that is received may include information and knowledge that can be used to determine a level of uncertainty for on-time payment (e.g., used to assess likelihood of collection). Exogenous information received may include, for example:
- economic behavior of an industry related to the target entity;
- economic behavior of a value chain (upstream and/or downstream) of the target entity;
- information regarding news events related to the target entity, industry, and/or value chain of the target entity;
- product review information;
- employee sentiment information (e.g., sourced via social media);
- consumer sentiment information (e.g., sourced via social media).
In some embodiments, an adjudication as to collectability may be rendered on the basis, at least in part, of information relating to one or more disputes between two or more entities relevant to a contract or other interaction being assessed for collectability. This may include disputes involving the target contract/interaction and/or disputes involving other contracts/interactions. This information may be received as part of and/or in addition to the endogenous and/or exogenous information described above. In some embodiments, in the case of a consignment agreement, disputes among entities may be included in consideration.
Once the endogenous information and exogenous information (optionally, along with any other information) is received by the system and subjected to any data processing operations (e.g., document-understanding models), the system may use the received information to generate a collectability uncertainty model. The collectability uncertainty model may be developed, at least in part, based on the endogenous information and/or the exogenous information. The collectability uncertainty model may be configured to generate an output regarding collectability uncertainty (e.g., predicting the uncertainty of the on-time payment behavior) for a particular entity or group of entities and/or for a particular interaction/contract or group of interactions/contracts.
A baseline uncertainty for the collectability uncertainty model may be derived from (e.g., determined on the basis of) previous payment behavior. Uncertainty may be increased for an entity with previous overdue payment, deteriorating payment behavior for the entity over time, ongoing disputes for the particular contract/interaction being assessed, and/or deteriorating payment behavior observed from one or more other entities in the same industry as the target entity.
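By way of non-limiting illustration, a baseline uncertainty that starts from previous payment behavior and increases with the factors listed above can be sketched as follows. The additive form and every numeric weight below are illustrative assumptions only; this disclosure does not specify how the factors are combined.

```python
def baseline_uncertainty(days_late_history, open_dispute=False, industry_deteriorating=False):
    """Toy uncertainty score in [0, 1]; all weights are illustrative assumptions.

    days_late_history lists, oldest first, how many days late each past
    payment was (0 means on time).
    """
    if not days_late_history:
        return 0.5  # no payment history available: assumed neutral starting point
    # Share of past payments that were overdue.
    score = sum(d > 0 for d in days_late_history) / len(days_late_history)
    half = len(days_late_history) // 2
    if half:
        earlier = sum(days_late_history[:half]) / half
        later = sum(days_late_history[half:]) / (len(days_late_history) - half)
        if later > earlier:
            score += 0.1  # payment behavior deteriorating over time
    if open_dispute:
        score += 0.2      # ongoing dispute on the contract being assessed
    if industry_deteriorating:
        score += 0.1      # peers in the same industry are paying later
    return min(score, 1.0)
```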
In addition to the baseline uncertainty, one or more predictive models may be used to predict cash flow for the target entity for a relevant period of time for the contract/interaction being assessed. In some embodiments, a predictive model may be based, at least in part, on the exogenous information received by the system such as information regarding economic behavior of an industry relevant to the target entity, economic behavior of the value chain of the target entity, previous financial performance of the target entity (e.g., which may be available in the case that the target company is a public company), and/or information regarding the broader (e.g., local, national, and/or global) economic environment.
The system may be configured to apply one or more stress tests to the collectability uncertainty model (and/or to the one or more predictive models included therein) in order to assess performance of the model in response to black swan events. These stress tests may be used to validate the resiliency of the model in response to black swan events, for example in assessing the accuracy of behaviors predicted by the model in response to such events. This stress testing may be performed after the occurrence of a black swan event, when real-world data from the consequences of the event are available in order to assess the model's performance. In some embodiments, the model may be refined or otherwise updated in accordance with the results (e.g., output data regarding accuracy of the model) of one or more of said stress tests.
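By way of non-limiting illustration, a stress test of the kind described above can be sketched by comparing the accuracy of a model on cases from an ordinary period with its accuracy on cases observed after a black swan event. The sketch uses the Brier score (mean squared error between predicted probability and observed outcome) as the accuracy measure, which is a choice made for this example rather than a measure named in this disclosure; `predict`, the case format, and the `tolerance` default are likewise hypothetical.

```python
def brier_score(predicted, observed):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

def stress_test(predict, baseline_cases, shock_cases, tolerance=0.05):
    """Compare model error on ordinary cases and on post-event cases.

    Each case is a (features, outcome) pair, where outcome is 1 if payment
    was collected on time and 0 otherwise.
    """
    def score(cases):
        return brier_score([predict(x) for x, _ in cases], [y for _, y in cases])

    base, shock = score(baseline_cases), score(shock_cases)
    return {"baseline": base, "shock": shock,
            "needs_refinement": shock - base > tolerance}
```

A result with `needs_refinement` set could serve as the trigger for refining or otherwise updating the model.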
The collectability uncertainty model may be configured to leverage information regarding product reviews, employee sentiment, and/or consumer sentiment in generating an output regarding collectability uncertainty (e.g., predicting the uncertainty of the on-time payment behavior).
In some embodiments, the collectability uncertainty model may be configured to receive as inputs information regarding the specific contract/interaction to be adjudicated, and to use this information to adjudicate collectability. (In some embodiments, the system may consider this information in an alternative or additional manner aside from application of the collectability uncertainty model as described herein.) In some embodiments, the information regarding the specific contract/interaction to be adjudicated may include fine-grained information including due diligence information, correspondence with the entity since an invoice was created, and/or any dispute information between entities on the contract/interaction to be adjudicated (whether regarding the contract/interaction to be adjudicated or regarding one or more other contracts/interactions). In some embodiments, in the case of a consignment agreement, disputes among entities may be included in consideration. After the collectability uncertainty model has been generated and optionally refined, the system may apply the collectability uncertainty model to adjudicate collectability for the target contract/interaction to be adjudicated. The system may receive information regarding the specific contract/interaction to be adjudicated, and may use this information to adjudicate collectability. Applying the collectability uncertainty model may include providing, as input, the information regarding the specific contract/interaction to be adjudicated (e.g., as described above), such that the collectability uncertainty model can generate output data indicating a metric for collectability.
The output data may include a score for collectability, a classification for collectability (e.g., “collectible” versus “not collectible”), a predicted percentage likelihood that full collection will be made, a predicted percentage likelihood that partial payment will be made, and/or a predicted percentage likelihood that full or partial payment will be made by one or more particular points in time. The generated output data may be displayed to a user, transmitted to one or more other systems for storage, used as the basis for one or more visualizations, or used as a triggering event for applying one or more data processing operations to the generated graph data structure(s) in order to automatically determine.
At block 302, in some embodiments, the system may identify data representing an account receivable. In some embodiments, method 301 may be applied for each account receivable in an available dataset or across multiple datasets.
At block 304, in some embodiments, the system may identify an entity (e.g., a customer) indicated in the account receivable that was identified at block 302. In some embodiments, method 301 may be applied for each entity in the identified account receivable.
At block 306, in some embodiments, the system may retrieve data, if available, indicating a third-party rating or quantification for the identified customer. This data may be retrieved from any suitable public or private data source. For example, the system may retrieve data relating to a D&B rating or a rating by any suitable agency that quantifies or characterizes payability or creditworthiness for the customer.
At block 308, in some embodiments, the system may retrieve data, if available, indicating industry benchmark data, industry trend data, or the like. This data may be retrieved from any suitable public or private data source. In some embodiments, industry trend data may be generated by the system based on data relating to a plurality of individual organizations in the same industry or sector. For example, the system may retrieve data regarding financial performance of other entities in a same industry as the identified entity.
At block 310, in some embodiments, the system may retrieve data, if available, indicating any news or current events that are related to the identified entity, are related to an industry or sector of the identified entity, and/or could otherwise be expected to impact the identified entity, for example by impacting cash flow. This data may be retrieved from any suitable public or private data source.
At block 312, in some embodiments, the system may retrieve data, if available, indicating past payment behavior for the identified entity. This data may be retrieved from any suitable public or private data source.
At block 314, following from block 302, the system may identify an invoice associated with the identified account receivable. In some embodiments, method 301 may be applied for each invoice in the identified account receivable.
At block 316, in some embodiments, the system may retrieve data, if available, indicating one or more issues and/or disputes between two or more entities relevant to the identified invoice, including disputes regarding the identified invoice and/or involving other invoices or other interactions.
At block 318, the data retrieved at blocks 306, 308, 310, 312, and/or 316 may be processed via a collectability prediction model. In some embodiments, the collectability prediction model may be configured in accordance with the data retrieved at blocks 306, 308, 310, 312, and/or 316. If data from one or more of blocks 306, 308, 310, 312, and/or 316 is not available, then the model may be configured based on the other data that is available.
In some embodiments, the collectability prediction model may be configured to accept data regarding the invoice identified at block 314 (or another invoice involving the identified entity) and to process the received data to generate an output, wherein the output may include collectability due date 320 and associated confidence level 322. The output data may include a tuple indicating a likelihood of collection and an associated confidence for the collection prediction. As examples, (100%, 90%) may indicate full collection by the due date with 90% confidence on the prediction, while (50%, 65%) may indicate 50% collection by due date with 65% confidence on the prediction.
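The handling of unavailable inputs at block 318 and the (likelihood, confidence) output tuple can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the source does not specify the model, so a simple weighted average stands in for it, and the names `SOURCE_WEIGHTS` and `predict_collectability`, the weight values, and the coverage-based confidence heuristic are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical weights per data source; the values are illustrative only.
SOURCE_WEIGHTS: Dict[str, float] = {
    "third_party_rating": 0.30,  # block 306
    "industry_benchmark": 0.15,  # block 308
    "news_signal": 0.15,         # block 310
    "payment_history": 0.30,     # block 312
    "dispute_signal": 0.10,      # block 316
}


def predict_collectability(
    signals: Dict[str, Optional[float]],
) -> Tuple[float, float]:
    """Return (likelihood_of_collection, confidence), both in [0, 1].

    Each signal is a score in [0, 1], where higher means collection is more
    likely. None marks a source for which no data was available.
    """
    available = {
        name: value
        for name, value in signals.items()
        if value is not None and name in SOURCE_WEIGHTS
    }
    if not available:
        raise ValueError("no input data available for prediction")

    covered_weight = sum(SOURCE_WEIGHTS[name] for name in available)
    # Re-normalize the weights over the sources that are actually present,
    # so the model is configured only on the data that is available.
    likelihood = (
        sum(SOURCE_WEIGHTS[name] * value for name, value in available.items())
        / covered_weight
    )
    # Toy confidence heuristic (an assumption): the share of total weight
    # covered by the available sources.
    confidence = covered_weight / sum(SOURCE_WEIGHTS.values())
    return likelihood, confidence
```

Under these assumptions, a call that supplies only a third-party rating and payment history, each scored 0.5, yields a tuple close to (0.5, 0.6), which would be read in the same way as the (50%, 65%) example above.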
At block 324, in some embodiments, the system may receive data indicating a collection or payment event associated with the identified invoice for which the outputs of model 318 were previously generated. Based on the data indicating a collection or payment event (and, optionally, in response to receiving said data), the system may apply one or more continuous learning data processing techniques in order to process the received data and to update collectability prediction model 318, such that model 318 may be improved for future applications.
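One way to picture the feedback step at block 324 is an online update applied each time a payment outcome is observed. The sketch below is only an illustration under stated assumptions: the source does not name a learning technique, so a small logistic model trained by single-step stochastic gradient descent is swapped in, and the class name `OnlineCollectabilityModel`, its feature layout, and its learning rate are hypothetical.

```python
import math
from typing import List


class OnlineCollectabilityModel:
    """Tiny logistic model updated one payment event at a time (illustrative)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: List[float]) -> float:
        """Predicted probability that the invoice is collected."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: List[float], collected: bool) -> None:
        """Take one stochastic-gradient step on log loss for an observed outcome."""
        error = self.predict(features) - (1.0 if collected else 0.0)
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Usage: the prediction for an invoice moves toward the observed outcome.
model = OnlineCollectabilityModel(n_features=2)
invoice_features = [1.0, 0.5]  # hypothetical, already-scaled features
before = model.predict(invoice_features)
model.update(invoice_features, collected=True)
after = model.predict(invoice_features)
```

In this sketch the untrained model starts at a probability of 0.5, and a single observed collection raises its prediction for the same features, which is the direction of improvement the feedback loop is meant to produce.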
Computer 400 can be a host computer connected to a network. Computer 400 can be a client computer or a server. As shown in
Input device 420 can be any suitable device that provides input, such as a touch screen or monitor, keyboard, mouse, or voice-recognition device. Output device 430 can be any suitable device that provides an output, such as a touch screen, monitor, printer, disk drive, or speaker.
Storage 440 can be any suitable device that provides storage, such as an electrical, magnetic, or optical memory, including a random access memory (RAM), cache, hard drive, CD-ROM drive, tape drive, or removable storage disk. Communication device 460 can include any suitable device capable of transmitting and receiving signals over a network, such as a network interface chip or card. The components of the computer can be connected in any suitable manner, such as via a physical bus or wirelessly. Storage 440 can be a non-transitory computer-readable storage medium comprising one or more programs, which, when executed by one or more processors, such as processor 410, cause the one or more processors to execute methods described herein.
Software 450, which can be stored in storage 440 and executed by processor 410, can include, for example, the programming that embodies the functionality of the present disclosure (e.g., as embodied in the systems, computers, servers, and/or devices as described above). In some embodiments, software 450 can include a combination of servers such as application servers and database servers.
Software 450 can also be stored and/or transported within any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a computer-readable storage medium can be any medium, such as storage 440, that can contain or store programming for use by or in connection with an instruction execution system, apparatus, or device.
Software 450 can also be propagated within any transport medium for use by or in connection with an instruction execution system, apparatus, or device, such as those described above, that can fetch and execute instructions associated with the software from the instruction execution system, apparatus, or device. In the context of this disclosure, a transport medium can be any medium that can communicate, propagate, or transport programming for use by or in connection with an instruction execution system, apparatus, or device. The transport-readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, or infrared wired or wireless propagation medium.
Computer 400 may be connected to a network, which can be any suitable type of interconnected communication system. The network can implement any suitable communications protocol and can be secured by any suitable security protocol. The network can comprise network links of any suitable arrangement that can implement the transmission and reception of network signals, such as wireless network connections, T1 or T3 lines, cable networks, DSL, or telephone lines.
Computer 400 can implement any operating system suitable for operating on the network. Software 450 can be written in any suitable programming language, such as C, C++, Java, or Python. In various embodiments, application software embodying the functionality of the present disclosure can be deployed in different configurations, such as in a client/server arrangement or through a Web browser as a Web-based application or Web service, for example.
Following is a list of embodiments:
- Embodiment 1. A system for classifying documents, the system comprising one or more processors configured to cause the system to:
- receive data representing a document;
- apply one or more natural language processing techniques to the received data to generate a feature vector representing the document;
- identify, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector;
- apply a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and
- determine, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
- Embodiment 2. The system of embodiment 1, wherein:
- the one or more processors are configured to identify, based on the feature vector, a cluster of feature vectors from the case library that has a highest level of similarity to the feature vector amongst feature vector clusters in the case library; and
- wherein the determination of the adjudication is further based on the identified cluster of feature vectors.
- Embodiment 3. The system of any one of embodiments 1-2, wherein the plurality of characteristics comprises one or more of the following: a risk characteristic, a timing characteristic, and an amount characteristic.
- Embodiment 4. The system of any one of embodiments 1-3, wherein applying the plurality of models to the feature vector comprises computing a plurality of characteristics and comparing the plurality of computed characteristics to corresponding baseline characteristics obtained from an ERP data source to compute the respective changes.
- Embodiment 5. The system of any one of embodiments 1-4, wherein computing the respective changes comprises generating a plurality of respective change values and a plurality of respective change confidence levels.
- Embodiment 6. The system of any one of embodiments 1-5, wherein applying the one or more natural language processing techniques to the received data to generate a feature vector comprises:
- applying a plurality of sets of models in parallel to one another, wherein each of the sets of models is configured to process the received data to generate respective output data; and
- storing the output data from each of the models in the feature vector.
- Embodiment 7. The system of embodiment 6, wherein a first set of models of the plurality of sets of models comprises a first sentence classification module and a classification module configured to generate output data relating to a first type of content of the document.
- Embodiment 8. The system of any one of embodiments 6-7, wherein a second set of models of the plurality of sets of models comprises a structural classification module, a linguistic modality classification module, and a classification module configured to generate output data relating to a second type of content of the document.
- Embodiment 9. The system of any one of embodiments 6-8, wherein a third set of models of the plurality of sets of models comprises a second sentence classification module and a classification module configured to generate output data relating to a third type of content of the document.
- Embodiment 10. The system of any one of embodiments 1-9, wherein determining the adjudication classification comprises determining whether the document meets commercial substance criteria.
- Embodiment 11. The system of any one of embodiments 1-10, wherein determining the adjudication classification and the adjudication confidence score comprises applying an adjudication reconciliation data processing operation based on data associated with the identified second feature vector and based on the computed respective changes for the plurality of characteristics.
- Embodiment 12. A non-transitory computer-readable storage medium storing instructions for classifying documents, the instructions configured to be executed by one or more processors to cause the system to:
- receive data representing a document;
- apply one or more natural language processing techniques to the received data to generate a feature vector representing the document;
- identify, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector;
- apply a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and
- determine, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
- Embodiment 13. A method for classifying documents, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving data representing a document;
- applying one or more natural language processing techniques to the received data to generate a feature vector representing the document;
- identifying, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector;
- applying a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and
- determining, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
- Embodiment 14. A system for identifying related parties within a plurality of databases, the system comprising one or more processors configured to cause the system to:
- receive a data set indicating a first set of parties related to an entity;
- generate, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties;
- submit one or more of the parties of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and
- update, based on the second set of parties, the graph data structure to represent a second plurality of relationships between the entity and the second set of parties.
- Embodiment 15. The system of embodiment 14, wherein the one or more processors are configured to apply one or more disambiguation models to the second set of parties before updating the graph data structure based on the second set of parties.
- Embodiment 16. A non-transitory computer-readable storage medium storing instructions for identifying related parties within a plurality of databases, the instructions configured to be executed by a system comprising one or more processors configured to cause the system to:
- receive a data set indicating a first set of parties related to an entity;
- generate, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties;
- submit one or more of the parties of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and
- update, based on the second set of parties, the graph data structure to represent a second plurality of relationships between the entity and the second set of parties.
- Embodiment 17. A method for identifying related parties within a plurality of databases, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving a data set indicating a first set of parties related to an entity;
- generating, based on the first set of parties, a graph data structure representing a first plurality of relationships between the entity and the first set of parties;
- submitting one or more of the parties of the first set of parties as one or more input queries to obtain, from a plurality of databases, a second set of parties related to the one or more input queries; and
- updating, based on the second set of parties, the graph data structure to represent a second plurality of relationships between the entity and the second set of parties.
- Embodiment 18. A system for anomaly recognition and analysis, the system comprising one or more processors configured to cause the system to:
- receive input data representing a plurality of interactions between a first entity and a plurality of respective entities;
- apply one or more anomaly-recognition models to generate anomaly data representing a first subset of the interactions as anomalous; and
- identify a second subset of the interactions, wherein the second subset is a subset of the first subset, wherein identification of the second subset is based on the anomaly data and based on a data structure representing a plurality of relationships between the first entity and a set of entities related to the entity.
- Embodiment 19. The system of embodiment 18, wherein the input data comprises transaction data.
- Embodiment 20. The system of any one of embodiments 18-19, wherein the second subset of interactions are identified as transactions for which there is an elevated risk of related-party anomalies.
- Embodiment 21. A non-transitory computer-readable storage medium storing instructions for anomaly recognition and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
- receive input data representing a plurality of interactions between a first entity and a plurality of respective entities;
- apply one or more anomaly-recognition models to generate anomaly data representing a first subset of the interactions as anomalous; and
- identify a second subset of the interactions, wherein the second subset is a subset of the first subset, wherein identification of the second subset is based on the anomaly data and based on a data structure representing a plurality of relationships between the first entity and a set of entities related to the entity.
- Embodiment 22. A method for anomaly recognition and analysis, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving input data representing a plurality of interactions between a first entity and a plurality of respective entities;
- applying one or more anomaly-recognition models to generate anomaly data representing a first subset of the interactions as anomalous; and
- identifying a second subset of the interactions, wherein the second subset is a subset of the first subset, wherein identification of the second subset is based on the anomaly data and based on a data structure representing a plurality of relationships between the first entity and a set of entities related to the entity.
- Embodiment 23. A system for behavioral modeling and analysis, the system comprising one or more processors configured to cause the system to:
- receive first input data comprising a data structure representing relationships amongst a plurality of entities;
- receive second input data representing behavior of one or more of the entities represented in the data structure; and
- apply one or more behavioral models to determine, based on the first input data and the second input data, a risk of related-party anomaly represented by the second input data.
- Embodiment 24. A non-transitory computer-readable storage medium storing instructions for behavioral modeling and analysis, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
- receive first input data comprising a data structure representing relationships amongst a plurality of entities;
- receive second input data representing behavior of one or more of the entities represented in the data structure; and
- apply one or more behavioral models to determine, based on the first input data and the second input data, a risk of related-party anomaly represented by the second input data.
- Embodiment 25. A method for behavioral modeling and analysis, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving first input data comprising a data structure representing relationships amongst a plurality of entities;
- receiving second input data representing behavior of one or more of the entities represented in the data structure; and
- applying one or more behavioral models to determine, based on the first input data and the second input data, a risk of related-party anomaly represented by the second input data.
- Embodiment 26. A system for identifying relationships between entities represented within one or more data sets, the system comprising one or more processors configured to cause the system to:
- receive one or more data sets representing a plurality of entities;
- generate, based at least in part on the one or more data sets, a graph data structure representing entities amongst the plurality of entities as nodes and representing relationships between pairs of entities as edges between corresponding pairs of nodes;
- receive input data indicating a pair of query entities; and
- determine, based at least in part on the graph data structure, whether one or more related-entity criteria are satisfied for the pair of query entities.
- Embodiment 27. The system of embodiment 26, wherein generating the graph data structure comprises:
- selecting a first target entity from amongst the plurality of entities;
- identifying a first set of relationships between the target entity and one or more other entities within the plurality of entities; and
- storing data in the graph data structure representing the first set of relationships.
- Embodiment 28. The system of embodiment 27, wherein generating the graph data structure comprises:
- selecting a second target entity from amongst the one or more entities identified in the first set of relationships as related to the first target entity;
- identifying a second set of relationships between the second entity and one or more other entities within the plurality of entities; and
- storing data in the graph data structure representing the second set of relationships.
- Embodiment 29. The system of any one of embodiments 26-28, wherein generating the graph data structure comprises iteratively augmenting the graph data structure until one or more cessation conditions are satisfied.
- Embodiment 30. The system of any one of embodiments 26-29, wherein edges of the graph data structure are weighted according to a relationship score representing a strength of a relationship between the entities represented by the linked nodes.
- Embodiment 31. The system of any one of embodiments 26-30, wherein determining whether one or more related-entity criteria are satisfied for the pair of query entities comprises determining whether the query entities are both represented as nodes in the graph data structure.
- Embodiment 32. The system of any one of embodiments 26-31, wherein determining whether one or more related-entity criteria are satisfied for the pair of query entities comprises determining whether the query entities are separated by a distance within the graph data structure that is less than a predetermined number of hops.
- Embodiment 33. The system of any one of embodiments 26-32, wherein determining whether one or more related-entity criteria are satisfied for the pair of query entities comprises determining whether the query entities are separated by a weighted distance within the graph data structure that is less than a predetermined threshold distance, wherein the weighted distance is computed based on a number of hops between the query entities and based on weights for edges linking the one or more hops between the query entities.
- Embodiment 34. The system of any one of embodiments 26-33, wherein determining whether one or more related-entity criteria are satisfied for the pair of query entities comprises applying a behavioral modeling algorithm to the query entities.
- Embodiment 35. A non-transitory computer-readable medium storing instructions for identifying relationships between entities represented within one or more data sets, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
- receive one or more data sets representing a plurality of entities;
- generate, based at least in part on the one or more data sets, a graph data structure representing entities amongst the plurality of entities as nodes and representing relationships between pairs of entities as edges between corresponding pairs of nodes;
- receive input data indicating a pair of query entities; and
- determine, based at least in part on the graph data structure, whether one or more related-entity criteria are satisfied for the pair of query entities.
- Embodiment 36. A method for identifying relationships between entities represented within one or more data sets, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving one or more data sets representing a plurality of entities;
- generating, based at least in part on the one or more data sets, a graph data structure representing entities amongst the plurality of entities as nodes and representing relationships between pairs of entities as edges between corresponding pairs of nodes;
- receiving input data indicating a pair of query entities; and
- determining, based at least in part on the graph data structure, whether one or more related-entity criteria are satisfied for the pair of query entities.
- Embodiment 37. A system for predicting likelihood of collection, the system comprising one or more processors configured to cause the system to:
- receive a first data set comprising endogenous information pertaining to a transaction;
- receive a second data set comprising exogenous information related to one or more parties to the transaction;
- configure a collectability uncertainty model based on the first data set and the second data set;
- receive a third data set comprising information regarding the transaction; and
- provide the information regarding the transaction to the collectability uncertainty model to generate an output indicating a likelihood of collection for the transaction.
- Embodiment 38. The system of embodiment 37, wherein the endogenous information comprises one or more selected from the following: payment history information of a party to the transaction; credit assessment information conducted prior to the initiation of the transaction; and payment history information of one or more parties related to a party to the transaction.
- Embodiment 39. The system of any one of embodiments 37-38, wherein the exogenous information comprises one or more selected from the following: economic behavior information of an industry related to a party to the transaction; economic behavior information of a value chain of a party to the transaction; news information related to a party, a related industry, or a related value chain to the transaction; product review information; employee sentiment information; and consumer sentiment information.
- Embodiment 40. The system of any one of embodiments 37-39, wherein the third data set comprises information regarding a prior dispute between a plurality of entities to the transaction.
- Embodiment 41. The system of any one of embodiments 37-40, wherein applying the collectability uncertainty model comprises:
- generating an initial prediction of uncertainty based on the first data set comprising the endogenous information; and
- applying one or more predictive models based on the second data set comprising the exogenous information.
- Embodiment 42. The system of any one of embodiments 37-41, wherein the collectability uncertainty model is validated following the occurrence of a rare event and based on its predictions in response to the rare event.
- Embodiment 43. The system of any one of embodiments 37-42, wherein the collectability uncertainty model is configured to generate output data comprising a collectability due date and an associated confidence level.
- Embodiment 44. The system of any one of embodiments 37-43, wherein the system is configured to:
- receive data regarding a collection event associated with the transaction; and
- apply a continuous learning feedback loop to update the collectability uncertainty model based on the data regarding the collection event.
- Embodiment 45. A non-transitory computer-readable storage medium storing instructions for predicting likelihood of collection, the instructions configured to be executed by a system comprising one or more processors to cause the system to:
- receive a first data set comprising endogenous information pertaining to a transaction;
- receive a second data set comprising exogenous information related to one or more parties to the transaction;
- configure a collectability uncertainty model based on the first data set and the second data set;
- receive a third data set comprising information regarding the transaction; and
- provide the information regarding the transaction to the collectability uncertainty model to generate an output indicating a likelihood of collection for the transaction.
- Embodiment 46. A method for predicting likelihood of collection, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving a first data set comprising endogenous information pertaining to a transaction;
- receiving a second data set comprising exogenous information related to one or more parties to the transaction;
- configuring a collectability uncertainty model based on the first data set and the second data set;
- receiving a third data set comprising information regarding the transaction; and
- providing the information regarding the transaction to the collectability uncertainty model to generate an output indicating a likelihood of collection for the transaction.
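The graph-based related-entity criteria of Embodiments 31 through 33 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the adjacency-map representation, the function names `hop_distance`, `weighted_distance`, and `are_related`, and in particular the choice to cost each hop as the reciprocal of its relationship score (so that stronger relationships count as shorter) are all assumptions introduced here.

```python
import heapq
from collections import deque
from typing import Dict, Optional

# Adjacency map: node -> {neighbor: relationship score in (0, 1]}.
Graph = Dict[str, Dict[str, float]]


def hop_distance(graph: Graph, source: str, target: str) -> Optional[int]:
    """Fewest hops between two entities (breadth-first search), or None."""
    if source not in graph or target not in graph:
        return None  # at least one query entity is not a node in the graph
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in graph.get(node, {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


def weighted_distance(graph: Graph, source: str, target: str) -> Optional[float]:
    """Smallest summed hop cost (Dijkstra); each hop costs 1 / score (assumed)."""
    if source not in graph or target not in graph:
        return None
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, score in graph.get(node, {}).items():
            candidate = dist + 1.0 / score
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None


def are_related(graph: Graph, a: str, b: str, max_hops: int) -> bool:
    """Related-entity check: separated by fewer than max_hops hops."""
    hops = hop_distance(graph, a, b)
    return hops is not None and hops < max_hops
```

With a hypothetical graph in which "Acme" links to "Beta" with score 1.0 and "Beta" links to "Gamma" with score 0.5, the hop distance from "Acme" to "Gamma" is 2 and the weighted distance under the reciprocal-cost assumption is 3.0. A threshold of three hops would therefore treat the pair as related, and a threshold of two would not.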
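The case-library lookup recited in Embodiments 1 and 2 (identifying a second feature vector based on a similarity to the feature vector) can be illustrated with a short sketch. The source does not state which similarity measure is used, so cosine similarity is assumed here. The function names `cosine_similarity` and `closest_cases` and the representation of the case library as a list of (case identifier, vector) pairs are likewise hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_cases(
    query: Sequence[float],
    case_library: List[Tuple[str, Sequence[float]]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k prior cases most similar to the query, best match first."""
    scored = [
        (case_id, cosine_similarity(query, vector))
        for case_id, vector in case_library
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Each returned pair carries the similarity score alongside the case identifier, so a downstream adjudication step could, under these assumptions, use the score of the best match as one input to its confidence estimate.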
This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED ASSESSMENT OF VOUCHING EVIDENCE”, filed Jun. 30, 2022, Attorney Docket no. 13574-20068.00.
This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR APPLYING A COMPOSABLE ASSURANCE INTEGRITY FRAMEWORK”, filed Jun. 30, 2022, Attorney Docket no. 13574-20070.00.
This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR AUTOMATED DOCUMENT PROCESSING”, filed Jun. 30, 2022, Attorney Docket no. 13574-20071.00.
This application incorporates by reference the entire contents of the U.S. patent application titled “AI-AUGMENTED AUDITING PLATFORM INCLUDING TECHNIQUES FOR PROVIDING AI-EXPLAINABILITY FOR PROCESSING DATA THROUGH MULTIPLE LAYERS”, filed Jun. 30, 2022, Attorney Docket no. 13574-20072.00.
Claims
1. A system for classifying documents, the system comprising one or more processors configured to cause the system to:
- receive data representing a document;
- apply one or more natural language processing techniques to the received data to generate a feature vector representing the document;
- identify, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector;
- apply a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and
- determine, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
2. The system of claim 1, wherein:
- the one or more processors are configured to identify, based on the feature vector, a cluster of feature vectors from the case library that has a highest level of similarity to the feature vector amongst feature vector clusters in the case library; and
- wherein the determination of the adjudication is further based on the identified cluster of feature vectors.
3. The system of claim 1, wherein the plurality of characteristics comprises one or more of the following: a risk characteristic, a timing characteristic, and an amount characteristic.
4. The system of claim 1, wherein applying the plurality of models to the feature vector comprises computing a plurality of characteristics and comparing the plurality of computed characteristics to corresponding baseline characteristics obtained from an ERP data source to compute the respective changes.
5. The system of claim 1, wherein computing the respective changes comprises generating a plurality of respective change values and a plurality of respective change confidence levels.
6. The system of claim 1, wherein applying the one or more natural language processing techniques to the received data to generate a feature vector comprises:
- applying a plurality of sets of models in parallel to one another, wherein each of the sets of models is configured to process the received data to generate respective output data; and
- storing the output data from each of the models in the feature vector.
7. The system of claim 6, wherein a first set of models of the plurality of sets of models comprises a first sentence classification module and a classification module configured to generate output data relating to a first type of content of the document.
8. The system of claim 6, wherein a second set of models of the plurality of sets of models comprises a structural classification module, a linguistic modality classification module, and a classification module configured to generate output data relating to a second type of content of the document.
9. The system of claim 6, wherein a third set of models of the plurality of sets of models comprises a second sentence classification module and a classification module configured to generate output data relating to a third type of content of the document.
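By way of non-limiting illustration, the parallel application of sets of models recited in claims 6 through 9 may be sketched as follows. The sketch assumes that each set of models is exposed as a single function that takes the document text and returns named numeric outputs, and it uses a thread pool for parallelism. The three stand-in functions are trivial keyword and character checks rather than the sentence, structural, or linguistic modality classification modules of the claims, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def first_model_set(text):
    # Stand-in for a first set of models (first type of content).
    return {"mentions_termination": float("terminate" in text.lower())}


def second_model_set(text):
    # Stand-in for a second set of models (second type of content).
    return {"mentions_net_30": float("net 30" in text.lower())}


def third_model_set(text):
    # Stand-in for a third set of models (third type of content).
    return {"digit_count": float(sum(ch.isdigit() for ch in text))}


MODEL_SETS = {
    "first": first_model_set,
    "second": second_model_set,
    "third": third_model_set,
}


def build_feature_vector(text):
    """Run every model set in parallel and store all outputs in one
    feature vector, keyed by '<set name>.<output name>'."""
    with ThreadPoolExecutor(max_workers=len(MODEL_SETS)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in MODEL_SETS.items()}
        outputs = {name: future.result() for name, future in futures.items()}
    vector = {}
    for set_name, output in outputs.items():
        for key, value in output.items():
            vector[f"{set_name}.{key}"] = value
    return vector


features = build_feature_vector("Payment due Net 30. Total: 1200 USD.")
```

Prefixing each output with the name of its model set keeps outputs from different sets distinct within the single feature vector.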
10. The system of claim 1, wherein determining the adjudication classification comprises determining whether the document meets commercial substance criteria.
11. The system of claim 1, wherein determining the adjudication classification and the adjudication confidence score comprises applying an adjudication reconciliation data processing operation based on data associated with the identified second feature vector and based on the computed respective changes for the plurality of characteristics.
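By way of non-limiting illustration, one possible form of the adjudication reconciliation recited in claim 11 may be sketched as follows. The claims do not specify a reconciliation rule, so everything here is an assumption made for illustration: the closest prior case is assumed to carry a yes/no label and a similarity between 0 and 1, a change is assumed to count as evidence toward meeting the criteria when its absolute value reaches a per-characteristic tolerance, and the two sources of evidence are averaged with equal weight. The reconcile function, its parameters, and the sample values are hypothetical.

```python
def reconcile(prior_label, prior_similarity, changes, tolerance):
    """Combine prior-case evidence with change evidence.

    prior_label: True if the closest prior case met the criteria
    prior_similarity: similarity to that prior case, in 0..1
    changes: list of (name, change_value, confidence) tuples
    tolerance: name -> smallest absolute change treated as meaningful
    Returns (adjudication classification, adjudication confidence score).
    """
    # Confidence-weighted share of characteristics with a meaningful change.
    total_confidence = sum(conf for _, _, conf in changes)
    if total_confidence == 0:
        change_score = 0.0
    else:
        meaningful = sum(
            conf for name, value, conf in changes
            if abs(value) >= tolerance[name]
        )
        change_score = meaningful / total_confidence

    # A similar prior case pulls the score toward that case's label.
    prior_score = prior_similarity if prior_label else 1.0 - prior_similarity

    # Equal weighting is an arbitrary choice made for this sketch.
    score = 0.5 * prior_score + 0.5 * change_score
    classification = score >= 0.5
    confidence = score if classification else 1.0 - score
    return classification, confidence


# Hypothetical inputs.
changes = [("risk", 0.25, 0.5), ("timing", 0.0, 0.25), ("amount", 2000.0, 0.25)]
tolerance = {"risk": 0.1, "timing": 5.0, "amount": 500.0}

classification, confidence = reconcile(True, 0.8, changes, tolerance)
```

A trained model or a rule set defined by audit professionals could replace the equal-weight average; the sketch is intended only to show how the two inputs named in claim 11 can yield both a classification and a confidence score.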
12. A non-transitory computer-readable storage medium storing instructions for classifying documents, the instructions configured to be executed by one or more processors of a system to cause the system to:
- receive data representing a document;
- apply one or more natural language processing techniques to the received data to generate a feature vector representing the document;
- identify, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector;
- apply a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and
- determine, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
13. A method for classifying documents, wherein the method is executed by a system comprising one or more processors, the method comprising:
- receiving data representing a document;
- applying one or more natural language processing techniques to the received data to generate a feature vector representing the document;
- identifying, based on the feature vector, a second feature vector from a case library based on a similarity to the feature vector;
- applying a plurality of models to the feature vector to compute respective changes for a plurality of characteristics represented by the document; and
- determining, based on the identified second feature vector and based on the computed respective changes for the plurality of characteristics, an adjudication for the document, wherein the adjudication comprises an adjudication classification and an adjudication confidence score.
Type: Application
Filed: Jun 30, 2022
Publication Date: Jan 5, 2023
Applicant: PricewaterhouseCoopers LLP (New York, NY)
Inventors: Chung-Sheng LI (Scarsdale, NY), Winnie CHENG (West New York, NJ), Mark John FLAVELL (Madison, NJ), Lori Marie HALLMARK (Xenia, OH), Nancy Alayne LIZOTTE (Saline, MI), Kevin Ma LEONG (Randolph, NJ)
Application Number: 17/854,337